Tuesday, 20 September 2016

Hiding data in Images

Here is a still from my favourite movie
The Martian
Now, what if I told you that I have hidden a big message inside this image? Would you be able to find out what it is?

Hiding information in images is called steganography. To understand the concept, we first have to understand that a color image is, mathematically, a 3D matrix. The first two dimensions represent the spatial layout of the pixels, whereas the third dimension holds the color channels. A typical color image has three channels: Red, Green and Blue (abbreviated as RGB). The simplest form of steganography spreads a message throughout the image data by replacing the least significant bit (LSB) of each pixel value with one bit of the message. As seen above, this has no visible effect on the image, because changing the least significant bit alters a pixel value by at most 1.
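To make the bit replacement concrete, here is a minimal sketch of the operation on a single 8-bit channel value. The helper names setLsb and getLsb are made up for this illustration and are not taken from the linked repository.

```cpp
#include <cassert>
#include <cstdint>

// Replace the least significant bit of one 8-bit channel value with `bit`.
// `bit` must be 0 or 1. The result differs from `value` by at most 1.
inline std::uint8_t setLsb(std::uint8_t value, int bit) {
    assert(bit == 0 || bit == 1);
    return static_cast<std::uint8_t>((value & 0xFE) | bit);
}

// Read the least significant bit back out of a channel value.
inline int getLsb(std::uint8_t value) {
    return value & 1;
}
```

For example, setLsb(200, 1) gives 201 and setLsb(201, 0) gives 200, so a channel value never moves by more than one intensity level.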

The above image contains the following first few lines from the book, hidden in the LSBs of the blue channel only (1063 characters in total):
"I'm pretty much f*****.
That's my considered opinion.
Six days in to what should be a greatest two months of my life, and it's turned in to a nightmare.
I don't even know who'll read this. I guess someone will find it eventually. Maybe a hundred years from now.
For the record... I didn't die on Sol 6. Certainly the rest of the crew thought I did, and I can't blame them. Maybe there'll be a day of national mourning for me, and my Wikipedia page will say "Mark Watney is the only human being to have died on Mars." And itll be right, probably. Cause I'll surely die here. Just not on Sol 6 when everyone thinks I did.
Let's see... where do I begin?
The Ares program. Mankind reaching out to Mars to send people to another planet for the very first time and expand the horizons of humanity blah, blah, blah.  The Ares 1 crew did their thing and came back heroes. They got the parades and fame and love of the world.
Ares 2 did the same thing, in a different location on Mars. They got a firm handshake and a hot cup of coffee when they got home."
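Hiding text like this in the blue channel could look roughly like the sketch below. It is only an illustration under a few assumptions of mine: the image is a flat buffer of interleaved 8-bit values in B, G, R order (the order OpenCV uses by default), each character is written most significant bit first, and the reader already knows the message length. The function names are hypothetical, and the code in the repository may do things differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed layout: interleaved 8-bit pixels in B, G, R order, so the blue
// value of pixel i sits at index 3 * i.
constexpr std::size_t kChannels = 3;
constexpr std::size_t kBlue = 0;

// Write each bit of `message` (most significant bit first) into the blue LSB
// of successive pixels. Returns false if the image has too few pixels.
bool embedInBlue(std::vector<std::uint8_t>& pixels, const std::string& message) {
    assert(pixels.size() % kChannels == 0);
    const std::size_t pixelCount = pixels.size() / kChannels;
    if (message.size() * 8 > pixelCount) return false;
    std::size_t pixel = 0;
    for (unsigned char c : message) {
        for (int b = 7; b >= 0; --b, ++pixel) {
            std::uint8_t& blue = pixels[pixel * kChannels + kBlue];
            blue = static_cast<std::uint8_t>((blue & 0xFE) | ((c >> b) & 1));
        }
    }
    return true;
}

// Read `length` characters back from the blue LSBs, in the same bit order.
std::string extractFromBlue(const std::vector<std::uint8_t>& pixels, std::size_t length) {
    assert(length * 8 <= pixels.size() / kChannels);
    std::string message;
    std::size_t pixel = 0;
    for (std::size_t i = 0; i < length; ++i) {
        unsigned char c = 0;
        for (int b = 0; b < 8; ++b, ++pixel) {
            c = static_cast<unsigned char>((c << 1) | (pixels[pixel * kChannels + kBlue] & 1));
        }
        message.push_back(static_cast<char>(c));
    }
    return message;
}
```

With one bit per pixel, the 1063-character message needs 1063 x 8 = 8504 pixels, and the green and red values are left untouched.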

We can only see the effect of the data when we subtract the encoded image from the original and renormalize the resulting image:
Difference image
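A difference image of this kind can be computed along the following lines. This is a sketch under my own assumptions: both images are flat buffers of 8-bit values of equal size, the difference is taken as an absolute value, and "renormalize" means stretching the largest difference to 255. The function name normalizedDifference is hypothetical. With OpenCV, cv::absdiff followed by cv::normalize would be one way to get a similar result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-value absolute difference, stretched so the largest difference becomes 255.
std::vector<std::uint8_t> normalizedDifference(const std::vector<std::uint8_t>& original,
                                               const std::vector<std::uint8_t>& encoded) {
    assert(original.size() == encoded.size());
    std::vector<int> diff(original.size());
    int maxDiff = 0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        diff[i] = std::abs(static_cast<int>(original[i]) - static_cast<int>(encoded[i]));
        maxDiff = std::max(maxDiff, diff[i]);
    }
    std::vector<std::uint8_t> out(original.size(), 0);
    if (maxDiff == 0) return out;  // identical images: the result is all black
    for (std::size_t i = 0; i < diff.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(diff[i] * 255 / maxDiff);
    }
    return out;
}
```

Because LSB embedding changes a value by at most 1, every changed value shows up at full brightness after the stretch, while unchanged values stay black.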
In the difference image, the entire message (1063 characters) occupies only a bold blue line on the left! But how much data can be hidden in this simple way? Let's do the math!

A typical image has 4 megapixels = 4,000,000 pixels. Each pixel has three channels, giving 3 x 4,000,000 = 12,000,000 channel values. If we use only the least significant bit of each value (we could use more bits at a slight cost in visual quality), we can hide 12,000,000 bits, or 12,000,000/8 = 1,500,000 characters at 8 bits per character. Assuming that a word averages 5 letters plus a space, we can hide 1,500,000/6 = 250,000 words in a single image!
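The same arithmetic can be written as a small helper, which makes it easy to try other image sizes or more bits per value. The struct and function names are hypothetical, and the 8 bits per character and 6 characters per word are the assumptions made in the paragraph above.

```cpp
#include <cassert>
#include <cstdint>

struct Capacity {
    std::uint64_t bits;        // total hidden bits
    std::uint64_t characters;  // at 8 bits per character
    std::uint64_t words;       // at 5 letters plus a space per word
};

// Capacity of LSB embedding: by default one hidden bit in each of 3 channels.
Capacity lsbCapacity(std::uint64_t pixels,
                     std::uint64_t channels = 3,
                     std::uint64_t bitsPerValue = 1) {
    assert(bitsPerValue >= 1 && bitsPerValue <= 8);
    const std::uint64_t bits = pixels * channels * bitsPerValue;
    const std::uint64_t characters = bits / 8;
    const std::uint64_t words = characters / 6;
    return Capacity{bits, characters, words};
}
```

Calling lsbCapacity(4000000) reproduces the figures above: 12,000,000 bits, 1,500,000 characters and 250,000 words. Restricting the message to one channel, as in the blue-only example, gives lsbCapacity(4000000, 1) and a third of that capacity.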

The code for the above example can be accessed at: https://github.com/devkicks/HiddingDataInImages
In this example I use simple bit manipulations in C++ with OpenCV cv::Mat to access and modify the pixel values.