Tuesday, 16 July 2013

Information - How much do we need?

The good thing about doing research is that there is a possibility that you can come across a new technique/method everyday. Recently I have been working with a lot of information from different datasets and trying to extract the useful information content, which can be eventually used to describe the major trend in the dataset.

A major hurdle in this is to quantify that how much information is actually informative. One possible method which is widely used to achieve this is called Singular Value Decomposition (SVD). As this is a widely used technique, the details of this method can be easily found across the web. This blog post will visualize the content information and try to comprehend how much is required for computer vision applications. Here I am using one of the applications of SVD, which is to compress the content of an image. Although there exist better approaches to achieve this, the content used can be quantified easily using SVD.

Below is a sample grayscale image that I have used for this blog post. This is the original image, without any compression using SVD. Notice that there are a lot of regions in the image that look similar.

When the above image is decomposed using singular value decomposition, it gives a total of 480 singular values. Compression in the image is determined by using lesser number of these decomposed singular values for reconstruction. More detail about this can be found in literature, however the aim of this post is to visualize how the information looks at different levels of compression. Below is the visualization of compression using different number of singular values.

From the above image, it is clear that most of the information is stored in the first 55 singular values yielding to a massive 88.5% compression with little loss of visual quality.

To quantify the errors, I have compared the output from compression process with the original image. The results support the observation made above. These results are summarized in the graph below. It can be seen the errors decrease in an exponential trend with the increase in singular values used.

I hope the above example helps you visualize the change in data and think about how much information is really required. There is no definitive boundary on this, however one thing that can be done see the graph is to choose a value which gives an adequate balance between data compression and square error.

The code for the above method is given below:

close all
clear all
%reading and converting the image
% decomposing the image using singular value decomposition
% Using different number of singular values (diagonal of S) to compress and
% reconstruct the image
dispEr = [];
numSVals = [];
for N=5:25:300
    % store the singular values in a temporary var
    C = S;
    % discard the diagonal values not required for compression
    % Construct an Image using the selected singular values
    % display and compute error
    buffer = sprintf('Image output using %d singular values', N)
    total = double(size(C, 1));
    cur = double(N);
    pCompress = ((total-cur)/total)*100
    buffer = sprintf('%d singular values', N)
    text(10,460, [buffer], 'Color', 'w', 'FontSize', 15);
    buffer = sprintf('%0.01f', pCompress);
    text(270,460, [buffer '% Compressed'], 'Color', 'w', 'FontSize', 20);
    im = getframe();
    buffer = sprintf('%d.png', N);
    imwrite(im.cdata, buffer);
    % store vals for display
    dispEr = [dispEr; error];
    numSVals = [numSVals; N];
% dislay the error graph
title('Error in compression');
plot(numSVals, dispEr);
grid on
xlabel('Number of Singular Values used');
ylabel('Error between compress and original image');