Sunday, 15 December 2013

One Image hiding over eight thousand different stories...

Working with large datasets has its pros and cons. Whatever the application or field, there is always a need to train a machine learning algorithm to recognize the pattern in the data. We discuss this "pattern" in many different contexts, and a large body of literature addresses the recognition problem. However, it is often not considered important to know what this pattern actually looks like, or why it is even called a "pattern" in the first place.

Interestingly, the answer lies in the image above, which shows a collection of 8000 different samples arranged in columns. The first thing to notice is that there really is a repeating pattern in the data. This is exactly the pattern we are trying to learn. It may not mean much to the eye, but with the correct labels, the samples can be used to build a model that identifies each class with high accuracy.
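
For readers who want to produce a similar picture from their own data, here is a minimal sketch of the idea: put every sample in its own column and map the values to grey levels. It is not the code used for the image above. The synthetic data, the feature count (64), the number of classes (4) and the function names are all assumptions made for illustration.

```python
import random


def make_synthetic_samples(n_samples, n_features, n_classes, seed=0):
    """Hypothetical stand-in data: one random template per class, plus noise."""
    rng = random.Random(seed)
    templates = [[rng.random() for _ in range(n_features)]
                 for _ in range(n_classes)]
    samples, labels = [], []
    for i in range(n_samples):
        label = i % n_classes  # cycle through the classes
        samples.append([t + rng.gauss(0, 0.05) for t in templates[label]])
        labels.append(label)
    return samples, labels


def samples_to_image(samples):
    """Arrange samples as columns and scale all values to 0..255 grey levels."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    n_features = len(samples[0])
    # Row r holds feature r of every sample, so each sample becomes one column.
    return [[round(255 * (s[r] - lo) / span) for s in samples]
            for r in range(n_features)]


def write_pgm(image, path):
    """Write the grey-level matrix as a plain-text PGM (P2) image file."""
    with open(path, "w") as f:
        f.write(f"P2\n{len(image[0])} {len(image)}\n255\n")
        for row in image:
            f.write(" ".join(str(v) for v in row) + "\n")


# Assumed sizes: 8000 samples as in the post; 64 features and 4 classes are made up.
samples, labels = make_synthetic_samples(n_samples=8000, n_features=64, n_classes=4)
image = samples_to_image(samples)
```

Calling write_pgm(image, "samples.pgm") saves the result in a format most image viewers can open. Because samples of the same class share a template, columns from the same class look alike, and that is the kind of repetition the eye picks up as a pattern.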