
Big data, and how to manage it all



Atanu Biswas & Bimal Roy

In the era of Big Data, we are like the blind men of the parable standing in front of an elephant, trying to guess what the giant animal looks like. However, unless we understand what Big Data is, we will never get it right. 

It’s important to know why the fancy analytics tools that we have used for small data sets cannot simply be replicated when our data grows. For example, if we want to find the simple average of ‘n’ numbers, we just add them and divide the sum by ‘n’. The same approach is followed whether ‘n’ is 100 or 100 billion.

However, if all the numbers are large and positive, the sum of 100 billion of them might exceed the largest value a computer can store as a single number. We need to adjust the algorithm appropriately to find the average. That's the extra bit of cosmetic surgery needed for handling Big Data.
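One common adjustment, offered here as a minimal sketch rather than the authors' own method, is to update the average one number at a time so that the full sum is never formed. The function name `running_mean` and the sample values are illustrative assumptions, and the overflow concern applies to fixed-size floating-point numbers (Python integers grow as needed).

```python
def running_mean(numbers):
    """Average an iterable of positive numbers without forming the full sum."""
    mean = 0.0
    count = 0
    for x in numbers:
        count += 1
        # Move the current mean a fraction 1/count of the way toward x.
        mean += (x - mean) / count
    return mean


# Hypothetical data: four values close to the largest representable float.
values = [1e308, 1e308, 1e308, 1e308]

# The naive sum overflows to infinity; the running mean stays finite.
print(sum(values) / len(values))
print(running_mean(values))
```

The running update keeps every intermediate value about the same size as the data itself, which is why it survives inputs whose total would not.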

Decode complex stats

Data analytics mostly comprises statistical methodologies such as regression analysis, classification and clustering techniques, and standard estimation and testing procedures. While most of these theories are neatly developed in the statistical literature and easily applied to small and moderate-sized data, one might need to manipulate them intelligently and devise novel techniques for unusual data formats. But the real challenge, even for standard ready-to-use techniques, lies in the limitations of working with data that has a huge number of variables.

One reason is the presence of ‘spurious’ or nonsense correlations among different variables. The more variables we handle, the more such correlations we encounter. And unless we can sift out the unimportant variables, we cannot have a meaningful analysis of the data.
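The effect can be seen in a small simulation, given here only as an illustrative sketch. Every variable is pure random noise, so any correlation with the target is spurious by construction; the sample size of 20, the counts of 10 and 1,000 variables, and the helper names `pearson` and `strongest_spurious` are all assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_spurious(target, columns):
    """Largest absolute correlation between the target and any column."""
    return max(abs(pearson(column, target)) for column in columns)


rng = random.Random(0)   # fixed seed so the run is repeatable
n_obs = 20               # assumed number of observations

# The target and all 1,000 variables are independent random noise.
target = [rng.gauss(0, 1) for _ in range(n_obs)]
columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(1000)]

print("strongest among 10 variables:  ", strongest_spurious(target, columns[:10]))
print("strongest among 1000 variables:", strongest_spurious(target, columns))
```

Because the first 10 variables are a subset of the 1,000, the second figure can never be smaller than the first: searching through more unrelated variables can only turn up a stronger-looking, and equally meaningless, correlation.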

It’s theoretically challenging too. In addition, even a standard regression analysis with loads of data and, say, 10,000 variables needs additional computational techniques.

Information management  

So, how do we handle the ocean of data? Now, with virtually everything connected through the Internet of Things, a gigantic amount of data is generated continuously. The ever-expanding horizon of data is growing faster than ever. An IBM report of mid-2017 stated that 2.5 quintillion bytes of data are created per day, and according to a Forbes article (2015), by 2020 about 1.7 megabytes of new information per second is expected to be created for every human being on the planet.

How to store data

Big Data is a boon and a curse at the same time. Are we really capable of leveraging it? With the present expertise, the answer is ‘no’. We need to devise statistical techniques to accommodate the data. Only the top statisticians and computational experts working together might produce such techniques, and even then only on a case-by-case basis.

Understand the power of data

Consider the example of multiplication. We need some additional techniques for multiplying two big numbers, say with hundreds of digits. We use our memory, multiply one number by every digit of the other, one by one, starting from the units place. Finally, we add all the rows. This algorithm for multiplication is a derivative of the knowledge of tables, combined with some special techniques. This can be interpreted as a Big Data problem. And special techniques are needed for solving it.
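The schoolbook procedure described above can be sketched as follows. It assumes both numbers arrive as strings of decimal digits, and the name `long_multiply` is made up for the example. As a small departure from the pencil-and-paper version, each row is added into the result as soon as it is produced instead of all rows being summed at the end.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative integers given as decimal digit strings."""
    # Store digits with the units place first, as in pencil-and-paper work.
    x = [int(ch) for ch in reversed(a)]
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))

    for j, digit_y in enumerate(y):        # one row per digit of b
        carry = 0
        for i, digit_x in enumerate(x):    # single-digit "tables" lookups
            total = result[i + j] + digit_x * digit_y + carry
            result[i + j] = total % 10
            carry = total // 10
        result[j + len(x)] += carry        # leftover carry of this row

    # Drop leading zeros, keeping at least one digit.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return "".join(str(d) for d in reversed(result))


print(long_multiply("123456789", "987654321"))
```

Only single-digit products and carries are ever computed, so the same routine works unchanged for numbers with hundreds of digits.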

Consider another simple mathematical problem: sorting. Suppose we are to sort five numbers in increasing order. In our elementary classes, we could easily sort them by looking at the numbers; certainly some algorithm within our brain runs to sort them manually. But we cannot sort 100 numbers, or say 100,000 numbers, just by looking at them. We need some algorithm to reach the answer. We have been tackling the Big Data problem for years now.
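As one illustration of such an algorithm, here is a minimal merge sort sketch. The article does not name a particular method; merge sort is chosen for the example only because it splits a large list into pieces small enough to handle and then combines the sorted pieces.

```python
def merge_sort(items):
    """Return a new list with the items in increasing order."""
    if len(items) <= 1:
        return list(items)

    # Split the list in half and sort each half separately.
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])

    # Merge the two sorted halves by repeatedly taking the smaller front item.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([42, 7, 19, 3, 25]))
```

The same few lines handle five numbers or 100,000, which is the point of having an explicit procedure and not relying on inspection.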

—The writers are professors at the Indian Statistical Institute, Kolkata
