
Big Data concepts in simple language

Let us try to understand the concept of Big Data in simple language.

Your friend brought you a hard disk of, say, 2 TB containing lots of data in the form of .dat files (exported from a relational database), Excel files, and CSV files. He needed some critical reports based on this data. You created a new database and the related schemas, then imported all this data into your database using a standard tool. You used a query language like SQL to fetch the required reports. Mission accomplished.
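The traditional workflow described above (load exported files into a relational database, then query them with SQL) can be sketched with Python's standard-library `sqlite3` and `csv` modules. This is a minimal illustration under assumptions: the table name `sales`, its columns, and the sample rows are hypothetical stand-ins for your friend's data, and a real multi-terabyte load would use the database's own bulk-import tool rather than Python.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export standing in for one of the files on the 2 TB disk.
CSV_DATA = """region,product,amount
North,Widget,120
South,Widget,80
North,Gadget,200
South,Gadget,50
"""


def build_report(csv_text: str) -> list[tuple[str, int]]:
    """Load CSV rows into an in-memory SQLite table and return total sales per region."""
    conn = sqlite3.connect(":memory:")
    # A fixed schema defined up front: relational databases expect structured rows.
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["region"], r["product"], int(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # The "critical report": a standard SQL aggregation.
    report = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return report


if __name__ == "__main__":
    for region, total in build_report(CSV_DATA):
        print(region, total)
```

This works because every row fits the fixed `sales` schema. The point of the second visit below is that log files, images, audio, and video do not fit such a schema.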

After some days your friend came back, this time with a bigger hard disk of, say, 5 TB. He said there was ‘some’ data in different formats on the disk that needed to be arranged ‘somewhere’, and the required information and reports had to be extracted from it. Remembering that you had successfully delivered the reports last time, you took up the challenge again. When you checked the hard disk, you were amazed by the kind of data you found. Not only was there a huge amount of it, it also came in such varied formats that your relational database (like Oracle) would not be able to work with it. You found huge log files, trace files, XML files, text messages, call records, other unstructured text files, many image files, and even some audio and video files.

So basically, alongside the traditional sources of data (.dat files, Excel files, etc.), the hard disk held information from many more data channels and categories.

This new kind of ‘data collection’ is what is now termed ‘Big Data’. What your friend brought you the second time is only a very small piece of real Big Data; in practice the data volumes are far larger. Another point to note is that Big Data is not just about size (Big); it is also about the variety of the data and how fast it changes. [Never think that Big Data is “just lots more enterprise data”.]

The next question that may come to your mind is: how and why did Big Data come into the picture?

It is the way we have lived our lives over the last 10-15 years (read “because we went ONLINE!”), together with our increased awareness of data, that started this data explosion and all the euphoria surrounding it.
Read the examples below to understand why new technologies like Hadoop have suddenly emerged in the last couple of years to tackle Big Data and to make effective use of this ‘new’ data.

  • Facebook creates approximately 500 terabytes of new data every day.
  • A Boeing plane generates approximately 240 terabytes of flight data during a single flight across the US.
  • The smartphones we have all started using collectively create huge amounts of data.
  • Sensors are now embedded in many electronic machines and everyday objects, resulting in billions of constantly growing data feeds of various types.

There are many other data sources and examples.

All this new kind of data can be used in good ways (or bad ones; search Google for “critics of big data” to read some interesting articles). Data management is a responsibility!

Some examples where Big Data has already been put to valuable use:

  • Some retailers now track users’ web clicks to identify behavioral trends, which helps them improve their campaigns and adjust pricing and stock levels.
  • Many energy companies capture household energy usage levels to predict outages and encourage more efficient energy consumption.
  • Some government agencies in the US can detect and track emerging disease outbreaks via social media signals.
  • Oil and gas companies can use the output of sensors in their drilling equipment to make more efficient and safer drilling decisions.
  • Companies tracking social data can spot negative trends emerging about the company and take preventive action immediately.

Big Data is still a growing field, and there are many other ways its value can be used in real life to the advantage of business, science, and the general public. It can help businesses act more swiftly and adapt to changes faster than their competitors. It promises cost-effective opportunities to improve decision-making in critical areas such as health care, employment, economic productivity, security, natural disasters, and resource management.

MapReduce, Hadoop, and NoSQL databases such as MongoDB are some of the tools used to store Big Data and extract value from it.
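To give a rough feel for the MapReduce idea named above, here is a single-process Python sketch of the classic word count: a map step emits (key, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. The function names and sample log lines are hypothetical, and this is not the Hadoop API; a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log lines standing in for a huge log file split across machines.
LOG_LINES = [
    "ERROR disk full",
    "INFO job started",
    "ERROR network timeout",
    "INFO job finished",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return (key, sum(values))


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    # In a real cluster, each block of lines would be mapped on a different node.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    counts = map_reduce(LOG_LINES)
    print(counts["ERROR"], counts["INFO"])
```

Because each map call looks at one line in isolation and each reduce call looks at one key in isolation, both steps can be split across machines; that independence is what lets this style of processing scale to very large data sets.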

In some of our upcoming articles we will discuss Hadoop and how it helps in getting value out of Big Data.

Brijesh Gogia