For quite some time now, cloud has been the leading theme for
panel discussions and industry fora. Discussions on cloud have
spread from the server room to the board room, reaching even the
dining room. But with 2.5 quintillion bytes of data being generated
every day, Big Data is the next buzzword in the IT industry; it has
scale and potential similar to that of the cloud. While the
sheer size of the data set cannot be ignored, the real issue is
analyzing the data quickly enough to make better and faster business
decisions. So what kind of technology and solutions are available
for this? How are businesses crafting their big data
strategies?
According to a report by McKinsey & Company, Big Data refers to data
sets whose size is beyond the ability of traditional enterprise
technology to capture, store, manage and analyze. The definition is
intentionally subjective and does not quantify how big is too big;
the threshold can range from a few dozen terabytes to multiple petabytes.
“I believe that Big Data is contextual though in sheer
numbers, I would place the market beyond 100 TB when
‘normal’ systems start struggling a bit,” says
Arun Gupta, Group CIO of Shoppers Stop.
The data scenario has changed drastically over the past few
years. The McKinsey Global Institute estimates that data volume is
growing 40 percent per year, and will grow 44x between 2009 and
2020. Every day, we create 2.5 quintillion bytes of data — so
much that 90 percent of the data in the world today has been
created in the last two years alone.
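As a rough sanity check, the two growth figures quoted above can be compared with simple compound-growth arithmetic. The sketch below assumes the 40 percent rate compounds once a year and treats 2009 to 2020 as 11 compounding periods; the function and constant names are illustrative and do not come from the McKinsey report.

```python
# Compound-growth check on the quoted estimates.
# Assumptions (not from the source): annual compounding, 11 periods
# between 2009 and 2020, and decimal units (1 exabyte = 10**18 bytes).

def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiple reached after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(total_factor: float, years: int) -> float:
    """Constant annual rate that would produce total_factor over `years` years."""
    return total_factor ** (1.0 / years) - 1.0


BYTES_PER_DAY = 2.5e18   # 2.5 quintillion bytes, as quoted in the article
BYTES_PER_EXABYTE = 1e18  # decimal exabyte

YEARS = 2020 - 2009       # 11 compounding periods (assumption)

factor_at_40_percent = growth_factor(0.40, YEARS)
rate_for_44x = implied_annual_rate(44.0, YEARS)
daily_exabytes = BYTES_PER_DAY / BYTES_PER_EXABYTE

print(f"40% a year for {YEARS} years gives about {factor_at_40_percent:.1f}x")
print(f"44x over {YEARS} years implies about {rate_for_44x:.1%} a year")
print(f"Daily volume: {daily_exabytes:.1f} exabytes")
```

Under these assumptions, 40 percent a year compounds to roughly 40x over 11 years, and 44x corresponds to about 41 percent a year, so the two estimates are broadly consistent with each other. The daily figure of 2.5 quintillion bytes works out to 2.5 exabytes in decimal units.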
This data comes from many different sources: sensors used to
gather climate information; posts to social media sites, blogs and
public forums; digital pictures and online videos; transaction
records from ATMs and credit card readers; and cell phone GPS
signals, to name a few. Collectively, this data is referred to as
Big Data.
Big Data represents a new era in data exploration and
utilization. More than a challenge, it is an opportunity to find
insight in new and emerging types of data, to make your business
more agile, and to answer questions that in the past were beyond
reach. Big Data and information integration capabilities make it
possible to generate insight from vast quantities of data —
fundamentally changing the way organizations use information. It
means filtering petabytes of data per second from almost any
connected device, analyzing the data while still in motion.
Today, customers feel their grievances get resolved faster when
they complain via Twitter or Facebook rather than over the phone or
e-mail. This means two things: a lot of data is being generated
online by users, and someone is analyzing that data to make
sensible business decisions. That’s the power of Big Data
analytics.
Make no mistake: Big Data analytics happens at very high
speed and in real time. Consider a retail company selling gift
items during Christmas. Based on customers’ feedback on
Twitter, the shop management can customize its products and sales
offers to drive better sales the following day. The analysis of
Big Data has two objectives. “First, to build a predictive
model that’s accurate. Second, to segment the target
group,” says Arun Ramachandran, Country Manager, Data
Computing Division, EMC India & SAARC.
Big Data is being largely generated by consumer-centric and
IT-savvy industry verticals such as retail, video surveillance,
healthcare, telecommunication, BFSI, oil & gas, law enforcement
and government. The Unique Identification Authority of India
(UIDAI) project is one such example, where huge amounts of data are
being collected and collated by the Government of India.
NOT ALL ABOUT SIZE
While it’s often the most visible parameter, Big Data is
not just about volume. Making sense of Big Data, which is the job of
Big Data analytics, also means accounting for velocity, variety, value
and complexity. Velocity refers to data being generated and
ingested for analysis in real time. Variety refers to the different
forms of data, such as tabular data, documents, e-mail, video, images
and audio. Value refers to the economic value of the data, which
varies significantly from one data set to another. Lastly, complexity
refers to the different standards, domain rules and storage formats
that apply to each data type.
Agrees Syed Masroor, Pre Sales Manager at NetApp, “Big
Data is not as much about the size of the data as it is about the
size of the problem. It essentially revolves around three
components namely analysis, bandwidth and content. A good example
of analysis would be the way Flipkart.com analyzes the next top
sales item, while an example of bandwidth would be the massive
volume of customer data that is processed by banks. To find an
example of content, one needs to look no further than Facebook
— as we all know, or can guess, the amount of content
generated by Facebook users per day is a staggering two
petabytes.”
Data warehousing powerhouse Teradata believes Big Data is not
particularly new, as it has always had customers with many
petabytes of data. “It’s not the size of the data that
matters so much. Large amount of data is not Big Data.
ERP-generated data is not Big Data. It is really about the large
volume of unstructured data generated by the users through sources
such as the social media, mobile devices, and video,” says
Dinesh Jain, Country Manager, Teradata India.
Big Data arises when an organization brings new types of data
into its information architecture, and often
this is semi-structured data, not the traditional rows-and-columns
relational data, says Kapil Sood, Vice President, Systems Business,
Oracle India. “I think a prime example of semi-structured data
is data coming from sensors or machine-generated data such as RFID
type data and location information coming from mobile devices.
Semi-structured data also includes the documents and e-mails that
all organizations have. These new data elements are often produced
at much higher rates than the classical transactional data.”
Another characteristic of Big Data is the need for deeper and
more sophisticated analysis of data. “You want to be able to
do new types of statistical analysis — not over a small
sample of gigabytes of data, but over terabytes, potentially even
petabytes,” adds Sood.