Managing and analyzing data have always offered the greatest benefits and the greatest challenges for organizations of all sizes and across all industries.
Some data is structured and stored in a traditional relational database, while other data, including documents, customer service records, and even pictures and videos, is unstructured. Companies also have to consider new sources of data generated by machines such as sensors.
Although each data source can be independently managed and searched, the challenge today is how companies can make sense of the intersection of all these different types of data. We have always had a lot of data; the difference today is that significantly more of it exists, and it varies in type and timeliness. That is the opportunity and challenge of big data.
Big data is defined as any kind of data source that has at least three shared characteristics:
Extremely large Volumes of data
Extremely high Velocity of data
Extremely wide Variety of data
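To make the three characteristics concrete, the following minimal Python sketch profiles a small batch of records along each dimension. The `Record` type, the `profile` function, and the sample values are hypothetical and chosen only for illustration. The sketch measures volume, velocity, and variety; it does not decide what counts as "extremely" large, fast, or varied, since that threshold depends on the organization.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical data item; the fields are assumptions for illustration."""
    timestamp: float  # arrival time in seconds
    kind: str         # e.g. "relational", "document", "sensor"
    size_bytes: int   # size of the item


def profile(records):
    """Summarize a batch by volume, velocity, and variety."""
    if not records:
        return {"volume_bytes": 0, "velocity_per_sec": None, "variety": 0}

    # Volume: total amount of data in the batch.
    volume = sum(r.size_bytes for r in records)

    # Velocity: records per second over the observed time span.
    times = [r.timestamp for r in records]
    span = max(times) - min(times)
    velocity = len(records) / span if span > 0 else None

    # Variety: number of distinct kinds of data in the batch.
    variety = len({r.kind for r in records})

    return {
        "volume_bytes": volume,
        "velocity_per_sec": velocity,
        "variety": variety,
    }


records = [
    Record(0.0, "relational", 200),
    Record(1.0, "document", 5000),
    Record(2.0, "sensor", 50),
    Record(4.0, "sensor", 50),
]
summary = profile(records)
print(summary)
```

In practice each of these measurements would be taken over far larger batches and compared against what the existing infrastructure can handle.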
Big data is important because it enables organizations to gather, store, manage, and manipulate vast amounts of data at the right speed, at the right time, to gain the right insights. Big data is not a stand-alone technology; rather, it is a combination of the last 50 years of technology evolution.
Each data management wave is born out of the necessity to solve a specific type of data management problem. When the relational database came to market, it needed a set of tools to allow managers to study the relationship between data elements. When companies started storing unstructured data, analysts needed new capabilities such as natural language–based analysis tools to gain insights that would be useful to the business.
The data management waves of the past five decades have brought us to where we are today: the beginning of the big data era.
With big data, it is now possible to virtualize data so that it can be stored efficiently and, with cloud-based storage, more cost-effectively as well. In addition, improvements in network speed and reliability have removed other physical limits on managing massive amounts of data at an acceptable pace.