A Subtle Intro to Big Data

Meharaj Ul Mahmmud
6 min read · Jun 7, 2023

We produce a massive amount of data every day, whether we realize it or not. Every activity on the internet, every click of a mouse, every sensor in our mobile phones generates data: roughly 2.5 quintillion bytes of it every day.

This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few.

So, what do we do with all this data? The platforms on which we generate it can use it to gain insights about their products, their services, and, last but not least, their business models. A platform could be a social-media site, an eCommerce site, our mobile phones, and many others. Business leaders are hiring data scientists to gather insights from this data and improve their business models.

It is estimated that structured data makes up only about 5% of the total, so better methods are needed to analyze the remaining 95%. Traditional database queries and standard text searches cannot complete the overall analysis on their own.

Big Data

The real problem occurs when there is a massive amount of data at hand and our usual data-analysis tools and processes fail to cope with it. This is where Big Data analytics comes in.

Big Data is a collection of data that is huge in volume and growing exponentially. One classic example is the data generated by a jet engine: a single jet engine generates more than 10 terabytes (TB) of data in 30 minutes of flight time, and a single flight from London, UK to New York, USA takes around 8 hours. That works out to roughly 160 TB per engine per flight.
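The back-of-the-envelope calculation can be sketched in Python, using the rough per-engine figures quoted above:

```python
# Estimate of the data one jet engine produces on one flight,
# using the figures quoted above (~10 TB per 30 minutes of flight).
TB_PER_HALF_HOUR = 10
FLIGHT_HOURS = 8  # London -> New York

half_hours = FLIGHT_HOURS * 2
total_tb = TB_PER_HALF_HOUR * half_hours
print(f"One engine, one flight: {total_tb} TB")  # prints "One engine, one flight: 160 TB"
```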

Another example is the data generated by social-media sites. Statistics show that more than 500 terabytes (TB) of data are generated by users every day, including photos, videos, text messages, and comments. I hope you get the gist.

Big Data Challenges

There are a few challenges in managing, analyzing, and processing these huge amounts of data. First of all, this data is not structured; by structured I mean having a fixed format to access, store, and process the data. In the case of big data, there is usually no fixed format.

Then there is the limitation of processing power. At the rate data is being generated, many business leaders do not have the resources to gather insights from the majority of their users' data. As a result, the percentage of data a business can actually process keeps going down.

Characteristics of Big Data

Three V’s of Big Data (old model)

Volume

Big Data is all about volume, and data volumes can reach unprecedented heights. It was estimated that 40 zettabytes of data would be created by 2020, but the actual amount was around 60 zettabytes [1 zettabyte = 1 billion terabytes]. It is therefore not uncommon for large companies to have petabytes of data in storage, and that data helps shape the future of the company.

But most companies do not have the resources to analyze all the data they are acquiring. They are certainly collecting a lot of data, but they only have the resources to process a portion of it. A large amount of data goes unutilized, and companies gain no insights from it. That data is said to be in the BLIND ZONE.

Velocity

Velocity denotes the growth of data, or the rate of data creation. It essentially measures how fast the data is coming in, how fast it is stored, and its associated retrieval rate. These days, data arrives in the form of streams and has to be processed in near real time to identify a pattern or a problem. Take sensor data: a heat sensor in a fire alarm continuously gathers temperature readings for a room, detecting the same pattern over and over. If the pattern breaks and the temperature rises sharply, the sensor detects a fire. So data generation is continuous, data storage is continuous, and data processing is also continuous.
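The heat-sensor scenario can be sketched as a minimal stream processor. The threshold value and the toy readings below are illustrative assumptions, not real alarm parameters:

```python
# Minimal sketch of near-real-time stream processing: a heat sensor
# emits readings continuously, and we flag a fire whenever a reading
# breaks the expected pattern. The threshold is an assumed value.
FIRE_THRESHOLD_C = 60.0

def detect_fire(readings):
    """Yield an alert for every reading that exceeds the threshold."""
    for temp in readings:
        if temp > FIRE_THRESHOLD_C:
            yield f"ALERT: {temp:.1f} C"

# Toy stream: a stable room-temperature pattern, then a spike.
sensor_stream = [21.5, 22.0, 21.8, 85.3, 90.1]
for alert in detect_fire(sensor_stream):
    print(alert)
```

In a real deployment the list would be replaced by a live feed (a message queue or socket), but the processing loop stays the same: consume, check, react.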

Variety

As mentioned before, there are a few types of data besides traditional structured data: semi-structured, quasi-structured, and unstructured data.

We get structured data in a fixed format, but unstructured data has no fixed format. Text messages, image files, video files, and MP3 files are some examples of unstructured data. Satellite images, sensor data, website contents, and the like are also unstructured.
Semi-structured data sits between structured and unstructured data. It does not reside in a relational database, but it does have some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database, but the semi-structured form is often kept to save space, preserve clarity, or reduce compute.

XML and JSON documents are semi-structured, and NoSQL databases are also commonly considered semi-structured stores.
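A short example of why semi-structured data is easier to analyze than raw text: a JSON record has no fixed schema, yet its key/value organization lets us query it directly. The field names below are invented for illustration:

```python
import json

# A semi-structured record: no rigid schema, but the key/value
# organization still makes the fields easy to reach.
doc = '{"user": "alice", "tags": ["big-data", "intro"], "likes": 42}'

record = json.loads(doc)          # parse the JSON text into a dict
print(record["user"])             # prints "alice"
print(len(record["tags"]))        # prints 2
```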

Quasi-structured data is mostly textual data with erratic formats; it can be formatted, but only with effort, tools, and time. This type includes web clickstream data such as Google searches. Other examples are pasted texts that yield a network map based on the similarity of language within the text, as well as the proximity of words to each other.
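The "effort, tools, and time" part usually means writing a pattern to pull fields out of erratic text. Here is a minimal sketch for a clickstream-style log line; the log format itself is invented for illustration:

```python
import re

# Quasi-structured: a raw clickstream line has no schema, but a
# regular expression can impose one. Format is assumed/illustrative.
line = "2023-06-07T12:01:44 GET /search?q=big+data 200"

match = re.match(r"(\S+) (\w+) (\S+) (\d+)", line)
if match:
    timestamp, method, path, status = match.groups()
    print(method, path, status)  # prints "GET /search?q=big+data 200"
```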

Though the three V’s are the most widely accepted core of attributes, there are several extensions that can be considered.

Another V… Veracity

Data veracity, in general, indicates how accurate or trustworthy a data set is. When it comes to big data, it is not just the quality of the data itself that matters, but also how authentic its source, type, and processing are.

Consider network traffic, which is of course a data stream: it has thousands of nodes contributing, and not all of those nodes may be authentic. There could be a lot of malicious data and malicious nodes, and we do not want to inflate our data set with unwanted malicious entries. So data authenticity is important.
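A simple veracity check can be sketched as filtering records against a list of trusted sources before they enter the data set. The source names and record shape below are assumptions for illustration only:

```python
# Sketch of a veracity filter: drop records whose origin is not on an
# allow-list, so malicious or unknown nodes do not inflate the data.
TRUSTED_SOURCES = {"sensor-a", "sensor-b"}

records = [
    {"source": "sensor-a", "value": 10},
    {"source": "unknown-bot", "value": 9999},  # possibly malicious
    {"source": "sensor-b", "value": 12},
]

clean = [r for r in records if r["source"] in TRUSTED_SOURCES]
print(len(clean))  # prints 2
```

Real pipelines use stronger checks (signatures, anomaly detection), but the principle is the same: validate provenance before trusting the data.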

… and More V’s (Six V’s of Big Data)

Value

Data value is often quantified as the potential social or economic value the data might create. However, the concept is weakly defined: without the right intention or application, highly valuable data can sit in your warehouse without creating any value at all. This is often the case when the actors producing the data are not capable of putting it to use.

Variability

This refers to the inconsistency that data can show over time, which hampers our ability to handle and manage it effectively. The data may vary as our business changes, and we will need to change our models to process the new kinds of data. So our big-data analytics processes and machine-learning algorithms should be able to keep up with this variability.

Applications in the Real World

Big Data has greatly enhanced decision-making processes for organizations by providing a wealth of data to test hypotheses and address problems more effectively. Risk management and decision-making are crucial areas that have benefited significantly from the availability of vast data sets.

The impact of Big Data on customer experience is profound. Companies now have access to an unprecedented amount of customer data, allowing them to offer personalized recommendations and tailored offers. Customers willingly provide this data in exchange for the personalized services they receive. The personalized recommendations we see on platforms like Netflix, Amazon, and Flipkart are a direct result of leveraging Big Data.

Machine Learning has also experienced significant advancements due to the rise of Big Data. With larger datasets available for training ML models, performance improvements are observed. Additionally, Machine Learning enables the automation of tasks that were previously manual, thanks to the insights derived from Big Data.

Demand forecasting has become more precise as companies gather more data on customer purchases. This enables them to build accurate forecasting models, facilitating production scalability and reducing costs associated with storing unsold inventory in warehouses.

In addition, Big Data finds extensive applications in areas such as product development and fraud detection, further showcasing its versatile uses and benefits.

EndNote

In this article, we discussed what we mean by Big Data, the characteristics and types of Big Data, and some real-world applications of Big Data.

Thank you for reading.
