Big Data vs Data Science vs Data Analytics Part 1

Welcome to the blog that brings you all the data. Well, probably not all the data, but we are going to talk a lot about data in this article. Terms like 'Big Data', 'Data Science', and 'Data Analytics' are thrown around a lot, but what do they mean? How do they affect our lives? Let's dive in. In this post (Part 1), we'll look at Big Data.

Big Data

Big data refers to extremely large sets of data (information) that businesses collect and then process to gain insights into customer or human behavior. In my last blog post, on machine learning, I talked about the algorithms companies like Netflix, Amazon, and Tesla use to let their machines take raw data and extrapolate new insights into behavior that was never explicitly coded into them. Big data describes that raw data. Every mile driven on Autopilot in a Tesla, every movie we watch on Netflix, every order on Amazon, is recorded and delivered to the humans behind these machines and websites. That's big data.

Now, the challenge with big data is not the collection. Recording and receiving all of this information is the easy part, and in most cases it takes only a few lines of code. Instead, the challenge is storing big data, processing it, and drawing meaningful answers or strategic direction from the patterns and trends within these huge pools of data. Every day, hundreds of terabytes of information -- a digital ocean of data points and statistics -- are generated by consumer traffic and use on the internet. Each company that collects this data, like Google, Netflix, Amazon, etc., is faced with the task of using it to make their services better and more suited to your individual needs. And in doing so, they make their services more useful to us and thus more profitable to them.

In the early 2000s, data analyst Doug Laney defined Big Data in terms of three V's (credit: sas.com/insights):

1. Volume: The sheer amount of data being collected presents a problem both in storage and in processing. Thankfully, more advanced servers and secure digital storage tech have made storing the data companies collect much easier. Processing has become faster thanks to better software and increasingly powerful computer hardware.

2. Velocity: Data is collected at incredible speeds because of the amount of traffic coming in. Amazon processes 35 orders per SECOND. 

3. Variety: Big data comes in all kinds of forms -- text, video, audio, charts, etc.
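
To get a feel for the velocity number above, here is a quick back-of-the-envelope sketch in Python. It simply takes the 35 orders per second quoted in this post and assumes that rate stays constant all day, which real traffic certainly does not. The function and variable names are my own, made up for illustration.

```python
# Back-of-the-envelope arithmetic for the "velocity" figure quoted above.
# Assumption: the rate is constant around the clock (real traffic is not).
ORDERS_PER_SECOND = 35  # figure quoted in this post

SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 60 * SECONDS_PER_MINUTE
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def orders_in(seconds: int, rate: int = ORDERS_PER_SECOND) -> int:
    """Number of orders placed over `seconds` at a constant rate."""
    return rate * seconds


orders_per_minute = orders_in(SECONDS_PER_MINUTE)
orders_per_hour = orders_in(SECONDS_PER_HOUR)
orders_per_day = orders_in(SECONDS_PER_DAY)

print(f"{orders_per_minute:,} orders per minute")
print(f"{orders_per_hour:,} orders per hour")
print(f"{orders_per_day:,} orders per day")
```

At that pace you are looking at roughly three million order records a day, and that is before counting every search, click, and page view that comes along with them.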

Should we be afraid of big data? A significant contingent of people take issue with internet data collection and big data. Looking at this article and my machine learning article, you could definitely be put off by the idea of computer algorithms tracking your every keystroke online and using it to fine-tune your user experience. On the other hand, consider the ways we gain. I personally love Amazon's shopping suggestions and Netflix's movie suggestions. Moreover, I appreciate Google tracking our search patterns to improve its services. Also, I hate to be the one who breaks it to you, but at this point being a part of big data collection is completely unavoidable. Even your shopping cart in brick-and-mortar supermarkets is analyzed at a corporate level via data collected by the cashier computers. Ultimately, I try to look at it as a *relatively* non-intrusive way for companies to actively make their products and services better at serving our needs.

If you want a slightly more technical view of big data, check out this awesome resource: https://www.sas.com/en_us/insights/big-data/what-is-big-data.html#dmhistory