Data Science basics: Data vs. Big Data

This post is part of a series on computational modeling and data science. Here we discuss Big Data. If this is your first time joining us, there is nothing to worry about: you can visit our earlier posts at any time, and they are completely free. In a previous post, we discussed the importance of computer programming and how it can help a researcher be more productive. Our last post covered the background behind the origin of data science and presented a working definition of Data Science taken from the Data Science Initiative website of the University of Michigan.

How big is Big Data?

In Data Science, data sets that are too big, too complex, or too lacking in structure to be analyzed using standard approaches are referred to as Big Data. This is not a universally accepted definition, however, and data scientists are still refining it. A common misconception is that any database with a large file size is Big Data; defining Big Data by size alone is misleading. The file size of a data set is an important indicator, but it is not the only one.
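
To make the point about file size concrete, here is a minimal sketch in Python. It is only an illustration, not an established way of measuring Big Data: the `profile` and `nesting_depth` helpers and the two sample data sets are hypothetical, and the three numbers they report are rough stand-ins for "size", "structure" and "complexity".

```python
import json


def nesting_depth(value):
    """Return how deeply dicts/lists are nested inside a value (0 for a scalar)."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def profile(records):
    """Summarize a list of dict records along three rough, illustrative axes."""
    # Size: bytes needed to store the records as JSON text.
    size_bytes = len(json.dumps(records).encode("utf-8"))
    # Structure: how many different sets of field names appear across records.
    distinct_schemas = len({frozenset(r.keys()) for r in records})
    # Complexity: the deepest nesting found in any single record.
    max_depth = max((nesting_depth(r) for r in records), default=0)
    return {
        "size_bytes": size_bytes,
        "distinct_schemas": distinct_schemas,
        "max_depth": max_depth,
    }


# Hypothetical sample data: both sets are tiny, but only one is regular.
tidy = [{"id": i, "temp_c": 20 + i} for i in range(3)]
messy = [
    {"id": 1, "temp_c": 21},
    {"station": "A", "readings": [{"t": 0, "wind": {"speed": 5}}]},
    {"note": "sensor offline"},
]

tidy_profile = profile(tidy)
messy_profile = profile(messy)
print(tidy_profile)
print(messy_profile)
```

Both sample sets take up only a few dozen bytes, yet the second one has a different set of fields in every record and much deeper nesting. A tool that expects a regular table would struggle with it regardless of its size, which is why file size alone is a poor test.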


Another common misconception concerns the computational resources and infrastructure required to analyze Big Data. We do need powerful computers to uncover the patterns hidden in those numbers, but knowledge of mathematical science, computational science, and information technology is crucial for deciding how those resources are employed efficiently. To make the best use of computational resources, information technology, computer science, and mathematical science must go hand in hand to develop more sophisticated data analysis methods.

With the right analytical and strategic approach, Big Data can be of great service to society by reshaping how knowledge is produced and used. If, for example, we could accurately predict where and when a hurricane is likely to strike, millions of lives could be protected and property damage could be minimized.

This blog post is part of a series of posts on data science and computational data science. Please feel free to visit my website or LinkedIn profile to access the other posts in this series.
