At the very beginning of this blog post on Data Science, I would like to recall a quote from the movie “A Beautiful Mind” (2001), which is based on the biography of the mathematical genius John Forbes Nash, Jr.:
Mathematicians won the war. Mathematicians broke the Japanese codes… and built the A-bomb. Mathematicians… like you. In medicine or economics, in technology or space, battle lines are being drawn. To triumph, we need results. Publishable, applicable results.
But to make results applicable and publishable, we need data: authentic, quality-controlled data that can empirically support a hypothesis. Collected data may not be of good quality, because both data sources and collection procedures are highly heterogeneous. To address this problem, John Wilder Tukey, one of the most influential statisticians of the last 50 years, proposed a new branch of science specifically concerned with learning from data.
John Chambers, Bill Cleveland and Leo Breiman, each independently, once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics. Chambers called for more emphasis on data preparation and presentation than on statistical modeling, and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name ‘Data Science’ for his envisioned field. That was the start, and the growth of data science since then has been remarkable.
As I write this blog post, the University of Michigan has announced, on September 8, 2015, a $100 million Data Science Initiative (DSI) to foster education and research in data science. Michigan is not alone: universities such as New York University, Columbia University and the Massachusetts Institute of Technology have already undertaken a number of DSI-like initiatives to promote innovation in data science. Universities such as UC Berkeley, Stanford and NYU offer Master’s degree programs in data science to meet the challenges of next-generation, data-driven innovation. The DSI web site gives us an idea of what Data Science is:
This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and inter-disciplinary applications.