Statistics: The language of science

A recent KD Nuggets poll caught my attention. It asked respondents to complete a sentence as follows:
“With the trend towards Big Data and Data-driven Machine Learning methods
  • Statistics will become less important
  • Statistics importance will not change
  • Statistics will become more important, as the foundation of Data Science
  • Not sure”
“Statistics will become more important” was the clear winner. While this poll is not a scientific survey, it’s still interesting to see which questions people are willing to take the time to weigh in on. The word “statistics” means different things in different contexts, and it’s unfortunate that it has been viewed so negatively for so long by so many. This is perhaps why we’ve seen other terms come in (and out) of fashion, such as data mining, data science and predictive analytics, to rebrand what are essentially statistical concepts. It’s ironic that many people put statistics in a box as though it’s forever one thing (it only deals with small data, it is all about hypothesis testing, etc.) when statistics is fundamentally about dealing with change.
In other posts, I’ve written about statistics as its own discipline, how it is core to data analysis and value creation, and how statistical literacy is growing in importance (celebrating the first-ever International Year of Statistics). Jeff Leek of Simply Statistics has a nice YouTube video on The Landscape of Data Analysis showing that statistics is foundational to data analysis, and while other disciplines also contribute, they don’t contribute as directly.
The first time I heard statistics described as “the language of science” was many years ago in a conversation with David Salsburg, author of The Lady Tasting Tea and the first statistician Pfizer ever hired. To be more scientific in any decisions you make, whether in science, in industry or in government, you will need statistics! And statistics needs you! Robert Tibshirani, eminent statistician at Stanford University, was quoted in The New York Times Bits blog last year: “Statistics is unusual. … It’s a service field to other disciplines. It doesn’t rely on its own work. It needs others.” A similar view is expressed in a recent interview I did with Professor Shirley Coleman.
It wasn’t too long ago that universities required you to meet foreign language requirements, especially for graduate degrees in the sciences. It now appears that a new language, statistics, is working its way into the curricula for degrees in science as well as business. Recently, a proposal was made to establish a statistics curriculum within the chemistry departments of US colleges and universities. A similar shift is apparently under way in parts of Europe, as reported in a recent issue of The Analytical Scientist. Business schools are evolving their curricula to include more statistics, data mining and predictive analytics (and are offering new degrees in these areas), and even the hard sciences are incorporating more statistics to better prepare their graduates for jobs in industry.
Many of our customers confirm that they spend a few years investing in their new hires to instill in them best statistical practices because their academic training has not adequately prepared them to do the work that is needed. One of our longtime partners, Predictum, has been offering courses like Data Analysis and Statistics for Scientists and Engineers since 1997.
Statistics as a word may have some baggage (many unfortunately did not have the best introduction to this powerful subject and think of it as “sadistics”), but “statistical thinking” is another term that casts everything in a more strategic light. Statistical thinking means being scientific about problem-solving and speaking the language of science in any given context. And what is science? Good science is the efficient and effective way of understanding the natural and social world so that we are better informed and make better use of that information. In a recent webcast with Russ Wolfinger, we got to see and hear about some really interesting applications of statistical thinking in science.
Doing good science, and speaking its language with some level of proficiency, is required to derive value from the growing volume and complexity of data we continue to amass. May we all learn, at some level, the language of science so we can make more informed decisions, make the best use of scarce resources and compel better actions.
Note: A version of this post first appeared in the International Institute for Analytics blog.