A recent KDnuggets poll caught my attention. It asked respondents to complete the following sentence:
“With the trend towards
Big Data and Data-driven Machine Learning methods
- Statistics will become less important
- Statistics importance will not change
- Statistics will become more important, as the foundation of Data Science
- Not sure”
“Statistics will become more important” was the clear winner. While
this poll is not a scientific survey, it’s still interesting to see which
topics people are willing to take the time to weigh in on. The word
“statistics” carries different meanings in different contexts, and it’s
unfortunate that it has been viewed so negatively for so long by so many.
This is perhaps why we’ve seen other terms come into (and go out of)
fashion, such as data mining, data science and predictive analytics, to
rebrand what are essentially statistical concepts. It’s ironic that many
people put statistics in a box as though it’s forever one thing (it only
deals with small data, it’s just about hypothesis testing, etc.), when
statistics is fundamentally about dealing with change.
In other posts, I’ve written about statistics as its own discipline, how
it is core to data analysis and value creation, and how statistical
literacy is growing in importance (celebrating the first-ever
International Year of Statistics). Jeff Leek of Simply Statistics has a
nice YouTube video, “The Landscape of Data Analysis,” showing that
statistics is foundational to data analysis, and that while other
disciplines also contribute, they don’t contribute as directly.
The first time I heard statistics described as “the language of science”
was many years ago, in a conversation with David Salsburg, author of The
Lady Tasting Tea and the first statistician Pfizer ever hired. To be more
scientific in any decision you make, whether in science, in industry or in
government, you will need statistics! And statistics needs you! Robert
Tibshirani, an eminent statistician at Stanford University, was quoted in
The New York Times Bits blog last year: “Statistics is unusual. … It’s a
service field to other disciplines. It doesn’t rely on its own work. It
needs others.” Professor Shirley Coleman expressed a similar view in a
recent interview I did with her.
It wasn’t too long ago that universities required students to demonstrate
proficiency in a foreign language, especially for graduate degrees in the
sciences. It now appears that a new language, statistics, is working its
way into the curricula for degrees in science as well as business.
Recently, a proposal was made to establish a statistics curriculum within
the chemistry departments of US colleges and universities. A similar shift
is apparently under way in parts of Europe, as reported in a recent issue
of The Analytical Scientist.
Business schools are evolving their curricula to include more statistics,
data mining and predictive analytics (and are offering new degrees in
these areas), and even the hard sciences are incorporating more
statistics to better prepare their graduates for jobs in industry.
Many of our customers confirm this need: they spend a few years investing
in their new hires to instill best statistical practices, because the
hires’ academic training has not adequately prepared them for the work
that is needed. One of our longtime partners, Predictum, has been offering
courses like Data Analysis and Statistics for Scientists and Engineers
since 1997.
Statistics as a word may carry some baggage (many people unfortunately did
not have the best introduction to this powerful subject and think of it
as “sadistics”), but “statistical thinking” is another term that casts
everything in a more strategic light.
Statistical thinking means being scientific about problem-solving:
speaking the language of science in any given context. And what is
science? Science, good science, is the efficient and effective way of
understanding the natural and social world so that we are better informed
and make better use of that information. In a recent webcast with Russ
Wolfinger, we saw and heard about some really interesting applications of
statistical thinking in science.
Good science, and speaking its language with some level of proficiency,
are required to derive value from the growing volume and complexity of
data we continue to amass. May we all learn, at some level, the language
of science so we can make more informed decisions, make the best use of
scarce resources and drive better actions.
Note: A version of this post first appeared on the International Institute for Analytics blog.