Tuesday Oct 22, 2013

Understanding Data Science: Recent Studies

If you need a deeper understanding of data science than Drew Conway's popular Venn diagram model, or Josh Wills' tongue-in-cheek characterization, "Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician," two relatively recent studies are worth reading.

'Analyzing the Analyzers,' an O'Reilly e-book by Harlan Harris, Sean Patrick Murphy, and Marck Vaisman, suggests four distinct types of data scientists -- effectively personas, in a design sense -- based on analysis of self-identified skills among practitioners.  The scenario format dramatizes the different personas, making what could be a dry statistical readout of survey data more engaging.  The survey-only nature of the data, the restriction of scope to just skills, and the suggested models of skill profiles make this feel like the sort of exercise that data scientists undertake as an everyday task: collecting data, analyzing it using a mix of statistical techniques, and sharing the model that emerges from the data mining exercise.  That's not an indictment, simply an observation about the consistent feel of the effort as a product of data scientists, about data science.

And the paper 'Enterprise Data Analysis and Visualization: An Interview Study' by researchers Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer considers data science within the larger context of industrial data analysis, examining analytical workflows, skills, and the challenges common to enterprise analysis efforts, and identifying three archetypes of data scientist.  As an interview-based study, it draws on richer data, and there's correspondingly greater depth in the synthesis.  The scope of the study included a broader set of roles than data scientist (enterprise analysts) and involved questions of workflow and organizational context for analytical efforts in general.  I'd suggest this is useful as a primer on analytical work and workers in enterprise settings for those who need a baseline understanding; it also offers some genuinely interesting nuggets for those already familiar with discovery work.

We've undertaken a considerable amount of research into discovery, analytical work/ers, and data science over the past three years -- part of our programmatic approach to laying a foundation for product strategy and highlighting innovation opportunities -- and both studies complement and confirm much of the direct research into data science that we conducted. There were a few important differences in our findings, which I'll share and discuss in upcoming posts.

Friday Oct 18, 2013

Defining Discovery: Core Concepts

Discovery tools have had a referenceable working definition since at least 2001, when Ben Shneiderman published 'Inventing Discovery Tools: Combining Information Visualization with Data Mining'.  Dr. Shneiderman suggested the combination of the two distinct fields of data mining and information visualization could manifest as a new category of tools for discovery, an understanding that remains essentially unaltered over ten years later.  An industry analyst report titled 'Visual Discovery Tools: Market Segmentation and Product Positioning' from March of this year, for example, reads, "Visual discovery tools are designed for visual data exploration, analysis and lightweight data mining."

Tools should follow from the activities people undertake (a foundational tenet of activity-centered design), however, and Dr. Shneiderman does not in fact describe or define discovery activity or capability. As I read it, discovery is assumed to be the implied sum of the separate fields of visualization and data mining as they were then understood.  As a working definition that catalyzes a field of product prototyping, it's adequate in the short term.  In the long term, it makes the boundaries of discovery both derived and temporary, and leaves a substantial gap in the landscape of core concepts around discovery, making consensus on the nature of most aspects of discovery difficult or impossible to reach.  I think this definitional gap is a major reason that discovery is still an ambiguous product landscape.

To help close that gap, I'm suggesting definitions of four core aspects of discovery.  These come out of our sustained research into discovery needs and practices, and have the goal of clarifying the relationship between discovery and other analytical categories.  They are suggested, but should be internally coherent and consistent.

Discovery activity is: "Purposeful sense making activity that intends to arrive at new insights and understanding through exploration and analysis (and for these we have specific definitions as well) of all types and sources of data."

Discovery capability is: "The ability of people and organizations to purposefully realize valuable insights that address the full spectrum of business questions and problems by engaging effectively with all types and sources of data."

Discovery tools: "Enhance individual and organizational ability to realize novel insights by augmenting and accelerating human sense making to allow engagement with all types of data at all useful scales."

Discovery environments: "Enable organizations to undertake effective discovery efforts for all business purposes and perspectives, in an empirical and cooperative fashion."

Note: applicability to a world of Big Data is assumed - thus the refs to all scales / types / sources - rather than stated explicitly.  I like that Big Data doesn't have to be written into this core set of definitions, b/c I think it's a transitional label - the new version of Web 2.0 - and will go away over time.




