In my previous post, "To sample or not to sample," we discussed some of the issues involved in sampling data for use in machine learning. In this post, we look at using the Oracle R Enterprise transparency layer to perform a few types of sampling: simple random sampling, with and without replacement, and stratified sampling. When your data is too large to fit in memory, you're left with a paradox: you need to sample the data so it fits in memory, but you need to load it into memory...
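The three sampling types named above can be illustrated with a minimal sketch in plain Python. This is not the Oracle R Enterprise transparency layer and does not use its API; it works on an in-memory list, so it shows only the sampling logic and not the larger-than-memory case the post addresses. The function names `simple_random_sample` and `stratified_sample`, and the `fraction` parameter, are hypothetical names chosen for this illustration.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, n, replace=False, seed=None):
    """Draw n rows uniformly at random (hypothetical helper).

    With replace=True a row may be drawn more than once, so n may
    exceed len(rows). With replace=False every drawn row is distinct.
    """
    rng = random.Random(seed)
    if replace:
        return [rng.choice(rows) for _ in range(n)]
    return rng.sample(rows, n)


def stratified_sample(rows, key, fraction, seed=None):
    """Draw the same fraction from each stratum (hypothetical helper).

    key(row) returns the stratum label. Each stratum contributes at
    least one row so that small groups are not dropped entirely.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Illustrative data: 100 rows, 25 labeled "b" and 75 labeled "a".
rows = [(i, "b" if i % 4 == 0 else "a") for i in range(100)]

without_replacement = simple_random_sample(rows, 10, seed=1)
with_replacement = simple_random_sample(rows, 150, replace=True, seed=1)
stratified = stratified_sample(rows, key=lambda r: r[1], fraction=0.2, seed=1)
```

Sampling each stratum separately keeps the class proportions of the sample close to those of the full data, which a simple random sample only achieves on average.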

Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming...

This installment of the Data Science Maturity Model (DSMM) blog series contains a summary table of the dimensions and levels. Enterprises embracing data science as a core competency may want to evaluate what level they have achieved relative to each dimension - in some cases, an enterprise may straddle more than one level. As a next step, the enterprise may use this maturity model to identify a level in each dimension to which they aspire, or fashion a new Level 6. ...

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'deployment': How easily can data science work products be placed into production to meet timely business objectives? Data science comes with the expectation that amazing insights and predictions will transform the business and take the enterprise to a new level of performance. Too often, however, data science projects fail to "lift off," resulting in significant opportunity...

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'tools': What tools are used within the enterprise for data science? Can data scientists take advantage of open source tools in combination with high-performance, scalable, production-quality infrastructure? A wide range of tools supports data science, ranging from open source to proprietary, relational database to "big data" platforms, simple analytics to complex machine...

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'asset management': How are data science assets managed and controlled? Assets are typically both tangible and intangible things of value. For this discussion, we will consider the array of data science work products as assets and can define 'asset management' at a high level as "any system that monitors and maintains things of value to an entity or group." As we introduced...

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'scalability': Do the tools scale and perform for data exploration, preparation, modeling, scoring, and deployment? As data, data science projects, and the data science team grow, is the enterprise able to support these adequately? The term 'scalability' can be defined as the "capability of a system, network, or process to handle a growing amount of work, or its potential to be...

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'methodology': What is the enterprise approach or methodology to data science? The most often cited methodology for 'data mining' - a key element of data science - is CRISP-DM. However, the breadth and growth of data science may require expanding beyond the traditional phases introduced by CRISP-DM: Business Understanding, Data Understanding, Data Preparation, Modeling,...

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'collaboration': How do data scientists collaborate among themselves and with others in the enterprise, e.g., business analysts, application and dashboard developers, to evolve and hand-off data science work products? Data science projects often involve significant collaboration, defined as "two or more people or organizations working together to realize or achieve a goal."...
