
Tips and Tricks

Text Analytics using a pre-built Wikipedia-based Topic Model

In my previous post, Explicit Semantic Analysis (ESA) for Text Analytics, we explored the basics of the ESA algorithm, how to use it in Oracle R Enterprise to build a model from scratch, and how to score new text with that model. While creating your own domain-specific model may be necessary in many situations, others may benefit from a pre-built model based on millions of Wikipedia articles reduced to 200,000 topics. This model is downloadable here with details of how to...
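
For readers who want a feel for the workflow before clicking through, here is a minimal sketch of building and scoring an ESA model with ore.odmESA (introduced in ORE 1.5.1); with the downloadable Wikipedia model you would skip the build step and score against the imported model instead. The connection details, the document table, and the Oracle Text policy name ESA_TXTPOL are placeholders assumed for illustration; the previous post covers the text policy and settings in detail.

    library(ORE)
    ore.connect(user = "rquser", password = "rquser", sid = "orcl",
                host = "localhost", all = TRUE)   # placeholder credentials

    # A tiny, hypothetical document table: one ID column, one text column
    docs <- data.frame(DOC_ID = 1:3,
                       COMMENTS = c("Oracle R Enterprise brings R to the database",
                                    "Explicit Semantic Analysis maps text to topics",
                                    "Wikipedia articles can serve as ESA topics"))
    DOCS <- ore.push(docs)

    # Build the ESA model; an Oracle Text policy named ESA_TXTPOL is assumed
    # to exist in the database already (see the previous post for its creation)
    esa.mod <- ore.odmESA(~ ., data = DOCS,
                          odm.settings = list(case_id_column_name = "DOC_ID",
                                              ESAS_MIN_ITEMS = 1,
                                              ODMS_TEXT_POLICY_NAME = "ESA_TXTPOL"),
                          ctx.settings = list(
                            COMMENTS = "TEXT(POLICY_NAME:ESA_TXTPOL)(TOKEN_TYPE:STEM)"))

    # Score new text against the model's topics
    new.docs <- ore.push(data.frame(DOC_ID = 101,
                                    COMMENTS = "in-database machine learning with R"))
    predict(esa.mod, new.docs, type = "class", supplemental.cols = "DOC_ID")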

Thursday, November 30, 2017 | Tips and Tricks | Read More

Building "partition models" with Oracle R Enterprise

There are many approaches to improving model accuracy, from enriching or cleansing the data you start with to optimizing algorithm parameters or creating ensemble models. One technique that Oracle R Enterprise users sometimes employ is to partition data on the distinct values of one or more columns and build a separate model for each partition. The resulting collection of per-partition models forms a kind of ensemble and can yield better accuracy. The embedded R...
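
As a rough illustration of the idea (not the exact code from the post), here is a sketch using embedded R execution's ore.groupApply to build one model per partition. The connection details and the synthetic sales data with its REGION partitioning column are made up for the example.

    library(ORE)
    ore.connect(user = "rquser", password = "rquser", sid = "orcl",
                host = "localhost", all = TRUE)   # placeholder credentials

    # Synthetic stand-in for a partitioned table: REGION defines the partitions,
    # the remaining columns are predictors and target
    sales <- data.frame(REGION   = rep(c("EAST", "WEST"), each = 100),
                        QUANTITY = runif(200, 1, 50),
                        DISCOUNT = runif(200, 0, 0.3))
    sales$AMOUNT <- 10 * sales$QUANTITY * (1 - sales$DISCOUNT) + rnorm(200)
    SALES <- ore.push(sales)

    # Build one linear model per distinct REGION value; each partition's rows
    # are handed to FUN in a separate database-spawned R engine
    mods <- ore.groupApply(SALES, INDEX = SALES$REGION,
                           FUN = function(dat)
                             lm(AMOUNT ~ QUANTITY + DISCOUNT, data = dat),
                           parallel = 2)

    length(ore.pull(mods))   # one fitted model per distinct REGION value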

Thursday, October 5, 2017 | Tips and Tricks | Read More

Integrating custom algorithms with Oracle Advanced Analytics with R

Data scientists and other users of machine learning and predictive analytics technology often have a favorite algorithm for solving particular problems. If they are using a tool like Oracle Advanced Analytics, with Oracle R Enterprise and Oracle Data Mining, they often want to use those algorithms within the tool's framework. Using ORE's embedded R execution, users can already combine third-party R packages with Oracle Database for execution at the...
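
As a quick sketch of the embedded R execution approach the post builds on, the example below runs a CRAN algorithm (e1071's svm) inside the database with ore.tableApply. The connection details are placeholders, and the e1071 package is assumed to be installed in the R engine on the database server.

    library(ORE)
    ore.connect(user = "rquser", password = "rquser", sid = "orcl",
                host = "localhost", all = TRUE)   # placeholder credentials

    IRIS <- ore.push(iris)   # ship a sample data set to the database

    # The function below executes in an R engine spawned by the database,
    # so any third-party package it uses must be installed on the server
    mod <- ore.tableApply(IRIS,
             function(dat) {
               library(e1071)
               dat$Species <- as.factor(dat$Species)   # ensure a classification target
               svm(Species ~ ., data = dat)
             })

    class(ore.pull(mod))   # the fitted svm model, returned to the client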

Tuesday, October 3, 2017 | Tips and Tricks | Read More

Parallel Training of Multiple Foreign Exchange Return Models

In a variety of machine learning applications, it is often necessary to train many models. For example, in the internet of things (IoT) industry, a unique model needs to be built for each household with installed sensors that measure temperature, light, or power consumption. Another example can be found in the online advertising industry: to serve personalized online advertisements or recommendations, a huge number of individualized models have to be built and...
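
To give a flavor of how such parallel training can look in ORE (a sketch with made-up data, not the post's actual code), ore.indexApply can spawn one database-side R engine per currency pair; each invocation would normally read that pair's return history from a table rather than simulate it as below.

    library(ORE)
    ore.connect(user = "rquser", password = "rquser", sid = "orcl",
                host = "localhost", all = TRUE)   # placeholder credentials

    pairs <- c("EURUSD", "GBPUSD", "USDJPY", "AUDUSD")   # hypothetical currency pairs

    # One iteration per currency pair, run in parallel R engines spawned by the
    # database; globals are not visible there, so pass 'pairs' explicitly
    mods <- ore.indexApply(length(pairs),
              function(i, pairs) {
                set.seed(i)
                # stand-in for loading the pair's historical returns from a table
                ret <- data.frame(lag1 = rnorm(250), lag2 = rnorm(250))
                ret$y <- 0.3 * ret$lag1 - 0.1 * ret$lag2 + rnorm(250, sd = 0.01)
                list(pair = pairs[i], model = lm(y ~ lag1 + lag2, data = ret))
              },
              pairs = pairs,
              parallel = TRUE)

    length(ore.pull(mods))   # one fitted return model per currency pair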

Friday, May 26, 2017 | Tips and Tricks | Read More

Using SVD for Dimensionality Reduction

SVD, or Singular Value Decomposition, is one of several techniques that can be used to reduce the dimensionality, i.e., the number of columns, of a data set. Why would we want to reduce the number of dimensions? In predictive analytics, more columns normally mean more time required to build models and score data. If some columns have no predictive value, that time is wasted; worse, those columns may contribute noise to the model and reduce model quality or predictive...
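
As a plain-R illustration of the idea (not tied to ORE), the snippet below projects a familiar data set onto its top-k right singular vectors, cutting the column count while reporting how much variance the retained components capture.

    # X = U %*% diag(d) %*% t(V); projecting X onto the first k columns of V
    # yields a data set with k columns instead of the original ncol(X)
    X <- scale(as.matrix(mtcars), center = TRUE, scale = FALSE)

    s <- svd(X)
    k <- 3
    X_reduced <- X %*% s$v[, 1:k, drop = FALSE]

    dim(X)          # 32 rows x 11 columns
    dim(X_reduced)  # 32 rows x  3 columns

    # proportion of total variance captured by the first k components
    sum(s$d[1:k]^2) / sum(s$d^2)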

Friday, February 5, 2016 | Tips and Tricks | Read More

Consolidating wide and shallow data with ORE Datastore

Clinical trial data are often characterized by a relatively small set of participants (100s or 1000s), while the number of variables collected and analyzed for each participant may be significantly larger (1000s or 10,000s). Genomic data alone can easily reach the higher end of this range. In talking with industry leaders, we hear that one of the problems pharmaceutical companies and research hospitals encounter is managing such data effectively. Storing data in flat files on myriad servers, perhaps even “closeted”...
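
For context on the mechanics, here is a minimal sketch of the ORE datastore functions the post is about (ore.save, ore.datastore, ore.load); the connection details and the trial data are invented for illustration.

    library(ORE)
    ore.connect(user = "rquser", password = "rquser", sid = "orcl",
                host = "localhost", all = TRUE)   # placeholder credentials

    # Invented wide-and-shallow example: 100 participants, 2000 measurements each
    genomic <- as.data.frame(matrix(rnorm(100 * 2000), nrow = 100))
    covars  <- data.frame(id = 1:100, age = sample(30:80, 100, replace = TRUE))

    # Persist the R objects in the database under a named datastore
    ore.save(genomic, covars, name = "trial_001",
             description = "genomic + clinical covariates for trial 001")

    ore.datastore()                # list datastores visible to this user
    ore.load(name = "trial_001")   # restore the objects in a later session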

Thursday, September 10, 2015 | Tips and Tricks | Read More