Oracle Big Data Preparation Cloud Service provides a highly intuitive and interactive way for analysts to prepare unstructured, semi-structured, and structured data for downstream processing. Built natively on Hadoop and Spark for scale, it aims to simplify big data preparation.
We are pleased to announce the release of Big Data Preparation Cloud Service (BDP) 16.4.1.
This release includes a significant number of bug fixes and some exciting new features. For detailed information about the changes in this release, please review the What’s New section in our online documentation.
Local HDFS Support – Users can now get started quickly out of the box by uploading, processing, publishing, and downloading datasets from the local HDFS cluster, without requiring pre-existing Storage Cloud accounts. Note that this feature is intended only for Dev/Test purposes; customers still need Storage Cloud for production processing.
Multi-File Processing – This feature enables multiple files from a single directory to be ingested and prepared by the service simultaneously, regardless of file format or structure, and published as a single prepared and enriched result. It is especially useful for processing logs or any other source consisting of multiple small files generated by upstream applications.
Null Replace – The profile service already discovers nulls in data; this new feature allows users to address those nulls via the existing search and replace functionality.
Dataset Similarity Web Service – The first exposure of an exciting new capability that empowers business users and domain experts to quickly discover similarities between two files at the column level by leveraging the deep column signatures discovered during profiling.
Knowledge Service – Various enhancements were made to the knowledge service, such as increasing the maximum size of custom knowledge, improving performance, reducing memory requirements and adding access control and editing restrictions.
Architectural Improvements (internal) – The processing pipeline is now a single orchestrated Spark job which will enable powerful interactivity from the UI as well as provide future support for layered service architectures.
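To make the idea of column-level similarity behind the Dataset Similarity Web Service more concrete, here is a minimal Python sketch. It is an illustration only and not the service's actual method or API: the "signature" here is assumed to be simply the set of distinct, normalized values in a column, and similarity is assumed to be the Jaccard index between two such sets. The function names (`column_signatures`, `jaccard`, `column_similarity`) are hypothetical.

```python
import csv
import io


def column_signatures(csv_text):
    """Map each column name to the set of its distinct, normalized values.

    Assumption: a value set stands in for the much richer signatures
    that a real profiling step would compute.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    signatures = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            # Ignore overflow cells (unknown column key) and missing cells.
            if name not in signatures or not value:
                continue
            normalized = value.strip().lower()
            if normalized:
                signatures[name].add(normalized)
    return signatures


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def column_similarity(csv_a, csv_b):
    """Score every column of file A against every column of file B."""
    sig_a = column_signatures(csv_a)
    sig_b = column_signatures(csv_b)
    return {
        (col_a, col_b): jaccard(values_a, values_b)
        for col_a, values_a in sig_a.items()
        for col_b, values_b in sig_b.items()
    }


if __name__ == "__main__":
    file_a = "id,city\n1,Paris\n2,Rome\n"
    file_b = "town,code\nparis,FR\nrome,IT\nOslo,NO\n"
    for pair, score in sorted(column_similarity(file_a, file_b).items()):
        print(pair, round(score, 3))
```

In this toy input, the `city` and `town` columns share two of three distinct values, so they score highest even though their headers differ, which is the kind of match a column-level comparison is meant to surface.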