Oracle NoSQL Database Blog

The Oracle NoSQL Database Blog covers all things Oracle NoSQL Database: on-prem, cloud, and more.

Recent Posts

Power your Geo-Enabled application with Oracle NoSQL Database

Introduction

GeoJSON is becoming a very popular data format among many GIS technologies and services: it is simple, lightweight, and straightforward. According to the GeoJSON specification (RFC 7946): "GeoJSON is a format for encoding a variety of geographic data structures [...]. A GeoJSON object may represent a region of space (a Geometry), a spatially bounded entity (a Feature), or a list of Features (a FeatureCollection). GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection."

Oracle NoSQL Database supports all of the above geometry types and allows you to query, create indexes, and use geo-defined functions to extract data.

GeoJSON Types

GeoJSON is represented as JSON and defines the following geometry types.

Point: the "coordinates" field is a single position.

{ "type" : "point", "coordinates" : [ 23.549, 35.2908 ] }

LineString: one or more connected lines; the end point of one line is the start point of the next. The "coordinates" field is an array of two or more positions.

{ "type" : "LineString",
  "coordinates" : [ [121.9447, 37.2975], [121.9500, 37.3171],
                    [121.9892, 37.3182], [122.1554, 37.3882],
                    [122.2899, 37.4589], [122.4273, 37.6032],
                    [122.4304, 37.6267], [122.3975, 37.6144] ] }

Polygon: defines a surface area by specifying its outer perimeter and the perimeters of any potential holes inside the area.

{ "type" : "polygon",
  "coordinates" : [ [ [23.48, 35.16], [24.30, 35.16],
                      [24.30, 35.50], [24.16, 35.61],
                      [23.74, 35.70], [23.56, 35.60],
                      [23.48, 35.16] ] ] }

MultiPoint: the "coordinates" field is an array of two or more positions.

{ "type" : "MultiPoint",
  "coordinates" : [ [121.9447, 37.2975], [121.9500, 37.3171],
                    [122.3975, 37.6144] ] }

MultiLineString: the "coordinates" field is an array of LineString coordinate arrays.

{ "type" : "MultiLineString",
  "coordinates" : [ [ [100.0, 0.0], [101.0, 1.0] ],
                    [ [102.0, 2.0], [103.0, 3.0] ] ] }

MultiPolygon: the "coordinates" field is an array of Polygon coordinate arrays.

{ "type" : "MultiPolygon",
  "coordinates" : [ [ [ [102.0, 2.0], [103.0, 2.0], [103.0, 3.0],
                        [102.0, 3.0], [102.0, 2.0] ] ],
                    [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
                        [100.0, 1.0], [100.0, 0.0] ] ] ] }

GeometryCollection: instead of a "coordinates" field, a GeometryCollection has a "geometries" field whose value is an array of geometry objects.

{ "type" : "GeometryCollection",
  "geometries" : [ { "type" : "Point", "coordinates" : [100.0, 0.0] },
                   { "type" : "LineString",
                     "coordinates" : [ [101.0, 0.0], [102.0, 1.0] ] } ] }

Querying GeoJSON Data

Oracle NoSQL Database provides four functions to query for geo data that has a relationship with a given geometry.

geo_intersect - geometries that intersect with a GeoJSON geometry:

select t.poi as park
from PointsOfInterest t
where t.poi.kind = "nature park" and
      geo_intersect(t.poi.location,
                    { "type" : "polygon",
                      "coordinates" : [ [ [121.94, 36.28], [117.52, 37.38],
                                          [119.99, 39.00], [120.00, 41.97],
                                          [124.21, 41.97], [124.39, 40.42],
                                          [121.94, 36.28] ] ] })

geo_inside - geometries contained within a bounding GeoJSON geometry.
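By analogy with the geo_intersect example, a geo_inside query might look like the following sketch (it reuses the PointsOfInterest table and the polygon above; this particular query is illustrative, not from the original post):

select t.poi as park
from PointsOfInterest t
where t.poi.kind = "nature park" and
      geo_inside(t.poi.location,
                 { "type" : "polygon",
                   "coordinates" : [ [ [121.94, 36.28], [117.52, 37.38],
                                       [119.99, 39.00], [120.00, 41.97],
                                       [124.21, 41.97], [124.39, 40.42],
                                       [121.94, 36.28] ] ] })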
geo_within_distance - geospatial objects within a given distance of a point on a sphere.

geo_near - geospatial objects in proximity to a point:

select t.poi as gas_station,
       geo_distance(t.poi.location,
                    { "type" : "LineString",
                      "coordinates" : [ [121.9447, 37.2975], [121.9500, 37.3171],
                                        [121.9892, 37.3182], [122.1554, 37.3882],
                                        [122.2899, 37.4589], [122.4273, 37.6032],
                                        [122.4304, 37.6267], [122.3975, 37.6144] ] }) as distance
from PointsOfInterest t
where t.poi.kind = "gas station" and
      geo_near(t.poi.location,
               { "type" : "LineString",
                 "coordinates" : [ [121.9447, 37.2975], [121.9500, 37.3171],
                                   [121.9892, 37.3182], [122.1554, 37.3882],
                                   [122.2899, 37.4589], [122.4273, 37.6032],
                                   [122.4304, 37.6267], [122.3975, 37.6144] ] },
               1609)

Indexes on GeoJSON Data

Indexing GeoJSON data is similar to indexing JSON data in Oracle NoSQL Database. When defining a GeoJSON index, the "geometry" or "point" keyword must be used after the index path. For optimal performance, use "point" when the rows in the table are expected to contain single-point geometries. Indexing of geometries is based on geohashing, which encodes a longitude/latitude pair into a string. A more detailed explanation of indexes and geohashing can be found here.

create index idx_test1 on testTable(coord.point as point);

Using Geo Queries

Hyperlocal marketing is the process of targeting prospective customers in a highly specific, geographically restricted area, sometimes just a few blocks or streets, often with the intention of reaching people conducting "near me" searches on their mobile device. For example, suppose your regular customer is in your vicinity. As a book store that maintains customer profiles and their buying details, you have created ad campaigns for different customer profiles to increase footfall into your store. Hyperlocal marketing helps you push notifications to those customers with the personalized offers you want to send them.

Sample data:

{
  "id" : 1,
  "info" : {
    "kind" : "store1",
    "country" : "greece",
    "region" : "crete",
    "county" : "chania",
    "city" : null,
    "point" : { "type" : "point", "coordinates" : [ 23.549, 35.2908 ] }
  }
}

{
  "id" : 2,
  "info" : {
    "kind" : "store2",
    "country" : "greece",
    "region" : "crete",
    "county" : "iraklion",
    "city" : null,
    "point" : { "type" : "point", "coordinates" : [ 24.9, 35.4 ] }
  }
}

Table

The table has two columns, with the GeoJSON data stored in "info":
create table points (id integer, info json, primary key(id));

Index

We will create two indexes: one on the "point" attribute, and a composite one on the "kind", "point", and "city" attributes (a query sketch that uses the composite index appears at the end of this post).

create index idx_ptn on points(info.point as point);

create index idx_kind_ptn_city on points(info.kind as string,
                                         info.point as point,
                                         info.city as string);

Queries

Here we search for all stores within a (configurable) radius of the current point, the location of the customer:

# All stores
declare
    $radius2 double;
    $point1 double;
    $point2 double;

select /* FORCE_PRIMARY_INDEX(points) */
       id, p.info.point
from points p
where geo_within_distance(p.info.point,
                          { "type" : "point",
                            "coordinates" : [$point1, $point2] },
                          $radius2)

More detailed documentation and examples of GeoJSON support can be found in the official Oracle NoSQL Database SQL Specification.
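And the promised sketch for the composite index: a query that restricts on "kind" before applying the geo predicate could take advantage of idx_kind_ptn_city. The literal values here are illustrative, not from the original post:

select id, p.info.point
from points p
where p.info.kind = "store1" and
      geo_within_distance(p.info.point,
                          { "type" : "point",
                            "coordinates" : [23.48, 35.16] },
                          5000)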

Migrate MongoDB data to Oracle NoSQL Database

Introduction

Migrating data from one database to another entails design decisions related to data consistency, application downtime, and the key design differences between the two databases. When it comes to migrating across NoSQL databases, the challenges are multifold considering the many flavors available today, such as document, key-value, wide column, and graph. Oracle NoSQL Database makes it easy to migrate data from different data models, be it document, key-value, or wide column databases.

Oracle NoSQL Database is a truly multi-model NoSQL data store that allows the customer to choose between key-value, JSON, or table representations of their data. It is shard-local ACID and provides control at the operation level for consistency and durability. One can leverage the efficient Elasticsearch integration for full-text index needs, including full-text indexes on JSON documents. One can store large objects in Oracle NoSQL Database; the database provides a rich and extremely efficient LOB API. The database allows you to start with the key-value model of storing data and then easily migrate to the table model to take advantage of table semantics and the SQL-like query feature. If you need joins, the database also lets customers model their data as parent-child tables instead of large nested documents.

Export-Import Utility

Oracle NoSQL Database includes an export/import utility that allows you to export data files from, and import them into, the database. With the 19.1 release, this utility has been further enhanced to process multiple data formats. The utility now supports exporting data from Oracle NoSQL Database to files in Binary or JSON format, and importing files in Binary, JSON, or MongoDB JSON format into Oracle NoSQL Database:

- Binary: export from Oracle NoSQL Database; import into Oracle NoSQL Database
- JSON: export from Oracle NoSQL Database; import into Oracle NoSQL Database
- MongoDB JSON: import into Oracle NoSQL Database

Understanding Source and Sink

A source in the import utility is file based and contains the data exported from a database. Suppose we want to migrate data from MongoDB to Oracle NoSQL Database: all the data from MongoDB is exported in JSON format and stored in files, and these files are identified as the "source". A sink in the import utility is always Oracle NoSQL Database: we import the MongoDB-extracted JSON files into Oracle NoSQL Database, which is called the "sink".

User Experience - Export

Using the export utility, one can export all the data and metadata from the Oracle NoSQL Database store to the local file system. Using the utility one can export:

- Application-created data
- Schema definitions such as table definitions, Avro schemas, and index definitions
- The TTL of every table record

The utility creates a package that contains data, schema, and logging information in a well-defined format that can be consumed by the import utility. The syntax for export:

# Export the entire kvstore or a given table to an export store.
# Use a config file to specify export store parameters.
java -jar KVHOME/lib/kvtool.jar export
     -export-all | -table table_names | -namespace namespaces
     -store storeName
     -helper-hosts helper_hosts
     -config config_file_name
     [-format BINARY | JSON]
     [-username user]
     [-security security-file-path]
     [-verbose]

User Experience - Import

Using the import utility, one can import data from the local file system into Oracle NoSQL Database. Using the utility one can import:

- MongoDB JSON format exported data into Oracle NoSQL Database
- All the schema definitions and the data (table, Avro, and none-format data) from a backed-up store into the Oracle NoSQL Database store; the utility first imports all the schema definitions and then imports the user data
- Individual tables, or tables within a specified namespace
- The TTL of the records

The syntax for import:

# Import schema definitions and data from the local file system
# into an Oracle NoSQL Database store.
java -jar KVHOME/lib/kvtool.jar import
     -import-all | -table table_names | -namespace namespaces | -external
     -store storeName
     -helper-hosts helper_hosts
     -config config_file_name
     [-status status_file]
     [-format BINARY | JSON | MONGODB_JSON]
     [-username user]
     [-security security-file-path]
     [-verbose]

Using Export-Import

Examples can also be found in the official documentation page Using Export Import Utility. For this article, let us consider an existing vacation-booking application that stores country information in MongoDB and decides to migrate its data to the enterprise-scale Oracle NoSQL Database. The country information is stored as a JSON document and contains attributes such as landlocked, shares borders with, latitude and longitude, currency, official name, official languages spoken, and more such information that might be useful for a person trying to book a vacation to a desired country.
A sample JSON document is given below:

{
  "_id": { "$oid": "55a0f42f20a4d760b5fc3153" },
  "altSpellings": ["US", "USA", "United States of America"],
  "area": 9.37261e+06,
  "borders": ["CAN", "MEX"],
  "callingCode": ["1"],
  "capital": "Washington D.C.",
  "cca2": "US",
  "cca3": "USA",
  "ccn3": "840",
  "cioc": "USA",
  "currency": ["USD", "USN", "USS"],
  "demonym": "American",
  "landlocked": false,
  "languages": { "eng": "English" },
  "latlng": [38, -97],
  "name": {
    "common": "United States",
    "native": {
      "eng": { "common": "United States", "official": "United States of America" }
    },
    "official": "United States of America"
  },
  "region": "Americas",
  "subregion": "Northern America",
  "tld": [".us"],
  "translations": {
    "deu": { "common": "Vereinigte Staaten von Amerika", "official": "Vereinigte Staaten von Amerika" },
    "fin": { "common": "Yhdysvallat", "official": "Amerikan yhdysvallat" },
    "fra": { "common": "États-Unis", "official": "Les états-unis d'Amérique" },
    "hrv": { "common": "Sjedinjene Američke Države", "official": "Sjedinjene Države Amerike" },
    "ita": { "common": "Stati Uniti D'America", "official": "Stati Uniti d'America" },
    "jpn": { "common": "アメリカ合衆国", "official": "アメリカ合衆国" },
    "nld": { "common": "Verenigde Staten", "official": "Verenigde Staten van Amerika" },
    "por": { "common": "Estados Unidos", "official": "Estados Unidos da América" },
    "rus": { "common": "Соединённые Штаты Америки", "official": "Соединенные Штаты Америки" },
    "spa": { "common": "Estados Unidos", "official": "Estados Unidos de América" }
  }
}

You can see that this data is a typical MongoDB record with the auto-generated $oid. The Export-Import utility can read this MongoDB JSON format and load it into Oracle NoSQL Database.
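If you prefer to control the target table's structure rather than accept defaults, the ddlSchemaFile config option described below can point to a DDL script. A minimal sketch of such a script follows; the table name, column names, and key choice here are illustrative only:

CREATE TABLE IF NOT EXISTS country (
    id STRING,
    document JSON,
    PRIMARY KEY (id)
);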
Let us look at how easily and with how few steps this can be performed. The export/import utility requires a config file represented as JSON. You can customize your import operation by including one or many of the following options:

{
    "configFileVersion": <version>,
    "abortOnError": <false | true>,
    "errorOutput": <error-dir>,
    "errorFileSizeMB": <error-file-chunk-size-mb>,
    "errorFileCount": <error-file-count>,
    "path": <dir-or-file>,
    "namespace": <namespace>,
    "tableName": <table-name>,
    "ignoreFields": [<field1>, <field2>, ...],
    "renameFields": {
        <old-name>: <new-name>,
        ...
    },
    "charset": <charset-name>,
    "ddlSchemaFile": <file>,
    "continueOnDdlError": <false | true>,
    "streamConcurrency": <stream-parallelism>,
    "overwrite": <false | true>,
    "ttlRelativeDate": <date-to-use in UTC>,
    "dateTimeToLong": <true | false>
}

The most frequently used config options are:

- path: the path to the JSON data file
- namespace: the namespace in Oracle NoSQL Database under which the imported table should be created
- ignoreFields: the attributes in the JSON document that you want to ignore
- renameFields: the attributes in the JSON document that you want to rename in the Oracle NoSQL Database table
- ddlSchemaFile: points to the table schema DDL script file

Run the import utility:

java -jar kvtool.jar import -helper-hosts localhost:5000 -store kvstore -external -format MONGODB_JSON -config ./mongoimp/config/config.json -verbose

The parameters for kvtool.jar are:

- import: specifies that the current operation is an import
- -helper-hosts: the Oracle NoSQL Database connection details
- -store: the registered name of the kvstore
- -external: specifies that the data to import was generated by a source other than Oracle NoSQL Database
- -format: specifies the format to import
- -config: the location of the config file explained above

With just two steps you can migrate your MongoDB data to Oracle NoSQL Database. Happy migration!

Announcing a feature-rich Oracle NoSQL Database Export-Import

We are pleased to announce the release of Oracle NoSQL Database Version 19.1. This release contains the following new features:

Export-Import utility: With this release, we have enhanced the export-import utility, making it easier to move data in and out of Oracle NoSQL Database. The utility now supports:

- Exporting table data from Oracle NoSQL Database and storing it as JSON-formatted files on a local (or network-mounted) file system
- Importing ad hoc JSON data generated from a relational database or other sources, and JSON data generated via MongoDB strict export
- Exporting data and metadata from one or more existing Oracle NoSQL Database tables, raw key/value based data, and large object data to a compact binary format
- Reading data from, and writing data to, files in the file system
- Importing one or more tables into an Oracle NoSQL Database
- Restarting from a checkpoint if an import or export fails before completion

JSON data type support for Full Text Search: Users can now create full-text search indexes on a JSON field and on the attributes within that JSON field.

Oracle continues to enhance Oracle NoSQL Database to meet customer requirements.

To download - Visit Here
The Release Notes can be found Here.
Join our mailing list - NoSQL-Database@oss.oracle.com
Visit our LinkedIn page
Follow us @OracleNoSQL and oracle-nosql

Working with Oracle NoSQL Database Aggregate Expressions

Introduction

As an application developer building modern applications, you constantly need to handle speed for both inserts and querying of data. Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible, and available data management across a configurable set of machines. It is truly multi-model, with the flexibility to define your data model as key-value, strict table schema, or JSON document structure. While modern applications need data-model flexibility, aggregate operations that can process records and return computed results simplify application code to a great extent. Oracle NoSQL Database provides a rich set of aggregate operations that perform calculations on your data sets via the SQL-like query language. The database provides a SQL-like interface that can seamlessly query a flat relational data model, hierarchical typed data, and schema-less JSON data models. Queries can be executed from either the command-line SQL interface or the Java API.

SQL-like Query Language

SQL (Structured Query Language) is widely used in programming, primarily for managing data held in data stores. It is defined in the form of statements, which are classified as DML (Data Manipulation Language), DDL (Data Definition Language), and others. For a detailed introduction, with examples, to the SQL-like query language for Oracle NoSQL Database, see Getting Started with SQL for Oracle NoSQL Database. As of the 18.3 version of Oracle NoSQL Database, the SQL-like query language includes support for non-updating queries (SELECT), updating queries (INSERT, DELETE), and DDL statements (CREATE, ALTER). Currently Oracle NoSQL Database supports atomic types (Integer, Long, Float, Double, Number, String, Boolean, Binary, Timestamp), complex types (Array, Map, Record), and the JSON data type. It also includes the ability to compute simple aggregates and to perform join operations on parent-child tables. In this post, let us look at how simple aggregates can provide a real-time or near-real-time view of your table data.

Use Case - Customer 360

Let's take a cab aggregator application (web/mobile) that allows a typical customer to book a cab using its service. The customer sets himself up on the application by creating a demographic profile before he starts using the core service. A 360-degree customer view helps companies get a complete picture of a customer's buying and behavioral patterns by aggregating data from the various touch points a customer may use to purchase or receive a service. This requires identifying new ways to capture customer data, which could be structured (third-party applications) or unstructured (social media channels), combining it in a central location, and analyzing it. Oracle NoSQL Database, with its ability to be truly multi-model, is a perfect fit to build and store your Customer 360 profile and run aggregate queries that help analyze and identify customer needs over and beyond serving their requests. Typical questions that can be answered using the aggregate operations are:

- How many movies has a customer booked while booking a cab to a multiplex cinema hall?
- How many metro train tickets has a customer booked while booking a cab to the nearest metro station?
- What is the total revenue generated by a customer?
- What is the minimum amount spent by a customer for a booked cab?
- What is the maximum amount spent by a customer for a booked cab?
- What is the average amount spent by a customer for a booked cab?

A list of all the supported aggregate operations can be found here. Let us see some examples of how aggregation queries in Oracle NoSQL Database can be used to analyze a customer's behavior.

Examples

Below are examples of queries that can be run on a typical Customer 360 profile for the cab aggregator industry. The tables used are:

DataLake

CREATE TABLE DataLake (ID INTEGER, profile JSON, PRIMARY KEY (ID))

DataLake.MetroBooking

CREATE TABLE DataLake.MetroBooking (MB_ID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    metroBookingDetails JSON, PRIMARY KEY (MB_ID))

DataLake.MovieBooking

CREATE TABLE DataLake.MovieBooking (MV_ID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    movieBookingDetails JSON, PRIMARY KEY (MV_ID))

NOTE: The assumption made below is that the cab aggregator mobile app includes value-added services as customer touch points, such as movie booking and train ticket booking, while customers book cabs to reach a multiplex theater or the nearest local train station.

Movies booked by the user/customer with ID 1 in the last year (in the query, i = each month of the year):

select count(*) as count from DataLake.MovieBooking dlm
where month(cast(dlm.movieBookingDetails.createTime as Timestamp(0))) = "+i+"
  and dlm.ID = 1

The query above uses the following constructs:
- Aggregate function: count
- Timestamp function: month
- Utility function: cast (the timestamp is represented as a string within the JSON)

Train tickets booked by the user/customer with ID 1 in the last year (in the query, i = each month of the year):

select count(*) as count from DataLake.MetroBooking dlm
where month(cast(dlm.metroBookingDetails.createTime as Timestamp(0))) = "+i+"
  and dlm.ID = 1

The query above uses the same constructs: count, month, and cast.

Total number of cabs booked by the user/customer with ID 1:

select size(dl.profile.customerRideDetails) as size
from DataLake dl where dl.ID = 1

The query above uses the aggregate function size.

Total revenue generated by the user/customer with ID 1 in booking metro train tickets:

select sum(cast(dlm.metroBookingDetails.fare as INTEGER))
from DataLake.MetroBooking dlm where dlm.ID = 1

The query above uses the aggregate function sum and the utility function cast (the fare is represented as a string within the JSON).

Total revenue generated by the user/customer with ID 1 on booked cab rides:

select sum(dl.profile.customerRideDetails[].price) as sum
from DataLake dl where dl.ID = 1

The query above uses the aggregate function sum.

Summary

Oracle NoSQL Database supports aggregate functions like sum, count, size, avg, max, and min (a combined sketch of the last three follows below). More details can be found in the official documentation: Getting Started with SQL for Oracle NoSQL Database and the SQL for NoSQL Specification.
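The minimum, maximum, and average questions from the use case map to min, max, and avg in the same way as the sum example. A sketch against the DataLake table, with field names following the sum example (this query is illustrative, not from the original post):

select min(dl.profile.customerRideDetails[].price) as minFare,
       max(dl.profile.customerRideDetails[].price) as maxFare,
       avg(dl.profile.customerRideDetails[].price) as avgFare
from DataLake dl where dl.ID = 1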

15 minutes to Oracle NoSQL Database Cloud Service

Introduction

Oracle NoSQL Database Cloud Service is a fully managed NoSQL database cloud service for today's most demanding applications that require predictable low latencies, flexible data models, and elastic scaling for dynamic workloads. The service is suitable for applications such as Internet of Things, user-experience personalization, real-time fraud detection, and online display advertising. Developers focus on application development without the hassle of managing back-end servers, storage expansion, cluster deployments, software installation/patches/upgrades, backups, operating systems, and high-availability configurations. If you would like to understand some of the key features of Oracle NoSQL Database Cloud Service, please read this.

Getting Started

Getting started with Oracle NoSQL Database Cloud Service requires only your favorite Java application IDE, with which you create your tables. The only prerequisite is a valid Oracle Cloud subscription. For more detailed steps on activating your Oracle Cloud account, creating an Oracle NoSQL Database cloud user, and the basic configuration your application needs to connect to the service, read this.

Your First Application

To get started with Oracle NoSQL Database Cloud Service, you need to create a table. With the basic configuration information in hand (client credentials, entitlement ID, IDCS URL, and service endpoint URL), you can create your table and start building your application (a minimal table DDL sketch appears at the end of this post). The following service endpoint URLs are supported today, with more regions being added in the coming months:

- North America East: ndcs.uscom-east-1.oraclecloud.com
- North America West: ndcs.uscom-west-1.oraclecloud.com
- Europe: ndcs.eucom-central-1.oraclecloud.com

In addition to the configuration information, you will need the Cloud Java Driver and the Eclipse Plugin to ease your application development. Instructions for creating your first table, including where to find client credentials, can be found here.

Tools

Some of the tools provided by Oracle NoSQL Database Cloud Service to take you further in building a complete user application are:

- Oracle NoSQL Database Capacity Estimator and Cost Estimator
- Oracle NoSQL Database Cloud Simulator
- Oracle NoSQL Database Cloud Eclipse plugin
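And the promised sketch: the DDL for a first table could be as simple as the following (the table and column names are purely illustrative; in the cloud service, the table's read/write and storage limits are supplied alongside the DDL through the driver or console rather than in the statement itself):

CREATE TABLE IF NOT EXISTS userProfiles (
    userId STRING,
    profile JSON,
    PRIMARY KEY (userId)
);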

Ease Your Oracle NoSQL Database Cloud Application Development with Eclipse

How about getting an Oracle NoSQL Database Cloud Service application up and running quickly? If you missed how you can run a locally deployable Oracle NoSQL Database Cloud Simulator to build, test, and debug your Java application, read this. To enhance your experience of building an application with Oracle NoSQL Database Cloud Service, we are announcing a plugin for the commonly used open-source Integrated Development Environment (IDE) Eclipse. With this Eclipse plugin you can:

- Connect to a running instance of the Oracle NoSQL Database Cloud Simulator or Oracle NoSQL Database Cloud Service
- Explore development/test data from tables associated with your Oracle Cloud account
- Build and test your Oracle NoSQL Database queries
- Paginate results
- Add a NoSQL Nature to an existing project, which automatically adds the Oracle NoSQL Database Cloud Service client and dependent libraries to your project classpath
- Browse examples

Excited? Read further to learn how to install the plugin and explore the features it provides.

Install Oracle NoSQL Database Cloud Simulator

The Oracle NoSQL Database Cloud Simulator is distributed as cloudsim.jar within the SDK. The latest SDK and driver packages can be downloaded from Oracle Technology Network. Installing the Cloud Simulator is as simple as unzipping the distribution. Detailed instructions can be found in the official documentation.

Starting Oracle NoSQL Database Cloud Simulator

The Oracle NoSQL Database Cloud SDK comes with helper scripts that help you start the Cloud Simulator. To start cloudsim, the only required parameter is -root, which is where the database files are created. Other options and detailed steps to start the Cloud Simulator are here.

Install the Eclipse Plugin for Oracle NoSQL Database Cloud Simulator

NOTE: The Oracle NoSQL Database Eclipse plugin has been built to work with Eclipse Neon 4.6 and later releases.

Now that you have started the Cloud Simulator, you can install the Eclipse plugin to build your application. Before installing, ensure you have downloaded the plugin distribution and unzipped it to a desired location. Then:

1. Help -> Install New Software
2. Point to the directory where you unzipped the distribution (if the plugin is already installed, this step will update the installation)
3. Accept the license and click Finish

The installation takes a few minutes to complete and prompts you to restart Eclipse. Once Eclipse has restarted, verify the plugin installation: right-click on your existing application -> Configure; you should see an option to Add Oracle NoSQL Nature. Alternatively: Window -> Show View -> Other; you should see Oracle NoSQL - Schema Explorer.

Configure the Plugin

Once the plugin is installed and verified, you need to tell the plugin where to find the Oracle NoSQL Database Cloud SDK. Instructions for downloading the SDK and client libraries can be found here. Once you have the SDK, set preferences for the plugin under Window -> Preferences -> Oracle NoSQL Preferences. The plugin allows you to work with either the Cloud Simulator or the Oracle NoSQL Database Cloud Service.

If you would like your application to work with the Cloud Simulator:

- Select the Profile Type CloudSim
- Enter the Service URL, for example http://localhost:8080
- Tenant Identifier: the namespace under which all your tables will be created
- Location of the downloaded SDK

If you would like your application to work with the Oracle NoSQL Database Cloud Service:

- Select the Profile Type Cloud
- Enter the Endpoint, for example ndcs.uscom-east-1.oraclecloud.com/
- Entitlement ID: the 9-digit entitlement ID associated with your cloud account
- IDCS URL: the IDCS authentication URL associated with your cloud account

NOTE: The endpoint is specific to one of the available data centers. Please ensure you check and use the right endpoint, the one where your Oracle Cloud account was created.

NOTE: In case you don't have the entitlement ID and IDCS URL, follow this to obtain the details specific to your Oracle Cloud account.

Adding an Oracle NoSQL Database Nature to an Existing Eclipse Project

To add an Oracle NoSQL Nature to your existing application: right-click on the application -> Configure -> Add Oracle NoSQL Nature.

Browse Oracle NoSQL Database Examples

To browse the examples bundled as part of the SDK, the plugin allows you to create a new Eclipse project of type NoSQL Examples:

1. File -> New -> Other -> Oracle NoSQL - NoSQL Examples
2. Click Next and enter the desired project name
3. Select one or all of the bundled examples and click Finish

This creates a new project with the required Oracle NoSQL Database libraries and the source of the examples.

Explore Tables and Data

A very useful feature the plugin provides is the ability to browse the data that you store and use in your development/test environment in the Cloud Simulator, or the data stored in your Oracle NoSQL Database Cloud Service. Ensure you select the appropriate Profile Type before exploring tables and data. With the Data Explorer a developer can:

- View all the development and test tables created
- View the detailed structure of the tables, including primary key, shard key, indexes, columns, and types
- Build, test, and run Oracle NoSQL queries on development/test data before using them in an application
- Paginate data stored in the Cloud Simulator; the amount of data shown in a single view can be configured in Window -> Preferences

Explore Data Stored in Oracle NoSQL Database Cloud Service

Before you explore data stored in your Oracle NoSQL Database Cloud Service account, ensure you have read Getting Started with Oracle NoSQL Database Cloud and Develop Your NoSQL Application in Oracle Cloud. Once you have read these and have set up your Oracle NoSQL Database Eclipse plugin preferences with the details of your cloud account, select the Profile Type Cloud and follow the steps outlined earlier. You can see how easy it is to get started with building, testing, and debugging your application for Oracle NoSQL Database Cloud Service.

Build an Oracle NoSQL Database cloud service application locally

Want to build your Oracle NoSQL Database cloud Java application locally before migrating it to the Oracle Cloud? We are excited to provide a tool for accelerating your Oracle NoSQL Database cloud application development and testing: the Oracle NoSQL Database Cloud Simulator. The Cloud Simulator, or "cloudsim", allows you to develop, debug, and test your application against a locally deployable database instance with functionality similar to the actual Oracle NoSQL Database cloud service. When you are ready to run your application in production on Oracle Cloud, all that is needed is to change the endpoint from cloudsim to the cloud service endpoint, with the right authentication and authorization credentials. Refer to Getting Started with Oracle NoSQL Database Cloud for details on creating a cloud account. Details and an example of the minimal code changes can be found in our Authentication and Authorization FAQ.

To start building your application for Oracle NoSQL Database cloud service, you need the Oracle NoSQL Cloud SDK and the Oracle NoSQL Cloud Java Driver. These can be obtained from the official cloud download page. The Oracle NoSQL Cloud Simulator is part of the Oracle NoSQL Cloud SDK and is a standalone, local version of the Oracle NoSQL Database cloud service. It gives you a head start in exploring the easy-to-use APIs of the Oracle NoSQL Database cloud driver without having to purchase or use any Oracle NoSQL Database cloud credits. You can run the Cloud Simulator as a background process on your development machine, which could be your laptop or a VM shared by all developers on the application team. The Cloud Simulator supports all the Java client APIs required to communicate with the Oracle NoSQL Database cloud service, which means code written once is portable to run against the actual cloud service.

A few things to remember:

- The Oracle NoSQL Cloud SDK can only be used in standalone/local development environments on Windows, Mac, and Linux platforms, for development and test.
- Developers should have Java Development Kit (JDK) version 10 installed on their client systems.
- Read this for a more detailed comparison of the Oracle NoSQL Database Cloud Simulator and the Oracle NoSQL Database cloud service.

Excited? If you would like to read further:

- Oracle NoSQL Database cloud - Developer On-boarding
- If you are a developer using the Eclipse IDE
- Develop your applications using programming languages other than Java

Keep watching this space to learn about our commitment to enhancing the Oracle NoSQL Database cloud developer experience.

IDENTITY column in Oracle NoSQL Database

Introduction

Oracle NoSQL Database introduces the IDENTITY column, which auto-increments a value each time a row is inserted into the table.

IDENTITY Column

A primary key for a row in an Oracle NoSQL Database table must be unique. But how can we ensure that the primary key is always unique? One way would be to use a formula to generate the primary key. This may work well, but it is complex and not foolproof. The IDENTITY column feature generates unique numeric values in sequence. These are typically used for primary key columns and are termed identity columns in the relational database world. With the 18.3 release, Oracle NoSQL Database tables can:

- Include an IDENTITY column (primary key or shard key) associated with a sequence generator.
- Associate a sequence generator with any numeric (Integer or Long) column in the table that requires an auto-incremented value.

An IDENTITY column's sequence generator can be created with several configurable attributes that offer more flexibility:

- START WITH: the first value in the sequence. The default is 1.
- INCREMENT BY: the value added to the current sequence value to generate the next one. The increment can be a positive or a negative number: when INCREMENT BY is positive, values ascend; when it is negative, values descend, with the sequence generator decrementing from the last value on each iteration. The default is 1.
- MINVALUE: the lower bound of the sequence values.
- MAXVALUE: the upper bound of the sequence values.
- CACHE: determines how many values are available to each client for assigning IDENTITY numbers. The default is 1000.
- CYCLE or NO CYCLE: indicates whether the sequence generator will reuse IDENTITY numbers. The default is NO CYCLE. Using CYCLE means that IDENTITY numbers are no longer unique after the first cycle completes.

Use Case - Customer 360

This use case is about a typical bank with a growing customer base. The bank is interested in understanding customer behavior and performing targeted marketing. It has multiple systems that capture different aspects of customer information, and it wants to collate all of them in one table as a JSON document. Having built an interface that assembles that one JSON document, the bank now wants to store it in Oracle NoSQL Database and only needs to decide what the primary key of the table should be. Let us look at the different table structures that can be created using the sequence generator, which generates the primary key while the bank's interface only worries about building the JSON document.

IDENTITY column on a primary key:

CREATE TABLE customer(ID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    customerProfile JSON, PRIMARY KEY (ID));

IDENTITY column on a shard key:

CREATE TABLE customer(ID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    SSN INTEGER, customerProfile JSON, PRIMARY KEY (shard(ID), SSN));

IDENTITY column on an integer column (not a primary/shard key):
CREATE TABLE customer(ID STRING, customerProfile JSON,
    uniqueID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    PRIMARY KEY (ID));

IDENTITY column on an indexed column:

CREATE TABLE customer (ID STRING, customerProfile JSON,
    uniqueId INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 10 MAXVALUE 10000),
    PRIMARY KEY (ID));

CREATE INDEX uniqueIdIndex ON customer(uniqueId);

Alter a table to add an IDENTITY column:

CREATE TABLE customer (ID STRING, customerProfile JSON, PRIMARY KEY (ID));

ALTER TABLE customer (ADD uniqueId INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000));

IDENTITY column on a parent and child table:

CREATE TABLE customer (ID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    customerProfile JSON, PRIMARY KEY (ID));

CREATE TABLE customer.address (addressID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
    customerAddress JSON, PRIMARY KEY (addressID));

Summary

A sequence generator, which can be attached to a column with a numeric data type, generates numbers sequentially whenever a new row is added to the database. It is commonly used to generate primary keys, but can also be used for columns other than primary keys. More details and limitations can be found in the official documentation: Sequence Generator.
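One usage note worth a sketch: because the columns above are GENERATED ALWAYS, you never supply the identity value yourself. In SQL you can pass DEFAULT and let the sequence generator fill it in. An illustrative statement against the first customer table above (the JSON field names are hypothetical):

INSERT INTO customer VALUES (DEFAULT, { "name" : "Jane Doe", "segment" : "retail" });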

Oracle NoSQL Database Version 18.3 Now Available

We are pleased to announce the release of Oracle NoSQL Database Version 18.3. This release contains the following new features:

- Querying GeoJSON data: Introducing support for spatial queries on RFC 7946 compliant GeoJSON standard data. A number of spatial functions, as well as indexing for GeoJSON data, have been added.
- IDENTITY column: Tables can now specify the IDENTITY type, unburdening the application developer from having to generate unique primary keys.
- Namespace: A convenience container for easing the administration of authorization, import/export, and application development for applications that must manage their own tenants.
- Admin Web Service: The Oracle NoSQL Admin service can now be accessed as a web service over the standard HTTP/S ports (80/443). This new access paradigm exposes RESTful interfaces, easing the task of operational automation.
- SQL for NoSQL enhancements: Oracle NoSQL Database continues to power applications that require very fast access to data and flexible data models, with enhanced SQL functionality including INSERT and UPSERT.

Oracle continues to enhance Oracle NoSQL Database to meet customer requirements.

To download - Visit Here
To read about the new features - Visit Here
The Release Notes can be found Here.
Join our mailing list - NoSQL-Database@oss.oracle.com
Visit our LinkedIn page

JSON Partial Update using the Oracle NoSQL Database Query Language

Background

A general direction we see in NoSQL and relational databases is the adoption of the document model. While each document-oriented database implementation differs on the details of this definition, in general they all assume documents encapsulate and encode data (or information) in some standard format or encoding such as XML, YAML, JSON, or BSON. JavaScript Object Notation (JSON) has become the accepted medium for interchange between numerous Internet services and for unstructured data. The JSON-style document model enables customers to build services that are schema-less.

Sure, you could always store JSON as an opaque blob in the database, but that meant you could not easily work with the JSON data at the database level. Starting with Oracle NoSQL Database 4.2.10, we added support for a native JSON data type, as defined by RFC 7159, that enables efficient access to data in JSON documents. The JSON data type provides the following benefits over storing JSON-format strings in a string column:

Automatic JSON document validation: Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data. Invalid documents produce an error during the "put" or insert operation in Oracle NoSQL Database.

Efficient access: JSON documents stored in JSON columns are converted to an internal, optimized binary format that allows quick read access to document elements. Say you want to find a particular name/value pair in a JSON document: if you stored JSON documents as blobs, your best option would be to read the JSON data into memory and run it through a JSON parser, which means scanning and converting the JSON document at the application level; needless to say, that won't be performant. Conversely, with the native JSON data type, the database parses and stores JSON internally in a native data format, and querying JSON data is no different from querying data in other columns.

Performance: Improve your query performance by creating indexes on specific fields within the JSON column value.

We continue to enhance the Oracle NoSQL Database support for the JSON data type. Recent enhancements include:

- Indexes into a JSON column. An index is a JSON index if it indexes at least one field contained inside JSON data. We support both simple-type JSON indexes and multi-key JSON indexes (which index arrays or maps). This gives developers access to nested attributes embedded deep within a JSON document.
- Queries (SQL for NoSQL) involving JSON data types.
- Updates to JSON documents using the SQL UPDATE statement. An UPDATE statement can be used to update a row in a table that contains a JSON column. It also allows users to perform partial updates, that is, to update a subset of fields in the JSON column.

While each of the above points deserves a separate blog post, in this post we are going to focus on the last one: performing an update on a JSON document using the SQL UPDATE statement.

Updating JSON Documents

While some systems don't recommend manipulating JSON documents in place, Oracle NoSQL Database, with the SQL UPDATE statement, makes it possible for developers to modify or partially update (a subset of fields in) JSON documents. Let's take a real-world example, where you want to model user profile data (personal data associated with specific users) in Oracle NoSQL Database.
A user profile makes an interesting case for the JSON data type, because user profile data can have varying sets of attributes: users have different interests and expense patterns, some users have more than one work address, some users have more than one type of phone (work, home, etc.), some like to specify {firstInitial}{lastname} while others want {firstname}.{lastname}, and so on. Thus, user profile data doesn't really conform to the rigid schema (fixed fields, data types, etc.) of a traditional system, and there is a need for a modeling option that reduces the ceremony (you don't have to define schemas) and increases flexibility (store all sorts of data without prior definition). Enter native support for JSON in Oracle NoSQL Database, which provides scalability and flexibility together and makes developers more productive.

Let's create a simple table called JSONPersons with a column person that holds the user profile:

CREATE TABLE IF NOT EXISTS JSONPersons (
  id integer,
  person JSON,
  primary key (id)
);

And here is sample data that we want to insert (notice how the users have different sets of fields):

{
  "id" : 1,
  "person" : {
    "firstname" : "David",
    "lastname" : "Morrison",
    "age" : 25,
    "income" : 100000,
    "lastLogin" : "2016-10-29T18:43:59.8319",
    "address" : { "street" : "150 Route 2",
                  "city" : "Antioch",
                  "state" : "TN",
                  "zipcode" : 37013,
                  "phones" : [ { "type" : "home", "areacode" : 423,
                                 "number" : 8634379 } ] },
    "connections" : [2, 3],
    "expenses" : { "food" : 1000, "gas" : 180 }
  }
}

{
  "id" : 2,
  "person" : {
    "firstname" : "Peter",
    "lastname" : "Smith",
    "age" : 38,
    "income" : 80000,
    "lastLogin" : "2016-10-19T09:18:05.5555",
    "address" : { "street" : "364 Mulberry Street",
                  "city" : "Leominster",
                  "state" : "MA",
                  "phones" : [ { "type" : "work", "areacode" : 339, "number" : 4120211 },
                               { "type" : "work", "areacode" : 339, "number" : 8694021 },
                               { "type" : "home", "areacode" : 339, "number" : 1205678 },
                               null,
                               { "type" : "home", "areacode" : 305, "number" : 8064321 } ] },
    "connections" : [3, 5, 1, 2],
    "expenses" : { "food" : 6000, "books" : 240, "clothes" : 2000, "shoes" : 1200 }
  }
}

{
  "id" : 3,
  "person" : {
    "mynumber" : 5,
    "myarray" : [1, 2, 3, 4]
  }
}

You can store the above JSON records in a JSON file and use the load utility to insert them into Oracle NoSQL Database. Alternatively, you can use the API to "put" the JSON records in the table. After successfully inserting the data, you may need to modify the user records (change field values, add or remove elements from a user profile). It is reasonable to say that such modification should satisfy the following requirements:

- It shall take place at the server, eliminating the read-modify-write cycle, that is, the need to fetch the whole user row at the client, compute new values for the targeted fields (potentially based on their current values), and then send the entire row back to the server.
- It shall be thread safe and not lead to race conditions if updates happen concurrently from different sessions.
- It shall be atomic: other threads shall see the most up-to-date values.

Enter the SQL UPDATE statement, which can be used to update rows. UPDATE statements make server-side updates directly, without requiring a read/modify/write update cycle. Both syntactically and semantically, the UPDATE statement of Oracle NoSQL Database is similar to the UPDATE statement of standard SQL, but with extensions to handle the richer data model of Oracle NoSQL Database. Let's take a look at how this works.

Case 1: Changing field values, the simplest form of update

Let's say you want to modify the user with id 3 and set the field mynumber to 300.
We can use the SET clause as shown below:

sql-> UPDATE JSONPersons j
      SET j.person.mynumber = 300
      WHERE j.id = 3
      RETURNING id, j.person.mynumber AS MyNumber;

+----+----------+
| id | MyNumber |
+----+----------+
|  3 |      300 |
+----+----------+

1 row returned

Case 2: Adding elements to an array

Let's say we want to add more elements (50 and 100) to the array field myarray for the user with id 3, inserting 50 at position 0 and appending 100 at the end. We use the ADD clause:

sql-> UPDATE JSONPersons j
      ADD j.person.myarray 0 50,
      ADD j.person.myarray 100
      WHERE j.id = 3
      RETURNING *;

1 row returned

The returned row shows myarray as [50, 1, 2, 3, 4, 100], with mynumber still 300. Notice that multiple ADD clauses are used in the query above.

Case 3: Removing elements from an array

Let's say we want to remove the elements myarray and mynumber from the user with id 3; clearly, this user looks different from the other users. We can do that with a single UPDATE statement:

sql-> UPDATE JSONPersons j
      REMOVE j.person.myarray,
      REMOVE j.person.mynumber
      WHERE j.id = 3
      RETURNING *;

1 row returned

The returned row now has an empty person document.

Case 4: Adding elements to a map

Next, let's say we want to modify the user with id 3 and add a firstname and lastname. Here we use a single PUT clause that specifies a map with multiple elements:

sql-> UPDATE JSONPersons j
      PUT j.person {"firstname" : "Wendy", "lastname" : "Purvis"}
      WHERE j.id = 3
      RETURNING *;

1 row returned

The returned row now shows person with firstname Wendy and lastname Purvis. Next, we can set the age, connections, expenses, income, and lastLogin fields using multiple PUT clauses in a single UPDATE statement:

sql-> UPDATE JSONPersons j
      PUT j.person {"age" : 43},
      PUT j.person {"connections" : [2, 3]},
      PUT j.person {"expenses" : {"food" : 1100, "books" : 210, "travel" : 50}},
      PUT j.person {"income" : 80000},
      PUT j.person {"lastLogin" : "2017-06-29T16:12:35.0285"}
      WHERE j.id = 3
      RETURNING *;

1 row returned

The returned row shows the fully populated profile: age 43, connections [2, 3], expenses {books: 210, food: 1100, travel: 50}, firstname Wendy, income 80000, lastLogin 2017-06-29T16:12:35.0285, lastname Purvis.

Conclusion

Oracle NoSQL Database, with native support for the JSON data type, makes it easy to store data that doesn't conform to a rigid schema. In addition, the SQL UPDATE statement provides a simple and intuitive way to add, modify, and remove elements in JSON documents, eliminating the need for the read-modify-write cycle. Do try this at home!
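One parting sketch: as noted in the requirements above, update expressions can compute new values from current ones, entirely on the server. An illustrative query in the same style (not from the original post):

sql-> UPDATE JSONPersons j
      SET j.person.age = j.person.age + 1
      WHERE j.id = 3
      RETURNING j.person.age AS age;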

Greater developer flexibility building JSON applications with Oracle NoSQL Database

In this article, we look at how to go about storing and retrieving JSON data with Oracle NoSQL Database as our database of choice, because it further enhances developer flexibility. If you are a developer building microservices for web or mobile applications that communicate with other services over HTTP using the REST (REpresentational State Transfer) architecture, then you are usually working with JSON data. JSON has become the choice of developers for building microservices thanks to its simplicity and flexibility. As developers started using JSON for communication between microservices, writing and reading this JSON data to and from a backend database became the need of the hour, because doing so saves developers from converting (serializing/deserializing) JSON data fetched from the database as the result of a query, further unleashing the power of JSON. Over the last few years, NoSQL databases have stepped up to provide comprehensive support for storing and querying JSON data, and relational databases have followed the trend.

Flexibility of JSON

JSON is a common data interchange format and provides flexibility to developers when it comes to designing and evolving data models for building microservices. It is useful for cases where data is unorganized. How to manipulate JSON data boils down to two aspects of developing applications:

- Modeling data to unleash JSON flexibility
- Querying the free and wild JSON data effectively

It is important to note that databases store JSON data in a manner in which the record itself carries the type information. Each record is therefore self-describing rather than schema-less. However, records do not have to adhere to any conventional structure within the scope of a table or record collection. As an example, if we are building a contacts database for a Person entity, a sample JSON record could look like this:

{
  "account" : 1,
  "person" : {
    "lastName" : "Jones",
    "firstName" : "Joe",
    "address" : {
      "home" : {
        "street" : "15 Elm",
        "city" : "Lakeville",
        "zip" : "12345"
      },
      "work" : {
        "street" : "12 Main",
        "city" : "Lakeville",
        "zip" : "12345"
      }
    },
    "phone" : {
      "home" : "800-555-1234",
      "work" : "877-123-4567"
    }
  }
}

Note that there might be records of persons for which the phone map is not available, thus allowing the application to work with JSON data that conforms to various combinations of its object structure. If the developer gets a new requirement that a person could have two or more phone numbers, he would be able to incorporate that use case with minimal code changes, as opposed to having to adhere to a standard structure for all records. Application developers can unleash this flexibility when they use JSON.

Before we go further, it is also important to note that while the JSON data itself can have its own structure, in all cases the context of the information, including the (static or dynamic) type of each field, is known to the application. In other words, the application still has the business context of the data and knows how to manipulate the JSON data. It does not make any sense to store two JSON records that have entirely different structures and, more importantly, different information contexts (one record representing a person contact and another a shopping cart) in one table or collection.
The application (or object-mapping framework, when used) brings type consistency to the free and wild JSON data format. Now let's look at how Oracle NoSQL Database extends this flexibility, for both data modeling and querying.

Data Modeling - Oracle NoSQL Database allows developers to manipulate data using the popular and well-understood paradigm of tables, and JSON is a native data type for a table column. In other words, a CREATE TABLE DDL statement can create columns of type JSON. Let's see how this model allows for greater flexibility, starting with the following create table statement:

CREATE TABLE personContacts (
  account INTEGER,
  person JSON,
  PRIMARY KEY (account)
);

Executing this statement creates a table with a primary key and a JSON column. The model is equivalent to document NoSQL stores in that each record in the table is a JSON document, but with a primary (or shard) key defined outside the document. You can assume that some of the properties of the person JSON data are always defined (either by default or with some value). It is also possible to define the same table another way:

CREATE TABLE personContacts (
  account INTEGER,
  person JSON,
  primary_phone INTEGER,
  PRIMARY KEY (account)
);

Here, the table schema allows the self-described JSON type to be mixed with columns of other data types, letting the application pull common fields or properties out of the JSON to gain better control wherever applicable.

Querying - Oracle NoSQL Database provides comprehensive ways to query individual fields or properties within the JSON data type, including the ability to create indexes on fields or properties inside the JSON data. Querying JSON is possible through API calls as well as the SQL-like interface. In this blog we will look at some examples using the SQL-like interface; for the APIs supported to perform queries over JSON data, please see this section in the product documentation. In general, querying could be the topic of a series of blog posts, but here I will discuss some of the high-level aspects that contribute to the flexibility, and in turn the power, of querying JSON data. Oracle NoSQL Database supports expressions, which represent a set of operations, using various operators (arithmetic, logical, value and sequence comparisons), function calls and other grammatical constructs, performed over the stored records to produce a result. Consider the query below:

sql-> select p.person.lastName, p.account from personContacts p where p.person.address.home.city = "Lakeville";
{"lastName":"Jones","account":1}

1 row returned

The query returns the lastName and account number of all users who live in "Lakeville." The expression steps into the JSON data to identify the lastName property. The example is a typical Select-From-Where query in which the where clause uses specific attributes from the JSON data as a predicate. Such expressions can be written to form a variety of SFW (Select-From-Where) queries that retrieve results based on complex criteria, and they can appear at multiple places in the supported grammar, further enhancing the flexibility. The supported grammar for SFW queries is:

SELECT <expression>
FROM <table name>
[WHERE <expression>]
[ORDER BY <expression> [<sort order>]]
[OFFSET <number>]
[LIMIT <number>];

The beauty of the query support is that the same queries also work on the complex data types (Array, Map, and Record) supported by Oracle NoSQL Database.
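As a quick illustration of the JSON indexing mentioned above, here is a minimal sketch of a typed index on a field inside the person column; the index name is hypothetical and the field choice is just an example, so check the statement form against the product documentation:

CREATE INDEX idx_home_city ON personContacts (person.address.home.city AS STRING);

A query with a predicate on person.address.home.city, like the one shown above, can then use this index instead of scanning the whole table.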
As a developer, if you want to move from wild and free JSON to a more structured schema approach by mapping the JSON data to multiple column types, your queries remain untouched. So as you firm up your application schema, you have a choice: stay with JSON formatted data or move to a "type consistent" table schema. Oracle NoSQL Database version 4.5 or higher includes support for the JSON-related features discussed here; you can download this version from here. To learn more about building real-world web-scale JSON data applications with Oracle NoSQL Database, refer to this extensive tutorial.


Hardware Strategies for Horizontal and Vertical Scalability – Oracle NoSQL Database

Since almost all NoSQL databases have built-in high availability, it is easy to change the hardware configuration (shutting down, replacing, or adding servers in a cluster) while maintaining continuous operations. Seamless, online scalability is one of the key attributes of NoSQL systems, which are sharded, shared-nothing databases. Users can start with a small cluster (a few servers) and then grow the cluster (by adding more hardware) as throughput and/or storage requirements grow. Similarly, users can contract the cluster as throughput and/or storage requirements shrink due to decreased workloads. Hence, most NoSQL databases are considered to have horizontal scalability. Figure 1 shows horizontal scaling during different operational workloads.

Figure 1 - Horizontal scaling for various workloads

Scalability is appealing to businesses for a variety of reasons. In many cases, the peak workload is not known before the initial setup of a NoSQL database cluster. Scalability enables the organization to deploy enough capacity to meet the expected workload without unnecessary up-front over-expenditure on hardware. As the workload increases, more hardware can be added to meet peak demand. As the workload decreases, hardware can be reclaimed and re-purposed for other applications. The cluster can grow and shrink as the workload changes, with zero downtime. Most NoSQL systems claim linear or near-linear scalability as the cluster size grows. Linear scalability simply means that when more hardware is added to the cluster, the cluster is expected to deliver performance roughly proportional to the added capacity. In other words, if the number of servers in the NoSQL cluster is doubled, it is expected to handle roughly double the workload (throughput and storage) of the original configuration, provided the additional servers have the same processing and storage capabilities. Since almost all NoSQL databases have built-in high availability, it is also easy to replace hardware incrementally: shut down a machine, remove it from the cluster, bring in a new server, and add it to the cluster. Everything continues to work without downtime or interruption of the application.

From a capital expenditure (CAPEX) perspective, it can be more cost-effective to purchase newer and "bigger" servers, because hardware costs decline over time; newer hardware has more processing and storage capability than older generations at the same price. Replacing older hardware with newer, more powerful and capable hardware gives a NoSQL database vertical scalability. From an operational cost (OPEX) perspective, a cluster with fewer servers is preferable to a cluster with a large number of servers, assuming both options deliver the same performance and storage capabilities. Hardware needs to be actively managed; it needs space and cooling and consumes power. Smaller clusters improve operational efficiency. Over time, the combined effect of growing the number of servers in a cluster and/or replacing old servers with newer hardware results in a cluster with a mix of servers of varying processing and storage capabilities and capacity. It is common to find such "mixed" clusters in production NoSQL applications that have been upgraded over a period of time.
In such "mixed" cluster scenarios, it is important to choose a NoSQL solution that can leverage hardware with varying processing and storage capabilities in order to address the application requirements efficiently and effectively. As mentioned earlier, most NoSQL products claim horizontal scalability. But does an administrator really want to maintain a large cluster for NoSQL product X if a smaller cluster running a different NoSQL product will do the job, or exceed the requirements? The example in Figure 2 shows that over time, with increased storage and processing power per server, a 12-machine system could be replaced with just a 3-machine system.

Figure 2 - Replacing smaller servers with larger, more capable ones

When evaluating a NoSQL product, it is wise to consider the horizontal as well as vertical scalability characteristics. Oracle NoSQL Database is designed for horizontal and vertical scalability in order to deliver the most cost-effective NoSQL solution for modern applications. For every server deployed in an Oracle NoSQL Database cluster, the administrator specifies the capacities (e.g., storage) of that server at initial setup time. Oracle NoSQL Database uses this information to automatically assign workloads such that each server process manages a subset of the data roughly proportional to the storage capacity of its server. A smaller server is assigned less workload than a larger server. Similarly, each process uses RAM based on the amount of physical memory available on the machine. Figure 3 shows an Oracle NoSQL Database cluster with mixed hardware capacities.

Figure 3 - A mixed cluster

Oracle NoSQL Database distributes the workload across the available hardware based on the capabilities of each server. More importantly, during cluster creation and expansion/contraction operations, the system ensures that the replicas of every shard (remember, Oracle NoSQL Database is a sharded, HA database) are distributed across different physical servers. This "rule" ensures that the failure of any single server never results in the loss of a complete shard. Also, in steady-state operation, servers might be shut down and restarted at various points in time. Oracle NoSQL Database automatically monitors the health and workload of each server in the cluster and rebalances the workload across the servers to avoid "hotspots". All of this happens automatically and without administrative intervention, ensuring reliable and predictable performance and high availability over mixed clusters and cluster transitions. Oracle NoSQL Database leverages hardware effectively in order to meet the performance and availability requirements of the most demanding and innovative applications. When deciding which NoSQL solution to use, vertical scalability is just as important as horizontal scalability in order to ensure the lowest Total Cost of Ownership for the overall solution.
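In practice, the capacity of each server is declared when it is bootstrapped. Here is a minimal sketch reusing the makebootconfig utility shown elsewhere on this blog; the host name, ports, and paths are placeholders, and capacity 3 assumes a newer server with three data disks:

java -jar $KVHOME/lib/kvstore.jar makebootconfig \
     -root $KVROOT \
     -port 5000 \
     -admin 5001 \
     -harange 5010,5030 \
     -host bighost01 \
     -capacity 3 \
     -storagedir /disk1/nosql/data \
     -storagedir /disk2/nosql/data \
     -storagedir /disk3/nosql/data

A smaller machine in the same cluster would simply be bootstrapped with -capacity 1 and a single -storagedir, and the store assigns it proportionally less work.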



Real Time Tracking with Oracle NoSQL Database

Real Time Tracking with Oracle NoSQL Database

I recently got back from a short trip to Boston to watch my daughter race in the Boston Marathon. As soon as my wife and I found some spots to watch, about 1 mile from the finish line, we got out our phones and fired up the tracking software. We were both disappointed at the ability to get timely updates on our daughter's progress. Remember that once you have a space to stand and watch, you basically don't move for more than 4 hours while trying to figure out when your favorite runner will be passing your spot. An effective and efficient tracking application is critical for this. I got to thinking about the application for tracking runners, now that RFID tags are commonplace and so inexpensive. Each numbered bib that a runner wears contains an RFID chip that is activated as the runner passes over or through a timing mat. Here is what the sensor looks like from an actual Boston Marathon bib. During the race, at specific intervals, the activation time of the sensor is captured and stored, and some simple computations are then performed, such as the most recent minutes/mile and an extrapolation of the expected finishing time. A NoSQL application for timing runners would be quite straightforward to develop. Let's look at two of the basics.

Registration - when someone registers to run in a race, the basic information must be acquired, including name, address, phone and birthday. The birthday is actually quite important, as qualifying times are based on age at the time of the race, as well as how a participant places within their respective age group. For example, a JSON document could be created at registration time with the following information:

{
  "RunnerId" : 12345,
  "Runner" : {
    "firstname" : "John",
    "lastname" : "Doe",
    "birthdayyear" : "1985",
    "birthdaymonth" : "02",
    "birthdayday" : "15",
    "email" : "john.doe@example.net",
    "payment" : "Mastercard",
    "paymentstatus" : "paid",
    "social" : {
      "twitter" : "@jdoe",
      "instagram" : "jdoe_pics"
    }
  }
}

As the race begins, each runner passes over a mat on the ground, which activates the RFID chip and records the start time. As the runner progresses over the race course, the runner crosses more of these mats at specified intervals and the times are recorded. Simple math can then determine the elapsed time for that specific runner and the minutes per mile over the past interval, and extrapolate the expected finish time. The JSON data generated as the race progresses may look like the record below. It is quite small and can be transmitted to the servers quickly, or even batched up (depending on the transmitting device's capability) and sent every few hundred runners, or whenever there is a break in the runners crossing the mat:

{
  "RunnerId" : "12345",
  "milestone" : "milestone_5k",
  "timestamp" : "2017-04-12T10:00:00"
}

This information could then be added to the race record for the runner as they make progress:

"Marathon_Boston" : {
  "RunnerID" : 12345,
  "start_time" : "2017-04-12T10:00:00",
  "milestone_5k" : "2017-04-12T10:21:00",
  "milestone_10k" : "2017-04-12T10:44:00",
  "milestone_15k" : "2017-04-12T11:10:00",
  "milestone_20k" : "2017-04-12T11:25:00",
  "milestone_25k" : "2017-04-12T11:42:00",
  "milestone_30k" : "2017-04-12T11:56:00",
  "milestone_35k" : "2017-04-12T12:09:00",
  "milestone_40k" : "2017-04-12T12:28:00",
  "milestone_41k" : "2017-04-12T12:42:00",
  "milestone_42k_end" : "2017-04-12T12:45:00"
}

Overall, this would be an ideal application for a NoSQL system.
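To make the "simple math" concrete, here is a minimal sketch of the pace and projected-finish computation using the sample milestone timestamps above; the class and constants are hypothetical helpers, not part of any real race application:

import java.time.Duration;
import java.time.LocalDateTime;

public class PaceCalculator {
    public static void main(String[] args) {
        // Start and 5k milestone timestamps, as in the sample record above.
        LocalDateTime start = LocalDateTime.parse("2017-04-12T10:00:00");
        LocalDateTime at5k = LocalDateTime.parse("2017-04-12T10:21:00");

        double elapsedMin = Duration.between(start, at5k).getSeconds() / 60.0;
        double milesRun = 5 * 0.621371;   // 5 km expressed in miles
        double minPerMile = elapsedMin / milesRun;

        // Extrapolate the finish time over the full 26.219-mile distance.
        double projectedMin = minPerMile * 26.219;
        System.out.printf("Pace: %.2f min/mile, projected finish: %.0f min%n",
                          minPerMile, projectedMin);
    }
}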
The amount of data, even for a 35,000-person race, would not be very much, and as the runners spread out it is even less than at the starting gates. If we assume that each runner's record consumes about 1K of data, then the entire race generates only about 35 MB of raw data. If we then assume a replication factor of 3 and include some overhead, the entire race data needs about 225 MB of storage, which could easily fit on a USB thumb drive. High-speed SSDs can store in the terabyte (TB) range, so thousands of marathon results could be stored in a single Oracle NoSQL Database. This still doesn't answer the question of why the updates were so slow, but according to my source in the Boston area, the downtown is notorious for poor cell service; add many thousands of race watchers trying to use their tracking apps at basically the same time, and you can start to understand the delays. At least we know that if the system were based on NoSQL, it would not be the culprit.


Oracle NoSQL Database on Docker

Docker makes it easier for developers to package, deploy and run applications by using containers, and it makes applications portable across operating systems and cloud platforms. The latest release of Oracle NoSQL Database (version 4.3) is now available as a Docker container on Docker Hub. This means that if you already have Docker installed, you can get set up and running with Oracle NoSQL Database with just one command:

$ docker run -d --name=kvlite oracle/nosql

The Docker container for Oracle NoSQL Database downloads the latest Oracle NoSQL Database Community Edition and jump-starts a kvlite instance, a single-node, single-shard Oracle NoSQL Database store. Developers can start playing with Oracle NoSQL Database, including some of the new and exciting features of version 4.3, in no time. Oracle NoSQL Database provides strong security support, and starting with version 4.3 security is enabled by default. Please note, however, that the Docker version by default bootstraps the kvlite instance with security disabled:

CMD ["java", "-jar", "lib/kvstore.jar", "kvlite", "-secure-config", "disable"]

Building your own Docker image with security enabled is very simple:

Clone github.com/oracle/docker-images/nosql
Update 4.3.11/Dockerfile to include the following command, which brings up a secured kvlite instance when the container is run:

CMD ["java", "-jar", "lib/kvstore.jar", "kvlite"]

To connect to the secured kvlite instance, please follow the command to start the CLI as noted in the Oracle NoSQL Database documentation, here. The Oracle NoSQL Database Docker container will get you started storing and retrieving data in minutes, but there is a lot more you can do with this. Abhishek Gupta put together a simple Jersey (JAX-RS implementation) web application that uses Oracle NoSQL Database over REST APIs. In this example, he builds 2 Docker containers, one for Oracle NoSQL Database and another that runs the Jersey application. You can follow these simple steps and extend the web application to build your own Oracle NoSQL Database application. Happy coding!

Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries. Docker, Inc. and other parties may also have trademark rights in other terms used herein.
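A minimal sketch of the build-and-run flow after editing the Dockerfile; the directory layout inside the cloned repository and the image tag are assumptions based on the paths mentioned above:

$ git clone https://github.com/oracle/docker-images.git
$ cd docker-images/NoSQL/4.3.11     # directory containing the Dockerfile (assumed layout)
# edit the Dockerfile CMD as shown above, then build and run:
$ docker build -t oracle/nosql:secure .
$ docker run -d --name=kvlite-secure oracle/nosql:secure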


Oracle NoSQL Database Keeps Your Data Secure

Recent news has brought back the focus on how a poorly secured database server can cause irreversible damage to the reputation of the software vendor, apart from many other tangible and intangible losses. The security features in Oracle NoSQL Database make it a member of the Oracle family of products, which pride themselves on being very secure. This blog briefly describes these security features.

1) There are two levels of security: network security, and user authentication and authorization. Network security provides an outer layer of protection at the network level and is configured during the installation process using cryptographic keys, X.509 certificate chains and trusted certificates. This means the communication between the client and server nodes, and also between the server nodes themselves, is encrypted using the SSL/TLS protocol. User authentication and authorization can either be managed using Oracle NoSQL Database utilities or relegated to any Kerberos-compliant LDAP/single-sign-on server.

2) Starting with release 4.3 of Oracle NoSQL Database, the security features are enabled by default.

3) Access to a secure Oracle NoSQL Database is limited to authenticated users. Oracle NoSQL Database provides tools for user and password management.

4) Password credentials for connecting to the database are stored in a client-side Oracle Wallet, a secure software container used to store authentication and signing credentials. With Oracle Wallet, applications no longer need to embed user names and passwords in application code and scripts. This considerably reduces risk, because the user credentials are no longer exposed in the clear, and password management policies are more easily enforced without changing application code whenever user names or passwords change.

5) Oracle NoSQL Database provides a set of default rules for creating and updating a user password in order to enhance security. These rules enable the administrator to enforce strong password policies, such as minimum and maximum password length; minimum numbers of upper case and lower case characters, digits and special characters; password expiry; a list of restricted passwords; and the maximum number of remembered passwords that cannot be reused when setting a new password.

6) Successfully authenticated users receive an identifier for a login session that allows a single login operation to be shared across Storage Nodes. The session has an initial lifetime associated with it, after which it is no longer valid. The server notifies the user with an error once the session is no longer valid, and the application then needs to re-authenticate.

7) Oracle NoSQL Database provides role-based authorization. A user can be assigned one or more roles. A role can either be a built-in system role (readonly, writeonly, readwrite, dbadmin, sysadmin and public) or a user-defined role. The built-in roles map to one or more privileges. Privileges can either be system privileges or object (table-level) privileges. A system privilege grants the user the ability to perform a store-wide action, while an object privilege grants the ability to perform an action only on that specific object (table).

8) With Kerberos integration, Oracle NoSQL Database can relegate the authentication and authorization job to any Kerberos-compliant LDAP or single-sign-on server.

In summary, Oracle takes security very seriously for all of its products. Oracle NoSQL Database has been designed from the start to be secure and to protect users' data.
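As a flavor of points 3) and 7) above, here is a minimal sketch of creating a user and granting a built-in role from the admin CLI; the user name and password are placeholders, and the exact statement forms should be checked against the Security Guide:

kv-> execute 'CREATE USER appuser IDENTIFIED BY "ChangeMe__123"'
kv-> execute 'GRANT readwrite TO USER appuser'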
Please refer to the Security Guide for more details on any of the above-mentioned security features.


Oracle NoSQL Database performance on Oracle Cloud Infrastructure (OCI) IaaS

We measured the performance of Oracle NoSQL Database release 4.3 running on Oracle Cloud Infrastructure (OCI) compute nodes using the YCSB workload. YCSB is the most popular benchmark for NoSQL systems. We are pleased to report the following performance results from our test.

Performance Benchmark Results

Clients: 6
Records: 450 million
Insert throughput: 276,352 ops/sec
Mixed throughput: 843,969 ops/sec
Insert latency (avg/95th/99th): 4.29/12/26 ms
Read latency (avg/95th/99th): 1.11/3/8 ms
Update latency (avg/95th/99th): 5.79/13/61 ms

Hardware Configuration

We used three DenseIO nodes to run the Oracle NoSQL Database server and three Standard nodes for the YCSB driver application.
DenseIO node: 36 OCPU, 512 GB RAM, 28.8 TB NVMe SSD, 10 Gbps network.
Standard node: 36 OCPU, 256 GB RAM, 10 Gbps network.

Software Configuration

We used Oracle Linux Server release 6.8 on the client as well as the server nodes. We used Oracle NoSQL Database version 4.3, which can be downloaded from here. The server was configured as a 9-shard system with three-way replication (RF = 3).

Benchmark Configuration

We used the standard YCSB parameters: 13-byte keys and 1024-byte values. The database size was 450 million records. All insert and update operations were performed with a durability policy of simple majority. All read operations were performed with the NONE_REQUIRED read consistency policy. A total of 6 client processes were used, with 2 client processes running on each of the 3 Standard compute nodes. Each client used 200 threads for the insert workload as well as the mixed read/update workload.

NOTE: Learn how Oracle assists developers by automating the installation of Oracle NoSQL Database on Oracle Cloud Infrastructure (OCI). Click here for details.
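For readers who want to approximate this setup, here is a hypothetical YCSB core-workload property sketch; the operation count and read/update mix are assumptions, since the exact property values used in the test are not published in this post:

# load phase: 450 million records of ~1 KB values
recordcount=450000000
fieldcount=1
fieldlength=1024

# run phase: mixed read/update workload (50/50 split is an assumption)
operationcount=100000000
readproportion=0.5
updateproportion=0.5

# 200 client threads per process, as described above
threadcount=200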


Expanding an Oracle NoSQL Cluster

Introduction

This is an extension of my last blog, where I described how to set up a single-node NoSQL database. You'd have noticed that setting up a single-node NoSQL db or kvlite should take less than 5 minutes - that's how simple it is! In this blog, I want to talk about how to expand the kvlite instance to a 3x3 cluster - 3 shards, each having 3 copies of the data.

Motivation for NoSQL Cluster Expansion

Before we see the actual steps of expansion, let's look at what the motivations to expand a NoSQL cluster could be:

Increase Data Capacity - A company's Oracle NoSQL Database application is now obtaining its data from several unplanned new sources. The existing configuration is more than adequate to meet requirements, with one exception: they anticipate running out of disk space later this year. The company would like to add the needed disks to the existing servers in existing slots, establish mount points, and ask Oracle NoSQL Database to fully utilize the new disks along with the disks already in place, all while the system stays up and running.

Increase Throughput - As a result of an unplanned corporate merger, the live Oracle NoSQL Database will see a substantial increase in write operations. The read/write mix of transactions will go from 50/50 to 85/15. The new workload will exceed the I/O capacity of the available storage nodes. The company would like to add new hardware and have it be utilized by the existing Oracle NoSQL Database (kvstore) currently in place. Oh, and of course the application needs to remain available while this upgrade is occurring.

Increase Replication Factor - A new requirement has been placed on an existing Oracle NoSQL Database to increase its overall availability by increasing the replication factor, utilizing new storage nodes added in a second geographic location. This is accomplished by adding at least 1 replication node for every existing shard. The current configuration has a replication factor of 3. And yes, the expansion happens while the KVStore continues to service the existing workload.

Of the above, Increase Data Capacity is the most obvious: when you add replication nodes, you have additional space to store your data.

Expansion

We want to transform our existing 1x1 NoSQL store into a 3x3 cluster (3 shards, each having 3 copies of the data).

Prerequisites

Make sure that you have Java SE 6 (JDK 1.6.0 u25) or later installed on all of the hosts that you are going to use for the Oracle NoSQL Database installation. Make sure that some sort of reliable clock synchronization is running on each of the machines; generally, a synchronization delta of less than half a second is required. ntp is sufficient for this purpose.

Installation

Follow the installation steps highlighted in the doc guide.

Configuration Steps

Step 1: Create the initial "boot config" configuration file using the makebootconfig utility. You should do this on each Oracle NoSQL Database node; here we are doing it for storage node 2 and storage node 3 (storage node 1 is already bootstrapped). Notice that we are defining three different port ranges (-admin, -harange, -port) for various inter- and intra-node communication. The number of replication nodes (RNs) a storage node can host is defined by the '-capacity' parameter.
You can define the storage directories where data is going to be written using the -storagedir argument. Follow the link to understand the meaning of the different configuration parameters. Please note that in this expansion exercise we have added storage nodes (SN2 and SN3) with greater capacities (capacity=3), which means we would like three RNs to be hosted on each storage node, with each RN using its own disk. This is an important aspect, as independent spindles improve the performance and availability of the system.

########### Bootstrap Storage Node 2 ###########
java -jar $KVHOME/lib/kvstore.jar makebootconfig \
       -root $KVROOT \
       -store-security none \
       -capacity 3 \
       -harange 6010,6030 \
       -admin 6001 \
       -port 6000 \
       -memory_mb 200 \
       -host kvhost02 \
       -storagedir <PATH TO STORAGEDIR 1 ON SN2> \
       -storagedir <PATH TO STORAGEDIR 2 ON SN2> \
       -storagedir <PATH TO STORAGEDIR 3 ON SN2>
################################################

########### Bootstrap Storage Node 3 ###########
java -jar $KVHOME/lib/kvstore.jar makebootconfig \
       -root $KVROOT \
       -store-security none \
       -capacity 3 \
       -harange 7010,7030 \
       -admin 7001 \
       -port 7000 \
       -memory_mb 200 \
       -host kvhost03 \
       -storagedir <PATH TO STORAGEDIR 1 ON SN3> \
       -storagedir <PATH TO STORAGEDIR 2 ON SN3> \
       -storagedir <PATH TO STORAGEDIR 3 ON SN3>
################################################

Step 2: Start the Oracle NoSQL Database Storage Node Agent (SNA) on each of the Oracle NoSQL Database nodes. The SNA manages the Oracle NoSQL Database processes on each node. Run the following on kvhost02 and on kvhost03:

java -jar $KVHOME/lib/kvstore.jar start -root $KVROOT &

Step 3: Deploy the 3x3 topology. In this step we deploy the storage nodes and configure the topology. Since storage node 1 (SN1) is already deployed, we are not deploying it again; instead we just increase its capacity from 1 to 3 using the plan change-parameters command. We also need to add the paths to the storage directories that will contain the environments of the 2 additional replication nodes that will run on SN1. Next we deploy the new storage nodes SN2 and SN3. To change from one configuration to another, you either create a new initial topology, or you clone an existing topology, modify it into your target topology, and then deploy that target topology - here we take a clone of the current topology. Remember, we are changing the replication factor (RF) from 1 to 3; an increase in RF creates more copies of the data, which improves read throughput and availability. RF is increased using the topology change-repfactor command. For additional information on how to identify your replication factor and its implications, see Replication Factor. We then call the redistribute command, which creates new shards and replication nodes and moves partitions and RNs to re-balance the cluster.
Finally, we preview and deploy the topology:

java -Xmx256m -Xms256m -jar ${KVHOME}/lib/kvstore.jar runadmin -port 5000 -host ${HOSTNAME} << EOF
plan change-parameters -service sn1 -wait -params capacity=3
plan change-storagedir -sn sn1 -storagedir <PATH TO STORAGEDIR 2 ON SN1> -add -wait
plan change-storagedir -sn sn1 -storagedir <PATH TO STORAGEDIR 3 ON SN1> -add -wait
plan deploy-sn -znname "Boston" -port 6000 -wait -host kvhost02
plan deploy-admin -sn sn2 -port 6001 -wait
plan deploy-sn -znname "Boston" -port 7000 -wait -host kvhost03
plan deploy-admin -sn sn3 -port 7001 -wait
topology clone -current -name 3x3
topology change-repfactor -name 3x3 -pool BostonPool -znname "Boston" -rf 3
topology redistribute -name 3x3 -pool BostonPool -partitions
topology preview -name 3x3
plan deploy-topology -name 3x3 -wait
show plan
EOF

Once done, you can verify by logging into the admin console and using the show topology command:

kv-> show topology;
store=kvstore  numPartitions=120 sequence=217
  zn: id=zn1 name=Boston repFactor=3 type=PRIMARY allowArbiters=false

  sn=[sn1] zn:[id=zn1 name=Boston] kvhost01:5000 capacity=3 RUNNING
    [rg1-rn1] RUNNING  single-op avg latency=1.2094289 ms  multi-op avg latency=5.4368057 ms
    [rg2-rn1] RUNNING  single-op avg latency=0.0 ms  multi-op avg latency=7.688985 ms
    [rg3-rn1] RUNNING  single-op avg latency=0.0 ms  multi-op avg latency=1.6008012 ms
  sn=[sn2] zn:[id=zn1 name=Boston] kvhost02:6000 capacity=3 RUNNING
    [rg1-rn2] RUNNING  single-op avg latency=0.0 ms  multi-op avg latency=3.631618 ms
    [rg2-rn2] RUNNING  single-op avg latency=0.0 ms  multi-op avg latency=3.1090064 ms
    [rg3-rn2] RUNNING  single-op avg latency=1.5529954 ms  multi-op avg latency=5.964887 ms
  sn=[sn3] zn:[id=zn1 name=Boston] kvhost03:7000 capacity=3 RUNNING
    [rg1-rn3] RUNNING  single-op avg latency=0.0 ms  multi-op avg latency=3.7453163 ms
    [rg2-rn3] RUNNING  single-op avg latency=1.3022312 ms  multi-op avg latency=3.657524 ms
    [rg3-rn3] RUNNING  single-op avg latency=0.0 ms  multi-op avg latency=5.796298 ms

  shard=[rg1] num partitions=40
    [rg1-rn1] sn=sn1
    [rg1-rn2] sn=sn2
    [rg1-rn3] sn=sn3
  shard=[rg2] num partitions=40
    [rg2-rn1] sn=sn1
    [rg2-rn2] sn=sn2
    [rg2-rn3] sn=sn3
  shard=[rg3] num partitions=40
    [rg3-rn1] sn=sn1
    [rg3-rn2] sn=sn2
    [rg3-rn3] sn=sn3


Oracle NoSQL Database for IoT environments

How can NoSQL be used in an IoT solution to manage truck parking spaces? A problem that exists in many domains is how to allocate and efficiently use a finite resource when demand exceeds the normal supply. For example, there is a finite supply of truck parking spaces at truck stops, and on any given night, there may be more demand than supply. How does a trucker know if a space is available at the next rest stop, or whether he/she should drive the extra hour to the next one? With federal rules governing the amount of time a driver may be on the road and the required rest periods, optimizing the drive/rest cycle and using this limited resource is critical to running a truck stop business.

Pain point: the need to manage thousands of truck parking spaces (reserve, charge, analyze, maintain, as well as collateral fuel and store purchases). Demand for these spaces is high. There are about 1.5 million large trucks in the US, and new federal regulations are requiring truck drivers to stop and rest more frequently. This is creating huge demand for these parking spaces, and thus the opportunity to monetize them. The customer needs to be able to monitor the availability of each space, charge for its use, and tie this information into their back office applications. This is their business, so it is crucial to capture the data quickly and efficiently. A combination of sensor technology at the parking sites and wireless reservation systems with real-time capacity information can dramatically increase revenue, and increase driver safety as well, since drivers can plan more efficient drive and rest cycles. An architecture that includes an IoT service to acquire the data, plus NoSQL to store the data before sending it to a longer-term database, can greatly increase revenue and operating efficiency.


Oracle NoSQL Database - SQL Query Overview & Demo

SQL (Structured Query Language) is one of the most popular query languages for accessing databases. Oracle NoSQL Database (release 4.2 or later) offers this query feature to simplify application development; developers can now use their knowledge of and experience with SQL in Oracle NoSQL Database. This blog provides a quick and easy guide to get you started testing SQL queries in Oracle NoSQL Database. The following steps will guide you through the setup process.

Step 1: Oracle NoSQL Database Setup

If you have not installed Oracle NoSQL Database, please download the product here. Either the Enterprise or the Community edition is fine; the Enterprise edition is used in this example. Place the package in /home/oracle/nosql. Follow the commands below to unzip the package and launch KVLite, a single-node Oracle NoSQL database:

gunzip <name of the downloaded package>
Example: gunzip kv-ee-4.2.10.tar.gz

tar -xvf <name of the gunzipped package>
Example: tar -xvf kv-ee-4.2.10.tar

KVHOME=<directory of the unzipped package>
Example: KVHOME=/home/oracle/nosql/kv-4.2.10

java -jar $KVHOME/lib/kvstore.jar kvlite

Step 2: SQL Test Environment Setup

A small sample database and scripts will be used to set up the test environment; a table is used to model the data. Download this demo file, then place and unzip it in the same parent directory where the Oracle NoSQL Database is located (in this example, /home/oracle/nosql):

gunzip TestSQL.tar.gz
tar -xvf TestSQL.tar

After the file is unzipped, a folder called TestSQL should appear. Follow the commands below to set up the database and the SQL test environment:

cd TestSQL        (open the folder)
./testSQL.sh      (start testing SQL)

Step 3: SQL Testing

A file called testSQL.txt (in the TestSQL folder) has a collection of sample SQL statements that can be used for testing. For further details on SQL usage in Oracle NoSQL products, please refer to the online documents.
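For a flavor of what such statements look like, here is a hypothetical query in the style used elsewhere on this blog; the table and column names are illustrative, not necessarily those shipped in testSQL.txt:

sql-> SELECT u.id, u.info.name FROM users u WHERE u.info.age > 30;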


Oracle NoSQL Database - putIfVersion() API for read-modify-write operations

I read an interesting article about how hackers were able to steal 896 BTC and cause Flexcoin to shut down because the loss was too large for the company to bear. The article highlights the need for software implementers to provide APIs designed to prevent unexpected behavior in read-modify-write (RMW) operations. For example, if you need to update an account balance, you perform the following sequence of operations:

Step 1: curr_value = get(key, "my_account_no"); // read from the database into the application
Step 2: new_value = curr_value - amount_to_be_deducted; // update the value in the application
Step 3: put(key, new_value); // store the updated value in the database

A database system that supports multi-statement transactions ensures that the account balance for my_account_no in the database cannot be changed by another user until the update has been committed. Unfortunately, most NoSQL database systems do not support multi-statement transactions. If concurrent users perform RMW operations like the one described above on the same record, it is possible to get incorrect results, as the Flexcoin article illustrates. Oracle NoSQL Database provides APIs that help prevent this problem, even in the presence of concurrent activity. In this blog, we are going to talk about one such useful API: putIfVersion.

KVStore.putIfVersion is a simple yet powerful API that can be used when updating a value to ensure that it has not changed since the last time it was read.

Version putIfVersion(Key key, Value value, Version matchVersion)
    Put a key/value pair, but only if the version of the existing value matches the matchVersion argument.

Version putIfVersion(Key key, Value value, Version matchVersion, ReturnValueVersion prevValue, Durability durability, long timeout, TimeUnit timeoutUnit)
    Put a key/value pair, but only if the version of the existing value matches the matchVersion argument.

Let us see how putIfVersion can be used to avoid consistency issues in the application. Consider the same sequence of RMW operations:

Step 1: curr_value = get(key, "my_account_no");
Step 2: new_value = curr_value - amount_to_be_deducted;
Step 3: put(key, new_value);

Step two can be more elaborate, with checks to ensure that the new_value is still valid (e.g., non-zero, above a certain limit, etc.). Now, if two or more application threads update the same value at the same time, it is possible that both threads read the current value before either put() operation executes. At the end of this operation from both threads, the value would be incorrect: the amount would have been deducted twice, but the database would reflect only one deduction. The putIfVersion() API solves this problem. It protects the application by making an additional check during the update, ensuring that the value has not changed since it was last read:

putIfVersion(key, new_value, version_of_curr_value);

Although the example above illustrates the usage of putIfVersion() for the Key/Value data model, Oracle NoSQL Database implements the putIfVersion() API for both the Key/Value and the Table data models. Refer to the sample code here to leverage the putIfVersion() API in your application.
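For illustration, here is a minimal sketch of a complete RMW retry loop using the Key/Value form of putIfVersion; the key construction and the string encoding of the balance are assumptions made for the example:

import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;
import oracle.kv.Version;

public class SafeDebit {
    static void debit(KVStore store, Key accountKey, long amount) {
        while (true) {
            ValueVersion vv = store.get(accountKey);              // read
            long balance = Long.parseLong(new String(vv.getValue().getValue()));
            long newBalance = balance - amount;                   // modify
            if (newBalance < 0) {
                throw new IllegalStateException("insufficient funds");
            }
            Value newValue = Value.createValue(Long.toString(newBalance).getBytes());
            // write, but only if nobody changed the record since our read
            Version v = store.putIfVersion(accountKey, newValue, vv.getVersion());
            if (v != null) {
                return;                                            // success
            }
            // another client updated the record concurrently; re-read and retry
        }
    }
}

Happy (and safe) coding!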


Oracle NoSQL Database 4.0 Released

Oracle NoSQL Database 4.0 is now released and available for download from the Oracle Technology Network download pages.

Oracle NoSQL Database Version 4.0 - New Features

· Full text search - the ability to perform full text searches over the data using Elasticsearch.
· Time-To-Live - efficient aging out of "expired" data; this is a common IoT requirement.
· SQL Query - a declarative query language for developers more comfortable with SQL than API-level access.
· Predicate Pushdown - the ability to process predicates from Big Data SQL in NoSQL Database nodes, which leads to improved performance and scalability.
· Import/Export - easy backup/restore of data, and movement of data between different Oracle NoSQL Database stores.

Oracle NoSQL Database continues to enable innovative applications that require very fast access to data and flexible interfaces.

· Integrated - connects with many other Oracle products for a complete, productive environment.
· Fast - utilizes parallelism at a number of levels.
· Flexible - flexible schema, not predefined.
· Simple to implement - setup, administration and integration are straightforward.
· Reliable - built-in data replication and HA support for availability and business continuity.
· Scalable - add more servers to handle more data and throughput with zero downtime.

Learn More About Oracle NoSQL Database:
LinkedIn Group - https://www.linkedin.com/groups/4147754
Twitter - @OracleNoSQL
Community - https://community.oracle.com/community/database/high_availability/nosql_database


Oracle NoSQL Database Licensing

We have received a number of questions about the Oracle NoSQL Database licensing policy, so we wanted to clear things up.

News: As of Oracle NoSQL Database Version 4.3, a new edition has been announced: Oracle NoSQL Database Basic Edition. It is available to all organizations that have an Oracle Database Enterprise Edition (DBEE) license, and may be run on the same OCPUs, or on unused (but licensed) OCPUs, as DBEE. For a more complete discussion of Oracle NoSQL Database Basic Edition, take a look at our FAQ. Be sure to look at the following blogs: Licensing Information as of Oracle NoSQL Database - Version 4.3, and Oracle NoSQL Database License Explanations.

Oracle NoSQL Database comes in three editions, commonly referred to as the Basic Edition (BE), Community Edition (CE) and Enterprise Edition (EE). Each edition carries a client license (where the application runs) and a server license (where the application data is stored).

Enterprise Edition (EE):
Client License: Apache License, Version 2
Server License: Oracle commercial license

Community Edition (CE):
Client License: Apache License, Version 2
Server License: GNU Affero General Public License, Version 3

Apache License - details can be found at: http://www.apache.org/licenses/LICENSE-2.0.txt
GNU Affero General Public License (AGPL) Version 3 - details can be found at: http://www.oracle.com/technetwork/database/database-technologies/nosqldb/documentation/nosql-db-agpl-license-1432845.txt


Oracle NoSQL Database Cluster YCSB Testing with Fusion ioMemory™ Storage

Highly distributed systems with large data stores in the form of NoSQL databases are becoming increasingly important to enterprises, not just to hyperscale organizations. NoSQL databases are being deployed for capturing patient sensor data in health care, smart meter analysis in utilities, customer sentiment analysis in retail, and various other use cases in different industries. NoSQL database systems help organizations store, manage, and analyze huge amounts of data on distributed system architectures. The sheer volume of data and the distributed system design needed to manage this large data at a reasonable cost necessitated a different category of database systems, leading to NoSQL databases. Oracle NoSQL Database is part of the NoSQL database family and is based on a distributed, key-value architecture.

This technical white paper describes a three-node Oracle NoSQL Database cluster deployment procedure on Fusion ioMemory™ storage. The following points are emphasized:

It highlights the performance and scalability advantages compared to traditional spinning disks.
Because enterprises evaluate and assess new technologies for enterprise-wide adaptability, the Yahoo Cloud Serving Benchmark (YCSB), the standard benchmark tool for NoSQL systems, is the tool used in this paper to evaluate Oracle NoSQL Database.
Analysis and discussion are provided for the throughput and latency results of the YCSB tests.

Download now at: https://www.sandisk.com/content/dam/sandisk-main/en_us/assets/resources/enterprise/white-papers/oracle-nosql-cluster-ycsb-testing-with-fusion-iomemory.pdf


Oracle NoSQL BulkPut

Our customers have often asked us, "What's the fastest and most efficient way to insert a large number of records in the Oracle NoSQL database?" Very recently, a shipping company reached out to us with a specific requirement: using Oracle NoSQL Database for their ship management application, which tracks the movements of container ships that move cargo from port to port. The cargo ships are all fitted with GPS and other tracking devices, which relay a ship's location to the application every few seconds. The application is then queried for:

1) The location of all ships, displayed on a map
2) A specific ship's trajectory over a given period of time, also displayed on the map

As the volume of location data grows, the company is finding it hard to scale the application and is now looking at a back-end system that can ingest this large data set very efficiently.

Historically, we have supported the option to execute a batch of operations for records that share the same shard key, which is what our large airline customer (Airbus) has done. They pre-sort the data by shard key and then perform a multi-record insert when the shard key changes. Basically, rather than sending and storing one record at a time, they can send a large number of records in a single operation. This certainly saved network trips, but they could only batch-insert records that shared the same shard key. With Oracle NoSQL Database release 3.5.2, we have added the ability to do a bulk put of records across different shards, allowing application developers to work more effectively with very large data sets.

The BulkPut API is available for the Table as well as the Key/Value data model. The API provides significant performance gains over single-row inserts by reducing network round trips as well as by doing ordered inserts in batch, on internally sorted data, across different shards in parallel. This feature is released in a controlled fashion, so there are no javadocs available for this API in this release, but we encourage you to use it and give us feedback.

API

KV interface: loads Key/Value pairs supplied by special purpose streams into the store.

public void put(List<EntryStream<KeyValue>> streams, BulkWriteOptions bulkWriteOptions)

Table interface: loads rows supplied by special purpose streams into the store.

public void put(List<EntryStream<Row>> streams, BulkWriteOptions bulkWriteOptions)

streams - the streams that supply the rows to be inserted.
bulkWriteOptions - non-default arguments controlling the behavior of the bulk write operations.

Stream interface:

public interface EntryStream<E> {
    String name();
    E getNext();
    void completed();
    void keyExists(E entry);
    void catchException(RuntimeException exception, E entry);
}

Performance

We ran the YCSB benchmark with the new BulkPut API on a 3x3 (3 shards, each with 3 copies of data) NoSQL cluster running on bare metal servers, ingesting 50M records per shard, or 150M records across the store, using 3 parallel threads per shard (9 in total across the store) and 6 parallel input streams per SN (54 in total across the store).

[Graph: throughput (ops/sec) of Bulk vs. Simple Put on a store with 1000 partitions, with durability settings of None and Simple Majority]

As the graph shows, there is over a 100% increase in throughput with either durability setting.
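Before looking at the full sample program, here is a minimal sketch of implementing the stream interface shown above and invoking the Table-model bulk put; the table name, row fields, and the BulkWriteOptions constructor arguments are assumptions made for illustration:

import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import oracle.kv.BulkWriteOptions;
import oracle.kv.EntryStream;
import oracle.kv.KVStore;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;

public class BulkPutSketch {
    static void load(KVStore store, final Iterator<String[]> source) {
        final TableAPI tableAPI = store.getTableAPI();
        final Table table = tableAPI.getTable("users");   // hypothetical table
        EntryStream<Row> stream = new EntryStream<Row>() {
            public String name() { return "usersStream"; }
            public Row getNext() {
                if (!source.hasNext()) {
                    return null;                           // null signals end of stream
                }
                String[] rec = source.next();
                Row row = table.createRow();
                row.put("id", Integer.parseInt(rec[0]));
                row.put("name", rec[1]);
                return row;
            }
            public void completed() { }
            public void keyExists(Row entry) { }           // duplicate keys ignored here
            public void catchException(RuntimeException e, Row entry) { throw e; }
        };
        List<EntryStream<Row>> streams = Collections.singletonList(stream);
        // default durability/timeout; the constructor form is an assumption
        tableAPI.put(streams, new BulkWriteOptions(null, 0, null));
    }
}

In a real loader you would supply several streams in the list (for example, one per input file) so the store can sort and insert across shards in parallel.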
Sample Example

Here's a link to a program uploaded to the github repository; the sample demonstrates how to use the BulkPut API in your application code. Refer to the readme file for details on executing the program.

Summary

If you are looking at bulk loading data into Oracle NoSQL Database, the latest BulkPut API provides the most efficient and fastest (as demonstrated by YCSB) way to ingest large amounts of data. Check it out now and download the latest version of Oracle NoSQL Database at: www.oracle.com/nosql. I'd like to thank my colleague Jin Zhao for his input on the performance numbers.


Oracle NoSQL BulkGet API

Oracle NoSQL Database provides a BulkGet API that retrieves the rows matching a set of keys in parallel across the shards of the store. Please refer to the java documentation for the API; both the key/value and table interfaces are supported.

Performance

In our internal Yahoo Server Scalability Benchmark runs we found that we could retrieve 30M rows in 149 seconds with 72 executor threads, running on 3x3 shards, with 90 reader threads and a record size of 100 bytes.

[Graphs: (a) elapsed time and (b) throughput, plotted against the number of executor threads]

As seen from timing graph (a), getting 30M rows using the simple get API would have taken 420 seconds, which reduces to 149 seconds with 72 executor threads (plotted on the X-axis) using the bulk get API. That is almost a 3X improvement! And as seen in graph (b), throughput went to 200K ops/sec with 72 executor threads, from 68K ops/sec using simple get operations. That again is a 3X improvement!

In the charts, bulk-X is the maximum number of concurrent requests, which specifies the maximum degree of parallelism (in effect the maximum number of NoSQL client-side threads) to be used when running an iteration. To achieve good performance we want to keep the Replication Nodes working as much as possible, so as a starting point we suggest using 3 * the number of available RNs (Replication Nodes). For example, if you are running a 3x3 store (3 shards, each with 3 copies of the data, i.e., 9 RNs), you could start with 3 * 9 = 27 concurrent requests. Of course, the optimal value varies with the nature of the application's requirements: some applications want to be unobtrusive and use minimal resources (but efficiently), with elapsed time a lower priority, for example when running analytics on secondary zones; on the other hand, you may want strong real-time latency when running multi-get on primary zones.
Table Definition and Sample BulkGet Example

Below is some sample code demonstrating how to use this API. The table is defined as:

PhoneTable (manufacturer string,
            price double,
            ...
            primary key(shard(manufacturer), price));

import java.util.ArrayList;
import java.util.List;
import oracle.kv.Direction;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.StoreIteratorException;
import oracle.kv.table.FieldRange;
import oracle.kv.table.MultiRowOptions;
import oracle.kv.table.PrimaryKey;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;
import oracle.kv.table.TableIterator;
import oracle.kv.table.TableIteratorOptions;

public class BulkGetExample {

    // Connection parameters are hard-coded in this example; they could be
    // taken as program input or read from a configuration file.
    private final String storeName = "kvstore";
    private final String hostName = "localhost";
    private final int port = 5000;
    private final String tableName = "PhoneTable";

    final KVStore store;

    public static void main(final String args[]) {
        try {
            BulkGetExample runTest = new BulkGetExample();
            runTest.run();
        } catch (Exception e) {
            System.err.print("BulkGetExample run failed: " + e.getMessage());
        }
    }

    BulkGetExample() {
        store = KVStoreFactory.getStore(
            new KVStoreConfig(storeName, hostName + ":" + port));
    }

    void run() {
        final String[] manufacturers = {"Nokia", "Apple", "Samsung", "Motorola"};
        final List<PrimaryKey> keys = new ArrayList<PrimaryKey>(manufacturers.length);
        final TableAPI tableAPI = store.getTableAPI();
        final Table phoneTable = tableAPI.getTable(tableName);
        if (phoneTable == null) {
            throw new IllegalArgumentException("Table not found: " + tableName);
        }
        for (String manufacturer : manufacturers) {
            final PrimaryKey pk = phoneTable.createPrimaryKey();
            pk.put("manufacturer", manufacturer);
            keys.add(pk);
        }

        /* Initialize MultiRowOptions: price range in [200, 500]. */
        final FieldRange range = phoneTable.createFieldRange("price");
        range.setStart(200d, true).setEnd(500d, true);
        final MultiRowOptions mro = new MultiRowOptions(range, null, null);

        /* Initialize TableIteratorOptions. */
        // Batch size of 200: the maximum number of result batches that can be
        // held on the NoSQL client before they are processed.
        final int batchResultsSize = 200;
        // Maximum number of concurrent executor threads; 9 suits a 3x3 store.
        final int parallelism = 9;
        final TableIteratorOptions tio = new TableIteratorOptions(
            Direction.UNORDERED,
            null /* Consistency */,
            0 /* RequestTimeout */,
            null /* TimeUnit */,
            parallelism,
            batchResultsSize);

        TableIterator<Row> itr = null;
        int count = 0;
        try {
            itr = tableAPI.tableIterator(keys.iterator(), mro, tio);
            while (itr.hasNext()) {
                final Row phone = itr.next();
                System.out.println(phone.toJsonString(false));
                count++;
                /* ... */
            }
            System.out.println(count + " rows returned.");
        } catch (StoreIteratorException sie) {
            /* Handle exception. */
        } finally {
            if (itr != null) {
                itr.close();
            }
        }
    }
}

The above example returns an iterator over the rows matching the manufacturers supplied by the key iterator, restricted to the price range [200, 500]. If it is also desired to retrieve the images of all the phones along with the other details, the images can be modeled as a child table (for efficiency reasons) and retrieved in the same API call.

Summary

The Oracle NoSQL BulkGet API provides the most effective and performant way to fetch data in bulk from the Oracle NoSQL database. As demonstrated by the YCSB runs, using this API you can expect between a 2x and 3x performance improvement when retrieving data in bulk.

As seen from the timing graph in the original post, getting 30M rows using the simple get API would have taken about 420 seconds, which reduces to about 149 seconds with 72 executor threads (plotted on the X-axis) using bulk get, in line with the 2x-3x improvement noted above.

Oracle NoSQL Database 12.2.3.5.2 Available

Oracle NoSQL Database, version 12.2.3.5.2 is now available for download. We strongly recommend that you download this new version. The highlights are:

Bulk Put API - A high-performance API that allows the application to insert multiple records (Bulk Put) in a single API call. The Bulk Put API is available for the Table as well as the Key/Value data model. The API provides significant performance gains over single-row inserts by reducing network round trips and by doing ordered inserts in batches on internally sorted data, allowing application developers to work more effectively with very large datasets.

Kerberos integration - Oracle NoSQL Database Enterprise Edition (EE) supports authentication using a Kerberos service. Kerberos is an industry-standard authentication protocol for large client/server systems. With Kerberos, Oracle NoSQL DB and application developers can take advantage of existing authentication infrastructure and processes within your enterprise. To use Oracle NoSQL DB with Kerberos, you must have a properly configured Kerberos deployment, configured Kerberos service principals for Oracle NoSQL DB, and added Kerberos user principals to Oracle NoSQL DB. Please refer to the security guide for details, and also refer to the sample code.

Read more and download from HERE.
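For the Table data model, a minimal sketch of what a Bulk Put call could look like follows. This is hedged: the EntryStream implementation and helper class are illustrative, and the exact BulkWriteOptions defaults and tuning knobs should be checked against the javadoc for your release.

import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import oracle.kv.BulkWriteOptions;
import oracle.kv.EntryStream;
import oracle.kv.table.Row;
import oracle.kv.table.TableAPI;

/* Supplies rows to the bulk put engine, here from an in-memory list. */
class RowListStream implements EntryStream<Row> {
    private final Iterator<Row> it;
    RowListStream(List<Row> rows) { this.it = rows.iterator(); }
    public String name() { return "RowListStream"; }
    public Row getNext() { return it.hasNext() ? it.next() : null; }
    public void completed() { /* all rows have been consumed */ }
    public void keyExists(Row entry) { /* duplicate key: skip it */ }
    public void catchException(RuntimeException e, Row entry) { throw e; }
}

class BulkPutSketch {
    /* A single put() call streams all rows into the store in batches. */
    static void bulkLoad(TableAPI tableAPI, List<Row> rows) {
        List<EntryStream<Row>> streams =
            Collections.<EntryStream<Row>>singletonList(new RowListStream(rows));
        /* null/0 arguments leave durability and timeout at their defaults. */
        BulkWriteOptions options = new BulkWriteOptions(null, 0, null);
        tableAPI.put(streams, options);
    }
}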


OOW'15 Oracle NoSQL HOLs

Last week at OOW'15 we did Hands On Labs for the NoSQL Database. There were essentially two tracks: 1) Administrators and 2) Developers. I was often asked to provide the links for the labs; the links are available on our OTN pages, and I am also posting them on this blog. The link contains the VM that has pre-built scripts and a guide on the desktop that walks through the HOL step by step.

1) Admin Track:
Contains steps to deploy a 3x3 cluster; we are simulating a scenario of 3 machines, each with 3 disks
Extends the cluster by adding a node to each shard (3x4), thereby increasing the read throughput
Backup and recovery of your store
Securing the store

2) Developer Track:
Deploys the cluster
Create tables (parent and child)
Create secondary indexes
Populate the tables
Query using the CLI and an application: range queries
Retrieve both parent and child records in a single call
Usage of secondary indexes
Integrate with RDBMS using external tables
Data modelling exercise

The HOLs were designed to be very simple, and they provide us an opportunity to help folks in our community get started with Oracle NoSQL DB, but I also realize that a one-hour session is too short a time to convey all of the above, especially for folks who are trying the product for the first time. There are some very intricate details around distributed systems, topology management, data modelling, etc. That's something we'll work on as we do these sessions next year. I'd also like to hear from you folks: your feedback on content, material, etc. is welcome.


Setting up a single node Oracle NoSQL

This blog describes the process to set up a single node Oracle NoSQL Database. If you are planning a large system for production use, please read Initial Capacity Planning to estimate the number of storage nodes on which you will need to install the software.

Installation
Follow the steps listed in the product documentation page to install Oracle NoSQL Database. It is important that you follow the installation prerequisites.

Configuration
At a high level, configuring your store requires the following steps:

Bootstrap the Storage Node (SN): The physical machine or storage node that we are going to use needs to be bootstrapped first; in the bootstrap process we create a config file in the KVROOT directory that holds the cluster deployment info.
Start the Storage Node Agent (SNA): After bootstrapping, we start the Storage Node Agent, or SNA, which basically works like a listener that enables communication between the admin node and the other replication nodes (RN). We run bootstrapping and the SNA on all the storage nodes.
Run the admin steps to deploy the cluster:
Create the store
Create a data center
Assign Storage Nodes (SN) to the data center
Deploy an MxN topology

Before we start, first download the latest binaries; if there are stale libraries, clean up the environment.

Download the kv-ee-<version>.tar.gz software from the OTN site and unpack it into /home/oracle/nosqldb. Set the environment variables: KVHOME points to the directory where you unzipped the latest Oracle NoSQL Database binaries, KVROOT is where you would like metadata files to be stored, and KVDATA points to the storage directories.

KVHOME=/home/oracle/nosqldb/kv-<version>
KVROOT=/home/oracle/nosqldb/root
KVDATA=/u02/nosqldb/data

mkdir ${KVROOT}
mkdir ${KVDATA}

(Optional) Stop any KVROOT instance if running and remove existing data:

nohup java -jar $KVHOME/lib/kvstore.jar stop -root $KVROOT >stopNoSQL.log 2>&1 &
# Remove any existing data
/bin/rm -rf ${KVDATA}/*
/bin/rm -rf ${KVROOT}/*

Let's look into each of the above steps.

Step 1 - Bootstrap the Storage Node (SN)
Create a configuration file that is used to start a not-yet-deployed Storage Node to be used in an instance of Oracle NoSQL Database. Notice that we are defining three different port ranges (-admin, -harange, -port) for the various inter- and intra-node communication. The number of replication nodes (RN) this storage node can host is defined by the capacity via '-capacity'. You can define the storage directories where data is going to be written using the -storagedir argument. In our case we defined capacity=1, which means we would like one RN to be hosted on the storage node, with the RN using its own disk. If there are more RNs, then each RN could potentially be made to use its own disk; this is an important aspect, as independent spindles improve the performance and the availability of the system. Refer to the guide to understand the details of each of the above configuration parameters.

mkdir -p $KVROOT
java -jar $KVHOME/lib/kvstore.jar makebootconfig \
        -root $KVROOT \
        -store-security none \
        -capacity 1 \
        -harange 5010,5030 \
        -admin 5001 \
        -port 5000 \
        -memory_mb 1024 \
        -host `hostname` \
        -storagedir ${KVDATA}

Step 2 - Start the Storage Node Agent (SNA)
Once the Storage Node is bootstrapped, we start the Oracle NoSQL Database Storage Node Agent (SNA). Notice that we are passing the KVROOT of the SN to the start command.
nohup java -Xmx256m -Xms256m -jar $KVHOME/lib/kvstore.jar start -root ${KVROOT} &

Step 3 - Deploy the topology
Next we deploy the NoSQL DB topology on the storage node that we just started in the previous step. Let's break down the steps:

First we name the store 'mystore'. Note that the name of your store is essentially used to form a path to records kept in the store. For this reason, you should avoid using characters in the store name that might interfere with its use within a file path. The command line interface does not allow an invalid store name. Valid characters are alphanumeric, '-', '_', and '.'.
Create a data center or zone, 'Boston'. When you execute the plan deploy-zone command, the CLI returns the plan number and whatever additional information it has about plan status.
Every KVStore has an administration database. You must deploy the Storage Node to which the command line interface is currently connecting, in this case "sn1", and then deploy an Administration process on that same node, in order to proceed to configure this database. Use the deploy-sn and deploy-admin commands to complete this step.
Once the admin process is created, you must create a Storage Node pool, which is used to contain all the SNs in the store. We name this pool 'BostonPool'.
Next, join the storage node (sn1) to the Storage Node pool (BostonPool).
The final step in your configuration process is to create Replication Nodes on every node in your store. You do this using the topology create and plan deploy-topology commands. The plan deploy-topology command requires a topology name; we name it 'topo'.
As a final sanity check, you can confirm that all of the plans succeeded using the show plans command.

java -Xmx256m -Xms256m -jar ${KVHOME}/lib/kvstore.jar runadmin -port 5000 -host ${HOSTNAME} << EOF
configure -name mystore
plan deploy-zone -name "Boston" -rf 1 -wait
plan deploy-sn -zn zn1 -host ${HOSTNAME} -port 5000 -wait
plan deploy-admin -sn sn1 -port 5001 -wait
pool create -name BostonPool
show topology
pool join -name BostonPool -sn sn1
topology create -name topo -pool BostonPool -partitions 10
topology preview -name topo
plan deploy-topology -name topo -wait
show plans
EOF

Once all the above steps are done, you are set to use the single node Oracle NoSQL Database - that's how easy it is to set up and deploy a NoSQL cluster. Depending on interest, I can extend this tutorial to setting up a multi-node cluster of Oracle NoSQL Database.
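As a final check after deployment, the store can be pinged (a minimal sanity check; the exact output shape varies by release):

java -jar $KVHOME/lib/kvstore.jar ping -host `hostname` -port 5000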


Invoking Oracle NoSQL based Java application from PL/SQL

Recently, we ran into an interesting use-case with one of our large supermarket customers, who wanted to take the output of a PL/SQL process and store it in Oracle NoSQL Database, to be later consumed by one of their retail applications - very quickly, and in a way that can scale very well to support the high volume of data they expect. Oracle NoSQL DB is the obvious choice because it provides high-throughput, low-latency read/write operations and can scale to support large volumes of data. Coming to the integration, one of the highlights of Oracle NoSQL Database is that it integrates very well with the rest of the Oracle tech stack. The simplest way to write to Oracle NoSQL DB from a PL/SQL procedure is to call a Java stored procedure that uses the native NoSQL DB API to insert data into the database, and the simplest way to read from Oracle NoSQL DB in a stored procedure is to use an External Table in the query, so that data from Oracle NoSQL DB can be passed to the Oracle Database query processor. Another possible option is to use GoldenGate to move data from the Oracle Database to NoSQL DB. We have already blogged about the GoldenGate integration, so in this blog I am going to focus on the Java stored procedure based approach. In case you are not familiar with Java stored procedures: a Java stored procedure essentially contains Java public static methods that are published to PL/SQL and stored in an Oracle database for general use. This allows a Java stored procedure to be executed from an application as if it were a PL/SQL stored procedure. When called by client applications, a Java stored procedure can accept arguments, reference Java classes, and return Java result values. So, to help our customer, we created a POC that showcases this integration. Here are the steps involved:

First, create the NoSQL DB tables that will store the data from the Oracle Database.
Create a Java application using the native NoSQL driver to perform CRUD operations on NoSQL DB.
Load the Java application classes created in step 2 into the Oracle Database using the loadjava utility.
Create a Java stored procedure that takes the data from PL/SQL and updates the NoSQL Database.
Next, publish the Java stored procedure in the Oracle data dictionary. To do that, you write a call spec, which maps Java method names, parameter types, and return types to their SQL counterparts.
Finally, call the Java stored procedure from a PL/SQL block to perform the updates.

The POC is available for download in a zip file from our OTN page (refer: The PL/SQL Integration in the Demo/Sample Program). The READ-ME file bundled in the zip has all the detailed steps and files needed for this integration. With this approach, the NoSQL access is transparent to the Oracle DB application. NoSQL DB is an excellent choice here, and using this Java stored procedure approach, the customer can exploit the advantages of BOTH repositories effectively and with better TCO.
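As a hedged sketch of steps 4 through 6 (the class, procedure, and key names here are illustrative and not taken from the POC zip; the actual NoSQL put logic is elided):

// Java public static method, loaded into the Oracle Database with loadjava.
public class NoSqlBridge {
    public static void putRecord(String key, String value) {
        // Obtain a KVStore handle and write the key/value pair here,
        // using the native NoSQL DB API.
    }
}

-- Call spec publishing the Java method to PL/SQL
CREATE OR REPLACE PROCEDURE nosql_put (k VARCHAR2, v VARCHAR2)
AS LANGUAGE JAVA
NAME 'NoSqlBridge.putRecord(java.lang.String, java.lang.String)';

-- Invoking it from a PL/SQL block
BEGIN
  nosql_put('customer/42', '{"offer":"10% off"}');
END;
/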


Migrating/Importing MongoDB Documents into Nosql Tables

Summary
This post presents a how-to for migrating documents in MongoDB's collections into tables and child tables in Oracle NoSQL. The idea is to take as an example a relatively complex document, define a mapping file to map the basic fields of the document into a table, and map the embedded collections of the document into child tables. The Java class that we provide will generate the NoSQL structure of the tables and insert the data. The set of components of each element of the collection is inserted into the store in the same operation.

A JSON example
Let's use an example of a family item from a MongoDB collection:

{
  "_id" : ObjectId("55c4c6576e4ae64b5997d39e"),
  "firstname" : "lena",
  "lastname" : "clark",
  "gender" : "W",
  "childrens" : [
    { "name" : "bob",
      "schools" : [ {"street" : "90, pine street","id" : "Saint volume"},
                    {"street" : "134, mice street","id" : "Saint appearance"} ],
      "hobbies" : ["soccer","photo"] },
    { "name" : "joseph",
      "schools" : [ {"street" : "168, merely street","id" : "Saint slipped"} ],
      "hobbies" : ["tennis","piano"] },
    { "name" : "sandy",
      "schools" : [ {"street" : "227, thread street","id" : "Saint discovery"} ],
      "hobbies" : ["football","guitar"] }
  ]
}

In this case the main document has the following fields: '_id', 'firstname', 'lastname', 'gender' and 'childrens'. 'childrens' is an embedded collection, containing 'name', 'schools' and 'hobbies'. 'schools' is again a nested collection with 'street' and 'id' fields, and 'hobbies' is a list. We can map them into several nested tables: the main table represents FAMILY, FAMILY.CHILDREN gets the 'childrens' items, and FAMILY.CHILDREN.SCHOOLS and FAMILY.CHILDREN.HOBBIES store the schools and hobbies information.

The mapping file
The mapping file is a properties file; it also contains the connection information to access the MongoDB database and the NoSQL store:

the name of the NoSQL store: Nosql.Store=kvstore
the host and port of the NoSQL store: Nosql.URL=bigdatalite:5000
the MongoDB host: MongoDB.host=localhost
the MongoDB port: MongoDB.port=27017
the MongoDB database: MongoDB.DB=gadb

Mapping principles
Define the main collection, its fields and its main table mapping.
For each field define its type and its mapping value. Note that this can be a recursive step.
For each table define the primary key index components.

Mapping extracts
Mapping the collection and table with its primary keys:

mongo.collection=family
mongo.collection.map=FAMILY
FAMILY.indexcols=LASTNAME,FIRSTNAME

indexcols is the keyword introducing the comma-separated list of columns of the key; order is important. The indexcols prefix is a NoSQL table name.

Family fields:

family.fields=lastname,firstname,gender,childrens
family.firstname.type=string
family.firstname.map=FIRSTNAME
family.childrens.type=collection
family.childrens.map=CHILDREN

fields is the keyword introducing the comma-separated list of fields of a collection. For each field, type corresponds to the type of a column in a NoSQL table (string, integer, long, float, double or boolean are accepted). Two other values are used: array or collection. array is for lists of basic types; collection is for more complex collections. When type is a basic type, map indicates a column of the mapped table; when the type is array or collection, map introduces a new table.
Children mappings:

CHILDREN.indexcols=NAME
childrens.fields=name,schools,hobbies
childrens.name.type=string
childrens.name.map=NAME
childrens.schools.type=collection
childrens.schools.map=SCHOOLS
childrens.hobbies.type=array
childrens.hobbies.map=HOBBIES

School mappings:

schools.fields=street,id
schools.indexcols=ID

street and id are basic string fields; their type and map are not shown.

Hobbies mappings:

hobbies.fields=hobbies
hobbies.hobbies.type=string
hobbies.hobbies.map=HOBBY
HOBBIES.indexcols=HOBBY

childrens.hobbies is an array of strings mapped to the child table HOBBIES. There is no name in the main collection for the field; I've chosen to use hobbies (the name of the collection) as the field name to be able to define a mapping.

Tables generated
Get the child tables from FAMILY:

kv-> show tables -parent FAMILY
Tables:
 FAMILY.CHILDREN
 FAMILY.CHILDREN.HOBBIES
 FAMILY.CHILDREN.SCHOOLS

Get the table indexes:

kv-> show indexes -table FAMILY
Indexes on table FAMILY
FAMILYIndex (LASTNAME, FIRSTNAME)
kv-> show indexes -table FAMILY.CHILDREN
Indexes on table FAMILY.CHILDREN
CHILDRENIndex (NAME)
kv-> show indexes -table FAMILY.CHILDREN.SCHOOLS
Indexes on table FAMILY.CHILDREN.SCHOOLS
SCHOOLSIndex (ID)
kv-> show indexes -table FAMILY.CHILDREN.HOBBIES
Indexes on table FAMILY.CHILDREN.HOBBIES
HOBBIESIndex (HOBBY)

Getting data from tables
Get our example family:

kv-> get table -name FAMILY -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"FIRSTNAME":"lena","LASTNAME":"clark","GENDER":"W"}

Get our family's children:

kv-> get table -name FAMILY.CHILDREN -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy"}

Get our family children's schools:

kv-> get table -name FAMILY.CHILDREN.SCHOOLS -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","STREET":"134, mice street","ID":"Saint appearance"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","STREET":"90, pine street","ID":"Saint volume"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","STREET":"168, merely street","ID":"Saint slipped"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","STREET":"227, thread street","ID":"Saint discovery"}

Get our family children's hobbies:

kv-> get table -name FAMILY.CHILDREN.HOBBIES -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","HOBBY":"photo"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","HOBBY":"soccer"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","HOBBY":"piano"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","HOBBY":"tennis"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","HOBBY":"football"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","HOBBY":"guitar"}

Running the example
Jar files needed:

MongoJavaDriver: mongo-java-driver-3.0.0.jar (this is the version we have used)
NoSQL client: kvclient.jar (it should be a version containing tables, 12.3+; it has been tested with 3.3.4)

Main Java class: mongoloadnosql.MongoDB2Nosql (the Java source code is here)

Parameters
The tool has 5 parameters:

-limit <integer>: number of documents to load, 0 is for all the documents
-skip <integer>: offset of the first document to load, similar to the skip function in MongoDB, 0 means no skipping of documents
-mapfile <file>: properties file to load
-create [true|<anything else than true>]: if true,
the NoSQL API functions for the creation of tables and indexes are issued
-insert [true|<anything else than true>]: if true, the NoSQL API functions for insertion are issued

Launch command
The following command creates the NoSQL tables and indexes if they do not exist, and inserts all the collection items using the properties file mappingfam.properties:

java -classpath <tool_dir>/classes:<KVHOME>/lib/kvclient.jar:<MONGODB_CLASSPATH>/mongo-java-driver-3.0.0.jar mongoloadnosql.MongoDB2Nosql -limit 0 -skip 0 -mapfile mappingfam.properties -create true -insert true

Caveats and warnings
Currently there is no way to map MongoDB references (neither referenced relationships nor DBRefs).
Fields should be presented in the order defined by their primary keys: (lastname,firstname) instead of (firstname,lastname).
The Java code attached is just to illustrate how to import/migrate MongoDB data into NoSQL tables in an efficient and consistent way; it has not been tested in all kinds of situations and it is not intended to be free of bugs.

Bonus
The following mapping file allows the mapping of MongoDB documents having the structure of a post. In this case there is an embedded object "origine", defined as {"owner" : "gus","site" : "recent_safety.com"}, which is not a collection. There is no primary key other than the MongoDB '_id' field. Enjoy trying this example as well.


Uploading NoSQL tables from Golden Gate User Exits

Golden Gate and NoSQL Integration
The aim of this post is to illustrate how to use Golden Gate to stream relational transactions to Oracle NoSQL 12.3.4. We follow the structure of the post which illustrated how to use Golden Gate to stream data into HBase. As shown in the diagram below, integrating the database with NoSQL is accomplished by developing a custom handler using Oracle GoldenGate's and NoSQL's Java APIs. The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions into the NoSQL table. You can find the Java code for the handler in folder StreamFromGG of this GitHub repository.

The steps to generate and test an example are:

Prepare the database to stream data from a table
Define a NoSQL table
Define the pump extract parameter file for the database
Define the extract parameter file and the adapter properties file for NoSQL
Register the extract process
Start the GoldenGate extract processes
Do some data manipulation on the database and verify the content of the NoSQL table

Let's take an example.

Prepare the database to stream data from a table
This part is not detailed; let's say that a database user is defined on Oracle 12c to allow Golden Gate transactional streaming. The database parameters are set to log transactional SQL commands in the appropriate way to satisfy Golden Gate requirements. We will focus on the database table T2 from the gha schema, whose definition is:

CREATE TABLE "GHA"."T2"
   ( "ID" NUMBER,
     "CREATED" TIMESTAMP (6),
     "NOM" VARCHAR2(32 BYTE),
     "VILLE" VARCHAR2(128 BYTE),
     CONSTRAINT "PK_T2" PRIMARY KEY ("ID", "CREATED") )

Define a NoSQL table
After connecting to the NoSQL store, the following commands create the table T2:

table create -name T2
# Add table fields
add-field -name ID -type STRING
add-field -name NOM -type STRING
add-field -name CREATED -type STRING
add-field -name VILLE -type STRING
# Assign fields as primary key and shard key
primary-key -field ID -field CREATED
shard-key -field ID
exit
# Add table to the database
plan add-table -wait -name T2

Define the pump extract parameter file for the database
The extract for the database first requires the use of the defgen utility to create what is called a data definition file, which contains the definition of the source table. The content of the extract parameter file is:

EXTRACT E_ghat2
TARGETDEFS ./dirsql/t2.sql
SETENV (ORACLE_SID=cdb)
userid c##ogg, password ogg
exttrail /u01/ogg/dirdat/T2
GETUPDATEBEFORES
table orcl.gha.t2 TARGET gha.t2;

The extract name is E_ghat2, the table definition file is t2.sql, the Oracle user for the transactional streaming is c##ogg, the trail files generated are prefixed with T2, and the container of the schema gha is orcl.

Define the extract parameter file and the adapter properties file for NoSQL
When using GoldenGate Java adapters, there are two files: one defines the extract parameters, the other gives the Java-specific properties for the adapter (the default name for this file is <extract_name>.properties; if a different name is used, it should be given in the extract parameters). Our extract name is nosqlt2.
Part of the content of nosqlt2.properties is:

jvm.bootoptions= -Xms64m -Xmx512M -Dlog4j.configuration=log4j.properties -Djava.class.path=dirprm:/u01/nosql/kv-ee/lib/jackson-core-asl.jar:/u01/nosql/kv-ee/lib/jackson-mapper-asl.jar:/u01/nosql/kv-ee/lib/avro.jar:/u01/ogg/ggjava/oggnosql.jar:/u01/nosql/kv-ee/lib/kvclient.jar:/u01/ogg/ggjava/ggjava.jar:/usr/lib/hadoop/client/commons-configuration-1.6.jar:/etc/hadoop/conf:/usr/lib/hadoop/client/commons-cli.jar
#Nosql Handler.
gg.handlerlist=nosqlhandler
gg.handler.nosqlhandler.type=com.goldengate.delivery.handler.nosql.NosqlHandler
gg.handler.nosqlhandler.NosqlStore=kvstore
gg.handler.nosqlhandler.NosqlUrl=bigdatalite:5000
gg.handler.nosqlhandler.NosqlTable=T2
gg.handler.nosqlhandler.NosqlCols=ID,CREATED,NOM,VILLE
gg.handler.nosqlhandler.NosqlPKCols=ID,CREATED
gg.handler.nosqlhandler.NosqlShardCols=ID
gg.handler.nosqlhandler.NosqlMappings=ID,ID;CREATED,CREATED;NOM,NOM;VILLE,VILLE

The meaning of these properties is:

jvm.bootoptions gives the classpath for the NoSQL Java classes, including JSON data handling and the jar for the NoSQL adapter
gg.handlerlist gives the list of handlers; in this case nosqlhandler will be used to identify the properties
gg.handler.nosqlhandler.type gives the class used as the adapter
gg.handler.nosqlhandler.NosqlStore gives the name of the NoSQL store to connect to
gg.handler.nosqlhandler.NosqlUrl gives the NoSQL store url (hostname:port)
gg.handler.nosqlhandler.NosqlTable gives the name of the table
gg.handler.nosqlhandler.NosqlCols gives a comma-separated list of the NoSQL table columns
gg.handler.nosqlhandler.NosqlPKCols gives a comma-separated list of the NoSQL table primary key columns
gg.handler.nosqlhandler.NosqlShardCols gives a comma-separated list of the NoSQL table shard columns (should be a non-empty subset of the primary key columns)
gg.handler.nosqlhandler.NosqlMappings gives a semicolon-separated list of mapping pairs (source column,target column)

The adapter implementation of NoSQL data manipulation (delete, update, create) uses the shard column values to batch operations into the NoSQL database. The execution of the batched operations is done only when the stored shard value changes.

Register the extract process
Use the ggsci utility to issue the following commands (replacing <OGG_HOME> with its real value):

add extract E_GHAT2, integrated tranlog, begin now
add exttrail <OGG_HOME>/dirdat/T2, extract E_GHAT2, megabytes 10
register extract E_GHAT2 database container (orcl)
add extract NOSQLT2, exttrailsource <OGG_HOME>/dirdat/T2

Start the GoldenGate extract processes
Use the ggsci utility to start e_ghat2 and nosqlt2, then verify that the processes are running:

GGSCI (bigdatalite.localdomain) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     E_GHAT2     00:00:02      00:01:07
EXTRACT     RUNNING     NOSQLT2     49:22:05      00:01:56

Data manipulation and verification between Oracle 12c and NoSQL

Get the table count on NoSQL:

kv-> aggregate table -name t2 -count
Row count: 4329

Delete data from t2 on Oracle 12c:

delete t2 where id = 135
commit
2 rows deleted

Recompute the table count on NoSQL:

kv-> aggregate table -name t2 -count
Row count: 4327

Note that the last batch of NoSQL operations is flushed when the extract nosqlt2 is stopped.
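As a hedged complement to the delete above (the values are illustrative, not from the original run), an insert on the Oracle side flows through the same path and can then be looked up on the NoSQL side by its shard column:

insert into gha.t2 (id, created, nom, ville)
  values (1001, systimestamp, 'dupont', 'paris');
commit;

-- on the NoSQL side, a partial-key lookup on the shard column ID
kv-> get table -name t2 -field ID -value "1001"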


Forrester Wave places NoSQL Database among the leaders

We are very pleased that Oracle NoSQL Database has been recognized as one of the leaders in the key-value NoSQL product category by Forrester Research. Please see http://www.oracle.com/us/corporate/analystreports/forrester-wave-nosql-2348063.pdf for the full report. In the past few years, we've witnessed growing adoption of NoSQL technologies to address specific data management problems. In many of the early adopter scenarios, the NoSQL applications were developed and managed as self-contained, standalone repositories of semi-structured data. In recent months, it has become clear that such data silos are very expensive to implement and maintain. Big data and NoSQL users now understand that well integrated NoSQL and SQL systems are the key to effective data management in today's world. An integrated set of products for managing NoSQL and relational data is critical for delivering business value in a cost effective manner. Oracle NoSQL Database is fully integrated with Oracle Database and related technologies, thus making it an excellent choice for enterprise-grade, mission-critical NoSQL applications. As mentioned in the Forrester Wave report, "Many Oracle customers use Oracle NoSQL to balance the need for scale-out workloads of simpler key-value data, with the rich set of relational data management capabilities needed in their core business systems, or when supporting new applications that have frequently changing key-value data, such as profiles for fraud, personalization, and sensor data management".


Using R to plot data from NoSql tables

Oracle NoSQL tables are relatively new. The communication between those tables and other Big Data systems and tools is under construction. There is a package (rkvstore) allowing access to NoSQL kv data from R, but this package does not support tables. This post presents a way to access R from Java and to work with table data from NoSQL using the R package Rserve, and shows how to generate a plot of this data.

Rserve
You need to install the Rserve package into R; it can be found here (under Download/Files). To launch the R server, just install the Rserve package:

R CMD INSTALL <path to the Rserve.tar.gz source package>

Run R:

> library("Rserve")
> Rserve()

The R server gets data from a Java program and returns the result.

NoSQL Tables
To create and load NoSQL tables, refer to this post.

Java Code
The main steps of the Java program are:

Connect to the kvstore
Get data from a table (via an iterator)
Create an R session
Transform data from the iterator to an R format
Assign data to R variables
Generate the R elements to make the plot
Display the data
Disconnect

You can find the Java source code for this blog entry here.

Usage
Run Java with the following class and arguments:

nosql.r.RViewTablesBlog kvstore kvhost:kvport tablename flightrangebegin flightrangeend

Use -1 for flightrangebegin and flightrangeend to ignore the flight range. Add kvclient.jar from NoSQL and REngine.jar, RServeEngine.jar from Rserve to the classpath.

Results
R returns a plot of the selected data. Enjoy!
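A minimal sketch of the "assign data to R variables" and plot-generation steps via Rserve's Java client follows (hedged: the array contents and file name are illustrative, and error handling is elided; the real program fills the arrays from the NoSQL table iterator):

import org.rosuda.REngine.Rserve.RConnection;

public class RPlotSketch {
    public static void main(String[] args) throws Exception {
        // Assumes Rserve() is already running on the default localhost:6311.
        RConnection rc = new RConnection();

        // In the real program these come from the NoSQL table iterator.
        double[] timeref = {0, 1, 2, 3, 4};
        double[] value = {10.5, 11.2, 9.8, 12.1, 10.9};

        rc.assign("timeref", timeref); // create R vectors from Java arrays
        rc.assign("value", value);

        // Build the plot on the R side and write it to a file.
        rc.eval("png('/tmp/flight.png'); plot(timeref, value, type='l'); dev.off()");
        rc.close();
    }
}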


Big Data SQL for Oracle NoSQL Database (part 1)

Introduction
We recently announced Oracle Big Data SQL support for Oracle NoSQL Database. This feature allows Oracle Database users to connect to external data repositories like Oracle NoSQL Database and/or Hadoop, and enables them to fetch data from any or all of the repositories (at once) through a single SQL query. We believe this is a significant step forward, as it streamlines the adoption of big data technologies alongside the traditional technology stack. If you are following the big data space, then you would agree that everyone likes the scalability, availability, performance, and cost aspects of it, but almost everyone struggles to modify their enterprise application to work with big data technologies, because they don't talk the same language as traditional database systems do. For innovation to happen fast, what is required is that we lower the adoption barrier for the developers and data scientists who have been at the core of this data revolution. And by providing the same SQL declarative language for relational and non-relational data, we have made it simpler for everyone to adopt these big data technologies.

SQL on Oracle NoSQL Database
Having an access layer using a declarative language is definitely a great thing, but a common trend these days is that every NoSQL vendor is writing a custom SQL-like layer. Now if you want to use your existing application to access relational data as well as non-relational data stored in the NoSQL database, you cannot write a single SQL query to access these two separate stores, which again makes it complicated from an application development and manageability point of view. Oracle Big Data SQL for Oracle NoSQL Database fills this gap by providing exactly the same SQL access layer to both stores, and therefore the queries look exactly the same no matter which store you access. To achieve this, Oracle Big Data SQL requires that Oracle NoSQL Database support the ability to access its data via Hive. Thus, plugging Oracle NoSQL Database into Oracle Database is as simple as performing three steps: 1 (creating a NoSQL store and a table), 2 (configuring Hive to access the NoSQL table) and 3 (configuring Oracle Database to talk to HCatalog). Please note that Big Data SQL provides much better query performance than running a query directly on generic external tables, because with the former approach the computation (for selecting and formatting data) is moved to where the data is, as opposed to moving all the data to the Oracle Database node first and then applying the predicates. The same proprietary cell technology (developed for Exadata) is used for applying the query predicates in parallel on all the distributed NoSQL partitions, making the overall query run faster. In the rest of the article I will give you the technical details on how to perform the first two steps, so you can get hands-on experience with this feature.

Figure 1: Flow diagram showing steps to access data in Oracle NoSQL Database using Hive queries.

Setup
Step 1: Create a TABLE in Oracle NoSQL Database and load some data that we would like to access from Hive later on. For this demonstration we will be using KVLITE, but you can very well deploy a real cluster by following the steps covered in this webinar. Please remember to set KVHOME as the environment variable that points to the directory where you unzipped the latest Oracle NoSQL Database binaries (download here) and KVROOT (e.g. /u01/kvroot) where you would like metadata files to be stored.
$ java -jar $KVHOME/lib/kvstore.jar kvlite -root $KVROOT -host localhost &

Once you have the Oracle NoSQL Database cluster deployed (in our case it is a single instance), connect to the administration shell using the runadmin command, and then from the KV prompt connect to the store (kvstore in my case):

$ java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host localhost
kv-> CONNECT STORE -name kvstore

Create the MOVIE table with a few columns (to capture movie-specific information):

kv-> EXEC "CREATE TABLE MOVIE( \
 id INTEGER, \
 original_title STRING,\
 overview STRING,\
 poster_path STRING,\
 release_date STRING,\
 vote_count INTEGER,\
 runtime INTEGER,\
 popularity DOUBLE,\
 genres STRING,\
 PRIMARY KEY (id) )"

SHOW TABLES to confirm it is created successfully, and DESCRIBE to make sure all the column types are appropriate as well:

kv-> EXEC 'SHOW TABLES'
kv-> EXEC 'DESC AS JSON TABLE MOVIE'

Next we will load some MOVIE data into the table. You can download a movie.dat file with details of about 10 movies represented as JSON strings. We will directly load these strings into the MOVIE table:

kv-> PUT TABLE -name MOVIE -file /tmp/movie.dat
Inserted 10 rows in movie table

You can now COUNT the rows and GET the rows to make sure the data is indeed there:

kv-> AGGREGATE TABLE -name MOVIE -count
Row count: 10
kv-> GET TABLE -name MOVIE

Step 2: Create a Hive external table using the Storage Handler for Oracle NoSQL Database (which shipped in release 3.2.5). Before we begin doing anything with Hive, we first need to install Hadoop on our system. I have already installed Apache hadoop-2.2.0, and there is a good article that you can follow to do it yourself. After Hadoop is installed and tested, download the Hive archive from here. I am using apache-hive-0.14.0 at the time of this article, but you can download any later version. Unzip the archive and create the environment variable HIVE_HOME to point to this directory. Here are some of the environment variables that are set in my ~/.bashrc file:

export HADOOP_HOME=/u01/home/hadoop/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export KVHOME=/u01/nosql/kvhome
export HIVE_HOME=/u01/home/hadoop/apache-hive-0.14.0
export HIVE_AUX_JARS_PATH=$KVHOME/lib/kvclient.jar
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME

NOTICE that we have HIVE_AUX_JARS_PATH set to kvclient.jar. We need to set this variable to make sure Hive can find the Storage Handler for Oracle NoSQL Database from the client driver itself. Now the only thing that is left is creating an external table to mimic the table that exists in Oracle NoSQL Database.
Let's connect to Hive first and then create the table using this DDL:

$ hive
hive> CREATE EXTERNAL TABLE IF NOT EXISTS MOVIE( id INT, original_title STRING, overview STRING, poster_path STRING, release_date STRING, vote_count INT, runtime INT, popularity DOUBLE, genres STRING) STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' TBLPROPERTIES ( "oracle.kv.kvstore"="kvstore", "oracle.kv.hosts"="localhost:5000", "oracle.kv.hadoop.hosts"="localhost", "oracle.kv.tableName"="MOVIE");
Time taken: 0.174 seconds

Note: STORED BY is where we define the StorageHandler class for Oracle NoSQL Database, and TBLPROPERTIES is where we define the name of the store, the host name and port where the storage node agent is listening (in our case we have only one node), and the name of the table in NoSQL DB. We have MOVIE as the table there, so that is what we use in the oracle.kv.tableName field. That is all one needs to do to configure Hive to work with Oracle NoSQL Database. Pretty simple, right?

Step 2b: Now that we have the NoSQL DB and Hive setup done, we just need to test running Hive queries. Here are some simple queries to fetch MOVIE data using the Hive Query Language (HQL):

hive> SELECT id, original_title, release_date, genres FROM MOVIE;
hive> SELECT id, original_title, release_date, genres FROM MOVIE WHERE release_date > 1990;
hive> SELECT id, original_title, release_date, genres FROM MOVIE WHERE genres LIKE '%Comedy%';

Some other queries that require M/R jobs to be spun off to get the results:

hive> SELECT count(*) FROM MOVIE WHERE genres LIKE '%Drama%';
hive> SELECT id, original_title, release_date, genres FROM MOVIE WHERE genres LIKE '%Drama%' ORDER BY release_date;
hive> SELECT MAX(release_date) FROM MOVIE;

Summary
Using the Hive StorageHandler from the latest release, one can access NoSQL data as Hive external tables. The metadata of these external tables is stored in the HCatalog, which is later referenced by Oracle Big Data SQL to decide where to dispatch the queries in parallel. The process of configuring Oracle NoSQL Database has become so simple (using the out-of-the-box Hive StorageHandler) that it takes only a couple of steps to complete. In part 2 of this blog I will talk about how the Hive metastore can be configured with Oracle Database to finally run SQL queries on NoSQL data.


Using Nosql Tables with Spark

The goal of this post is to explain how to use NoSQL tables and how to put their content into a file on HDFS using the Java API for Spark. In HDFS, the table content will be presented in a comma separated style (CSV). Oracle's latest Big Data Appliance, "X4-2", offers Cloudera Enterprise Technology software including Cloudera CDH, and Oracle NoSQL Database including tables. The Cloudera part offers several ways of integrating with Spark (see Using Nosql and Spark): standalone or via Yarn (see Running Spark Applications). The NoSQL part allows the use of tables. Tables can be defined within the NoSQL console by issuing the following command:

java -Xmx256m -Xms256m -jar $KVHOME/lib/kvstore.jar runadmin -host <host> -port <store port> -store <store name>

There are two parts to defining and creating a table:

Define: includes the table name, table fields, primary key, and shard key (a "prefix" of the primary key); ends with the keyword "exit":

table create -name flightTestExtract
add-field -name param -type STRING
add-field -name flight -type STRING
add-field -name timeref -type LONG
add-field -name value -type INTEGER
primary-key -field timeref -field param -field flight
shard-key -field timeref
exit

Plan: allows table creation plus index definition and creation:

plan add-table -wait -name flightTestExtract
plan add-index -wait -name flightIndex -table flightTestExtract -field flight -field param -field timeref
plan add-index -wait -name paramIndex -table flightTestExtract -field param -field flight -field timeref

Inserting into the table can be done with the put command, as in:

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":1764248535}"
put table -name flightTestExtract -json "{\"param\":\"12\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":-1936513330}"
put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"6\",\"timeref\":61000000000013,\"value\":1600130521}"
put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000013,\"value\":478674806}"

The latest patch of NoSQL, 3.1.7, has some new Java classes that can be used to get table data into Hadoop. The class oracle.kv.hadoop.table.TableInputFormat can be used to build a Spark JavaRDD:

JavaPairRDD<PrimaryKey, Row> jrdd = sc.newAPIHadoopRDD(hconf, TableInputFormat.class, PrimaryKey.class, Row.class);

The oracle.kv.table.PrimaryKey class corresponds to the fields of the primary key of the table, for example in JSON style:

{"timeref":61000000000013, "param":"11","flight":"8"}

The oracle.kv.table.Row class corresponds to the fields of the table row, for example in JSON style:

{"param":"11","flight":"8","timeref":61000000000013,"value":478674806}

If we want to save the content of the table on HDFS in a CSV style we have to:

apply a flatMap on the rows of the RDD; flatMap(func): each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item)
save the result on HDFS

The following inner class defines the map:

    static class FlatMapRow_Str implements FlatMapFunction<Row, String> {
        @Override
        public Iterable<String> call(Row s) {
            /* Concatenate all field values of the row, comma separated. */
            List<String> lstr = s.getFields();
            String tabedValues = "";
            for (String field : lstr)
                tabedValues += s.get(field) + ",";
            return Arrays.asList(tabedValues);
        }
    }

The code to do the job is:

// Obtain the Row RDD
JavaRDD<Row> rddvalues = jrdd.values();
// Obtain the CSV style form of the RDD
JavaRDD<String> csvStr = rddvalues.flatMap(new FlatMapRow_Str());
// Save the results on HDFS
csvStr.saveAsTextFile(pathPrefix + "/" + tableName + "csvStr");

The last step is to test using Yarn:

spark-submit --master yarn --jars /u01/nosql/kv-ee/lib/kvclient.jar --class table.SparkNosqlTable2HadoopBlog /u01/nosql/kv-ee/examples/table/deploy/sparktables.jar <nosql store name> <nosql store url> <table name> <path prefix>

<nosql store url> is <store host>:<store port>. You can get the Java source code here.
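For completeness, a minimal sketch of the Hadoop configuration assumed by the jrdd creation above (the property names mirror those used by the Hive storage handler elsewhere on this blog; treat the exact keys and values as illustrative):

import org.apache.hadoop.conf.Configuration;

// Hadoop configuration handed to sc.newAPIHadoopRDD(...) above.
Configuration hconf = new Configuration();
hconf.set("oracle.kv.kvstore", "kvstore");                    // store name
hconf.set("oracle.kv.hosts", "bigdatalite.localdomain:5000"); // helper host:port
hconf.set("oracle.kv.tableName", "flightTestExtract");        // table to scan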


Is your NoSQL ready for SSD - benchmark results from SanDisk

It’s well known that the killer NoSQL use case is scalability for large data sets under high concurrency. However, what is less talked about is the fact that performance is often the key driver. Given a high performance profile, NoSQL then provides the ability to scale that performance out linearly by adding more servers. If not for the high performance profile to begin with, oftentimes an alternative technology like a distributed cache would be used instead, fronting a more traditional database technology. Given that performance is often a key driver, it’s no surprise to find that flash-based storage devices are increasingly common for NoSQL deployments. In fact, popular cloud providers of NoSQL services (Amazon and Google) provide those services exclusively on SSD-backed infrastructure. The SSD form factor gives not only the best performance, but also more manageable service level agreements, by taking disk I/O bottlenecks out of the equation. At Oracle, we are also seeing customers deploy the Oracle NoSQL Database on servers backed by SSD storage. Unlike Oracle NoSQL Database, not all NoSQL databases are capable of taking advantage of SSDs. Probably the biggest offenders are those aligned with the Hadoop stack, based on HDFS, which uses a large-block scanning technique that is ill suited to SSD optimization. At Oracle, we’ve seen customers benchmark against technologies like HBase only to find that with large clusters, the cost of deploying on non-SSD backed storage is quite simply prohibitive. In one customer benchmark for a mobile profiling application, we saw a customer predicting the need for dozens of HBase servers to accommodate the same workloads that could be achieved on a standard high availability configuration (3 servers) for Oracle NoSQL when backed by SSD. Forget about the cost of software, folks (Oracle NoSQL is practically free, providing annual subscription based support); imagine having to buy 50 servers to stand up your NoSQL cluster just to run what many see as "the Hadoop" software ecosystem. It seems to me people ought to be seriously re-thinking their Big Data software choice when it comes to Big Data solutions that require both low-latency NoSQL and batch-based Hadoop MapReduce. SanDisk has recently benchmarked Oracle NoSQL Database using the standard Yahoo! Cloud Serving Benchmark (YCSB) on its latest generation SanDisk CloudSpeed SATA SSDs. This makes for an interesting read and highlights the advantages of using a NoSQL database that is designed to work with this latest generation of storage technology. http://itblog.sandisk.com/hyperscale-databases-testing-oracle-nosql-using-sandisk-cloudspeed-ssds-vs-hdds/


Using Nosql and Spark (How to Start)

Spark is an open-source data analytics cluster computing framework, built outside of Hadoop's two-stage MapReduce paradigm but running on top of HDFS. Because of its successful approach, Spark has quickly been adopted and is seen as an attractive choice for the future of data processing in Hadoop. The question of how to link NoSQL and Spark often concerns Big Data architects and developers. Let's take a quick look at this question. Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. RDDs can be created by referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat. Good news: NoSQL can be integrated with Spark-Java using KVInputFormat and KVAvroInputFormat, two NoSQL Java classes which extend the Hadoop abstract class InputFormat.

How to proceed:

Get Spark resources
Define the configuration parameters for NoSQL (connection, store name, key prefix)
Define the Spark resilient distributed dataset to get data from NoSQL
Dispatch the NoSQL key-value subset into a dataset for keys and a dataset for values
Do some computations
Release Spark resources

Get Spark resources

SparkConf sparkConf = new SparkConf().setAppName("SparkKVInput"); // AppName allows
                                                                  // the tracking of the job
JavaSparkContext sc = new JavaSparkContext(sparkConf);

Define the configuration parameters

hconf.set("oracle.kv.kvstore", "kvstore");
hconf.set("oracle.kv.parentKey", "/H/10"); // just a major key prefix
hconf.set("oracle.kv.hosts", "bigdatalite.localdomain:5000");

Define the Spark resilient distributed dataset to get data from NoSQL

JavaPairRDD<Text,Text> jrdd = sc.newAPIHadoopRDD(hconf, KVInputFormat.class, Text.class, Text.class);

The dataset parameters are the configuration, the InputFormat extension, the Java class for keys and the Java class for values. The class of keys and values is stated in the javadoc of KVInputFormat:

public class KVInputFormat extends KVInputFormatBase<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

Dispatch the NoSQL key-value subset into a dataset for keys and a dataset for values

Setting the datasets for keys and values is easy:

JavaRDD<Text> rddkeys = jrdd.keys();
JavaRDD<Text> rddvalues = jrdd.values();

Their direct manipulation is not possible: Spark does not know how to serialize Text. A mapping is needed to transform the Text dataset into a String dataset. The following code does the trick:

JavaRDD<String> strvalues = rddvalues.map(new Function<Text, String>() {
    public String call(Text t) {
        return t.toString();
    }
});

The code for the keys is very similar.

Do some computations

Print keys and values:

List<String> keys = strkeys.collect();
List<String> values = strvalues.collect();
for (int idx = 0; idx < values.size(); idx++) {
    String val = values.get(idx);
    String key = keys.get(idx);
    System.out.println(key.toString() + ":" + val.toString());
}

Release Spark resources

sc.stop(); // release the JavaSparkContext

How to test it

Put some data on the kvstore:

put kv -key /T/1/1/-/1 -value V11_1
put kv -key /T/1/1/-/2 -value V11_2
put kv -key /T/1/1/-/3 -value V11_3
put kv -key /T/1/2/-/3 -value V12_3
put kv -key /T/2/2/-/1 -value V22_1
put kv -key /T/2/2/-/2 -value V22_2

Generate a jar containing kvclient.jar, avro.jar and the class with the Spark code, SparkKVInput. This class has three parameters, which are the configuration file parameters: oracle.kv.kvstore, oracle.kv.hosts and oracle.kv.parentKey.
Generate a jar (spark.jar) containing the NoSQL kvclient.jar and kvavro.jar. An example of the calling command on a BigDataLite 4.1 is:

spark-submit --class nosql.examples.oracle.com.SparkKVInput <path to spark.jar location>/spark.jar kvstore bigdatalite.localdomain:5000 /T/1

Results are:

14/10/21 11:26:54 INFO SparkContext: Job finished: collect at SparkKVInput.java:62, took 0.341590587 s
/T/1/1/-/1:V11_1
/T/1/1/-/2:V11_2
/T/1/1/-/3:V11_3
/T/1/2/-/3:V12_3
14/10/21 11:26:54 INFO SparkUI: Stopped Spark web UI at http://bigdatalite.localdomain:4040

Hopefully this first introduction to the use of NoSQL key/values with Spark will help you go deeper with Spark as an engine to manipulate NoSQL data.


Loading into Nosql using Hive

The main purpose of this post is to show how tightly we can tie NoSQL and Hive; the focus will be the upload of data into NoSQL from Hive. A previous post (here) discussed the use of Hive external tables to select data from Oracle NoSQL. We used a HiveStorageHandler implementation. We have reworked this implementation to load data from HDFS or a local file system via Hive into NoSQL. Only uploading of text data is currently supported. Two kinds of data files can be uploaded:

Case 1: Files containing plain text data like the following comma separated lines:

10,5,001,545973390
10,5,010,1424802007
10,5,011,164988888

Case 2: Files containing a JSON field corresponding to a given AVRO schema, like the following tab separated lines:

10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}
10 5 174 {"samples": [{"delay": 0, "value": -254460852}, {"delay": 1, "value": -478216539}, {"delay": 2, "value": -1735664690}, {"delay": 3, "value": -1997506933}, {"delay": 4, "value": -1062624313}]}

How to do it?

1. Define the external table
2. Create and load a native Hive table
3. Insert into the external table a selection from the native Hive table

Case 1:
1. Define the external table

CREATE EXTERNAL TABLE MY_KV_PI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)
      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "PI/10/5", "kv.value.type" = "string", "kv.key.range" = "", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore", "kv.key.ismajor" = "true");

2. Create and load a native Hive table

CREATE TABLE kv_pi_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/oracle/hivepath/pi_10_5' OVERWRITE INTO TABLE kv_pi_10_5_load;

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_pi_10_5_table SELECT * FROM kv_pi_10_5_load;

The external table definition specifies a major key and its complete key components; this definition is used when inserting. The flight and sensor values of the data are ignored; the timeref elements are loaded using the NoSQL operation API, which batches the insertions.

Case 2:
1. Define the external table

CREATE EXTERNAL TABLE MY_KV_RI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)
      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "RI/10/5", "kv.value.type" = "avro", "kv.key.range" = "", "kv.key.ismajor" = "true", "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore");

When creating the external table used for upload into NoSQL, a new parameter is used: "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet". It is the NoSQL name for an Avro schema; talking about Avro schema definitions, it is the schema namespace "." schema name.

2.
Create and load a native Hive table

CREATE TABLE kv_ri_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/oracle/hivepath/ri_10_5' OVERWRITE INTO TABLE kv_ri_10_5_load;

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_ri_10_5_table SELECT * FROM kv_ri_10_5_load;

How to verify the upload? Two possibilities:

a select query on Hive
a get on the kvstore

Let's do it on the NoSQL client command line.

Case 1: Verify a random line's existence

kv-> get kv -key /PI/10/5/-/010 -all
/PI/10/5/-/010
1424802007
1 Record returned

Case 2: Verify a random line's existence

kv-> get kv -key /RI/10/5/-/173 -all
/RI/10/5/-/173
{
  "samples" : [ {
    "delay" : 0,
    "value" : -914351026
  }, {
    "delay" : 1,
    "value" : 1842307749
  }, {
    "delay" : 2,
    "value" : -723989379
  }, {
    "delay" : 3,
    "value" : -1665788954
  }, {
    "delay" : 4,
    "value" : 91277214
  }, {
    "delay" : 5,
    "value" : 1569414562
  }, {
    "delay" : 6,
    "value" : -877947100
  }, {
    "delay" : 7,
    "value" : 498879656
  }, {
    "delay" : 8,
    "value" : -1245756571
  }, {
    "delay" : 9,
    "value" : 812356097
  } ]
}
1 Record returned

Let's do it on the Hive command line.

Case 1: Verify a random line's existence

hive> select * from MY_KV_PI_10_5_TABLE where timeref = "010";
OK
10 5 010 1424802007

Case 2: Verify a random line's existence

hive> select * from MY_KV_RI_10_5_TABLE where timeref = "173";
...
OK
10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}

You can get a JDeveloper 12c project here. We have made a round trip between NoSQL and Hive:

Key-value subsets of a NoSQL database can be viewed using the select query language of Hive
Data from Hive tables can be uploaded into NoSQL key-value pairs


Revisiting NoSQL into Hive External Tables

About two years ago, in the Data Integration blog, David Allan wrote about the integration between Oracle NoSQL and Hive; in that post he mentioned the work of Alexander Vilcek to implement a HiveStorageHandler giving access to NoSQL data through Hive SQL select commands. Unfortunately, that Java code no longer compiles against the NoSQL Java API (3.0+), so I have modified the source code to comply with the 3.0 KV API. The new Hive storage handler also offers new definition possibilities.

The current limitations are the same as those stated for the original Vilcek project code:

Supports only external, non-native Hive tables.

Writing data to Oracle NoSQL DB is not supported.

Parsing of Hive SerDe properties is still very rudimentary; spaces between NoSQL DB key definitions in the key-mapping properties of the Hive CREATE TABLE statement will cause key names to be misinterpreted.

Column names and types specified in the Hive table definition are ignored; only the NoSQL DB major and minor key mappings in the Hive CREATE TABLE statement define the column names.

A NoSQL DB value for a given key is always interpreted as a string in the Hive table.

Hive CREATE TABLE syntax:

CREATE EXTERNAL TABLE <hive_table_name> (column_name column_type, column_name column_type, ...)
STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "<majorKey1,majorKey2,...>", "kv.minor.keys.mapping" = "<minorKey1,minorKey2,...>", "kv.minor.metadata" = "true|false", "kv.key.prefix" = "H", "kv.value.type" = "<avro|other>", "kv.key.range" = "<key restriction range>", "kv.key.ismajor" = "true|false", "kv.host.port" = "<kvstore hostname>:<kvstore port number>", "kv.name" = "<kvstore name>");

Example 1: data stored in Oracle NoSQL Database:

/Smith/Bob/-/birthdate: 05/02/1975
/Smith/Bob/-/phonenumber: 1111-1111

For this kind of data, the minor keys birthdate and phonenumber are metadata (field names), and the value of the key-value pair is the actual value for that field.

Example 2:

/USAir1/gauge4/-/10-11-2012:12:12:56:77789:45678
/USAir1/gauge4/-/10-11-2012:12:12:56:77790:45640

For this kind of data, the minor key is a timestamp for a measure (45678, 45640) of a gauge of a plane. There is no metadata in the minor key.

The new table definitions take the following parameters:

"kv.value.type": when the value type is avro, the value is transformed into a string by using the JSON representation of the Avro value.

"kv.minor.metadata": its true or false value distinguishes the two uses of key-value pairs, true for Example 1 and false for Example 2.

"kv.key.prefix": defines a starting path for a major key.

"kv.key.ismajor": true if the key prefix is a complete major key path, false otherwise.

"kv.key.range": used when the key prefix does not correspond to a complete major key, or when there is no key prefix.

In a real big data store you should not leave both "kv.key.prefix" and "kv.key.range" empty, since the query would then have to scan the entire store.
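To make the two mapping styles concrete, here is a small sketch built on the oracle.kv Key API showing the keys the handler addresses in each example (the class name is illustrative; the H prefix comes from "kv.key.prefix"):

import java.util.Arrays;

import oracle.kv.Key;

public class KeyMappingSketch {
    public static void main(String[] args) {
        // Example 1, metadata minor keys: H is only part of the major path,
        // hence "kv.key.ismajor" = "false" and a key range may be needed.
        Key bobBirthdate = Key.createKey(
                Arrays.asList("H", "Smith", "Bob"),   // major path
                Arrays.asList("birthdate"));          // minor path = field name
        System.out.println(bobBirthdate);             // /H/Smith/Bob/-/birthdate

        // Example 2, non-metadata minor keys: the minor key carries the
        // timestamp of the measure, not a field name.
        Key gaugeSample = Key.createKey(
                Arrays.asList("USAir1", "gauge4"),
                Arrays.asList("10-11-2012:12:12:56:77789:45678"));
        System.out.println(gaugeSample);
    }
}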
Detail of metadata Example 1:

The following put operations have been done on our store using the kvstore client commands:

put kv -key /H/Smith/Bob/-/birthdate -value 05/02/1975
put kv -key /H/Smith/Bob/-/phonenumber -value 1111-1111
put kv -key /H/Smith/Bob/-/userid -value 1
put kv -key /H/Smith/Patricia/-/birthdate -value 10/25/1967
put kv -key /H/Smith/Patricia/-/phonenumber -value 2222-2222
put kv -key /H/Smith/Patricia/-/userid -value 2
put kv -key /H/Wong/Bill/-/birthdate -value 03/10/1982
put kv -key /H/Wong/Bill/-/phonenumber -value 3333-3333
put kv -key /H/Wong/Bill/-/userid -value 3

A table creation covering all of the above insertions is:

CREATE EXTERNAL TABLE MY_KV_TABLE (lastname string, firstname string, birthdate string, phonenumber string, userid string)
STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "lastname,firstname", "kv.minor.keys.mapping" = "birthdate,phonenumber,userid", "kv.minor.metadata" = "true", "kv.key.prefix" = "H", "kv.value.type" = "string", "kv.key.range" = "", "kv.key.ismajor" = "false", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore");

A select * from my_kv_table returns 9 values.

The CREATE and SELECT commands require that the jar containing nosql.example.oracle.com.NosqlStorageHandler be part of the Hive lib jars; let's call this jar kvhive.jar. Any select applied to the table (other than select *) invokes map/reduce procedures and needs Hive add jar commands targeting kvclient.jar and kvhive.jar. For example:

hive> SELECT lastname, firstname, collect_set(birthdate)[0], collect_set(phonenumber)[0], collect_set(userid)[0] FROM my_kv_table GROUP BY lastname, firstname;

produces:

OK
Smith Bob 05/02/1975 1111-1111 1
Smith Patricia 10/25/1967 2222-2222 2
Wong Bill 03/10/1982 3333-3333 3

Detail of non-metadata Example 2:

We have defined an Avro schema to record the list of min and max values of a given measure for a range of timestamps. The Hive table declaration in this case is:

CREATE EXTERNAL TABLE MY_KV_A_TABLE (flight string, sensor string, timeref string, stuff string)
STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "A/8/11", "kv.value.type" = "avro", "kv.key.range" = "", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore", "kv.key.ismajor" = "true");

In this case the value type is avro, and the key prefix A/8/11 reads as A for the analytic data prefix, 8 for plane number 8, and 11 for gauge measure number 11.

A select * from my_kv_a_table returns records of the type:

8 11 W2412 {"values": [{"min": -2.141749606E9, "max": 2.14331348E9}, {"min": -2.118390555E9, "max": 2.090954548E9}, {"min": -2.130274733E9, "max": 2.098153258E9}, {"min": -2.082685648E9, "max": 2.023497823E9}]}

The JSON in the last column is the JSON representation of the Avro value stored in the NoSQL database.

The HiveStorageHandler JDeveloper code can be downloaded here.
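The avro-to-string transformation the handler performs can be pictured with the oracle.kv.avro JSON binding. This is a minimal sketch rather than the handler's actual code, and the schema name com.airbus.zihb.avro.MinMaxSet is an assumption, since the post does not name the min/max schema:

import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.ValueVersion;
import oracle.kv.avro.AvroCatalog;
import oracle.kv.avro.JsonAvroBinding;
import oracle.kv.avro.JsonRecord;

public class AvroValueAsJsonSketch {
    public static void main(String[] args) throws Exception {
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "bigdatalite:5000"));

        // One analytic record under the A/8/11 prefix, minor key W2412.
        Key key = Key.createKey(Arrays.asList("A", "8", "11"),
                                Arrays.asList("W2412"));
        ValueVersion vv = store.get(key);
        if (vv != null) {
            AvroCatalog catalog = store.getAvroCatalog();
            // Assumed schema name; look it up among the registered schemas.
            JsonAvroBinding binding = catalog.getJsonBinding(
                    catalog.getCurrentSchemas()
                           .get("com.airbus.zihb.avro.MinMaxSet"));
            JsonRecord record = binding.toObject(vv.getValue());
            // toString() of the JSON node is the string Hive displays.
            System.out.println(record.getJsonNode().toString());
        }
        store.close();
    }
}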


Using multiDelete for efficient cleanup of old data

In a recent project, one of our field engineers (Gustavo Arango) was confronted with a problem: he needed to efficiently delete millions of keys beneath a key space where he did not know the complete major key path, which ended in a timestamp. He quickly discovered a way to efficiently find these major key paths and then use them to perform high-speed multi-value deletions without causing unusual chattiness (a call for each key-value pair) on the network. I thought it would be useful to review his solution here and give you a link to a GitHub example of working code to play with and understand the behavior.

This is possible by using Oracle NoSQL's storeKeysIterator() method on your connection to the cluster. This iterator can be used to obtain a list of store keys in a specific range, in a single call and without loading the value data beneath the keys.

First, you need a partial key path (note that each path component is a separate string; a single string containing slashes would be treated as one component):

Key key = Key.createKey(Arrays.asList("root", "pieceOfKey"));

Next, you need a range:

KeyRange kr = new KeyRange("aastart", true, "zzfinish", true);

Now you can get your key iterator (getKVStore() returns a connection handle to the cluster, an instance of KVStore):

Iterator<Key> iter = getKVStore().storeKeysIterator(Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);

This is nice because storeKeysIterator queries the cluster and returns ONLY the keys that start with that partial path, optionally restricted by the range specifier; there is no need to give a range if you want all the keys.

To efficiently delete many keys with a single call to the cluster, you need a full major key path. Now that you have the whole set of keys, you can ask each one for its full major key path and then use that information to perform the efficient multiDelete operation, which in Gustavo's case of storing and managing dense time-series data meant millions of keys/values being deleted with only a small number of actual network calls and very little data transferred across the network:

long res = 0;
while (iter.hasNext()) {
    // Reduce the returned key to its major path, then delete everything
    // beneath that major path in a single network call.
    Key iterkey = Key.createKey(iter.next().getMajorPath());
    // durability is a previously chosen oracle.kv.Durability setting.
    res += getKVStore().multiDelete(iterkey, kr, Depth.PARENT_AND_DESCENDANTS,
                                    durability, 0, null);
}

If you want to get tricky and do this in parallel, you can wrap this in a worker thread and place the initialization of the iterator inside the while loop, effectively treating the iterator like a queue. This causes a new query to happen every time the loop iterates, but in the meantime a lot of other threads may have deleted half of that queue. In fact, there are also parallel versions of the store iterator that query the entire cluster in parallel, using the number of threads that works best for your hardware configuration and the number of shards in your cluster. You can learn more about them in the online documentation.

If you are interested in playing with the ideas in this blog, there is a GitHub example project in Eclipse that will get you started.
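For completeness, here is a sketch of what the parallel variant might look like with the parallel scan flavor of the store iterator available in the 3.x client (the key path and concurrency level are illustrative, and this is my sketch rather than code from the GitHub project):

import java.util.Arrays;

import oracle.kv.Consistency;
import oracle.kv.Depth;
import oracle.kv.Direction;
import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.ParallelScanIterator;
import oracle.kv.StoreIteratorConfig;

public class ParallelCleanupSketch {
    // Deletes everything beneath /root/pieceOfKey using a parallel key scan.
    public static long cleanup(KVStore store) {
        StoreIteratorConfig cfg = new StoreIteratorConfig();
        cfg.setMaxConcurrentRequests(8); // tune to your shards and hardware

        ParallelScanIterator<Key> iter = store.storeKeysIterator(
                Direction.UNORDERED, 0,
                Key.createKey(Arrays.asList("root", "pieceOfKey")),
                null /* no KeyRange: take the whole subtree */,
                Depth.DESCENDANTS_ONLY,
                Consistency.NONE_REQUIRED, 0, null, cfg);

        long deleted = 0;
        try {
            while (iter.hasNext()) {
                // Reduce each key to its major path; repeated major paths just
                // yield extra multiDelete calls that delete nothing.
                Key majorOnly = Key.createKey(iter.next().getMajorPath());
                deleted += store.multiDelete(majorOnly, null,
                                             Depth.PARENT_AND_DESCENDANTS);
            }
        } finally {
            iter.close();
        }
        return deleted;
    }
}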
