
The Oracle NoSQL Database Blog covers all things Oracle NoSQL Database. On-Prem, Cloud and more.

Recent Posts

JSON Partial Update using the Oracle NoSQL Database Query Language

Background

A general direction that we see in NoSQL and relational databases is the adoption of the document model. While each document-oriented database implementation differs on the details of this definition, in general they all assume documents encapsulate and encode data (or information) in some standard format or encoding such as XML, YAML, JSON, or BSON. JavaScript Object Notation (JSON) has become the accepted medium of interchange between numerous Internet services and for unstructured data. The JSON-style document model enables customers to build services that are schema-less. Sure, you could always store JSON as an opaque blob in the database, but that meant you could not easily work with the JSON data at the database level. Starting with Oracle NoSQL Database 4.2.10 we added support for a native JSON data type, as defined by RFC 7159, that enables efficient access to data in JSON documents.

The JSON data type provides the following benefits over storing JSON-format strings in a string column:

Automatic JSON document validation: Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data. Invalid documents produce an error during the "put" or insert operation in Oracle NoSQL Database.

Efficient access: JSON documents stored in JSON columns are converted to an internal, optimized binary format that allows quick read access to document elements. Say you want to find a particular name/value pair in a JSON document. If you stored JSON documents as blobs, your best option would be to read the JSON data into memory and run it through a JSON parser, which means you have to scan and convert the JSON document at the application level; needless to say, that is not performant. Conversely, with the native JSON data type, the database parses and stores JSON internally in a native format, and querying JSON data is no different from querying data in other columns.

Performance: Improve your query performance by creating indexes on specific fields within the JSON column value.

We continue to enhance the Oracle NoSQL Database support for JSON data types. Recent enhancements include:

Indexes into JSON columns. An index is a JSON index if it indexes at least one field that is contained inside JSON data. We support both simple-type JSON indexes and multi-key JSON indexes (which index arrays or maps). This gives developers access to nested attributes embedded deep within a JSON document.

Queries (SQL for NoSQL) involving JSON data types.

Updating JSON documents using the SQL UPDATE statement. An UPDATE statement can be used to update a row in a table that contains a JSON column. This also allows users to perform partial updates, that is, to update a subset of the fields in the JSON column.

While each of the above points deserves a separate blog post, in this post we are going to focus on the last one: updating a JSON document using the SQL UPDATE statement.

Updating JSON Documents

While some systems don't recommend manipulating JSON documents, the SQL UPDATE statement in Oracle NoSQL Database makes it possible for developers to modify or partially update (a subset of fields in) JSON documents. Let's take a real-world example, where you want to model some user profile data (personal data associated with specific users) in Oracle NoSQL Database.
User profiles make an interesting case for the JSON data type, because user profile data can have varying sets of attributes: users have different interests and expense patterns, some users have more than one work address, some users have more than one type of phone (work, home, etc.), some like to specify {firstInitial}{lastname} and others want {firstname}.{lastname}, and so on. Thus, user profile data doesn't really conform to the rigid schema (schema, data types, etc.) of a traditional system, and there is a need for a modeling option that reduces the ceremony (you don't have to define schemas) and increases flexibility (store all sorts of data without prior definition). Enter native support for JSON in Oracle NoSQL Database, which provides scalability and flexibility together and makes developers more productive.

Let's create a simple table called JSONPersons with a column person of type JSON that holds the user profile:

CREATE TABLE IF NOT EXISTS JSONPersons (
  id integer,
  person JSON,
  primary key (id)
);

And here is sample data that we want to insert (notice how each user has a different set of fields):

{
  "id":1,
  "person" : {
    "firstname":"David",
    "lastname":"Morrison",
    "age":25,
    "income":100000,
    "lastLogin" : "2016-10-29T18:43:59.8319",
    "address":{"street":"150 Route 2", "city":"Antioch", "state":"TN", "zipcode" : 37013,
               "phones":[{"type":"home", "areacode":423, "number":8634379}] },
    "connections":[2, 3],
    "expenses":{"food":1000, "gas":180}
  }
}
{
  "id":2,
  "person": {
    "firstname":"Peter",
    "lastname":"Smith",
    "age":38,
    "income":80000,
    "lastLogin" : "2016-10-19T09:18:05.5555",
    "address":{"street":"364 Mulberry Street", "city":"Leominster", "state":"MA",
               "phones":[{"type":"work", "areacode":339, "number":4120211},
                         {"type":"work", "areacode":339, "number":8694021},
                         {"type":"home", "areacode":339, "number":1205678},
                         null,
                         {"type":"home", "areacode":305, "number":8064321} ] },
    "connections":[3, 5, 1, 2],
    "expenses":{"food":6000, "books":240, "clothes":2000, "shoes":1200}
  }
}
{
  "id":3,
  "person" : {
    "mynumber":5,
    "myarray":[1,2,3,4]
  }
}

You can store the above JSON records in a JSON file and use the load utility to insert the records into Oracle NoSQL Database. Alternatively, you can use the API to "put" the JSON records in the table. After successfully inserting the data, you may need to modify the user records (change field values, add or remove elements from a user profile). It is reasonable to say that the modification should satisfy the following requirements:

It should take place at the server, eliminating the read-modify-write cycle, that is, the need to fetch the whole user row at the client, compute new values for the targeted fields (potentially based on their current values), and then send the entire row back to the server.

It should be thread safe and not lead to race conditions (if updates are happening concurrently from different sessions).

It should be atomic: other threads should see the most up-to-date values.

Enter the SQL UPDATE statement, which can be used to update the rows. UPDATE statements make server-side updates directly, without requiring a read/modify/write update cycle. Both syntactically and semantically, the update statement of Oracle NoSQL Database is similar to the update statement of standard SQL, but with extensions to handle the richer data model of Oracle NoSQL Database. Let's take a look at how this works:

Case 1: Changing field values. This is the simplest form of update. Let's say you want to modify the user with id 3 and set the field mynumber to 300.
We can use the SET clause as shown below:

sql-> UPDATE JSONPersons j
      SET j.person.mynumber = 300
      WHERE j.id = 3
      RETURNING id, j.person.mynumber AS MyNumber;

 +----+----------+
 | id | MyNumber |
 +----+----------+
 |  3 |      300 |
 +----+----------+

1 row returned

Case 2: Add elements to an array. Let's say we want to add more elements (50 and 100) to the array field myarray for the user with id = 3, inserting 50 at position 0 and appending 100 at the end. Let's use the ADD clause for that:

sql-> UPDATE JSONPersons j
      ADD j.person.myarray 0 50,
      ADD j.person.myarray 100
      WHERE j.id = 3
      RETURNING *;

 +----+--------------------+
 | id | person             |
 +----+--------------------+
 |  3 | myarray            |
 |    |     50             |
 |    |     1              |
 |    |     2              |
 |    |     3              |
 |    |     4              |
 |    |     100            |
 |    | mynumber | 300     |
 +----+--------------------+

1 row returned

Notice that multiple ADD clauses are used in the query above.

Case 3: Remove elements from an array. Let's say we want to remove the elements (myarray and mynumber) from the user with id = 3; clearly, this user looks different from the other users. We can do that using a single UPDATE statement:

sql-> UPDATE JSONPersons j
      REMOVE j.person.myarray,
      REMOVE j.person.mynumber
      WHERE j.id = 3
      RETURNING *;

 +----+--------+
 | id | person |
 +----+--------+
 |  3 |        |
 +----+--------+

1 row returned

Case 4: Add elements to a map. Next, let's say we want to modify the user with id = 3 and add the firstname and lastname for this user. Here, we use a single PUT clause that specifies a map with multiple elements:

sql-> UPDATE JSONPersons j
      PUT j.person {"firstname" : "Wendy", "lastname" : "Purvis"}
      WHERE j.id = 3
      RETURNING *;

 +----+--------------------+
 | id | person             |
 +----+--------------------+
 |  3 | firstname | Wendy  |
 |    | lastname  | Purvis |
 +----+--------------------+

1 row returned

Next, we can specify the age, connections, expenses, income, and lastLogin fields using multiple PUT clauses in a single UPDATE statement:

sql-> UPDATE JSONPersons j
      PUT j.person {"age": 43},
      PUT j.person {"connections": [2, 3]},
      PUT j.person {"expenses": {"food": 1100, "books": 210, "travel": 50}},
      PUT j.person {"income": 80000},
      PUT j.person {"lastLogin": "2017-06-29T16:12:35.0285"}
      WHERE j.id = 3
      RETURNING *;

 +----+----------------------------------------+
 | id | person                                 |
 +----+----------------------------------------+
 |  3 | age         | 43                       |
 |    | connections |                          |
 |    |     2                                  |
 |    |     3                                  |
 |    | expenses    |                          |
 |    |     books   | 210                      |
 |    |     food    | 1100                     |
 |    |     travel  | 50                       |
 |    | firstname   | Wendy                    |
 |    | income      | 80000                    |
 |    | lastLogin   | 2017-06-29T16:12:35.0285 |
 |    | lastname    | Purvis                   |
 +----+----------------------------------------+

1 row returned

Conclusion

Oracle NoSQL Database, with native support for the JSON data type, makes it easy to store data that doesn't conform to a rigid schema. In addition, the SQL UPDATE statement provides a simple and intuitive way to add, modify, and remove elements from JSON documents, eliminating the need for the read-modify-write cycle. Do try this at home!

Greater developer flexibility building JSON applications with Oracle NoSQL Database

In this article, we look at how to go about storing and retrieving JSON data with Oracle NoSQL Database as our choice of database, because it further enhances developer flexibility. If you are a developer building microservices for web or mobile applications that communicate with other services over HTTP using the REST (REpresentational State Transfer) architecture, then you are usually working with JSON data. JSON has become the choice of developers for building microservices because of its simplicity and flexibility. As developers started using JSON as the communication format between microservices, writing and reading this JSON data to and from a backend database became the need of the hour, because it saves developers from converting (serializing/deserializing) the JSON data fetched from the database as the result of a query, thus further unleashing the power of JSON. Over the last few years, NoSQL databases have stepped up to provide comprehensive support for storing and querying JSON data, and relational databases have followed the trend.

Flexibility of JSON

JSON is a common data interchange format and provides flexibility to developers when it comes to designing and evolving data models for building microservices. It is useful for cases where data is unorganized. How to manipulate JSON data boils down to two aspects while developing applications:

Modeling data to unleash JSON flexibility

Querying the free and wild JSON data effectively

It is important to note that databases store JSON data in a manner in which the record itself stores the type information. Each record is therefore self-describing rather than schema-less. However, records do not have to adhere to any conventional structure within the scope of a table or record collection. As an example, if we are building a contacts database for a Person entity, a sample JSON record could look like this:

{
  "account" : 1,
  "person" : {
      "lastName" : "Jones",
      "firstName" : "Joe",
      "address" : {
        "home" : {
          "street" : "15 Elm",
          "city" : "Lakeville",
          "zip" : "12345"
        },
        "work" : {
          "street" : "12 Main",
          "city" : "Lakeville",
          "zip" : "12345"
        }
      },
      "phone" : {
        "home" : "800-555-1234",
        "work" : "877-123-4567"
      }
   }
}

Here, please note that there might be records of Persons for which the phone map is not available, allowing the application to work with JSON data that conforms to various combinations of its object structure. If the developer gets a new requirement that "a Person could have 2 or more phone numbers", he would be able to incorporate that use case with minimal code changes, as opposed to the case where he had to adhere to a standard structure for all records. Application developers can unleash this flexibility when they use JSON. Before we go further, it is also important to note that while the JSON data itself can have its own structure, in all cases the context of the information, including the type (static or dynamic) of each field, is known to the application. In other words, the application still has the business context of the data and knows how to manipulate the JSON data. It does not make any sense to store two JSON records that have entirely different structures and, more importantly, different information contexts (one record represents a Person Contact and another record represents a Shopping Cart) in one table or collection.
The application (or the object-mapping framework, when used) brings type consistency to the free and wild JSON data format. Now let's look at how Oracle NoSQL Database extends this flexibility, for both data modeling and querying.

Data Modeling - Oracle NoSQL Database allows developers to manipulate data using the popular and well-understood paradigm of tables. JSON is a native data type for a column of such a table, meaning the CREATE TABLE DDL call can create JSON column types. So, let's see how this model allows for greater flexibility. Consider the following create table statement:

CREATE TABLE personContacts (
  account INTEGER,
  person JSON,
  Primary Key (account)
);

After statement execution, it creates a table with a primary key and a JSON column. This model is equivalent to document NoSQL stores in that each record in the table is a JSON document, but with a primary (or shard) key defined outside the document. You can assume that some of the properties of the person JSON data are always defined (either by default or with some value). It is possible to define the same table in another way, such as:

CREATE TABLE personContacts (
  account INTEGER,
  person JSON,
  primary_phone INTEGER,
  Primary Key (account)
);

Here, the table schema allows the self-described JSON type to be mixed with columns of other data types, allowing common fields or properties to be taken out of the JSON so that the application gains better control wherever applicable.

Querying - Oracle NoSQL Database provides comprehensive ways to query individual fields or properties within the JSON data type, including the ability to create indexes on fields or properties from within the JSON data type. Querying JSON is possible through API calls as well as through the SQL-like interface. In this blog, we will look at some examples using the SQL-like interface; for the APIs supported to perform queries over JSON data, please see this section in the product documentation. In general, querying could be the topic of a series of blog posts, but here I will discuss some of the high-level aspects that contribute to the flexibility, and in turn the power, of querying JSON data. Oracle NoSQL Database supports expressions that represent a set of operations, using various operators (arithmetic, logical, value and sequence comparisons), function calls and other grammatical constructs, performed over the stored records to produce a result. Consider the query below:

sql-> select p.person.lastName, p.account from personContacts p where p.person.address.home.city = "Lakeville";
{"lastName":"Jones","account":1}

1 row returned

The query returns the lastName and account number associated with all users that live in "Lakeville". The expression steps into the JSON data to identify the lastName property. The example is a typical Select-From-Where query in which the where clause uses specific attributes from the JSON data as a predicate. Such expressions can be written to form a variety of SFW (Select-From-Where) queries that retrieve results based on complex criteria or conditions. They can appear at multiple places in the supported grammar, further enhancing the flexibility. The supported grammar for SFW queries is:

SELECT <expression>
FROM <table name>
[WHERE <expression>]
[ORDER BY <expression> [<sort order>]]
[OFFSET <number>]
[LIMIT <number>];

The beauty of the query support is that the same set of queries works on the complex data types such as Array, Map, and Record that are supported by Oracle NoSQL Database.
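To speed up such predicates, an index can be created on a field inside the JSON column. The sketch below is illustrative only: the index name is made up, and the "AS STRING" type declaration for the JSON path reflects our understanding of the JSON index syntax, so verify the exact form against the product documentation before relying on it.

CREATE INDEX idx_home_city ON personContacts (person.address.home.city AS STRING);

sql-> select p.person.lastName, p.account from personContacts p where p.person.address.home.city = "Lakeville";

With such an index in place, the same Select-From-Where query shown above can be served from the index rather than by scanning every record.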
As a developer, if you want to move from wild and free JSON to a more structured schema approach by mapping JSON data to multiple column types, the queries will remain untouched. So as you firm up your application schema, you will have a choice to stay with JSON-formatted data or move to a "type consistent" table schema. Oracle NoSQL Database version 4.5 or higher includes support for the JSON-related features we discussed earlier; you can download this version from here. To learn more about building real-world, web-scale JSON data applications with Oracle NoSQL Database, refer to this extensive tutorial.

Hardware Strategies for Horizontal and Vertical Scalability – Oracle NoSQL Database

Since almost all NoSQL databases have built-in high availability, it is easy to change the hardware configuration, which involves shutting down, replacing, or adding a server to a cluster, while maintaining continuous operations. Seamless, online scalability is one of the key attributes of NoSQL systems, which are sharded, shared-nothing databases. Users can start with a small cluster (a few servers) and then grow the cluster (by adding more hardware) as throughput and/or storage requirements grow. Similarly, users can also contract the cluster as throughput and/or storage requirements shrink due to decreased workloads. Hence, most NoSQL databases are considered to have horizontal scalability. Figure 1 shows horizontal scalability during different operational workloads.

Figure 1 - Horizontal scaling for various workloads

Scalability is appealing to businesses for a variety of reasons. In many cases, the peak workload is not known in advance of the initial setup of a NoSQL database cluster. Scalability enables the organization to deploy enough capacity to meet the expected workload without unnecessary over-expenditure on hardware initially. As the workload increases, more hardware can be added to meet the peak demand. As the workload decreases, the hardware can be reclaimed and re-purposed for other applications. The cluster can grow and shrink as the workload changes, with zero downtime. Most NoSQL systems claim linear or near-linear scalability as the cluster size grows. Linear scalability simply means that when more hardware is added to the cluster, it is expected to deliver performance that is roughly proportional to its capacity. In other words, if the number of servers in the NoSQL cluster is doubled, it is expected to handle roughly double the workload (throughput and storage) compared to the original configuration, if the additional servers have the same processing and storage capabilities. Since almost all NoSQL databases have built-in high availability, it is easy to replace hardware incrementally, which involves shutting down a machine, removing it from the cluster, bringing in a new server, and adding it to the cluster. Everything continues to work without downtime or interruption of the application.

From a capital expenditure perspective, it can be more cost effective to purchase newer and "bigger" servers because hardware costs decline over time. Newer hardware has more processing and storage capability relative to older generations of hardware at the same price. Replacing older hardware with newer, more powerful and capable hardware gives a NoSQL database vertical scalability. From an operational cost (OPEX) perspective, a cluster with fewer servers is preferable to a cluster with a large number of servers, assuming both options deliver the same performance and storage capabilities. Hardware needs to be actively managed, needs space and cooling, and consumes power. Smaller clusters can improve operational efficiency. Over time, the combined effect of growing the number of servers in a cluster and/or replacing old servers with newer hardware will result in a cluster that has a mix of servers of varying processing and storage capabilities and capacity. It is common to find such "mixed" clusters in production NoSQL applications that have been upgraded over a period of time.
In the context of "mixed" cluster scenarios, it is important to choose a NoSQL solution that can leverage hardware with varying processing and storage capabilities in order to address the application requirements efficiently and effectively. As mentioned earlier, most NoSQL products claim horizontal scalability. But does an administrator really want to maintain a large cluster for NoSQL product X if a smaller cluster running a different NoSQL product will do the job, or exceed the requirements? The example in Figure 2 shows that over time, and with increased storage and processing power, a 12-machine system could be replaced with just a 3-machine system.

Figure 2 - Replacing smaller servers with larger, more capable ones

When evaluating a NoSQL product, it is wise to consider the horizontal as well as the vertical scalability characteristics. Oracle NoSQL Database is designed for horizontal and vertical scalability in order to deliver the most cost-effective NoSQL solution for modern applications. For every server deployed in an Oracle NoSQL Database cluster, the administrator specifies the capacities (e.g., storage) of the server at the time of the initial setup. Oracle NoSQL Database uses this information to automatically assign workloads such that each server process manages a subset of data that is roughly proportional to the storage capacity of that server. A smaller server will be assigned less workload compared to a larger server. Similarly, each process uses RAM based on the amount of physical memory available on the machine. Figure 3 shows an Oracle NoSQL Database cluster with mixed hardware capacities.

Figure 3 - A mixed cluster

Oracle NoSQL Database distributes the workload across the available hardware based on the capabilities of each server. More importantly, during cluster creation and expansion/contraction operations, the system ensures that the replicas of every shard (remember, Oracle NoSQL Database is a sharded, HA database) are distributed across different physical servers. This "rule" ensures that the failure of any single server will never result in the loss of a complete shard. Also, in steady-state operation, it is possible that servers might be shut down and restarted at various points in time. Oracle NoSQL Database automatically monitors the health and workload on each server in the cluster and rebalances the workload across the servers in order to avoid "hotspots". All of this happens automatically and without any administrative intervention, thus ensuring reliable and predictable performance and high availability over mixed clusters and cluster transitions. Oracle NoSQL Database leverages hardware effectively in order to meet the performance and availability requirements of the most demanding and innovative applications. When deciding which NoSQL solution to use, vertical scalability is just as important as horizontal scalability in order to ensure the lowest Total Cost of Ownership for the overall solution.
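To make the capacity declaration concrete, here is a minimal sketch of how a larger server is described to the store when its bootstrap configuration is created. The host name, ports, and directory paths are placeholders; the flags shown are the same ones used in the cluster expansion walkthrough later on this blog.

java -jar $KVHOME/lib/kvstore.jar makebootconfig \
     -root $KVROOT \
     -host bignode01 \
     -port 5000 \
     -harange 5010,5030 \
     -capacity 6 \
     -storagedir /disk1/kvdata \
     -storagedir /disk2/kvdata

A node bootstrapped with -capacity 6 will be asked to host roughly twice the workload of a node bootstrapped with -capacity 3, which is how a mixed cluster stays balanced.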

Real Time Tracking with Oracle NoSQL Database

I recently got back from a short trip to Boston to watch my daughter race in the Boston Marathon. As soon as my wife and I found some spots to watch, about 1 mile from the finish line, we got out our phones and fired up the tracking software. We were both disappointed with the ability to get timely updates regarding the progress of our daughter. Remember that once you have a space to stand and watch, you basically don't move for more than 4 hours and try to figure out when your favorite runner will be passing your spot. An effective and efficient tracking application is critical for this. I got to thinking about the application for tracking runners, now that RFID tags are commonplace and so inexpensive. Each numbered bib that the runner wears contains an RFID chip that can be activated as the runner passes on or through the data activation mat. Here is what the sensor looks like from an actual Boston Marathon bib. During the race, at specific intervals, the time of activation of the sensor is captured and stored, and some simple computations are then performed, such as the most recent minutes/mile and an extrapolation of the expected finishing time. A NoSQL application for the timing of runners would be quite straightforward to develop. Let's look at two of the basics.

Registration - when someone registers to run in a race, the basic information must be acquired, including name, address, phone and birthday. The birthday is actually quite important, as qualifying times are based on the age at the time of the race, as well as how a participant places within their respective age group. For example, a JSON document could be created at registration time with the following information.

{
  "RunnerId":12345,
  "Runner" : {
    "firstname":"John",
    "lastname":"Doe",
    "birthdayyear":"1985",
    "birthdaymonth":"02",
    "birthdayday":"15",
    "email" : "john.doe@example.net",
    "payment":"Mastercard",
    "paymentstatus": "paid",
    "social":{
      "twitter":"@jdoe",
      "instagram":"jdoe_pics"
    }
  }
}

As the race begins, each runner passes over a mat on the ground which activates the RFID chip and records the start time. As the runner progresses over the race course, at specified intervals the runners cross more of these mats and the times are recorded. Simple math can then determine the elapsed time for that specific runner and the minutes per mile over the past interval, as well as extrapolate the expected finish time. The JSON data produced as the race progresses may look like the record below, which is quite small and can be transmitted to the servers quite quickly, or even batched up (depending on the transmitting device capability) and sent every few hundred runners, or when there is a break in the runners crossing the mat.

{
  "RunnerId":"12345",
  "milestone":"milestone_5k",
  "timestamp":"2017-04-12T10:00:00"
}

Then, this information could be added to the race record for the runner as they make progress.

"Marathon_Boston" : {
  "RunnerID":12345,
  "start_time":"2017-04-12T10:00:00",
  "milestone_5k":"2017-04-12T10:21:00",
  "milestone_10k":"2017-04-12T10:44:00",
  "milestone_15k":"2017-04-12T11:10:00",
  "milestone_20k":"2017-04-12T11:25:00",
  "milestone_25k":"2017-04-12T11:42:00",
  "milestone_30k":"2017-04-12T11:56:00",
  "milestone_35k":"2017-04-12T12:09:00",
  "milestone_40k":"2017-04-12T12:28:00",
  "milestone_41k":"2017-04-12T12:42:00",
  "milestone_42k_end":"2017-04-12T12:45:00"
}

Overall, this would be an ideal application for a NoSQL system; a minimal table sketch follows below.
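Following the id-plus-JSON-column pattern used elsewhere on this blog, one way to model this is sketched here; the table and column names are hypothetical, not taken from an actual race-timing system.

CREATE TABLE IF NOT EXISTS Runners (
  runnerId INTEGER,
  runner JSON,
  races JSON,
  primary key (runnerId)
);

The registration document would be stored in the runner column, and each milestone event would be folded into the races column (for example under a "Marathon_Boston" key) as it arrives.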
The amount of data, even for a 35,000-person race, would not be very much, and as the runners spread out, even less so compared to the starting gates. If we assume that each runner's record would consume about 1K of data, then for the entire race there would only be about 35 MB of raw data. If we then assume a replication factor of 3, and include some overhead, the entire race data would need about 225 MB of storage, which could easily fit on a USB thumb drive. High-speed SSDs can store data in the terabyte (TB) range, so thousands of marathons' results could be stored in a single Oracle NoSQL Database. This still doesn't answer the question as to why the updates were so slow, but according to my source in the Boston area, the downtown is notorious for poor cell service; add many thousands of race watchers trying to use their tracking apps at basically the same time, and you can start to understand the delays. At least we know that if the system were based on NoSQL, it would not be the culprit.

Oracle NoSQL Database on Docker

Docker makes it easier for developers to package, deploy and run applications by using containers. It makes applications portable across operating systems running on cloud platforms. The latest release of Oracle NoSQL Database (version 4.3) is now available as a Docker container on Docker Hub. This means that if you already have Docker installed, you can get set up and running with Oracle NoSQL Database with just one command.

$ docker run -d --name=kvlite oracle/nosql

The Docker container for Oracle NoSQL Database downloads the latest Oracle NoSQL Database Community Edition and jump-starts the kvlite instance, a single-node, single-shard Oracle NoSQL Database store. Developers can start playing around with Oracle NoSQL Database, including some of the new and exciting features of version 4.3, in no time. Oracle NoSQL Database provides strong security support and, starting with version 4.3, security is enabled by default. Please note, however, that the Docker version by default bootstraps the kvlite instance with security disabled:

CMD ["java", "-jar", "lib/kvstore.jar", "kvlite", "-secure-config", "disable"]

Building your own Docker image with security enabled is very simple:

Clone github.com/oracle/docker-images/nosql

Update 4.3.11/Dockerfile to include the following command, which will bring up a secured kvlite instance when the container is run:

CMD ["java", "-jar", "lib/kvstore.jar", "kvlite"]

To connect to the secured kvlite instance, please follow the command to start the CLI as noted in the Oracle NoSQL Database documentation, here. The Oracle NoSQL Database Docker container will get you started, and you will be ready to play with storing and retrieving data in minutes, but there is a lot more you can do with this. Abhishek Gupta put together a simple Jersey (JAX-RS implementation) web application that uses Oracle NoSQL Database over REST APIs. In this example, he builds 2 Docker containers, one for Oracle NoSQL Database and another that runs the Jersey application. You can follow these simple steps and simply extend the web application to build your own Oracle NoSQL Database application. Happy coding!

Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries. Docker, Inc. and other parties may also have trademark rights in other terms used herein.
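After editing the Dockerfile, the modified image can be built and started with standard Docker commands; the image tag and container name below are just placeholders.

$ docker build -t my-nosql-secure 4.3.11/
$ docker run -d --name=kvlite-secure my-nosql-secure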

Oracle NoSQL Database Keeps Your Data Secure

Recent news has brought back the focus on how a poorly secured database server can cause irreversible damage to the reputation of the software vendor, apart from many other tangible and intangible losses. The security features in Oracle NoSQL Database make it a member of the Oracle family of products, which pride themselves on being very secure. This blog briefly describes these security features.

1) There are two levels of security: network security, and user authentication and authorization. Network security provides an outer layer of protection at the network level and is configured during the installation process using cryptographic keys, X.509 certificate chains and trusted certificates. What this means is that the communication between the client and server nodes, and also between the server nodes, is encrypted using the SSL/TLS protocol. User authentication and authorization can either be managed by using Oracle NoSQL Database utilities or relegated to any Kerberos-compliant LDAP/single-sign-on server.

2) Starting with release 4.3 of Oracle NoSQL Database, the security features are enabled by default.

3) Access to a secure Oracle NoSQL Database is limited to authenticated users only. Oracle NoSQL Database provides tools for user and password management.

4) Password credentials for connecting to the database are stored in a client-side Oracle Wallet, a secure software container used to store authentication and signing credentials. With Oracle Wallet, applications no longer need to embed the user and password in application code and scripts. This considerably reduces risk because the user credentials are no longer exposed in the clear, and password management policies are more easily enforced without changing application code whenever user names or passwords change.

5) Oracle NoSQL Database provides a set of default rules for creating and updating a user password in order to enhance security. These rules enable the administrator to enforce strong password policies such as minimum and maximum password length, minimum number of upper and lower case characters, digits and special characters, password expiry, a list of restricted passwords, and the maximum number of remembered passwords that cannot be reused when setting a new password.

6) Successfully authenticated users receive an identifier for a login session that allows a single login operation to be shared across Storage Nodes. That session has an initial lifetime associated with it, after which the session is no longer valid. The server notifies the user with an error once the session is no longer valid. The application then needs to re-authenticate.

7) Oracle NoSQL Database provides role-based authorization. A user can be assigned one or more roles. A role can either be a built-in system role (readonly, writeonly, readwrite, dbadmin, sysadmin and public) or a user-defined role. These built-in roles map to one or more privileges. Privileges can either be system privileges or object (table-level) privileges. A system privilege grants the user the ability to perform a store-wide action, while an object privilege grants the user the ability to perform an action only on that specific object (table).

8) With Kerberos integration, Oracle NoSQL Database can relegate the authentication and authorization job to any Kerberos-compliant LDAP or single-sign-on server.

In summary, Oracle takes security very seriously for all of its products. Oracle NoSQL Database has been designed from the start to be secure and to protect users' data.
Please refer to the Security Guide for more details on any of the above-mentioned security features.
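As a rough illustration of point 7, creating a user and granting it a built-in role is done with DDL statements along the following lines; the user name and password are made up, and the exact statement syntax should be confirmed against the Security Guide rather than taken from this sketch.

CREATE USER appuser IDENTIFIED BY "ChangeMe_2017#";
GRANT readwrite TO USER appuser;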

Oracle NoSQL Database performance on Oracle Cloud Infrastructure (OCI) IaaS

We measured the performance of Oracle NoSQL Database release 4.3 running on Oracle Cloud Infrastructure (OCI) compute nodes using the YCSB workload. YCSB is the most popular benchmark for NoSQL systems. We are pleased to report the following performance results of our test.

Performance Benchmark Results

Clients: 6
Records: 450 million
Insert Throughput (ops/sec): 276,352
Mixed Throughput (ops/sec): 843,969
Insert Latency (ms) (avg/95%/99%): 4.29/12/26
Read Latency (ms) (avg/95%/99%): 1.11/3/8
Update Latency (ms) (avg/95%/99%): 5.79/13/61

Hardware Configuration

We used three DenseIO nodes to run the Oracle NoSQL Database server and three Standard nodes for the YCSB driver application.

DenseIO node: 36 OCPU, 512 GB RAM, 28.8 TB NVMe SSD, 10 Gbps network.
Standard node: 36 OCPU, 256 GB RAM, 10 Gbps network.

Software Configuration

We used Oracle Linux Server release 6.8 on the client as well as the server nodes. We used Oracle NoSQL Database version 4.3, which can be downloaded from here. The server was configured as a 9-shard system with three-way replication (RF = 3).

Benchmark Configuration

We used the standard YCSB parameters: 13-byte keys and 1024-byte values. The database size was 450 million records. All insert and update operations were performed with a durability policy of simple majority. All read operations were performed with the NONE_REQUIRED read consistency policy. A total of 6 client processes were used, with 2 client processes running on each of the 3 Standard compute nodes. Each client used 200 threads for the insert workload as well as for the mixed read/update workload.

NOTE: Learn how Oracle assists developers by automating the installation of Oracle NoSQL Database on Oracle Cloud Infrastructure (OCI). Click here for details.
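For readers who want to set up a similar run, the configuration above maps onto standard YCSB core-workload properties roughly as sketched below. The property names are standard YCSB ones, but the exact values and the name of the Oracle NoSQL binding passed to the ycsb command are assumptions to adapt to your own setup.

# workload.properties - approximate translation of the configuration described above
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=450000000
fieldcount=1
fieldlength=1024
readproportion=0.5
updateproportion=0.5
requestdistribution=uniform

# load phase, then mixed phase, run from each client host (binding name is a placeholder)
bin/ycsb load <oracle-nosql-binding> -P workload.properties -threads 200
bin/ycsb run  <oracle-nosql-binding> -P workload.properties -threads 200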

Expanding an Oracle NoSQL Cluster

Introduction

This is an extension of my last blog, where I described how to set up a single-node NoSQL database. You'd realize that setting up a single-node NoSQL database, or kvlite, should take less than 5 minutes - that's how simple it is! In this blog, I want to talk about how to expand the kvlite instance to a 3x3 cluster - 3 shards, each having 3 copies of the data.

Motivation for NoSQL Cluster Expansion

Before we see the actual steps of expansion, let's look at what the motivations to expand a NoSQL cluster could be:

Increase Data Capacity - A company's Oracle NoSQL Database application is now obtaining its data from several unplanned new sources. The existing configuration is more than adequate to meet requirements, with one exception: they anticipate running out of disk space later this year. The company would like to add the needed disks to the existing servers in existing slots, establish mount points, and ask Oracle NoSQL Database to fully utilize the new disks along with the disks already in place, all while the system stays up and running.

Increase Throughput - As a result of an unplanned corporate merger, the live Oracle NoSQL Database will see a substantial increase in write operations. The read/write mix of transactions will go from 50/50 to 85/15. The new workload will exceed the I/O capacity of the available storage nodes. The company would like to add new hardware and have it be utilized by the existing Oracle NoSQL Database (kvstore) currently in place. Oh, and of course the application needs to continue to be available while this upgrade is occurring.

Increase Replication Factor - A new requirement has been placed on an existing Oracle NoSQL Database to increase its overall availability by increasing the replication factor, utilizing new storage nodes added in a second geographic location. This is accomplished by adding at least 1 replication node for every existing shard. The current configuration has a replication factor of 3. And, yes, the expansion happens while the KVStore continues to service the existing workload.

Of the above, Increase Data Capacity is the most obviously understood: the improvement in storage capacity is clear, since when you add replication nodes you have additional space to store your data.

Expansion

We want to transform our existing 1x1 NoSQL store into a 3x3 (3 shards, each having 3 copies of the data) cluster.

Prerequisites

Make sure that you have Java SE 6 (JDK 1.6.0 u25) or later installed on all of the hosts that you are going to use for the Oracle NoSQL Database installation.

Make sure that some sort of reliable clock synchronization is running on each of the machines. Generally, a synchronization delta of less than half a second is required; ntp is sufficient for this purpose.

Installation

Follow the installation steps highlighted in the documentation guide.

Configuration Steps

Step 1: Create the initial "boot config" configuration file using the makebootconfig utility. You should do this on each Oracle NoSQL Database node; here we are doing it for storage node 2 and node 3 (storage node 1 is already bootstrapped). Notice that we are defining three different port ranges (-admin, -harange, -port) for various inter- and intra-node communication. The number of replication nodes (RNs) this storage node can host is defined by the '-capacity' parameter.
You can define the storage directories where data is going to be written using the -storagedir argument. Follow the link to understand the meaning of the different configuration parameters. Please note that in this expansion exercise we have added storage nodes (SN2 and SN3) with greater capacities (i.e., capacity=3). We defined capacity=3, which means we would like three RNs to be hosted on each of the three storage nodes, with each RN using its own disk. This is an important aspect, as independent spindles improve the performance and the availability of the system.

########### Bootstrap Storage Node 2 ###########
java -jar $KVHOME/lib/kvstore.jar makebootconfig \
        -root $KVROOT \
        -store-security none \
        -capacity 3 \
        -harange 6010,6030 \
        -admin 6001 \
        -port 6000 \
        -memory_mb 200 \
        -host kvhost02 \
        -storagedir <PATH TO STORAGEDIR 1 ON SN2> \
        -storagedir <PATH TO STORAGEDIR 2 ON SN2> \
        -storagedir <PATH TO STORAGEDIR 3 ON SN2>
################################################

########### Bootstrap Storage Node 3 ###########
java -jar $KVHOME/lib/kvstore.jar makebootconfig \
        -root $KVROOT \
        -store-security none \
        -capacity 3 \
        -harange 7010,7030 \
        -admin 7001 \
        -port 7000 \
        -memory_mb 200 \
        -host kvhost03 \
        -storagedir <PATH TO STORAGEDIR 1 ON SN3> \
        -storagedir <PATH TO STORAGEDIR 2 ON SN3> \
        -storagedir <PATH TO STORAGEDIR 3 ON SN3>
################################################

Step 2: Start the Oracle NoSQL Database Storage Node Agent (SNA) on each of the Oracle NoSQL Database nodes. The SNA manages the Oracle NoSQL Database processes on each node.

java -jar $KVHOME/lib/kvstore.jar start -root $KVROOT &     # on kvhost02
java -jar $KVHOME/lib/kvstore.jar start -root $KVROOT &     # on kvhost03

Step 3: Deploy the 3x3 topology. In this step we deploy the storage nodes and configure the topology. Since Storage Node 1 (SN1) is already deployed, we are not deploying it again, but instead just increasing its capacity to 3 (from 1) by using the plan change-parameters command. We also need to add the paths to the storage directories that will contain the environments associated with the 2 additional Replication Nodes that will be running on SN1. Next, we deploy the new storage nodes SN2 and SN3. To change from one configuration to another, you either create a new initial topology, or you clone an existing topology, modify it into your target topology, and then deploy that target topology - here we take a clone of the current topology. Remember, we are changing the Replication Factor (RF) from 1 to 3; an increase in RF creates more copies of the data, which improves read throughput and availability. RF is increased by using the topology change-repfactor command. For additional information on how to identify your replication factor and its implications, see Replication Factor. We then call the redistribute command, which will create new shards and replication nodes and move partitions and RNs to rebalance the cluster.
Finally, we preview and deploy the topology:

java -Xmx256m -Xms256m -jar ${KVHOME}/lib/kvstore.jar runadmin -port 5000 -host ${HOSTNAME} << EOF
plan change-parameters -service sn1 -wait -params capacity=3
plan change-storagedir -sn sn1 -storagedir <PATH TO STORAGEDIR 2 ON SN1> -add -wait
plan change-storagedir -sn sn1 -storagedir <PATH TO STORAGEDIR 3 ON SN1> -add -wait
plan deploy-sn -znname "Boston" -port 6000 -wait -host kvhost02
plan deploy-admin -sn sn2 -port 6001 -wait
plan deploy-sn -znname "Boston" -port 7000 -wait -host kvhost03
plan deploy-admin -sn sn3 -port 7001 -wait
topology clone -current -name 3x3
topology change-repfactor -name 3x3 -pool BostonPool -znname "Boston" -rf 3
topology redistribute -name 3x3 -pool BostonPool -partitions
topology preview -name 3x3
plan deploy-topology -name 3x3 -wait
show plan
EOF

Once done, you can verify by logging into the admin console and using the show topology command:

kv-> show topology;
store=kvstore  numPartitions=120 sequence=217
  zn: id=zn1 name=Boston repFactor=3 type=PRIMARY allowArbiters=false

  sn=[sn1] zn:[id=zn1 name=Boston] kvhost01:5000 capacity=3 RUNNING
    [rg1-rn1] RUNNING   single-op avg latency=1.2094289 ms   multi-op avg latency=5.4368057 ms
    [rg2-rn1] RUNNING   single-op avg latency=0.0 ms         multi-op avg latency=7.688985 ms
    [rg3-rn1] RUNNING   single-op avg latency=0.0 ms         multi-op avg latency=1.6008012 ms
  sn=[sn2] zn:[id=zn1 name=Boston] kvhost02:6000 capacity=3 RUNNING
    [rg1-rn2] RUNNING   single-op avg latency=0.0 ms         multi-op avg latency=3.631618 ms
    [rg2-rn2] RUNNING   single-op avg latency=0.0 ms         multi-op avg latency=3.1090064 ms
    [rg3-rn2] RUNNING   single-op avg latency=1.5529954 ms   multi-op avg latency=5.964887 ms
  sn=[sn3] zn:[id=zn1 name=Boston] kvhost03:7000 capacity=3 RUNNING
    [rg1-rn3] RUNNING   single-op avg latency=0.0 ms         multi-op avg latency=3.7453163 ms
    [rg2-rn3] RUNNING   single-op avg latency=1.3022312 ms   multi-op avg latency=3.657524 ms
    [rg3-rn3] RUNNING   single-op avg latency=0.0 ms         multi-op avg latency=5.796298 ms

  shard=[rg1] num partitions=40
    [rg1-rn1] sn=sn1
    [rg1-rn2] sn=sn2
    [rg1-rn3] sn=sn3
  shard=[rg2] num partitions=40
    [rg2-rn1] sn=sn1
    [rg2-rn2] sn=sn2
    [rg2-rn3] sn=sn3
  shard=[rg3] num partitions=40
    [rg3-rn1] sn=sn1
    [rg3-rn2] sn=sn2
    [rg3-rn3] sn=sn3
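Once the new topology is deployed, the overall health of the expanded store can also be checked from the command line. A quick sketch is below; the host and port match the examples above, and the exact flags should be confirmed against your release's utility reference.

java -jar $KVHOME/lib/kvstore.jar ping -host kvhost01 -port 5000

A healthy 3x3 store should report all nine Replication Nodes as RUNNING, mirroring what show topology displays in the admin CLI.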

Oracle NoSQL Database for IoT environments

How can NoSQL be used in an IoT solution to manage truck parking spaces? A problem that exists in many domains is how to allocate and efficiently use a finite resource when the demand exceeds the normal supply. For example, there is a finite supply of truck parking spaces at truck stops, and on any given night, there may be more demand than supply. How does a trucker know if a space is available at the next rest stop, or whether he/she should drive the extra hour to the next one? With federal rules as to the amount of time that a driver may be on the road and required rest periods, optimizing the drive/rest cycle and using this limited resource is critical to running a truck stop business.

Pain point: the need to manage thousands of truck parking spaces (reserve, charge, analyze, maintain, as well as collateral fuel purchases and store purchases). Demand is high for these spaces. There are about 1.5 million large trucks in the US. New federal regulations are requiring truck drivers to stop and rest more frequently. This is creating huge demand for these parking spaces, and thus the opportunity to monetize them. The customer needs to be able to monitor the availability of each space, charge for its use, and tie this information into their back office applications. This is their business, so it is crucial to capture the data fast and efficiently. A combination of sensor technology at the parking sites and wireless reservation systems with real-time capacity information can dramatically increase revenue and increase driver safety as well, as drivers can plan more efficient drive and rest cycles. An architecture that includes an IoT service to acquire the data, as well as NoSQL to store the data before sending it to a longer-term database, can greatly increase revenue and operating efficiency.
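To make this concrete, a sensor event from a parking space could be modeled as a small JSON record, in the same style used for JSON data elsewhere on this blog; every field name below is hypothetical.

{
  "spaceId": "I80-EXIT284-SPACE-17",
  "truckStop": "I-80 Exit 284",
  "occupied": true,
  "reservationId": 98123,
  "timestamp": "2017-05-02T22:15:00"
}

Each event is tiny, arrives continuously from thousands of spaces, and needs to be queryable in real time by the reservation front end - exactly the ingest-and-read pattern a key-value NoSQL store handles well.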

Oracle NoSQL Database - SQL Query Overview & Demo

SQL (Structured Query Language) is one of the most popular query languages for accessing databases. Oracle NoSQL Database (release 4.2 or later) offers such a query feature to simplify application development. Developers can now use their knowledge and experience in SQL with Oracle NoSQL Database. This blog provides a quick and easy guide to get you started on testing SQL queries in Oracle NoSQL Database. The following steps will guide you through the setup process.

Step 1: Oracle NoSQL Database Setup

If you have not installed Oracle NoSQL Database, please download the product here. Either the Enterprise or the Community Edition is fine; the Enterprise Edition is used in this example. Place the package in /home/oracle/nosql. Follow the commands below to unzip the package and launch KVLite, a single-node Oracle NoSQL Database.

gunzip <name of the downloaded package>      Example: gunzip kv-ee-4.2.10.tar.gz
tar -xvf <name of the gunzipped package>     Example: tar -xvf kv-ee-4.2.10.tar
KVHOME=<directory of the unzipped package>   Example: KVHOME=/home/oracle/nosql/kv-4.2.10
java -jar $KVHOME/lib/kvstore.jar kvlite

Step 2: SQL Test Environment Setup

A small sample database and scripts will be used to set up the test environment. A table is used to model the data. Download this demo file. Place and unzip it in the same parent directory where the Oracle NoSQL Database is located (for example /home/oracle/nosql).

gunzip TestSQL.tar.gz
tar -xvf TestSQL.tar

After the file is unzipped, a folder called TestSQL should appear. Follow the commands below to set up the database and the SQL test environment.

cd TestSQL       (to open the folder)
testSQL.sh       (to start testing SQL)

Step 3: SQL Testing

A file called testSQL.txt (in the TestSQL folder) contains a collection of sample SQL statements that can be used for testing. For further details on SQL usage in Oracle NoSQL products, please refer to the online documentation.
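The statements in testSQL.txt are ordinary SQL-for-NoSQL queries. As a rough illustration of the kind of statement you can type at the sql-> prompt (the table and column names here are made up, not the ones shipped with the demo):

sql-> SELECT * FROM users;
sql-> SELECT u.lastname, u.age FROM users u WHERE u.age > 30 ORDER BY u.age LIMIT 10;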

Oracle NoSQL Database - putIfVersion() API for read-modify-write operations

I read an interesting article about how hackers were able to steal 896 BTC and caused Flexcoin to shut down because the loss was too large for the company to bear. The article highlights the need for software implementers to provide APIs designed to prevent unexpected behavior in read-modify-write (RMW) operations. For example, if you need to update an account balance, you need to perform the following sequence of operations:

Step 1: curr_value = get(key, "my_account_no"); // read from the database into the application
Step 2: new_value = curr_value - amount_to_be_deducted; // update the value in the application
Step 3: put(key, new_value); // store the updated value in the database

A database system that supports multi-statement transactions ensures that the account balance for my_account_no in the database cannot be changed by another user until the update has been committed. Unfortunately, most NoSQL database systems do not support multi-statement transactions. If concurrent users perform RMW operations like the one described above on the same record, it is possible to get incorrect results, as the Flexcoin article illustrates. Oracle NoSQL Database provides APIs that help prevent this problem, even in the presence of concurrent activity. In this blog, we are going to talk about one such useful API, putIfVersion.

KVStore.putIfVersion is a simple, yet powerful API that can be used when updating a value to ensure that it has not changed since the last time it was read.

Version putIfVersion(Key key, Value value, Version matchVersion)
    Put a key/value pair, but only if the version of the existing value matches the matchVersion argument.

Version putIfVersion(Key key, Value value, Version matchVersion, ReturnValueVersion prevValue, Durability durability, long timeout, TimeUnit timeoutUnit)
    Put a key/value pair, but only if the version of the existing value matches the matchVersion argument.

Let us see how putIfVersion can be used to avoid consistency issues in the application. Consider the same sequence of RMW operations:

Step 1: curr_value = get(key, "my_account_no");
Step 2: new_value = curr_value - amount_to_be_deducted;
Step 3: put(key, new_value);

Step two can be more elaborate, with checks to ensure that the new_value is still valid (e.g. non-zero, above a certain limit, etc.). Now, if there are two or more application threads updating the same value at the same time, it is possible that both threads have read the current value before the put() operation is executed. At the end of this operation from both threads, the value would be incorrect, as the amount would have been deducted twice but reflected in the database as having been deducted only once.

The putIfVersion() API solves the problem above. It protects the application by making an additional check during updates, ensuring that the value has not changed since the last time it was read:

putIfVersion(key, new_value, curr_version); // where curr_version is the Version returned by the get() in step 1

Although the example above illustrates the usage of putIfVersion() for the Key/Value data model, Oracle NoSQL Database implements the putIfVersion() API for both the Key/Value and the Table data models. Refer to the sample code here to leverage the putIfVersion() API in your application. Happy (and safe) coding!
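Below is a minimal sketch of the retry loop this enables, using the Key/Value API. It assumes the balance is stored as a UTF-8 encoded number; the class and method names are illustrative, not taken from the article's sample code.

import java.nio.charset.StandardCharsets;

import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;
import oracle.kv.Version;

public class SafeDebit {

    // Deduct 'amount' from the balance stored under 'accountNo', retrying if another
    // writer updates the record between our read and our conditional write.
    public static void debit(KVStore store, String accountNo, long amount) {
        Key key = Key.createKey(accountNo);
        while (true) {
            ValueVersion vv = store.get(key);                       // read value and its version
            long balance = Long.parseLong(
                    new String(vv.getValue().getValue(), StandardCharsets.UTF_8));
            Value newValue = Value.createValue(
                    Long.toString(balance - amount).getBytes(StandardCharsets.UTF_8));
            // Write only if the record still has the version we read; null means it changed.
            if (store.putIfVersion(key, newValue, vv.getVersion()) != null) {
                return;                                             // success
            }
            // Lost the race with a concurrent writer: loop, re-read, and try again.
        }
    }
}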

Oracle NoSQL Database 4.0 Released

Oracle NoSQL Database 4.0 is now released and available for download from the Oracle Technology Network download pages.

Oracle NoSQL Database Version 4.0 – New Features

· Full text search – Ability to perform full text searches over the data using Elasticsearch.
· Time-To-Live – Efficient aging out of "expired" data – a common IoT requirement.
· SQL Query – Declarative query language for developers more comfortable with SQL than API-level access.
· Predicate Pushdown – Ability to process predicates from Big Data SQL in NoSQL Database nodes – this leads to improved performance and scalability.
· Import/Export – Easy to back up/restore data or move data between different Oracle NoSQL Database stores.

Oracle NoSQL Database continues to enable innovative applications that require very fast access to data and flexible interfaces.

· Integrated – Connects with many other Oracle products for a complete, productive environment.
· Fast – Utilizes parallelism at a number of levels.
· Flexible – Flexible schema, not predefined.
· Simple to implement – Setup, administration and integration are straightforward.
· Reliable – Built-in data replication and HA support for availability and business continuity.
· Scalable – Add more servers to handle more data and throughput with zero downtime.

Learn More About Oracle NoSQL Database:

LinkedIn Group – https://www.linkedin.com/groups/4147754
Twitter – @OracleNoSQL
Community – https://community.oracle.com/community/database/high_availability/nosql_database

Oracle NoSQL Database Licensing

We have received a number of questions about the Oracle NoSQL Database licensing policy, so we wanted to clear things up.

News: As of Oracle NoSQL Database Version 4.3, a new edition has been announced, Oracle NoSQL Database Basic Edition. Oracle NoSQL Database Basic Edition is available for all organizations that have an Oracle Database Enterprise Edition (DBEE) license. Oracle NoSQL Database Basic Edition may be run on the same OCPUs, or on unused (but licensed) OCPUs, as DBEE. For a more complete discussion of Oracle NoSQL Database Basic Edition, take a look at our FAQ. Be sure to look at the following blogs: Licensing Information as of Oracle NoSQL Database - Version 4.3, and Oracle NoSQL Database License Explanations.

Oracle NoSQL Database consists of three editions, each with a client and a server license. The three editions of the distribution are commonly referred to as the Basic Edition (BE), Community Edition (CE) and Enterprise Edition (EE). Each edition carries a client license (where the application runs) and a server license (where the application data is stored).

Enterprise Edition (EE):
Client License: Apache License, Version 2
Server License: Oracle commercial license

Community Edition (CE):
Client License: Apache License, Version 2
Server License: GNU Affero General Public License, Version 3

Apache License – The details can be found at: http://www.apache.org/licenses/LICENSE-2.0.txt
GNU Affero General Public License (AGPL) Version 3 – Details on this license can be found at: http://www.oracle.com/technetwork/database/database-technologies/nosqldb/documentation/nosql-db-agpl-license-1432845.txt


Oracle NoSQL Database Cluster YCSB Testing with Fusion ioMemory™ Storage

Highly distributed systems with large data stores in the form of NoSQL databases are becoming increasingly important to enterprises, not just to hyperscale organizations. NoSQL databases are being deployed for capturing patient sensor data in health care, smart meter analysis in utilities, customer sentiment analysis in retail, and various other use cases in different industries. NoSQL database systems help organizations store, manage, and analyze huge amounts of data on distributed system architectures. The sheer volume of data and the distributed system design needed to manage this large data at a reasonable cost necessitated a different category of database systems, leading to NoSQL databases. Oracle NoSQL Database is part of the NoSQL database family and is based on a distributed, key-value architecture. This technical white paper describes a three-node Oracle NoSQL Database cluster deployment procedure on Fusion ioMemory™ storage. The following points are emphasized: it highlights performance and scalability advantages compared to traditional spinning disks; because enterprises evaluate and assess new technologies for enterprise-wide adaptability, the Yahoo Cloud Serving Benchmark (YCSB) is the standard benchmark tool employed for testing and is the same tool used in this paper to evaluate Oracle NoSQL Database; and analysis and discussion are provided for throughput and latency testing results with YCSB. Download now at: https://www.sandisk.com/content/dam/sandisk-main/en_us/assets/resources/enterprise/white-papers/oracle-nosql-cluster-ycsb-testing-with-fusion-iomemory.pdf


Oracle NoSQL BulkPut

Our customers have often asked us: “What’s the fastest and most efficient way to insert a large number of records in the Oracle NoSQL Database?” Very recently, a shipping company reached out to us with a specific requirement: using Oracle NoSQL Database for their ship management application, which is used to track the movements of the container ships that move cargo from port to port. The cargo ships are all fitted with GPS and other tracking devices, which relay a ship's location to the application every few seconds. The application is then queried for 1) the location of all the ships, displayed on a map, and 2) a specific ship's trajectory over a given period of time, also displayed on the map. As the volume of location data grew, the company found it hard to scale the application and is now looking for a back-end system that can ingest this large data-set very efficiently.

Historically, we have supported the option to execute a batch of operations for records that share the same shard key, which is what our large airline customer (Airbus) has done. They pre-sort the data by the shard key and then perform a multi-record insert when the shard key changes. Basically, rather than sending and storing one record at a time, they can send a large number of records in a single operation. This certainly saved network trips, but they could only batch-insert records that shared the same shard key. With Oracle NoSQL Database release 3.5.2, we have added the ability to do a bulk insert, or bulk put, of records across different shards, allowing application developers to work more effectively with very large data-sets. The BulkPut API is available for the table as well as the key/value data model. The API provides significant performance gains over single-row inserts by reducing network round trips as well as by doing ordered inserts in batch, on internally sorted data, across different shards in parallel. This feature is released in a controlled fashion, so there aren't Java docs available for this API with this release, but we encourage you to use it and give us feedback.

API
KV interface: loads key/value pairs supplied by special purpose streams into the store.
public void put(List<EntryStream<KeyValue>> streams, BulkWriteOptions bulkWriteOptions)
Table interface: loads rows supplied by special purpose streams into the store.
public void put(List<EntryStream<Row>> streams, BulkWriteOptions bulkWriteOptions)
streams: the streams that supply the rows to be inserted.
bulkWriteOptions: non-default arguments controlling the behavior of the bulk write operations.
Stream interface:
public interface EntryStream<E> {
    String name();
    E getNext();
    void completed();
    void keyExists(E entry);
    void catchException(RuntimeException exception, E entry);
}

Performance
We ran the YCSB benchmark with the new Bulk Put API on a 3x3 (3 shards, each with 3 copies of the data) NoSQL cluster running on bare metal servers, ingesting 50M records per shard, or 150M records across the datastore, using 3 parallel threads per shard (9 total for the store) and 6 parallel input streams per SN, or 54 (6 * 9) total across the store. The results for the benchmark run are shown in the graph below. The graph compares the throughput (ops/sec) of the Bulk vs. Simple Put API with a NoSQL store having 1000 partitions, with durability settings of None and Simple Majority. As seen from the charts, there is over a 100% increase in throughput with either durability setting.
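To make the stream-based API shape concrete, here is a minimal sketch of bulk-loading rows through the Table interface. It is not the official sample program (that is linked below): the store name, host/port, table and column names are assumptions for illustration, and the BulkWriteOptions arguments shown simply ask for the defaults, so check the javadoc shipped with your release before relying on them.

import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import oracle.kv.BulkWriteOptions;
import oracle.kv.EntryStream;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;

public class BulkPutSketch {

    /* A trivial EntryStream that hands rows out of an in-memory iterator. */
    static class RowStream implements EntryStream<Row> {
        private final Iterator<Row> source;
        RowStream(Iterator<Row> source) { this.source = source; }
        public String name() { return "row-stream"; }
        public Row getNext() { return source.hasNext() ? source.next() : null; }
        public void completed() { /* stream exhausted */ }
        public void keyExists(Row entry) { /* row already present; ignore it */ }
        public void catchException(RuntimeException e, Row entry) { throw e; }
    }

    public static void main(String[] args) {
        // Store name, host and port are assumptions for illustration.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "localhost:5000"));
        TableAPI tableAPI = store.getTableAPI();

        // Assumes a table ShipLocation(shipId STRING, ts LONG, lat DOUBLE, lon DOUBLE, ...).
        Table table = tableAPI.getTable("ShipLocation");
        Row row = table.createRow();
        row.put("shipId", "IMO-9321483");
        row.put("ts", 1453220000000L);
        row.put("lat", 1.2644);
        row.put("lon", 103.8223);

        // Durability, timeout and time unit; nulls/0 request the defaults.
        BulkWriteOptions options = new BulkWriteOptions(null, 0, null);

        // In a real loader each stream would pull from its own data source, and
        // several streams would be supplied to drive the shards in parallel.
        List<EntryStream<Row>> streams =
                Collections.<EntryStream<Row>>singletonList(
                        new RowStream(Collections.singletonList(row).iterator()));

        tableAPI.put(streams, options);
        store.close();
    }
}

The stream callbacks (completed, keyExists, catchException) are where an application decides how to react to duplicates and failures without stopping the whole load.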
Sample Example: Here's a link to a program uploaded to the GitHub repository; the sample demonstrates how to use the BulkPut API in your application code. Refer to the README file for details related to the program execution. Summary: If you are looking at bulk loading data into Oracle NoSQL Database, the latest Bulk Put API provides the most efficient and fastest way (as demonstrated by the YCSB runs) to ingest large amounts of data. Check it out now and download the latest version of the Oracle NoSQL Database at: www.oracle.com/nosql. I'd like to thank my colleague Jin Zhao for his inputs on the performance numbers.


Oracle NoSQL BulkGet API

As seen from the timing graph (a), getting 30M rows using the simple get API would have taken us 420 seconds, which reduces to 149 seconds with 72 executor threads (plotted on the X-axis) using the bulk get API. This is almost a 3X improvement! And as seen in graph (b), the throughput went from 68K ops/sec using the simple get operation to 200K ops/sec with 72 executor threads. That is again a 3X improvement! In these charts, bulk-X is the maximum number of concurrent requests, which specifies the maximum degree of parallelism (in effect, the maximum number of NoSQL client-side threads) to be used when running an iteration. To achieve good performance we want to keep the Replication Nodes working as much as possible, so as a starting point we suggest using 3 * the number of available RNs (Replication Nodes). For example, if you are running a 3x3 store (3 shards, each with 3 copies of the data), you could start with 3 * 9 = 27 concurrent requests. Of course, the optimal value varies with the nature of the application requirements: some applications may want to be unobtrusive and use minimal resources (but efficiently), with elapsed time being a lower priority (for example, running analytics on secondary zones), while others may want strong real-time latency running multi-get on primary zones. Please refer to the Java documentation for the API; we support both the key/value and table interfaces.

Performance
In our internal Yahoo Cloud Serving Benchmark (YCSB) runs we found that we could retrieve 30M rows in 149 seconds with 72 executor threads, running on a 3x3 store with 90 reader threads and a record size of 100 bytes. Refer to the charts below for details of the benchmark runs.

[Charts: (a) elapsed time and (b) throughput for the bulk get benchmark runs]
Table Definition and Sample BulkGet Example: Below is some sample code to demonstrate how to use this API.

CREATE TABLE PhoneTable (manufacturer STRING,
                         price DOUBLE,
                         …
                         PRIMARY KEY (SHARD(manufacturer), price));

import java.util.ArrayList;
import java.util.List;

import oracle.kv.Direction;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.StoreIteratorException;
import oracle.kv.table.FieldRange;
import oracle.kv.table.MultiRowOptions;
import oracle.kv.table.PrimaryKey;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;
import oracle.kv.table.TableIterator;
import oracle.kv.table.TableIteratorOptions;

public class BulkGetExample {

    /* Connection parameters are hard-coded in the example; they could be taken
       as input to the program or read from a configuration file. */
    private final String storeName = "kvstore";
    private final String hostName = "localhost";
    private final int port = 5000;
    private final String tableName = "PhoneTable";

    final KVStore store;

    public static void main(final String args[]) {
        try {
            BulkGetExample runTest = new BulkGetExample();
            runTest.run();
        } catch (Exception e) {
            System.err.print("BulkGetExample run failed: " + e.getMessage());
        }
    }

    BulkGetExample() {
        store = KVStoreFactory.getStore
            (new KVStoreConfig(storeName, hostName + ":" + port));
    }

    void run() {
        final String[] manufacturers = {"Nokia", "Apple", "Samsung", "Motorola"};
        final List<PrimaryKey> keys =
            new ArrayList<PrimaryKey>(manufacturers.length);

        final TableAPI tableAPI = store.getTableAPI();
        final Table phoneTable = tableAPI.getTable(tableName);
        if (phoneTable == null) {
            throw new IllegalArgumentException("Table not found: " + tableName);
        }

        /* One partial primary key per manufacturer. */
        for (String manufacturer : manufacturers) {
            final PrimaryKey pk = phoneTable.createPrimaryKey();
            pk.put("manufacturer", manufacturer);
            keys.add(pk);
        }

        /* Initialize MultiRowOptions: price range in [200, 500]. */
        final FieldRange range = phoneTable.createFieldRange("price");
        range.setStart(200d, true).setEnd(500d, true);
        final MultiRowOptions mro = new MultiRowOptions(range, null, null);

        /* Initialize TableIteratorOptions. */
        /* Batch size of 200: the maximum number of result batches that can be
           held in the NoSQL Database client before they are processed. */
        final int batchResultsSize = 200;
        /* Maximum number of concurrent executor threads; set to 9 here because
           this was run against a 3x3 store. */
        final int parallelism = 9;

        final TableIteratorOptions tio =
            new TableIteratorOptions(Direction.UNORDERED /* Direction */,
                                     null /* Consistency */,
                                     0 /* RequestTimeout */,
                                     null /* TimeUnit */,
                                     parallelism,
                                     batchResultsSize);

        TableIterator<Row> itr = null;
        int count = 0;
        try {
            itr = tableAPI.tableIterator(keys.iterator(), mro, tio);
            while (itr.hasNext()) {
                final Row phone = itr.next();
                System.out.println(phone.toJsonString(false));
                count++;
                /* ... */
            }
            System.out.println(count + " rows returned.");
        } catch (StoreIteratorException sie) {
            /* Handle exception.. */
        } finally {
            if (itr != null) {
                itr.close();
            }
        }
    }
}

The above example returns an iterator over the rows matching the manufacturers supplied by the key iterator, restricted to the price range [200, 500]. If, along with these details, you also want to retrieve the images of all the phones, the images can be modeled as a child table (for efficiency reasons) and retrieved in the same API call. Summary: The Oracle NoSQL BulkGet API provides the most effective and performant way to fetch data in bulk from the Oracle NoSQL Database. As demonstrated by the YCSB runs, using this API you can expect between a 2x and 3x performance improvement when retrieving data in bulk.


Oracle NoSQL Database 12.2.3.5.2 Available

Oracle NoSQL Database, version 12.2.3.5.2, is now available for download. We strongly recommend that you download this new version. The highlights are: Bulk Put API - a high performance API that allows the application to insert multiple records (bulk put) in a single API call. The Bulk Put API is available for the table as well as the key/value data model. The API provides significant performance gains over single-row inserts by reducing network round trips as well as by doing ordered inserts in batch on internally sorted data, allowing application developers to work more effectively with very large datasets. Kerberos integration - Oracle NoSQL Enterprise Edition (EE) supports authentication using a Kerberos service. Kerberos is an industry standard authentication protocol for large client/server systems. With Kerberos, Oracle NoSQL DB and application developers can take advantage of existing authentication infrastructure and processes within your enterprise. To use Oracle NoSQL DB with Kerberos, you must have a properly configured Kerberos deployment, configured Kerberos service principals for Oracle NoSQL DB, and added a Kerberos user principal to Oracle NoSQL DB. Please refer to the security guide for details and also refer to the sample code. Read more and download from HERE.


OOW'15 Oracle NoSQL HOLs

Last week at OOW'15 we did Hands On Labs for the NoSQL Database. There were essentially 2 tracks: 1) administrators and 2) developers. I was often asked to provide the links for the labs; the links are available on our OTN pages and I am also posting them on this blog. The link contains the VM that has pre-built scripts and a guide on the desktop that walks through the HOL step by step. 1) Admin Track: contains steps to deploy a 3x3 cluster (we are simulating a scenario of 3 machines, each with 3 disks); extends the cluster by adding a node to each shard (3x4), thereby increasing the read throughput; backup and recovery of your store; securing the store. 2) Developer Track: deploys the cluster; create tables (parent and child); create secondary indexes; populate the tables; query using the CLI and an application (range queries, retrieving both parent and child records in a single call, usage of secondary indexes); integrate with RDBMS using external tables; a data modelling exercise. The HOLs were designed to be very simple and they give us an opportunity to help folks in our community get started with Oracle NoSQL DB, but I also realize that it's too short a time to convey all of the above in a 1-hour session - especially for folks who are trying the product for the first time. There are some very intricate details around distributed systems, topology management, data modelling, etc. That's something we'll work on as we do these sessions next year. I'd also like to hear from you - your feedback on content, material, etc.


Setting up a single node Oracle NoSQL

This blog describes the process to set up a single node Oracle NoSQL Database. If you are planning a large system for production use, please read Initial Capacity Planning to estimate the number of storage nodes on which you will need to install the software.

Installation
Follow the steps listed in the product documentation page to install Oracle NoSQL Database. It is important that you follow the installation prerequisites.

Configuration
At a high level, configuring your store requires the following steps:
Bootstrap the Storage Node (SN): the physical machine, or storage node, that we are going to use needs to be bootstrapped first; in the bootstrap process we create a config file in the KVROOT directory that holds the cluster deployment info.
Start the Storage Node Agent (SNA): after bootstrapping, we start the Storage Node Agent, or SNA, which basically works like a listener that enables communication between the admin node and the other replication nodes (RNs). We run bootstrapping and the SNA on all the storage nodes.
Run the admin steps to deploy the cluster: create the store, create a data center, assign Storage Nodes (SNs) to a DC, and deploy an MxN topology.

Before we start, first download the latest binaries; if there are stale libraries, clean up the environment. Download the kv-ee-<version>.tar.gz software from the OTN site and unpack it into /home/oracle/nosqldb. Set environment variables: KVHOME points to the directory where you unzipped the latest Oracle NoSQL Database binaries, KVROOT to where you would like metadata files to be stored, and KVDATA to the storage directories.

KVHOME=/home/oracle/nosqldb/kv-<version>
KVROOT=/home/oracle/nosqldb/root
KVDATA=/u02/nosqldb/data

mkdir ${KVROOT}
mkdir ${KVDATA}

(Optional) Stop any KVROOT instance if running and remove existing data:

nohup java -jar $KVHOME/lib/kvstore.jar stop -root $KVROOT >stopNoSQL.log 2>&1 &
// Remove any existing data
/bin/rm -rf ${KVDATA}/*
/bin/rm -rf ${KVROOT}/*

Let's look into each of the above steps.

Step 1 - Bootstrap the Storage Node (SN)
Create a configuration file that is used to start a not-yet-deployed Storage Node to be used in an instance of Oracle NoSQL Database. Notice that we are defining three different ports/port ranges (-admin, -harange, -port) for various inter- and intra-node communication. The number of replication nodes (RNs) this storage node can host is defined by the '-capacity' argument. You can define the storage directories where data is going to be written using the -storagedir argument. In our case we defined capacity=1, which means we would like one RN to be hosted on the storage node, with the RN using its own disk. If there are more RNs, each RN could potentially be made to use its own disk; this is an important aspect, as independent spindles improve the performance and the availability of the system. Refer to the guide to understand the details of each of the above configuration parameters.

mkdir -p $KVROOT
java -jar $KVHOME/lib/kvstore.jar makebootconfig \
        -root $KVROOT \
        -store-security none \
        -capacity 1 \
        -harange 5010,5030 \
        -admin 5001 \
        -port 5000 \
        -memory_mb 1024 \
        -host `hostname` \
        -storagedir ${KVDATA}

Step 2 - Start the Storage Node Agent (SNA)
Once the Storage Node is bootstrapped, we start the Oracle NoSQL Database Storage Node Agent (SNA). Notice that we are passing the KVROOT of the SN to the start command.
nohup java -Xmx256m -Xms256m -jar $KVHOME/lib/kvstore.jar start -root ${KVROOT} &

Step 3 - Deploy the topology
Next we deploy the NoSQL DB topology on the storage node that we just started in the previous step. Let's break down the steps. First we name the store 'mystore'. Note that the name of your store is essentially used to form a path to records kept in the store. For this reason, you should avoid using characters in the store name that might interfere with its use within a file path. The command line interface does not allow an invalid store name. Valid characters are alphanumeric, '-', '_', and '.'. Create a data center, or zone, named 'Boston'. When you execute the plan deploy-zone command, the CLI returns the plan number and whatever additional information it has about plan status. Every KVStore has an administration database. You must deploy the Storage Node to which the command line interface is currently connecting, in this case "sn1", and then deploy an Administration process on that same node, in order to proceed to configure this database. Use the deploy-sn and deploy-admin commands to complete this step. Once the admin process is created, you must create a Storage Node pool, which is used to contain all the SNs in the store; we name this pool 'BostonPool'. Next, join the storage node (sn1) to the Storage Node pool (BostonPool). The final step in your configuration process is to create Replication Nodes on every node in your store. You do this using the topology create and plan deploy-topology commands. The plan deploy-topology command requires a topology name; we name it 'topo'. As a final sanity check, you can confirm that all of the plans succeeded using the show plans command.

java -Xmx256m -Xms256m -jar ${KVHOME}/lib/kvstore.jar runadmin -port 5000 -host ${HOSTNAME} << EOF
configure -name mystore
plan deploy-zone -name "Boston" -rf 1 -wait
plan deploy-sn -zn zn1 -host ${HOSTNAME} -port 5000 -wait
plan deploy-admin -sn sn1 -port 5001 -wait
pool create -name BostonPool
show topology
pool join -name BostonPool -sn sn1
topology create -name topo -pool BostonPool -partitions 10
topology preview -name topo
plan deploy-topology -name topo -wait
show plan
EOF

Once all the above steps are done, you are all set to use the single node Oracle NoSQL Database - that's how easy it is to set up and deploy a NoSQL cluster. Depending on interest, I can extend this tutorial to setting up a multi-node cluster of Oracle NoSQL Database.
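Once the topology is deployed, a quick way to check the store from an application is to connect with the Java driver and do a simple put/get. This is only a smoke-test sketch: the store name and port come from the configuration above, and you should replace localhost with the host name you actually used.

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class SmokeTest {
    public static void main(String[] args) {
        // "mystore" and port 5000 match the deployment steps above.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("mystore", "localhost:5000"));

        // Write one key/value pair and read it back.
        Key key = Key.createKey("hello");
        store.put(key, Value.createValue("world".getBytes()));

        ValueVersion vv = store.get(key);
        System.out.println(new String(vv.getValue().getValue())); // prints "world"

        store.close();
    }
}

If this prints "world", the single node store is up and serving requests.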


Invoking OracleNoSQL based Java application from PL/SQL

Recently, we ran into an interesting use-case with one of our large supermarket customers, who wanted to take the output of a PL/SQL process and store it in Oracle NoSQL Database, to be later consumed by one of their retail applications - very quickly, and in a way that can scale very well to support the high volume of data that they expect. Oracle NoSQL DB is the obvious choice because it can provide high-throughput, low-latency read/write operations and can scale to support large volumes of data. Coming to the integration, one of the highlights of Oracle NoSQL Database is that it integrates very well with the rest of the Oracle tech stack. The simplest way to write to Oracle NoSQL DB from a PL/SQL procedure is to call a Java stored procedure that uses the native NoSQL DB API to insert data into the database, and the simplest way to read from Oracle NoSQL DB in a stored procedure is to use an External Table in the query so that data from Oracle NoSQL DB can be passed to the Oracle Database query processor. Another possible option is to use GoldenGate to move data from the Oracle Database to NoSQL DB. We have already blogged about the GoldenGate integration, so in this blog I am going to focus on the Java stored procedure based approach. In case you are not familiar with Java stored procedures: a Java stored procedure essentially contains Java public static methods that are published to PL/SQL and stored in an Oracle database for general use. This allows a Java stored procedure to be executed from an application as if it were a PL/SQL stored procedure. When called by client applications, a Java stored procedure can accept arguments, reference Java classes, and return Java result values. So, to help our customer, we created a POC that showcases this integration. Here are the steps involved: First, create the NoSQL DB table that will store the data from the Oracle Database. Create a Java application using the native NoSQL driver to perform CRUD operations on NoSQL DB. Load the Java application classes that we created in step 2 into the Oracle Database using the loadjava utility. Create a Java stored procedure that takes the data from PL/SQL and updates the NoSQL Database. Next, publish the Java stored procedure in the Oracle data dictionary. To do that, you write a call spec, which maps Java method names, parameter types, and return types to their SQL counterparts. Finally, call the Java stored procedure from a PL/SQL block to perform the updates. The POC is available for download in a zip file from our OTN page (refer: The PL/SQL Integration in the Demo/Sample Program). The README file bundled in the zip has all the detailed steps and files needed for this integration. With this approach, the NoSQL access is transparent to the Oracle DB application. NoSQL DB is an excellent choice here, and using this Java stored procedure approach, the customer can exploit the advantages of BOTH repositories effectively and with better TCO.
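To make steps 2, 4 and 5 concrete, here is a minimal sketch of what such a Java class might look like. It is not the POC code from the zip: the class, procedure, table and store names are hypothetical, and the call spec shown in the comment simply follows the standard Java stored procedure publishing syntax.

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;

/*
 * Hypothetical class loaded into the Oracle Database with the loadjava utility
 * and then published to PL/SQL with a call spec such as:
 *
 *   CREATE OR REPLACE PROCEDURE put_profile (p_id NUMBER, p_doc VARCHAR2)
 *   AS LANGUAGE JAVA
 *   NAME 'ProfileWriter.putProfile(int, java.lang.String)';
 *
 * after which any PL/SQL block can simply call put_profile(...).
 */
public class ProfileWriter {

    /* Store name, host and port are assumptions for illustration. */
    private static final KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("kvstore", "nosqlhost:5000"));

    /* Static method published to PL/SQL: writes one row into a hypothetical
       NoSQL table Profiles(id INTEGER, doc STRING, PRIMARY KEY(id)). */
    public static void putProfile(int id, String doc) {
        TableAPI tableAPI = store.getTableAPI();
        Table profiles = tableAPI.getTable("Profiles");
        Row row = profiles.createRow();
        row.put("id", id);
        row.put("doc", doc);
        tableAPI.put(row, null, null);
    }
}

Because the class runs inside the database JVM, the PL/SQL caller never sees the NoSQL driver; it just calls put_profile like any other stored procedure.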


Migrating/Importing MongoDB Documents into Nosql Tables

Summary: This paper presents a how-to for migrating documents in MongoDB collections into tables and child tables in Oracle NoSQL. The idea is to take as an example a relatively complex document, define a mapping file to map the basic fields of the document into a table, and map the embedded collections of the document into child tables. The Java class that we provide will generate the NoSQL table structure and insert the data. The set of components of each element of the collection is inserted into the store in the same operation.

A JSON example
Let's use an example of a family item from a MongoDB collection:

{ "_id" : ObjectId("55c4c6576e4ae64b5997d39e"),
  "firstname" : "lena",
  "lastname" : "clark",
  "gender" : "W",
  "childrens" : [
    { "name" : "bob",
      "schools" : [ {"street" : "90, pine street","id" : "Saint volume"},
                    {"street" : "134, mice street","id" : "Saint appearance"} ],
      "hobbies" : ["soccer","photo"] },
    { "name" : "joseph",
      "schools" : [ {"street" : "168, merely street","id" : "Saint slipped"} ],
      "hobbies" : ["tennis","piano"] },
    { "name" : "sandy",
      "schools" : [{"street" : "227, thread street","id" : "Saint discovery"}],
      "hobbies" : ["football","guitar"] }
  ]
}

In this case the main document has the following fields: '_id', 'firstname', 'lastname', 'gender' and 'childrens'. 'childrens' is an embedded collection containing 'name', 'schools' and 'hobbies'. 'schools' is again a nested collection with 'street' and 'id' fields, and 'hobbies' is a list. We can map them into several nested tables: the main table represents FAMILY, FAMILY.CHILDREN gets the 'childrens' items, and FAMILY.CHILDREN.SCHOOLS and FAMILY.CHILDREN.HOBBIES store the schools and hobbies information.

The mapping file
The mapping file is a properties file; it also contains connection information to access the MongoDB database and the NoSQL store: the name of the NoSQL store: Nosql.Store=kvstore; the host and port of the NoSQL store: Nosql.URL=bigdatalite:5000; the MongoDB host: MongoDB.host=localhost; the MongoDB port: MongoDB.port=27017; the MongoDB database: MongoDB.DB=gadb

Mapping principles
Define the main collection, its fields and its main table mapping. For each field define its type and its mapping value (note that this can be a recursive step). For each table define the primary key index components.

Mapping extracts
Mapping the collection and table with its primary keys:
mongo.collection=family
mongo.collection.map=FAMILY
FAMILY.indexcols=LASTNAME,FIRSTNAME
indexcols is the keyword that introduces the comma-separated list of columns of the key; order is important. The prefix before indexcols is a NoSQL table name.

Family fields:
family.fields=lastname,firstname,gender,childrens
family.firstname.type=string
family.firstname.map=FIRSTNAME
family.childrens.type=collection
family.childrens.map=CHILDREN
fields is the keyword that introduces the comma-separated list of fields of a collection. For each field, type corresponds to the type of a column in a NoSQL table (string, integer, long, float, double or boolean are accepted). Two other values are used: array or collection. array is for lists of basic types, collection is for more complex collections. When type is a basic type, map indicates a column of the mapped table; when the type is array or collection, map introduces a new table.
Children mappings:
CHILDREN.indexcols=NAME
childrens.fields=name,schools,hobbies
childrens.name.type=string
childrens.name.map=NAME
childrens.schools.type=collection
childrens.schools.map=SCHOOLS
childrens.hobbies.type=array
childrens.hobbies.map=HOBBIES

School mappings:
schools.fields=street,id
schools.indexcols=ID
street and id are basic string fields; their type and map are not shown.

Hobbies mappings:
hobbies.fields=hobbies
hobbies.hobbies.type=string
hobbies.hobbies.map=HOBBY
HOBBIES.indexcols=HOBBY
children.hobbies is an array of strings mapped to the child table HOBBIES; there is no name in the main collection for the field, so I've chosen to use hobbies (the name of the collection) as the field name to be able to define a mapping.

Tables generated
Get child tables from FAMILY:
kv-> show tables -parent FAMILY
Tables:
 FAMILY.CHILDREN
 FAMILY.CHILDREN.HOBBIES
 FAMILY.CHILDREN.SCHOOLS

Get table indexes:
kv-> show indexes -table FAMILY
Indexes on table FAMILY
FAMILYIndex (LASTNAME, FIRSTNAME)
kv-> show indexes -table FAMILY.CHILDREN
Indexes on table FAMILY.CHILDREN
CHILDRENIndex (NAME)
kv-> show indexes -table FAMILY.CHILDREN.SCHOOLS
Indexes on table FAMILY.CHILDREN.SCHOOLS
SCHOOLSIndex (ID)
kv-> show indexes -table FAMILY.CHILDREN.HOBBIES
Indexes on table FAMILY.CHILDREN.HOBBIES
HOBBIESIndex (HOBBY)

Getting data from tables
Get our example family:
kv-> get table -name FAMILY -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"FIRSTNAME":"lena","LASTNAME":"clark","GENDER":"W"}

Get our family children:
kv-> get table -name FAMILY.CHILDREN -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy"}

Get our family children schools:
kv-> get table -name FAMILY.CHILDREN.SCHOOLS -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","STREET":"134, mice street","ID":"Saint appearance"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","STREET":"90, pine street","ID":"Saint volume"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","STREET":"168, merely street","ID":"Saint slipped"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","STREET":"227, thread street","ID":"Saint discovery"}

Get our family children hobbies:
kv-> get table -name FAMILY.CHILDREN.HOBBIES -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","HOBBY":"photo"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","HOBBY":"soccer"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","HOBBY":"piano"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","HOBBY":"tennis"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","HOBBY":"football"}
{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","HOBBY":"guitar"}

Running the example
Jar files needed:
MongoJavaDriver: mongo-java-driver-3.0.0.jar (this is the version we have used)
NoSQL client: kvclient.jar (it should be a version containing tables, 12.3+; it has been tested with 3.3.4)
Main java class: mongoloadnosql.MongoDB2Nosql (java source code is here)

Parameters
The tool has 5 parameters:
-limit <integer>, number of documents to load, 0 is for all the documents
-skip <integer>, offset of the first document to load, similar to the skip function in MongoDB, 0 means no skip of documents
-mapfile <file>, properties file to load
-create [true|<anything else than true>], if true the NoSQL API functions for creation of tables and indexes are issued
-insert [true|<anything else than true>], if true the NoSQL API functions for insertion are issued

Launch command
This command creates the NoSQL tables and indexes if they do not exist, and inserts all the collection items using the properties file mappingfam.properties:

java -classpath <tool_dir>/classes:<KVHOME>/lib/kvclient.jar:<MONGODB_CLASSPATH>/mongo-java-driver-3.0.0.jar mongoloadnosql.Mongo2Nosql -limit 0 -skip 0 -mapfile mappingfam.properties -create true -insert true

Caveats and warnings
Currently there is no way to map MongoDB references (neither referenced relationships nor DBRefs). Fields should be presented in the order defined by their primary keys: (lastname,firstname) instead of (firstname,lastname). The attached java code is just to illustrate how to import/migrate MongoDB data into NoSQL tables in an efficient and consistent way; it has not been tested in all kinds of situations and it is not intended to be free of bugs.

Bonus
The following mapping file allows you to map MongoDB documents having the structure of a post. In this case there is an embedded object "origine", defined as {"owner" : "gus","site" : "recent_safety.com"}, which is not a collection. There is no primary key other than the MongoDB '_id' field. Enjoy trying this example also.


Uploading NoSQL tables from Golden Gate User Exits

Golden Gate and NoSQL Integration
The aim of this post is to illustrate how to use GoldenGate to stream relational transactions to Oracle NoSQL 12.3.4. We follow the structure of the post which illustrated how to use GoldenGate to stream data into HBase. As shown in the diagram below, integrating the database with NoSQL is accomplished by developing a custom handler using Oracle GoldenGate's and NoSQL's Java APIs. The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions into the NoSQL table. You can find the Java code for the handler at this Github repository in folder StreamFromGG.

The steps to generate and test an example are:
Prepare the database to stream data from a table
Define a NoSQL table
Define the pump extract parameter file for the database
Define the extract parameter file and the adapter properties file for NoSQL
Register the extract process
Start the GoldenGate extract processes
Do some data manipulation on the database and verify the content of the NoSQL table

Let's take an example.

Prepare the database to stream data from a table
This part is not detailed; let's say that a database user is defined on Oracle 12c to allow GoldenGate transactional streaming. The database parameters are set to log transactional SQL commands in the appropriate way to satisfy GoldenGate requirements. We will focus on the database table T2 from the gha schema, whose definition is:

CREATE TABLE "GHA"."T2"
    ( "ID" NUMBER,
      "CREATED" TIMESTAMP (6),
      "NOM" VARCHAR2(32 BYTE),
      "VILLE" VARCHAR2(128 BYTE),
      CONSTRAINT "PK_T2" PRIMARY KEY ("ID", "CREATED") )

Define a NoSQL table
After connecting to the NoSQL store, the following commands create the table T2:

table create -name T2
# Add table fields
add-field -name ID -type STRING
add-field -name NOM -type STRING
add-field -name CREATED -type STRING
add-field -name VILLE -type STRING
# Assign a field as primary key
primary-key -field ID -field CREATED
shard-key -field ID
exit
# Add table to the database
plan add-table -wait -name T2

Define the pump extract parameter file for the database
The extract for the database first requires the use of the defgen utility to create what is called a data definition file, which contains the definition of the source table. The content of the extract parameter file is:

EXTRACT E_ghat2
TARGETDEFS ./dirsql/t2.sql
SETENV (ORACLE_SID=cdb)
userid c##ogg, password ogg
exttrail /u01/ogg/dirdat/T2
GETUPDATEBEFORES
table orcl.gha.t2 TARGET gha.t2;

The extract name is E_ghat2, the table definition file is t2.sql, the Oracle user for the transactional streaming is c##ogg, trail files generated are prefixed with T2, and the container of the schema gha is orcl.

Define the extract parameter file and the adapter properties file for NoSQL
When using GoldenGate Java adapters, there are two files: one defines the extract parameters, the other gives the Java-specific properties for the adapter (the default name for this file is <extract_name>.properties; if a different name is used it should be given in the extract parameters). Our extract name is nosqlt2.
Part of the content of nosqlt2.properties is:

jvm.bootoptions= -Xms64m -Xmx512M -Dlog4j.configuration=log4j.properties -Djava.class.path=dirprm:/u01/nosql/kv-ee/lib/jackson-core-asl.jar:/u01/nosql/kv-ee/lib/jackson-mapper-asl.jar:/u01/nosql/kv-ee/lib/avro.jar:/u01/ogg/ggjava/oggnosql.jar:/u01/nosql/kv-ee/lib/kvclient.jar:/u01/ogg/ggjava/ggjava.jar:/usr/lib/hadoop/client/commons-configuration-1.6.jar:/etc/hadoop/conf:/usr/lib/hadoop/client/commons-cli.jar
#Nosql Handler.
gg.handlerlist=nosqlhandler
gg.handler.nosqlhandler.type=com.goldengate.delivery.handler.nosql.NosqlHandler
gg.handler.nosqlhandler.NosqlStore=kvstore
gg.handler.nosqlhandler.NosqlUrl=bigdatalite:5000
gg.handler.nosqlhandler.NosqlTable=T2
gg.handler.nosqlhandler.NosqlCols=ID,CREATED,NOM,VILLE
gg.handler.nosqlhandler.NosqlPKCols=ID,CREATED
gg.handler.nosqlhandler.NosqlShardCols=ID
gg.handler.nosqlhandler.NosqlMappings=ID,ID;CREATED,CREATED;NOM,NOM;VILLE,VILLE

The meaning of these properties is:
jvm.bootoptions gives the classpath for the NoSQL Java classes, including the JSON handling jars and the jar for the NoSQL adapter
gg.handlerlist gives the list of handlers; in this case nosqlhandler will be used to identify the properties
gg.handler.nosqlhandler.type gives the class used as the adapter
gg.handler.nosqlhandler.NosqlStore gives the name of the NoSQL store to connect to
gg.handler.nosqlhandler.NosqlUrl gives the NoSQL store url (hostname:port)
gg.handler.nosqlhandler.NosqlTable gives the name of the table
gg.handler.nosqlhandler.NosqlCols gives a comma-separated list of the NoSQL table columns
gg.handler.nosqlhandler.NosqlPKCols gives a comma-separated list of the NoSQL table primary key columns
gg.handler.nosqlhandler.NosqlShardCols gives a comma-separated list of the NoSQL table shard columns (should be a non-void subset of the primary key columns)
gg.handler.nosqlhandler.NosqlMappings gives a semicolon-separated list of mapping pairs (source column,target column)

The adapter implementation of NoSQL data manipulation (delete, update, create) uses the shard column values to batch operations into the NoSQL database. The execution of the batched operations is done only when the stored shard value changes.

Register the extract process
Use the ggsci utility to issue the following commands (replacing <OGG_HOME> with its real value):

add extract E_GHAT2 ,integrated tranlog, begin now
add exttrail <OGG_HOME>/dirdat/T2 , extract E_GHAT2 , megabytes 10
register extract E_GHAT2 database container (orcl)
add extract NOSQLT2, exttrailsource <OGG_HOME>/dirdat/T2

Start the GoldenGate extract processes
Use the ggsci utility to start e_ghat2 and nosqlt2, then verify that the processes are running:

GGSCI (bigdatalite.localdomain) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     E_GHAT2     00:00:02      00:01:07
EXTRACT     RUNNING     NOSQLT2     49:22:05      00:01:56

Data manipulation and verifications between Oracle 12c and NoSQL
Get the table count on NoSQL:
kv-> aggregate table -name t2 -count
Row count: 4329

Delete data from t2 on Oracle 12c:
delete t2 where id = 135
commit
2 rows deleted

Recompute the table count on NoSQL:
kv-> aggregate table -name t2 -count
Row count: 4327

Note that the last batch of NoSQL operations is flushed when the extract nosqlt2 is stopped.


Forrester Wave places NoSQL Database among the leaders

We are very pleased that Oracle NoSQL Database has been recognized as one of the leaders in the key-value NoSQL product category by Forrester Research. Please see http://www.oracle.com/us/corporate/analystreports/forrester-wave-nosql-2348063.pdf for the full report. In the past few years, we’ve witnessed growing adoption of NoSQL technologies to address specific data management problems. In many of the early adopter scenarios, the NoSQL applications were developed and managed as self-contained, standalone repositories of semi-structured data. In recent months, it has become clear that such data silos are very expensive to implement and maintain. Big data and NoSQL users now understand that well integrated NoSQL and SQL systems are the key to effective data management in today’s world. An integrated set of products for managing NoSQL and relational data is critical for delivering business value in a cost effective manner. Oracle NoSQL Database is fully integrated with Oracle Database and related technologies, thus making it an excellent choice for enterprise-grade, mission-critical NoSQL applications. As mentioned in the Forrester Wave report, “Many Oracle customers use Oracle NoSQL to balance the need for scale-out workloads of simpler key-value data, with the rich set of relational data management capabilities needed in their core business systems, or when supporting new applications that have frequently changing key-value data, such as profiles for fraud, personalization, and sensor data management”.


Using R to plot data from NoSql tables

Oracle NoSQL tables are relatively new. The communication between those tables and other Big Data systems and tools is under construction. There is a package (rkvstore) allowing access to NoSQL kv data from R, but this package does not support tables. This paper presents a way to access R from Java and to work with table data from NoSQL using the R package Rserve, and shows how to generate a plot of this data.

RServe
You need to install the Rserve package into R; it can be found here (see Download/Files). To launch the R server, just install the Rserve package:

R CMD INSTALL <path to the Rserve.tar.gz source package>

Run R:
> library("Rserve")
> Rserve()

The R server gets data from a Java program and returns the result.

NoSQL Tables
To create and load NoSQL tables refer to this post.

Java Code
The main steps of the Java program are: connect to the kvstore; get data from a table (via an iterator); create an R session; transform data from the iterator to an R format; assign data to R variables; generate the R elements to make the plot; display the data; disconnect. You can find the Java source code for this blog entry here.

Usage
Run Java with the following class and arguments:

nosql.r.RViewTablesBlog kvstore kvhost:kvport tablename flightrangebegin flightrangeend

Use -1 for flightrangebegin and flightrangeend to ignore the flight range. Add kvclient.jar from NoSQL and REngine.jar, RserveEngine.jar from Rserve to the classpath.

Results
R returns an image similar to the one below. Enjoy!
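For readers who want to see the shape of the code before downloading it, here is a minimal sketch of those steps. It is not the source linked above: the store, table and field names (flightTestExtract, timeref, value) are assumptions borrowed from the table post referenced earlier, and the RConnection calls come from Rserve's Java client (REngine.jar/RserveEngine.jar), so treat the whole thing as an outline to adapt.

import java.util.ArrayList;
import java.util.List;

import org.rosuda.REngine.Rserve.RConnection;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.PrimaryKey;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;
import oracle.kv.table.TableIterator;

public class PlotFromNosqlSketch {
    public static void main(String[] args) throws Exception {
        // Store, host/port, table and field names are assumptions for illustration.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "localhost:5000"));
        TableAPI tableAPI = store.getTableAPI();
        Table table = tableAPI.getTable("flightTestExtract");

        // Scan the table and keep two numeric columns for the plot.
        List<Double> xs = new ArrayList<Double>();
        List<Double> ys = new ArrayList<Double>();
        PrimaryKey allRows = table.createPrimaryKey();   // empty key = full table scan
        TableIterator<Row> it = tableAPI.tableIterator(allRows, null, null);
        while (it.hasNext()) {
            Row r = it.next();
            xs.add((double) r.get("timeref").asLong().get());
            ys.add((double) r.get("value").asInteger().get());
        }
        it.close();

        double[] x = new double[xs.size()];
        double[] y = new double[ys.size()];
        for (int i = 0; i < x.length; i++) { x[i] = xs.get(i); y[i] = ys.get(i); }

        // Hand the vectors to R through Rserve and draw the plot to a PNG file.
        RConnection r = new RConnection();               // assumes Rserve() is already running
        r.assign("x", x);
        r.assign("y", y);
        r.eval("png('/tmp/nosql_plot.png'); plot(x, y, type='l'); dev.off()");
        r.close();
        store.close();
    }
}

The real program adds the flight-range filtering and returns the image to the caller, but the assign/eval pattern above is the heart of the Java-to-R hand-off.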


Big Data SQL for Oracle NoSQL Database (part 1)

Introduction: We recently announced Oracle Big Data SQL support for Oracle NoSQL Database. This feature allows Oracle Database users to connect to external data repositories like Oracle NoSQL Database and/or Hadoop, and enables them to fetch data from any or all of the repositories (at once) through a single SQL query. We believe this is a significant step forward, as it streamlines the adoption of big data technologies alongside the traditional technology stack. If you are following the big data space, you would agree that everyone likes the scalability, availability, performance, and cost aspects of it, but almost everyone struggles to modify their enterprise applications to work with big data technologies because those technologies don't talk the same language as traditional database systems. For innovation to happen fast, what is required is to lower the adoption barrier for the developers and data scientists who have been at the core of this data revolution. By providing the same SQL declarative language for relational and non-relational data, we have made it simpler for everyone to adopt these big data technologies. SQL on Oracle NoSQL Database: Having an access layer using a declarative language is definitely a great thing, but a common trend these days is that every NoSQL vendor writes a custom SQL-like layer. If you want to use your existing application to access relational data as well as non-relational data stored in the NoSQL database, you cannot write a single SQL query to access these two separate stores, which again makes things complicated from an application development and manageability point of view. Oracle Big Data SQL for Oracle NoSQL Database fills this gap by providing exactly the same SQL access layer to both stores, so the queries look exactly the same no matter which store you access. To achieve this, Oracle Big Data SQL requires that Oracle NoSQL Database support the ability to access its data via Hive. Thus, plugging Oracle NoSQL Database into Oracle Database is as simple as performing steps 1 (creating a NoSQL store and a table), 2 (configuring Hive to access the NoSQL table) and 3 (configuring Oracle Database to talk to the HCatalog). Please note that Big Data SQL provides much better query performance than running a query directly on generic external tables because, with the former approach, computation (for selecting and formatting data) is moved to where the data is, as opposed to moving all the data to the Oracle Database node first and then applying the predicates. The same proprietary cell technology (developed for Exadata) is used for applying the query predicates in parallel to all the distributed NoSQL partitions, making the overall query run faster. In the rest of the article I will give you the technical details on how to perform the first two steps so you can get hands-on experience with this feature. Figure 1: Flow diagram showing steps to access data in Oracle NoSQL Database using Hive queries. Setup Step 1: Create a TABLE in Oracle NoSQL Database and load some data that we would like to access from Hive later on. For this demonstration we will be using KVLITE, but you can very well deploy a real cluster by following the steps covered in this webinar. Please remember to set KVHOME as the environment variable that points to the directory where you unzipped the latest Oracle NoSQL Database binaries (download here) and KVROOT (e.g. /u01/kvroot) to where you would like metadata files to be stored.
$ java -jar $KVHOME/lib/kvstore.jar kvlite -root $KVROOT -host localhost &

Once you have the Oracle NoSQL Database cluster deployed (in our case it is a single instance), connect to the administration shell using the runadmin command, and then from the KV prompt connect to the store (kvstore in my case).

$ java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host localhost
kv-> CONNECT STORE -name kvstore

Create the MOVIE table with a few columns (to capture movie specific information):

kv-> EXEC "CREATE TABLE MOVIE( \
 id INTEGER, \
 original_title STRING,\
 overview STRING,\
 poster_path STRING,\
 release_date STRING,\
 vote_count INTEGER,\
 runtime INTEGER,\
 popularity DOUBLE,\
 genres STRING,\
 PRIMARY KEY (id) )"

SHOW the table to confirm it is created successfully, and DESCRIBE it to make sure all the column types are appropriate as well.

kv-> EXEC 'SHOW TABLES'
kv-> EXEC 'DESC AS JSON TABLE MOVIE'

Next we will load some MOVIE data into the table. You can download a movie.dat file with details of about 10 movies represented as JSON strings. We will directly load these strings into the MOVIE table.

kv-> PUT TABLE -name MOVIE -file /tmp/movie.dat
Inserted 10 rows in movie table

You can now COUNT the rows and GET the rows to make sure the data is indeed there.

kv-> AGGREGATE TABLE -name MOVIE -count
Row count: 10
kv-> GET TABLE -name MOVIE

Step 2: Create a Hive external table using the Storage Handler for Oracle NoSQL Database (shipped in release 3.2.5). Before we begin doing anything with Hive we first need to install Hadoop on our system. I have already installed Apache hadoop-2.2.0 and there is a good article that you can follow to do it yourself. After Hadoop is installed and tested, download the Hive archive from here. I am using apache-hive-0.14.0 at the time of this article, but you can download any later version. Unzip the archive and create the environment variable HIVE_HOME to point to this directory. Here are some of the environment variables that are set in my ~/.bashrc file:

export HADOOP_HOME=/u01/home/hadoop/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export KVHOME=/u01/nosql/kvhome
export HIVE_HOME=/u01/home/hadoop/apache-hive-0.14.0
export HIVE_AUX_JARS_PATH=$KVHOME/lib/kvclient.jar
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME

NOTICE that we have HIVE_AUX_JARS_PATH set to kvclient.jar. We need to set this variable to make sure Hive can find the Storage Handler for Oracle NoSQL Database from the client driver itself. Now the only thing that is left is creating an external table to mimic the table that exists in Oracle NoSQL Database.
Let's connect to Hive first and then create the table using this DDL:

$ hive
hive> CREATE EXTERNAL TABLE IF NOT EXISTS MOVIE( id INT, original_title STRING, overview STRING, poster_path STRING, release_date STRING, vote_count INT, runtime INT, popularity DOUBLE, genres STRING) STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' TBLPROPERTIES ( "oracle.kv.kvstore"="kvstore", "oracle.kv.hosts"="localhost:5000", "oracle.kv.hadoop.hosts"="localhost", "oracle.kv.tableName"="MOVIE");
Time taken: 0.174 seconds

Note: the STORED BY clause is where we define the StorageHandler class for Oracle NoSQL Database, and TBLPROPERTIES is where we define the name of the store, the host name and port where the storage node agent is listening (in our case we have only one node), and the name of the table in NoSQL DB. We have MOVIE as the table there, so that is what we use as the oracle.kv.tableName field. That is all one needs to do to configure Hive to work with Oracle NoSQL Database. Pretty simple, right?

Step 2b: Now that we have the NoSQL DB and Hive setup done, we just need to test running Hive queries. Here are some simple queries to fetch MOVIE data using the Hive Query Language (HQL):

hive> SELECT id, original_title, release_date, genres FROM MOVIE;
hive> SELECT id, original_title, release_date, genres FROM MOVIE WHERE release_date > 1990;
hive> SELECT id, original_title, release_date, genres FROM MOVIE WHERE genres LIKE '%Comedy%';

Some other queries that would require M/R jobs to be spun off to get the results:

hive> SELECT count(*) FROM MOVIE WHERE genres LIKE '%Drama%';
hive> SELECT id, original_title, release_date, genres FROM MOVIE WHERE genres LIKE '%Drama%' ORDER BY release_date;
hive> SELECT MAX(release_date) FROM MOVIE;

Summary: Using the Hive StorageHandler from the latest release, one can access NoSQL data as Hive external tables. The metadata of these external tables is stored in the HCatalog, which is later referenced by Oracle Big Data SQL to decide where to dispatch the queries in parallel. The process of configuring Oracle NoSQL Database has become so simple (using the out-of-the-box Hive StorageHandler) that it takes only a couple of steps to complete. In part 2 of the blog I will talk about how the Hive metastore can be configured with Oracle Database to finally run SQL queries on NoSQL data.


Using Nosql Tables with Spark

The goal of this post is to explain how to use NoSQL tables and how to put their content into a file on HDFS using the Java API for Spark. In HDFS, the table content will be presented in a comma separated style (CSV). Oracle's latest Big Data Appliance, "X4-2", offers Cloudera Enterprise Technology software including Cloudera CDH, and Oracle NoSQL Database including tables. The Cloudera part offers several ways of integrating with Spark (see Using Nosql and Spark): standalone or via Yarn (see Running Spark Applications). The NoSQL part allows the use of tables. Tables can be defined within the NoSQL console by issuing the following command:

java -Xmx256m -Xms256m -jar $KVHOME/lib/kvstore.jar runadmin -host <host> -port <store port> -store <store name>

There are two parts to defining and creating a table. Define, which includes the table name, table fields, primary key, and shard key (which is a "prefix" of the primary key), and ends with the keyword "exit":

table create -name flightTestExtract
add-field -name param -type STRING
add-field -name flight -type STRING
add-field -name timeref -type LONG
add-field -name value -type INTEGER
primary-key -field timeref -field param -field flight
shard-key -field timeref
exit

Plan, which allows table creation and index definition and creation:

plan add-table -wait -name flightTestExtract
plan add-index -wait -name flightIndex -table flightTestExtract -field flight -field param -field timeref
plan add-index -wait -name paramIndex -table flightTestExtract -field param -field flight -field timeref

Inserting into the table can be done with the put command:

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":1764248535}"
put table -name flightTestExtract -json "{\"param\":\"12\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":-1936513330}"
put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"6\",\"timeref\":61000000000013,\"value\":1600130521}"
put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000013,\"value\":478674806}"

The latest patch of NoSQL, 3.1.7, has some new Java classes that can be used to get table data into Hadoop. The class oracle.kv.hadoop.table.TableInputFormat can be used to build a Spark JavaPairRDD:

JavaPairRDD<PrimaryKey, Row> jrdd = sc.newAPIHadoopRDD(hconf, TableInputFormat.class, PrimaryKey.class, Row.class);

The oracle.kv.table.PrimaryKey class corresponds to the fields of the primary key of the table, for example in JSON style: {"timeref":61000000000013, "param":"11","flight":"8"}. The oracle.kv.table.Row class corresponds to the fields of the table row, for example in JSON style: {"param":"11","flight":"8","timeref":61000000000013,"value":478674806}. If we want to save the content of the table on HDFS in a CSV style we have to:
- apply a flatMap on the rows of the RDD (flatMap(func): each input item can be mapped to 0 or more output items, so func should return a Seq rather than a single item), and
save the result on hdfs The following inner class defines the map:      static class FlatMapRow_Str implements FlatMapFunction<Row, String> {         @Override         public Iterable<String> call(Row s) {             List<String> lstr = s.getFields();             String tabedValues = "";             for (String field : lstr)                 tabedValues += s.get(field) + ",";             return Arrays.asList(tabedValues);         }     } The code to do the job is:  //Obtain the Row RDD        JavaRDD<Row> rddvalues = jrdd.values(); //Obtain the csv style form of the RDD  JavaRDD<String> csvStr = rddvalues .flatMap(new FlatMapRow_Str()); //Save the results on hdfs  csvStr.saveAsTextFile(pathPrefix + "/" + tableName + "csvStr"); The last step is to test using Yarn: spark-submit --master yarn --jars /u01/nosql/kv-ee/lib/kvclient.jar --class table.SparkNosqlTable2HadoopBlog /u01/nosql/kv-ee/examples/table/deploy/sparktables.jar <nosql store name> <nosql store url> <table name> <path prefix> <nosql store url> is <store host>:<store port>  You can get the java source code here. 
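The snippet above uses a Hadoop Configuration object (hconf) carrying the store connection details, which the post does not show being built. Here is a minimal sketch of how that wiring might look, assuming TableInputFormat honors the same oracle.kv.* property names used by the Hive storage handler earlier in this blog; the store name, host:port and table name below are placeholder values, not part of the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import oracle.kv.hadoop.table.TableInputFormat;
import oracle.kv.table.PrimaryKey;
import oracle.kv.table.Row;

public class NosqlTableRddSketch {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("NosqlTable2Hdfs");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Connection details for the store and table (assumed property names, placeholder values).
        Configuration hconf = new Configuration();
        hconf.set("oracle.kv.kvstore", "kvstore");                     // store name
        hconf.set("oracle.kv.hosts", "bigdatalite.localdomain:5000");  // <store host>:<store port>
        hconf.set("oracle.kv.tableName", "flightTestExtract");         // table to read

        // Build the key/value RDD over the NoSQL table, as in the post above.
        JavaPairRDD<PrimaryKey, Row> jrdd =
            sc.newAPIHadoopRDD(hconf, TableInputFormat.class, PrimaryKey.class, Row.class);

        System.out.println("Rows read: " + jrdd.count());
        sc.stop();
    }
}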


Is your NoSQL ready for SSD - benchmark results from SanDisk

It’s well known that the killer NoSQL use case is scalability for large data sets under high concurrency. However, what is less talked about is the fact that performance is often the key driver. Given a high performance profile, NoSQL then provides the ability to scale that performance out linearly by adding more servers. If not for the high performance profile to begin with, often times an alternative technology like a distributed cache would be used instead, fronting a more traditional database technology.

Given that performance is often a key driver, it’s no surprise to find that flash based storage devices are increasingly common for NoSQL deployments. In fact, popular cloud providers of NoSQL services (Amazon and Google) provide those services exclusively on SSD backed infrastructure. The SSD form factor gives not only the best performance, but also more manageable service level agreements, by taking disk I/O bottlenecks out of the equation. At Oracle, we are also seeing customers deploy the Oracle NoSQL Database on servers backed by SSD storage.

Unlike Oracle NoSQL Database, not all NoSQL databases are capable of taking advantage of SSDs. Probably the biggest offenders are those aligned with the Hadoop stack, based on HDFS, which uses a large block based scanning technique that is ill suited to SSD optimization. At Oracle, we’ve seen customers benchmark against technologies like HBase only to find that with large clusters, the cost of deploying on non-SSD backed storage is quite simply cost prohibitive. In one customer benchmark for a mobile profiling application, we saw a customer predicting the need for dozens of HBase servers to accommodate the same workloads that could be achieved on a standard high availability configuration (3 servers) for Oracle NoSQL when backed by SSD. Forget about the cost of software, folks (Oracle NoSQL is practically free, providing annual subscription based support); imagine having to buy 50 servers to stand up your NoSQL cluster just to run what many see as “the Hadoop” software ecosystem. It seems to me people ought to be seriously re-thinking their Big Data software choice when it comes to Big Data solutions that require both low latency NoSQL and batch based Hadoop MapReduce.

SanDisk has recently benchmarked Oracle NoSQL Database using the standard Yahoo Server Scalability Benchmark on its latest generation SanDisk CloudSpeed SATA SSDs. This makes for an interesting read and highlights the advantages of using a NoSQL database that is designed to work with this latest generation of storage technology. http://itblog.sandisk.com/hyperscale-databases-testing-oracle-nosql-using-sandisk-cloudspeed-ssds-vs-hdds/


Using Nosql and Spark (How to Start)

Spark is an open-source data analytics cluster computing framework, built outside of Hadoop’s two-stage MapReduce paradigm but running on top of HDFS. Because of its successful approach, Spark has quickly been adopted and is an attractive choice for the future of data processing in Hadoop. The question of how to link NoSQL and Spark often concerns Big Data architects and developers. Let's take a quick look at this question.

Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. RDDs can be created by referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat. Good news: NoSQL can be integrated with Spark-Java using KVInputFormat and KVAvroInputFormat, two NoSQL Java classes which extend the Hadoop abstract class InputFormat.

How to proceed:

Get Spark resources
Define the configuration parameters for NoSQL (connection, store name, key prefix)
Define the Spark resilient distributed dataset to get data from NoSQL
Dispatch the NoSQL key-value subset into a dataset for keys and a dataset for values
Do some computations
Release Spark resources

Get Spark resources

SparkConf sparkConf = new SparkConf().setAppName("SparkKVInput"); //AppName allows the tracking of the job
JavaSparkContext sc = new JavaSparkContext(sparkConf);

Define the configuration parameters

Configuration hconf = new Configuration();
hconf.set("oracle.kv.kvstore", "kvstore");
hconf.set("oracle.kv.parentKey", "/H/10"); // just a major key prefix
hconf.set("oracle.kv.hosts", "bigdatalite.localdomain:5000");

Define the Spark resilient distributed dataset to get data from NoSQL

JavaPairRDD<Text,Text> jrdd = sc.newAPIHadoopRDD(hconf, KVInputFormat.class, Text.class, Text.class);

The dataset parameters are the configuration, the InputFormat extension, the Java class for keys and the Java class for values. The class of keys and values is stated in the javadoc of KVInputFormat:

public class KVInputFormat extends KVInputFormatBase<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

Dispatch the NoSQL key-value subset into a dataset for keys and a dataset for values

Setting the datasets for keys and values is easy:

JavaRDD<Text> rddkeys = jrdd.keys();
JavaRDD<Text> rddvalues = jrdd.values();

They cannot be manipulated directly, though, because Spark does not know how to serialize Text. A mapping is needed to transform each Text dataset into a String dataset (a minimal sketch of this mapping is shown at the end of this post); the code for keys and values is essentially the same.

Do some computations

Print keys and values:

List<String> keys = strkeys.collect();
List<String> values = strvalues.collect();
for (int idx = 0; idx < values.size(); idx++) {
    String val = values.get(idx);
    String key = keys.get(idx);
    System.out.println(key.toString() + ":" + val.toString());
}

Release Spark resources

sc.stop(); //release JavaSparkContext

How to test it

Put some data on the kvstore:

put kv -key /T/1/1/-/1 -value V11_1
put kv -key /T/1/1/-/2 -value V11_2
put kv -key /T/1/1/-/3 -value V11_3
put kv -key /T/1/2/-/3 -value V12_3
put kv -key /T/2/2/-/1 -value V22_1
put kv -key /T/2/2/-/2 -value V22_2

Generate a jar (spark.jar) containing the NoSQL kvclient.jar and kvavro.jar and the class with the Spark code, SparkKVInput. This class takes three parameters, which feed the configuration properties: oracle.kv.kvstore, oracle.kv.hosts and oracle.kv.parentKey.

An example of the calling command on a bigdatalite 4.1 is:

spark-submit --class nosql.examples.oracle.com.SparkKVInput <path to spark.jar location>/spark.jar kvstore bigdatalite.localdomain:5000 /T/1

The results are:

14/10/21 11:26:54 INFO SparkContext: Job finished: collect at SparkKVInput.java:62, took 0.341590587 s
/T/1/1/-/1:V11_1
/T/1/1/-/2:V11_2
/T/1/1/-/3:V11_3
/T/1/2/-/3:V12_3
14/10/21 11:26:54 INFO SparkUI: Stopped Spark web UI at http://bigdatalite.localdomain:4040

Hope this first introduction to the use of NoSQL key/values with Spark will help you go deeper with Spark as an engine to manipulate NoSQL data.
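The Text-to-String mapping referenced above, which makes the key and value RDDs serializable by Spark, can be written as a plain map over the rddkeys and rddvalues RDDs built earlier in the post. Here is a minimal sketch using the same Spark Java Function API as the rest of the post (not necessarily the original code):

import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

// Convert the non-serializable Text elements into plain Strings.
JavaRDD<String> strkeys = rddkeys.map(new Function<Text, String>() {
    @Override
    public String call(Text t) {
        return t.toString();
    }
});

// The values are converted in exactly the same way.
JavaRDD<String> strvalues = rddvalues.map(new Function<Text, String>() {
    @Override
    public String call(Text t) {
        return t.toString();
    }
});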


Loading into Nosql using Hive

The main purpose of this post is to show how tightly we can tie NoSQL and Hive; the focus is the upload of data into NoSQL from Hive. A previous post (here) discussed the use of Hive external tables to select data from Oracle NoSQL. We used a HiveStorageHandler implementation. We have reworked this implementation to load data from HDFS or a local file system via Hive into NoSQL. Only uploading of text data is currently supported.

Two kinds of data files can be uploaded:

Case 1: Files containing plain text data like the following comma separated lines:

10,5,001,545973390
10,5,010,1424802007
10,5,011,164988888

Case 2: Files containing a JSON field corresponding to a given Avro schema, like the following tab separated lines:

10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}
10 5 174 {"samples": [{"delay": 0, "value": -254460852}, {"delay": 1, "value": -478216539}, {"delay": 2, "value": -1735664690}, {"delay": 3, "value": -1997506933}, {"delay": 4, "value": -1062624313}]}

How to do it?

1. Define the external table
2. Create and load a native Hive table
3. Insert into the external table a selection from the native Hive table

Case 1:

1. Define the external table

CREATE EXTERNAL TABLE MY_KV_PI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)
      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "PI/10/5", "kv.value.type" = "string", "kv.key.range" = "", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore", "kv.key.ismajor" = "true");

2. Create and load a native Hive table

CREATE TABLE kv_pi_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/oracle/hivepath/pi_10_5' OVERWRITE INTO TABLE kv_pi_10_5_load;

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_pi_10_5_table SELECT * FROM kv_pi_10_5_load;

The external table definition specifies a major key and its complete key components; this definition is used when inserting. The flight and sensor values of the data are ignored, and the timeref elements are loaded using the NoSQL operation API for batching the insertions.

Case 2:

1. Define the external table

CREATE EXTERNAL TABLE MY_KV_RI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)
      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "RI/10/5", "kv.value.type" = "avro", "kv.key.range" = "", "kv.key.ismajor" = "true", "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore");

When creating the external table used for upload into NoSQL, a new parameter is used: "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet". It is the NoSQL name for an Avro schema, which is the schema namespace + "." + schema name.

2. Create and load a native Hive table

CREATE TABLE kv_ri_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/oracle/hivepath/ri_10_5' OVERWRITE INTO TABLE kv_ri_10_5_load;

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_ri_10_5_table SELECT * FROM kv_ri_10_5_load;

How to verify the upload? Two possibilities:

a select query on Hive
a get on the kvstore

Let's do it on the NoSQL client command line.

Case 1: Verify a random line existence

kv-> get kv -key /PI/10/5/-/010 -all
/PI/10/5/-/010
1424802007
1 Record returned

Case 2: Verify a random line existence

kv-> get kv -key /RI/10/5/-/173 -all
/RI/10/5/-/173
{
  "samples" : [ {
    "delay" : 0,
    "value" : -914351026
  }, {
    "delay" : 1,
    "value" : 1842307749
  }, {
    "delay" : 2,
    "value" : -723989379
  }, {
    "delay" : 3,
    "value" : -1665788954
  }, {
    "delay" : 4,
    "value" : 91277214
  }, {
    "delay" : 5,
    "value" : 1569414562
  }, {
    "delay" : 6,
    "value" : -877947100
  }, {
    "delay" : 7,
    "value" : 498879656
  }, {
    "delay" : 8,
    "value" : -1245756571
  }, {
    "delay" : 9,
    "value" : 812356097
  } ]
}
1 Record returned

Let's do it on the Hive command line.

Case 1: Verify a random line existence

hive> select * from MY_KV_PI_10_5_TABLE where timeref = "010";
OK
10 5 010 1424802007

Case 2: Verify a random line existence

hive> select * from MY_KV_RI_10_5_TABLE where timeref = "173";
...
OK
10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}

You can get a JDeveloper 12c project here. We have made a round trip between NoSQL and Hive: key-value subsets of a NoSQL database can be viewed using the select query language of Hive, and data from Hive tables can be uploaded into NoSQL key-value pairs.


Revisiting Nosql into Hive External Tables

About two years ago, in the Data Integration blog, David Allan wrote about the integration between Oracle NoSQL and Hive. In his post David mentioned the work of Alexander Vilcek to implement a HiveStorageHandler to access NoSQL data via Hive SQL select commands. Unfortunately the Java code no longer compiles with the NoSQL Java API (3.0+). I've modified the Java source code to comply with the 3.0 Java KV API. The new Hive storage handler has new definition possibilities.

The current limitations are the same as the ones stated by the original Vilcek project code:

Supports only external non-native Hive tables.
Writing data to Oracle NoSQL DB is not supported.
Parsing of Hive SerDe properties is still very rudimentary, and spaces between NoSQL DB key definitions in the key mapping properties of the Hive table create statement will cause key names to be misinterpreted.
Column names and types specified in the Hive table definition are ignored; only the NoSQL DB Major and Minor Key mappings in the Hive table create statement define the column names.
A NoSQL DB Value for a given key is always interpreted as a string in the Hive table.

Hive CREATE TABLE syntax:

CREATE EXTERNAL TABLE <hive_table_name> (column_name column_type, column_name column_type, ...)
STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "<majorKey1,majorKey2,...>", "kv.minor.keys.mapping" = "<minorKey1,minorKey2,...>", "kv.minor.metadata" = "true|false", "kv.key.prefix" = "H", "kv.value.type" = "<avro|other>", "kv.key.range" = "<key restriction range>", "kv.key.ismajor" = "true|false", "kv.host.port" = "<kvstore hostname>:<kvstore port number>", "kv.name" = "<kvstore name>");

Example 1: Data stored in Oracle NoSQL Database:

/Smith/Bob/-/birthdate: 05/02/1975
/Smith/Bob/-/phonenumber: 1111-1111

For this kind of data the minor keys birthdate and phonenumber are metadata (field names) and the value of the key-value pair is the real value for that metadata.

Example 2:

/USAir1/gauge4/-/10-11-2012:12:12:56:77789:45678
/USAir1/gauge4/-/10-11-2012:12:12:56:77790:45640

For this kind of data the minor key is a timestamp for a measure (45678, 45640) of a gauge of a plane. There is no metadata on the minor key.

The new table definitions have the following parameters:

"kv.value.type": when the value type is avro, the value is transformed into a string by using the JSON transformation of the Avro value
"kv.minor.metadata": its true or false value distinguishes between the two uses of key-value pairs, true for Example 1, false for Example 2
"kv.key.prefix": allows defining a starting path for a major key
"kv.key.ismajor": its value is true if the key prefix is a complete major key path, false otherwise
"kv.key.range": used in cases where the key prefix does not correspond to a major key or when there is no key prefix

In a real big data store you should not use null values for both "kv.key.prefix" and "kv.key.range".
Detail of the metadata Example 1:

The following put operations have been done on our store using the kvstore client commands:

put kv -key /H/Smith/Bob/-/birthdate -value 05/02/1975
put kv -key /H/Smith/Bob/-/phonenumber -value 1111-1111
put kv -key /H/Smith/Bob/-/userid -value 1
put kv -key /H/Smith/Patricia/-/birthdate -value 10/25/1967
put kv -key /H/Smith/Patricia/-/phonenumber -value 2222-2222
put kv -key /H/Smith/Patricia/-/userid -value 2
put kv -key /H/Wong/Bill/-/birthdate -value 03/10/1982
put kv -key /H/Wong/Bill/-/phonenumber -value 3333-3333
put kv -key /H/Wong/Bill/-/userid -value 3

A table creation to include all the above insertions is:

CREATE EXTERNAL TABLE MY_KV_TABLE (lastname string, firstname string, birthdate string, phonenumber string, userid string)
      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "lastname,firstname", "kv.minor.keys.mapping" = "birthdate,phonenumber,userid", "kv.minor.metadata" = "true", "kv.key.prefix" = "H", "kv.value.type" = "string", "kv.key.range" = "", "kv.key.ismajor" = "false", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore");

A select * from my_kv_table returns 9 values. The create and select commands need the jar containing nosql.example.oracle.com.NosqlStorageHandler to be part of the Hive lib jars; let's say this jar is kvhive.jar. Any select applied to the table (other than select *) invokes map/reduce procedures and needs Hive add jar commands to target kvclient.jar and kvhive.jar. For example:

hive> SELECT lastname, firstname, collect_set(birthdate)[0], collect_set(phonenumber)[0], collect_set(userid)[0] FROM my_kv_table GROUP BY lastname, firstname;

produces:

OK
Smith Bob 05/02/1975 1111-1111 1
Smith Patricia 10/25/1967 2222-2222 2
Wong Bill 03/10/1982 3333-3333 3

Detail of the non-metadata Example 2:

We have defined an Avro schema to record the list of min and max values of a given measure for a range of time stamps. The Hive table declaration in this case is:

CREATE EXTERNAL TABLE MY_KV_A_TABLE (flight string, sensor string, timeref string, stuff string)
      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'
      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "A/8/11", "kv.value.type" = "avro", "kv.key.range" = "", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore", "kv.key.ismajor" = "true");

In this case the value type specified is avro, and the key prefix A/8/11 is A for the analytic data prefix, 8 for plane number 8 and 11 for gauge measure number 11. A select * from my_kv_a_table returns records of the type:

8 11 W2412 {"values": [{"min": -2.141749606E9, "max": 2.14331348E9}, {"min": -2.118390555E9, "max": 2.090954548E9}, {"min": -2.130274733E9, "max": 2.098153258E9}, {"min": -2.082685648E9, "max": 2.023497823E9}]}

{"values": [{"min": -2.141749606E9, "max": 2.14331348E9}, {"min": -2.118390555E9, "max": 2.090954548E9}, {"min": -2.130274733E9, "max": 2.098153258E9}, {"min": -2.082685648E9, "max": 2.023497823E9}]} is the JSON format of the Avro values stored in the NoSQL database.

The HiveStorageHandler JDeveloper code can be found here.
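For reference, records like those in Example 1 can also be written programmatically instead of through the kvstore client. Here is a minimal sketch using the core key/value API, assuming a store named kvstore reachable at bigdatalite:5000 and the same key layout as the put commands above ("H" as the key prefix, lastname/firstname as the major key, the field name as the minor key); the class name is purely illustrative:

import java.util.Arrays;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;

public class PutProfileExample {
    public static void main(String[] args) {
        // Open a handle to the store (placeholder store name and helper host).
        KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("kvstore", "bigdatalite:5000"));

        // Equivalent of: put kv -key /H/Smith/Bob/-/birthdate -value 05/02/1975
        Key key = Key.createKey(Arrays.asList("H", "Smith", "Bob"),
                                Arrays.asList("birthdate"));
        store.put(key, Value.createValue("05/02/1975".getBytes()));

        store.close();
    }
}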


Using multiDelete for efficient cleanup of old data

In a recent project one of our field engineers (Gustavo Arango) was confronted by a problem: he needed to efficiently delete millions of keys beneath a key space where he did not know the complete major key path, which ended in a time stamp. He quickly discovered a way to efficiently find these major key paths and then use them to perform high speed multi-value deletions without causing unusual chattiness (a call for each key-value pair) on the network. I thought it would be useful to review this solution here and give you a link to a Github example of working code to play with and understand the behavior.

This is possible by using Oracle NoSQL's method storeKeysIterator() on your connection to the cluster. This iterator can be used to obtain a list of store keys in a specific range, in a single call and without loading all of the value data beneath the keys.

First, you need a partial key path:

Key key = Key.createKey("/root/pieceOfKey");

Next you need a range:

KeyRange kr = new KeyRange("aastart", true, "zzfinish", true);

Now you can get your Key iterator (getKVStore() returns a connection handle to the cluster, an instance of KVStore):

Iterator<Key> iter = getKVStore().storeKeysIterator(Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);

This is nice: the storeKeysIterator will run a query against the cluster and return ONLY the keys that start with that partial path, optionally restricted by the range specifier (no need to give a range if you want all the keys).

Now, to efficiently delete many keys with a single call to the cluster, you need a full major key path. So, now that you have the whole set of keys, you can ask each of them for its full major key path and then use that information to do the efficient multiDelete operation, which in Gustavo's case of storing and managing dense time series data meant millions of keys/values being deleted with only a small number of actual network calls and very little data transferred across the network.

boolean enditerate = false;
while (!enditerate) {
    if (iter.hasNext()) {
        Key iterkey = iter.next();
        iterkey = Key.createKey(iterkey.getMajorPath());
        int delNb = getKVStore().multiDelete(iterkey, kr, Depth.PARENT_AND_DESCENDANTS, durability, 0, null);
        res += delNb;
    } else {
        enditerate = true;
    }
}

If you want to get tricky and do this in parallel, you can wrap this in a worker thread and place the initialization of the iterator inside the while loop, effectively treating the iterator like a queue. It would cause a new query to happen every time the loop iterates, but in the meantime, a lot of other threads may have deleted half of that queue.

In fact, there are also parallel versions of the store iterator that will query the entire cluster in parallel using the number of threads that works best given your hardware configuration and the number of shards in your cluster. You can learn more about it in the online documentation.

If you are interested in playing with the ideas in this blog, there is a Github example project in Eclipse that will get you started.
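Putting the snippets above together, a minimal end-to-end sketch might look like the following. The store name, host:port, partial key path and range are placeholders, and the simpler multiDelete overload used here relies on the store's default durability rather than the explicit durability shown in the post:

import java.util.Arrays;
import java.util.Iterator;
import oracle.kv.Depth;
import oracle.kv.Direction;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.KeyRange;

public class MultiDeleteExample {
    public static void main(String[] args) {
        // Placeholder store name and helper host.
        KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("kvstore", "localhost:5000"));

        // Partial key path (/root/pieceOfKey) and range, as described above.
        Key key = Key.createKey(Arrays.asList("root", "pieceOfKey"));
        KeyRange kr = new KeyRange("aastart", true, "zzfinish", true);

        // Fetch only the keys (no values) below the partial path.
        Iterator<Key> iter = store.storeKeysIterator(
            Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);

        long deleted = 0;
        while (iter.hasNext()) {
            // Reduce each returned key to its full major key path,
            // then delete everything beneath it in a single call.
            Key majorKey = Key.createKey(iter.next().getMajorPath());
            deleted += store.multiDelete(majorKey, kr, Depth.PARENT_AND_DESCENDANTS);
        }
        System.out.println("Deleted " + deleted + " key/value pairs");
        store.close();
    }
}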


Master-slave vs. peer-to-peer architecture: benefits and problems

Almost two decades ago, I was a member of a database development team that introduced adaptive locking. Locking, the most popular concurrency control technique in database systems, is pessimistic. Locking ensures that two or more conflicting operations on the same data item don’t “trample” on each other’s toes, resulting in data corruption. In a nutshell, here’s the issue we were trying to address. In everyday life, traffic lights serve the same purpose. They ensure that traffic flows smoothly and, when everyone follows the rules, there are no accidents at intersections.

As I mentioned earlier, the problem with typical locking protocols is that they are pessimistic. Regardless of whether there is another conflicting operation in the system or not, you have to hold a lock! Acquiring and releasing locks can be quite expensive, depending on how many objects the transaction touches. Every transaction has to pay this penalty. To use the earlier traffic light analogy, if you have ever waited at a red light in the middle of nowhere with no one on the road, wondering why you need to wait when there’s clearly no danger of a collision, you know what I mean.

The adaptive locking scheme that we invented was able to minimize the number of locks that a transaction held by detecting whether there were one or more transactions that needed conflicting access; if there were none, you could get by without holding any lock at all. In many “well-behaved” workloads, there are few conflicts, so this optimization is a huge win. If, on the other hand, there are many concurrent, conflicting requests, the algorithm gracefully degrades to the “normal” behavior with minimal cost. We were able to reduce the number of lock requests per TPC-B transaction from 178 requests down to 2! Wow! This is a dramatic improvement in concurrency as well as transaction latency.

The lesson from this exercise was that if you can identify the common scenario and optimize for that case so that only the uncommon scenarios are more expensive, you can make dramatic improvements in performance without sacrificing correctness.

So how does this relate to the architecture and design of some of the modern NoSQL systems? NoSQL systems can be broadly classified as master-slave sharded, or peer-to-peer sharded systems. NoSQL systems with a peer-to-peer architecture have an interesting way of handling changes. Whenever an item is changed, the client (or an intermediary) propagates the changes synchronously or asynchronously to multiple copies (for availability) of the data. Since the change can be propagated asynchronously, during some interval in time, it will be the case that some copies have received the update and others haven’t.

What happens if someone tries to read the item during this interval? The client in a peer-to-peer system will fetch the same item from multiple copies and compare them to each other. If they’re all the same, then every copy that was queried has the same (and up-to-date) value of the data item, so all’s good. If not, then the system provides a mechanism to reconcile the discrepancy and to update stale copies.

So what’s the problem with this? There are two major issues: First, IT’S HORRIBLY PESSIMISTIC because, in the common case, it is unlikely that the same data item will be updated and read from different locations at around the same time! For every read operation, you have to read from multiple copies. That’s pretty expensive, especially if the data are stored in multiple geographically separate locations and network latencies are high.
Second, if the copies are not all the same, the application has to reconcile the differences and propagate the correct value to the out-of-date copies. This means that the application program has to handle discrepancies in the different versions of the data item and resolve the issue (which can further add to cost and operation latency).

Resolving discrepancies is only one part of the problem. What if the same data item was updated independently on two different nodes (copies)? In that case, due to the asynchronous nature of change propagation, you might land up with different versions of the data item in different copies. In this case, the application program also has to resolve conflicts and then propagate the correct value to the copies that are out of date or have incorrect versions. This can get really complicated. My hunch is that there are many peer-to-peer-based applications that don’t handle this correctly, and worse, don’t even know it. Imagine having 100s of millions of records in your database – how can you tell whether a particular data item is incorrect or out of date? And what price are you willing to pay for ensuring that the data can be trusted? Multiple network messages per read request? Discrepancy and conflict resolution logic in the application, and potentially, additional messages? All this overhead, when all you were trying to do was to read a data item.

Wouldn’t it be simpler to avoid this problem in the first place? Master-slave architectures like the Oracle NoSQL Database handle this very elegantly. A change to a data item is always sent to the master copy. Consequently, the master copy always has the most current and authoritative version of the data item. The master is also responsible for propagating the change to the other copies (for availability and read scalability). Client drivers are aware of master copies and replicas, and client drivers are also aware of the “currency” of a replica. In other words, each NoSQL Database client knows how stale a replica is.

This vastly simplifies the job of the application developer. If the application needs the most current version of the data item, the client driver will automatically route the request to the master copy. If the application is willing to tolerate some staleness of data (e.g. a version that is no more than 1 second out of date), the client can easily determine which replica (or set of replicas) can satisfy the request, and route the request to the most efficient copy. This results in a dramatic simplification in application logic and also minimizes network requests (the driver will only send the request to exactly the right replica, not many).

So, back to my original point. A well designed and well architected system minimizes or eliminates unnecessary overhead and avoids pessimistic algorithms wherever possible in order to deliver a highly efficient and high performance system. If you’ve ever programmed an Oracle NoSQL Database application, you’ll know the difference!


Look I shrunk the keys

Whether we use a relational or non-relational database to durably persist our data on disk, we are all aware of the fact that indexes play a major role in accessing the data in real time. There is one aspect most of us tend to overlook while designing the index key, i.e. how to size it efficiently. Not that I want to discount the fact, but in traditional databases, where we used to store from hundreds of thousands to a few million records, sizing the index key didn't come up (that often) as a very high priority; in a NoSQL database, where you are going to persist a few billion to trillions of records, every byte saved in the key goes a long mile. That is exactly what came up this week while working on one of the POCs, and I thought I should share with you the best practices and tricks of the trade that you can also use in developing your application.

So here is my numero uno recommendation for designing index keys: Keep it Small. Now there is nothing new there that you didn't know already, right? Right, but I just wanted to highlight it: if there is anything you remember from this post, it is this one bullet item.

All right, here is what I was dealing with this week: a couple billion records of telematics/spatial data that we needed to capture and query based on the timestamp (of the feed that was received) and the x and y co-ordinates of the system. To run the kind of queries we wanted to run (on spatial data), we came up with this as an index key:

/S/{timestamp}{x-coordinate}{y-coordinate}

How we used the above key structure to run spatial queries is another blog post, but for this one I would just say that when we plugged the values into the variables our key became 24 bytes (1+13+5+5) long. Here's how:

Table Prefix => type char = 1 byte (e.g. S)
Timestamp => type long = 13 bytes (e.g. 1386286913165)
X co-ordinate => type string = 5 bytes (e.g. 120.78 degrees, 31.87 degrees etc.)
Y co-ordinate => type string = 5 bytes (e.g. 132.78 degrees, 33.75 degrees etc.)

With the amount of hardware resources we had available (for the POC) we could only create a 4 shard cluster. So to store two billion records we needed to store (2B records / 4 shards) 500 million records on each of the four shards. Using the DbCacheSize utility, we calculated we would need about 32 GB of JE cache on each of the Replication Nodes (RN).

$java -d64 -XX:+UseCompressedOops -jar $KVHOME/lib/je.jar DbCacheSize -records 500000000 -key 24

=== Database Cache Size ===
 Minimum Bytes        Maximum Bytes          Description
---------------       ---------------        -----------
 29,110,826,240       32,019,917,056         Internal nodes only

But we knew that if we could shrink the key size (without losing the information) we could save a lot of memory and improve the query time (as search is a function of the # of records and the size of each record) as well. So we built a simple encoding program that uses a range of 62 ASCII characters (0-9, a-z, A-Z) to encode any numeric digit. You can find the program from here or build your own, but what is important to note is that we were able to represent the same information with fewer bytes:

13 byte timestamp (e.g. 1386286913165) became 7 bytes (e.g. opc2BTn)
5 byte X/Y co-ordinate (e.g. 132.78) became 3 bytes each (e.g. a9X/6dF)

i.e. a 14 byte encoded key (1 + 7 bytes + 3 bytes + 3 bytes). So what's the fuss about shrinking our keys (it's just a 10 byte saving), you would ask? Well, we plugged the numbers into the DbCacheSize utility again, and this time the verdict was that we needed only 20GB of JE cache to store the same half a billion records on each RN.
That's a 40% improvement (12GB of savings per Replication Node) and is definitely an impressive start.

$java -d64 -XX:+UseCompressedOops -jar $KVHOME/lib/je.jar DbCacheSize -records 500000000 -key 14

=== Database Cache Size ===
 Minimum Bytes        Maximum Bytes          Description
---------------       ---------------        -----------
 16,929,008,448       19,838,099,264         Internal nodes only

To conclude: you have just seen how a simple encoding technique can save you big time when you are dealing with billions of records. Next time you design an index key, just think a little harder on how you can shrink it down!
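The actual encoder used in the POC is linked above; as a rough illustration of the idea, here is a minimal base-62 sketch over the digit set 0-9, a-z, A-Z (the alphabet ordering is assumed, and a decimal co-ordinate such as 132.78 would first be scaled to an integer, e.g. 13278, before encoding):

public class Base62 {
    // Digit set mentioned in the post: 0-9, a-z, A-Z (ordering assumed).
    private static final char[] ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();

    /** Encodes a non-negative long into a base-62 string. */
    public static String encode(long value) {
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(ALPHABET[(int) (value % 62)]);
            value /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        // A 13-digit millisecond timestamp fits in 7 base-62 characters,
        // and a co-ordinate scaled to an integer (132.78 -> 13278) fits in 3.
        System.out.println(encode(1386286913165L)); // 7 characters
        System.out.println(encode(13278));          // 3 characters
    }
}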


Fast Data - Big Data's achilles heel

At OOW 2013 in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and a number of customers deploying Oracle's Big Data / Fast Data solutions, and in particular Oracle's NoSQL Database. Since that time, there have been a large number of requests seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions. Fast Data is a software solution stack that deals with one aspect of Big Data: high velocity. The software in the Fast Data solution stack involves 3 key pieces and their integration: Oracle Event Processing, Oracle Coherence, and Oracle NoSQL Database. All three of these technologies address a high throughput, low latency data management requirement.

Oracle Event Processing enables continuous queries to filter the Big Data fire hose, enables intelligent chained events for real-time service invocation, and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time so SQL statements can look for triggers on events like a breach of a weighted moving average on a real-time data stream.

Oracle Coherence is a distributed, grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory-latency-speed access. It also has some special capabilities to deploy remote behavioral execution for "near data" processing.

The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads. For example, when large sensor networks are generating data that need to be captured while analysts are simultaneously extracting the data using range based queries for upstream analytics. Another example might be storing cookies from user web sessions for ultra low latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis.

The question then becomes how these things work together to deliver an end to end Fast Data solution. The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together. You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval. Here is a great reference to how these two technologies are integrated and work together: Coherence & Oracle NoSQL Database.

On the stream processing side, it is similar to the Coherence case. As your sliding windows get larger, holding all the data in the stream can become difficult and out of band data may need to be offloaded into persistent storage. OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real time loop while dealing with persistent spill in the data stream. Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together: OEP & Oracle NoSQL Database.


Accolades - Oracle NoSQL customers speak out with praise

For all of those participating in the Oracle NoSQL Database community and following the product evolution, there have been a number of changes emerging on Oracle OTN for the NoSQL Database. In particular, on the main page Dave Segleau's NoSQL Now presentation on Enterprise NoSQL is prominently displayed. This is a great discussion of the trends involved in NoSQL adoption, which highlights the most important aspects of NoSQL technology selection and what Oracle in particular is bringing to the movement.

Many of you know that for Oracle, getting companies to speak up publicly on their use of our technology is much harder than it is for pure open source startups. So, I am particularly pleased with the accolades starting to emerge from the users of Oracle NoSQL. Plus, there is new content getting published every day to help our growing community champion NoSQL technology adoption within their teams and organizations.

Starting to grow: I've noticed that our Meetup group is also gaining a lot of momentum. We are now over 400 members strong and growing aggressively. There is an awesome Meetup coming next week (Oct 15th at Elance, 441 Logue Avenue, Mountain View, CA) where Mike Olson, co-founder and Chief Strategy Officer of Cloudera, will be talking about the virtues of NoSQL key-value stores. There are already 88 people signed up for this event, so hurry up and join now or you may end up on a wait-list. Spread the word, tell your friends: an Enterprise-backed NoSQL is on the move!!


Flexible schema management - NoSQL sweetspot

I attended a few colleague sessions at Oracle Open World focusing on NoSQL Database use cases. Dave Segleau from the Oracle NoSQL Database team did some work on the challenges associated with Web Scale personalization. The main point he was emphasizing is that these personalization kinds of applications have very simple data lookup semantics, but the data itself is quite volatile in nature and comes in all shapes and sizes, making it difficult to store in traditional relational database technology. The other challenges then follow, which are commonly involved in most NoSQL based applications: dealing with this variety of data at scale and in near real-time. Here are some references to those sessions, which are worth a review:

http://www.slideshare.net/slideshow/embed_code/26877748
http://www.slideshare.net/slideshow/embed_code/26876561

Then the other day, I stumbled upon this story about how airlines are planning to provide a more personalized shopping experience in the travel process. I could not help but see the parallels between the requirements found in the online shopping world and those found in ticketing, as the airline industry plans to roll out new personalized services to travelers. Clearly, this is a great application area to be considering the use of NoSQL Database technology: data variety, scale, responsiveness, all the ingredients that make for an ideal use case for employing NoSQL technology in the solution.


Oracle delivering value to the Startup while embracing the Enterprise

At the recent NoSQL Now! conference in San Jose, Andy Mendelsohn, SVP Database Server Technology at Oracle, delivered a double punch announcement of both the world's first Engineered System for NoSQL and a move to the open source business model of a per server annual support subscription. These two options highlight the drive by Oracle to provide value to the developers of both high end Enterprise and Startup customers alike. Surprisingly, both of these announcements reveal low Total Cost of Ownership solutions for both ends of the business spectrum: startups who are just getting started with their business and controlling costs using open source packages, renting their infrastructure in the cloud, and Enterprise companies who are controlling expenses while building out substantial Big Data clusters to leverage well understood reserves of corporate data.

The value delivered by offering an open source subscription support model is well understood. It is the same business model used with Oracle's MySQL, 10Gen's MongoDB and DataStax's Cassandra, among others. The Oracle NoSQL Database Community Edition is an open source, fully functional, horizontally scalable key-value store database, lacking only the integrations with higher end solutions such as Oracle Event Processing, Oracle Coherence and Oracle Enterprise Manager. Those higher end solution integrations are available in the Enterprise Edition. While open community developer forums like StackOverflow and the Oracle NoSQL Database forums can be used to get great answers to technical questions, the annual subscription support is well suited for production environments and can be purchased online at the Oracle Store. The subscription support offers the same commercial grade response as that found with the Oracle relational database and comes at a cost of only $2000 per server, beating the average industry per server price of other commercial NoSQL support offerings.

The value delivered by the Oracle Big Data Appliance is less obvious. Often people have been noted as asserting that Engineered Systems are necessarily expensive. However, as a manufacturer of high end servers, Oracle achieves certain economies of scale that make its Big Data Appliance (BDA) offering highly competitive compared with do-it-yourself clusters. Here is a great analysis of the Total Cost of Ownership when building out one of these solutions, highlighting the BDA's cost competitiveness.

So what does a NoSQL Engineered System look like? Oracle NoSQL Database is an expandable database server that allows always-on online expansion, in a horizontal manner, with the growth of demand for its managed data. Similarly, the Big Data Appliance Engineered System is an expandable hardware infrastructure that can grow with the demand of your business. The BDA comes in expandable units of 6 server nodes, where each server is composed of 2 x Eight-Core Intel® Xeon® E5-2660 Processors (2.2 GHz) with 64GB on-board RAM expandable to 512GB. On the storage side, each server offers 12 x 3TB 7,200 RPM High Capacity SAS Disks, providing a total of 72 disks in a single Rack. When it comes to NoSQL Database architecture, the bountiful disk design is ideal to provide a wide partition space for maximum write throughput, leveraging parallelization of writes with highly available read replicas. In some respects, the secret sauce is that these servers are held together with an Infiniband backbone offering a sustained 40Gb/sec bandwidth.
At those speeds, the “eventual” in eventually consistent might as well be called instantaneous, allowing the deployment of memory replicated operations with extremely high reliability and consistency. A system can be expanded by adding more nodes in 6 node increments without the need to add additional network switches; just plug in more servers, connect them to the network backbone, and the hardware becomes available for database expansion.

It is worth noting that Big Data often involves many technologies in the delivery of a solution. So, while it is possible to configure a fully dedicated NoSQL Database BDA, it is also possible to configure the Rack in 3 server allocations to the NoSQL Database and allocate the remaining hardware to Hadoop (the BDA is pre-packaged with Cloudera's distribution, fully licensed through Oracle) and other software solutions such as Oracle R for advanced analytics. A complete description of the BDA can be found in the product data sheet.


Oracle NoSQL Database Events

Hey all you Oracle NoSQL Database (OnDB) fans, here is a note regarding some of our upcoming activities. In particular, the Oracle team is giving several session presentations at the upcoming NoSQL Now! conference in San Jose. Andy Mendelsohn, SVP Database Server Technologies at Oracle, is actually giving the keynote presentation for this conference. It's going to be awesome and I highly recommend folks come out and meet the NoSQL team. Also, there is a webcast coming up with the Oracle Developers User Group (ODTUG) that will give a great overview of application development using OnDB.

NoSQL Now 2013 (Aug 20th - 22nd)
San Jose Convention Center, 150 West San Carlos St., San Jose, CA 95110
Register to attend this event with an Oracle exclusive discount.

Keynote – Andrew Mendelsohn, SVP Database Server Technologies (Wed 8/21 9-9:30am)
In this keynote session, Andy Mendelsohn, Oracle's Senior Vice President of Database Server Technologies, discusses how Oracle NoSQL Database is delivering on the market's expectations for flexibility, performance and scalability without compromising data reliability. In addition, you'll learn how Oracle is helping customers take advantage of Big Data by integrating NoSQL technologies into enterprise data centers and evolving its portfolio of Engineered Systems. - See info here.

Panel: Enterprise NoSQL, Where next? - David Rubin, Director NoSQL Engineering (Wed 8/21 2:15-3:00pm)
Enterprise customers are waking up to the NoSQL message in a big way, comfortable now that the technology is robust enough for many enterprise applications, and that the leading companies developing NoSQL products will be around to provide support. Meanwhile, NoSQL vendors are doing their part by addressing enterprise requirements for security, ACID transactions, and even SQL compatibility. The latest moves by major database incumbents (such as Oracle) give further reassurance to enterprise customers that NoSQL technologies should become part of the data management and software portfolio. - See info here.

Session: NoSQL and Enterprise Apps - Dave Segleau (Thur 8/22 9:30-10:00am)
This session will discuss the key database and application requirements that we hear from our enterprise customers. It will explore how these requirements are addressed in Oracle NoSQL Database, and it will describe several real-world use cases from customers who are using Oracle NoSQL Database today. From a technical perspective, this session will focus on: * Performance, Scalability and Predictability * NoSQL cluster management * Integration with Hadoop, Oracle Database, Coherence and Event Processing * Real-world use cases - See info here.

Session: Graphs, RDF and Oracle NoSQL - Zhu Wu (Thur 8/22 2:00-2:45pm)
Graph and NoSQL are both hot areas of activity, and are seriously considered for Big Data modeling and storage, respectively. Graph, a different data modeling paradigm than the traditional relational and XML data modeling, provides intuitive and flexible data construction, manipulation, query and navigation. NoSQL, in turn, is a database repository, providing an excellent storage architecture that is distributed, horizontally scalable and fault tolerant. We believe that integration of the Graph data model with the NoSQL storage model can provide a robust and massively scalable platform for managing Big Data.
In this talk, we will share our first-hand experience in implementing RDF Graph (a W3C standards-based graph language) capabilities on the Oracle NoSQL Database, a K/V based, distributed and horizontally scalable platform. - See info here.

Oracle Developer Users Group Webcast (Aug 27th)
ODTUG Webcast: Oracle & NoSQL from the Inside – Aug. 27, 9:00am PT
The NoSQL (Not Only SQL) database market is hot with activity, serving new requirements for application development and deployment. Get up-to-date on this subject, take a dive into how Oracle's NoSQL database has evolved to integrate with the Oracle ecosystem of projects, and get a hands-on tour of what it takes to build an application with the Oracle NoSQL Database and deploy its distributed, highly available, scale-out architecture. Click to Register.


InfoQ looking for users to categorize NoSQL Tech use

Often, when people talk about NoSQL technologies, there is an attempt to categorize the solutions. In a new Adoption Trends breakdown by InfoQ they also take this tack, providing the following categorizations: Columnar, Document, Graph, In-Memory Grid, Key-Value. I think this definitely has some utility, yet in another respect it misses the main point about this technology segment. These technologies have come into existence to fulfill the needs of 1 primary and 1 ancillary core ability. The primary ability, surfaced by Amazon, LinkedIn, Facebook, etc., is the ability to scale and remain available in the face of tremendous data and concurrent users. The ancillary ability is to provide more agility for the line of business, which is constantly adjusting its solutions with changing needs and understanding of the consumer.

What considerations should drive the thought process of adoption for NoSQL? Each of the NoSQL technologies from that list of categories has, from its database perspective, a key underlying principle that enables those core NoSQL abilities: a break away from server side relationship management, moving responsibility for data constraints to the application tier (vs. the database) via the implementation of a key-value based distribution model. It is that key-value paradigm that enables the scale-out architecture through key distribution, and so in some sense, they are all key-value stores. In fact, it is for that reason that we've seen, for instance, Cassandra evolve its value implementation several times over the last couple of years, from binary to super column to table structure. If it had not been for the underlying key-value nature of the implementation, they could never have undergone those drastic changes in data storage format in such a short period of time.

This is why the Oracle NoSQL Database (OnDB) was implemented as a key-value store. It provides the ability to layer multiple value abstractions under the core key based distribution and scale model. Today it supports 2 value abstractions, Binary and JSON, with a 3rd abstraction on the way, Table. Each of the value abstractions provides different utility in application implementation and features the best run-time / storage characteristics for a particular data use. For example, when storing image and video data, the binary abstraction is best suited, especially when it is overlaid with the OnDB streaming interfaces. However, when you want to store nested data structures with internal value inter-dependencies and sub-field updates, JSON is a great value abstraction. Finally, if you have the need to model data in a format amenable to integrated systems and capable of supporting a richer set of query semantics, the Table abstraction does the job best.

Btw - I might argue that the Graph category of NoSQL database is really an application layer above a NoSQL database. It's the reason we've seen NoSQL databases like Objectivity enter the Graph database category and why you will find that OnDB supports Graph storage and retrieval for Oracle Spatial and Graph ...but that is a different blog topic altogether.

Anyway, the point I am trying to make is that companies' use of data will vary greatly. The real category to which all of the NoSQL database implementations belong is the key-value category.
Bringing in NoSQL technology that provides a range of value options that can be selected and intermixed to achieve the optimal solution for a given application will provide the greatest flexibility and reduction of risk. The scalability of a key based distribution architecture should be ever present, but the application of the value abstraction will most surely vary for each solution space. This is something project leads and managers adopting these new technologies should reflect on as they invest their resources and time in learning and adopting a particular product: the repeatability and applicability of that investment for unforeseen future work.

Btw - in the InfoQ Adoption Trends article, there is a survey on current and future use of the many vendor technologies. I encourage everyone to take the time to visit the site and share your position on this important area of data management.


High-Five for Rolling Upgrade in Oracle NoSQL Database

In today’s world of e-commerce, where businesses operate 24/7, the equations for revenue made or lost are sometimes expressed in terms of latency, i.e. the faster you serve your customer, the more likely they are to do business with you. Imagine, in this scenario (where every millisecond counts), your online business being inaccessible for a few minutes to a couple of hours because you needed to apply an important patch or an upgrade. I think you get the idea of how important it is to stay online and available even during the course of any planned hardware or software upgrade.

The Oracle NoSQL Database 12c R1 (12.1.2.1.8) release puts you on a track where you can upgrade your NoSQL cluster with no disruption to your business services. It makes this possible by providing some smart administration tools that calculate the safest combination of storage nodes that can be brought down in parallel and upgraded, keeping all the shards in the database available for read/write at all times.

Let's take a look at a real world example. Say I have deployed a 9x3 database cluster, i.e. 9 shards with 3 replicas per shard (a total of 27 replication nodes) on 9 physical nodes. I've got a highly available cluster (thanks to the intelligent topology feature shipped in 11gR2.2.0.23) with the replicas of each shard spread across three physical nodes so that there is no single point of failure. All right, so here is how my topology looks:

[user@host09 kv-2.1.1]$ java -jar lib/kvstore.jar runadmin -port 5000 -host host01
kv-> ping
Pinging components of store mystore based upon topology sequence #136
mystore comprises 90 partitions and 9 Storage Nodes
Storage Node [sn1] on host01:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg2-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5012
        Rep Node [rg3-rn1]      Status: RUNNING,MASTER at sequence number: 41 haPort: 5013
        Rep Node [rg1-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn2] on host02:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg1-rn2]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5010
        Rep Node [rg3-rn2]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg2-rn2]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn3] on host03:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg2-rn3]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5011
        Rep Node [rg3-rn3]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg1-rn3]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
Storage Node [sn4] on host04:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg6-rn1]      Status: RUNNING,MASTER at sequence number: 41 haPort: 5012
        Rep Node [rg4-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
        Rep Node [rg5-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn5] on host05:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg6-rn2]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg4-rn2]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
        Rep Node [rg5-rn2]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5011
Storage Node [sn6] on host06:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg5-rn3]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
        Rep Node [rg4-rn3]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5010
        Rep Node [rg6-rn3]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
Storage Node [sn7] on host07:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg9-rn1]      Status: RUNNING,MASTER at sequence number: 41 haPort: 5012
        Rep Node [rg7-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
        Rep Node [rg8-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn8] on host08:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg8-rn2]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5011
        Rep Node [rg9-rn2]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg7-rn2]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
Storage Node [sn9] on host09:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg7-rn3]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5010
        Rep Node [rg8-rn3]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
        Rep Node [rg9-rn3]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012

Notice that each storage node (sn1-sn9) is hosting one MASTER node and two REPLICA nodes from entirely different shards. Now if you would like to upgrade the active cluster to the latest version of Oracle NoSQL Database without downtime, all you need to do is grab the latest binaries from OTN and lay down the bits (NEW_KVHOME) on each of the 9 nodes (only once if you have a shared drive accessible from all the nodes). From the administration command-line interface (CLI) simply perform 'show upgrade':

[user@host09 kv-2.1.8]$ java -jar lib/kvstore.jar runadmin -port 5000 -host host01
kv-> show upgrade
Calculating upgrade order, target version: 12.1.2.1.8, prerequisite: 11.2.2.0.23
sn3 sn4 sn7
sn1 sn8 sn5
sn2 sn6 sn9

The SNs in each horizontal row represent the storage nodes that can be patched/upgraded in parallel, and the multiple rows represent the sequential order, i.e. you can upgrade sn3, sn4 & sn7 in parallel, and once all three are done, you can then move to the next row (with sn1, sn8 & sn5), and so on and so forth.

You must be asking: what if you have a fairly large cluster and you don't want to upgrade the nodes manually, can you not automate this process by writing a script? Well, we have already done that for you as well. An example script is available for you to try out and can be found at: <KVROOT>/examples/upgrade/onlineUpgrade

I hope you find this feature as useful as I do. If you need more details on this topic then I would recommend that you visit Upgrading an Existing Oracle NoSQL Database Deployment in the Administrator's Guide. If you are new to Oracle NoSQL Database, get the complete product documentation from here and learn about the product from self-paced web tutorials with some hands-on exercises as well.

