by Michael Schulman, Anand Chandak, Mayuresh Nirhali - Oracle
Use Case: Scalable Email Application
We'll describe how a simplified email application can be implemented using a NoSQL repository. A NoSQL solution is well suited for an email application since it supports schema flexibility, 24x7 availability and horizontal scalability. In addition, a NoSQL repository can deliver very low latencies for insert and read operations, thus making it a natural choice for a web-scale interactive email application.
Scalable Email Applications Requirements
Let's look at an email application that must support large numbers of users, each having possibly tens of thousands of messages per year. Some of the high-level requirements are as follows:
A critical aspect of application development process is data modeling. Good data modeling can enhance application performance, deliver a rich user experience and make it easy to evolve the application as requirements change.
Table Structure (1:n) and Choice of Primary/Shard Keys
The email application exhibits a one-to-many (parent-child) relationship: each user has several folders; each folder has a large number of e-mails. In a typical deployment, there can be thousands or hundreds of thousands of users, so Parent (user) and child (folders for a user) and child (e-mails within a folder) is a natural way to organize this data. In the code snippets below, we'll illustrate these concepts using Oracle NoSQL Database syntax and terminology.
Oracle NoSQL Database stores the parent/child table data so that it is very efficient to traverse the parent child relationship, given a parent record. Furthermore, the entire child data set associated to its parent is guaranteed to reside on the same shard, enabling the retrieval of all or a filtered subset of the data in a single network call.
Let's start with the table definitions presented below, using Oracle NoSQL Database syntax and terminology.
1. Parent table to store the
user information. This will contain the credentials of each user.
Every table must have one or more fields designated as the
primary key. The primary key is defined at the time that the table is created, and cannot be changed afterwards. A primary key uniquely identifies a row in the table. A shard key identifies the primary key fields that are used for sharding—data that contain the same values for the shard key fields are guaranteed to be stored on the same shard. This results in better access locality and transactional changes to the data that share the same shard key.
In the case of the
Users table, both the shard key and primary key is represented by the
2. A secondary index is created on the
emailAddress field in order to efficiently retrieve the user details associated with a specific e-mail address.
Create index emailIndex on users (emailAddress)
3. Child table
users.folder. The '.' Dot in the name of the table signifies that it is a child table. In this example,
folder is specified as a child table of the
users table, using dot notation to associate the
folders table with its parent (users). Every user will have several folders. This table stores the folders associated with each user.
create table users.folder ( folderId string, folderName string, primary key (folderId))
4. Child table
users.folder.message. There are 2 levels of parent-child hierarchy: the
message table is a child of the folder table, which in turn is a child of the
users table. The message table is used to hold every email message that is created by a user or received by a user.
create table users.folder.message ( messageId string, from string, toList string, ccList string, bccList string, subject string, body string, createdDate timestamp(6), primary key (messageId, createdDate) )
Some of the benefits of the above model:
- Ability to scale: We choose the user id as the shard key for the following reasons:
userIDs), there sufficient cardinality to distribute data (sharded by
userID) across multiple shards.
- ACID requirements: The choice of
userID as the shard key and the parent-child table model provides a natural grouping of the email messages, enabling atomic deletion of e-mails or movement of e-mail messages from one folder to another.
Let's now look at some code snippets for some of the common operations in the email application. The entire program is available on GitHub.
Insert a New User Record
In many applications, it is very common to specify the input data as a JSON document. Oracle NoSQL Database provides a convenient way to automatically convert JSON to table row format and vice versa for inserts and retrievals respectively. We will leverage this capability for the email application. The code below takes the JSON as input and inserts the row uniquely in the
Retrieve a User's Email
Let's look at the code to retrieve the list of folders along with the messages in those folders. This requires querying multiple nested tables i.e. users, folders and messages for a particular user. Oracle NoSQL Database provides a convenient way to retrieve data in a nested hierarchy in a single API call. The following sample code takes
userId as input and lists out the records from all the tables for that user.
As this blog post illustrates, a NoSQL repository is ideally suited for an e-mail application that needs a simple, fast, scalable and highly available repository.
For developers that want to get started developing with Oracle NoSQL Database on the Oracle Bare Metal Cloud, a simple to use install script is now available to assist in getting Oracle NoSQL Database up and running very quickly. Read the whitepaper, and then download and install the script.