Break New Ground

An Email Application Using a NoSQL Repository: Design and Implementation

Guest Author

by Michael Schulman, Anand Chandak, Mayuresh Nirhali - Oracle

Use Case: Scalable Email Application

We'll describe how a simplified email application can be implemented using a NoSQL repository. A NoSQL solution is well suited for an email application since it supports schema flexibility, 24x7 availability and horizontal scalability. In addition, a NoSQL repository can deliver very low latencies for insert and read operations, thus making it a natural choice for a web-scale interactive email application.

Scalable Email Applications Requirements

Let's look at an email application that must support large numbers of users, each having possibly tens of thousands of messages per year. Some of the high-level requirements are as follows:

  1. Highly concurrent read and write access for e-mail users is required.
  2. The writes to reads ratio will be more skewed towards reads.
  3. Moving one or more messages atomically from one folder to another must be supported.
  4. Deleting one or more messages atomically from a folder must be supported.
  5. Reads will typically occur at a specific User's inbox and results will be sorted by arrival date of the message. Reads will be paged and typically, a read will retrieve 25 messages to display.

Data Modeling

A critical aspect of application development process is data modeling. Good data modeling can enhance application performance, deliver a rich user experience and make it easy to evolve the application as requirements change.

Table Structure (1:n) and Choice of Primary/Shard Keys

The email application exhibits a one-to-many (parent-child) relationship: each user has several folders; each folder has a large number of e-mails. In a typical deployment, there can be thousands or hundreds of thousands of users, so Parent (user) and child (folders for a user) and child (e-mails within a folder) is a natural way to organize this data. In the code snippets below, we'll illustrate these concepts using Oracle NoSQL Database syntax and terminology.

Oracle NoSQL Database stores the parent/child table data so that it is very efficient to traverse the parent child relationship, given a parent record. Furthermore, the entire child data set associated to its parent is guaranteed to reside on the same shard, enabling the retrieval of all or a filtered subset of the data in a single network call.

Let's start with the table definitions presented below, using Oracle NoSQL Database syntax and terminology.

1. Parent table to store the user information. This will contain the credentials of each user.

Every table must have one or more fields designated as the primary key. The primary key is defined at the time that the table is created, and cannot be changed afterwards. A primary key uniquely identifies a row in the table. A shard key identifies the primary key fields that are used for sharding—data that contain the same values for the shard key fields are guaranteed to be stored on the same shard. This results in better access locality and transactional changes to the data that share the same shard key.

In the case of the Users table, both the shard key and primary key is represented by the userID column.

2. A secondary index is created on the emailAddress field in order to efficiently retrieve the user details associated with a specific e-mail address.

Create index emailIndex on users (emailAddress)

3. Child table users.folder. The '.' Dot in the name of the table signifies that it is a child table. In this example, folder is specified as a child table of the users table, using dot notation to associate the folders table with its parent (users). Every user will have several folders. This table stores the folders associated with each user.

create table users.folder (
	folderId string,
	folderName string,
	primary key (folderId))  

4. Child table users.folder.message. There are 2 levels of parent-child hierarchy: the message table is a child of the folder table, which in turn is a child of the users table. The message table is used to hold every email message that is created by a user or received by a user.

create table users.folder.message ( 
	messageId string,
	from string, 
	toList string,
	ccList string,
	bccList string,
	subject string,
	body string, 
	createdDate timestamp(6),
	primary key (messageId, createdDate) ) 

Some of the benefits of the above model:

- Ability to scale: We choose the user id as the shard key for the following reasons:

  • Sufficient cardinality – Since there are a large number of users (unique userIDs), there sufficient cardinality to distribute data (sharded by userID) across multiple shards.
  • Elastic expansion – As more users are added to the email application, we can simply add shards to the store to provide additional storage and throughput.

- ACID requirements: The choice of userID as the shard key and the parent-child table model provides a natural grouping of the email messages, enabling atomic deletion of e-mails or movement of e-mail messages from one folder to another.

Let's now look at some code snippets for some of the common operations in the email application. The entire program is available on GitHub.

Insert a New User Record

In many applications, it is very common to specify the input data as a JSON document. Oracle NoSQL Database provides a convenient way to automatically convert JSON to table row format and vice versa for inserts and retrievals respectively. We will leverage this capability for the email application. The code below takes the JSON as input and inserts the row uniquely in the users table.


Retrieve a User's Email

Let's look at the code to retrieve the list of folders along with the messages in those folders. This requires querying multiple nested tables i.e. users, folders and messages for a particular user. Oracle NoSQL Database provides a convenient way to retrieve data in a nested hierarchy in a single API call. The following sample code takes userId as input and lists out the records from all the tables for that user.

As this blog post illustrates, a NoSQL repository is ideally suited for an e-mail application that needs a simple, fast, scalable and highly available repository.

For developers that want to get started developing with Oracle NoSQL Database on the Oracle Bare Metal Cloud, a simple to use install script is now available to assist in getting Oracle NoSQL Database up and running very quickly. Read the whitepaper, and then download and install the script.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.