Scalability is very important for Java applications as the number of users, number of transactions, and data are increasing exponentially. Sharding distributes and replicates the data across a pool of databases that do not share hardware or software. Each individual database is known as a shard. Java applications can linearly scale up or scale down by adding databases (shard) to the pool or by removing databases (shards) from the pool.
In addition to achieving linear scalability, sharding has many other benefits. It provides extreme data availability by eliminating single point of failure and isolating the faulted shards from other working shards. It makes cloud deployment easier as the size of the shard is small. Sharding also makes data sovereignty and data proximity possible by locating different parts of data in different countries or regions.
Sharding uses horizontal partitioning in which shard contains the table with the same columns but a different subset of rows. This partitioning is based on a sharding key. Choosing a good partitioning strategy is very important for sharding. A good sharding key will uniformly distribute the data across all shards so that DMLs and queries have fair distribution without creating any hot shards. Usually the primary key of the table that uniquely identifies the users or objects of the applications qualifies as a sharding key. There can be multiple sharding keys too. Example, Customer_ID could be the sharding key and REGION could be the super sharding key. The diagram below shows how the data is distributed into three shards but together they are represented as a single logical database.
Java applications require sharding key(s) or super sharding key (if it exists) for establishing a connection to a particular shard. Once the session is established to a shard, all SQL queries and DMLs are executed in the scope of the given shard. The JDK9 standard Sharding APIs accept sharding key and super sharding key to develop shard aware Java applications. For example, Oracle JDBC driver and the Universal Connection Pool (UCP) from 18.104.22.168 have been enhanced to accept sharding key and super sharding key while connecting to a sharded database.
Looking at Oracle Sharding as an example. Oracle Database v22.214.171.124 supports sharding through Global Data Services (GDS). There are Shard Directors or GSM listeners that route connections to the appropriate shards based on the sharding key passed during a connection request. It maintains an up-to-date shard topology (sharding key range mappings stored in a particular shard).
The Universal Connection Pool (UCP) caches the shard topology and acts as a shard director. This way, UCP improves the performance of a shard aware Java applications by getting the fast path to shards saving the additional hop to the shard director. To begin with, UCP requires only one successful connection to the shard to pull the shard topology and cache it on the client side. Later, when connections are requested by passing the sharding key, UCP makes a lookup in its cached topology to know which shard the key belongs to and returns a connection to the correct shard. This is known as the "Direct Routing".
There are scenarios when the application needs to aggregate the data across all the shards. In such a scenario, cross shard queries can be executed by connecting to the Shard Coordinator also known as Shard Catalog. The Shard Coordinator or Shard Catalog allows the users to submit SQL statements without a sharding key. The coordinator’s SQL compiler analyzes and rewrites the query into query fragments that are sent and executed by the multiple shards. After the query processing the data is aggregated by the coordinator. This is known as "Proxy Routing".
The Sharding APIs require the sharding key to be passed for establishing a connection to the database. The following steps are required in getting a connection to a sharded database:
(1) Build the sharding key: Make sure to pass the sharding key value and the sharding data type while building the sharding key. You can connect to a sharded database through SQL Developer or SQLPlus and get a list of sharding keys for testing purposes.
(2) Build the super sharding key: Super sharding key is optional. If your sharded database is not using super sharding key then you can ignore this step.
(3) Getting a connection to the shard: After the sharding key and super sharding key are built, they need to be passed to get a successful connection to the shard that contains the data pertinent to the sharding key.
Refer to JDBCShardingSample.java for a quick code sample to test the sharded database.
Note: JDBC driver 126.96.36.199 doesn't support JDK9 standard Sharding APIs and the plan is to support it in the future database release. Instead, the Oracle JDBC driver uses oracle.jdbc.OracleShardingKey for supporting sharding APIs for applications using JDK8 and JDK9.
Refer to the code snippet that shows how to establish a connection to a sharded database using Oracle JDBC driver.
import oracle.jdbc.OracleShardingKey; import oracle.jdbc.OracleType; import oracle.jdbc.pool.OracleDataSource; OracleDataSource ods = new OracleDataSource(); ods.setURL(url); ods.setUser(user); ods.setPassword(pwd); // 1. Build the Sharding Key Date shardingKeyVal = new java.sql.Date(0L); OracleShardingKey shardKey = ods.createShardingKeyBuilder() .subkey(shardingKeyVal, OracleType.DATE) .build(); // 2. Build the Super Sharding Key (Optional) OracleShardingKey superShardKey = ods.createShardingKeyBuilder() .subkey("Customer_Location_US”,oracle.jdbc.OracleType.VARCHAR2) .build(); // 3. Get a connection from the specific shard Connection conn = ods.createConnectionBuilder() .shardingKey(shardKey) .suerShardingKey(superShardKey) .build();
UCP Sharding APIs require the sharding key to be passed for establishing a connection to the sharded database. Refer to UCPShardingSample.java for for a quick code sample to test the sharded database.
Refer to the code snippet that shows how to establish a connection to a sharded database using Oracle Universal Connection Pool (UCP).
import oracle.jdbc.OracleShardingKey; import oracle.jdbc.OracleType; import oracle.ucp.jdbc.PoolDataSourceFactory; import oracle.ucp.jdbc.PoolDataSource;
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource"); pds.setURL(DB_URL); pds.setUser(DB_USER); pds.setPassword(DB_PASSWORD); pds.setConnectionPoolName("UCP_POOL"); pds.setInitialPoolSize(5); //Initial connections when the pool is created pds.setMinPoolSize(5); // Minimum number of connections pds.setMaxPoolSize(20); // Set the maximum number of connections // 1. Build the Sharding Key String email= "firstname.lastname@example.org"; OracleShardingKey shardKey = pds.createShardingKeyBuilder() .subkey(email, OracleType.VARCHAR2) .build(); // 2. Build the Super Sharding Key (Optional) OracleShardingKey superShardKey = pds.createShardingKeyBuilder() .subkey("Location_US”,oracle.jdbc.OracleType.VARCHAR2) .build(); // 3. Get a connection to the specific shard Connection conn = pds.createConnectionBuilder() .shardingKey(shardKey) .suerShardingKey(superShardKey) .build();