Achieving Data Sovereignty with Oracle Sharding (Part 1)

September 29, 2022 | 7 minute read
Pankaj Chandiramani
Director, Product Management

In this two-part blog series on Data Sovereignty, I will share how customers are achieving Data Sovereignty with Oracle Sharding, including step-by-step implementation details.

Part 1: Overview of Data Sovereignty and how Oracle Sharding can provide an effective solution for achieving it

Part 2: Customer use case: how one of the biggest reinsurance companies is using Oracle Sharding to achieve data sovereignty

Join us at Oracle CloudWorld 2022 for a session on Data Sovereignty: LIT4195: Complying with Data Sovereignty Regulation using Oracle Sharding (Wednesday, Oct 19, 4:20 PM PDT)

Introduction

Country- or region-specific data placement, residency, and sovereignty regulations are becoming more prevalent. We discuss how Oracle Database customers leverage Oracle Sharding to comply with such regulations. We also discuss various flavors of such rules and corresponding architecture and implementation patterns.

Data sovereignty generally refers to how data is governed by regulations specific to the region in which it originated. These regulations can specify where data is stored, how it is accessed, how it is processed, and the lifecycle of the data.

With the exponential growth of data crossing borders and public cloud regions, more than 100 countries have now passed regulations concerning where data is stored and how it is transferred. Personally identifiable information (PII) in particular is increasingly subject to the laws and governance structures of the nation in which it is collected. Data transfers to other countries are often restricted or allowed depending on whether the receiving country offers a similar level of data protection and whether it collaborates in forensic investigations.

Data sovereignty requirements are driven by local regulations, which can lead to different application architectures. Common variants include:

  • Data must be physically stored in a certain geographic location, for example within the boundaries of a specific country or of a region comprising several countries. It is acceptable to access and process the data remotely as long as the data is not stored in remote locations. From a technical standpoint, this means that data stores that physically hold persistent data, such as databases, object stores, and messaging stores, must be in that geographic location. However, the application runtime that contains the business logic for processing the data can be outside it. Examples of such application components include application servers, mobile applications, API gateways, workflows, and so on.

  • Data must be physically stored and processed in a certain geographic location. In this case, both storage and processing must take place within the defined geographic location, so the application runtime must be deployed there as well.
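The difference between these two rule flavors can be expressed as a simple compliance check over a deployment topology. The sketch below is a hypothetical illustration, not an Oracle tool: `Rule`, `Component`, and `violations` are names invented here, and a component is assumed to be either a data store or a processing tier.

```python
from dataclasses import dataclass
from enum import Enum


class Rule(Enum):
    STORE_IN_REGION = "store"                      # flavor 1: only persistent storage is constrained
    STORE_AND_PROCESS_IN_REGION = "store+process"  # flavor 2: storage and processing are constrained


@dataclass(frozen=True)
class Component:
    """Hypothetical deployment component (illustrative model only)."""
    name: str
    region: str
    stores_data: bool  # True for databases, object stores, messaging stores


def violations(components: list[Component], home_region: str, rule: Rule) -> list[str]:
    """Names of components that break the given residency rule for home_region."""
    bad = []
    for c in components:
        if c.region == home_region:
            continue  # anything inside the region is always compliant
        # Remote storage violates both flavors; remote processing violates only flavor 2.
        if c.stores_data or rule is Rule.STORE_AND_PROCESS_IN_REGION:
            bad.append(c.name)
    return bad


topology = [
    Component("credit-db", "DE", stores_data=True),
    Component("api-gateway", "US", stores_data=False),
]
print(violations(topology, "DE", Rule.STORE_IN_REGION))              # []
print(violations(topology, "DE", Rule.STORE_AND_PROCESS_IN_REGION))  # ['api-gateway']
```

The same topology passes under the first flavor and fails under the second, which is why the second flavor typically forces application servers into the region as well.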

Problem Statement

Achieving Data Sovereignty has become a complex problem for IT because, in most cases, teams end up duplicating the entire stack (application and database) for each jurisdiction in order to give the business a solution. This adds hardware cost and also raises operational cost. As more and more countries introduce compliance policies around Data Sovereignty, this approach does not scale, and some companies are even considering restricting business in certain countries because the cost of running a separate stack outweighs the business benefits.

Use Case of Achieving Data Sovereignty with Oracle Sharding

A large but imaginary financial institution, Shard Bank, wants to offer credit services to users in multiple countries. Each country where the credit service will be provided has its own data privacy regulations, and Personally Identifiable Information (PII) must be stored in that country.

Access to the data has to be limited, and data administrators in one country must not be able to see data in other countries. The solution for this use case is user-defined sharding, with shards configured in different countries, combined with Real Application Security (RAS) for data access control.

Overview of Oracle Sharding Solution

The Oracle Sharding solution provides in-country data storage while still supporting a global view of all the data.

The example below demonstrates a hybrid Oracle Sharding user-defined deployment spanning OCI data centers and on-premises systems across multiple regions. In this configuration, you can store and process all data locally. Each database (one per sovereign region) becomes a shard, and all shards belong to a single sharded database. Oracle Sharding lets you query data in a single shard (within one country) and also supports multi-shard queries that read data from all the countries.
 
[Figure 10-2: Hybrid user-defined Oracle Sharding deployment across sovereign regions]
 

The global sharded database is sharded by a key that indicates the country in which each row must reside. In-country applications connect to the local database as usual, and all data is stored and processed locally.

Multi-shard queries are directed to the shard coordinator. The coordinator rewrites the query and sends it to each shard (country) that holds the required data, then processes and aggregates the results from all of those countries and returns the final result.
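The coordinator's job of sending a query only to the shards that hold the required data and then merging their results can be sketched in a few lines. This is a hypothetical in-memory model, not the Oracle Sharding coordinator or any Oracle API: `SHARDS`, `COUNTRY_TO_SHARD`, `run_locally`, and `multi_shard_query` are illustrative names, and a Python filter stands in for the rewritten SQL.

```python
from typing import Callable, Optional

Row = dict

# Hypothetical in-memory stand-ins: each shard holds only its own country's rows.
SHARDS: dict[str, list[Row]] = {
    "shard_de": [{"country": "DE", "customer": "a", "credit_limit": 500}],
    "shard_fr": [{"country": "FR", "customer": "b", "credit_limit": 700}],
    "shard_us": [{"country": "US", "customer": "c", "credit_limit": 900}],
}
# Sharding-key metadata the coordinator consults (country -> owning shard).
COUNTRY_TO_SHARD = {"DE": "shard_de", "FR": "shard_fr", "US": "shard_us"}


def run_locally(shard: str, countries: Optional[set[str]], row_filter: Callable[[Row], bool]) -> list[Row]:
    """The rewritten query as executed on one shard: filtering happens in-country."""
    return [
        r for r in SHARDS[shard]
        if (countries is None or r["country"] in countries) and row_filter(r)
    ]


def multi_shard_query(row_filter: Callable[[Row], bool], countries: Optional[set[str]] = None):
    """Send the query only to shards that can hold the requested countries, then merge."""
    if countries is None:
        targets = sorted(SHARDS)  # no predicate on the sharding key: visit every shard
    else:
        targets = sorted({COUNTRY_TO_SHARD[c] for c in countries})  # shard pruning
    merged: list[Row] = []
    for shard in targets:
        merged.extend(run_locally(shard, countries, row_filter))
    return targets, merged


targets, rows = multi_shard_query(lambda r: r["credit_limit"] > 600, countries={"FR", "US"})
print(targets)                        # ['shard_fr', 'shard_us']
print([r["customer"] for r in rows])  # ['b', 'c']
```

Because the filter on the sharding key is known up front, the German shard is never contacted for this query; without such a predicate every shard has to be visited.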

Oracle Sharding makes this use case possible with the following capabilities:

  • Direct-to-shard routing for in-country queries.
  • The user-defined sharding method allows you to use a range or list of countries to partition data among the shards.
  • Automatic configuration of replication using Oracle Active Data Guard, with replicas constrained to remain in-country.
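To make the list-based mapping and the in-country replica constraint concrete, here is a minimal sketch under stated assumptions. It is not Oracle's configuration syntax (in Oracle Sharding this is set up in the database and its management tools); `Site`, `Shardspace`, `shardspace_for`, and `out_of_country_copies` are hypothetical names, and a copy is treated as compliant if it sits in one of the countries whose data the shardspace holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str     # hypothetical data center or cloud region name
    country: str


@dataclass(frozen=True)
class Shardspace:
    """Hypothetical model of a user-defined shardspace, list-partitioned by country."""
    name: str
    countries: frozenset[str]        # list of sharding-key values (country codes) stored here
    primary: Site
    replicas: tuple[Site, ...] = ()  # standby copies, e.g. maintained by Active Data Guard


def shardspace_for(country: str, shardspaces: list[Shardspace]) -> str:
    """List-based mapping: the shardspace whose country list contains the key."""
    for ss in shardspaces:
        if country in ss.countries:
            return ss.name
    raise KeyError(f"no shardspace is configured for country {country!r}")


def out_of_country_copies(ss: Shardspace) -> list[str]:
    """Sites (primary or replica) located outside the countries the shardspace holds."""
    return [s.name for s in (ss.primary, *ss.replicas) if s.country not in ss.countries]


de = Shardspace("ss_de", frozenset({"DE"}), Site("fra-oci", "DE"), (Site("ber-onprem", "DE"),))
fr = Shardspace("ss_fr", frozenset({"FR"}), Site("par-oci", "FR"), (Site("ams-oci", "NL"),))
print(shardspace_for("FR", [de, fr]))  # ss_fr
print(out_of_country_copies(fr))       # ['ams-oci']
```

A shardspace that groups several countries (for example an EU-wide one) would let replicas move anywhere inside that group; a stricter per-country rule calls for one shardspace per country.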

The benefits of this approach are:

  • Each shard can be in a cloud or on-premises within the country.
  • Shards can use different cloud providers (multi-cloud strategy) and replicas of a shard can be in a different cloud or on-premises.
  • Online resharding allows you to move data between clouds, or to and from the cloud and on-premises.
  • Strict enforcement of data sovereignty, providing protection from inadvertent cross-region data leaks.
  • A single multimodel big data store with less data duplication.
  • Better fault isolation, as planned or unplanned downtime within one region or line of business (LOB) does not impact other regions or LOBs.
  • Ability to split busy partitions and shards as needed.
  • Full ACID support, which is critical for transactional applications.

Benefits of Implementing Data Sovereignty with Oracle Sharding

Oracle Sharding meets data sovereignty requirements and supports applications that require low latency and high availability.

  • Sharding makes it possible to locate different parts of the data in different countries or regions – thus satisfying regulatory requirements where data has to be located in a certain jurisdiction.

  • It also supports storing particular data closer to its consumers. Oracle Sharding automates the entire lifecycle of a sharded database – deployment, schema creation, data-dependent routing with superior run-time performance, elastic scaling, and life-cycle management.

  • It also provides the advantages of an enterprise RDBMS, including relational schema, SQL, and other programmatic interfaces, support for complex data types, online schema changes, multi-core scalability, advanced security, compression, high-availability, ACID properties, consistent reads, developer agility with JSON, and much more.

 

Implementing Data Sovereignty with Oracle Sharding

Oracle Sharding distributes segments of a data set across many databases (shards) on different computers, on-premises, or in the cloud. These shards can be deployed in multiple regions across the globe. This enables Oracle Sharding to create globally distributed databases honoring data residency.

All of the shards in a given database are presented to the application as a single logical database. Application requests are routed to the right shard based on the data they access. For example, if an application instance deployed in the US needs data that resides in Europe, the request is routed to an EU data center without the application having to do anything special.
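The key point of this routing is that the destination depends only on where the data lives, not on where the application runs. The sketch below models that idea with a small client-side routing cache. It is a hypothetical illustration, not Oracle's routing implementation or a real connection-pool API: `Endpoint`, `ShardRouter`, the `lookup` callback, and the host names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Endpoint:
    host: str    # hypothetical shard host name
    region: str  # region where the shard (and its data) lives


class ShardRouter:
    """Hypothetical client-side router keyed only on the sharding key (here: country)."""

    def __init__(self, lookup: Callable[[str], Endpoint]):
        self._lookup = lookup                  # authoritative key -> shard resolution
        self._cache: dict[str, Endpoint] = {}  # routing cache so repeat requests skip the lookup
        self.lookups = 0

    def route(self, country: str, app_region: str) -> tuple[Endpoint, bool]:
        """Return the owning shard and whether the request crosses regions.

        Only the request travels; the rows stay on the shard in their own region.
        """
        if country not in self._cache:
            self.lookups += 1
            self._cache[country] = self._lookup(country)
        endpoint = self._cache[country]
        return endpoint, endpoint.region != app_region


directory = {
    "DE": Endpoint("shard-eu-1.example.com", "EU"),
    "US": Endpoint("shard-us-1.example.com", "US"),
}
router = ShardRouter(directory.__getitem__)
print(router.route("DE", app_region="US"))  # EU shard, crossed=True
```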

 
[Figure 10-1: Shards in multiple regions presented as a single logical database]

Additionally, Oracle Database security features such as Real Application Security (RAS) and Oracle Database Vault can be used to limit data access further, even within a region. For example, an administrator in the EU region can be restricted to see data from only a subset of EU countries rather than all of them. Within a Data Sovereignty region, data can be replicated across multiple data centers using Oracle Data Guard or Oracle GoldenGate.
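Row-level restrictions of this kind boil down to a per-user filter on the country column. In Oracle Database this is configured with features such as RAS data realms or Virtual Private Database (whose policy functions return a predicate string that the database appends to the query's WHERE clause); the Python below only mimics that idea and is not PL/SQL or a RAS API. `ADMIN_SCOPE`, `policy_predicate`, `visible_rows`, and the user names are hypothetical.

```python
# Hypothetical grants: which countries each administrator may see (illustrative names).
ADMIN_SCOPE: dict[str, frozenset[str]] = {
    "eu_admin_1": frozenset({"DE", "FR"}),  # EU admin limited to a subset of EU countries
    "eu_admin_2": frozenset({"IT"}),
}


def policy_predicate(user: str) -> str:
    """Build a row filter in the style of a VPD policy function's returned predicate.

    Country codes come from the trusted configuration above, never from user input.
    """
    countries = ADMIN_SCOPE.get(user)
    if not countries:
        return "1 = 0"  # deny by default: unknown users see no rows
    quoted = ", ".join(f"'{c}'" for c in sorted(countries))
    return f"country IN ({quoted})"


def visible_rows(user: str, rows: list[dict]) -> list[dict]:
    """Apply the same policy to in-memory rows."""
    allowed = ADMIN_SCOPE.get(user, frozenset())
    return [r for r in rows if r["country"] in allowed]


print(policy_predicate("eu_admin_1"))  # country IN ('DE', 'FR')
print(policy_predicate("intruder"))    # 1 = 0
```

Denying by default matters here: a user with no explicit grant gets an always-false predicate rather than an unfiltered view.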

Oracle Sharding management interfaces give you control of the global metadata and provide a view of the physical databases (replicas), data they contain, replication topology, and more. Oracle Sharding handles data redistribution when nodes are added or dropped.

You can run worldwide reports without copying data out of the various regions. Oracle Sharding runs multi-shard reports by pushing queries to the nodes where the data resides.
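Pushing a query down means each region computes a partial result locally and only that summary is returned. The sketch below shows the classic case of a global average: each shard returns a partial sum and count, and the coordinator combines them. It is a generic illustration of aggregate pushdown, not Oracle's query rewrite; `Partial`, `local_partial`, and `global_average` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partial:
    total: float
    count: int


def local_partial(values: list[float]) -> Partial:
    """Runs inside one region; only this two-number summary is returned."""
    return Partial(total=sum(values), count=len(values))


def global_average(partials: list[Partial]) -> Optional[float]:
    """Coordinator step: combine partial sums and counts, not per-shard averages."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count if count else None


de = local_partial([100, 200, 300])  # Partial(total=600, count=3)
us = local_partial([1000])           # Partial(total=1000, count=1)
print(global_average([de, us]))      # 400.0
```

Averaging the per-shard averages instead, (200 + 1000) / 2 = 600, would give the wrong answer, which is why the coordinator has to rewrite AVG into SUM and COUNT before sending it to the shards.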

Oracle Sharding provides comprehensive data sovereignty solutions that focus on the following aspects:

  • Data Residency: Data can be distributed across multiple shards, which can be deployed in different geographical locations.

  • Data Processing: Application requests are automatically routed to the correct shard irrespective of where the application is running.

  • Data Access: Data access within a region can be restricted further using the Virtual Private Database capability of Oracle Database.

  • Derivative Data: Data is kept in an Oracle Database, and Oracle Database features are used to contain the proliferation of derivative data.

  • Data Replication: Oracle Sharding can be used with Oracle Data Guard or Oracle GoldenGate to replicate data within the same Data Sovereignty region.

Follow the step-by-step instructions, including screenshots, for configuring sharding to achieve Data Sovereignty.

 

Pankaj Chandiramani

Director, Product Management

With an extensive 18-year background in business technology, Pankaj Chandiramani currently holds the position of Director of Product Management for Oracle Database. His expertise lies in the development and marketing of enterprise Software as a Service (SaaS), hybrid, and on-premises products. He has successfully contributed to various domains, including AI/ML, IT Operations, Data Management, and DevOps.

