In this two-part blog on Data Sovereignty, I will share how customers are achieving Data Sovereignty with Oracle Sharding, including step-by-step implementation details.
Part 1: Overview of Data Sovereignty and how Sharding can provide a perfect solution for achieving data sovereignty
Part 2: Customer use case - How one of the biggest reinsurance companies is using Oracle Sharding to achieve data sovereignty
Country- or region-specific data placement, residency, and sovereignty regulations are becoming more prevalent. We discuss how Oracle Database customers leverage Oracle Sharding to comply with such regulations. We also discuss various flavors of such rules and corresponding architecture and implementation patterns.
Data sovereignty generally refers to how data is governed by regulations specific to the region in which it originated. These types of regulations can specify where data is stored, how it is accessed, how it is processed, and the life-cycle of the data.
With the exponential growth of data crossing borders and public cloud regions, more than 100 countries have now passed regulations concerning where data is stored and how it is transferred. Personally identifiable information (PII) in particular is increasingly subject to the laws and governance structures of the nation in which it is collected. Data transfers to other countries are often restricted or allowed based on whether that country offers similar levels of data protection, and whether that nation collaborates in forensic investigations.
Data sovereignty requirements are driven by local regulations, which can result in different application architectures. Two common variants are:
Data must be physically stored in a certain geographic location: For example, within the boundaries of a specific country or a region comprising several countries. It is fine to access and process the data remotely as long as the data is not stored in remote locations. From a technical standpoint, this implies that data stores such as databases, object stores, and messaging stores that physically hold the persistent data must be in a certain geographic location. However, the application runtime, which contains the business logic for processing the data, could be outside the geographic location. Examples of such application parts include application servers, mobile applications, API gateways, workflows, and so on.
Data must be physically stored and processed in a certain geographic location: In this case, both storage and processing of the data must take place within the defined geographic location.
Achieving Data Sovereignty has become a complex problem for IT because, in most cases, teams end up duplicating the entire stack (application and database) in an attempt to provide a solution to the business. This adds not only hardware cost but also operational cost. As more and more countries introduce compliance policies around Data Sovereignty, this approach does not scale, and some companies are even thinking about restricting business to certain countries because the cost of running a separate stack is not justified by the business benefits.
A large but imaginary financial institution, Shard Bank, wants to offer credit services to users in multiple countries. Each country where the credit service will be provided has its own data privacy regulations, and the Personally Identifiable Information (PII) data must be stored in that country.
Access to the data must be limited: data administrators in one country must not be able to see data in others. The solution for this use case is user-defined sharding with shards configured in different countries and Real Application Security (RAS) for data access control.
The Oracle Sharding solution provides in-country data storage and still supports a global view of all the data.
The global sharded database is sharded by a key indicating the country in which it must reside. In-country applications connect to the local database as usual, and all data is stored and processed locally.
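To make the idea of a country-based sharding key concrete, here is a minimal conceptual sketch in plain Python. It is not Oracle Sharding's API or DDL: the `Shard` class, the `SHARD_BY_COUNTRY` map, the `shard_for` function, and the country codes, shard names, and region labels are all hypothetical names chosen for illustration. It only shows the principle that the country value on a row decides where that row lives.

```python
# Conceptual sketch only: plain Python, not Oracle Sharding's API.
# Country codes, shard names, and region labels are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    region: str


# One shard per country: the country code acts as the sharding key.
SHARD_BY_COUNTRY = {
    "DE": Shard("shard_de", "eu-frankfurt"),
    "IN": Shard("shard_in", "ap-mumbai"),
    "US": Shard("shard_us", "us-ashburn"),
}


def shard_for(country_code: str) -> Shard:
    """Return the shard that must hold rows for the given country."""
    try:
        return SHARD_BY_COUNTRY[country_code.upper()]
    except KeyError:
        # Refuse to guess a location: placing a row in the wrong
        # country would defeat the residency requirement.
        raise ValueError(f"no shard configured for country {country_code!r}") from None


customer = {"customer_id": 42, "country": "de", "name": "A. Example"}
target = shard_for(customer["country"])
print(target.name, target.region)
```

Failing on an unknown country, rather than falling back to a default shard, is a deliberate choice in this sketch: a silent default could store regulated data outside its required jurisdiction.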
Any multi-shard queries are directed to the shard coordinator. The coordinator rewrites the query and sends it to each shard (country) that has the required data. The coordinator then processes and aggregates the results from all of the countries and returns the result.
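The coordinator behavior described above follows a scatter-gather pattern, which can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Oracle's implementation: the in-memory `SHARDS` dictionary stands in for per-country databases, and `local_summary` and `coordinator_summary` are hypothetical function names.

```python
# Conceptual sketch only: in-memory lists stand in for per-country shards.
# All names and sample data are hypothetical.
SHARDS = {
    "DE": [{"customer_id": 1, "balance": 100}, {"customer_id": 2, "balance": 250}],
    "IN": [{"customer_id": 3, "balance": 75}],
    "US": [],
}


def local_summary(rows):
    """Runs inside one country and returns only an aggregate, not the rows."""
    return {"count": len(rows), "total": sum(row["balance"] for row in rows)}


def coordinator_summary(shards):
    """Send the same sub-query to every shard, then merge the partial results."""
    partials = {country: local_summary(rows) for country, rows in shards.items()}
    return {
        "count": sum(p["count"] for p in partials.values()),
        "total": sum(p["total"] for p in partials.values()),
        "per_country": partials,
    }


report = coordinator_summary(SHARDS)
print(report["count"], report["total"])
```

In this sketch only the per-country aggregates leave each shard, while the individual customer rows stay where they are stored.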
Oracle Sharding makes this use case possible with the following capabilities:
The benefits of this approach are:
Oracle Sharding meets data sovereignty requirements and supports applications that require low latency and high availability.
Sharding makes it possible to locate different parts of the data in different countries or regions – thus satisfying regulatory requirements where data has to be located in a certain jurisdiction.
It also supports storing particular data closer to its consumers. Oracle Sharding automates the entire lifecycle of a sharded database – deployment, schema creation, data-dependent routing with superior run-time performance, elastic scaling, and life-cycle management.
It also provides the advantages of an enterprise RDBMS, including relational schema, SQL, and other programmatic interfaces, support for complex data types, online schema changes, multi-core scalability, advanced security, compression, high-availability, ACID properties, consistent reads, developer agility with JSON, and much more.
Oracle Sharding distributes segments of a data set across many databases (shards) on different computers, on-premises, or in the cloud. These shards can be deployed in multiple regions across the globe. This enables Oracle Sharding to create globally distributed databases honoring data residency.
All of the shards in a given database are presented to the application as a single logical database. Applications are seamlessly connected to the right shard based on the queries they run. For example, if an application instance deployed in the US needs data that resides in Europe, the application request is seamlessly routed to an EU data center, without the application having to do anything special.
Additionally, Oracle Database security features such as Real Application Security (RAS) and Oracle Database Vault can be used to limit data access further, even within a region. For example, an administrator in the EU region can further be restricted to see data only from a subset of countries and not all EU countries. Within a Data Sovereignty region, data can be replicated across multiple data centers by using Oracle Data Guard and Oracle GoldenGate for such replication.
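The idea of restricting an administrator to a subset of countries within a region can be illustrated with a small plain-Python filter. This is only a sketch of the row-level principle and does not show how RAS or Oracle Database Vault are actually configured; the administrator names, country grants, and the `visible_rows` function are hypothetical.

```python
# Conceptual sketch only: plain Python, not RAS or Database Vault configuration.
# Administrator names and country grants are hypothetical.
ADMIN_COUNTRIES = {
    "eu_admin_north": {"DE", "NL"},
    "eu_admin_south": {"IT", "ES"},
}

ROWS = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 2, "country": "IT"},
    {"customer_id": 3, "country": "NL"},
]


def visible_rows(admin, rows):
    """Return only the rows whose country is granted to this administrator."""
    allowed = ADMIN_COUNTRIES.get(admin, set())  # unknown admins see nothing
    return [row for row in rows if row["country"] in allowed]


print([row["customer_id"] for row in visible_rows("eu_admin_north", ROWS)])
```

Defaulting to an empty grant set means an administrator who is not listed sees no rows at all, which is the safer failure mode for regulated data.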
Oracle Sharding management interfaces give you control of the global metadata and provide a view of the physical databases (replicas), data they contain, replication topology, and more. Oracle Sharding handles data redistribution when nodes are added or dropped.
You can run worldwide reports without copying data out of the various regions. Oracle Sharding pushes multi-shard queries to the nodes where the data resides.
Oracle Sharding provides comprehensive data sovereignty solutions that focus on the following aspects:
Data Residency: Data can be distributed across multiple shards, which can be deployed in different geographical locations.
Data Processing: Application requests are automatically routed to the correct shard irrespective of where the application is running.
Data Access: Data access within a region can be restricted further using the Virtual Private Database capability of Oracle Database.
Derivative Data: Keeping the data in an Oracle Database and using Oracle Database features helps contain the proliferation of derivative data.
Data Replication: Oracle Sharding can be used with Oracle Data Guard or Oracle GoldenGate to replicate data within the same Data Sovereignty region.
Pankaj Chandiramani is Director of Product Management for Oracle Database and has been working in business technology for the past 16 years, developing and promoting enterprise SaaS, hybrid, and on-prem products in various domains such as AI/ML, IT Operations, Data Management, and DevOps.