This blog is co-authored with Kiran Tailor.
Data warehouses have been around since the 1980s. A data warehouse can collect and store data from multiple sources. Stored, this data can be accessed and analyzed to help in decision making using different analytics tools. Any data warehouse requires servers, storage, an operating system, and a database. When integrated, they’re referred to as a data warehouse appliance. These appliances entered the market in the early 2000s and are installed and maintained in on-premises data centers. Teradata and Netezza are two of the more popular examples of these systems.
Currently, the data warehousing market is going through explosive growth. This growth is not only in terms of what companies are spending but also the amount of data being stored, analyzed, and monetized. Data insights are more important than ever to increase revenues and reduce costs. Data warehousing solutions encourage developers and system administrators to focus on outcomes rather than focusing on maintaining the physical appliance. But many customers are still relying on their on-premises data warehousing appliances, which have the following challenges:
Time to operationalize: The operationalization of such a system begins long before the system arrives on-premises. The system requires extensive power, cooling, space, redundancy, backup, and safety compliance for the on-premises data center. When the data center is ready, next comes the challenge of installing, turning on, and fine-tuning the system. This ongoing task requires time, money, and expertise, adding to Opex and Capex costs.
Hardware and software refresh cycle, patching, and vendor lock-in: Every few years, customers need to upgrade the servers, storage, and other hardware components. Patches are applied, usually requiring maintenance windows, experts, and downtime. The software needs updating, all resulting in direct and indirect costs. Moreover, customers need to stay with the original vendor.
System scaling: Scaling of these rigid systems can include performance, storage, or both. There might also be upper limits in terms of total data storage. When reached, the customer needs to acquire a new system and go through operationalization challenges again. With next-gen apps and applications now being utilized extensively, resulting in multiple times the amount of data these appliances were designed for, this data warehousing model can’t meet the challenge.
Compatibility and integrating of new features and tools: Because these systems are provided by a specific vendor, integrating new tools or getting new features can be complex and time-consuming, resulting in customers falling behind.
Maintaining security posture: New security threats, sophisticated hackers, and malicious software are growing rapidly. These threats make it challenging to keep data stored in on-premises appliances safe and secure. As the saying goes, data is the new gold. So, we can see how companies are being targeted for their and their customers’ data.
Let’s look at Oracle Cloud Infrastructure (OCI) and our data warehousing solution: Oracle Autonomous Data Warehouse.
Public clouds are becoming the way corporations tackle some of these challenges. Not only can infrastructure be set up and put into operation quickly, but individual services and complete end-to-end solutions are also deployed in a fraction of time.
Oracle announced its Autonomous Data Warehouse in 2018. Autonomous Data Warehouse is deployed within minutes. It can also automate time-consuming administrative tasks, such as backup and patching, and requires no hardware procurement time. You can deploy it on shared or dedicated infrastructure, configure autoscaling for CPU and storage, performance monitoring, access through private endpoints, and stop or start when not needed with per second billing. These features result in greater cost control and flexibility than appliance deployments. For more information, see Overview of Autonomous Databases.
Oracle Cloud Guard provides an out-of-the-box solution for customers to monitor, identify, achieve, and maintain a strong security posture. This new service provides preconfigured security best practices by Oracle and is enabled with a single click. When it’s provisioned, you can customize individual best practice security rules. For more information on Cloud Guard, refer to the Cloud Guard documentation.
Let’s look at the example of migrating a Teradata data warehouse appliance to OCI. You can take a similar approach for other types of data warehouse, though the on-premises tools needed might differ.
For migration of a Teradata appliance, you need the following components:
SQL Developer, SQL Loader, and Oracle Data Sync (on-premises): These services extract and transform data stored on-premises and transfer it to OCI Object Storage to be loaded into Autonomous Data Warehouse. You can also use Oracle Data Integrator, either on-premises or in OCI using a Marketplace image. Consider using this tool if you want to support ongoing data migration.
Virtual cloud network (VCN) and service gateway (OCI): The service gateway ensures that data transferred from on-premises is done securely using OCI’s private backbone.
Object Storage (OCI): Object Storage stores the transferred data and then loads it into Autonomous Data Warehouse.
Autonomous Data Warehouse (OCI): The new database that hosts data transferred from the on-premises data warehouse
For the overall Teradata migration strategy, we need to consider the following areas:
Migration of the database model, such as table definitions and indexes
Applications for downstream and upstream considerations
Moving the schema is one of the key steps in all migrations. We can use SQL Developer (for Teradata or a custom tool for other data warehouses) for some of these objects. SQL Developer allows you to capture the model and generate the syntax for tables and indexes using its migration repository. When SQL Developer captures the model, you can amend this model before applying the changes to Oracle.
You might have stored procedures, macros, and BTEQ scripts. SQL Developer can’t move these objects. You have to migrate them manually to work with Oracle. For the BTEQ scripts, you can use the Teradata native utility to copy and paste only the SQL.
You have the following options for migrating data:
Export and import (non-incremental data):
Export the data from Teradata using FastExport.
Compress the files.
Move files to Object Storage.
Use the DBMS_CLOUD package to load the data.
For incremental data, you need to export the changed records with the following steps:
Move incremental data to Object Storage.
Load the data to a staging table.
Use your preferred method to do a change data capture (CDC).
For Teradata, SQL Developer can migrate the data utilizing the migration repository created during the database model generation.
Oracle Data Sync, another free tool by Oracle, can connect directly between Teradata and Autonomous Database. This tool can carry out bulk and CDC to Oracle Autonomous Database.
With all migrations, you need to assess all downstream and upstream applications. For example, an ETL application needs to be reconfigured to work with Oracle. Sometimes, this process is a simple matter of choosing a different target within the application. For the migration of other data warehouses, you can use custom tools to achieve tasks performed by SQL Developer.
After the data is transferred to an Autonomous Data Warehouse instance, you can not only access and operate on it using traditional tools, but you can take advantage of built-in analytics tools, such as Oracle machine learning and creating a data catalog. You can also enable analytics services using Oracle Analytics Cloud or Fusion Analytics Warehouse and others.
In summary, a migration of an on-premises data warehouse deployment to Oracle Cloud Infrastructure provides the flexibility, scaling, cost control, data monetization capabilities, and ease of operation. This process addresses some of the main challenges faced with an on-premises deployment. For more information, refer to the following reference architectures: