This blog post guides you through the process of moving petabytes of data from Amazon Web Services (AWS) S3 to Oracle Cloud Infrastructure (OCI) Object Storage using the powerful open source tool, Rclone.
A successful, efficient data migration depends on understanding the factors that affect it and planning accordingly. Tailoring your approach to the dataset’s characteristics, such as file size, hierarchy, folder structure, storage tiers, and egress costs, makes the migration smoother and more cost-effective.
Rclone emerges as the right choice for file migrations, particularly when dealing with larger files. Its cross-platform versatility, extensive cloud storage provider support, efficient handling of large files, data integrity features, customizable transfer options, and active community support collectively position Rclone as a powerful and reliable tool for organizations navigating the complexities of file migration.
The OCI Object Storage service comes with predefined defaults, commonly referred to as soft limits, on a per-bucket and per-tenancy basis. Understanding these defaults is crucial, especially when dealing with the migration or importation of extensive datasets. Adjusting or modifying specific parameters might be necessary, based on the unique requirements of the customer.
Like any other cloud vendor, OCI also imposes soft limits to prevent unintended resource spikes. You can adjust these limits on request based on available capacity within a specific region. The soft limits for Object Storage include the following options:
In addition to soft limits, Object Storage also has the following hard limits in place:
These hard limits are more stringent and are meant to define absolute constraints for specific aspects of Object Storage within OCI. Understanding and potentially adjusting the soft limits can be crucial for users dealing with large-scale storage requirements within OCI.
For file migration, the process can be simplified into two scenarios based on average file size: files smaller than 4 MB and files larger than 4 MB. For files smaller than 4 MB, the limiting factor is the number of requests made to Object Storage, because of the higher number of files. For files larger than 4 MB, the limiting factor is bandwidth. Most customer migrations involve larger files, where several factors significantly affect throughput and therefore file transfer performance.
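The two scenarios can be illustrated with a minimal sketch that estimates how long a transfer would take if it were bound only by request rate, and how long if it were bound only by bandwidth, then reports whichever is slower. The function name and the example rates (1,000 requests per second and 4,000 MB per second) are hypothetical values chosen so that the crossover falls at 4 MB; they are not OCI limits.

```python
def limiting_factor(object_count, avg_object_mb, requests_per_sec, bandwidth_mb_per_sec):
    """Return (bottleneck, seconds) for a simple one-request-per-object model.

    Assumption: each object costs one write request, and the two bounds
    are independent. Real transfers (multipart uploads, retries) differ.
    """
    request_bound_s = object_count / requests_per_sec
    bandwidth_bound_s = object_count * avg_object_mb / bandwidth_mb_per_sec
    if request_bound_s > bandwidth_bound_s:
        return "requests", request_bound_s
    return "bandwidth", bandwidth_bound_s


# Hypothetical rates: 1,000 requests/s and 4,000 MB/s (crossover at 4 MB).
small_files = limiting_factor(1_000_000, 1, 1000, 4000)
large_files = limiting_factor(1_000_000, 64, 1000, 4000)
print(small_files)
print(large_files)
```

With these illustrative numbers, one million 1 MB objects are request-bound, while one million 64 MB objects are bandwidth-bound.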
Let’s now focus on the second case involving larger files, delving into the intricacies of factors affecting file transfer performance.
You’re planning to migrate 10 PB of data to OCI Object Storage. How much bandwidth do you need? What are the limiting factors? We recommend OCI FastConnect and AWS Direct Connect for higher throughput, though neither is required; you can also use a VPN. Either way, run a few preliminary tests to confirm what transfer rate you need.
The following approach estimates the required bandwidth. These calculations assume that a few Object Storage soft limits have been changed. Contact the Object Storage product team to discuss and change these soft limits.
Assume the following parameters for the example. A caret (^) marks a soft limit: writes_per_bucket can be raised up to 30K, and throttling_rate_limit_per_tenancy up to 150K.
Now, we calculate the days required to transfer 10 PB of data:
Transfer time for 10 PB = (10 × 1024 × 1024 × 1024 MB) / 90,000 MBps ≈ 119,304.65 seconds ≈ 1.38 days
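The same arithmetic can be reproduced with a short script, which makes it easy to rerun with your own dataset size and measured throughput. This is a minimal sketch: the function name is hypothetical, and the 90,000 MBps figure is taken from the example above, not from a measurement.

```python
MB_PER_PB = 1024 ** 3        # 1 PB = 1024 * 1024 * 1024 MB (binary units)
SECONDS_PER_DAY = 86_400


def transfer_time(dataset_pb, throughput_mb_per_sec):
    """Return (seconds, days) to move dataset_pb at a sustained throughput."""
    total_mb = dataset_pb * MB_PER_PB
    seconds = total_mb / throughput_mb_per_sec
    return seconds, seconds / SECONDS_PER_DAY


seconds, days = transfer_time(10, 90_000)
print(f"{seconds:,.4f} seconds = {days:.2f} days")
```

The result assumes the throughput is sustained for the whole transfer; throttling, retries, and listing overhead would lengthen it.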
Multiple connectivity models, such as FastConnect or VPN, are available to connect OCI to AWS. The following diagram shows an example that connects OCI to AWS Direct Connect through Megaport, an OCI FastConnect partner.
To achieve high throughput, use a multithreaded approach: deploy multiple virtual machines (VMs) that target different buckets, and run the transfers in parallel.
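One simple way to spread buckets across VMs is a round-robin assignment, sketched below. The bucket and VM names are hypothetical, and the sketch ignores bucket size; in practice you might balance by data volume instead.

```python
def assign_round_robin(buckets, vm_names):
    """Map each VM name to the list of buckets it should transfer."""
    if not vm_names:
        raise ValueError("at least one VM is required")
    plan = {vm: [] for vm in vm_names}
    for index, bucket in enumerate(buckets):
        # Cycle through the VMs so each gets roughly the same bucket count.
        plan[vm_names[index % len(vm_names)]].append(bucket)
    return plan


# Hypothetical bucket and VM names, for illustration only.
plan = assign_round_robin(
    ["bucket-1", "bucket-2", "bucket-3", "bucket-4", "bucket-5"],
    ["vm-a", "vm-b"],
)
for vm, assigned in plan.items():
    print(vm, assigned)
```

Each VM then runs its own transfer against only its assigned buckets, so no two VMs write to the same bucket.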
To get optimal performance, take advantage of Rclone options such as the number of simultaneous transfers, the number of checkers (which determine whether a file needs to be transferred), bandwidth limits, checksum skipping, maximum depth, and file hierarchy. For Rclone setup and configuration, refer to the blog post Announcing native OCI Object Storage provider backend support in rclone and to Migrate Data to Oracle Cloud Infrastructure Object Storage Using Rclone in our documentation.
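As a rough sketch, the options above map to the Rclone flags `--transfers`, `--checkers`, `--bwlimit`, `--ignore-checksum`, and `--max-depth`. The helper below only assembles a command line; it does not run Rclone. The function name, the remote names `aws-s3:` and `oci-oos:`, the bucket paths, and all values are hypothetical placeholders, not tuning recommendations.

```python
import shlex


def build_rclone_copy(source, dest, transfers=4, checkers=8,
                      bwlimit=None, ignore_checksum=False, max_depth=None):
    """Assemble an `rclone copy` argument list from the tuning options."""
    cmd = ["rclone", "copy", source, dest,
           "--transfers", str(transfers),   # simultaneous file transfers
           "--checkers", str(checkers)]     # parallel existence/equality checks
    if bwlimit is not None:
        cmd += ["--bwlimit", bwlimit]       # cap bandwidth, e.g. "1G"
    if ignore_checksum:
        cmd.append("--ignore-checksum")     # skip post-copy checksum comparison
    if max_depth is not None:
        cmd += ["--max-depth", str(max_depth)]  # limit directory recursion
    return cmd


# Hypothetical remotes and values; adjust after your own preliminary tests.
cmd = build_rclone_copy("aws-s3:source-bucket/data", "oci-oos:target-bucket/data",
                        transfers=64, checkers=128, bwlimit="1G",
                        ignore_checksum=True, max_depth=3)
print(shlex.join(cmd))
```

Skipping checksums trades data integrity checks for speed, so it is worth enabling only when you verify the data some other way.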
Migrating petabytes of data from one cloud provider to another is a complex task that requires careful planning and processing. Oracle Cloud Infrastructure Object Storage is well-equipped with parameters that allow it to accommodate customer requirements. Rclone simplifies this process by providing a reliable and efficient means of transferring data between different cloud storage services.
Reevaluate your storage needs and costs: sign up for the OCI Cloud Free Tier and experiment to find which parameters work best for your petabyte-scale data movement.
Srinivas (Srini) embarked on his career as an RPG programmer, later delving into the design and development of client-server technologies and crafting n-tier/microservices/loosely coupled architectures over two decades. With extensive experience in systems integration, primarily utilizing Microsoft's Biztalk Integration tool, he has spent the past six years dedicated to designing, building, and architecting cloud-native applications on Azure.
Presently, the emphasis is on aiding customers in formulating and cultivating their cloud strategy, with a primary focus on Containerization technologies such as K8s, service mesh, and microservices.
Responsible for driving Private Equity and Strategic partnerships, customer success and solution architecture of Oracle's Cloud Infrastructure (OCI).