Migrating petabytes of data from AWS S3 to OCI Object Storage using Rclone

January 25, 2024 | 4 minute read
Srinivas Mukala
Master Principal Cloud Architect
Thameem Khan
CTO, Cloud First
Manoj Ghosh
Consulting Member of Technical Staff

This blog post guides you through the process of moving petabytes of data from Amazon Web Services (AWS) S3 to Oracle Cloud Infrastructure (OCI) Object Storage using the powerful open source tool, Rclone.

Migration planning and tooling

A successful and efficient data migration requires a thorough understanding of the factors that affect it, along with careful planning. Tailoring your migration approach to the unique characteristics of the dataset, and accounting for factors such as file size, hierarchy, folder structure, storage tiers, and egress costs, contributes to a smoother and more cost-effective migration.

Rclone emerges as the right choice for file migrations, particularly when dealing with larger files. Its cross-platform versatility, extensive cloud storage provider support, efficient handling of large files, data integrity features, customizable transfer options, and active community support collectively position Rclone as a powerful and reliable tool for organizations navigating the complexities of file migration.

The OCI Object Storage service comes with predefined default limits, commonly referred to as soft limits, on a per-bucket and per-tenancy basis. Understanding these defaults is crucial, especially when migrating or importing large datasets. Depending on the customer's unique requirements, you might need to adjust specific limits.

Like other cloud vendors, OCI imposes soft limits to prevent unintended resource spikes. You can request increases to these limits, subject to available capacity within a specific region. The soft limits for Object Storage include the following:

  • 10,000 buckets per tenancy
  • 4 billion objects per bucket (higher when the bucket is multisharded)
  • 3,000 writes/second per bucket
  • 10,000 reads/second per bucket
  • 2,000 lists/second per bucket
  • Bandwidth up to 1 Tbps

In addition to the soft limits, Object Storage has the following hard limits:

  • Number of Object Storage namespaces per root compartment: 1
  • Unlimited number of buckets for a customer
  • Maximum object size: 10 TiB
  • Maximum number of parts in multipart upload: 10,000
  • Maximum object part size in multipart upload: 50 GiB 

These hard limits are more stringent and are meant to define absolute constraints for specific aspects of Object Storage within OCI. Understanding and potentially adjusting the soft limits can be crucial for users dealing with large-scale storage requirements within OCI.
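The multipart limits interact: because an upload can have at most 10,000 parts, each part must be at least the object size divided by 10,000. The following minimal Python sketch computes that lower bound and checks it against the 10 TiB object and 50 GiB part limits. The helper name `min_part_size` is hypothetical, binary units (GiB, TiB) are assumed, and the sketch doesn't describe how Rclone itself chooses a chunk size.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Hard limits listed above for OCI Object Storage (binary units assumed).
MAX_OBJECT_SIZE = 10 * TIB
MAX_PARTS = 10_000
MAX_PART_SIZE = 50 * GIB


def min_part_size(object_size_bytes: int) -> int:
    """Return the smallest part size (bytes) that fits the object into MAX_PARTS parts."""
    if object_size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 10 TiB maximum object size")
    # Integer ceiling division: ceil(object_size_bytes / MAX_PARTS).
    part_size = -(-object_size_bytes // MAX_PARTS)
    # For any object up to 10 TiB this is at most about 1.024 GiB, well under 50 GiB.
    if part_size > MAX_PART_SIZE:
        raise ValueError("required part size exceeds the 50 GiB part limit")
    return part_size


# The largest allowed object needs parts of at least about 1.024 GiB each.
print(f"{min_part_size(10 * TIB) / GIB:.3f} GiB")
```

In other words, the 10 TiB object limit, not the 50 GiB part limit, is the binding constraint for very large files.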

In the context of file migration, you can simplify the process by dividing it into two scenarios based on average file size: files larger than 4 MB and files smaller than 4 MB. For files smaller than 4 MB, the limiting factor is the number of requests made to Object Storage, because of the high volume of small files. For files larger than 4 MB, the limiting factor is bandwidth. Most customer migrations involve larger files, and for those, various factors significantly affect throughput and therefore file transfer performance.
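As a rough illustration of this rule of thumb, the following sketch classifies a dataset by its average file size. The function name `limiting_factor` and the interpretation of 4 MB as 4 * 1024 * 1024 bytes are assumptions for illustration; in practice, you would gather the sizes from a listing of the source bucket.

```python
from typing import Sequence

# 4 MB threshold from the text, interpreted as 4 * 1024 * 1024 bytes (assumption).
SMALL_FILE_THRESHOLD = 4 * 1024 * 1024


def limiting_factor(file_sizes_bytes: Sequence[int]) -> str:
    """Return the likely bottleneck for a dataset based on its average file size."""
    if not file_sizes_bytes:
        raise ValueError("no files to classify")
    average = sum(file_sizes_bytes) / len(file_sizes_bytes)
    if average < SMALL_FILE_THRESHOLD:
        return "requests"   # many small files: request rate limits throughput
    return "bandwidth"      # larger files: network bandwidth limits throughput


print(limiting_factor([512 * 1024] * 1_000))   # 1,000 files of 512 KiB -> requests
print(limiting_factor([2 * 1024 ** 3] * 10))   # 10 files of 2 GiB -> bandwidth
```

A dataset averaging exactly 4 MB is treated as bandwidth-bound here; the text doesn't specify that boundary case.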

Let’s now focus on the second case involving larger files, delving into the intricacies of factors affecting file transfer performance.

You’re planning to migrate 10 PB of data to OCI Object Storage. How much bandwidth do you need? What are the limiting factors? We recommend OCI FastConnect and AWS Direct Connect for higher throughput, though they aren’t required; you can also use a VPN. In either case, perform a few preliminary tests to confirm that the connection provides the transfer rate you need.

Let’s use the following approach to calculate bandwidth. These calculations assume that a few Object Storage soft limits have been raised. Contact the Object Storage product team to discuss changing these soft limits.

  • total_bandwidth_needed = concurrent_write_streams * write_speed_per_stream
  • concurrent_write_streams = minimum(total_writes_across_all_buckets, throttling_rate_limit_per_tenancy)
  • total_writes_across_all_buckets = number_of_buckets * writes_per_bucket

Assume the following parameters for the example, where (^) marks a soft limit; writes_per_bucket can be raised up to 30K and throttling_rate_limit_per_tenancy up to 150K:

  • number_of_buckets = 3
  • writes_per_bucket = 30K (^), soft limit increased from 3K to 30K
  • total_writes_across_all_buckets = 3 * 30K = 90K
  • throttling_rate_limit_per_tenancy = 150K (^), soft limit increased from 50K to 150K
  • concurrent_write_streams = minimum(90K, 150K) = 90K
  • write_speed_per_stream = 1 MBps (assumption); in theory, Object Storage accepts up to 60 MBps for a single stream
  • total_bandwidth_needed = 1 MBps * 90K = 90,000 MBps
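The three formulas and the example parameters can be expressed as a small Python function. This is a sketch under the same simplifying assumption as the text: each allowed write per second is treated as one concurrent write stream. The function name is hypothetical, and all bandwidth values are in MB per second.

```python
def total_bandwidth_needed(number_of_buckets: int,
                           writes_per_bucket: int,
                           throttling_rate_limit_per_tenancy: int,
                           write_speed_per_stream: float) -> float:
    """Return bandwidth in MBps, following the three formulas in the text.

    Simplification from the text: each allowed write per second is treated
    as one concurrent write stream.
    """
    total_writes_across_all_buckets = number_of_buckets * writes_per_bucket
    concurrent_write_streams = min(total_writes_across_all_buckets,
                                   throttling_rate_limit_per_tenancy)
    return concurrent_write_streams * write_speed_per_stream


# Raised soft limits from the example: 30K writes per bucket, 150K per tenancy.
print(total_bandwidth_needed(3, 30_000, 150_000, 1.0))  # prints 90000.0
# Original soft limits: 3K writes per bucket, 50K per tenancy.
print(total_bandwidth_needed(3, 3_000, 50_000, 1.0))    # prints 9000.0
```

With the original soft limits, the same formulas give minimum(9K, 50K) = 9K streams, or 9,000 MBps, one tenth of the raised-limit figure. Adding buckets raises the total only until the per-tenancy limit becomes the smaller term.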

Now, calculate the number of days required to transfer 10 PB of data, using binary units (1 PB = 1024 * 1024 * 1024 MB):

10 PB transfer time = 10 * 1024 * 1024 * 1024 MB / 90,000 MBps

                    = 119,304.65 seconds ≈ 1.38 days
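The same arithmetic in Python, using binary units (1 PB = 1024^3 MB) and assuming the bandwidth is sustained for the whole transfer with no retries. The helper name `transfer_time` is hypothetical.

```python
from typing import Tuple

MB_PER_PB = 1024 ** 3      # binary units: 1 PB = 1024 * 1024 * 1024 MB
SECONDS_PER_DAY = 86_400


def transfer_time(data_pb: float, bandwidth_mb_per_s: float) -> Tuple[float, float]:
    """Return (seconds, days) to move data_pb petabytes at a sustained rate in MBps."""
    seconds = data_pb * MB_PER_PB / bandwidth_mb_per_s
    return seconds, seconds / SECONDS_PER_DAY


seconds, days = transfer_time(10, 90_000)
print(f"{seconds:,.2f} seconds = {days:.2f} days")  # 119,304.65 seconds = 1.38 days
```

Plugging in 9,000 MBps instead, the rate implied by the unraised soft limits, gives roughly 13.8 days, which shows how much the soft-limit increases matter at this scale.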

Architecture

Multiple connectivity models, such as FastConnect or VPN, are available to connect OCI to AWS. The following diagram shows an example that uses Megaport, an OCI FastConnect partner, to connect OCI FastConnect to AWS Direct Connect.

To achieve high throughput, use a multithreaded approach: deploy multiple virtual machines (VMs), each targeting a different bucket, and run the transfers in parallel.
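One way to split the work across VMs is to divide the source by prefix and balance the total bytes each VM handles. The sketch below uses greedy largest-first scheduling, a standard load-balancing heuristic swapped in here for illustration rather than a method prescribed by this post. The prefixes, sizes, and the function name `assign_prefixes` are hypothetical; each VM would then copy its prefixes into its own target bucket.

```python
import heapq
from typing import Dict, List


def assign_prefixes(prefix_sizes: Dict[str, int], vm_count: int) -> Dict[int, List[str]]:
    """Greedy largest-first assignment of source prefixes to VMs by total size."""
    # Min-heap of (size assigned so far, VM index): the least-loaded VM pops first.
    heap = [(0, vm) for vm in range(vm_count)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {vm: [] for vm in range(vm_count)}
    # Place the largest prefixes first so the final loads end up close together.
    for prefix, size in sorted(prefix_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, vm = heapq.heappop(heap)
        assignment[vm].append(prefix)
        heapq.heappush(heap, (load + size, vm))
    return assignment


# Hypothetical prefixes and sizes (in TB) in a single source S3 bucket.
sizes = {"logs/": 400, "images/": 300, "video/": 300, "backups/": 200}
print(assign_prefixes(sizes, vm_count=2))
# {0: ['logs/', 'backups/'], 1: ['images/', 'video/']}
```

Here both VMs end up with 600 TB. The heuristic doesn't guarantee a perfect balance, but it is simple and usually close when there are many prefixes.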

To get optimal performance, take advantage of Rclone features such as the number of simultaneous transfers (--transfers), the number of checkers that determine whether each file needs to be transferred (--checkers), bandwidth limits (--bwlimit), skipping checksums (--ignore-checksum), maximum directory depth (--max-depth), and file hierarchy. For Rclone setup and configuration, refer to the blog Announcing native OCI Object Storage provider backend support in rclone, and to Migrate Data to Oracle Cloud Infrastructure Object Storage Using Rclone in our documentation.
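The following sketch assembles an `rclone copy` command per VM, mapping the Rclone features listed in the preceding paragraph to the flags `--transfers`, `--checkers`, `--bwlimit`, `--ignore-checksum`, and `--max-depth`. The remote names `s3src` and `oci`, the bucket names, and the helper `rclone_copy_command` are hypothetical; the remotes would have to exist in your Rclone configuration, and the default numbers are placeholders to tune through testing, not recommendations.

```python
import shlex
from typing import List, Optional


def rclone_copy_command(source: str, destination: str,
                        transfers: int = 32, checkers: int = 64,
                        bwlimit: Optional[str] = None,
                        ignore_checksum: bool = False,
                        max_depth: Optional[int] = None) -> List[str]:
    """Build an `rclone copy` argument list; the numeric defaults are placeholders."""
    cmd = ["rclone", "copy", source, destination,
           "--transfers", str(transfers),   # files copied in parallel
           "--checkers", str(checkers)]     # parallel checks for files that need copying
    if bwlimit:
        cmd += ["--bwlimit", bwlimit]       # e.g. "500M" caps this rclone process's bandwidth
    if ignore_checksum:
        cmd.append("--ignore-checksum")     # skip the post-copy checksum check
    if max_depth is not None:
        cmd += ["--max-depth", str(max_depth)]  # limit directory recursion depth
    return cmd


# Hypothetical remotes: "s3src" (AWS S3) and "oci" (OCI Object Storage).
cmd = rclone_copy_command("s3src:source-bucket/logs", "oci:target-bucket-1/logs",
                          bwlimit="500M")
print(shlex.join(cmd))
```

On each VM, you could run the command with `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string avoids quoting problems with unusual bucket or prefix names.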

Architecture Diagram
 

Conclusion

Migrating petabytes of data from one cloud provider to another is a complex task that requires careful planning and execution. Oracle Cloud Infrastructure Object Storage provides adjustable limits that allow it to accommodate customer requirements. Rclone simplifies this process by providing a reliable and efficient means of transferring data between different cloud storage services.

Call to Action

Reevaluate your storage needs and costs: Sign up for the OCI Free Tier (Cloud Free Tier | Oracle) and experiment to find which parameters work best for your petabyte-scale data movement.

Srinivas Mukala

Master Principal Cloud Architect

Srinivas (Srini) began his career as an RPG programmer and later moved into the design and development of client-server technologies, crafting n-tier, microservices, and loosely coupled architectures over two decades. With extensive experience in systems integration, primarily using Microsoft's BizTalk integration tool, he has spent the past six years designing, building, and architecting cloud-native applications on Azure.

Presently, his emphasis is on helping customers formulate and cultivate their cloud strategy, with a primary focus on containerization technologies such as K8s, service mesh, and microservices.

Thameem Khan

CTO, Cloud First

Responsible for driving Private Equity and Strategic partnerships, customer success and solution architecture of Oracle's Cloud Infrastructure (OCI).

Manoj Ghosh

Consulting Member of Technical Staff

