David Allan

Architect

Recent Blogs

Troubleshooting Spark Pools using OCI Dataflow

OCI Dataflow Pools are ideal for scenarios that require numerous executors in a Dataflow cluster to process large volumes of data with minimal startup time. They also isolate the resources associated with critical production workloads, so those workloads remain unaffected by dynamic development activities. This blog looks at common issues in using pools and how to remediate them.

Rings of Security for Big Data and AI pipelines in the Cloud

To meet the growing need to run more analytical jobs and gain more insight from an organization's data, it makes sense to move Big Data pipelines to the cloud, where clusters can be built and deployed within minutes with a simplified user experience, scalability, and reliability. You can custom-configure the environment, administer it through multiple interfaces, and, more importantly, scale on demand.

Notebooks, Big Data and Spark Jobs in OCI - Best Practices and Examples

This blog introduces how to execute PySpark and Scala at scale from notebooks in OCI, along with best practices for processing data efficiently. In the ever-evolving landscape of technology, the synergy between powerful data processing, collaborative analysis, and cloud computing has become a cornerstone for innovation. This blog explores the integration of notebooks commonly used for data science and big data on Oracle Cloud Infrastructure (OCI).

Managing your machine learning lifecycle with MLflow and OCI

In this blog, we explore how MLflow, an open-source platform, can be integrated with MySQL and Oracle Cloud Infrastructure (OCI) Data Science to streamline machine learning workflows and enhance collaboration among data scientists and engineers. You can write Data Flow applications in Python, Scala, and PySpark, and you can also connect a Data Science notebook session to Data Flow to run applications.

Multi-Cloud: Copying Data from Azure Data Lake to Oracle’s OCI Object Storage using OCI Data Flow

In today’s digital landscape, organizations often find themselves working with multiple cloud providers to leverage the best features and capabilities offered by each platform. This multi-cloud approach enables businesses to distribute their workloads, optimize costs, and avoid vendor lock-in. In this blog post, we will explore a real-world scenario where we copy data from Azure Data Lake to Oracle’s OCI Object Storage using OCI Data Flow, showcasing the power of a multi-cloud architecture.
