As we progress further into the digital era, ensuring the protection of sensitive data isn’t just a priority. It’s a business imperative. Financial services, education, retail, healthcare, and government entities are among the many industries grappling with the challenges of data privacy and security. Failure to protect personally identifiable information (PII) can lead to identity theft, financial loss, and reputational damage for individuals and businesses alike, which makes appropriate safeguards for sensitive information essential. To navigate this complex landscape, the Accelerated Data Science (ADS) PII operator emerges as a robust tool designed to identify and secure protected health information (PHI) and PII.
What is PII?
PII refers to any information that can identify an individual, such as name, social security number, contact information, and driver’s license number. It appears throughout financial, medical, educational, and employment records. In addition to general data privacy regulations, some industries have their own specific regulations for protecting sensitive information. For example, in the US the Health Insurance Portability and Accountability Act of 1996 (HIPAA) regulates how healthcare and life sciences organizations collect, store, and protect patients’ medical records and PHI. These industry-specific regulations help ensure that organizations in these sectors protect sensitive information and maintain the trust of their customers and clients.
The gap in enterprise PII protection
As enterprises across various industries continue to digitize and handle ever-increasing volumes of personal data, the risk of data breaches and noncompliance with privacy regulations also escalates. Despite advancements in cybersecurity, a significant gap remains in effectively protecting PII at the scale and complexity demanded by large organizations.

ADS PII operator: The vanguard of data protection
Crafted by the synergy of Oracle Cloud Infrastructure (OCI) AI expertise and a keen understanding of data privacy demands, OCI Data Science’s ADS PII operator isn’t merely a tool. It’s a paradigm shift that’s changing how data is protected:
-
Automated detection and classification: Using pattern matching and AI-powered models, the ADS PII operator efficiently identifies sensitive data in free-form text.
-
Intelligent coreference resolution: A standout feature of the ADS PII operator is its ability to maintain coreference entity relationships even after anonymization. This feature not only anonymizes the data but also preserves the statistical properties of the data.
-
Flexible integration: The operator seamlessly integrates with your existing systems, providing a layer of protection without disrupting workflows.
-
Simplified task definition and execution: The ADS PII operator is configured through YAML and ships with preset profiles that embody domain-specific knowledge, ready to be used out of the box. It retains flexibility through customizable YAML files, so you can tailor the operator to your precise specifications. The accompanying CLI tool reduces deployment and execution to a single-line command and streamlines the operator’s use across varied environments, so you can use its capabilities with minimal setup.
-
Tailored to every business: Whether you’re safeguarding a small clinic’s patient records or a multinational’s client data, the ADS PII operator molds to your context. With customization at its core, it aligns with your data protection goals, however granular they are.
-
Enterprise resilience: Designed for organizations demanding the highest standard of data security, the ADS PII operator is built to scale, to withstand, and to excel, backed by Oracle’s robust Data Science and AI platform. For example, enabling batch processing within OCI Data Science jobs allows for the processing of large volumes of data in an efficient and timely manner, ensuring that projects of any size can be handled with ease.
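To make the pattern-matching and consistent-replacement ideas in the list above more concrete, here is a minimal Python sketch. It is not the ADS implementation: the regular expression, the `anonymize_phones` helper, the placeholder format, and the second sample sentence are all assumptions made for illustration. It only shows how the same original value can map to the same substitute across records, which is what keeps related records linkable after anonymization.

```python
import re

# Hypothetical pattern for US-style numbers such as "(805) 555-1234".
# Illustration only; this is not the pattern the ADS PII operator uses.
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def anonymize_phones(texts):
    """Replace each distinct phone number with a stable placeholder."""
    mapping = {}  # original value -> placeholder, shared across all texts

    def substitute(match):
        original = match.group(0)
        if original not in mapping:
            mapping[original] = f"<PHONE_{len(mapping) + 1}>"
        return mapping[original]

    cleaned = [PHONE_RE.sub(substitute, text) for text in texts]
    return cleaned, mapping


records = [
    "Hi, this is John Smith. My number is (805) 555-1234.",
    # Hypothetical second record, added to show a repeated value.
    "Call (805) 555-1234 again tomorrow, or try (805) 555-9876.",
]
cleaned, mapping = anonymize_phones(records)
for line in cleaned:
    print(line)
```

Because the mapping is shared across records, the repeated number receives the same placeholder in both sentences, while the new number gets its own.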
Empowering control with insightful analytics
Beyond mere defense, the ADS PII operator empowers enterprises with deep insights. Each deployment yields comprehensive analytics that detail protection levels and potential vulnerabilities, translating complex data security metrics into actionable intelligence.

Start your journey with ADS PII operator
We have the following medical records to process:
| documentation_id | patient_visit_date | content |
| 00001cee341fdb12 | 01/01/2012 | “Hi, this is John Smith. My number is (805) 555-1234.” |
| 00097b6214686db5 | 01/31/2012 | “John recently got a beautiful puppy. He may be allergic to dog hair.” |
These two records belong to John Smith. We redact and anonymize the PII—name and phone number—in both records.
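If you want to follow along, one way to create the input file is sketched below. The file name `mydata.csv` matches the configuration used later in this post. Writing it to the local working directory, rather than to an Object Storage bucket, is an assumption made for illustration.

```python
import csv

FIELDS = ["documentation_id", "patient_visit_date", "content"]

# The two sample records from the table above. IDs are kept as strings
# so that their leading zeros survive.
rows = [
    {
        "documentation_id": "00001cee341fdb12",
        "patient_visit_date": "01/01/2012",
        "content": "Hi, this is John Smith. My number is (805) 555-1234.",
    },
    {
        "documentation_id": "00097b6214686db5",
        "patient_visit_date": "01/31/2012",
        "content": "John recently got a beautiful puppy. "
                   "He may be allergic to dog hair.",
    },
]

# Local path is an assumption; upload the file to your bucket afterwards
# if your configuration reads from Object Storage.
with open("mydata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```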
Requirements
To use the ADS PII operator, install Accelerated Data Science with the following command. ADS supports this feature starting with version 2.9.0. Quote the requirement so that the shell doesn’t treat “>” as an output redirect.
pip install "oracle-ads>=2.9.0" -U
Configure
Set up “ads opctl” on your machine by running the following command. For more details, see Getting started with ADS operators.
ads opctl configure
Now, you’re ready to generate starter YAML configs for the operators.
ads operator init -t pii --output ~/pii/
Edit the generated pii.yaml file based on your needs.
kind: operator
type: pii
version: v1
spec:
  output_directory:
    url: oci://my-bucket@my-tenancy/results
    name: mydata-out.csv
  report:
    report_file_name: report.html
    show_sensitive_content: true
  input_data:
    url: oci://my-bucket@my-tenancy/mydata.csv
  target_column: content
  detectors:
    - name: spacy.en_core_web_trf.person
      action: anonymize
    - name: default.phone
      action: anonymize
Run
After you have written your pii.yaml, you can run the PII operator:
ads operator run -f ~/pii/pii.yaml
Interpret the results
The ADS PII operator produces two output files, mydata-out.csv and report.html, under the given output_directory.
-
mydata-out.csv: This file contains the processed dataset.
| documentation_id | patient_visit_date | content |
| 00001cee341fdb12 | 01/01/2012 | “Hi, this is David Doe. my number is (123) 456-7890.” |
| 00097b6214686db5 | 01/31/2012 | “David recently got a beautiful puppy. He may be allergic to dog hair.” |
-
report.html: The report.html file is customized based on the report parameters in the configuration YAML. It contains summary statistics, a plot of entity distributions, details of the resolved entities, and details about any model used. By default, sensitive information isn’t shown in the report, but for debugging purposes, you can show it by setting show_sensitive_content to true. The report also includes a copy of the YAML file, providing a fully detailed version of the original specification.
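As a quick sanity check on the processed file, you can confirm that the original values no longer appear in it. The sketch below is an illustration under stated assumptions, not part of ADS: the `find_leaks` helper is hypothetical, and the inline CSV text stands in for a locally downloaded copy of mydata-out.csv.

```python
import csv
import io


def find_leaks(csv_text, column, sensitive_values):
    """Return (row_number, value) pairs where a known sensitive value survives."""
    leaks = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for value in sensitive_values:
            if value in row[column]:
                leaks.append((row_number, value))
    return leaks


# Inline stand-in for the processed output shown above (assumption:
# in practice you would read the downloaded mydata-out.csv instead).
processed = (
    "documentation_id,patient_visit_date,content\n"
    '00001cee341fdb12,01/01/2012,'
    '"Hi, this is David Doe. my number is (123) 456-7890."\n'
    '00097b6214686db5,01/31/2012,'
    '"David recently got a beautiful puppy. He may be allergic to dog hair."\n'
)

# Values we know were present in the original records.
leaks = find_leaks(processed, "content", ["John", "Smith", "(805) 555-1234"])
print(leaks or "No known sensitive values found")
```

A check like this only catches values you already know about, so it complements the report rather than replacing it.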

What’s next?
For more information and examples, check out the ADS documentation and download Oracle ADS from PyPI.
Stay tuned for our next blog post, where we delve into the nuts and bolts of customizing the ADS PII operator and show the tool at work protecting your data.
Protecting PII isn’t just an operational task—it’s a mission. The ADS PII operator is your ally in this mission. Join us as we step into a new epoch of data security with confidence and assurance.
Try Oracle Cloud Free Trial! A 30-day trial with US$300 in free credits gives you access to Oracle Cloud Infrastructure Data Science service. For more information, see the following resources:
-
See the full sample, including all files, in the OCI Data Science sample repository on GitHub.
-
Visit our service documentation.
-
Try one of our LiveLabs. Search for “data science.”
Got questions? Reach out to us at ask-oci-data-science_grp@oracle.com.



