As we plan our modern information architectures, we face a concern we have never had to confront before: the ethical use of data. In the past, as we collected data from our transactions, there was no question that we could manipulate it as we saw fit. Our architectural responsibility was limited to efficiency, cost, and meeting enterprise SLAs.
But today we acquire data from many sources: data we purchase from third parties, capture from government entities, or mine from social media platforms. Our analytics can be so good that, by combining these sources, we can draw accurate predictive conclusions.
At the heart of this lies a controversy. We face an ethical dilemma as we seek to make the business more effective. To what degree do we need to anonymize data? Do we need to track the provenance of all data, or its “opt-in” status?
I think architects have a responsibility to advocate for the ethical use of the data in their company's care. They should help mitigate risk for the company and, at a minimum, advise the business on its data privacy, algorithm, and analytics practices.
The last 10 years have been an intense period of data management innovation, as new business models demanded new approaches to the collection, ingestion, processing, and consumption of data. The rapid rise of NoSQL, data streaming, and big data technologies is evidence of this, as is the current fascination with AI and machine learning. Developers have new tools at their disposal that let their imaginations run wild with the possibilities for harnessing the mountains of data that their organizations routinely capture.
The rapid rise of cloud computing has accelerated this trend. Cloud innovations allow for low-cost collection and processing of massive amounts of data. Thanks to the cloud, the cost of innovating has nose-dived, dramatically reducing barriers to entry. Consequently, a wide variety of engineering innovation has emerged and is freely available to students and professionals through open source.
With this explosion of data has come a rising tide of concern and scrutiny about how and why it is collected, and the purposes for which it is used. “Data science” processes (processes that rely on mass data collection, such as big data, AI, machine learning, image processing, natural language processing, virtual assistants, etc.) can be used to improve society, but can also enable unwanted intrusion into people’s personal lives, and worse.
Organizations have a responsibility to collect, process, and use data legally and ethically. This responsibility applies not just to the vague notion of your organization, but all the way down to the lines of business, analysts, developers, and architects—you. As an architect, what are you doing to ensure that data and algorithms are being used in accordance with the law, regulations, and ethical principles, such as fairness? And has your information architecture been updated accordingly?
Let’s explore two facets of this issue and how architecture can respond: data privacy and algorithm transparency.
Data privacy, once a seldom-discussed academic sub-topic of data security, is now front and center in the zeitgeist of citizens and regulators worldwide. The recent Facebook/Cambridge Analytica affair and the passage of the EU’s GDPR are but two examples of how hot data privacy has become lately. Data privacy is a multi-faceted issue that requires a holistic approach, including aspects of architecture, compliance, data security, organization, and policy.
Architecture: Your enterprise architecture principles should advocate privacy from the start by designing in privacy and security. Effective methods include limiting data collection, de-identifying data, securely storing retained data, restricting access to data, and safely disposing of data that is no longer needed.
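These methods can be made concrete in the data pipeline itself. As a minimal, illustrative sketch (not a feature of any specific product; the field names and salt handling are hypothetical, and a real deployment would manage the salt as a protected secret), de-identification might replace direct identifiers with salted hashes before data is stored or shared:

```python
import hashlib

# Hypothetical salt; in practice, manage this as a protected, rotated secret.
SALT = b"rotate-me-regularly"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a salted SHA-256 digest."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

def de_identify(record: dict, pii_fields: set) -> dict:
    """Return a copy of the record with PII fields pseudonymized
    and all other fields passed through unchanged."""
    return {
        key: pseudonymize(value) if key in pii_fields else value
        for key, value in record.items()
    }

record = {"email": "pat@example.com", "country": "US", "purchase": 42.50}
clean = de_identify(record, pii_fields={"email"})
```

Because the hash is deterministic, analysts can still join records belonging to the same (unidentified) person, while the raw identifier never reaches downstream systems.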
Compliance: The regulatory landscape for data privacy is rapidly changing, and I won’t use this post to delve into it, but suffice it to say that your enterprise architecture needs to be informed by all the latest data privacy regulations, some of which have very real penalties. Here is one place to get started. If your organization transfers data between the US and the EU, you might also consider joining the Privacy Shield Framework, as Oracle has.
Data security: You cannot provide data privacy without a strong foundation of data security. Oracle RDBMS has been the predominant database engine used by organizations worldwide for the past several decades in part due to its comprehensive data security features and rock-solid reputation for protecting data. This is true on-prem and in the cloud—the features and capabilities that protected the world’s data on-prem are now available in Oracle’s cloud database offerings, features like Data Masking and Subsetting, Transparent Data Encryption, Virtual Private Database, Database Vault, two-factor authentication, Label Security, and more.
And if you are using databases other than Oracle, we have created new cloud services that deepen data security for all systems, Oracle or non-Oracle—look into the machine learning-based Security Monitoring and Analytics Cloud.
Policy: Privacy policies are generally written by lawyers, not architects, but as architects we should all be familiar with our own organization’s stated policies and participate in their ongoing evolution. If you are storing personal information, you should provide frequent and prominent disclosures using just-in-time principles and maintain a holistic view of data collection practices. Make sure your customers can easily contact you and that there is a process for responding to consumer concerns. Providers also need to find ways to effectively educate users about privacy settings.
Algorithms are the “smarts” underneath most data science processes. These algorithms are often like a black box—data goes in and “answers” come out. Algorithms are changing the world. But algorithms can have unintended consequences. There are plenty of examples of unintended consequences in the public square these days—one good overview can be found here. This is not just a problem that Facebook and Google need to solve—every organization needs to be vigilant to avoid unintended consequences. And regulators are not far behind: The EU Data Protection Board has determined that GDPR applies when algorithms use personal data for profiling or automatic decision-making (see here).
Anytime we consider big data, AI and machine learning, we must temper our breathless excitement about “the possibilities” with an appropriate concern for “the other possibilities.” We need to be worried about more than simply getting wrong answers—we need to make appropriate effort to avoid collateral damage with the data under our stewardship.
Businesses, governments, and scientists are learning that the information they collect can mislead and that the conclusions algorithms draw can be wrong. We are seeing more and more concerns about “algorithm transparency,” “algorithmic discrimination,” and “algorithmic bias.” It’s a real concern in the cloud era, and your organization should take steps to address it. One way is to establish a Data Ethics Committee (google it), or participate in one of the emerging industry coalitions focused on data ethics, like Pervade or the Council for Big Data, Ethics, and Society. A recent Oracle Data Science blog post entitled “How to Make Sure Your Machine Learning Model Holds Up in Court” includes some concrete ways to evaluate your organization’s machine learning algorithms:
Help people understand the model
Know what factors are important
Embrace messy data
Consult with other experts—early and often
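To illustrate the “know what factors are important” point: one approach is to favor models whose per-prediction contributions can be itemized. A minimal sketch with a hand-weighted linear score follows; the features and weights are hypothetical and not drawn from any Oracle service:

```python
# Illustrative linear scoring model: each factor's contribution to the
# final score can be itemized, which makes each decision explainable.
WEIGHTS = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}

def score_with_explanation(applicant: dict):
    """Return (score, contributions) so each factor's effect is visible."""
    contributions = {
        feature: WEIGHTS[feature] * applicant[feature]
        for feature in WEIGHTS
    }
    return sum(contributions.values()), contributions

score, why = score_with_explanation(
    {"income": 4.0, "debt": 1.5, "years_employed": 6.0}
)
```

Because every contribution is visible, an analyst can explain why a particular score was produced, and a disproportionate or biased factor is much easier to spot than in a black-box model.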
Oracle is not exempt from the need for algorithm transparency: We have developed a broad portfolio of tools that define and execute algorithms for AI, machine learning, neural networks, big data and statistical analysis. Check out our Data Science Cloud Service that lets you build, train, deploy, and manage models on the Oracle Cloud. When our customers use these capabilities, we say to them the same thing the beer ads say: Enjoy responsibly! But Oracle also embeds AI and machine learning algorithms within our software-as-a-service (SaaS) and platform-as-a-service (PaaS) solutions, including big data, analytics and security operations. We strive to make our algorithms transparent so you can have full confidence in their outputs:
Oracle Management Cloud is a suite of monitoring, management, and analytics cloud services that leverage machine learning and big data techniques for anomaly detection. We use peer group analysis and SQL analysis models, which we make transparent by documenting them and allowing the models to be modified to your needs.
Oracle offers Adaptive Intelligent Apps across our entire business applications suite: CX, ERP, SCM, HCM & Manufacturing. Adaptive Intelligent apps have machine learning, artificial intelligence and decision science built in to help you produce better business outcomes.
Oracle Data Cloud is the industry-leading marketing data management platform (DMP)—the pre-populated data sets that marketing organizations use to personalize online, offline, and mobile marketing campaigns with richer and more-actionable information about targeted audiences. We provide transparency by publishing the kinds of personal information we store, where we get it, why and how we use it, how long we keep it, when and how we share it, how we secure it, and what technologies we use to gather it. Oracle also provides tools for consumers to opt out of click-tracking and interest-based advertising. See here. Consumers can use the Oracle Data Cloud Registry tool to view individual cookies and remove individual interest segments associated with their computer and browser.
We have a cloud-based Autonomous Database that uses machine learning to detect and recover from server and storage failures, detect anomalies in database operations, manage hangs, optimize queries, and index data. These algorithms use no personal data.
We are inundated with data, and governing its collection and use has become more important than ever. In addition to the suggestions made above, here are two more ways for enterprise architects to ensure appropriate use of data in the cloud era. (Adapted and expanded from Wes Pritchard’s blog.)
1. Empower consumer choice. In apps, give users tools that enable choice, make it easy to find and use those tools, and honor the user's choices. In the case of personal information this is often required by law.
2. Regularly reassess your data collection practices. Consider your purpose in collecting the data, the retention period, third-party access, and the ability to build a personally identifiable profile of users. GDPR also requires that you document the lawful basis upon which you process personal information; if you are subject to GDPR, make sure your information processing policies include this important step.
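Both practices can also be enforced in code rather than left to policy documents alone. As an illustrative sketch (the retention period, record shape, and field names are hypothetical), a processing pipeline might gate every record on opt-in status and retention age:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # hypothetical retention period

def may_process(record: dict, now: datetime) -> bool:
    """A record is processable only if the user opted in and the
    data is still within the retention window."""
    return (
        record["opted_in"]
        and now - record["collected_at"] <= RETENTION
    )

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "opted_in": True,  "collected_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "opted_in": False, "collected_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 3, "opted_in": True,  "collected_at": datetime(2022, 1, 1, tzinfo=timezone.utc)},
]
# Only records passing both checks flow downstream; the rest are
# candidates for exclusion or safe disposal.
processable = [r["id"] for r in records if may_process(r, now)]
```

Putting the check in the pipeline means a change to the retention policy or consent rules takes effect everywhere at once, instead of depending on each team remembering the policy document.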
If you have additional ideas about how to ensure that data and algorithms under your stewardship are being used in accordance with the law, regulations and ethical principles, please leave a comment.