Written By: Patrick McLaughlin, Security Architect and Oracle Fellow
The EU General Data Protection Regulation (GDPR) provides many rights to people who are located in the EU (in fact, the European Economic Area). The rights are set out in Chapter III, ‘Rights of the data subject’, principally in Articles 13-22. Some of the rights were available prior to the GDPR; with the strengthened rights under the GDPR, together with the risk of high fines and other legal remedies, all organisations providing goods and services to individuals located in the EU are taking the rights of individuals much more seriously. An individual’s rights can be exercised against the ‘data controller’, which is the organisation that decides to collect, process or store the personal information. The rights include: the right to get access to one’s personal data, the right to rectification if the data is inaccurate, the right to get data in a portable format, rights to restrict, block or object to the processing of one’s personal data, and finally, and most importantly from the point of view of this article, the right to erasure.
The right to erasure is also known as the right to be forgotten and enables a person (located in the EU) to request that data belonging to them be deleted, for example if there is no legal basis for its continued processing. The right to be forgotten was established in 2014 by the EU’s highest court, the ECJ/CJEU, as a result of the Google Spain v AEPD and Mario Costeja González case.
The right to erasure creates challenges across IT systems built over the past several decades. There has been a lot of ‘IT sprawl’ in the past 20 years, with the proliferation of application and data silos and considerable duplication of personal data in many different systems. IT departments had the goal of ensuring high availability of data, including the availability of reliable backups of all data, typically over indefinite periods of time. The designers of applications and backup solutions did not and could not foresee the need to selectively delete the personal data of individuals upon request, across structured and unstructured systems.
Blockchain is a relatively new concept and technology architecture, derived from the Bitcoin architecture, but with applications outside of crypto-currencies in business systems that require a high degree of trust and ‘traceability’ between interacting parties. In the past, digital signatures based on Public Key Infrastructure (PKI) were deployed as a solution to (dis)trust between interacting/transacting parties. PKI solutions work well from a technical and legal perspective, but they have not come into widespread use. Blockchain, also signature-based, is regarded as a disruptive force that can make business engagement more efficient, change the structure of markets, and enable the creation of new services.
A blockchain works by keeping a history of all data written onto it, in principle forever. Newly written data is cryptographically linked to all existing data on the blockchain by including the hash of the existing data in the newly computed hash that covers the new data.
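The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular blockchain is implemented: the block layout, the all-zero genesis value and the function names are assumptions made for the example. It shows why changing or deleting an early entry invalidates everything written after it.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(prev_hash: str, data: str) -> str:
    """Hash the new data together with the hash of everything before it."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()


def build_chain(items):
    """Append each item, linking it to the hash of the previous block."""
    chain, prev = [], GENESIS
    for data in items:
        current = block_hash(prev, data)
        chain.append({"data": data, "prev_hash": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash; any edited block breaks the links after it."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["data"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["tx-1", "tx-2", "tx-3"])
print(verify_chain(chain))        # True
chain[0]["data"] = "tx-1-edited"  # attempt to rewrite history
print(verify_chain(chain))        # False
```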
Blockchain’s inventors, like traditional IT architects, did not foresee the need to delete data from the chain and instead highlight the strength of not being able to delete data (data immutability). One exception is that Accenture has patented a scheme for editing a permissioned blockchain which leaves a ‘scar’ – see here.
Until such an editing capability becomes widespread, organisations face the difficulty of complying with the GDPR right to erasure while also benefiting from blockchain technology. The GDPR requires organisations that act as data controllers and are exploring new technologies that may carry high risks ‘to the rights and freedoms’ of individuals to carry out a data protection impact assessment, and there is detailed guidance from the Article 29 Working Party on how to make such an assessment – see here. Given that blockchain is a relatively new technology, a data controller that intends to use a blockchain to store, process or communicate personal data should very likely carry out such a formal data protection impact assessment, and it will have to address the difficulty of handling the legal rights to erasure and rectification of personal data. The controller may need to consult its Data Protection Authority and be able to explain its approach and convince the authority of it.
It is generally accepted that writing business and personal data directly to a blockchain is undesirable, as blockchains are not performant enough (yet); instead, a hash of the dataset should be written to the blockchain.
In the GDPR, personal data has a very wide definition and includes any data item that could potentially be used to identify an individual. A somewhat surprising example is that a dynamic IP address can be personal data if it can be used to help identify an individual – see here.
So, with the blockchain it’s necessary to think about the right to erasure of data concerning an identified or identifiable person.
The personal data may not be secret, but its presence in a transaction on a blockchain is what an individual may wish to have deleted. For example, an individual may want to erase the fact that they stayed at a hotel chain at a certain time, or that they bought medication over the internet for a certain ailment. It’s also possible that hashed personal data written to a blockchain could be guessed or found by trial and error (a brute-force attack), in the same way that dictionary attacks crack passwords. The difficulty of doing so depends on the ‘formula’ for calculating what gets hashed, and that formula could be guessed or ascertained by other means.
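To make the guessing risk concrete, here is a small sketch. The ‘formula’ used (SHA-256 over the name and blood type joined by a `|`) is purely hypothetical, as are the function names; the point is only that when the set of possible values is small, anyone who knows or guesses the formula can recover the hashed value by trying every candidate.

```python
import hashlib

# A small, publicly known value space makes the hash easy to reverse by search.
BLOOD_TYPES = ["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"]


def on_chain_digest(name: str, blood_type: str) -> str:
    """Hypothetical 'formula' for what gets hashed and written to the chain."""
    return hashlib.sha256(f"{name}|{blood_type}".encode("utf-8")).hexdigest()


def guess_blood_type(name: str, digest: str):
    """Try every candidate value until the recomputed hash matches."""
    for candidate in BLOOD_TYPES:
        if on_chain_digest(name, candidate) == digest:
            return candidate
    return None


published = on_chain_digest("Alice", "AB+")  # what an observer sees on the chain
print(guess_blood_type("Alice", published))  # AB+
```

At most eight hash computations are needed here; larger value spaces raise the cost but do not change the principle.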
The result is that simply writing hashed personal data to a blockchain, where it cannot be overwritten or deleted, is incompatible with the need to delete data under the GDPR right to erasure. Hashed data is more akin to pseudonymised data in GDPR terms, as the data subject is at least somewhat identifiable to the data controller. Were this not the case, one would have to ask how the data controller could process the hashed personal data. The assumption must be that they have the underlying data stored off the blockchain and they know the formula to check whether that data is present on the blockchain, e.g. by hashing some combination of attributes; otherwise, what is the purpose of having the data on the blockchain?
The obvious solution is not to write personal data, either in the clear or in hashed format, to a blockchain.
Below I discuss what can be done where one needs to write hashed personal data to the blockchain.
The GDPR makes it clear that anonymised data, i.e. data that can in no way be related back to an individual, is not personal data. There is reference to ‘data rendered anonymous in such a way that the data subject is no longer identifiable’, which raises the question of how one could anonymise data. Hashing is not sufficient; however, encryption would do the job if the encryption key is immediately deleted. Not deleting the encryption key would mean the data could be decrypted, and hence the individual would be identifiable.
So, a good solution would be to store only encrypted, hashed personal data on the blockchain and, if a data erasure request is accepted, to reliably throw away the encryption key(s) to make the data anonymous and unrecoverable. This is the closest to full erasure that can be done. Storing encrypted data clearly enhances the security of the stored data, and given that appropriate data security is another requirement of the GDPR, encryption brings an additional benefit.
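The idea can be sketched as follows. Everything here is an illustrative assumption rather than a prescribed design: the `ledger` list stands in for the blockchain, the `key_store` dictionary for an off-chain key manager, and the record identifier is assumed to be opaque. Because the item being encrypted is a fixed-size SHA-256 digest, the sketch XORs it with a fresh random key of the same length (a one-time pad); a production system would more likely use a standard cipher such as AES from a vetted library.

```python
import hashlib
import secrets

ledger = []     # append-only stand-in for the blockchain: (record_id, ciphertext)
key_store = {}  # off-chain and deletable: record_id -> per-item key


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def write_personal(record_id: str, personal_data: str) -> None:
    """Hash the data, encrypt the hash under a fresh key, append to the ledger."""
    digest = hashlib.sha256(personal_data.encode("utf-8")).digest()
    key = secrets.token_bytes(len(digest))  # used once, for this item only
    key_store[record_id] = key
    ledger.append((record_id, xor_bytes(digest, key)))


def is_present(record_id: str, personal_data: str) -> bool:
    """Check the ledger for the data; this only works while the key exists."""
    key = key_store.get(record_id)
    if key is None:
        return False
    digest = hashlib.sha256(personal_data.encode("utf-8")).digest()
    return any(rid == record_id and xor_bytes(ct, key) == digest
               for rid, ct in ledger)


def erase(record_id: str) -> None:
    """'Erase' by discarding the key; the ledger entry itself is untouched."""
    key_store.pop(record_id, None)


write_personal("rec-001", "Alice|AB+")
print(is_present("rec-001", "Alice|AB+"))  # True
erase("rec-001")
print(is_present("rec-001", "Alice|AB+"))  # False
print(len(ledger))                         # 1
```

After the key is discarded, the entry that remains on the ledger is indistinguishable from random bytes.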
A key management solution would be needed that would assist with data erasure, through key deletion. People will have the right to request deletion of a subset of their data, for example, if they withdraw their consent for some very-sensitive personal data to be processed. Therefore, sophisticated use of a key management solution that enables encryption of fine-grained personal-data would be required, as an accompaniment to the blockchain.
Clearly the encryption keys should not be stored on the blockchain, as the blockchain would not allow their deletion! They could be stored in a simple two-column KeyID and Value database (relational or non-relational). The Value would be the encryption key for the personal data item, itself encrypted using a ‘master’ key-encryption key. The KeyID would be derived, for example, by hashing the data being stored together with a nonce. The master key-encryption key could be stored in a Hardware Security Module to increase its protection. It’s likely that other columns will be needed, e.g. to record the deletion of the encryption key in response to a specific data erasure request, received from a specific person, at a specific time. An interesting legal question arises as to whether the organisation can or should record and store the erasure request and the action taken on the blockchain. The benefit would be to have a dependable (perhaps immutable) record of the activity as part of the formal record of processing, but what if a request is then received to erase all data identifying the same individual? Data controllers need to consult their legal representatives on whether and where such a record of erasure should be maintained as evidence of acting appropriately on the original individual’s request.
When it comes to deleting personal data not stored on the blockchain, organisations today are trusted to simply delete the data and confirm they have done so. By extension, the same organisation storing personal data on a blockchain could be equally trusted to delete the encryption key associated with that individual encrypted item.
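A rough sketch of such a key table, using Python’s built-in `sqlite3` module, might look like this. The table and column names, the `erased_at` and `erased_for` columns and the request reference format are all assumptions for illustration. The key ‘wrapping’ is a deliberately simple stand-in (XOR with an HMAC-SHA256 pad derived from the master key and the KeyID); in practice the master key would stay inside a Hardware Security Module and a standard key-wrap algorithm would be used.

```python
import hashlib
import hmac
import secrets
import sqlite3
from datetime import datetime, timezone

MASTER_KEK = secrets.token_bytes(32)  # assumption: would live in an HSM


def wrap(key_id: str, data_key: bytes) -> bytes:
    """Toy key wrap for 32-byte keys: XOR with HMAC-SHA256(master, key_id)."""
    pad = hmac.new(MASTER_KEK, key_id.encode("utf-8"), hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(data_key, pad))


unwrap = wrap  # XOR with the same pad reverses the wrap

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE keys (key_id TEXT PRIMARY KEY, value BLOB, "
           "erased_at TEXT, erased_for TEXT)")


def new_key(data: bytes) -> str:
    """Create a per-item key; KeyID = SHA-256(nonce || data)."""
    nonce = secrets.token_bytes(16)
    key_id = hashlib.sha256(nonce + data).hexdigest()
    data_key = secrets.token_bytes(32)
    db.execute("INSERT INTO keys (key_id, value) VALUES (?, ?)",
               (key_id, wrap(key_id, data_key)))
    return key_id  # kept with the off-chain record so the key can be found again


def load_key(key_id: str):
    row = db.execute("SELECT value FROM keys WHERE key_id = ?",
                     (key_id,)).fetchone()
    if row is None or row[0] is None:
        return None  # unknown key, or key already erased
    return unwrap(key_id, row[0])


def erase_key(key_id: str, request_ref: str) -> None:
    """Delete the key material but keep a record of when and why."""
    db.execute("UPDATE keys SET value = NULL, erased_at = ?, erased_for = ? "
               "WHERE key_id = ?",
               (datetime.now(timezone.utc).isoformat(), request_ref, key_id))


kid = new_key(b"Alice|AB+")
print(load_key(kid) is not None)  # True
erase_key(kid, "erasure-request-0001")
print(load_key(kid))              # None
```

Whether the retained row (KeyID, timestamp, request reference) is itself acceptable is exactly the legal question raised above.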
Instead of relying on a single key to encrypt and decrypt hashed personal data, it’s possible to split the encryption key into two or more parts so that, for example, the data controller has one part and the individual has the other part. To encrypt or decrypt data, both key parts would be needed. Requiring m of n key shares to enable encryption or decryption is a well-established technique, even though it is uncommon in commercial systems – see here.
A three-key key management scheme is already in use to enable the right to erasure in a blockchain application that stores diplomas and degrees – see here. The scheme has the following keys.
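The simplest case, where both parts are required, can be sketched with an XOR split. The function names are illustrative. Note that this is a 2-of-2 scheme only; a true m-of-n threshold scheme needs something like Shamir’s secret sharing, which is not shown here.

```python
import secrets


def split_key(key: bytes):
    """Split a key into two shares; each share alone is just random bytes."""
    controller_share = secrets.token_bytes(len(key))
    individual_share = bytes(k ^ s for k, s in zip(key, controller_share))
    return controller_share, individual_share


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine both shares to recover the original key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


key = secrets.token_bytes(32)
controller_share, individual_share = split_key(key)
print(combine_shares(controller_share, individual_share) == key)  # True
```

If the individual destroys their share, the controller’s share on its own reveals nothing about the key.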
If the graduate deletes her key, the system will no longer be able to decrypt the diploma and thus the diploma is effectively anonymised/deleted. The graduate does not need to rely upon and trust the school/college to delete a single encryption key.
Data protection/privacy by design and by default is a key tenet of the GDPR, and this principle is expected to be applied when developing new systems.
To handle the right to erasure, an application developer must leverage a key generation function and encryption library to ensure that hashed personal data is encrypted before storing on the blockchain.
A better alternative would be to make the encryption transparent for the developer so she can have personal data transparently encrypted using a dynamically generated key by simply calling a function that highlights the data as ‘personal’ so it undergoes the extra processing steps before storage:
put (bloodtype, Alice, blockchainX, personal)
or putPersonal (bloodtype, Alice, blockchainX)
A search function should be able to locate and transparently decrypt the data:
get (alice, bloodtype, personal) -> Group AB
An erasure function would have the effect of transparently deleting the encryption key resulting in:
erase (alice, bloodtype, personal) -> confirmed
get (alice, bloodtype, personal) -> Not found
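One way these calls could behave is sketched below. The class name, the argument order and the use of `None` for ‘Not found’ are assumptions for the example and differ slightly from the calls above. Each value is encrypted with a fresh random key as long as the value itself (a one-time pad, chosen only to keep the sketch free of external libraries); a real implementation would use a standard cipher and a proper key manager.

```python
import secrets


class PersonalLedger:
    """Append-only store for encrypted values plus a deletable key index."""

    def __init__(self):
        self.chain = []  # stand-in for the blockchain: ciphertexts only
        self.keys = {}   # off-chain: (subject, attribute) -> (position, key)

    def put(self, subject: str, attribute: str, value: str) -> None:
        plaintext = value.encode("utf-8")
        key = secrets.token_bytes(len(plaintext))  # fresh key per item
        ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
        self.chain.append(ciphertext)
        self.keys[(subject, attribute)] = (len(self.chain) - 1, key)

    def get(self, subject: str, attribute: str):
        entry = self.keys.get((subject, attribute))
        if entry is None:
            return None  # corresponds to "Not found"
        position, key = entry
        ciphertext = self.chain[position]
        return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")

    def erase(self, subject: str, attribute: str) -> bool:
        """Discard the key; the ciphertext stays on the chain, unreadable."""
        return self.keys.pop((subject, attribute), None) is not None


ledger = PersonalLedger()
ledger.put("alice", "bloodtype", "Group AB")
print(ledger.get("alice", "bloodtype"))    # Group AB
print(ledger.erase("alice", "bloodtype"))  # True
print(ledger.get("alice", "bloodtype"))    # None
```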
All of these functions should be under the control of an access management system so that only the right people or entities can read, write or delete personal data on the blockchain. This article does not address the governance needed to handle erasure requests. Not all erasure requests will be accepted, so an approval process will be required; for example, a request to erase records held by the tax department will not be accepted.
A further alternative would be to have a personal data discovery function running in the background, inspecting any personal data being written, read or deleted on the blockchain, and have it transparently do the underlying encryption, decryption or key deletion as appropriate. Such an approach would need to be 100% reliable given the maximum fines under the GDPR of €20M or 4% of global revenue, whichever is higher, for infringing individuals’ rights, including the rights to erasure and rectification of data.
A final alternative would be to use a hybrid scheme where the functions are explicitly invoked, for example by smart contracts (programs), and a process is additionally running and checking whether data being written to the blockchain is personal data. If so, the write could be rejected if the data is not encrypted, or the smart personal data detector could autonomously encrypt the data: a) to make it more secure and b) to ensure that the option is there to support a data subject erasure request.
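The ‘reject’ variant of such a check could look roughly like this. The detector is a deliberately naive assumption (two regular expressions, for e-mail addresses and IPv4-style addresses), and the `encrypted` flag is simply trusted; reliable personal data detection is a much harder problem than this sketch suggests.

```python
import re

# Naive, illustrative detectors; a real detector would need far more than this.
PERSONAL_PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # e-mail
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                     # IPv4-like
]


def looks_personal(text: str) -> bool:
    return any(pattern.search(text) for pattern in PERSONAL_PATTERNS)


class GuardedChain:
    """Append-only list that refuses clear-text writes that look personal."""

    def __init__(self):
        self.blocks = []

    def write(self, payload: str, encrypted: bool = False) -> None:
        if not encrypted and looks_personal(payload):
            raise ValueError("refusing to write unencrypted personal data")
        self.blocks.append(payload)


chain = GuardedChain()
chain.write("order 42 shipped")  # accepted: nothing personal detected
try:
    chain.write("contact: alice@example.com")
except ValueError as err:
    print(err)                   # refusing to write unencrypted personal data
print(len(chain.blocks))         # 1
```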
The right to have inaccurate data changed is also enshrined in the GDPR. Let’s say Alice’s blood-type is not in fact AB and should be O; there is a compelling reason to ensure this data-item is rectified.
The solution would be to first erase the inaccurate data as above and add the correct blood-type to the blockchain at an appropriate location:
erase (alice, bloodtype, personal) -> confirmed
put (correctbloodtype, Alice, blockchainX, personal)
For personal data update, a dynamic personal data update function could transparently invoke the same two functions and even transparently verify the result:
erase (alice, bloodtype, personal) -> confirmed
put (correctbloodtype, Alice, blockchainX, personal)
get (alice, bloodtype, personal) -> Group O.
This could be done for all occurrences of the same attribute on the blockchain. The programmer would simply have to call an update function with or without the personal flag:
update (correctbloodtype, Alice, blockchainX, [personal])
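A compact sketch of such an update function follows. The names, argument order and storage structures are assumptions, and the per-item XOR key is again only a library-free stand-in for a real cipher. The update is an erase of the old key followed by a fresh write and a verifying read; the old ciphertext remains on the chain but can no longer be decrypted.

```python
import secrets

chain = []      # append-only stand-in for the blockchain
key_index = {}  # off-chain: (subject, attribute) -> (position, key)


def put(subject: str, attribute: str, value: str) -> None:
    plaintext = value.encode("utf-8")
    key = secrets.token_bytes(len(plaintext))
    chain.append(bytes(p ^ k for p, k in zip(plaintext, key)))
    key_index[(subject, attribute)] = (len(chain) - 1, key)


def get(subject: str, attribute: str):
    entry = key_index.get((subject, attribute))
    if entry is None:
        return None
    position, key = entry
    return bytes(c ^ k for c, k in zip(chain[position], key)).decode("utf-8")


def erase(subject: str, attribute: str) -> bool:
    return key_index.pop((subject, attribute), None) is not None


def update(subject: str, attribute: str, new_value: str) -> bool:
    """Rectify a value: erase the old key, write the new value, verify it."""
    erase(subject, attribute)
    put(subject, attribute, new_value)
    return get(subject, attribute) == new_value


put("alice", "bloodtype", "Group AB")
print(update("alice", "bloodtype", "Group O"))  # True
print(get("alice", "bloodtype"))                # Group O
print(len(chain))                               # 2
```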
This article is intended to highlight a real problem with using blockchain technology to process personal data and to propose concrete candidate solutions. The aim is to stimulate discussion on how real the need is and to help stakeholders involved in the development and adoption of blockchain technology reach conclusions.
 European Parliament and Council Regulation 2016/679 of 27 April 2016, repealing Directive 95/46/EC (General Data Protection Regulation), OJ L119/1, http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679&from=EN