By Stefano Tagliabue, CIPP/E
Anonymisation is of particular relevance at the moment, given the increasing amount of information being made publicly available through Open Data initiatives and through individuals posting their own personal data online. Furthermore, the concept of anonymisation is fundamental for organisations that intend to take advantage of the possibilities offered by Big Data analytics without putting the privacy of data subjects at risk.
In this context, in November 2012 the UK Information Commissioner’s Office (ICO) issued a code of practice on anonymisation and on managing the related data protection risks.
The ICO’s starting point is that the current EU Data Protection Directive (95/46/EC) states, in Recital 26, that the principles of data protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable.
However, the concept of identification, and therefore of anonymisation, is not straightforward, because individuals can be identified in a number of different ways. These include direct identification and indirect identification, where two or more data sources are combined. Yet determining what other information is out there, who it is available to and whether it is likely to be used in a re-identification process can clearly be extremely problematic.
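To make indirect identification concrete, the following minimal Python sketch shows how a dataset stripped of names can still be re-identified by joining it with another source on shared attributes. The records, the field names and the choice of postcode district, birth year and sex as linking attributes are all hypothetical assumptions for illustration; real linkage attacks may draw on many more sources and use fuzzier matching.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical "anonymised" release: names removed, but quasi-identifiers
# (postcode district, birth year, sex) retained alongside sensitive data.
released: List[Record] = [
    {"postcode": "AB1", "birth_year": "1970", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1", "birth_year": "1985", "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "CD2", "birth_year": "1970", "sex": "F", "diagnosis": "flu"},
]

# Hypothetical auxiliary source an intruder could obtain (e.g. a public list).
auxiliary: List[Record] = [
    {"name": "Alice", "postcode": "AB1", "birth_year": "1970", "sex": "F"},
    {"name": "Bob", "postcode": "AB1", "birth_year": "1985", "sex": "M"},
    {"name": "Carol", "postcode": "CD2", "birth_year": "1970", "sex": "F"},
    {"name": "Dana", "postcode": "CD2", "birth_year": "1970", "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record: Record) -> Tuple[str, ...]:
    """Values of the linking attributes for one record."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(released: List[Record], auxiliary: List[Record]) -> List[Tuple[str, str]]:
    """Return (name, diagnosis) pairs where exactly one auxiliary record matches."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for person in auxiliary:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for row in released:
        candidates = index.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0], row["diagnosis"]))
    return matches


print(link(released, auxiliary))
```

In this toy example the first two released rows each match exactly one person in the auxiliary list and are re-identified, while the third row matches two people and remains ambiguous. This is why the availability of other information weighs so heavily in the assessment.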
Moreover, the broad term “anonymisation” covers various techniques. For instance, a distinction can be drawn between techniques used to produce aggregated information and those, such as pseudonymisation, that produce anonymised data on an individual-level basis.
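The difference between individual-level and aggregated outputs can be illustrated with a short sketch. Here pseudonymisation is done with a keyed hash (HMAC-SHA256 from Python’s standard library), which is one common approach rather than a technique prescribed by the ICO code; the key, the records and the field names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key held by the data controller. In practice it would be
# generated randomly and stored separately from any released data.
SECRET_KEY = b"replace-with-a-securely-stored-random-key"

# Hypothetical patient records.
records = [
    {"patient_id": "P001", "region": "North", "outcome": "recovered"},
    {"patient_id": "P002", "region": "North", "outcome": "readmitted"},
    {"patient_id": "P003", "region": "South", "outcome": "recovered"},
]


def pseudonymise(identifier: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Individual-level output: one row per person, identifier replaced by a pseudonym.
pseudonymised = [{**r, "patient_id": pseudonymise(r["patient_id"])} for r in records]

# Aggregated output: only counts per (region, outcome), no row per person.
aggregated = Counter((r["region"], r["outcome"]) for r in records)

print(len(pseudonymised))  # still one row per individual
print(aggregated)
```

The pseudonymised output still contains one row per patient, and anyone holding the key can regenerate the pseudonym for a known identifier and re-link the record, so the key itself must be protected. The aggregated output contains only counts, although small counts can be revealing in their own right.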
From a privacy perspective, the ICO points out that the risks related to a possible re-identification of the data subject might include:
- information about someone’s private life ending up in the public domain;
- an anonymised database being cracked;
- individuals being caused loss, distress, embarrassment or anxiety;
- reduced public trust in the disclosing organisation, and
- legal problems where insufficiently redacted data is disclosed.
In some circumstances, releasing anonymised data that point to a group of individuals can present a privacy risk. An example would be releasing information that a serious crime was committed by someone living in a small geographical area, if reprisals against people living in that area were likely.
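One common mitigation for this kind of small-group risk is to suppress low counts in aggregated tables before publication. The sketch below assumes a hypothetical minimum cell size of 5 and made-up crime figures; the appropriate threshold is a contextual judgement, and the value here is for illustration only.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str]

# Hypothetical minimum cell size, used purely for illustration.
MIN_CELL_SIZE = 5


def suppress_small_cells(counts: Dict[Cell, int],
                         threshold: int = MIN_CELL_SIZE) -> Dict[Cell, Optional[int]]:
    """Withhold (set to None) every count below the threshold."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical counts of recorded crimes by area and offence type.
crime_counts: Dict[Cell, int] = {
    ("Large town", "burglary"): 42,
    ("Small village", "serious assault"): 1,
    ("Small village", "burglary"): 7,
}

published = suppress_small_cells(crime_counts)
for (area, offence), value in published.items():
    print(area, offence, "suppressed" if value is None else value)
```

If row or column totals are also published, a suppressed cell can sometimes be recovered by subtraction, so further (secondary) suppression may be needed.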
A useful test, also used by the ICO and by the Tribunal that hears data protection and freedom of information appeals, involves considering whether an intruder would be able to achieve re-identification if motivated to attempt it.
The “motivated intruder” is taken to be a person who starts without any prior knowledge but who wishes to identify the individual from the anonymised data. This approach assumes that the motivated intruder is reasonably competent; has access to resources such as the Internet, libraries and all public documents; and would employ investigative techniques such as making enquiries of people who may have additional knowledge. However, the motivated intruder is not assumed to have any specialist knowledge, such as computer hacking skills, to have access to specialist equipment, or to resort to criminality, such as burglary, to gain access to securely kept data.
There will clearly be borderline cases where it is difficult, or even impossible, to determine whether re-identification is likely to take place. Neither the risk posed to individuals by disclosure nor the public benefit of disclosure is a factor that privacy laws generally allow to be taken into account when determining whether information is personal data. In reality, though, some types of data will be more attractive to a motivated intruder than others, and more consequential for individuals. So, in the ICO’s view, these factors should inform an organisation’s approach to disclosure, especially in borderline cases.
Furthermore, for an organisation involved in the anonymisation and disclosure of data, the ICO recommends, as good practice, having an effective and comprehensive governance structure in place. Such a structure should cover the following areas:
- responsibility;
- staff training;
- procedures;
- knowledge management regarding any new guidance or case law;
- a joined-up approach with other organisations;
- privacy impact assessment (PIA);
- transparency, for instance about the anonymisation techniques employed, the risks involved and how data subjects can exercise their rights;
- review of the consequences of the anonymisation programme, and
- disaster recovery, i.e., what to do if re-identification does take place.
In addition, organisations should periodically review their policy on releasing data and the techniques used to anonymise it, in light of current and foreseeable future threats.
It is also good practice to use re-identification testing, a type of penetration testing, to detect and deal with re-identification vulnerabilities. There can be advantages in using a third party to carry out such testing, as it may be aware of data resources, techniques or types of vulnerability that the organisation has overlooked or is not aware of.
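Part of a re-identification test can be automated. The sketch below checks a release for records whose combination of quasi-identifiers is rare within the dataset, using k-anonymity, a well-known measure from the research literature that is swapped in here for illustration and is not the ICO’s motivated-intruder test. The field names, sample rows and target value of k are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def class_key(row: Row, quasi_identifiers: Sequence[str]) -> Tuple[str, ...]:
    """Values of the quasi-identifiers for one row."""
    return tuple(row[q] for q in quasi_identifiers)


def k_anonymity(rows: List[Row], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    sizes = Counter(class_key(r, quasi_identifiers) for r in rows)
    return min(sizes.values()) if sizes else 0


def risky_rows(rows: List[Row], quasi_identifiers: Sequence[str], k: int) -> List[Row]:
    """Rows whose quasi-identifier combination is shared by fewer than k rows."""
    sizes = Counter(class_key(r, quasi_identifiers) for r in rows)
    return [r for r in rows if sizes[class_key(r, quasi_identifiers)] < k]


# Hypothetical release and quasi-identifiers chosen for illustration.
release: List[Row] = [
    {"postcode": "AB1", "birth_year": "1970", "sex": "F"},
    {"postcode": "AB1", "birth_year": "1970", "sex": "F"},
    {"postcode": "AB1", "birth_year": "1985", "sex": "M"},
    {"postcode": "CD2", "birth_year": "1970", "sex": "F"},
    {"postcode": "CD2", "birth_year": "1970", "sex": "F"},
]
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")
TARGET_K = 2  # hypothetical target

print("k =", k_anonymity(release, QUASI_IDENTIFIERS))
for row in risky_rows(release, QUASI_IDENTIFIERS, TARGET_K):
    print("needs further treatment:", row)
```

A check of this kind only looks inside the released data; it cannot find matches against external sources, which is where a human or third-party tester adds value.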
On the other hand, the ICO notes that the UK’s Data Protection Act does not require anonymisation to be completely risk-free; organisations must, however, be able to mitigate the risk of identification until it is remote. The ICO’s code also supports the view that privacy laws should not prevent the anonymisation of personal data, given that anonymisation safeguards individuals’ privacy and is a practical example of the Privacy-by-Design and data-minimisation principles that data protection law promotes. Moreover, anonymisation may help organisations comply with their data protection obligations whilst enabling them to make information available to the public. Indeed, there are cases where organisations may want, or be required, to publish information derived from the personal data they hold. For example, health service organisations are required to protect the identities of individual patients but may also be required to publish statistics about patient outcomes.
The ICO’s code also offers useful insights into the various techniques that may be employed to anonymise data effectively, such as data masking, pseudonymisation and aggregation, and includes examples and case studies. In applying these techniques, the context of the information and other variables are key. The objective should be to retain as much detail as possible while still protecting individuals’ privacy. To this end, the ICO recommends carrying out a PIA. It is also important to distinguish between publication to the world at large (e.g., under the Freedom of Information Act or open data initiatives) and limited access, for instance within a closed community of researchers, where further disclosure or use of the data can be restricted.
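Data masking and generalisation can be sketched as simple transformations applied to each record before release. In the purely illustrative example below, the name is dropped, a UK-style postcode is cut back to its outward code (assuming the usual space between outward and inward codes) and the date of birth is replaced by a band of years whose width controls how much detail is retained. The field names and rules are assumptions, and whether they are sufficient depends on the context and on the other information available.

```python
from datetime import date
from typing import Any, Dict


def mask_postcode(postcode: str) -> str:
    """Keep only the outward code of a space-separated UK-style postcode."""
    return postcode.strip().split()[0].upper()


def birth_year_band(year: int, width: int) -> str:
    """Generalise a birth year into a band of `width` years (width 1 = exact year)."""
    start = (year // width) * width
    return str(start) if width == 1 else f"{start}-{start + width - 1}"


def mask_record(record: Dict[str, Any], year_width: int = 10) -> Dict[str, Any]:
    """Drop the name, coarsen postcode and date of birth, keep the condition."""
    return {
        "postcode_district": mask_postcode(record["postcode"]),
        "birth_years": birth_year_band(record["date_of_birth"].year, year_width),
        "condition": record["condition"],
    }


# Hypothetical input record.
patient = {
    "name": "Jane Example",
    "postcode": "sw1a 1aa",
    "date_of_birth": date(1970, 5, 17),
    "condition": "asthma",
}

print(mask_record(patient))                # decade band: less detail
print(mask_record(patient, year_width=1))  # exact year: more detail
```

Raising `year_width` trades detail for protection, which is the balance the code of practice asks organisations to strike.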
What about the other side of the coin, i.e., creating personal data from anonymised data? If an organisation collects or creates personal data, it takes on its own data protection responsibilities in respect of that data. This could present reputational or legal problems, particularly where individuals would not expect the organisation to hold personal data about them or may find this objectionable. In this regard, the ICO makes clear that an organisation that collects personal data through a re-identification process without individuals’ knowledge or consent will be obtaining personal data unlawfully and could be subject to enforcement action.
Stefano Tagliabue, CIPP/E, CISSP, CISA, works in Telecom Italia’s Privacy Department and has years of experience in managing privacy and information security issues in the telecommunication industry. Stefano co-chairs the IAPP KnowledgeNet in Milan, Italy.