Deidentified, Anonymized, Aggregated, Pseudonymized, What the Heck Are You Talking About!

Vanessa Deschênes [1]
ROBIC, LLP
Lawyers, Patent and Trademark Agents
“It is secure, the information is anonymized.” Have you ever heard that sentence? I bet you have!
On many occasions, while working on projects or speaking with service providers, we are told that our information is secure because it is anonymized, or that there is no problem using the data because it is kept anonymous. On a day-to-day basis, you or some of your team members may frequently encounter the concepts of “anonymized,” “deidentified,” and/or “aggregated” data.
For instance, the new Consumer Privacy Protection Act (“CPPA”) states that “an organization may use an individual’s personal information without their knowledge or consent for the organization’s internal research and development purposes, if the information is deidentified before it is used”.[2]
Another example might be found in a contract that excludes deidentified information from the scope of the parties’ obligations, or even in your own Security team assuring you that the information has been anonymized because it does not contain “personally identifiable information.”
Everyone in charge of privacy or of compliance with privacy laws should be vigilant when they hear the terms “deidentified,” “anonymized,” “aggregated,” “pseudonymized,” “not personally identifiable,” or similar terms, not only because they are not synonyms (as we will explain below) but because those terms affect your privacy obligations: the data may, or may not, fall within the scope of data privacy laws. Since there is a real risk of disregarding the requirements of privacy laws in the mistaken belief that you are not processing personal data, you should exercise caution when determining whether or not you are dealing with personal information.
But what are the differences?
What is Deidentified, Anonymized, Aggregated, or Pseudonymized Data?
Did you know there is a difference between some of these terms? First, we want to draw your attention to the fact that anonymization is hard and that most organizations are not equipped to build their own anonymization process. What is more, over the years, researchers[3] have shown that anonymized data can never be totally anonymous.[4] Unfortunately, that does not stop organizations from claiming that datasets are completely and securely anonymized, which can eventually become a problem, as the data can be re-identified.
Most companies deidentify rather than anonymize. This means that names and obvious identifiers are stripped out, but the remainder of the data is left untouched. To add to the existing confusion, not all privacy laws treat these concepts the same way. Take Bill C-11 as an example. The new CPPA defines deidentification as the act “to modify personal information — or create information from personal information — by using technical processes to ensure that the information does not identify an individual or could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual”.[5]
Looking at this definition, we realize that the CPPA does not set out the means by which information is to be deidentified. Moreover, the definition conflates the concepts of pseudonymized data and non-identifiable data. If we turn to the proposed Quebec bill (Bill 64), the new section 23 provides criteria for anonymization that also help us understand the difference between anonymization and deidentification:
“information concerning a natural person is anonymized if it irreversibly no longer allows the person to be identified directly or indirectly.”[6]
Looking at this provision, we can conclude that, under the Quebec approach:
- the anonymization process must be irreversible;
- it must be impossible to directly or indirectly identify the person concerned.
Regarding deidentification, the new section 12 states that “personal information is deidentified if it no longer allows the person concerned to be directly identified”.[7]
As you may have noticed, the term “indirectly” is thus at the heart of the distinction between anonymization and deidentification. If you can use additional information to identify the individual, then you are dealing with deidentification, not anonymization.
But what about elsewhere?
Ontario
As some of you may know, in pursuing its digital innovation mandate, Ontario is currently seeking to address the gaps in the province’s legislative privacy framework and to establish comprehensive, up-to-date rules that will protect privacy rights and increase Ontarians’ trust in digital services.
To initiate this conversation, the Government identified a series of key areas it wishes to explore, which will be taken into consideration in the creation of an Ontario-based privacy law. One of the most important of these is deidentification.
In the Privacy Reform Discussion Paper, deidentified personal information means “personal information that has been pooled in a manner that would prevent the identification of any individuals’ personal data in the mix”.[8] According to the Discussion Paper, “methods of deidentification include the removal of “identifiers” (e.g. removing names, identifying numbers), obscuring information (e.g. giving an age range in place of exact age), and removing or aggregating information about outliers or small cell size data subjects (e.g. where fewer than five people have the same postal code)”.[9]
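To make this more concrete, here is a minimal sketch of those three methods applied to a toy dataset. The records, field names and the five-person threshold are illustrative assumptions only, not a prescribed or legally sufficient deidentification process.

```python
# Minimal sketch of the three methods listed in the Discussion Paper: removing
# identifiers, obscuring values, and suppressing small-cell data.
# The dataset, field names and threshold are illustrative assumptions.
from collections import Counter

records = [
    {"name": "A. Tremblay", "age": 34, "postal_code": "H2X 1Y4", "diagnosis": "asthma"},
    {"name": "B. Singh",    "age": 71, "postal_code": "H2X 1Y4", "diagnosis": "diabetes"},
    {"name": "C. Dupont",   "age": 29, "postal_code": "K1A 0B1", "diagnosis": "asthma"},
]

def age_range(age, width=10):
    """Obscure an exact age by reporting a range instead (e.g. 30-39)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def deidentify(rows, min_cell_size=5):
    postal_counts = Counter(r["postal_code"] for r in rows)
    deidentified = []
    for r in rows:
        deidentified.append({
            # 1. Direct identifiers (here, "name") are simply dropped.
            # 2. Exact age is obscured into a range.
            "age_range": age_range(r["age"]),
            # 3. Postal codes shared by fewer than `min_cell_size` people are suppressed.
            "postal_code": r["postal_code"]
                if postal_counts[r["postal_code"]] >= min_cell_size else "SUPPRESSED",
            "diagnosis": r["diagnosis"],
        })
    return deidentified

print(deidentify(records))
```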
Europe and GDPR
To make things more complicated than they already are, the GDPR introduces a concept called “pseudonymization”. The GDPR defines pseudonymization as:
“…the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”[10]
Pseudonymization may involve replacing names or other identifiers that are easily attributed to individuals with, for example, a reference number. Although you can tie that reference number back to the individual if you have access to the relevant information, technical and organizational measures are in place to keep that additional information stored separately, thereby protecting privacy.
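As a rough illustration of that mechanism, the sketch below replaces an identifier with a randomly generated reference number and returns the lookup table separately. The field names and storage arrangement are assumptions made for illustration, not a recommended implementation.

```python
# Rough sketch of pseudonymization as described above: the direct identifier is
# replaced with a reference number, and the lookup table (the "additional
# information") must be kept separately under its own technical and
# organizational controls. All names and structures here are illustrative.
import secrets

def pseudonymize(rows, id_field="email"):
    lookup = {}   # identifier -> reference number; keep in a separate, access-controlled store
    output = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in lookup:
            lookup[identifier] = f"REF-{secrets.token_hex(8)}"
        safe_row = {k: v for k, v in row.items() if k != id_field}
        safe_row["reference"] = lookup[identifier]
        output.append(safe_row)
    return output, lookup

data = [
    {"email": "jane@example.com", "purchase": "book"},
    {"email": "jane@example.com", "purchase": "lamp"},
]
safe, key_table = pseudonymize(data)
print(safe)        # records remain linkable through the reference number
print(key_table)   # re-identification is only possible with this separate table
```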
As you may understand, pseudonymization is only meant to be a security measure and it does not change the status of the data as personal data. In fact, Recital 26 makes it clear that pseudonymized personal data remains personal data and within the scope of the GDPR: “…Personal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person…”[11]
And what about anonymized data? Well, Recital 26 mentions: “…The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”[12]
In other words, personal data that has been anonymized is not subject to the GDPR. Note that if you can, at any point, use any reasonably available means to re-identify the individuals to which the data refers, that data will not have been effectively anonymized and will rather be considered pseudonymized.
But Why?
Given that deidentification is considered reversible, why would one use such a technique, you may ask?
Even though deidentification is not anonymization, it remains a useful data minimization technique and security measure. Why would you want your data to be deidentified rather than anonymized? Well, let’s say you want to train an AI model using machine learning. For the model to make sense, you need data records that can be linked together. That said, the model does not need to know exactly who the individual is, only that the same individual did something in particular, for instance.
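To illustrate, the toy sketch below links records belonging to the same (unnamed) individual through a salted hash so that per-person training rows can still be built. The salt, fields and features are invented for illustration, and a token of this kind is still pseudonymization, not anonymization.

```python
# Toy illustration of the point above: training data only needs to link records of
# the *same* individual, not to reveal who that individual is. The salt, field
# names and features are invented for illustration; a salted hash like this is
# still pseudonymization, not anonymization.
import hashlib
from collections import defaultdict

SECRET_SALT = b"keep-this-value-separate"   # assumption: held apart from the dataset

def link_token(user_id: str) -> str:
    """Deterministic token: the same individual always maps to the same token."""
    return hashlib.sha256(SECRET_SALT + user_id.encode()).hexdigest()[:12]

events = [
    {"user_id": "jane@example.com", "action": "click"},
    {"user_id": "jane@example.com", "action": "purchase"},
    {"user_id": "omar@example.com", "action": "click"},
]

# Build per-individual training rows keyed only by the token.
features = defaultdict(lambda: {"clicks": 0, "purchases": 0})
for event in events:
    token = link_token(event["user_id"])
    if event["action"] == "click":
        features[token]["clicks"] += 1
    elif event["action"] == "purchase":
        features[token]["purchases"] += 1

print(dict(features))   # linkable per-person rows with no direct identifiers
```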
Now what?
It will be interesting to see how Canadian policymakers will address those issues, especially knowing that these techniques are evolving rapidly and that some experts argue their research shows that anonymization is not enough for companies to fall outside the scope of laws such as the GDPR.[13]
Maybe we should start considering other techniques such as “differential privacy”, “homomorphic encryption” or “synthetic datasets”. That said, as mentioned before, these fields are evolving rapidly, which is why you should, at the very least, understand what degree of deidentification or anonymization is required under applicable privacy laws to transform personal information into non-identifiable information and stay compliant.
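For what it is worth, here is a very rough sketch of the intuition behind one of those techniques, differential privacy: rather than releasing an exact count, calibrated random noise is added so that any single individual has only a limited influence on the published figure. The dataset, query and epsilon value below are illustrative assumptions, not a production-grade mechanism.

```python
# Very rough sketch of the differential privacy idea: add calibrated Laplace noise
# to a count so that any single individual's presence has limited influence on the
# released figure. The dataset, query and epsilon are illustrative assumptions.
import random

ages = [34, 71, 29, 45, 52, 38]               # toy dataset

def noisy_count(values, predicate, epsilon=1.0):
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1                            # one person changes a count by at most 1
    scale = sensitivity / epsilon
    # Laplace(0, scale) noise expressed as the difference of two exponentials.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

print(noisy_count(ages, lambda age: age >= 50))   # roughly 2, plus noise
```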
© CIPS, 2020.
[1] Vanessa Deschênes is a Lawyer for ROBIC, LLP, a firm of Lawyers, Patent and Trademark Agents.
[2] Section 21, An Act to enact the Consumer Privacy Protection Act and the Personal Information and Data Protection Tribunal Act and to make consequential and related amendments, online.
[3] The Guardian, ‘Anonymised’ data can never be totally anonymous, says study, online.
[4] Martin Scaiano, Grant Middleton, Luk Arbuckle, Varada Kolhatkar, Liam Peyton, Moira Dowling, Debbie S. Gipson, Khaled El Emam, A unified framework for evaluating the risk of re-identification of text de-identification tools, Journal of Biomedical Informatics, Volume 63, 2016, pages 174-183, available online, consulted on December 7, 2020.
Simon, G. E., Shortreed, S. M., Coley, R. Y., Penfold, R. B., Rossom, R. C., Waitzfelder, B. E., Sanchez, K., & Lynch, F. L., Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records, EGEMS (Washington, DC), Volume 7(1), 6, 2019, available online, consulted on December 7, 2020.
El Emam, K., Dankar, F. K., Vaillancourt, R., Roffey, T., & Lysyk, M., Evaluating the Risk of Re-identification of Patients from Hospital Prescription Records, The Canadian Journal of Hospital Pharmacy, Volume 62(4), 2009, pages 307-319, available online, consulted on December 7, 2020.
[5] Section 2 (deidentification), An Act to enact the Consumer Privacy Protection Act and the Personal Information and Data Protection Tribunal Act and to make consequential and related amendments, online.
[6] Section 23, Bill 64, An Act to modernize legislative provisions as regards the protection of personal information, online.
[7] Section 12, Bill 64, An Act to modernize legislative provisions as regards the protection of personal information, online.
[8] Discussion Paper: Improving private sector privacy for Ontarians in a digital age at page 8, online.
[9] Ibid.
[10] Article 4(5), Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), online.
[11] Ibid at Recital 26.
[12] Ibid.
[13] See note 3.