Study shows privacy policies are longer and harder to understand in 2021

Website privacy policies are around four times longer than they were two decades ago, according to new research by a cyber security expert at De Montfort University Leicester (DMU). 

An analysis of 50,000 privacy policy texts published between 1996 and 2021, conducted by Dr Isabel Wagner, Associate Professor in Computer Science (Cyber Security) at DMU, showed that the average length has increased to more than 4,000 words, compared to just over 1,000 words in the year 2000. 

The research involved gathering data from privacy policies on some of the world’s most visited websites, as well as examining historical versions of webpages stored on the Internet Archive’s Wayback Machine. 

This data was then analysed using BERT, a machine learning language model that is trained on large amounts of human language data to identify patterns. 

Dr Wagner discovered the average privacy policy length increased significantly around May 2018 – when the European Union’s General Data Protection Regulation (GDPR), a set of laws designed to protect consumers’ data, came into effect – and at the start of 2020, when California introduced similar rules.  

“A website’s privacy policy is a legal document that explains what data the site collects from its users, how and for what purpose it processes the data, and with what other parties it shares the data,” explained Dr Wagner.  

“Privacy policies are notorious for being lengthy documents that are hard to understand and it is well-known that most users do not read privacy policies, but almost all users tick the box to agree with them.  

“I think, from a user’s point of view, these policies are fundamentally broken.”  

Her analysis also showed that policies published today are harder to read and describe more extensive access to user data by the organisations that write them. 

According to the Flesch reading ease scale, which measures the readability of text, privacy policies written in 2021 had scores similar to academic papers written for the likes of the Harvard Law Review. 
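
For readers curious how such a score is produced, the Flesch reading ease formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with lower scores indicating harder text. The sketch below is only an illustration, not the study's code: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for simplicity.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assume a trailing 'e' is silent (as in "site"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch reading ease score; lower means harder to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

Short sentences made of short words score high, while long sentences full of multi-syllable terms, typical of legal text, pull the score down.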

“We found concerning developments in the data practices described in policies, such as increased collection and sharing of sensitive data and lack of choice,” added Dr Wagner.  

“It is especially concerning that these data practices are obscured in lengthy policies that require university education to understand, and that would take users more than one hour per day to read.” 

As a result of the study, Dr Wagner suggests that until policies are significantly simplified, machine learning could help users navigate through the extensive jargon. 

“If the user’s browser could automatically label what privacy policies say, using our machine learning approach, then the browser could also match this against the user’s preferences and display a user-friendly summary,” added Dr Wagner.
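
The matching step Dr Wagner describes can be pictured with a small sketch. It assumes a classifier has already labelled the policy, and it is not the study's implementation: the function name and the label names are hypothetical, chosen only for illustration.

```python
def summarise_policy(policy_labels: set[str], user_objections: set[str]) -> str:
    """Compare a policy's labels with practices the user objects to.

    policy_labels: labels a (hypothetical) classifier assigned to the policy.
    user_objections: practices the user has said they do not accept.
    """
    conflicts = sorted(policy_labels & user_objections)
    if not conflicts:
        return "No conflicts with your preferences."
    return "Conflicts with your preferences: " + ", ".join(conflicts)


if __name__ == "__main__":
    # Hypothetical labels, not taken from the study.
    labels = {"shares_location", "uses_cookies"}
    objections = {"shares_location", "sells_data"}
    print(summarise_policy(labels, objections))
```

A set intersection keeps the comparison simple; a real browser feature would also need the labelling model itself and a way for users to record their preferences.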

Posted on Wednesday 23 February 2022
