P3P: Pretty Poor Privacy?
A Social Analysis of the Platform for Privacy Preferences (P3P)
By Karen Coyle
June, 1999
For a response to this and other critiques of P3P, see the Center for Democracy
& Technology's P3P
and Privacy: An Update for the Privacy Community
... and my response to CDT's response.
Engineering Privacy
The Platform for Privacy Preferences is a newly proposed World Wide Web protocol that has been developed by the World Wide Web Consortium (W3C). Reading through the main P3P documentation I kept having déjà vu flashbacks to the 1950's, when we were all agape at the promise of a future in which machines would do everything for us. News shows reported on the technological accomplishments of the time, and robots were always a big news story. They would be Gort-like, that is, with a human form. The scientists would proudly explain how the big eye was really a sensor; there would be microphones for ears, and a speaker for a mouth. Eventually, a machine like this would be in every home, washing the dishes and serving dinner and answering the door.
The 1950's was also the time of Mr. Potato Head, the original Mr. Potato Head
where you used an actual vegetable as the head, and substituting another organic
item, such as a turnip, was considered a creative act. Into the potato you stuck
a set of plastic ears, two eyes, a nose and a mouth, and then engaged in play
with the creature you had created. Amazingly, we didn't see the similarities
between the robots that were being offered as our future and Mr. Potato Head,
but looking back on it I have to admit that it's hard to distinguish the two
phenomena. Take an object with a familiar shape, add something symbolizing eyes,
ears, nose and mouth and declare it to be almost human.
P3P is the software equivalent of Mr. Potato Head. It is an engineer's vision
of how humans decide who to trust and what to tell about themselves. It has
a set of data elements and neatly interlocking parts, and it has nothing at
all to do with how people establish trust.
A Summary of Concerns
- It is designed not to protect data privacy but to facilitate the gathering of data by web sites. Were it designed to protect data privacy it would make it harder, not easier, for users to pass their personal information to requesting sites.
- It oversimplifies and quite possibly misrepresents the trust interaction, and always in favor of the web site that is asking for an individual's information.
- Many people will not understand that "privacy practices" are not the same as "privacy." P3P therefore allows sites to create an air of privacy while they gather personal data.
- It is very one-sided in the information exchange: there are detailed data elements relating to the user (name, address, place of work, date of birth) and no data elements relating to the requestor.
- There is nothing about P3P that would enforce or even aid the enforcement
of the "deals" that are struck through its algorithms. In this sense, P3P
embraces the technical while ignoring the entire social context within which
such a technical solution must exist.
What's the Real Problem?
The stated problem that P3P is designed to solve is that there is a trust problem on the World Wide Web. This is something we can all probably agree on, but we might not agree on the nature of that problem. The authors of the P3P protocol state that: "Many online privacy concerns arise because it is difficult for users to obtain information about actual Web site information practices.... Thus, there is often a one-way mirror effect: Web sites ask users to provide personal information, but users have little knowledge about how their information will be used." [P3P article] Or, as the same document states: "Internet users are concerned about the privacy of information they supply to a Web site." So P3P is based on the assumption that this lack of knowledge is the problem.
Others might conclude that the gathering of data is the problem, and that
the lack of knowledge of how it will be used aggravates the problem. Essentially,
privacy practices are not the same as privacy as in "the right to be left alone."
Privacy preferences are exercised within the context of a data exchange; the
user gives more or less information based on a set of factors. Nowhere do the
authors of P3P suggest that less information should be exchanged between users
and Web sites. If your definition of privacy includes anonymous Web surfing,
then P3P will not help you achieve that goal.
If the problem were only as the P3P authors state it, then the P3P protocol
would simply provide users with information on the privacy practices of the web
sites they visit. It does that, but it does much more: it provides a way for
users to pass their data to web sites without having to key it in. P3P defines
a large number of user data elements (name, address, phone number, gender,
date of birth). Clearly, these data elements do not solve the stated problem
of the need for users to be informed. Perhaps they solve another problem. Perhaps
they solve the problem that
"... while interest in e-commerce is increasing, most of the shopping public is too confused by the Web retail process -- or too afraid of posting credit card numbers -- to make e-commerce a viable business. In fact, Zona Research has found that 62% of potential Web shoppers abort their transactions in frustration over the process...."1
We don't know why shoppers end their transactions before completing them,
but it could probably be shown that more transactions will be completed (and
more information will be exchanged) when that exchange is automated, based simply
on the "ease" factor. The "one-click" shopping at Amazon.com is designed precisely
to reduce this drop-out factor by eliminating steps between the initial purchase
decision and the actual completion of the purchase process. The P3P documentation
states: "Sites can use P3P to increase the level of confidence users place in
their services, as well as improve the quality of the services offered, customize
content, and simplify site access."
If this is the motivation behind P3P, then I have to conclude that its goal
is not more privacy, but more exchange of data by making it easier to send data
than to not send it. In a sense this turns the transaction into an "opt out"
from today's situation where the exchange of personal data on the web is always
an "opt in." It does so by having your data ready and available to pass to the
requesting site with little or no effort on your part. P3P seems not to be a
privacy protocol; instead it is a data exchange protocol with statements informing
users as to the potential use of that data. While this may empower consumers
to some degree, it is not my definition of privacy.
Human Factors
As a piece of pure research, the P3P protocol is an interesting testbed. But it's not intended to be just research; P3P is intended to become part of the daily activity of hundreds of millions of users of the World Wide Web and to mediate intimately between the user and the world of e-commerce. The choice of data elements and the inner workings of the exchange between client and server are important elements of P3P, but they provide a very incomplete picture of how it will work in the real world. Given the P3P protocol, a wide variety of user interfaces can be developed, and the degree to which P3P succeeds in helping users express their privacy preferences depends as much on those as-yet-undetermined user interfaces as it does on the elements of the P3P protocol itself.
Data Deception
There are some indications in the P3P documentation that users may not enjoy fair treatment through P3P. For example, take the data element "date of birth." Date of birth is a highly personally identifying data element, which is why it is required on medical records, driver's licenses, and other documents that need to give precise identification. Combined with a person's name, date of birth is a very accurate identifier. It isn't clear why date of birth would be required for commercial transactions, and it's shocking that the P3P documentation itself gives examples in which the user is informed that the visited web site would like to gather her "age" when in fact it will be retrieving something with much greater privacy implications. Age, or at least an age range (e.g., 45-55), is often part of the demographic information that marketers find useful, but to substitute date of birth for this is truly an act of deception, and the fact that the documentation suggests a web site can call this element "age" is enough for me to mistrust any implementation of P3P. Can a site refer to zip+4 (which can be as specific as a single building or business) as my "geographical region"? The real question is how clearly and honestly the transaction will be presented to the user. Will it be in terms that the user understands? When asked for her age, will the user know that it is actually date of birth that is being transmitted to the requesting site? On the Web today, a user knows the difference between filling in yy/mm/dd and typing in an age.
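To make the sleight of hand concrete, here is a hypothetical policy fragment written in the style of the P3P vocabulary. The element names are illustrative, not the exact encoding of the specification; the point is the gap between the human-readable text, which speaks of "age," and the data element actually requested, which is the full birth date:

    <!-- Illustrative only: the site describes its request as "age" -->
    <STATEMENT>
      <CONSEQUENCE>Tell us your age so we can tailor our content.</CONSEQUENCE>
      <PURPOSE><tailoring/></PURPOSE>
      <DATA-GROUP>
        <!-- What is actually transmitted: the precise date of birth -->
        <DATA ref="#user.bdate"/>
      </DATA-GROUP>
    </STATEMENT>

A user agent that displays only the consequence text would never reveal the difference.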
Imprecise Privacy Practices
The data elements carrying user data to the requesting web site are quite precise: there are seven subelements to the "Name" data type and five to the phone number. The elements relating to the privacy practices of the web site are oddly imprecise. Most of these are represented by one-character codes (although textual explanations can be sent along with them), and the meaning of the codes can be quite puzzling. For example, where the Web site states the purpose for which the data will be used, one possible purpose is "Completion and Support of the Current Activity." What is the "current activity"? Could that include viewing the Web site? The element "Research and Development" conflates information gathered for the purposes of improving the site with information gathered to support marketing. The element "Contacting Visitors for Marketing of Services or Products" combines the whole range of direct marketing activities as well as notifying visitors about updates to the site.
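The asymmetry is easy to see side by side. In the sketch below (again using illustrative element names rather than the specification's exact encoding), the user's data is requested at fine granularity while the site's stated purposes are bare one-token codes:

    <STATEMENT>
      <!-- The site's side: two catch-all purpose tokens -->
      <PURPOSE><current/><develop/></PURPOSE>
      <DATA-GROUP>
        <!-- The user's side: finely subdivided personal data -->
        <DATA ref="#user.name.given"/>
        <DATA ref="#user.name.family"/>
        <DATA ref="#user.home-info.telecom.telephone.number"/>
      </DATA-GROUP>
    </STATEMENT>

Everything the site must say about its intentions fits in the two tokens on the PURPOSE line.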
How these data elements are presented to the user will be very important.
It may not be obvious that a site that asks for information for "research and
development" is actually gathering marketing data. And a user could agree to
be contacted when the site is updated without knowing that they are also
agreeing to direct marketing. The ambiguity of these data elements means that
they can easily be presented to users in deceptive ways. Yet no such ambiguity
is built into the data elements that the user must present.
Persuasion
P3P is definitely being designed within the commercial sphere, and it works hard not to hinder the persuasive marketing of e-commerce. It even has a data element that allows services that want to gather data to state their case: "Every proposal can have a set of consequences that can be shown to a human user to explain why the suggested practice may be valuable in a particular instance even if the user would not normally allow the practice." [Emphasis mine.] The data elements of P3P will clearly be wrapped in a very attractive marketing package. This element, in a sense, negates the rationale behind P3P when it suggests that people can make data exchange decisions for reasons other than the privacy practices of the requesting site. Consumer studies have shown that people give out their personal data when the product or service being offered is highly desirable to them. This suggests an economic model, one based on hard-to-define human desires rather than a careful weighing of privacy practices.
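The "consequences" element gives that marketing package a specific place to live. In a sketch like the following (hypothetical values, illustrative element names), the pitch travels inside the negotiation itself, right next to the data request it is meant to soften:

    <STATEMENT>
      <!-- Marketing copy delivered through the protocol itself -->
      <CONSEQUENCE>
        Join our birthday club and get a special offer every year!
      </CONSEQUENCE>
      <PURPOSE><contact/></PURPOSE>
      <DATA-GROUP>
        <DATA ref="#user.bdate"/>
        <DATA ref="#user.home-info.postal"/>
      </DATA-GROUP>
    </STATEMENT>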
Web Site Information
Interestingly, all of the informational data elements in the P3P protocol refer to the user: name, address, phone number, zip code. Although the protocol states that a site requesting information must include "identifying information about itself," the extent and format of this information is not specified. As a matter of fact, the examples in the protocol show only a web site name ("http://www.CoolCatalog.com/") as the "entity" element. But before I do business with a company on the Web, I want to know more about it than its URL. Unless this is a brand I already know, I can think of some key information that I would need in order to decide whether I trust the site (a sketch contrasting the two follows this list):
- What is the company's address & phone number?
- What type of business is it? Is it a private company or is it public? Is
it retail or wholesale?
- Where is it incorporated?
- When did it first form?
- Is it a subsidiary of another company, and if so, what company?
- How large is it? (number of employees and/or revenue)
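For comparison, here is roughly what the protocol's own example discloses about the site, next to what a more balanced vocabulary might demand. Both fragments are illustrative; the element names are approximate, the contact values are hypothetical, and the richer business fields are my assumption, not part of the protocol:

    <!-- What the protocol's example reveals: little more than a name/URL -->
    <ENTITY>
      <DATA-GROUP>
        <DATA ref="#business.name">CoolCatalog</DATA>
      </DATA-GROUP>
    </ENTITY>

    <!-- What a user-centered exchange might require instead (hypothetical);
         note that incorporation, ownership, and company size have no
         elements at all -->
    <ENTITY>
      <DATA-GROUP>
        <DATA ref="#business.name">CoolCatalog, Inc.</DATA>
        <DATA ref="#business.contact-info.postal.street">1 Main St.</DATA>
        <DATA ref="#business.contact-info.telecom.telephone.number">555-0100</DATA>
      </DATA-GROUP>
    </ENTITY>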
I will definitely not give my address and phone number to a company that will
not give me the same. Look long and hard on some web sites and you'll find that
many companies do not give out this information anywhere on their sites. Your
only recourse, if you have a problem, is to e-mail them, and e-mail is too easily
ignored.
If P3P is about trust, it is stacking the deck heavily toward the web companies
and against the user. It seems very suspicious to me that P3P better protects
the privacy of the Web site's owner than that of the users it claims to serve.
Data is Forever
The real weakness of P3P is that it covers a fraction of a moment in a relationship that can last a lifetime. Once the P3P transaction is over, all bets are off as to where your data has gone and how long it will be there. There's also no way to know if the agreed-upon use of the data is being stored with the data itself. Five, ten, or even more years from now, after the company has changed owners or evolved into something quite different, do you really believe that the P3P agreement will be honored? The use and transformation of data over time is one of the great privacy problems that we face. Yet although there is a data element for the retention of the data, it is an optional element in a final miscellaneous category, and it can only indicate whether or not a service provider gives this information. This is a very important part of the negotiation, and it should be mandatory for anyone gathering data to state a retention period, even for data that is being used "for the current transaction." I find it hard to believe that businesses will delete the names and addresses of customers once the purchased item leaves the shipping dock, even if they don't intend to use them for marketing purposes. These odd pools of data are a big part of the entire privacy problem, and users should be informed each time their data is stored anywhere for any purpose, whether it's for a day, a year, or, as I believe is true in most cases, indefinitely.
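A sketch makes the gap visible. Even where a retention disclosure is given at all (and in the draft described here it is optional), nothing obliges the site to say anything more precise than a single token (illustrative element names, as above):

    <STATEMENT>
      <PURPOSE><current/></PURPOSE>
      <!-- The entire story of the data's lifetime, in one token -->
      <RETENTION><indefinitely/></RETENTION>
      <DATA-GROUP>
        <DATA ref="#user.home-info.postal"/>
      </DATA-GROUP>
    </STATEMENT>

There is no element for an actual retention period, a deletion date, or a duty to notify the user when the data is stored.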
Once a user has given out information, their choices seem to end. There isn't
a way for a user to review or correct data after it has been gathered, although
sites may provide such a mechanism separately. A data element called "Access
to Identifiable Information" states whether or not users can "view identifiable
information and address questions or concerns to the service provider." Obviously,
"address concerns" is a pretty weak statement. None of the values given here
imply that there is in fact any redress. And nothing in any part of the protocol
would allow a user to end the relationship with the "service" should they change
their mind (e.g., after receiving excess marketing). It's a real weakness of this
protocol that it doesn't define such opt-out mechanisms. This makes the protocol
look unbalanced in favor of getting data to the Web site owners while not giving
users adequate opportunities to exercise their preferences beyond the initial
contact. Opting out can only happen during the negotiation phase, but the Web
site can then use postal mail or e-mail information to make continued contact without
returning to the P3P negotiation. Users need the same ongoing ability to
decide whether to continue in the negotiated relationship.
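The access disclosure itself is another single token. A sketch (illustrative element names, hypothetical value) shows how little room it leaves for redress:

    <POLICY>
      <!-- One token for what the user may view; no element anywhere
           for correction, deletion, or ending the relationship -->
      <ACCESS><contact-and-other/></ACCESS>
    </POLICY>

The vocabulary stops at disclosure; everything that happens after the data changes hands is outside the protocol.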
Enforcement
The most glaring problem with P3P is that there is absolutely no enforcement of the promises that are made to users. Because the United States does not have data protection laws, there is no legal recourse if a company promises privacy but uses the gathered data for other purposes. And because the companies know that there is no enforcement, there may be little incentive to protect the data that they gather through security or through company practices.
Violations of privacy in the digital world are particularly insidious because
they are not obvious. None of us knows what data stores hold information about
us because we can't see into the millions of databases that are in the hands
of others. This situation would be a great challenge for enforcement and undoubtedly
some less scrupulous companies would make use of the data in defiance of such
laws. But without any laws governing this activity, there is virtually no reason
for a company to adhere to its own P3P proposals. If a company states that it
is gathering information for the "completion of the current activity", then
later uses that information for marketing research, none of this will be visible
to the person whose data was gathered. Such a breach of the agreement will have
absolutely no negative consequences for the company that holds the data.
Social Impact Report
A decade or more ago it was both feasible and reasonable to add new protocols to the Internet with little regard for the overall social consequences. The Internet at that point was a small, closed community of researchers and academics, and changes on the Net had little effect outside that community. Today we are operating in a global networked environment that has become a major part of the communications and economic ecologies of hundreds of millions of people in over 100 countries. The potential impact of engineering on society is extremely high.
Perhaps it is time to require that changes to the Net be accompanied by a
"social impact report." This report would show that studies have been done to
determine the nature of the problem from both a social and a technical viewpoint.
Outside experts would scrutinize the report for unintended social consequences
and to make sure that the engineering solution is appropriate to the problem.
Thus, discussion of the social impact of a new protocol would become part of
the design process.
We don't have social impact reports today, but given that this protocol has
a strong possibility of social consequences, it would be irresponsible to implement
it without also undertaking a serious study of its impact. This would mean defining
one or more hypotheses, gathering baseline data before the protocol is implemented,
and then conducting studies at set intervals to evaluate what has actually changed.
It requires a clear statement of goals in relation to the protocol. The document
jointly issued by the Electronic Frontier Foundation and Microsoft promoting P3P
stated a goal of greater user knowledge of what data is gathered and for what
purposes. I would also like to set forth and test the hypothesis that P3P will
result in more data being gathered, and that more sites will require the revelation
of personal data as a condition of access to information on those sites. To my
mind, that outcome would represent a net loss of privacy for users of the World
Wide Web.
1. O'Shea, Dan. "E-commerce gets personal." Telephony, Feb. 15, 1999, pp. 10-12.
Copyright Karen Coyle 1999