Note: We strongly recommend printing the PDF version of this paper.
Copyright is held by the
author/owner(s).
WWW2003, May 20-24, 2003, Budapest, Hungary.
ACM 1-58113-680-3/03/0005.
Platform for Privacy Preferences (P3P) is the most significant effort currently underway to enable web users to gain control over their private information. It provides a way for a web site to encode its data-collection and data-use practices in a machine-readable XML format, known as a P3P policy [10]. The designers of P3P simultaneously designed a preference language called APPEL [9] to allow users to express their privacy preferences. The goal was to enable users to programmatically check their privacy preferences against a P3P policy to decide whether to release their data to the web site. The P3P policy language became a W3C Recommendation in April 2002; APPEL is a Working Draft.
Unfortunately, there are subtle interactions between APPEL and the P3P policy language that make APPEL extremely hard to use correctly. We present a critique of APPEL that points out several shortcomings of APPEL. These shortcomings arise from a fundamental design choice: to only allow logical operations at nodes corresponding to P3P elements. These shortcomings cannot be overcome without completely redesigning the language.
We propose a small preference language that uses a strict subset of XPath 1.0 [7] for almost all of its functionality. It additionally makes use of the quantified expression every from the XPath 2.0 Working Draft [6]. This construct does not increase the expressive power of XPref, but makes the preferences written in XPref easier to understand and hence less error prone.1We refer to the proposed language as XPref.
Although XPref is a small language, it subsumes the full functionality of APPEL, while avoiding its pitfalls. In essence, XPref replaces the body of APPEL rules with XPath expressions. XPath was developed to match the structure of an XML document using a compact path notation and it was designed to be integrated as a subsystem within other systems. By reusing XPath, XPref is leveraging all the efforts that have gone into debugging its semantics and developing efficient implementations.
Although XPref is strictly based on XPath, it does not use many of the expensive features of XPath. So, while we expect many XPref implementations to use off-the-shelf XPath implementations, it is possible to build specialized implementations with much smaller memory footprint. This aspect of XPref is quite attractive for enabling preference checking in thin, mobile devices that are likely to dominate Internet access in the future.
Since XPref subsumes APPEL, it should be possible to programmatically translate APPEL into XPref. Indeed, we provide a translation algorithm that should simplify the migration from APPEL to XPref.
The rest of the paper is structured as follows. We start with a brief overview of the P3P policy language as well as the APPEL preference language in Section 2. We critique APPEL in Section 3, pointing out some of its shortcomings and why it will be hard to fix them. We present XPref in Section 4 and show how it avoids the pitfalls of APPEL. We also give an algorithm for translating APPEL rules into XPref. We discuss related work in Section 5 and conclude with a summary and directions for future work in Sections 6.
We assume familiarity with the basic concepts of XML [16]. Throughout the paper, we use element and attribute as in the XML specification. To distinguish XML elements appearing in an APPEL preference from those appearing in a P3P privacy policy, we will sometimes refer to an APPEL element as an expression. Its subelements will be correspondingly called subexpressions.
In this section, we briefly review the core features of the P3P policy language as well as APPEL. See [10] and [9] for complete specifications of the policy language and APPEL respectively.
P3P policies are described in XML format as a sequence of STATEMENT
elements that include the following subelements:
P3P has predefined values for PURPOSE (12 choices), RECIPIENT (6), and RETENTION (5). Examples of PURPOSE include:
A policy can provide opt-in or opt-out values for the required attribute of PURPOSE and RECIPIENT elements. The opt-in value says that the user must provide explicit consent to the stated purpose/recipient. The opt-out value gives the user flexibility to reject the specified purpose/recipient, but the user needs to take additional action for the opt-out to take effect.
An Example Policy [4]
Volga is a bookseller who needs to obtain certain minimum personal
information to complete a purchase transaction.
This information includes name, shipping address, and credit card number.
Volga also uses the purchase history of customers to offer personalized
book recommendations, for which
it needs customers' email address.
Figure 1 shows what Volga's policy may look like in the P3P policy language. The first STATEMENT says that the name, postal address, and miscellaneous purchase data (i.e., book titles, credit card number, etc.) will be used for completing the current purchase transaction.
|
The second STATEMENT allows Volga to use miscellaneous purchase data for creating personalized recommendations and email them to the customer. However, the opt-in value of the required attribute of the purposes individual-decision and contact implies that the explicit customer consent is necessary. By default, the value of the required attribute is set to always, which precludes the possibility of customer opt-in or opt-out.
An APPEL rule is satisfied by matching its constituent expressions and
(recursively) their subexpressions.
Every APPEL expression has a connective attribute that defines the logical operators between its
subexpressions. A connective can be:
or, and, non-or (negated or), non-and (negated and),
or-exact, and and-exact.
The default connective is and.
The two unusual connectives are and-exact and or-exact,
whose semantics are as follows:
An Example Preference
Figure 2 shows an APPEL rule set that
implements the following preference:
Block sites whose policies indicate that the information collected can be used for contact or telemarketing.The rule set consists of two rules. The first rule specifies conditions under which the access to a site must be blocked. The second rule is guaranteed to fire if the first one does not since the rule body contains the appel:otherwise element and allows the site to be accessed.
|
At first glance, APPEL comes across as an attractive language. It is small, readable, and uses standard XML for syntax. The user has to simply provide the pattern of the substructure of interest in the rule body and associate the desired behavior in the rule head. Unfortunately, the APPEL constructs interact with the P3P policy language in unintended ways, making it non-trivial to get even simple preferences right. It is easy to write a preference that appears correct and find that it does not accomplish the intended goal. The shortcomings we identify grew out of our experiences with APPEL while testing our implementation of P3P [4]. Concurrently, the implementers of the JRC APPEL engine have also pointed out some deficiencies of APPEL [12].
We invite the reader to join in Jack's odyssey with APPEL. Jack wants to write a very simple preference:
Jack thinks that APPEL has provided the or-exact connective precisely for expressing preferences like his, and writes the preference shown in Figure 3. This preference appears to permit access to only those sites whose privacy policies have statements containing strictly the current or pseudo-analysis purposes. After all, the semantics of or-exact is that if a statement lists purposes other than current and pseudo-analysis, the rule will not match.
|
Unfortunately the above preference does not achieve what Jack intended. The source of the problem is that the policy language allows a policy to contain multiple statements and the rule fires if any of the statements satisfies the rule. For example, a site may have a policy containing two statements; the first statement only lists current as the purpose, but the second statement includes telemarketing. Jack's preference will match the website's policy because of the first statement, and Jack has unknowingly consented to telemarketing.
At this point, Jack spends more time with the APPEL manual and adds the and-exact connective to the policy element in his APPEL rule, as shown in Figure 4. He thinks that and-exact will force all statements in the matching policy to strictly contain the purposes he wants.
|
Unfortunately, this modification does not work. The policy element in a P3P policy usually contains other subelements such as entity and access, apart from statement. Thus placing an and-exact connective at policy will cause the rule to fail for many acceptable websites whose policy element included other subelements, even if all the statements conformed to the rule.
Jack now starts wondering if he can solve the problem by converting his specification of what is acceptable into what is unacceptable. He notices there is a fixed number of predefined purposes in P3P. So, he rewrites his preference, as shown in Figure 5, enumerating purposes that are unacceptable. Notice that the preference has become harder to understand. As we will see momentarily, this difficulty is exacerbated when we have preferences that involve more than one dimension (e.g., involve both purpose and recipient).
|
Jack's travails are not over yet. The P3P policy language allows websites to define extensions to purpose. Jack has to guard himself against the possibility that a website may define and add, say, telemarketing-home as a subelement of purpose in its policy. This policy will match Jack's preference as written in Figure 5 and Jack has again inadvertently consented to telemarketing. The solution is to add extension as an additional subelement of purpose in Jack's preference.
Jack now decides that he does not mind websites using his personally identifiable information for individual analysis so that they may give him personalized recommendations, as long as they do not give this information to anyone else. Thus, Jack wishes to express the following preference, which is only a slightly extended version of his earlier preference.
Jack has learned that he must specify preferences by writing rules about what is unacceptable. So, Jack converts
purpose, recipient (purpose = "current" purpose = "pseudo-analysis" (purpose = "individual-analysis" recipient = "ours")) acceptinto
purpose, recipient (purpose "current" purpose "pseudo-analysis" (purpose "individual-analysis" recipient "ours") block
Consider the parse tree shown in Figure 6 corresponding to the latter expression above. In a single APPEL rule, we can only specify those logical expressions whose parse trees are (structurally) subtrees of the XML tree corresponding to the P3P schema. For example, Figure 7 shows the class of logical expressions involving three purpose values and one recipient value that can be specified in a single APPEL rule. Clearly, there is no way to map the tree from Figure 6 into the tree in Figure 7. Therefore, this expression cannot be directly expressed in APPEL.
Fortunately, Jack realizes that the above expression is the same as evaluating the following two rules:
Rule 1: purpose (purpose "current" purpose "pseudo-analysis" purpose "individual-analysis") blockHe now has to further convert all the negations into enumerations to get the complex preference shown in Figure 8.2
Rule 2: purpose, recipient (purpose = "individual-analysis" recipient "ours") block
|
We contend that Jack was not alone in being fooled into
writing incorrect preferences;
those well versed in APPEL have fallen prey to the same traps.
For example, the "simple ruleset" example in Figure 3.1 of the APPEL
Working Draft [9] makes several mistakes:
It is also interesting to observe that the APPEL designers recommend the following about the ordering of rules in a ruleset (Section 5.2 of [9]):
After starting out with all cases that are deemed acceptable (request rules), append all situations under which only limited request should be made (limited rules). The final set of rules cover all cases that should result in a blocked request (block rules).However, as we showed with our examples, the correct way to write APPEL preferences generally is to place block rules before the request rules!
We give below two suggestions that can partially ameliorate some of APPEL's problems.
One idea is to add an any-except connective to APPEL that returns true if the P3P policy element contains any subelements except those listed in the APPEL preference.3Such an operator would have allowed Jack to express his first preference as shown in Figure 9.
|
However, for writing Preference 2, Jack would still have to transform the relatively simple double negation into a form that could actually be expressed in APPEL (due to the constraint on the operators). If Jack had even a slightly more complex preference that involved combinations of purpose, recipient, and data, the resulting preference would become very convoluted even with an any-except connective.
Jack's second attempt (Figure 4) would have worked if the P3P policy language had a STATEMENTS element that was the parent of all the STATEMENT elements (rather than POLICY being the parent of the STATEMENT elements). In that case, Jack would have associated the and-exact connective with the STATEMENTS tag instead of the POLICY tag. Thus it is natural to wonder whether modifying the P3P standard by adding a STATEMENTS tag would cure APPEL's shortcomings.
|
Consider Jack's second preference. We still have the problem that the expression parse tree has a structure that is not a subtree of the XML tree for the P3P policy. While the preference shown in Figure 10 appears to express this preference, it is incorrect. A P3P policy with a single statement that contains all three purposes, along with the recipient ours, would incorrectly be rejected by this preference. The only advantage of the STATEMENTS tag is that we can now break this preference into two rules that are easier to understand (though not easier to get to from the original preference):
Rule 1: purpose, recipient ((purpose = "current" purpose = "pseudo-analysis" purpose = "individual-analysis") recipient = "ours") accept
Rule 2: purpose (purpose = "current" purpose = "pseudo-analysis" accept
As we discussed in Section 3.5, an APPEL rule can be viewed as a tree rooted at the RULE node. APPEL operators (connectives) can only be placed at the nodes. The fundamental limitation of APPEL arises from the fact that the children of a node can only be combined using the logical connective specified at the node. It is not possible to combine them in any other logical way, except to break them in multiple rules (as we did in Figure 8). APPEL also chose not to include logical operators for combining multiple rules in a ruleset; they are evaluated strictly in order. Taken together, these two limitations box in APPEL.
We are, therefore, not sanguine that band-aiding will fix the problems of APPEL. Fortunately, there is a better way out. We can design an XPath-based language that incorporates the lessons and aspirations of APPEL, subsumes APPEL's functionality, and leverages all the past effort in debugging the semantics of XPath and developing efficient implementations. We discuss this language, which we call XPref, next.
We begin by giving the rationale for the design of XPref in Section 4.1. We give a summary of XPref in Section 4.2, and then give some examples to illustrate its functionality in Section 4.3. We show that XPref subsumes the functionality of APPEL in Section 4.4.
We assume familiarity with XPath [7]. We include a brief overview of XPath in Appendix A for quick reference. Appendix B describes the subset of XPath 1.0 used in XPref. Appendix C gives a complete BNF specification of XPref.
The starting point of any language design is the articulation of what the language is supposed to accomplish and what it should be able to express in a natural way. Since the APPEL designers incorporated feedback from potential users when designing APPEL, we treat the goals of APPEL (not the actual implementation) as the desiderata for XPref.
APPEL was designed such that each rule accomplishes
one of the following goals:
We decided to base our design on XPath for the following reasons. XPath has been designed to be integrated as a subsystem within other systems. It provides a common syntax and semantics shared across multiple systems and is used to match the structure of an XML document against a compact path notation used for navigating through the hierarchical structure of an XML document. Since P3P policies are XML documents, XPath is a natural contender for expressing the rule conditions that can be pattern matched against policies. In addition, by using XPath, we automatically accrue all the benefits of using a mature standard.
XPath can both identify combinations of P3P elements, and verify that only specified combinations of P3P elements are present. Thus we can use XPath to define both specify-unacceptable and specify-acceptable rules, and express arbitrary logical expressions.
Having replaced APPEL rule bodies with XPath expressions, we retained the APPEL constructs for specifying rule behavior and rulesets. These constructs are specified in an XML format and naturally integrate with the rule bodies written in XPath. We also retained the APPEL semantics of evaluating rules in the order in which they appear in a ruleset.
XPref retains the two outermost XML elements in APPEL: RULESET and RULE. However, rather than using the subelements of RULE to specify acceptable or unacceptable combinations of P3P elements, XPref uses an XPath expression for this purpose. This expression is specified in a new condition attribute of the rule. A rule fires if the XPath expression returns a non-empty result. The other attributes of RULE, such as explanation, are retained as such from the APPEL specification.
XPref uses a strict subset of XPath 1.0 [7] for writing rule conditions, with one exception: the quantified expression every is borrowed from XPath 2.0 [6]. Having a small subset admits the possibility of smaller footprint implementations of XPref. A smaller footprint is important for preference checking in the mobile clients that are likely to dominate Internet access in the future. Appendix B specifies this subset and also explains our choices.
The every expression does not increase the expressive power of XPref, but we chose to include it to simplify preferences that require exact selection of a combination of policy elements. It is straightforward to write a small preprocessor to translate an XPref preference into one that strictly uses XPath 1.0 features, or embed this translation in the XPref processor. Thus, XPref can be implemented using any standard XPath 1.0 implementation.
A preference that blocks if the P3P policy includes contact or telemarketing purposes can be written in XPref as follows4:
<RULESET> <RULE behavior="block" condition="/POLICY/STATEMENT/PURPOSE/* [name(.) = "contact" or name(.) = "telemarketing"]" /> <RULE behavior="request" condition="true"/> </RULESET> |
If we wish to modify the above preference not to block websites that require opt-in for contact and telemarketing, we would rewrite the rule as follows:
<RULESET> <RULE behavior="block" condition="/POLICY/STATEMENT/PURPOSE/* [(name(.) = "contact" or name(.) = "telemarketing") and @required != "opt-in"]" /> <RULE behavior="request" condition="true"/> </RULESET> |
<RULESET> <RULE behavior="block" condition="/POLICY/STATEMENT [ PURPOSE/* [ name(.) = "individual-analysis"] and RECIPIENT/* [ name(.) != "ours"] ]" /> <RULE behavior="request" condition="true"/> </RULESET> |
We now illustrate that the every expression makes it easy to express the acceptable combinations in a preference. The following preference ensures that all purposes across all statements are strictly current or pseudo-analysis.
<RULESET> <RULE behavior="request" condition="/POLICY [ every $pname in STATEMENT/PURPOSE/* satisfies (name($pname) = "current" or name($pname) = "pseudo-analysis") ]" /> <RULE behavior="block" condition="true"/> </RULESET> |
Preference 2, which gave Jack so much trouble in Section 3.5, can be expressed using the every expression, as shown in Figure 11.
|
This preference can also be written using only XPath 1.0 as follows, though the preference writer now has to think in terms of what is unacceptable, rather than what is acceptable.
<RULESET> <RULE behavior="block" condition="/POLICY/STATEMENT/PURPOSE/* [name(.) != "current" and name(.) != "pseudo-analysis" and (name(.) != "individual-analysis" or ../../RECIPIENT/* [name(.) != "ours"]) ]" /> <RULE behavior="request" condition="true"/> </RULESET>
Figure 12 gives an algorithm for translating the body of an APPEL rule into an XPref condition. In addition to showing that XPref subsumes the functionality of APPEL, this algorithm can also be used for converting existing APPEL preferences into XPref. To simplify exposition, the algorithm pseudocode omits checks for avoiding generating superfluous parenthesis as well as unneeded trailing OR or AND operators in the query.
|
An APPEL expression is satisfied by matching its attributes and the
constituent subexpressions which are connected through the APPEL
logical operators. The algorithm for translating a rule body into
XPref works as follows:
|
/self::node() [POLICY [STATEMENT [PURPOSE [ (contact [@required="always"] OR telemarketing) ]]]] |
/self::node() [POLICY [STATEMENT [PURPOSE [ ((contact [@required="always"] AND telemarketing) AND every $s in ./* satisfies ($s/self::contact [@required="always"] OR $s/self::telemarketing)) ]]]] |
While developing APPEL, the APPEL working group explored the possibility of using an existing database query language rather than designing a new language. They decided against this alternative because of the large implementation burden on user agent implementers and the mismatch for XML processing [8]. However, they did speculate that the XML XQuery work may eventually result in a language suitable for encoding and processing P3P preferences. The APPEL designers also considered the possibility of using database query languages [8]. Our work completed this thought by proposing XPref based on XPath and showing that it serves nicely as a preference language for P3P.
Independent of our work, it has been noted in [12] that constructing preferences in APPEL is complex and error prone. They speculate that XPath might be an alternative.
There have been efforts to develop graphical tools to simplify preference creation. For example, the JRC APPEL Preference Editor [13] provides a Java-based GUI for preparing APPEL preferences. For developers of these tools, a symbolic language that is small and not error prone (such as XPref) would help.
Current APPEL implementations include AT&T Privacy Bird [15], which implements an APPEL engine in a browser extension to Internet Explorer 5. There is also a Java-based APPEL implementation available from JRC [13]. A server-centric architecture for matching preferences against policies is presented in [4]. The server-centric approach lays the groundwork for enforcing privacy policies at the database level. XPref is agnostic to specific implementation architectures and can be used in client as well as server centric architectures.
The context for the work presented in this paper is our effort to design information systems that protect the privacy and ownership of individual information while not impeding the flow of information. Our other related papers include [1] [2] [3] [5] [11].
The raison d'être of P3P is to provide automated matching of user preferences
with privacy policies; thus its success depends on the availability of
an associated preference language. While APPEL was designed for that
purpose, subtle interactions between APPEL and P3P lead to serious
problems:
Therefore we explored alternatives to APPEL, and in particular, showed that XPath serves quite nicely as a preference language. It solves all the problems mentioned above. We identified the minimal subset of XPath that is needed, thus allowing a matching engine to potentially use a smaller memory footprint. We also gave an XPref to APPEL translator, showing that XPath is as expressive as APPEL.
We conclude by mentioning two interesting future directions for extending XPref. First, how do we specify a preference such that different websites can be automatically ranked with respect to their privacy friendliness? Second, there is recent work on quantifying the value of privacy by formulating the problem as a coalitional game [14]. How can we extend the preference language to incorporate such negotiations?
XPath was designed to match the structure of an XML document against a compact path notation used for navigating through the hierarchical structure of an XML document.
An XPath path expression, when applied to a document, returns a sequence of distinct nodes in document order. There are seven different types of nodes in an XML document: root nodes, element nodes, text nodes, attribute nodes, namespace nodes, processing instruction nodes, and comment nodes. A path is made up of steps, each step is an expression that returns nodes. Each step navigates the document from the context node along an axis defined by the step. The context node is an implicit variable used to represent a node in the XML document while processing an XPath.
XPath axes are used to navigate the structure of an XML document in all directions. Given a context node n which is an element node, the child axis, for example, navigates towards the subelements or attributes of n, while the parent axis moves in the direction of the immediate element node that contains n. By default, each step is along the child axis. Thus, the path /POLICY/STATEMENT returns a sequence of distinct STATEMENT nodes in document order. It retrieves, starting from the root node of the document, element nodes having element name POLICY, and passed each node n thus found onto the next step in the path. For the following step, n is called the context node for evaluating the step. The STATEMENT node selects subelements of POLICY which have the element name STATEMENT and returns these as the result of the path.
Predicates can also be specified at each step in the path in order to restrict the set of nodes derived at a step. Predicates are specified using square brackets and appear after the step. If a predicate evaluates to a numeric value instead of a boolean value, the result is converted to true if the number is equal to the context position. For example /POLICY/STATEMENT[3] selects only the third statement in a policy. This is equivalent to the boolean predicate /POLICY/STATEMENT[position() = 3] which uses the position function that returns the context position.
Attribute nodes are accessed using the abbreviated syntax "@name". For example, current/@require selects the required attribute of the current element. The @ prefix differentiates an attribute nodes from other types of nodes.
We have chosen to include a limited number of axes since the schema of P3P XML policies is fixed and known; axes such as descendant and descendant-or-self (represented with // when using abbreviated XPath syntax) allow finding an element within the nested structure of an XML document without knowing the full path to that element. For example, the XPath //name(.) = "telemarketing" tests for the presence of an element node called telemarketing contained in a P3P policy. While this adds some convenience, the task of evaluating such a predicate would incur a performance penalty since the predicate would have to be evaluated for all nodes in the XML tree, instead of simply at those nodes that are children of a PURPOSE node.
Our motivation for excluding positional predicates is that privacy preferences address whether a policy contains specific sets of elements, irrespective of their relative position in the XML document.
Our motivation for excluding these operators is similar to our motivation for excluding positional predicates; preferences address whether a policy contains specific sets of elements.
We kept XPath string functions that provide services similar to the wildcard "*" operator found in APPEL, which is used, for example, when matching P3P data types in preferences.
Altogether, our reduced XPath BNF has 29 productions (26 productions are taken from XPath 1.0 and 3 productions are taken from XPath 2.0) while XPath 1.0 has a total of 39 productions. XPath 1.0 has a set of 27 core functions, and we retain only eight of them. More importantly, we have omitted some of the more complex constructs of XPath 1.0. These simplifications should result in a smaller footprint implementation when compared to a full XPath implementation.
XPref replaces APPEL's rule body with XPath specifications, but retains the portion of APPEL used for specifying the behavior of the rules and the ruleset. It uses a subset of XPath 1.0 functionality and functions, and additionally borrows the every expression from XPath 2.0. Accordingly, we give the BNF for XPref in four pieces:
|
|
|
|