By Stefan Sackmann, Jens Strüker, Rafael Accorsi
Communications of the ACM,
September 2006,
Vol. 49 No. 9, Pages 32-38
10.1145/1151030.1151052
Fifteen years after Mark Weiser's inspiring article on ubiquitous computing [11], his vision has become technically feasible. Objects of everyday use are becoming increasingly interconnected, and mobile communication involving devices of all sizes and bandwidths is used in various ways. Highly dynamic information systems (HDS) are emerging, bringing new challenges for the management of information systems: such systems must cope with components that enter and leave the system spontaneously and act autonomously. The changing and possibly conflicting requirements of the individual components must be taken into account, which demands dynamic negotiation of requirements. Moreover, such highly dynamic systems must be able to contend with the constant growth of communicated data, which is rapidly collected and accumulated in various forms.
Solving the challenges of HDS promises considerable economic potential. A first realization is the present rollout of RFID by major retail groups worldwide. Currently, cost savings through process automation are of prime importance, but the use of this technology in retailing goes beyond mere productivity improvements. Tagging items with RFID chips in combination with other wireless technologies, equipping customers with mobile communication devices, and using sensor networks allow, for example, personalized services of the kind that have so far been used successfully in client-server e-commerce scenarios [5].
From Anonymous to Personalized Shopping Experience
The Internet has substantially changed how services are personalized. As depicted in Figure 1, three ways of tailoring services to customers can be distinguished. Online retailers use the Internet today on a large scale to recommend products to known customers according to their previous purchases or interests [9]. These personalized services build upon a one-to-one communication channel and require personal data as an essential input factor. Retailers also use the Internet to offer individualized services, which do not require personal data but context data. An example is the recommendation of products according to the sequence of clicks, the pages requested, or the items that have been added to the shopping cart. Since such individualized services can be realized without necessarily identifying the customers, they allow an improved shopping experience while also maintaining customers' anonymity. Another means of tailoring services to customers involves universal services, such as a product search function or access to customer reviews, that need neither personal nor context data. Even so, these services are a form of personalization because a single customer can choose a service that meets his or her needs at a particular time. All three kinds of services can be part of a personalization strategy with the objective of building up customer relationships, increasing customer satisfaction, generating a "lock-in" situation, and in the end realizing greater product or service turnover.
Today, consumers are faced with thousands of products in a physical store and must often search a large area to find a particular item. The introduction of highly dynamic systems in stationary retailing enables an electronic one-to-one communication channel and allows context data to be collected as cheaply and effectively as in current e-commerce environments. In grocery stores such as the Extra-Future-Store in Germany, computers with a touch screen attached to a shopping cart are deployed as personal shopping assistants (PSA) [4]. Today, these devices are equipped with a bar-code reader and customers can interact with the retailer's information system over WLAN. Future forms of interaction may include customers using their mobile phones to communicate with RFID-tagged products and the retailer's information system [10]. Furthermore, sensors embedded in customers' clothing or products might also become the subject of interactions. Such a technical infrastructure enables the context of each customer to be taken into account, for example the current position within the store or the products in the cart. By combining all this context data in real time with customers' personal data and profiles already stored in the information system, the retailer can use the electronic interaction channels (PSA, mobile phone) to enrich customers' shopping experiences.
Imagine a customer equipped with an appropriate mobile communication device entering a store. To find a certain product, the customer can enter its name into the device and have its location displayed. To obtain additional information, for example a list of possible recipes using this product or information about its origin, the customer scans the RFID-tagged article. Retailers are able to provide such universal services to all customers without necessarily taking the differences between each of them into account. Individualized services, however, additionally require data of the customer's context as an input factor. For instance, a shopping list can be used to optimize the route through the store for time-sensitive or handicapped shoppers. Moreover, special offers or purchasing suggestions can be displayed according to the position of the cart within the store and the products in the cart. The mobile device can also show a running total of current purchases at any time, thereby enabling the customer to control expenditure. Finally, offering personalized services requires personal data such as name, age, purchasing history, or membership in a customer program. By identifying the customer, for example by means of an RFID-tagged customer card, the display can show further items as suggestions based on former purchases. Combining context and personal data is also useful. On the way through the store, special offers can be displayed on the screen according to position and personal needs such as fat-free or whole-food products. In this manner, allergy sufferers can, for instance, be warned about certain ingredients of products. Finally, thanks to personalized automatic checkouts the customer has no need to rummage for cash, pull out cards, or wait in line at the checkout counter [4].
Risks of Personalization
Although the economic potential of personalization in stationary retailing seems lucrative for retailers and customers, retail groups have slowed down their activities in this area. While Wal-Mart combined RFID-tagged articles with video surveillance, the German Metro Group tried to establish customer loyalty cards with embedded RFID tags. However, after the sharp criticism of privacy activists, Metro decided to drop the use of RFID tags in cards and Wal-Mart also stopped its RFID-based surveillance (footnote 1) [2]. If customers were to refuse the processing of context data within the store in general, neither individualized nor personalized services would ever come into being. An analysis of the decisive privacy concerns shows that the loss of control over personal data worries customers. According to a survey of more than 1,000 U.S. customers, two-thirds identified as a major concern the likelihood that RFID would lead to their data being shared with third parties (footnote 2).
Extensive Data Collection Leads to a Loss of Control
Exploiting sensor networks, RFID identification, automatic video surveillance, localization technologies, and other technologies in HDS undermines the users' desire to control personal data. Extensive and unobservable data collection is an inherent characteristic of HDS:
- Data is increasingly being collected without any indication. There will be no red indicator light on each device signaling the recording of data [3].
- Data collection takes place without any pre-defined purpose, for example, the shopping cart continuously defines and reports its position to the retailers' information system. This information can be used for optimizing the store arrangement, for generating purchase suggestions as well as for identifying the customer.
- Data once collected will be persistent and not deleted due to the continuously decreasing cost of data storage.
- Different devices record each event simultaneously from different viewpoints, for example, a customer browsing a product is recognized by the smart shelf as well as by the video surveillance or the shopping cart. The combination of these different views allows, together with further context data, recognition, or even identification of the customer.
- Recording devices register multiple events simultaneously, for example, video surveillance can record customer A browsing a certain product, customer B passing the corridor, and customer C stealing an item. The interpretation of the logged raw data for various purposes and the extraction of single events make the assignment of a valid privacy policy in some cases impossible.
The realization of an HDS leads to a paradigm shift in data collection and makes it easier to relate context data to individuals. The borderline between context data and personal data increasingly vanishes.
Today's Privacy Technologies Support Obscurity
The inherent data collection in HDS obliterates present-day privacy-enhancing technologies [3] because they are all based on concealing data, a privacy approach referred to as "obscurity" throughout this article. Today's privacy mechanisms are incompatible with the objective of any retailer to provide both personalization with useful services and assured privacy and security.
A classification of privacy mechanisms is given in the table "Privacy and Transparency." In the horizontal columns, the mechanisms are classified according to what they control: access or usage. While access control is usually understood as ex ante defined authentication and authorization, usage control extends access control and encompasses all those mechanisms that actually deal with the runtime detection of privacy violations. In the vertical columns, guidelines, mechanisms, and approaches for privacy are distinguished according to whether they enable all three forms of personalization.
Anonymity, for example, prevents personalized services that require identification of the customer. Pseudonyms and identity management, as the most favored solutions of science and industry, allow personalized services. Both privacy mechanisms follow the obscurity approach and rely on controlled disclosure of data, reducing such a disclosure to the minimum necessary to perform a given transaction. As a result, personalization is limited to the amount of disclosed data. However, the extensive and unobservable collection of context data for providing individualized services already allows the recognition of customers. This is because transactions are part of a chained process: filling the shopping cart, walking through the aisles, scanning products, and payment.
Obscurity, as a privacy approach for personalization in HDS, is inadequate. Once access to data is granted, customers have no control over how the data is used, irrespective of the retailer's initial intention. Proof of being an "honest" retailer acting according to data protection laws and the declared privacy policy can be produced by making data storage and data usage transparent. Different institutions providing a first step to transparency already exist: certification authorities, trusted third parties, privacy seals, codes of conduct, or privacy policies are implemented as a predefined agreement regarding the data usage. A promising approach is to supply tools to define individualized privacy and security policies and languages to express them. Currently, the most favored language for expressing privacy policies is P3P, the Platform for Privacy Preferences. P3P uses XML specifications that state: what kind of data is to be stored; how data is to be used; and its permanence and visibility, that is, how long data is to be stored and the corresponding access rights. Customers, admittedly, can express their desires but are not able to control the usage of their data. On the retailer's side, the rules for access are derived from the specified and possibly individualized privacy policies, for example by translating a valid P3P policy into EPAL (Enterprise Privacy Authorization Language), a formal language to express fine-grained enterprise privacy policies.
A highly dynamic system is only privacy-aware if it enforces formalized and personalized privacy policies. Such enforcement can be based upon past information (access control mechanisms) or present and derived information (usage control). Enforcement can be achieved by an information system that has been proven to fulfill the desired properties, in particular self-limitation, and can expect to gain customers' trust by the resultant transparent access to personal data.
However, the characteristics of HDS restrict the effectiveness of formulated policies with regard to their adaptation. On the one hand, the autonomous components increase the complexity of modeling the system and hinder the proof of their behavior. On the other hand, the changing manner of data collection rules out the assignment of a formulated privacy policy to personal data, which is required for enforcing formulated policies. For example, data may be collected outside the scope of a formulated policy; data collected by multiple devices is not integrated and related to a policy in real time; and data describing different, inherently interwoven events may lead to conflicting policies. Technically, research could pursue the development of an adaptive "P3P" or the control of the actual usage of data. First efforts try to prevent unintended usage of data in real time, as pursued, for example, by Park and Sandhu [6] or in the article by Pretschner et al. in this section.
Privacy Transparency by Evidence Creation
Instead of seeking an ex ante approach to privacy transparency, we introduce the concept of privacy evidence for ex post control of privacy policies. Transparency in HDS is provided by a cooperative mode between technology for detection and enforceable privacy contracts. The enforcement of privacy contracts requires that all involved parties be able to detect privacy violations, for example by means of an audit, and to document them in a way that is acceptable as evidence, such as in a legal dispute. As depicted in Figure 2, the creation of evidence depends on: policies as reference for a compliant usage of data and log views that encompass all data about an individual stored in an information system.
Today's state of the art for contract representation is P3P. However, P3P cannot express composed privacy policies, in particular policies involving multiple, hierarchical departments or enterprises. These limitations are addressed by NAPS, the Novel Algebraic Privacy Specification [7]. Analogously to P3P, NAPS offers conjunction, composition, and scoping operators for policies, but exhibits desirable algebraic properties. This extension is relevant in HDS because it allows a distributed evaluation of composed policies. Although a practical realization is not yet available, NAPS demonstrates that there is no lack of expressive power regarding the representation of contracts. In other words, we can, at least theoretically, adequately represent contracts.
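The dependency of evidence on a policy and a log view can be illustrated with a minimal Python sketch. It is not taken from the article, P3P, or NAPS: the names `UsageRecord` and `create_evidence`, and the representation of a policy as a mapping from data categories to permitted purposes, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One entry of a (hypothetical) log view: which data was used, and why."""
    data_category: str
    purpose: str


def create_evidence(policy, log_view):
    """Compare every logged usage against the purposes the policy permits.

    `policy` maps a data category to the set of purposes agreed upon;
    a category missing from the policy permits no usage at all.
    """
    violations = [
        record for record in log_view
        if record.purpose not in policy.get(record.data_category, set())
    ]
    return {"compliant": not violations, "violations": violations}


# Illustrative data only: a customer allowed position data for route optimization.
policy = {"position": {"route_optimization"}}
view = [
    UsageRecord("position", "route_optimization"),
    UsageRecord("position", "third_party_marketing"),
]
evidence = create_evidence(policy, view)
```

In this toy setting, the second record would be reported as a violation. The sketch presupposes that a complete and authentic log view is already available, which is exactly what the following sections identify as the hard part.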
Secure Logging to Ensure Authenticity of Log Data
Log views are the second requisite for creating privacy evidence generated from log data. However, standard logging mechanisms, such as syslog or syslog-ng, cannot be used for evidence creation, as they fail to ensure the necessary authenticity guarantees of log data. To provide such guarantees, secure logging is required and the central question concerns the characteristics log data must display to be accepted as evidence.
Authenticity of log data comprises confidentiality, meaning log entries cannot be viewed or accessed by unauthorized individuals; integrity, meaning the log entries are accurate (entries have not been modified), complete (entries have not been deleted), and compact (entries have not been illegally added to the log file); and uniqueness, meaning log data shall not allow for parallel realities. To realize these properties, proposals such as reliable syslog or Schneier and Kelsey's scheme [8] are the only conceptual guidelines available today. Based on these existing guidelines, we develop a secure logging protocol to ensure authenticity of log data in a way suitable for evidence creation.
Figure 3 illustrates how secure logging is realized, whereas its details and extension for remote collection of log data are found in [1]. To achieve this, standard cryptographic techniques are employed. Evolving cryptographic keys, hereby denoted by S, ensure not only confidentiality but also forward integrity, meaning that if an attacker succeeds in taking over a logging device at time t, the log data stored before t cannot be compromised. Hash chains, denoted by HC, guarantee integrity by creating interdependencies between entries. As a side effect, hash chains also provide tamper evidence and uniqueness guarantees of log data. Finally, entry-level access rights, denoted by AR, provide a way of controlling who has access to the log data. These access rights could be derived, for example, from a user's privacy policies.
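The interplay of evolving keys (S), hash chains (HC), and access rights (AR) can be sketched in Python using only the standard library. This is a simplified illustration, not the protocol of [1] or of Schneier and Kelsey [8]: the class `SecureLog`, the functions `evolve_key` and `verify`, and the entry layout are assumptions, entries are authenticated but not encrypted (so confidentiality is omitted), and removal of entries at the tail of the log is not detected.

```python
import hashlib
import hmac
import json

GENESIS = b"\x00" * 32  # assumed fixed starting value of the hash chain


def evolve_key(key):
    """One-way key evolution: the next key S_{i+1} is a hash of S_i."""
    return hashlib.sha256(b"evolve" + key).digest()


class SecureLog:
    def __init__(self, initial_key):
        self._key = initial_key   # S_0; a trusted verifier keeps its own copy
        self._prev = GENESIS      # last hash-chain value HC_{i-1}
        self.entries = []

    def append(self, message, access_rights):
        # AR is stored with the entry so it is covered by the chain and the MAC.
        payload = json.dumps(
            {"msg": message, "ar": access_rights}, sort_keys=True
        ).encode()
        chain = hashlib.sha256(self._prev + payload).digest()      # HC_i
        tag = hmac.new(self._key, chain, hashlib.sha256).digest()  # MAC under S_i
        self.entries.append({"payload": payload, "hc": chain, "mac": tag})
        self._prev = chain
        # The old key is overwritten, so a later attacker cannot forge
        # tags for entries written before the takeover.
        self._key = evolve_key(self._key)


def verify(entries, initial_key):
    """Recompute the chain and every MAC, starting from S_0."""
    key, prev = initial_key, GENESIS
    for entry in entries:
        chain = hashlib.sha256(prev + entry["payload"]).digest()
        expected = hmac.new(key, chain, hashlib.sha256).digest()
        if chain != entry["hc"] or not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev, key = chain, evolve_key(key)
    return True
```

Because each hash-chain value depends on its predecessor, modifying or removing an entry inside the chain invalidates every later value, and because the key only moves forward, a device compromised at time t no longer holds the keys needed to re-authenticate earlier entries.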
Log Views are a Basis for Evidence Creation
Albeit essential, secure logging is not enough to create evidence: views on logged data, conceptually similar to database views, are required but still not available. Log views are compilations of log entries encompassing all data collected about a user. In the case where log data can be directly assigned to a user and the related policies, generating log views can be tackled without further effort. For instance, in a P3P/EPAL setting, where the recorded data and the corresponding policies are stored together, a log view is just a query on the log file parameterized by the user identification.
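For the simple case just described, where every entry already carries a user identification, a log view reduces to a filter over the log. The following Python sketch assumes a hypothetical `LogEntry` layout and a hypothetical `log_view` function; neither is prescribed by P3P or EPAL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: int   # assumed: seconds since some epoch
    user_id: str     # assumed: the entry is already assigned to a user
    device: str
    data: str


def log_view(log, user_id):
    """Return all entries recorded about one user, in chronological order."""
    return sorted(
        (entry for entry in log if entry.user_id == user_id),
        key=lambda entry: entry.timestamp,
    )


# Illustrative entries only.
log = [
    LogEntry(30, "customer-42", "smart shelf", "browsed item 1001"),
    LogEntry(10, "customer-42", "shopping cart", "entered aisle 3"),
    LogEntry(20, "customer-07", "shopping cart", "entered aisle 5"),
]
view = log_view(log, "customer-42")
```

The filter only works because the `user_id` field is assumed to be present and correct on every entry.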
However, there are cases where this assignment is not directly possible or only within a certain degree of probability. In HDS there is a large variety of events that are recorded as isolated pieces of information without any explicit relationship with the surrounding context. Moreover, HDS follow unspecified, unforeseen, and sometimes even chaotic patterns. This complicates the automated generation of precise log views. Techniques to generate log views include guessing particular situations and measuring their plausibility against known facts, as well as extensive data mining to search for specific patterns in recorded data. The results can, in some cases, doubtlessly be associated with the corresponding customer, but in the majority of cases a probabilistic estimation is the best one can get. Current efforts such as the Web Ontology Language provide an accurate description of context data and processes and could lead to more precise log views.
The completeness of evidence generated based on log views remains an unresolved issue. In particular, it is currently impossible to exclude the existence of "shadow" log files hidden from a user. For example, "covert channels" within the system could redirect data to secondary log files not considered when generating log views. While trusted computing platforms could be used to attest the behavior of a data collector, guaranteeing completeness in HDS is even more challenging because of the inherent extensive and unobservable data collection. In consequence, regulatory institutions such as certification standards or legal advisory boards may be the only solution.
Conclusion
Highly dynamic systems enable several novel ways to personalize the relationship with the customer in stationary retailing. For this, the extensive collection and use of personal and context data are essential, but inherently raise privacy concerns: customers increasingly lose control over and awareness about what data is captured or how it is used. To a great extent, concerns of this kind considerably undermine the success of future personalization strategies. In highly dynamic systems, transparency with regard to the utilization of data is a reasonable way to maintain privacy. The concept of privacy evidence we discussed in this article is an initial step in this direction, as it permits an objective view into the data collected about a customer. Evidence could be used as a "sword" for the customer to incriminate in the case of a misuse, or as a "shield" for the retailer to absolve in the case of a privacy-compliant usage. Privacy evidence paves not only the way to transparency, but also to an acceptable deployment of highly dynamic systems.
References
1. Accorsi, R. On the relationship of privacy and secure remote logging in dynamic systems. In S. Fisher-Hübner et al., Eds., Proceedings of the IFIP International Federation for Information Processing, Volume 201, Security and Privacy in Dynamic Environments, Springer-Verlag, 2006, pp. 329-338.
2. Köpsell, S., Wendolsky, R., and Federath, H. Revocable anonymity. In G. Müller, Ed. ETRICS 2006, Lecture Notes in Computer Science 3995, Springer-Verlag, 2006.
3. Langheinrich, M. Personal privacy in ubiquitous computing: Tools and system support. Ph.D. dissertation, ETH Zurich, Switzerland, May 2005.
4. Litfin, T. and Wolfram, G. New automated checkout systems. In M. Krafft and M.K. Mantrala, Eds., Retailing in the 21st Century: Current and Future Trends, 2006.
5. Murthi, B.P.S. and Sarkar, S. The role of the management sciences in research on personalization. Management Science 49, 10 (Oct. 2003).
6. Park, J. and Sandhu, R. The UCON_ABC usage control model. ACM Transactions on Information and System Security 7, 1 (Feb. 2004).
7. Raub, D. and Steinwandt, R. An algebra for enterprise privacy policies closed under composition and conjunction. In G. Müller, Ed. ETRICS 2006, Lecture Notes in Computer Science 3995, Springer-Verlag, 2006.
8. Schneier, B. and Kelsey, J. Security audit logs to support computer forensics. ACM Transactions on Information and System Security 2, 2 (May 1999), 159-176.
9. Srikumar, K. and Bhasker, B. Personalised recommendations in e-commerce. Int. J. Electronic Business 3, 1 (2005).
10. Strüker, J. and Sackmann, S. New forms of customer communication: Concepts and pilot projects. In Proceedings of the Americas Conference on Information Systems (AMCIS '04), (Aug. 2004, New York).
11. Weiser, M. The computer for the 21st century. Scientific American (Sept. 1991).
12. Wohlgemuth, S. and Müller, G. Privacy with delegation of rights by identity management. In G. Müller, Ed. ETRICS 2006, Lecture Notes in Computer Science 3995, Springer-Verlag, 2006.
Authors
Stefan Sackmann is an assistant professor with the Institute of Computer Science and Social Studies, Department of Telematics, University of Freiburg, Germany.
Jens Strüker is an assistant professor with the Institute of Computer Science and Social Studies, Department of Telematics, University of Freiburg, Germany.
Rafael Accorsi is a doctoral candidate with the Institute of Computer Science and Social Studies, Department of Telematics, University of Freiburg, Germany.
Footnotes
1. Chicago Sun-Times. Chipping away at your privacy (Nov. 9, 2003).
2. RFID and Consumers: Understanding Their Mindset. Commissioned by Capgemini and the National Retail Federation; www.nrf.com/download/NewRFID_NRF.pdf.
Figures
Figure 1. The personalization pyramid.
Figure 2. Privacy evidence creation.
Figure 3. Realization of secure logging.
Tables
Table. Privacy and transparency.
©2006 ACM 0001-0782/06/0900 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc.