Based on the positive feedback I received regarding my column, "Caustic Cookies" (May 2001), I conclude there is a genuine interest in the technical aspects underlying privacy and security issues relating to Internet use.
As I said in my May column, the Web is all about a pair of killer Internet protocols, HTML and HTTP, that define, enable, and constrain Web applications. HTTP is the application layer protocol that sits on top of the Transmission Control Protocol (TCP), which, in turn, sits atop the Internet Protocol (IP). The collection of utilities relating to these protocols that resides between the physical communications layer of the Internet and the productivity tools that we use to get work done (for instance, Web browsers and email clients) is called the TCP/IP protocol suite (a.k.a. protocol stack).
The HTTP part of this protocol suite is "stateless." Under the typical scenario, this means that once an initial communication exchange between a client and a server is completed, the connection between them is dropped. This communication exchange is built around what is commonly called the "TCP three-way handshake," which is motivated by the fact that IP is "lossy"; that is, if a packet gets lost in an IP transmission, it's gone forever. TCP overcomes this deficiency by keeping track of each leg of the communication exchange. At its simplest, the three-way handshake works something like this:
Now both client and server are connected and can begin tri-phase dialogues.
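For readers who prefer code to prose, the textbook form of the handshake can be sketched roughly as follows. This is a toy model under stated assumptions, not a network implementation: the `Segment` class and the `three_way_handshake` function are hypothetical names of my own, and the stages follow the standard SYN, SYN-ACK, ACK sequence rather than any particular vendor's stack.

```python
import random
from dataclasses import dataclass

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    """A drastically simplified TCP segment: flags plus two counters."""
    flags: frozenset
    seq: int
    ack: int = 0


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged to open a connection."""
    # Stage 1: the client asks to synchronize and announces its
    # initial sequence number (ISN).
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # Stage 2: the server acknowledges the client's ISN + 1 and
    # announces an ISN of its own.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn,
                      ack=(syn.seq + 1) % SEQ_SPACE)
    # Stage 3: the client acknowledges the server's ISN + 1.
    ack = Segment(frozenset({"ACK"}), seq=(syn.seq + 1) % SEQ_SPACE,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    segments = three_way_handshake(random.randrange(SEQ_SPACE),
                                   random.randrange(SEQ_SPACE))
    for stage, segment in enumerate(segments, start=1):
        print(f"Stage {stage}: {sorted(segment.flags)} "
              f"seq={segment.seq} ack={segment.ack}")
```

The sequence numbers are what let each side confirm that the other actually received the previous leg of the exchange.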
Note that in this exchange data persistence ends at Stage 3, so if we're to build upon previous dialogues, we'll have to do it by echoing the contents of previous communications. That's why the Web community began thinking about state management mechanisms in the mid-1990s. The primary state management mechanism they came up with was the "cookie" (see sidebar).
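To make the idea concrete, here is a minimal sketch of how a cookie lets a stateless exchange carry state forward. It uses Python's standard `http.cookies` module; the cookie name `session-id`, the sample value, and both function names are assumptions made for illustration and are not taken from any real site.

```python
from http.cookies import SimpleCookie


def server_set_cookie(session_id: str) -> str:
    """Build the Set-Cookie header a server might send on first contact."""
    cookie = SimpleCookie()
    cookie["session-id"] = session_id   # hypothetical cookie name
    cookie["session-id"]["path"] = "/"
    return cookie.output()              # a "Set-Cookie: ..." header line


def client_echo_cookie(set_cookie_header: str) -> str:
    """Echo the stored cookie back, as a browser does on later requests."""
    stored = SimpleCookie()
    # Drop the "Set-Cookie:" prefix and parse the remainder.
    stored.load(set_cookie_header.split(":", 1)[1].strip())
    pairs = "; ".join(f"{name}={morsel.value}"
                      for name, morsel in stored.items())
    return "Cookie: " + pairs


if __name__ == "__main__":
    header = server_set_cookie("abc123")
    print(header)
    print(client_echo_cookie(header))
```

Nothing in the second function checks who is doing the echoing; the server simply gets back whatever the client chooses to send.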
Cookies are used by Web applications as a surrogate for the current state of communication. They have the following properties:
These properties cause some of us to call into question the desirability of using cookies in the first place. In my view, to borrow a phrase from Dijkstra, cookie technology is a mistake carried through to perfection, not because the concept of transaction management is misguided (after all, there is nothing wrong with the concept of an e-commerce shopping cart), but because this paradigm was set up with inadequate protections and safeguards for end users. For these sorts of reasons, in June 2000 the Clinton administration banned cookies from federal Web sites in the absence of a "compelling need" to use them.
So one may imagine the members of the IETF Working Group on cookies trying to balance the need for Web transaction management with the vulnerabilities associated with the properties previously mentioned. Alternatives to cookies such as URL encoding, the use of HTML hidden fields, or storage on the server side seemed fraught with difficulties, so perhaps they chose the path of least resistance with the hope that the Web world would behave with some concern for individual privacy and security. After all, hope springs eternal, even on the Internet.
It's time for a reality check. First, every Web "handshake" between client and server is potentially invasive. To illustrate the point, on January 15, 2002, I connected to Amazon.com with the default settings of my Explorer browser just as it came from the developer. Note that entering www.amazon.com produced this extended URL with some additional information embedded: www.amazon.com/exec/obidos/subst/home/home.html/103-9484532-4674260. I hadn't even shopped for anything and already some user-auditing was taking place. That apparent plaintext number appended to the extended URL is curious, isn't it? Oh, well, isn't a URL just ephemeral data in a browser window?
Here the proverbial plot thickens. Unbeknownst to me, a cornucopia of temporary Internet files was dumped on my hard drive (see Figure 1). Assuming that there's nothing insidious, like Web bugs, embedded in the graphics (an unwise assumption to make, by the way), the only real penalties are the waste of 100KB of disk space and the fact that the address of the Amazon.com cookie administrator was permanently stored on my hard drive. I would prefer to retain the 100KB of disk space and remain anonymous to cookie administrators, but we'll write this off to experience, too.
But this is just the beginning. In addition to the mysterious string added to the URL and the 100KB of Web guano deposited on my hard drive, two cookies were deposited on my hard drive as well:
Let's see if we can figure out what some of this means. First, observe that both cookies are linked by some common field values. This suggests some sort of candidate key for a database. Hmmmm. I wonder if there isn't a transaction database in the background. Second, note that there's a field identified as a session ID in the second cookie. Where have we seen that before? Yep, that was the mysterious number at the end of the extended URL. This session ID serves essentially the same role at the applications layer that the sequence number did within the TCP/IP protocol suite: it is part of the authentication process that the application goes through to keep track of the session activity and link it to a user and an activity log. Now things are beginning to make sense. If we have a transaction database, and an authentication number issued to a user, we probably have identified at least part of a gateway into the database. What's more, when I view the source of the downloaded page from Amazon.com (see Figure 2), I find that same session ID number seems to be added to virtually every link on the page. Take a look at the following HTML fragment for one of the image maps.
Clearly the magic number 103-9484532-4674260 is a core ingredient of the transaction information that Amazon.com is using to track my current activities.
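Anyone can repeat this kind of inspection with a few lines of script. The sketch below pulls the session ID out of the extended URL quoted earlier and then counts how many links in a saved page carry the same value. The `SAMPLE_PAGE` markup, including its link paths, is a made-up stand-in and not the actual Amazon.com source; the class and function names are likewise my own.

```python
import re
from html.parser import HTMLParser

# Matches identifiers shaped like the one in the extended URL (3-7-7 digits).
SESSION_ID = re.compile(r"\d{3}-\d{7}-\d{7}")

SAMPLE_URL = "www.amazon.com/exec/obidos/subst/home/home.html/103-9484532-4674260"

# Hypothetical page fragment for illustration only.
SAMPLE_PAGE = """
<map name="nav">
  <area shape="rect" coords="0,0,50,20"
        href="/exec/obidos/subst/books.html/103-9484532-4674260">
  <area shape="rect" coords="50,0,100,20"
        href="/exec/obidos/subst/music.html/103-9484532-4674260">
</map>
<a href="/help.html">Help</a>
"""


class HrefCollector(HTMLParser):
    """Collect the href attribute of every tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def hrefs_in(page: str) -> list:
    collector = HrefCollector()
    collector.feed(page)
    return collector.hrefs


def tagged_links(page: str, session_id: str) -> list:
    """Return the links in the page that embed the given session ID."""
    return [href for href in hrefs_in(page) if session_id in href]


if __name__ == "__main__":
    sid = SESSION_ID.search(SAMPLE_URL).group()
    tagged = tagged_links(SAMPLE_PAGE, sid)
    print(f"{len(tagged)} of {len(hrefs_in(SAMPLE_PAGE))} links carry {sid}")
```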
We've reached the point of cascading absurdities. In my last column on cookies, I explained why cookies are well-intentioned mistakes. Now cookies are just part of the quagmire we get into when we engage in poorly thought-through TCP/IP state management. In the example in the previous section, I found that there are several pieces of information about our Web activities that can be spread around cyberspace.
Consider the session ID. This one piece of information appears in the extended URL, two different cookies, and the actual HTML contents of the Web page delivered. What is to prevent a hacker from changing these values and spoofing some other session ID? In a word, nothing. This is one of the primary ways hackers hijack Web sessions. This could be done by modifying the cookie with a text editor and then reconnecting to the Web site, changing the values in the extended URL, or even modifying the hard-coded information after saving the HTML page and then reloading it in the browser. A little trial and error can produce a real mess for innocent victims.
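The weakness is easiest to see in a toy model. The following sketch assumes a deliberately naive server that treats whoever presents a known session ID as its owner; the `SESSIONS` table, the user names, and the ID values are all invented for illustration, and no real site's logic is implied.

```python
# Toy in-memory "server" state: session ID -> account data. All made up.
SESSIONS = {
    "000-0000001-0000001": {"user": "alice", "cart": ["plasma monitor"]},
    "000-0000002-0000002": {"user": "bob", "cart": ["paperback"]},
}


def handle_request(presented_session_id: str) -> dict:
    """Naive handler: the presented ID is the only credential checked."""
    session = SESSIONS.get(presented_session_id)
    if session is None:
        return {"status": 403}
    return {"status": 200, "user": session["user"], "cart": session["cart"]}


if __name__ == "__main__":
    # Bob's browser sends Bob's own ID and sees Bob's cart.
    print(handle_request("000-0000002-0000002"))
    # The same client, after editing the stored value, is taken for Alice.
    print(handle_request("000-0000001-0000001"))
```

Because the server never ties the ID to anything else about the client, the second request is indistinguishable from one that Alice made herself.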
What could one achieve by doing this? For one, hijacking the session ID may reveal enough of the contents of records about users and their behavior to allow a dedicated evil-doer to circumvent the application-level authentication and pretend to be someone else. A truckload of stolen plasma monitors here, a 7-digit withdrawal there; it all adds up. Your imagination can complete the story.
This is a disaster in the making, I hear you cry. Not surprisingly, I've saved the worst for last. If we weren't vulnerable enough, there is an automated environment that takes all of the busywork out of session hijacking: specialized proxy servers.
Proxy servers are conduits between the client (browser application) and the server. The proxy server maintains a complete communication stream at both ends. But because it's an intermediary between client and server, it has the capability to intercept and alter the information as it's passed back and forth within the communication stream. Such valued morsels as session IDs, cookie contents, transaction information, prices, amounts, account numbers, passwords, and so forth are all fair game. Any session credential exchanged in the communications that is in plaintext can be easily altered. If the meaning of the information isn't clearly identified, experimentation is called for. Further, secure sockets layer (SSL) and other encryption environments are of no use because the hijacking takes place at the applications layer above SSL. Hackers have access to several different types of proxy servers that have the built-in capability of editing and transmitting session information in real time.
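The principle can be shown without any networking at all. In the sketch below, three plain functions stand in for client, intermediary, and server; the field names (`item`, `price`, `session-id`) and the function names are hypothetical, and the "server" is assumed to trust whatever plaintext fields reach it.

```python
from urllib.parse import parse_qsl, urlencode


def client_request() -> str:
    """The form data the client intends to send, as a plaintext body."""
    return urlencode({"item": "monitor", "price": "2999.00",
                      "session-id": "000-0000001-0000001"})


def intermediary(body: str, overrides: dict) -> str:
    """Stand-in for a proxy: parse the body, alter fields, pass it on."""
    fields = dict(parse_qsl(body))
    fields.update(overrides)
    return urlencode(fields)


def naive_server(body: str) -> str:
    """Assumed-naive server that believes every field it receives."""
    fields = dict(parse_qsl(body))
    return f"charged {fields['price']} for {fields['item']}"


if __name__ == "__main__":
    original = client_request()
    print(naive_server(original))
    print(naive_server(intermediary(original, {"price": "1.00"})))
```

The remedy this implies is on the server side: values that matter, such as prices, should be looked up or verified by the server rather than accepted from the client.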
Actually, there's more to the story, but I'll have to deal with topics like account harvesting and database invasions in a later column.
I'll leave you with this thought: in the world of online banking and e-commerce, the price to be paid for personal security is eternal vigilance.
Figure 1. The temporary Internet files resulting from a single access to www.amazon.com.
Figure 2. This downloaded page from Amazon.com contains a bounty of hidden data when viewed as HTML source.
©2002 ACM 0002-0782/04/0100 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.