BRAP forensics is one of the latest additions to the digital forensics toolset.[1] One of the more subtle forms of computer activity mining, it has considerable potential for privacy abuse. Some practitioners distinguish browser forensics from applications footprinting, but the two investigative procedures are so closely related (browsers are, after all, applications) that subsuming them both under the same category of computer activity mining seems more reasonable.
Computer activity mining (CAM) involves the recovery of information about a computer user, or a computer's use, from the computer itself. As such, it is one of the core areas of modern digital forensics, along with log analysis, timeline analysis, keystroke capture and analysis, system imaging, and so forth. Log analysis is perhaps the best-known example, as it has been a staple of network forensics for years and is a primary tool for network administrators to reverse engineer hacks of their systems. It is so common, in fact, that sophisticated hackers consider log cleansing the final stage of a successful hack.
Another core area of digital forensics is media analysis (aka file system forensics): the practice of recovering data from non-volatile storage devices. Where CAM focuses on activity, media analysis focuses on data. BRAP forensics bridges the gap by revealing stored data as well as information about user behavior. That's what makes it interesting, and threatening to those concerned with personal privacy management.
In addition, the courts have made computer activity mining an important area of electronic discovery. Law enforcement officials routinely look to CAM for evidence of wrongdoing. This is particularly true in the prosecution of cases involving unacceptable computer use, sexual harassment, child pornography, EULA violations, computer fraud, identity theft, and intellectual property. As with media analysis, BRAP forensics should be thought of as indiscriminate. Once the warrant is served and the forensics completed, personal privacy issues are no longer applicable.
While the browsing experience is familiar to most computer users, its nuances remain nebulous. These nuances are the grist for the BRAP forensics mill. Internet Explorer (IE) on Windows is noteworthy in this regard because it leaves behind a surplus of browser residue. I will focus on IE, though similar examples may be drawn from non-Windows operating systems and alternative browsers.
The browser is the navigation and rendering tool for the Web. When the user clicks on an icon or link, the browser sends an HTTP request to a remote resource. That triggers a download of information. There are many by-products of this exchange: some well understood, some less so.
Cookies are one such by-product. Since HTTP is "stateless," the Web development community introduced these identifiers to store information about the client-server exchange for subsequent connections, either during the current browser session (session identifiers) or during subsequent browser sessions (persistent identifiers). Persistent IE identifiers reside in Documents and Settings\<User>\Cookies, each stored under the name of the Web site that produced it. For example, when I recently visited the www.microsoft.com Web site, seven cookies from webtrends.com, atdmt.com, indextools.com, and dcstest.wtlive.com were deposited in this folder on my computer. The WebTrends Web site reports that "Influential technology companies such as Microsoft have used WebTrends Marketing Lab 2 to get a real-time view into both online visitor activity and offline customer information," so I have some idea of why the cookie was left.
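To see how such identifiers arrive in the first place, consider the HTTP exchange itself. The following minimal Python sketch issues a single GET request and prints any Set-Cookie headers the server sends back; the host name is a placeholder, and what (if anything) is printed depends entirely on the site you point it at.

```python
import http.client

# Connect to a site and inspect the cookies it tries to set.
# "www.example.com" is a placeholder; substitute any site of interest.
conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/")
response = conn.getresponse()
for name, value in response.getheaders():
    if name.lower() == "set-cookie":
        print(value)
conn.close()
```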
When parsed, the two webtrends.com cookies appear as shown in Figure 1a and Figure 1b. The precise meaning of the "value" field is irrelevant to the current discussion. The two datapoints of interest are the timestamps: first, because the timestamp records when my computer was touched by WebTrends; and second, because that record won't expire for 10 years. Neither leaves me with a particularly good feeling about the experience. As I wrote in a previous column ("Caustic Cookies," April 2001), cookies are transforming our private sanctuaries into electronic auditoriums.
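Readers who want to examine these timestamps for themselves don't need a commercial parser. Here is a minimal sketch, assuming the commonly documented nine-line record layout of the legacy IE cookie text files (name, value, host/path, flags, then the expiry and creation times as the two 32-bit halves of a Windows FILETIME, terminated by a "*" line); the file path in the usage line is hypothetical.

```python
import datetime

def filetime_to_datetime(low, high):
    """Combine the two 32-bit halves of a Windows FILETIME
    (100-ns intervals since 1601-01-01 UTC) into a datetime."""
    ft = (high << 32) | low
    return datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=ft // 10)

def parse_ie_cookie_file(path):
    """Parse a legacy IE cookie text file into a list of records.
    Assumes nine lines per record, terminated by a '*' line."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        lines = [ln.rstrip("\n") for ln in f]
    records = []
    for i in range(0, len(lines) - 8, 9):
        name, value, hostpath, flags = lines[i:i + 4]
        exp_low, exp_high, cre_low, cre_high = (int(x) for x in lines[i + 4:i + 8])
        records.append({
            "name": name,
            "value": value,
            "site": hostpath,
            "flags": flags,
            "expires": filetime_to_datetime(exp_low, exp_high),
            "created": filetime_to_datetime(cre_low, cre_high),
        })
    return records

# Hypothetical path; substitute a real cookie file from your own system.
for c in parse_ie_cookie_file(r"C:\Documents and Settings\user\Cookies\user@webtrends[1].txt"):
    print(c["site"], c["name"], "created", c["created"], "expires", c["expires"])
```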
In addition, these cookies collect like lint even if IE security settings are increased. A risk-averse user might set the privacy level to HIGH for the Internet zone (IE>Tools>Privacy), because the BLOCK ALL COOKIES setting restricts functionality beyond tolerable levels. The HIGH setting should block tracking cookies and cookies from sites without a compact privacy policy. However, since IE doesn't clear private data on closing (as Firefox does), one must do it manually (IE>Tools>Delete Browsing History>Delete All). Therein lies the rub: the private data is archived in Windows every time the system creates a restore point (Windows Me, XP) or an incremental shadow copy (Vista). So, if the information isn't manually deleted before that day's backup, it's easy pickings for a BRAP forensicist. System restore points and shadow copies include personal data whether or not you know it. In some cases you can shut them off, but then there's no recovery mode for the operating system. In short, the computer most likely has a record of some or all Web sites visited, and this record is recoverable. The operative question is: Is this what you want?
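On Vista, an examiner (or a curious user) can enumerate the shadow copies that may be quietly archiving this private data. A minimal sketch, which simply wraps the built-in vssadmin utility and must be run from an elevated (Administrator) prompt:

```python
import subprocess

def list_shadow_copies():
    """Enumerate Volume Shadow Copies on Vista and later by
    invoking the built-in vssadmin tool (requires elevation)."""
    result = subprocess.run(
        ["vssadmin", "list", "shadows"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(list_shadow_copies())
```

Each shadow copy reported this way can then be accessed read-only through its \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopyN device path, at which point the archived cookie and history files are ordinary files again.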
The same applies to cache and URL history. This data is organized in a largely cryptic INDEX.DAT file in Documents and Settings\<User>\Local Settings\Temporary Internet Files\Content.IE5. To illustrate, Figure 2a shows a hex editor's perspective of INDEX.DAT after a single IE visit to Google.com. Note that the cache folder names are identified in the header of INDEX.DAT. Figure 2b shows the parsed contents of the file. As with cookies, if the user doesn't manually remove all of this data, it accumulates in the backup files and is readily accessed. Other tools exist to recover cached images.
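The header of INDEX.DAT can be read with a few lines of code. The sketch below follows the commonly documented IE5-era layout (a 28-byte signature string at the start of the file, with the cache directory table at offset 0x50, each entry being a 4-byte file count followed by an 8-character ASCII folder name); treat these offsets as assumptions to verify against your own file.

```python
import struct

def read_cache_directories(index_dat_path):
    """Read the cache folder names from the header of an IE5-era
    INDEX.DAT file, per the commonly documented layout."""
    with open(index_dat_path, "rb") as f:
        header = f.read(0x100)
    signature = header[:28].rstrip(b"\x00").decode("ascii", "replace")
    dirs = []
    offset = 0x50
    while offset + 12 <= len(header):
        count, = struct.unpack_from("<I", header, offset)
        name = header[offset + 4:offset + 12]
        if not name.strip(b"\x00") or not name.isalnum():
            break  # past the last directory entry
        dirs.append((name.decode("ascii"), count))
        offset += 12
    return signature, dirs

sig, dirs = read_cache_directories("INDEX.DAT")
print(sig)
for folder, file_count in dirs:
    print(folder, file_count, "cached files")
```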
Unintended residue is also a by-product of typical application use, especially with Microsoft productivity tools. I'll illustrate this point with the now-classic example of how Word metadata was used to embarrass Tony Blair's government.
Users become familiar with Word metadata through the properties box (found under MS Word>File>Properties>Summary). In 2003, Richard Smith extracted the revision log from a document Tony Blair's government had sent to Colin Powell to justify the attack on Iraq. As it turned out, parts of the document were copied from an article written by a postgraduate student. The source document was easily identified because the copy preserved its spelling, grammatical, and typographical transgressions. The metadata in the source document appears in the sidebar here. The metadata of immediate interest are the four abbreviated names in the revision history: phamil, jpratt, ablackshaw, and MKhan, which were usernames of four people in the Blair government. The log reveals three autorecovery backups to the LOCAL\temp directory for userid="cic22," a subsequent copy by jpratt onto a floppy (A drive), another copy made by ablackshaw onto a floppy, and the final editing on MKhan's computer. According to Smith, Parliamentary hearings revealed that Pratt passed a floppy disk to Blackshaw, who sent it to Colin Powell for his presentation to the United Nations. The revelation of this information, together with the plagiarism, proved to be a credibility disaster for the governments involved.
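Inspecting this kind of metadata doesn't require exotic tooling. Here is a minimal sketch using the third-party olefile library to pull SummaryInformation properties (author, last saved by, revision number, and so on) from a legacy binary .doc file, which is an OLE compound document; the filename is hypothetical. Note that the full revision log Smith recovered lives in other structures of the .doc format and requires deeper parsing than shown here.

```python
import olefile  # third-party: pip install olefile

def summarize_doc_metadata(path):
    """Print a few SummaryInformation properties from a legacy
    binary .doc file (an OLE compound document)."""
    ole = olefile.OleFileIO(path)
    try:
        meta = ole.get_metadata()
        for field in ("title", "author", "last_saved_by",
                      "create_time", "last_saved_time", "revision_number"):
            print(f"{field}: {getattr(meta, field)}")
    finally:
        ole.close()

summarize_doc_metadata("dossier.doc")  # hypothetical filename
```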
Consider the millions of email attachments in global circulation daily. How many people actually know about the volume of metadata they are broadcasting?
We all like to think of the delete key as the quintessential digital cleansing experience. But as we know, modern operating systems do not overwrite the data areas of deleted files; they just return the affected disk space to the operating system for reuse. The intermediate step in this process in Windows involves a recycle bin or recycler. But putting digital waste in the recycle bin doesn't destroy anything. In fact, it exposes the user to even more risk, because the file information is consolidated into one small, well-known part of the disk, which makes recovery easier.
If you think about it, all of the data necessary to recover a deleted file must go in the recycle bin; otherwise the file couldn't be undeleted. In Windows XP, for example, the information is stored in a file, INFO2. The information retained includes path, file size, delete time/date, and a unique recycle ID. Of course, one could recover this information with a hex editor, but it's much easier to just parse it, as shown in Figure 3. In this case, I had emptied the recycle bin, sanitized it with Evidence Eliminator, and then deleted an Adobe Reader installer so that it was the only file the bin contained. Note that I can recover the location of the file, the time/date deleted, the placement of the file within the recycler, and other information from the data recovered in the recycle bin. Until the recycle bin is emptied, this file is very much readable. But even if the recycle bin is emptied, only this metadata is lost. The actual file data remains recoverable with a hex editor (unless the clusters have been reallocated to another file, which isn't all that likely on high-capacity drives; see my August 2006 column for additional details).
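A minimal INFO2 parser illustrates how little effort this recovery takes. The sketch below assumes the commonly documented Windows XP layout: a 20-byte header whose DWORD at offset 12 gives the record size (800 bytes on XP), followed by fixed-size records containing a 260-byte ANSI path, record number, drive index, deletion FILETIME, physical size, and a Unicode copy of the path.

```python
import struct
import datetime

def parse_info2(path):
    """Parse recycle bin metadata from a Windows XP INFO2 file,
    per the commonly documented 800-byte record layout."""
    records = []
    with open(path, "rb") as f:
        header = f.read(20)
        rec_size, = struct.unpack_from("<I", header, 12)
        while True:
            rec = f.read(rec_size)
            if len(rec) < rec_size:
                break
            num, drive = struct.unpack_from("<II", rec, 260)
            filetime, size = struct.unpack_from("<QI", rec, 268)
            deleted = (datetime.datetime(1601, 1, 1) +
                       datetime.timedelta(microseconds=filetime // 10))
            upath = rec[280:].decode("utf-16-le").split("\x00")[0]
            records.append((num, drive, deleted, size, upath))
    return records

# Hypothetical path; the INFO2 file lives under the RECYCLER/Recycled folder.
for num, drive, deleted, size, upath in parse_info2("INFO2"):
    print(f"D{num} drive={drive} deleted={deleted} size={size} {upath}")
```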
Another interesting twist: even if image files are deleted, the recycle bin has been emptied, and the registry and disk have been sanitized, the thumbnails of those images might still be recoverable if they were ever indexed by Windows Explorer, because the image index, THUMBS.DB, stays behind with the folder.
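Since THUMBS.DB is itself an OLE compound file, its contents can be enumerated with the same olefile library used earlier. In the commonly documented layout, a "Catalog" stream names the indexed images and each remaining stream holds one cached thumbnail; extracting the image data itself requires skipping a small per-stream header, which this sketch omits.

```python
import olefile  # third-party: pip install olefile

def list_thumbs_streams(path="Thumbs.db"):
    """List the streams inside a Thumbs.db thumbnail index,
    which is an OLE compound document."""
    ole = olefile.OleFileIO(path)
    try:
        for entry in ole.listdir():
            print("/".join(entry), ole.get_size(entry), "bytes")
    finally:
        ole.close()

list_thumbs_streams()
```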
It is important that the computer user understand BRAP forensics because of its potential for invasion of privacy. Far from innocuous, browsers and applications software may reveal more of our behavior than we expect. In terms of subtlety, BRAP forensics goes beyond the older, more traditional areas of computer activity mining. Where a computer log provides information that is relatively objective and impersonal, BRAP forensics provides information that is subjective and personal. Think of it this way: knowing that someone logged into a computer and used a word processor is far less invasive than knowing that someone created a document for a specific person, visited a sequence of Web sites, viewed certain image files, saved the document, and then copied it to a USB memory stick with a known unique ID. BRAP forensics drills down to this level of granularity. And the small form factor of today's removable storage media encourages the circulation of personal and private information.
What I find most objectionable is that the production of this data residue is counterintuitive. The bottom line is that this residue exists for the convenience of myopic software developers who believe their vision of computer use is so incontrovertible that there is no need to entertain other points of view, such as those that put a premium on safeguarding personal privacy. How difficult would it be to offer the user complete control over the backup of non-system files and metadata? Or to allow users the option of browsing the Web without recording tracking cookies or URL histories? Or to create a file system where "delete" actually means delete? To the typical user, learning of these developer excesses retroactively is akin to learning that all of the world's typewriters had been secretly producing invisible carbon copies for Interpol. Who would have imagined that anyone ever thought this was a good idea? While hardware-backed encryption systems like BitLocker are an improvement, software use of personal information should follow the "need-to-know" paradigm. Encrypting data residue is never as effective as not storing it in the first place.
[1] I use the acronym BRAP for BRowser and APplications.
DOI: http://doi.acm.org/10.1145/1349026.1349030
Figure 1. Example WebTrends cookies.
Figure 2. a. (top) A hex editor perspective on the INDEX.DAT file and the four cache folders. b. (bottom) The parsed contents of INDEX.DAT.