With the advance of modern computer forensics tools, disk wiping (aka data wiping or disk/data erasing) has become increasingly important in the protection of proprietary, confidential, private, and personal information for the law-abiding computer user. Because of Windows' ubiquity in the desktop and notebook markets, a cornucopia of disk wiping utilities is now available. We wondered just how effectively these utilities are cleansing disks.
Our interest in disk wiping began with the observation that the Windows built-in utility, cipher, wipes disks by filling a file, EFSTMPWP, with enough data to consume all available non-allocated space. Most noteworthy was our observation that EFSTMPWP could on occasion take up so much space that the OS no longer had room to breathe and would hang. Windows wouldn't reload until EFSTMPWP was deleted by booting to a non-resident OS.
Our appetite whetted, we sought to observe the behavior of some of these utilities and compare the results. Though we document our results here, our greatest interest lies in understanding why these utilities produced the observed results. Toward that end, we'll begin with a brief overview of the Windows New Technology File System (NTFS) and then link the NTFS data structures to the disk residue.
NTFS version 5 is the most common Windows file system in use today. It has some clever features that add efficiency, especially when it comes to file searching. The most important disk structure on an NTFS drive is the MFT (Master File Table), which is stored in the root directory of the volume as the file, "$MFT." The MFT is used to keep track of how the disk blocks are allocated and what is stored in them. The MFT records are themselves rather flexible, having resident and non-resident "attributes" based on the type of information being stored. For instance, a standard file would have attributes like "$STANDARD_INFORMATION," containing security identifiers, file access and modification times, owner identifiers, and so forth. A standard file will also have at least one "$FILE_NAME" attribute that is used to describe the filename (a second "$FILE_NAME" is not uncommon, used to represent the old DOS 8.3 convention). The actual file content is either stored in the MFT entry itself (if it's small enough) or referenced by the "$DATA" attribute within the file's MFT entry. When the content is stored within the MFT entry, the $DATA attribute is marked as resident; otherwise it is marked as non-resident and points to the blocks where the data is stored.
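To make the resident/non-resident distinction concrete, here is a minimal Python sketch that walks the attributes of a single MFT record and reports which ones are resident. It is an illustration, not the code we used in our tests. The function name `list_attributes` is our own invention, the byte offsets follow the commonly documented NTFS on-disk layout, and the sketch deliberately ignores the update sequence ("fixup") values that a real parser must apply before trusting the last two bytes of each sector. The file name attribute discussed above carries type code 0x30.

```python
import struct

# Attribute type codes from the commonly documented NTFS on-disk layout.
ATTR_NAMES = {
    0x10: "$STANDARD_INFORMATION",
    0x30: "$FILE_NAME",
    0x80: "$DATA",
    0x90: "$INDEX_ROOT",
    0xA0: "$INDEX_ALLOCATION",
}
END_MARKER = 0xFFFFFFFF


def list_attributes(record: bytes):
    """Return (type_name, is_resident, content) for each attribute in one MFT record.

    content holds the raw bytes of a resident attribute and is None otherwise.
    Assumption: fixups are ignored, so this is only a sketch.
    """
    if record[:4] != b"FILE":
        raise ValueError("not an MFT record")
    (offset,) = struct.unpack_from("<H", record, 0x14)  # offset of first attribute
    found = []
    while offset + 0x18 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == END_MARKER or attr_len == 0:
            break
        non_resident = record[offset + 8]  # 0 = resident, 1 = non-resident
        content = None
        if not non_resident:
            # Resident header: content size at +0x10, content offset at +0x14.
            size, start = struct.unpack_from("<IH", record, offset + 0x10)
            content = record[offset + start: offset + start + size]
        name = ATTR_NAMES.get(attr_type, hex(attr_type))
        found.append((name, not non_resident, content))
        offset += attr_len
    return found
```

Fed a 1,024-byte record read from $MFT, the function returns one tuple per attribute; a small file shows up as a resident `$DATA` entry whose `content` is the file itself.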
Directories share $STANDARD_INFORMATION and $FILE_NAME attributes, but include additional structures like $INDEX_ROOT and $INDEX_ALLOCATION that make up the B-tree structure used to keep track of directory entries. The use of B-trees adds complexity to the identification of a deleted or moved file's parent directory since the entire B-tree is re-sorted every time a file is deleted. This usually overwrites the old reference to the deleted or moved file. In essence, $INDEX_ROOT and $INDEX_ALLOCATION are the structures that make disk organization at the file name layer possible.
While there are many variations on this theme, the preferred approach to disk wiping at this writing seems to involve the creation of a new file containing a wiping pattern (for example, all zeros, all ones, or random zeros and ones). This "single-file" approach takes advantage of the host file system for efficiency, because the "pattern" is created and applied to all available free disk space (including, most importantly, the space once occupied by deleted files). This makes it unnecessary to deal with blocks, clusters, and sectors individually. We ignore partially allocated disk space (such as RAM slack and file slack) for the present purposes. The general theme is this: Windows doesn't delete files; it simply marks the physical space that the files occupied as unallocated and available for reuse. If a disk wiping utility obliterates all of the unallocated space it will, among other things, obliterate the space formerly occupied by deleted files. It's just that simple.
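The single-file approach can be sketched in a few lines of Python. This is our own illustration rather than the implementation of cipher or any product we tested: the function name `fill_free_space`, the scratch file name `wipe.tmp`, and the `max_bytes` cap are all hypothetical. The cap exists only so the sketch can be tried safely; a real wiper would keep writing until the file system reports that the volume is full.

```python
import os


def fill_free_space(directory, pattern=b"\x00", chunk_size=1 << 20, max_bytes=None):
    """Fill one scratch file with `pattern` until the volume is full or
    `max_bytes` is reached, then delete the file. Returns the bytes written.

    Assumption: the scratch file name is hypothetical; this is only a sketch.
    """
    path = os.path.join(directory, "wipe.tmp")
    chunk = pattern * max(1, chunk_size // len(pattern))
    written = 0
    try:
        # Unbuffered I/O so a "disk full" error surfaces on write, not on close.
        with open(path, "wb", buffering=0) as f:
            while max_bytes is None or written < max_bytes:
                want = len(chunk)
                if max_bytes is not None:
                    want = min(want, max_bytes - written)
                try:
                    written += f.write(chunk[:want])
                except OSError:
                    break  # no free space left: the wipe pass is complete
            try:
                os.fsync(f.fileno())  # push the pattern out to the medium
            except OSError:
                pass
    finally:
        if os.path.exists(path):
            os.remove(path)  # release the space once it has been overwritten
    return written
```

Note what the sketch never touches: the MFT entries of the deleted files themselves, and any slack space. It overwrites only the clusters that the file system is willing to hand to a new file.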
After the free space has been wiped, some utilities make an effort to scour through the $INDEX_ROOT and $INDEX_ALLOCATION of the directories to be sure that everything has been cleared out. For all but one of the products we tested, this is the extent of the wiping that is accomplished. Even after this second step, considerable data residue usually remains.
Recall the earlier discussion of the MFT structures. The directories themselves are simply constructs to allow for the user's organization of the items on the disk. The MFT entries connect the user level to the data level. While some of the wiping tools did seem to make changes to the contents of the deleted MFT entries, we found that most of these tools consistently failed to remove all information.
Figure 1 shows that we can clearly derive file names, both 8.3 and long (Unicode) file names. The tools, with one exception, make no effort to overwrite the old MFT entries. This is a problem for two reasons. First, the file and directory names are commonly indicative of content. One might infer from such information the nature of the business, the level of confidentiality, names, and so forth. Second, and worse yet, if the files are small enough (that is, if $DATA was resident), the MFT entry will still contain all of the original data (see Figure 2).
What went wrong? Remember that the disk wiping utilities typically wipe unallocated space. In order to be confident that this approach works completely, it is incumbent on the user to determine exactly where the data is (allocated space, unallocated space, slack space, MFT resident). It is unreasonable to expect users to have that level of awareness. In this case the data is stored within the $MFT and its mirror, $MFTMirr, which are allocated space. While there may be slack space associated with them, the area where MFT entries exist is clearly not slack space. For these reasons, most disk wiping utilities miss them.
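The kind of residue described above is easy to look for. The following Python sketch, which is not the custom C code mentioned later in this article, scans a raw copy of an MFT for records whose in-use flag is cleared and reports any file names and resident data still sitting in them. The function name `find_deleted_residue` is hypothetical, and the sketch assumes 1,024-byte records aligned to the start of the buffer and ignores fixup values; the offsets follow the commonly documented NTFS layout.

```python
import struct

RECORD_SIZE = 1024  # assumption: the usual MFT record size


def find_deleted_residue(image: bytes, record_size: int = RECORD_SIZE):
    """Return [(offset, names, resident_data)] for every FILE record in `image`
    that is marked as no longer in use. resident_data is None if no resident
    $DATA attribute was found. A sketch only: fixups are not applied.
    """
    results = []
    for base in range(0, len(image) - record_size + 1, record_size):
        rec = image[base: base + record_size]
        if rec[:4] != b"FILE":
            continue
        first_attr, flags = struct.unpack_from("<HH", rec, 0x14)
        if flags & 0x01:
            continue  # record still in use, so not residue
        names, data = [], None
        off = first_attr
        while off + 0x18 <= len(rec):
            a_type, a_len = struct.unpack_from("<II", rec, off)
            if a_type == 0xFFFFFFFF or a_len == 0:
                break
            if rec[off + 8] == 0:  # resident attribute
                size, start = struct.unpack_from("<IH", rec, off + 0x10)
                body = rec[off + start: off + start + size]
                if a_type == 0x30 and len(body) >= 0x42:
                    # File name: length in characters at 0x40, UTF-16LE text at 0x42.
                    nchars = body[0x40]
                    raw = body[0x42: 0x42 + 2 * nchars]
                    names.append(raw.decode("utf-16-le", "replace"))
                elif a_type == 0x80:
                    data = body  # the complete content of a small file
            off += a_len
        results.append((base, names, data))
    return results
```

Run against the MFT of a "wiped" volume, any tuple with a non-empty `names` list is a file name the wiper left behind, and any non-`None` third element is an entire small file that survived.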
To determine how some of the popular utilities handle disk sanitization, we copied a subdirectory from an NTFS disk to several NTFS-formatted, SanDisk 256MB memory cards. The subdirectory structure included files with alternate data streams (ADS), word processing documents, programs, and graphics. The test sequence involved:
For post-erasure disk analysis, we used WinHex, AccessData's FTK, and a piece of custom C code (see URL Pearls). Figures 3a and 3b show two different perspectives on analyzing disk residue and illustrate the underlying digital forensics with two different forensics tools. A summary of our findings is shown in the table, which lists utilities and our observed results.
What we found is that only one product in our test environment, Evidence Eliminator, eliminated enough of the data to fall within our comfort zone. It is clear that most disk wipers leave behind a lot of telltale information that may have proprietary or security implications. Caveat emptor is appropriate here: disk wiping utilities (with the single exception), especially including the built-ins, may leave enough metadata residue for an observer to tell a lot about you and your organization. And if the files are small enough, the entire files are left untouched.
We emphasize that these results must be taken in context. First and foremost, we limited our concern to data residue that could potentially be recovered with software. The reason for this is that hardware recovery is expensive enough to make casual snooping impractical. The use of sophisticated magnetic sensors and electron microscopes to recover erased data places most of the risk in the realm of governments and government agencies that may be more likely to use digital surveillance and real-time capture (such as Carnivore and Magic Lantern). We note that some of the disk wiping utilities we used did have features that purport to mitigate hardware recovery.
Second, we did not test all storage environments. For that reason, we provided the setup configuration settings so others may duplicate the experiments in their own environments. We predict that an NTFS file system on any medium will behave in a similar, but not necessarily identical, fashion. We have experienced one disk wipe on an 80GB external USB drive that produced more residue than we found on the memory cards, even with Evidence Eliminator. So a word of caution is appropriate.
Third, we didn't make any effort to clean the registry hive. "Messing with the registry is really dangerous," says Microsoft, and one is wise to take their word for it. Telltale residue is left behind in the registry without question—how valuable this information is to an onlooker is open to conjecture. Some vendors, such as Evidence Eliminator, encourage the use of registry cleaning tools such as Microsoft's own REGCLEAN, but our fear of turning our workstations into boat anchors disabused us of any temptation to run it. For those who are tempted, beware that REGCLEAN has been reported to cause as many problems as it fixes, and REGCLEAN only cleans HKEY_CLASSES_ROOT, which is not usually the most trouble-prone part of the registry. HKLM, for example, is unaffected by REGCLEAN. If that doesn't scare you away, consider that Microsoft no longer supports REGCLEAN.
Finally, there is another category of product that we didn't test: the so-called disk sanitizers or disk purgers. These are products that are marketed to people who intend to repurpose or recycle their computers. In the absence of empirical test results, our advice would be to favor those that claim compliance with appropriate government standards and receive high marks in trade reviews.
Figure 2. Filename and data residue.
Figure 3a. WinHex 12.75 integrated with X-Ways Forensics. Note the presence of MFT metadata and persistent filenames.
Figure 3b. AccessData's Forensic Toolkit 1.61. Note directory structure and file name residue.
©2006 ACM 0001-0782/06/0800 $5.00