Web servers are accessible by anyone who can access the Internet. Although this universal accessibility is attractive for all kinds of Web-based applications, Web servers are exposed to attackers who may want to alter their contents. Alterations range from humorous additions or changes, which are typically easy to spot, to more sinister tampering, such as providing false or damaging information.
To strengthen the security of Web systems, we have developed and implemented a server-side Web content integrity verifying agent called the Dynamic Security Surveillance Agent (DSSA). It automatically intercepts a user's request and checks the integrity of the requested page on the fly before the Web server responds to the client. This ensures that users receive the information and services as they were originally provided, and protects organizations' reputations from the damage that illegally altered material displayed in browsers could otherwise cause.
Although in this scenario the organizations running Web sites are the primary parties interested in the integrity of their sites' contents, users who want further confidence can apply client-side verification techniques to verify the integrity of the pages displayed in their browsers when they conduct security-sensitive online transactions.
However, when Web sites are configured for secure communication between clients and servers, the need for a client-side integrity check is considerably reduced. A secure communication channel can be established using the Secure Sockets Layer (SSL) protocol or the Secure Hypertext Transfer Protocol (S-HTTP).
DSSA differs from systems whose integrity check is based on digital signatures. These are mainly client-based systems, which use digital certificates and secret keys as part of their client's trusted software. For instance, micropayment systems, which are used for small purchases over the Internet [4], rely on their clients to verify the integrity of electronically transmitted material. Therefore, if a Web page was altered at the Web server, the user would still see the modified content, or would notice the change only after the event, when the authenticity checks were performed. At this stage the damage is already done.
Digital signatures can potentially provide security by enabling a client to verify that a document really comes from its alleged owner. However, a digital signature can be circumvented by an attacker who replaces the current page with a genuine but out-of-date page. This modification will not be detected unless the Web server produces a time-stamped signature, which puts extra computational load on the server.
DSSA applies a collision-free hash function to verify the integrity of a Web page at the server. A hash function is preferred to digital signatures because computing a hash is 1,000 times faster than RSA signature generation and 100 times faster than RSA signature verification [4].
A hash function, H, takes the contents of a variable-size Web page and returns a fixed-size string called a hash value, h. The value h is similar to a human fingerprint and distinguishes an original Web page from a counterfeit. Comparing the h value of a Web page with its previously calculated true value confirms or refutes the page's integrity. Hash functions are called collision-free because it is practically impossible to produce the same hash value for two different Web pages.
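The comparison described above can be sketched in a few lines of Java. This is a minimal illustration, not the DSSA source: the class name `PageFingerprint` and its methods are hypothetical, and SHA-1 is used only because the article names it later as the chosen algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, assumed for illustration only.
final class PageFingerprint {

    /** Computes the hash value h of the page contents and returns it as lowercase hex. */
    static String hashHex(byte[] pageBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(pageBytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Returns true when the page's current hash equals the previously recorded value. */
    static boolean isIntact(byte[] pageBytes, String trustedHex) {
        return MessageDigest.isEqual(
                hashHex(pageBytes).getBytes(StandardCharsets.US_ASCII),
                trustedHex.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] original = "<html><body>Price: $10</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "<html><body>Price: $1</body></html>".getBytes(StandardCharsets.UTF_8);

        // The trusted value is computed once, while the page is known to be genuine.
        String trusted = hashHex(original);

        System.out.println("original intact: " + isIntact(original, trusted));
        System.out.println("tampered intact: " + isIntact(tampered, trusted));
    }
}
```

Even a one-character change to the page yields a different hash value, which is what makes the fixed-size fingerprint usable as a stand-in for the full contents.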
The application of fingerprinting techniques, such as hash algorithms, to detect illegal modifications of files is not new. Such techniques have long been used in traditional computing systems such as Unix.
However, Web systems are distinctly different from traditional distributed computing systems. For example, Web pages are document files that exhibit a rich file structure as far as both their contents and relations are concerned. In addition to the HTML source code, the contents of a typical Web page could include a blend of files such as text, graphics, images, animations, pictures, and video clips, as well as components including frames, cascading style sheets (CSS), executable applets, and program instructions.
Therefore, adapting known technologies to the Web environment requires an assessment of their suitability for situations like server-side integrity checks.
A collection of Web pages held by a Web server normally creates a tree-like structure; integrity checks should also involve their links.
Verifying the integrity of Web content is not confined to the HTML source code and its accompanying text, or parts of them. Embedded objects and Web components must also remain free from unauthorized alteration before transmission to Web users. Our proposed solution therefore examines every part of the page when checking its integrity.
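One way to cover embedded objects as well as the HTML source is to fold every part of the page into a single digest. The article does not say how DSSA combines the parts, so the following is only a sketch under that assumption; the class `CompositePageDigest` and the name-to-bytes map are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one SHA-1 digest over the HTML source and all embedded objects.
final class CompositePageDigest {

    /** parts maps a resource name (for example "index.html", "logo.gif") to its bytes. */
    static byte[] digest(Map<String, byte[]> parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // Sort by name so the result does not depend on map iteration order.
            for (Map.Entry<String, byte[]> e : new TreeMap<>(parts).entrySet()) {
                byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
                byte[] body = e.getValue();
                // Length prefixes keep the boundaries between parts unambiguous.
                md.update(ByteBuffer.allocate(4).putInt(name.length).array());
                md.update(name);
                md.update(ByteBuffer.allocate(4).putInt(body.length).array());
                md.update(body);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-1 is not available", ex);
        }
    }
}
```

With this layout, altering an image or an applet changes the combined digest just as altering the HTML does. Hashing each object separately would be an equally valid design and would additionally show which object was changed.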
Content integrity has been perceived differently by different groups of users and Web developers as follows:
The following methods can be used for integrity verification:
We propose the following server-side method:
The DSSA Web content integrity verifier continuously runs on the Web server and checks the authenticity of the Web pages requested by the users. If it detects any alterations, it signals the Web server to block their transmission and take preventive action.
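The intercept-and-check behavior described above could look roughly like the following. This is a simplified sketch, not the DSSA implementation: the class `IntegrityGate`, its in-memory table of trusted hashes, and the use of an empty `Optional` to mean "block transmission" are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical gate placed between the request handler and the response.
final class IntegrityGate {

    // Trusted hash per URL path, recorded while the page is known to be genuine.
    private final Map<String, byte[]> trustedHashes = new ConcurrentHashMap<>();

    void register(String path, byte[] trustedContent) {
        trustedHashes.put(path, sha1(trustedContent));
    }

    /** Returns the content if it matches its trusted hash, or empty to signal "block". */
    Optional<byte[]> check(String path, byte[] currentContent) {
        byte[] expected = trustedHashes.get(path);
        if (expected == null) {
            return Optional.empty(); // unknown page: nothing to verify against
        }
        boolean intact = MessageDigest.isEqual(expected, sha1(currentContent));
        return intact ? Optional.of(currentContent) : Optional.empty();
    }

    private static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        IntegrityGate gate = new IntegrityGate();
        byte[] page = "<h1>Welcome</h1>".getBytes(StandardCharsets.UTF_8);
        byte[] defaced = "<h1>Defaced</h1>".getBytes(StandardCharsets.UTF_8);
        gate.register("/index.html", page);

        System.out.println("serve original: " + gate.check("/index.html", page).isPresent());
        System.out.println("serve defaced:  " + gate.check("/index.html", defaced).isPresent());
    }
}
```

In a real deployment the table of trusted hashes would itself need protection, since an attacker who can rewrite both a page and its recorded hash defeats the check.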
This solution is based on a cryptographically strong hash function and reduces the Web server's calculation overhead. We have chosen the Secure Hash Algorithm, SHA-1, developed by the U.S. National Institute of Standards and Technology. It is regarded as one of the strong hash algorithms at the time of this writing [7]. It provides a 160-bit condensed representation of a page's contents, whereas other MD4-family hash functions such as MD5 produce a 128-bit message digest.
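The two digest sizes can be confirmed with the standard `java.security.MessageDigest` API. The class name `DigestSizes` is hypothetical; `getDigestLength()` reports the length in bytes, so the sketch multiplies by eight.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that reports the digest size of a named algorithm in bits.
final class DigestSizes {

    static int bits(String algorithm) {
        try {
            // getDigestLength() returns the digest length in bytes.
            return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SHA-1: " + bits("SHA-1") + " bits");
        System.out.println("MD5:   " + bits("MD5") + " bits");
    }
}
```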
DSSA has been implemented in Java and its operation is shown in Figure 1. The term "agent" is used to define a security checking mechanism that is implemented on the Web server. It performs the following functions:
Conformance to the suggested solution does not assure complete security. Web owners should ensure that overall system security and its implementation provide a level of security appropriate to the function and security policy of their site.
Using a comprehensive set of experiments, we have measured the time required by DSSA to perform its integrity check before the Web server dispatches the requested page to the client.
Results have taken into account the effect of a clogged network and variability of the network bandwidth on time metrics. They also reflect the absence of Web proxies where Web pages are not cached.
Figure 2 shows the actual DSSA processing time, that is, the time taken to calculate the hash value of different-size Web pages at the Web server. The graph represents the time difference between situations where DSSA intercepted and checked the user request and situations where no checking was done on the same requested page. For example, for a 20KB Web page, DSSA ran for about 11 milliseconds to complete the verification process.
We also found that for HTML pages less than 5,000 bytes in size, there are no significant differences between the presence and absence of the DSSA application because only a small amount of time is spent on calculations (nearly 1 millisecond). However, there are significant differences when the size of Web pages is greater than 5KB. We also found a linear relationship between the processing time and the file size.
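A measurement of this kind can be approximated with a small timing harness. The sketch below is an assumption about how such an experiment might be set up, not the authors' test code: the class `HashTiming`, the random page contents, and the use of a median over repeated runs are all illustrative choices. It times only the SHA-1 computation, so the absolute figures on current hardware will not be comparable with those reported for the 300MHz test machine.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Random;

// Hypothetical harness: times SHA-1 over synthetic pages of increasing size.
final class HashTiming {

    /** Returns the median time, in nanoseconds, to hash a page of the given size. */
    static long medianNanos(int sizeBytes, int runs) {
        byte[] page = new byte[sizeBytes];
        new Random(42).nextBytes(page); // synthetic page contents
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            md.digest(page); // digest() also resets the instance for the next run
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Page sizes chosen to span the range discussed in the article.
        for (int kb : new int[] {1, 5, 10, 20, 35}) {
            long nanos = medianNanos(kb * 1024, 101);
            System.out.println(kb + "KB: " + nanos + " ns");
        }
    }
}
```

Because SHA-1 processes its input in fixed-size blocks, the cost grows in proportion to the number of bytes hashed, which is consistent with the linear relationship reported above.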
Because the results depend mainly on the operating system, CPU speed, Web server configuration, and the site's setup, it is difficult to suggest a generalized relationship between time and HTML file size that applies to every situation (we used a 300MHz computer on the NT 4.0 platform with an Internet Information Web server).
Therefore, our findings on time metrics provide a guide to adopting this technique to Web sites with different configurations.
When the integrity check on the server side is completed and authenticity of the computed hash is successfully determined, the Web server begins responding to the user's request.
Figure 3 shows the cumulative time (DSSA processing time plus time spent to transmit and display a page on a user's machine) for different-size Web pages. The same graph also shows file transmission in the absence of DSSA overhead. The results show three distinct clusters of data. The first cluster is associated with files less than 10KB in size, which are either text pages or pages with simple images and links. Here, there is an almost linear relationship between the file size and the latency (the time before a page is displayed on a user's machine). There are no significant differences between the absence and presence of the DSSA.
Because of Internet transmission, the measured latency (Figure 3) is significantly higher when compared with the actual time spent on the integrity check on the Web server (Figure 2).
For example, for a 33KB-size Web page, loading time plus integrity checking time is around 118 to 120 milliseconds, while for the same file it took an average of 8 to 9 times longer to reach the user. This is mainly due to the effect of the Internet as being an interconnected packet-switching network of computers.
The second cluster, which shows a jump in measured time, refers to a moderately sized group of Web pages containing medium-size objects such as images. The jump is caused by bandwidth limitations when transferring more complex data across the Internet. This cluster shows a moderate sawtooth effect when files larger than 10KB and smaller than 25KB are loaded and checked.
The third cluster confirms the effect of larger page sizes on Internet transmission time due to more complicated data (large compressed image files, executable applets, to name two).
In this group (file sizes larger than 25KB and up to 35KB) there is a pronounced sawtooth effect on the time to send the files to the client. However, for a 35KB Web page we found the latency to be not more than one second, taking into account parameters contributing to the measurement previously described.
Figure 3 also shows no significant difference between applying DSSA and the normal transfer of the files by our test Web server. We conclude that, in this situation, the main processing task completes quickly at the Web server, and its effect becomes negligible once a Web page is transmitted to clients over the Internet.
Our experiments also show the latency over the Internet is much more noticeable when a user tries to access a Web site, running with or without DSSA, using a modem.
Web proxies normally try to cache Web pages so they can respond to HTTP requests from clients on behalf of the Web server. Compared with a direct response from the Web server to user requests, proxies help reduce latency by 26% [2], depending on the technology, including the client's Internet access. Despite this benefit, centralized Web-page caches could also be targeted by hackers if not securely protected [3].
Conventionally, Web page signing has been one solution for protecting pages and detecting modifications. DSSA can also be used at proxies to prevent alteration of cached files and to speed up subsequent operations. But if users refresh the page in their browsers, a new copy of the page is fetched from the Web server and an integrity test is performed on the Web server.
We have suggested a server-based Web agent that verifies the integrity of Web pages to guard against malicious attack, and described the results of its implementation. We have found that a Web server's CPU overhead due to the application of the hash algorithm is negligible compared to transmission delays. The experimental findings on time metrics can be used as a guide for adopting this technique on Web sites with different configurations.
The proposed server-side integrity check protects the interests of Web server providers and clients. Together with integrity verification on the client side, it provides confidence to users where security-sensitive online transactions are concerned. At this stage we are trying to extend our solution to include verification of dynamic content in the absence of a secure communication line between the Web server and the clients.
1. Harmelen, F.V. and Meer, J. WebMaster: Knowledge-based verification of Web-pages. In Proceedings of Practical Applications of Knowledge Management, PAKeM'99. The Practical Applications Company, London, pp. 147–166.
2. Kroeger, T.M., Long, D. and Mogul, J.C. Exploring the bounds of Web latency reduction from caching and prefetching. University of California at Santa Cruz, 1997; www.cse.ucsc.edu/~tmk/publications/ideal/.
3. Peacock, I. and Powell, A. BIBLINK.Checksum: An MD5 message digest for Web pages. Ariadne, 1998; www.ariadne.ac.uk/issue17/biblink.
4. Rivest, R. and Shamir, A. Payword and micromint: Two simple micropayment schemes. M. Lomas, Ed. In Proceedings of 1996 International Workshop on Security Protocols, Springer, 1997, pp. 69–87.
5. Sedaghat, S. Designing Electronic Forms in Web Applications: Integration of Form Components. Springer Verlag, Vol. 1987 (2001).
6. Sedaghat, S. and Pieprzyk, J. Secure online data acquisition systems in a Web-based environment. In Proceedings of the International Workshop on Cooperative Internet Computing. Hong Kong Polytechnic University, Hong Kong, 2000, pp. 37–42.
7. U.S. National Institute of Standards and Technology. Secure Hash Standard, SHA-1. U.S. Federal Information Processing Standards (FIPS) Publication 180-1, 1995; www.itl.nist.gov/fipspubs/fip180-1.htm.
Figure 1. Operation flow of the Web content integrity verifier, DSSA.
Figure 2. Time to check a Web page for its integrity by DSSA at the Web server.
Figure 3. Impact of response time (cumulative time = testing time + the time to display a page on a browser).
©2002 ACM 0002-0782/02/1100 $5.00