
Communications of the ACM

Computer-supported cooperative work in design

On-the-Fly Web Content Integrity Check Boosts Users' Confidence


Web servers are accessible by anyone who can access the Internet. Although this universal accessibility is attractive for all kinds of Web-based applications, Web servers are exposed to attackers who may want to alter their contents. Alterations range from humorous additions or changes, which are typically easy to spot, to more sinister tampering, such as providing false or damaging information.

To strengthen the security of Web systems, we have developed and implemented a server-side Web content integrity verifying agent called the Dynamic Security Surveillance Agent (DSSA). It automatically intercepts a user's request and checks the integrity of the requested page on-the-fly before the Web server responds to the client. This ensures that users receive the information and services as the provider intended, and it protects organizations' reputations from the damage that illegally altered material displayed in browsers could otherwise cause.

Although in this scenario the organizations running Web sites are the primary parties interested in the integrity of their sites' contents, users who want further confidence can apply client-side verification techniques to check the integrity of the pages displayed in their browsers before conducting security-sensitive online transactions.

However, when Web sites are configured for secure communication between clients and servers, the need for a client-side integrity check is considerably reduced. A secure communication channel can be established using Secure Socket Layer Protocol (SSLP) or Secure Hypertext Transfer Protocol (SHTTP).

DSSA is different from systems whose integrity check is based on digital signatures. These are mainly client-based systems, which use digital certificates and secret keys as part of the client's trusted software. For instance, micropayment systems, which are used for small purchases over the Internet [4], rely on their clients to verify the integrity of electronically transmitted material. Therefore, if a Web page were altered at the Web server, the user would still see the modified content, or would notice the change only after the event, when the authenticity checks were performed. At this stage the damage is already done.

Digital signatures can potentially provide security by enabling a client to verify that a document originates from its alleged owner. However, a digital signature can be circumvented by an attacker who replaces the current page with an original but out-of-date page. This modification will not be detected unless the Web server produces a time-stamped signature, which puts extra computational load on the server.

DSSA applies a collision-free hash function to verify the integrity of a Web page at the server. A hash function is preferred to digital signatures because computing a hash is roughly 10,000 times faster than RSA signature generation and 100 times faster than RSA signature verification [4].

A hash function, H, takes the contents of a variable-size Web page and returns a fixed-size string called a hash value, h. The value h is similar to a human fingerprint and distinguishes an original Web page from a counterfeit. Comparing the h value of a Web page with its previously calculated true value shows whether the page is intact. Hash functions are called collision-free because it is computationally infeasible to find two different Web pages that produce the same hash value.
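The comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, not DSSA's Java code: it assumes SHA-1 (the algorithm the article selects later), and the function names `page_hash` and `is_intact` and the sample pages are hypothetical.

```python
import hashlib
import hmac


def page_hash(content: bytes) -> str:
    """Return the hash value h of a page's raw bytes as hex digits."""
    return hashlib.sha1(content).hexdigest()


def is_intact(content: bytes, trusted_hash: str) -> bool:
    """Compare the freshly computed h with the previously stored true value."""
    return hmac.compare_digest(page_hash(content), trusted_hash)


# Hypothetical sample pages: the second differs from the first by one character.
original = b"<html><body>Account balance: 100</body></html>"
tampered = b"<html><body>Account balance: 900</body></html>"

trusted = page_hash(original)          # computed once, when the page is published
print(is_intact(original, trusted))    # unchanged page
print(is_intact(tampered, trusted))    # altered page
```

Even a one-character change yields a different hash value, so the altered page fails the comparison.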

The application of fingerprinting techniques, such as hash algorithms, to detect illegal modifications of files is not new. Such techniques have long been used in traditional computing systems such as Unix.

However, Web systems are distinctly different from traditional distributed computing systems. For example, Web pages are document files that exhibit a rich file structure as far as both their contents and relations are concerned. In addition to the HTML source code, the contents of a typical Web page could include a blend of files such as text, graphics, images, animations, pictures, and video clips, as well as components including frames, cascading style sheets (CSS), executable applets, and program instructions.

Therefore, adapting known technologies to the Web environment requires assessing their suitability for tasks such as server-side integrity checks.

A collection of Web pages held by a Web server normally creates a tree-like structure; integrity checks should also involve their links.

Verifying the integrity of Web content is not confined to the HTML source code and accompanying text, in whole or in part: embedded objects and Web components must also remain free from unauthorized alteration before transmission to Web users. Our proposed solution checks every element of the page.


Types of Content Integrity

Content integrity has been perceived differently by different groups of users and Web developers as follows:

  • Integrity in terms of Web page development. This refers to checking the integrity and workability of links, existence of images and other files used in the Web page, as well as the syntactic validity, correct use of the HTML language, associated conventions, and Web authoring techniques.
  • Integrity in terms of data validity and security. This refers to collecting valid data securely over the Internet using Web forms [5, 6] or integrity verification of Web content based on semantic content of the site [1].
  • Integrity in terms of content authenticity. This is to ensure no tampering is done to the document once it is created [3].


Integrity Check Methods

The following methods can be used for integrity verification:

  • Each page is authenticated (signed) using a digital signature as a separate document. As discussed earlier, this is computationally demanding. One can sign two messages per second or verify 200 signatures per second [4]. This is not suitable for the type of work we propose on the server side.
  • A sequence of pages is traversed and signed as a single authenticated path. This is more efficient than the previous method, but verification is done only at the end of the path. This is time-consuming and inappropriate for our purposes.
  • The site pages have a tree structure. In this case it is possible to check the system integrity in a more flexible way by the application of digital signatures. This is also computationally expensive.

We propose the following server-side method:

  • A hash function is applied on-the-fly to the full content of a requested Web page, including all of its embedded objects. If the calculated hash value matches the previously calculated and securely stored true value, the page is authentic and the Web server responds to the client. Otherwise, DSSA takes corrective measures in response to the breach in page security.
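One way to hash "the full content" of a page, including its embedded objects, is sketched below. The article does not say how DSSA combines the parts, so the scheme here (feeding each part in sorted name order, prefixed with its name and length) is an assumption, and `full_page_hash` and the sample data are hypothetical.

```python
import hashlib
from typing import Mapping


def full_page_hash(html: bytes, embedded: Mapping[str, bytes]) -> str:
    """Hash the HTML source together with every embedded object.

    Parts are fed in sorted name order, each prefixed with its name and
    length, so the result does not depend on the order in which the
    objects were collected.
    """
    h = hashlib.sha1()
    parts = [("", html)] + sorted(embedded.items())
    for name, data in parts:
        h.update(f"{name}:{len(data)}:".encode("utf-8"))
        h.update(data)
    return h.hexdigest()


# Hypothetical page with two embedded objects.
html = b'<html><img src="logo.gif"><link href="site.css"></html>'
objects = {"logo.gif": b"GIF89a-sample", "site.css": b"body { color: black }"}
stored = full_page_hash(html, objects)

# Altering only an embedded object is enough to change the hash value.
altered = {**objects, "logo.gif": b"GIF89a-defaced"}
print(full_page_hash(html, objects) == stored)
print(full_page_hash(html, altered) == stored)
```

With this approach, a defaced image is caught even when the HTML source itself is untouched.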


Web Content Integrity Verifier

The DSSA Web content integrity verifier continuously runs on the Web server and checks the authenticity of the Web pages requested by the users. If it detects any alterations, it signals the Web server to block their transmission and take preventive action.

This solution is based on a cryptographically strong hash function and keeps the Web server's calculation overhead low. We have chosen the Secure Hash Algorithm, SHA-1, developed by the U.S. National Institute of Standards and Technology. It is among the strong hash algorithms available at the time of this writing [7]. It provides a 160-bit condensed representation of a page's contents, compared to other MD4-family hash functions such as MD5, which produces a 128-bit message digest.
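The digest lengths quoted above can be confirmed with Python's standard hashlib module. The sample page bytes are made up for illustration, and the snippet is independent of DSSA's own implementation.

```python
import hashlib

# Hypothetical page content; any byte string gives the same digest lengths.
page = b"<html><body>Hello</body></html>"

for name in ("md5", "sha1"):
    h = hashlib.new(name, page)
    print(name, h.digest_size * 8, "bits:", h.hexdigest())
```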

DSSA has been implemented in Java and its operation is shown in Figure 1. The term "agent" is used to define a security checking mechanism that is implemented on the Web server. It performs the following functions:

  • Dynamically listens to the Web communication channel between the Web server and Web browsers and intercepts clients' requests for a Web page.
  • Calculates the hash value of the requested page and compares it with the previously calculated hash value stored in a secure database on another write-protected local server. Storing hash values on a different server provides better security, particularly when the communication channel between that server and the Web server is also secured by SSLP.
  • If a Web page references documents located on servers other than the site's server (external), there are two possibilities for checking the page integrity. If all externally referenced documents are available, DSSA checks their integrity as well. Otherwise, it compares the final hash value with another previously stored value for the same page without those links.
  • Based on the comparison results, the Web server takes appropriate action. If the hash values are the same, DSSA sets the Web server's flag to send the requested page to the user. If the test fails, DSSA sends an email message to alert the site's administrator of the situation. At the same time, DSSA tries to fetch the requested page from a secure back-up server and sends it to the user. If there is any problem with the back-up server or the integrity of its copy of the page, DSSA transmits a message to the client stating that the site is temporarily unavailable. In this situation, it automatically rejects any further communication from clients to the Web server until the problem is resolved.
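The decision flow in the last step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' Java agent: `respond`, `fetch_backup`, and `alert_admin` are hypothetical names, the email alert is replaced by a callback, and the handling of externally referenced documents is left out.

```python
import hashlib
from typing import Callable, Optional, Tuple

UNAVAILABLE = b"The site is temporarily unavailable."


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def respond(
    page: bytes,
    trusted_hash: str,
    fetch_backup: Callable[[], Optional[bytes]],
    alert_admin: Callable[[str], None],
) -> Tuple[bytes, bool]:
    """Return (body to send, whether the server keeps accepting requests)."""
    if sha1_hex(page) == trusted_hash:
        return page, True                      # hashes match: send the page
    alert_admin("integrity check failed")      # stands in for the email alert
    backup = fetch_backup()
    if backup is not None and sha1_hex(backup) == trusted_hash:
        return backup, True                    # serve the verified back-up copy
    return UNAVAILABLE, False                  # reject further requests


# Hypothetical scenario: the live page was defaced, the back-up copy is intact.
good = b"<html>prices</html>"
trusted = sha1_hex(good)
alerts = []
body, accepting = respond(b"<html>defaced</html>", trusted, lambda: good, alerts.append)
print(body == good, accepting, alerts)
```

Passing the back-up fetch and the alert as callbacks keeps the check itself free of network and mail code, which makes the three outcomes easy to exercise in isolation.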

Conformance to the suggested solution does not assure complete security. Web site owners should ensure that overall system security and its implementation provide a level of security appropriate to the function and security policy of their site.


Experimental Results, Time Metrics

Using a comprehensive set of experiments, we have measured the time required by DSSA to perform its integrity check before the Web server dispatches the requested page to the client.

The results take into account the effect of a clogged network and the variability of network bandwidth on the time metrics. They also reflect a setup without Web proxies, in which Web pages are not cached.

Figure 2 shows the actual DSSA processing time, that is, the time taken to calculate the hash value of different-size Web pages at the Web server. The graph plots the time difference between handling a request with DSSA intercepting and checking it and handling the same request with no checking. For example, for a 20KB Web page, DSSA ran for about 11 milliseconds to complete the verification process.

We also found that for HTML pages less than 5,000 bytes in size, there are no significant differences between the presence and absence of DSSA, because only a small amount of time is spent on calculations (nearly 1 millisecond). However, there are significant differences when the size of Web pages is greater than 5KB. We also found a linear relationship between the processing time and the file size.

Because the results depend mainly on the operating system, CPU speed, Web server configuration, and the site's setup, it is difficult to suggest a general relationship between processing time and HTML file size that applies to every situation (we used a 300MHz computer on an NT 4.0 platform and an Internet Information Web server).

Therefore, our findings on time metrics serve as a guide for adopting this technique on Web sites with different configurations.
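Readers who want comparable figures for their own configuration could time the hash calculation directly, as in the sketch below. It measures only the SHA-1 computation on in-memory data, not request interception or file access, so the numbers are not directly comparable to Figure 2; the function name `time_sha1` and the chosen page sizes are assumptions.

```python
import hashlib
import time


def time_sha1(size_bytes: int, repeats: int = 200) -> float:
    """Return the mean time in milliseconds to hash size_bytes of data."""
    data = b"x" * size_bytes
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.sha1(data).digest()
    return (time.perf_counter() - start) * 1000 / repeats


# Page sizes roughly spanning the range discussed in the article.
for kb in (1, 5, 10, 20, 35):
    print(f"{kb:>2} KB: {time_sha1(kb * 1024):.4f} ms")
```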

When the integrity check on the server side is completed and authenticity of the computed hash is successfully determined, the Web server begins responding to the user's request.

Figure 3 shows the cumulative time (DSSA processing time plus the time spent transmitting and displaying a page on a user's machine) for different-size Web pages. The same graph also shows file transmission in the absence of DSSA overhead. The results show three distinct clusters of data. The first cluster is associated with files less than 10KB in size, which are either text pages or pages with simple images and links. Here, there is an almost linear relationship between the file size and the latency (the time before a page is displayed on a user's machine). There are no significant differences between the absence and presence of DSSA.

Because of Internet transmission, the measured latency (Figure 3) is significantly higher when compared with the actual time spent on the integrity check on the Web server (Figure 2).

For example, for a 33KB Web page, loading time plus integrity checking time is around 118 to 120 milliseconds, while the same file took on average 8 to 9 times longer to reach the user. This is mainly because the Internet is an interconnected packet-switching network of computers.

The second cluster, which shows a jump in measured time, corresponds to moderately sized Web pages containing medium-size objects such as images. The jump is caused by bandwidth limitations when transferring more complex data across the Internet. This cluster shows a moderate sawtooth effect when files larger than 10KB and smaller than 25KB are loaded and checked.

The third cluster confirms the effect of larger page sizes on Internet transmission time due to more complicated data (large compressed image files, executable applets, to name two).

In this group (file sizes larger than 25KB and up to 35KB) there is a pronounced sawtooth effect on the time to send the files to the client. However, for a 35KB Web page we found the latency to be not more than one second, taking into account parameters contributing to the measurement previously described.

Figure 3 also shows no significant difference between transfers with DSSA and normal transfers of the files by our test Web server. We conclude that, in this situation, the main processing task completes rapidly at the Web server, and its effect diminishes when a Web page is transmitted to clients over the Internet.

Our experiments also show the latency over the Internet is much more noticeable when a user tries to access a Web site, running with or without DSSA, using a modem.


Proxies and Integrity Checker

Web proxies normally try to cache Web pages so they can respond to HTTP requests from clients on behalf of the Web server. Compared with a direct response from the Web server, proxies help reduce latency by 26% [2], depending on the technology, including the client's Internet access. Despite this benefit, centralized Web-page caches could also be targeted by hackers if not protected securely [3].

Conventionally, Web page signing has been one way to protect pages and detect modifications. DSSA can also be used at proxies to prevent alteration of cached files and to speed up subsequent operations. But if users refresh the page in their browsers, a new copy of the page is fetched from the Web server and an integrity test is performed on the Web server.
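A proxy-side check along these lines might look like the sketch below. The article gives no design for this case, so everything here is an assumption: the `VerifyingCache` class is hypothetical, and the in-memory dictionary stands in for both the cache and a secure store of hash values, which a real deployment would keep apart.

```python
import hashlib
from typing import Callable, Dict, Tuple


class VerifyingCache:
    """Cache that keeps a SHA-1 value per page and re-checks it on every hit."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._entries: Dict[str, Tuple[bytes, str]] = {}

    def get(self, url: str) -> bytes:
        entry = self._entries.get(url)
        if entry is not None:
            body, digest = entry
            if hashlib.sha1(body).hexdigest() == digest:
                return body                  # cached copy is unaltered
        body = self._fetch(url)              # miss or altered copy: ask the origin
        self._entries[url] = (body, hashlib.sha1(body).hexdigest())
        return body


# Hypothetical origin server that records how often it is contacted.
origin_calls = []


def origin(url: str) -> bytes:
    origin_calls.append(url)
    return b"<html>news</html>"


cache = VerifyingCache(origin)
cache.get("/news")
cache.get("/news")                           # second request is served from cache
print(len(origin_calls))
```

An altered cached copy fails the hash comparison and is replaced by a fresh copy from the origin server, where the server-side check applies again.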


Conclusion

We have proposed a server-based Web agent that verifies the integrity of Web pages to guard against malicious attack, and we have described the results of its implementation. We have found that the Web server's CPU overhead due to the hash algorithm is negligible compared to transmission delays. The experimental findings on time metrics can be used as a guide for adopting this technique on Web sites with different configurations.

The proposed server-side integrity check protects the interests of Web server providers and clients. Together with integrity verification on the client side, it gives users confidence where security-sensitive online transactions are concerned. We are currently extending our solution to include verification of dynamic content in the absence of a secure communication line between the Web server and the clients.


References

1. Harmelen, F.V. and Meer, J. WebMaster: Knowledge-based verification of Web-pages. In Proceedings of Practical Applications of Knowledge Management, PAKeM'99. The Practical Applications Company, London, pp 147–166.

2. Kroeger, T.M., Long, D. and Mogul, J.C. Exploring the bounds of Web latency reduction from caching and prefetching. University of California at Santa Cruz, 1997; www.cse.ucsc.edu/~tmk/publications/ideal/.

3. Peacock, I. and Powell, A. BIBLINK.Checksum—an MD5 message digest for Web pages. Ariadne, 1998; www.ariadne.ac.uk/issue17/biblink.

4. Rivest, R. and Shamir, A. Payword and micromint—Two simple micropayment schemes. M. Lomas, Ed. In Proceedings of 1996 International Workshop on Security Protocols, Springer, 1997, pp 69–87.

5. Sedaghat, S. Designing Electronic Forms in Web Applications: Integration of Form Components. Springer Verlag, Vol. 1987 (2001).

6. Sedaghat, S. and Pieprzyk, J. Secure online data acquisition systems in a Web-based environment. In Proceedings of the International Workshop on Cooperative Internet Computing. Hong Kong Polytechnic University, Hong Kong, 2000, pp 37–42.

7. U.S. National Institute of Standards and Technology. Secure Hash Standard, SHA-1. U.S. Federal Information Processing Standards (FIPS) Publication 180-1, 1995; www.itl.nist.gov/fipspubs/fip180-1.htm.


Authors

Soroush Sedaghat is the senior information analyst at the University of Western Sydney in Australia.

Josef Pieprzyk is a professor of computing at Macquarie University in Australia.

Ehsan Vossough is a computer lecturer at the University of Western Sydney in Australia.


Figures

Figure 1. Operation flow of the Web content integrity verifier, DSSA.

Figure 2. Time to check a Web page for its integrity by DSSA at the Web server.

Figure 3. Impact of response time (cumulative time = testing time + the time to display a page on a browser).



©2002 ACM  0002-0782/02/1100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2002 ACM, Inc.


 
