Bistro: a scalable and secure data transfer service for digital government applications
Leana Golubchik, William C. Cheng, Cheng-Fu Chou, Samir Khuller, Hanan Samet, C. Justin Wan
Government at all levels is a major collector and provider of data. Our project focuses on the collection of data over wide-area networks (WANs) and addresses the scalability issues that arise in Internet-based massive data collection applications. Security is also a central issue for data collection applications that use a public infrastructure such as the Internet, because the privacy and integrity of the data must be preserved.
Numerous digital government applications require data collection over WANs [5]. One compelling example is the Internal Revenue Service's electronic submission of income tax forms. Other examples include collecting census data, federal statistics, and surveys; gathering and tallying electronic votes; collecting crime data for the U.S. Justice Department; collecting data from sensors for disaster response applications; collecting data from geological surveys; collecting electronic filings of patents, permits, and securities (for the SEC); submitting grant proposals and contract bids; and so on. All these applications have scalability and security needs in common.
The poor performance that current digital government users may experience with the existing state of technology (as in Figure 1a) is largely due to how independent data transfers using TCP/IP work over the Internet. TCP/IP is good at sharing bandwidth equally among data streams, which in large-scale applications can lead to poor performance for individual clients, since each receives only a very small share of the bandwidth. Given that TCP/IP is here to stay for the foreseeable future, what is needed is a scalable yet cost-effective solution that can be easily deployed over existing Internet technology.
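To make the bandwidth-sharing point concrete, the following is a minimal back-of-the-envelope sketch. It assumes an idealized equal split of a single bottleneck link among all concurrent transfers and ignores real TCP dynamics (slow start, loss, round-trip times). The link speed, form size, and client count are hypothetical numbers chosen only for illustration; they do not come from the project.

```python
def fair_share_bps(bottleneck_bps: float, n_flows: int) -> float:
    """Idealized per-flow bandwidth when n_flows split one bottleneck equally."""
    if n_flows <= 0:
        raise ValueError("n_flows must be positive")
    return bottleneck_bps / n_flows


def upload_seconds(size_bytes: int, bottleneck_bps: float, n_flows: int) -> float:
    """Time for one client to upload size_bytes at its idealized fair share."""
    return size_bytes * 8 / fair_share_bps(bottleneck_bps, n_flows)


# Hypothetical scenario: a 100 Mbit/s server link and a 1 MB submission.
LINK_BPS = 100e6
FORM_BYTES = 1_000_000

alone = upload_seconds(FORM_BYTES, LINK_BPS, n_flows=1)         # one client uploading
crowded = upload_seconds(FORM_BYTES, LINK_BPS, n_flows=10_000)  # deadline rush
print(f"alone: {alone:.2f} s, with 10,000 concurrent clients: {crowded:.0f} s")
```

Under these assumptions the same upload goes from a fraction of a second to many minutes, even though the link itself is shared perfectly fairly.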
We are designing and developing a system called Bistro, which addresses the scalability needs of digital government data collection applications while allowing them to share the same infrastructure and resources efficiently, cost-effectively, and securely [1]. Bistro's basic approach is to introduce intermediate hosts, called bistros, which allow the traditional "synchronized client-push" approach to be replaced with a "nonsynchronized combination of client-push and server-pull" approach (as depicted in Figure 1b). This in turn spreads the workload on the destination server and the network over time, which eliminates hot spots and significantly improves performance for both clients and servers. Our ongoing research [2, 4] indicates that orders-of-magnitude improvements can be achieved with the Bistro architecture and the data collection algorithms it affords. Bistro's design allows for gradual deployment and experimentation over the Internet (by simply downloading the Bistro server software and installing it on public servers).
Bistro's security protocol and trust structure [3] are designed such that only encrypted data travels through (not necessarily trusted) bistros. This means a government agency does not need to trust bistros installed by other agencies or commercial institutions. At the same time, these untrusted bistros can significantly improve the agency's data collection performance. Each application (within each agency) can have its own scalability, security, fault tolerance, and other data collection needs, and these applications and agencies can still share available resources, if so desired, across all Bistro servers. We believe an appropriately designed single infrastructure such as Bistro can address all digital government wide-area data collection needs in a scalable, secure, and cost-effective manner. (For more information, see bourbon.usc.edu/iml/bistro/.)
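The client-push and server-pull flow through an untrusted intermediate host can be sketched roughly as follows. This is an illustrative toy model, not the protocol described in [3]: the class names, method names, and the idea of registering a SHA-256 digest with the destination are all assumptions made for this example, and the payload is a placeholder standing in for data the client has already encrypted.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Bistro:
    """Hypothetical untrusted intermediate host: it only stores opaque bytes."""
    store: Dict[str, bytes] = field(default_factory=dict)

    def push(self, client_id: str, ciphertext: bytes) -> None:
        # Client-push: the client uploads whenever it is ready.
        self.store[client_id] = ciphertext

    def pull(self, client_id: str) -> bytes:
        # Server-pull: the destination fetches the blob at a time of its choosing.
        return self.store.pop(client_id)


@dataclass
class Destination:
    """Hypothetical destination server that checks what it pulls (an assumption)."""
    commitments: Dict[str, str] = field(default_factory=dict)
    received: Dict[str, bytes] = field(default_factory=dict)

    def register(self, client_id: str, digest_hex: str) -> None:
        # A small message sent directly by the client; the bulk data goes elsewhere.
        self.commitments[client_id] = digest_hex

    def collect(self, bistro: Bistro, client_id: str) -> bool:
        blob = bistro.pull(client_id)
        if hashlib.sha256(blob).hexdigest() != self.commitments.get(client_id):
            return False  # blob was altered or does not match what was registered
        self.received[client_id] = blob
        return True


bistro, dest = Bistro(), Destination()
blob = b"placeholder for already-encrypted form data"
dest.register("client-1", hashlib.sha256(blob).hexdigest())
bistro.push("client-1", blob)           # happens before the deadline
ok = dest.collect(bistro, "client-1")   # happens later, when the server has capacity
print("collected intact:", ok)
```

The point of the sketch is the decoupling: the bulk upload lands on the bistro at the client's pace, the destination retrieves it on its own schedule, and the bistro never needs to interpret the bytes it holds.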
1. Bhattacharjee, S., Cheng, W.C., Chou, C.-F., Golubchik, L., and Khuller, S. Bistro: A platform for building scalable wide-area upload applications. ACM SIGMETRICS Performance Evaluation Review 28, 2 (Sept. 2000), 29–35. (Also presented at the Workshop on Performance and Architecture of Web Servers, June 2000.)
2. Cheng, W.C., Chou, C.-F., and Golubchik, L. Performance of online batch-based digital signatures. Submitted for publication.
3. Cheng, W.C., Chou, C.-F., Golubchik, L., and Khuller, S. A secure and scalable wide-area upload service. In Proceedings of the 2nd International Conference on Internet Computing 2 (June 2001), 733–739.
4. Cheng, W.C., Chou, C.-F., Golubchik, L., Khuller, S., and Wan, Y.C. On a graph-theoretic approach to scheduling large-scale data transfers. Submitted for publication.
5. Cheng, W.C., Chou, C.-F., Golubchik, L., Khuller, S., and Samet, H. Scalable data collection for Internet-based digital government applications. In Proceedings of the 1st National Conference on Digital Government Research (Los Angeles, CA, May 2001), 108–113.
©2003 ACM 0002-0782/03/0100 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.