The term peer-to-peer (P2P) refers to distributed systems without any central control, where all the nodes (called peers) are equivalent in functionality. In a P2P system, peers can collaborate and communicate with each other without the need for centralized components. P2P systems organize the peer computers in a virtual communication network called an overlay network. Overlaynetworks generally have self-organizing characteristics, which means they are established and maintained by P2P software without any human intervention and that P2P software manages events such as peers joining and leaving the network. This self-organizing nature of P2P helps reduce the management cost of the computer infrastructure. However, the decentralized nature of P2P networks makes it difficult to develop efficient algorithms for tasks such as clustering and search, which are required by many P2P applications.
The topology of the overlay network is a graph whose vertices are the peers in the network, and whose edges are the connections between the peers. In this article, the term topology adaptation refers to adjusting an overlay network topology to satisfy certain criteria when peers leave or join the network. This article describes an abstract algorithm based on a sociological model, which can be used to create a family of topology adaptation algorithms. We present one such topology adaptation algorithm that can be executed by the peers to create a network of hubs (super-peers) in a P2P overlay network.
On the basis of the overlay network architecture, P2P applications can be divided into three major categories: centralized, decentralized structured, and decentralized unstructured (the latter is also called pure P2P) [3]. The algorithm presented here is only useful for pure P2P networks. In a decentralized structured network, the location of a peer is determined by the key space for which it is responsible, thus making topology adaptation difficult.
In existing P2P overlay networks (such as Gnutella; www.gnutella.com), there is typically no control over the type of peers that are connected as neighbors. This leads to a suboptimal grouping of peers. For example, in a file-sharing application, peers with high bandwidth capacity may be grouped together with peers with low bandwidth capacity, which may lead to degradation in performance. P2P applications can benefit by connecting peers on the basis of their characteristics to use the capabilities of all the peers more efficiently. For example, in file-sharing applications, it is beneficial if peers with similar properties (bandwidth, geographic location, or contents) are connected to each other.
In 1960, Thomas Schelling, an economist, proposed a model to explain the existence of segregated neighborhoods in the U.S. [5, 6]. He observed that the appearance of such segregated neighborhoods is caused neither by a central authority, nor by the desire of people to stay away from dissimilar people; instead, it is the cumulative effect of simple actions (moves) by individuals who want at least a certain proportion of their neighbors to be similar to themselves.
Schelling's model is decentralized and self-maintaining in nature, making it attractive for topology adaptation in dynamic environments such as pure P2P networks, which lack a central authority. Here, we present an abstract algorithm (called abstract Schelling algorithm) based on Schelling's model, which can be used to create a family of topology adaptation algorithms for P2P networks. The algorithms can be executed by the peers to create a P2P topology satisfying certain criteria. The topologies developed using this approach can adapt to the continuous arrival of new peers in the network. This article describes one such algorithm, which can be executed by the peers to create a network of hubs within a decentralized unstructured network.
In Schelling's model, the world is modeled as a m x n grid. Approximately two-thirds of the cells in the grid are populated by blue or red turtles. The remaining cells are empty. Each cell can host a maximum of one turtle. In the beginning, a random number of blue and red turtles are randomly distributed on the grid. All the turtles desire at least a certain percentage of their neighbors to be of the same color. If a turtle is not satisfied with its neighbors, it moves to an adjacent empty cell (if available) chosen randomly. The simulation goes on until all the turtles are satisfied with their neighbors. As the simulation progresses, segregation can be observed on the grid. Such segregation is an emergent behavior caused by the desire of the turtles to ensure a certain minimum percentage of their neighbors are the same color as themselves. In Schelling's model, the turtles act using their awareness of the local network topology. This model is interesting for P2P systems because the peers lack a global picture of the network topology. In the model, grouping is maintained even when turtles join or leave the system, which makes it attractive for the dynamic environments of P2P networks.
We used the Template Method [2] design pattern to create an abstract Schelling algorithm. In this design pattern, the skeleton of an algorithm is defined in an operation, delegating the steps that may change to a subclass. The subclasses implement the steps that vary. The abstract Schelling algorithm makes it possible to create a family of topology adaptation algorithms that share the same structure. In Schelling's model, the steps that may vary are the satisfaction criteria, the actions to be performed if a peer is not satisfied, and the frequency at which the satisfaction state should be checked.
In the abstract Schelling algorithm, a peer calculates its satisfaction state at predefined intervals. Satisfaction state is a Boolean value indicating whether a peer is satisfied with its local view of the overlay network topology. If a peer is not satisfied with its neighbors, it performs topology adaptation steps. The satisfaction criteria, the topology adaptation steps, and the time delay after which a peer calculates its satisfaction state, will vary with the application and the topology desired. Table 1 shows some operations that can be used to describe satisfaction criteria and topology adaptation steps. Table 2 presents an example of satisfaction criteria and topology adaptation steps. We used them to create a topology adaptation algorithm that can cluster peers with similar bandwidths in a pure P2P network to utilize the bandwidth available in the network efficiently [7]. By using the criteria shown in Table 2 and topology adaptation steps in which peers drop a dissimilar neighbor selfishly, another topology adaptation algorithm can be created to cluster peers.
Existing pure P2P overlay networks tend to be inefficient when it comes to performing a search on the network. Pure P2P systems (such as Gnutella) flood the network with query messages. Considerable research has been done to improve the efficiency of search in a pure P2P network. Two prominent approaches are currently employed: use random routes to propagate the search query [1] or use super-peers that maintain a directory of resources in the network to resolve a query. In super-peer systems (such as Kazaa; www.kazaa.com), a client sends search requests to its super-peer, which replies to the query. This reduces the need for expensive broadcasts on the overlay network. In a super-peer topology, ordinary peers are connected to only one super-peer and are not connected to each other. The super-peers are in turn connected to each other to form a pure P2P system.
In a super-peer system, the failure of a super-peer can have catastrophic consequences, causing complete communication failure for the cluster of nodes attached to that particular super-peer. To resolve this problem, a variation (see Figure 1) of the super-peer topology can be used in which the ordinary peers are connected to each other and can be connected to more than one super-peer (called a hub). In this topology (called the hub-based topology in this article), the failure of a single hub is not catastrophic because the connections between the peers can be used for communication in case of hub failure. This topology is similar to the topology used in JXTA [8]. In JXTA, each super-peer maintains a list of all the other super-peers of which it is aware. At regular time intervals, the super-peer passes this list to all other known super-peers. Over time, JXTA expects each super-peer will discover all the other super-peers in the network.
In a hub-based topology, the hubs are connected to each other to form a network. Hubs are high-availability and high-capacity peers. They are used in the overlay network to perform resource-intensive tasks such as maintaining a directory of resources in the network, which can be used to process search requests. A peer examines its capacity to decide whether it will act as a peer or a hub. Here, we present an algorithm (the hub algorithm, based on the abstract Schelling algorithm) that can be used to create an adaptive network of hubs within a pure P2P network. The hub algorithm uses the topology adaptation steps and the satisfaction criteria shown in Table 3.
We have performed simulations to study the effect of applying the hub algorithm to a static overlay network and a dynamic overlay network with a constant inflow of peers. All the peers are within the same process in the simulator. All the simulations were performed on networks with small-world characteristics, which is typical of P2P overlay networks. In the networks 90% of the peers (chosen randomly) are ordinary peers and 10% are hubs. The maximum number of connections is 5 for an ordinary peer and 20 for a hub. The percentage of hubs and the maximum number of connections are chosen so the connection slots available with the hubs are more than double the number of peers. Each peer is initially connected to three randomly chosen peers from the overlay network. By design, the generated random topology is guaranteed to be connected. The search operation is performed using a Depth First Search [4] on the overlay network. Our first set of simulations was performed on a static random network created from scratch. Hmax is defined as the maximum number of hubs desired as neighbors by a hub. The simulations were done on four different random networks (created by using different seed values for the random network generator on Linux) of 100 to 1,000 peers, each using Hmax values from 1 to 10. The simulations go on until all the peers are satisfied or the simulator reaches 1,000 iterations. For a random network of 100 peers, less than 400 messages are exchanged for the hub algorithm to converge. For a random network of 1,000 peers, less than 10,000 messages are exchanged for the hub algorithm to converge.
A critical value of Hmax (called HmaxCritical) was observed below which all the peers were not satisfied, even after 1,000 iterations. The value of HmaxCritical is different for different random networks. A typical value of HmaxCritical observed in the simulations is 5. When Hmax is below HmaxCritical, the simulations do not converge, because some of the hubs are unable to find another hub to establish a connection and therefore remain unsatisfied. When Hmax is greater than or equal to HmaxCritical, all the peers are satisfied and the simulations converge within five iterations. The simulations converge when all the peers are satisfied, which means each hub in the overlay network is connected to at least one other hub and at most to Hmax hubs, and all the ordinary peers are connected to at least one hub.
The second set of simulations was performed on a random network where new peers joined the system at every iteration of the simulator, to demonstrate that the approach can be used in dynamic environments. The simulations start with a small random network of 100 peers, and five new peers are added at every iteration until the number of nodes reaches 5,000. The simulations were made using Hmax=5 because this is a typical value for HmaxCritical. The simulations go on until there are 5,000 peers in the overlay network and all the peers are satisfied, or until the simulator reaches 1,500 iterations.
We use four metrics to study the topology of the overlay network: hub-hub degree, hub-peer degree, peer-peer degree, and peer-hub degree. The hub-hub degree is defined as the total number of hub-to-hub connections in the overlay network divided by the total number of hubs. The hub-peer degree is defined as the total number of hub-to-peer connections in the overlay network divided by the total number of hubs. The peer-hub degree is defined as the total number of hub-to-peer connections in the overlay network divided by the total number of peers. The peer-peer degree is defined as the total number of peer-to-peer connections in the overlay network divided by the total number of peers.
Figure 2 shows plots of these four metrics against time (simulator iterations) for simulations performed on one of the random networks using the hub algorithm. Throughout the simulations, each peer is on average connected to approximately two hubs (peer-hub degree of 2). When the simulations start, each hub is on average connected to approximately two other hubs (hub-hub degree of 2). However, within 100 iterations, the hub-hub degree reaches 3 and stays at this value throughout the simulations. In an exactly similar simulation where the only change was that we did not use the hub algorithm, the hub-hub degree remained constant at about 2.
Figure 3 shows a plot of the number of messages exchanged to perform topology adaptation against time (simulator iterations) for the random network of Figure 2. The total number of messages (120,000) may seem to be on the high side. Whether this cost is acceptable depends on the improvement in performance obtained via topology adaptation during the operation of the network. For long-lived overlay networks with significant communication, we expect topology adaptation to be worth the cost in most cases; but for short-lived networks with little communication, it is probably not. Even so, there is still room for optimizing the scheme presented here. For example, in our simulations, the search operation was implemented without any caching. Caching could be used to improve the efficiency of the search operation, thereby reducing the total number of messages exchanged.
In a hub network, the hubs are expected to be connected to each other to form a connected network. Our simulations show the hub algorithm presented here does not ensure the hub graph (the graph whose vertices are the hubs of the overlay network) remains unpartitioned. The hub network is normally used by the peers to search for resources in the overlay network, using the directory maintained for this hub network. In a partitioned hub network, the directory of each hub network partition is incomplete; searching within this incomplete directory yields unreliable results. The hub network may also be used to locate other peers in the overlay network. In a partitioned hub network, the peers connected to a hub in one partition will not be able to use the hub network to locate the peers connected to hubs in other partitions. However, they may use their connections to the ordinary peers to try to locate other peers in the overlay network.
The hub network is unpartitioned in only about 50% of the simulations, a percentage that may appear to be small. However, in all our simulations, the vast majority (more than 99%) of the hubs were members of the single largest graph component (portion of a graph after partitioning). In addition, the peer-hub degree is 1.8, which means many peers are connected to more than one hub. This reduces the adverse impact of the hub network partitioning as some of the peers affected by the partitioning are connected to more than one hub network partition. While there is room for improvement, partitioning affects only a small number of hubs in most cases.
The hub algorithm presented here is useful for adapting the network topology in a pure P2P network, as demonstrated by an adaptive network of hubs created within a pure P2P network using Schelling's model. In the future, we wish to dynamically determine the value of the satisfaction criterion that a peer can use, instead of using simulations. The hub algorithm must also be improved to prevent the partitioning of the network of hubs.
1. Adamic, L.A., Lukose, R.M., Puniyani, A.R., and Huberman, B.A. Search in power-law networks. Physical Review E 64, 4 (Oct. 2001).
2. Gamma, E., Helm, R., Johnson, R., and Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
3. Milojicic, D.S., Kalogeraki, V., Lukose, R. et al. Peer-to-peer computing. HP Labs Tech. Rep. HPL-2002-57R1, 2002.
4. Pemmaraju, S. and Skiena, S. Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica. Cambridge University Press, 2003, 280282.
5. Schelling, T.C. Dynamic models of segregation. Journal of Mathematical Sociology 1, 2 (1971), 143186.
6. Schelling, T.C. Micromotives and Macrobehaviour. Norton, 1978.
7. Singh, A. and Haahr, M. Topology adaptation in P2P networks using Schelling's Model. In Proceedings of the Workshop on Games and Emergent Behaviors in Distributed Computing Environments (Birmingham, U.K., Sept. 2004).
8. Traversat, B., Arora, A., Abdelaziz, M. et al. Project JXTA 2.0 Super-Peer Virtual Network; www.jxta.org/project/www/docs/JXTA2.0protocols1.pdf.
This work was funded by the Irish Research Council for Science, Engineering and Technology (IRCSET) under the Basic Research Grants Scheme.
Figure 1. In a hub-based topology, a network of hubs is created within the P2P overlay network. In this topology, the ordinary peers are connected with each other and can be connected to more than one hub, unlike typical super-peer topologies.
Figure 2. Hub-hub, hub-peer, peer-peer, and peer-hub degree against the number of iterations performed by the simulator. The random network initially has 100 peers, and 5 nodes are added at every iteration.
Figure 3. Number of messages exchanged against the number of iterations performed by the simulator. The random network initially has 100 peers, and 5 nodes are added at every iteration.
Table 1. Some operations that can be executed by peers. The operations are used to describe the satisfaction criterion and the topology adaptation steps.
Table 2. Satisfaction criteria and topology adaptation steps that can be used to bring together peers with similar properties (such as bandwidth).
Table 3. Satisfaction criteria and topology adaptation steps that will be executed by the hubs and the ordinary peers to create a backbone network of hubs within the overlay network.
©2006 ACM 0001-0782/06/0300 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc.
No entries found