Consider a stream of events received during a limited period of time, such as applications on the Internet. One event needs to be randomly selected as a "winner." Due to physical limitations, the list of all events cannot be stored. Thus the winner cannot be selected at the end of the period. It is not known how many events will materialize. Each event should be selected with the same probability.
In many heuristic optimization algorithms, such as ascent or tabu search,1 each iteration the best "move" to be employed in the next iteration is selected. For example, the objective is to select the best group of p items out of n candidates maximizing an objective (also called "aspiration"). The search examines all p(n-p) moves: removing an item from the selected group and adding a non-selected item to the group. In the ascent algorithm the best move is selected. In tabu search the best move among moves not in the tabu list is selected.
Tying moves are events in a stream. Whenever a better move is encountered, it is the first event. Every subsequent tying move (when no better move is encountered) is another event.
It is common that the same best aspiration is attained by several moves. Which of the tying moves should be selected? If we select a move if it is better than the best move found so far, the first encountered move will always be selected. If we select a move as long as it is not worse than the best move found so far, the last one will be selected. This may bias the search giving a preference to either early encountered moves or late ones. One option is to evaluate the possible moves in random order. Another way is to save the tying moves and once the process is completed (when we know how many tying moves exist), select one of the tying moves at random. Both of these approaches are cumbersome and require extra code.
We propose a simple approach which is very easy to implement and works extremely well.
When the events are evaluated et seriatim, select event number K with a probability of 1/K. The first event is always selected. The second event replaces the selected event with probability of 1/2, the third replaces the previously selected event with probability of 1/3, and so on. This rule is obvious for one or two events (if there are two events, each is selected with a probability of 50%). However, if eventually there are K events, each of them is selected with a probability of 1/K. This is shown in the following Theorem.
Theorem: When event K is selected with probability 1/K, every event is selected with the same probability.
Proof: We prove the theorem by mathematical induction. The theorem is obviously true for either one or two events. Assume that it is true for K events and we prove it for K+1 events. The last event is selected with probability of 1/(K+1). By the induction assumption the first K events were selected with a probability of 1/K each. Following the addition of the (K+1)th event, there is a probability of 11/(K+1) that a previously selected event is no longer selected, yielding a probability of 1/K (11/(K+1))=1/(K+1) for each event to be selected.
This strategy can be extended to selecting a group of s events out of a stream of events with unknown cardinality. The first s events are selected. Event K (K>s) replaces a randomly selected group member with probability of s/K.
©2010 ACM 0001-0782/10/0100 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
No entries found