acm-header
Sign In

Communications of the ACM

Last Byte

Sampling the Goods


man stands before three blocks of light, illustration

Credit: Getty Images

The basic principle of sampling is to gain as much information as possible from each sample. If you know what you will find in a sample, then there is no point in sampling it. The more uncertainty, the more you learn—the basic principle of information theory.

Warm-up 1: Consider that you are faced with three containers, each containing 1,000 packages as in the figure. One container holds packages that are all damaged. One container holds packages that are all good. One container contains half good and half bad. If you want 1,000 good packages, describe an inspection strategy that guarantees you can identify 1,000 good packages and determine which container contains all good, which contains all damaged, and which is mixed. Note that inspection does not destroy a good package.

uf1.jpg
Figure. There are three cargo containers. Each container contains 1,000 packages. One contains packages that are all damaged. One contains packages that are all good. One contains 500 of each. You want 1,000 good packages. Inspecting a single package takes an hour, so you would like to inspect as few as possible. What strategy should you use and how long will you expect to inspect?

Solution to Warm-up 1: Call the containers A, B, and C. Take packages in turn from each container. If any package from a container is damaged, then ignore that container in the sequel. If you get to the point there is only one container left, that is the container you want and no more inspections are needed. This works because any container that has any damaged package is either mixed or fully damaged. The container without any damaged package must therefore be the good one. You will find the container all of whose packages are damaged in the first round robin for sure. If you never find the mixed one after choosing 500 from each of two containers, then one more examination will determine which container has all good packages and which container is mixed.

Warm-up 2: If you use the Warm-up 1 strategy, what is the smallest number of packages that need to be inspected for you to be sure to know which container contains only good packages?

Solution to Warm-up 2: Two packages is the minimum to inspect. If the packages from containers A and B are both bad, then surely C has only good packages.

Question 1: What is the maximum number of packages you might need to inspect using the Warm-up 1 strategy?


Inspection does not destroy a good package.


Solution to Question 1: The worst case for that strategy is the first round finds two good (undamaged) packages and one damaged one. Subsequent rounds on the remaining two containers keep finding good packages until the 500th package of the mixed container is selected. At that point, you have 1,000 good packages. So this requires 1,001 inspections.

Question 2: The Warm-Up 1 strategy requires many inspections in the worst case. Is there a strategy that guarantees you will not need more than 502 inspections to identify 1,000 good packages?

Solution to Question 2: Take one from each of two containers, say A and B. If you get two damaged packages you are done: container C has only good packages. But in the worst case, you will get one damaged package and one good one. Keep looking in the container, say B, having the good package. If you ever find a bad package, then, container C has only good packages. Otherwise, after examining the 501st package from B, you will know B has only good packages.

The trouble with the Question 2 strategy is approximately half the time, you will inspect at least 501 packages. It is very unlikely the Warm-Up strategy will require the inspection of even a fraction of that number.

Question 3: What is the likelihood the Warm-Up 1 strategy will require the inspection of more than 21 packages? An answer within a factor of two is fine.

Solution to Question 3: Appromimately 1 in 1,000. In the first round, we get one bad and two good packages. We discard the container having the bad one and then have just containers B and C. Now, let's focus on the mixed container, say B. For B to survive round k, its package has to be good. The probability that this is so is 1/2 (slightly less because there have been only good packages prior to the kth round). Thus, for B to survive until round 10 (corresponding to the first round where three packages are inspected and 9 rounds when two packages are inspected for a total of 21 packages), B must survive 10 tests with a 1/2 chance of getting caught, which is 1/1,024. So, the Warm-Up 1 strategy requires fewer inspections than the Question 2 strategy most of the time.


What is a strategy to get 1,000 good packages while minimizing the average case number of inspections?


Upstart 1: Suppose the mixed container has a fraction f of good packages and 1-f of bad and f > 1-f. For some value of f, could the Warm-Up 1 strategy require fewer inspections than the Question 2 strategy in the worst case? In the average case?

Upstart 2: What if all containers have different fractions of good packages. We know the fractions, but we do not know which container has which fraction. Also, there are at least 1,000 good packages among all containers. What is a strategy to get 1,000 good packages while minimizing the average case number of inspections?

Upstart 3: Same as Upstart 2 but we want 1,000 packages of which at most a fraction b could be bad.

Back to Top

Author

Dennis Shasha ([email protected]) is a professor of computer science in the Computer Science Department of the Courant Institute at New York University, New York, NY, USA, as well as the chronicler of his good friend the omniheurist Dr. Ecco.

Back to Top

Footnotes

All are invited to submit their solutions to [email protected]; solutions to upstarts and discussion will be posted at http://cs.nyu.edu/cs/faculty/shasha/papers/cacmpuzzles.html


Copyright held by author.
Request permission to (re)publish from the owner/author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.


Comments


Derek Dreyer

The solution to Question 1 seems to be incorrect. After inspecting the two potentially good containers 500 times each, you would in the worst case have indeed amassed 1000 good packages -- but the goal of the problem was not just to find 1000 good packages but also to figure out which container was which (see the Warmup 1 problem statement). To do that, you would need one more test in the worst case, making the answer 1002, not 1001.

That said, there is a slight variation of the Warmup 1 strategy which does have a worst-case bound of 1001 inspections instead of 1002. Say A and B are the two potentially good containers, and you inspect A first before B each time. In the worst case, say you have gotten to the point of inspecting A 500 times, B 499 times, and C once (at the beginning). Then instead of inspecting B for the 500th time at this point, you should inspect A for a 501st time. If A is the good container, it will be good. If not, it will be bad. That saves you one inspection, for a total of 1001 inspections in the worst case.


Displaying 1 comment