Sunday, March 19, 2006

Give Me a String at Random from THIS File

Constraint:
1) You can make only one sequential pass through the file.
2) You may not store any additional information like a table of offsets.

I would like to receive the answers from you all. After receiving answers from you i will post the answer.

PS: Seems this question also asked in Microsoft Inverview.

--------------------------------------------------------------------

Solution: ( Taken from a book, if anyone know best solution, please post comment )

The basic technique is to pick survivors, and recalculate as you go along. This is so com-putationally inefficient that it's easy to overlook. You open the file and save the first string. At this point you have one candidate string with a 100% probability of picking it. While remembering the first string, you read the next string. You now have two candidates, each with a 50% probability. Pick one of them,save it, and discard the other. Read the next string, and choose between that and the saved string,weighting your choice 33% toward the new string and 67% toward the survivor (which represents the winner of the previous two-way selection). Save the new survivor.

Keep on doing this throughout the entire file, at each step reading string N, and choosing between that(with probability 1/N) and the previous survivor, with probability (N - 1)/N. When you reach the endof the file, the current survivor is the string picked at random!

No comments: