Reusing Chain Merges 1

As the hardware is inherently parallel, when regenerating chains in the lookup process there may be spare slots available anyway so that one can look through chain merges also.

Reusing Chain Merges 2

When generating a table vertically that is all chains complete columns 0 - 1000 then all chains complete 1000 - 2000 and so on, one can detect chain merges early. When a chain merge is detected, one of the 2 merging chain get stored in a seperate table and is not further processed. What we got then is a second table with shorter chains. These can still be used in the lookup process. During the lookup we generate some chains speculatively till the end condition to look them up in the database. When we also look up speculativ chains that are not yet complete in our short chain database, we reuse both the merges in the generation process instead of throwing them away and we use the incomplete lookup chains instead of only using those when they are complete. There is no extra computation required, only more disk lookups. And those short chains need more storage space per fraction of keyspace they cover.

Reusing speculative lookup chains

Chains that get generated in the lookup process can be reused by storing them in the database. One can either only store long chains that the lookup process start in the first colour of the rainbow table or store all chains. Short chains waste more storage space, etc., you know the drill. The problem is that when start values are compressed because they contain a constant number of bits (the table id), then those chains used during the lookup that do not have the same constant part cannot be stored. So one would have to store those extra chains in a seperate uncompressed table. But then if every node in the network stores the same chains, this introduces too much redundancy, so it may be ok, if only those chains matching the table id get stored. this way all chains would be saved on some node.

Verifying remote tables

The slave nodes claim to have a table stored on their local disk. To verify that claim, the server can ask the node to give him some of his chains and calculate them again. To prevent nodes from calculating chains on the fly, there is a timeout value that is too small to alow that and still is large enough to allow the transmission of the data. To prevent the client from precalculating only enough chains to pass the verification, the server sends some end values with the verify request and asks the node to send those chains that are closest to these end values (send the one chain with minimal distance per end value given by the server). The difference in the chains returned from the node will give an indication on the number of chains in the node's table, since a bigger table has a larger density of end values.

Verifying generation status

Verification of the generation status is both less needed and more demanding on the nodes resources. Although it is less needed, it is still nice to have an overview of the current status of the nodes and thus an overview of the project progress in the generation phase. It is more demanding because the chain that is being generated is not sorted. So the node has to scan the tables linearly. The time required to do this takes about a minute for a typical A5/1 table and current hardware. So the server has to ask for more chains then can be computed on the expected hardware of the node. A typical video card (gtx260) computes ~ 150 chains per second (9000 per minute), so the server should ask for maybe 90000 chains. The server has to send 720kb and the client 1440kb. Tests have to be done on this matter, but the idea should be clear.

Update: The server only needs to send a seed value to some well known PRNG and ask the node to generate the values from it. And it follows that the node can only be asked to search a fraction of the table, but then would be given only a fraction of the time.

Verifying generation status 2

The above method has a major drawback: the client only needs to compute a fraction of chains and can claim at each status update that this old chunk of chains is the one he just spent hours computing. He just has to make sure that he does not repeat values sent to the server. A defense against such kind of attack would be to ask the client to always send chains, whose endpoints are the largest in the set of chains that are reported in the update. So the client keeps the chains since the last update sorted in ram (requires only a few megabyte) and then sends the 100 largest chains with the update. since one cannot choose a start value to have a specific end value, the client must generate the number of chains claimed in the status report for it to be able to provide the 100 largest verification chains of a proper density. And since the verification chains are always the largest ones, he cannot use the same over and over again as the server would easily recognize that. Statistically the largest chains are always in the same range. The server has to store all updates of the client to check future updates against duplicate chains. Then the server has to recompute those 100 chains to verify them and can thus verify that the client has computed a particular amount of chains deduced from the density of the sent end values.