The table is a mixture of 2 concepts, namely distinguished points and rainbow tables.

It is a rainbow table with 32 colors/columns/round functions. Each of those colors is approx. 32000 A5/1 steps wide, that is to say the round function changes when a distinguished point of size 15 bit is reached. The chain also ends at a 15 bit wide distinguished point, or at the point where you would change to the 33rd round function.

So a chain has a length of 215 * 32 = 220 on average. Each table is expected to have 228.5 chains and we want to generate 28.5 tables for a grand total of 220+28.5+8.5 = 257.

As for the lookup complexity:

You need to generate (1 + 2 + 3 + ... + 32) * 215 A5/1 links on average for each of the 204 keystream samples. This is roughly 3.5 billion. With 162 million fast hardware it will take 21 seconds to do so. You also have to do 32 * 204 = 6528 disk accesses. One disk seek is 10ms, so all seeks complete after 65 seconds. take 3 disks or an SSD if you don't want the disk to be a bottleneck. This lookup has to be done on all of the 28.5 network nodes. If you wanted to do the lookup on a single computer with one graphics card you would need 21 * 28.5 seconds or roughly 2 hours. The single location storage would be about 2.5 TiB while the distributed node storage is about 6 GiB.

Chain merges

Chain merges occur when there is a collision in the same rainbow table column. Here are the actual numbers from our test table:

million chainsmillion mergesmillion unique chainsmillion unique chains table 2merges percentagemerges percentage relative to 400m chains tablemerges percentage table 2relative to 400m chains table2
200.619.419.430.150.60.15
402.437.637.660.62.40.6
605.254.8TBD8.71.3TBDTBD
80971TBD11.32.3TBDTBD
10013.786.386.313.73.413.73.4
15028.7121.3TBD19.17.2TBDTBD
20048.0152.0152.0241248.024
30096.0204.0TBD3224TBDTBD
400151.3248.7TBD38.538.5TBDTBD
500TBDTBD280.6TBDTBD43.9-
600TBDTBD310TBDTBD48.3-

So what do these numbers tell us? They say our naive calculations above are wrong. Our tables do not cover 248.5 of the keyspace. So what can you do about it? You can throw away the merges and calculate new chains till you have 228.5 chains without merges. Since you can expect to have to throw away 50% of your chains, you need to calculate 150%. This number is not correct since you also get merges in the 50% you are calculating again. The other possibility is to keep the merges. Since they merge at half the chain length on average, you still have to compute extra chains, but not as many. And you are storing degenerate chains on disk which take up more diskspace per fraction of keyspace covered.

We all love diagrams don't we, this one shows the compression used. Green fields are constant and not stored. The index field is not stored also, but you have to create an index. The advantage of an index is that it speeds up the lookup process and reduces the amount of storage needed. If you cut off N bits of the end value, then you have to store 2N file offsets into your table in the index. A sane size of the index would balance between the size of the index and the size of the table (minimize size_table + size_index) and would have to fit into RAM completely. If you have enough RAM and a large index (hundreds of megabytes), then after finding the table offset in the index, you can read a couple of sectors (all table entries between to index offsets) with one hard disk seek and search through those values.

And this one shows the overall table structure

Attachments