Table formats
Over the course of the project 3 different major table designs were used. The first 2 were similar in that they both used the immediate 64 bits that were generated by the A5/1 engine as input for the next round. The third one first clocked 100 steps forward and then the following 64 bits were used. The first format was dp15k32, distinguished points of size 15 bits with 32 rounds. Later we found that the assumption of a single LAPDm frame of known plaintext was quite conservative, so we changed to dp15k8. Due to more keystream samples available, more lookups could be done and the tables needed to cover less of the whole keyspace. While keeping our goal of 2TB table size, we aimed at lower lookup times together with the obvious shorter table generation time. That goal was met with the reduction on the number of round functions.
Finally with the discovery that only 16% of the keyspace was even reachable in the real world after the 100 clockings that are part of A5/1 operation in GSM networks, we changed our format to dp12k8e100, 12 bits distinguished points, 8 rounds and 100 extra clockings. The keyspace covered with these new tables was 8 times smaller (as a result of going from dp15 to dp12), which is roughly in alignment with the fact that only 16% of the keyspace matters anyway. Together with the fact that each round takes 164/64 = 2.5 times more cycles to compute, generation time and lookup time dropped by a factor of 8/2.5==3.2 compared to the dp15e8 format.
Both older formats are suboptimal but if you can afford the extra flash memory modules, can be used on top of the 2TB of new tables being generated.
Lookup time
There are 2 different times: minimum time for a single lookup and average lookup time for an infinite stream of lookups. From each keystream sample, 8 lookup chains can be generated for each table, starting at one of the rounds. Each of those chains is then calculated till the end point, lookup is done to the table on the storage device and the start value from the disk is then used to compute a new chain up to the point where the lookup chain was started. The 2 parts of the lookup chain are as long as a chain stored on disk, and a bit more (about half a DP length). Assuming there are 40 tables and 400 keystream samples to lookup: if 40 * 8 * 400 = 128000 chains can be processed by a particular hardware backend in parallel, then the minimum lookup time can be attained. If the hardware is not wide enough, then some of the chains have to be computed after others have finished. If the hardware width is larger than 128000 then the average lookup time will be smaller than the minimum lookup time. With a single HD5870 GPU, 320 physical cores operate on 32bit wide slices, so the hardware width is 10240. Because each physical core can operate on 4 32bit slices in the same time as a single one, it is safe to say that the HD5870 has 1280 cores and can operate on 40960 chains at a time. To reach minimum lookup time, 3 HD5870 are required. But because the time needed to process 5120 threads is less than 4 times the time required to process 1280 threads, a single HD5870 would be faster than 3 * minimum time.
Rainbow table explained (dp15k32)
The table is a mixture of 2 concepts, namely distinguished points and rainbow tables.
It is a rainbow table with 32 colors/columns/round functions. Each of those colors is approx. 32000 A5/1 steps wide, that is to say the round function changes when a distinguished point of size 15 bit is reached. The chain also ends at a 15 bit wide distinguished point, or at the point where you would change to the 33rd round function.
So a chain has a length of 215 * 32 = 220 on average. Each table is expected to have 228.5 chains and we want to generate 28.5 tables for a grand total of 220+28.5+8.5 = 257.
As for the lookup complexity:
You need to generate (1 + 2 + 3 + ... + 32) * 215 A5/1 links on average for each of the 204 keystream samples. This is roughly 3.5 billion. With 162 million fast hardware it will take 21 seconds to do so. You also have to do 32 * 204 = 6528 disk accesses. One disk seek is 10ms, so all seeks complete after 65 seconds. take 3 disks or an SSD if you don't want the disk to be a bottleneck. This lookup has to be done on all of the 28.5 network nodes. If you wanted to do the lookup on a single computer with one graphics card you would need 21 * 28.5 seconds or roughly 2 hours. The single location storage would be about 2.5 TiB while the distributed node storage is about 6 GiB.
Chain merges (dp15k32)
Chain merges occur when there is a collision in the same rainbow table column. Here are the actual numbers from our test table:
| million chains | million merges | million unique chains | million unique chains table 2 | merges percentage | merges percentage relative to 400m chains table | merges percentage table 2 | relative to 400m chains table2 |
| 20 | 0.6 | 19.4 | 19.4 | 3 | 0.15 | 0.6 | 0.15 |
| 40 | 2.4 | 37.6 | 37.6 | 6 | 0.6 | 2.4 | 0.6 |
| 60 | 5.2 | 54.8 | TBD | 8.7 | 1.3 | TBD | TBD |
| 80 | 9 | 71 | TBD | 11.3 | 2.3 | TBD | TBD |
| 100 | 13.7 | 86.3 | 86.3 | 13.7 | 3.4 | 13.7 | 3.4 |
| 150 | 28.7 | 121.3 | TBD | 19.1 | 7.2 | TBD | TBD |
| 200 | 48.0 | 152.0 | 152.0 | 24 | 12 | 48.0 | 24 |
| 300 | 96.0 | 204.0 | TBD | 32 | 24 | TBD | TBD |
| 400 | 151.3 | 248.7 | TBD | 38.5 | 38.5 | TBD | TBD |
| 500 | TBD | TBD | 280.6 | TBD | TBD | 43.9 | - |
| 600 | TBD | TBD | 310 | TBD | TBD | 48.3 | - |
So what do these numbers tell us? They say our naive calculations above are wrong. Our tables do not cover 248.5 of the keyspace. So what can you do about it? You can throw away the merges and calculate new chains till you have 228.5 chains without merges. Since you can expect to have to throw away 50% of your chains, you need to calculate 150%. This number is not correct since you also get merges in the 50% you are calculating again. The other possibility is to keep the merges. Since they merge at half the chain length on average, you still have to compute extra chains, but not as many. And you are storing degenerate chains on disk which take up more diskspace per fraction of keyspace covered.
We all love diagrams don't we, this one shows the compression used. Green fields are constant and not stored. The index field is not stored also, but you have to create an index. The advantage of an index is that it speeds up the lookup process and reduces the amount of storage needed. If you cut off N bits of the end value, then you have to store 2N file offsets into your table in the index. A sane size of the index would balance between the size of the index and the size of the table (minimize size_table + size_index) and would have to fit into RAM completely. If you have enough RAM and a large index (hundreds of megabytes), then after finding the table offset in the index, you can read a couple of sectors (all table entries between to index offsets) with one hard disk seek and search through those values.
And this one shows the overall table structure
Attachments
-
compression.png
(28.1 KB) - added by sascha
13 months ago.
-
table.png
(32.4 KB) - added by sascha
13 months ago.


