The GitHub repo for this project can be found at https://github.com/zaljubouri/twitch-rand.
In part one, we discussed what goes into making a real random number generator—we chose our entropy source, determined it to be poor, and decided to use it anyway. In this article, we'll dig into the details of actually creating the generator. I've mentioned the National Institute of Standards and Technology (NIST) a couple of times already and I'm going to do so again, because they also provide documentation on creating a deterministic random bit generator (DRBG) in their prequel to SP 800-90B: the Recommendation for Random Number Generation Using Deterministic Random Bit Generators SP 800-90A, revision one. A real page turner, I delved deep and discovered a gold mine—the information here forms the entire basis of my DRBG. It details the concepts and requirements that power a DRBG, and provides three algorithms based on hash functions and block ciphers that would help me fulfil my dreams of generating random numbers.
A quick-and-dirty explanation on the make-up of a DRGB: entropy goes in, an internal state is instantiated, and numbers can then be generated using said internal state. The state goes through reseeding based on certain conditions, pulling entropy from a pool to "refresh" it, preventing it from becoming stale and predictable. In my case, the entropy is the time between Twitch chat messages, and I followed NIST's Hash_DRBG
mechanism to take care of the internal state and eventually generate random numbers; SHA-256 was my one-way hash function of choice (CryptoJS providing a JavaScript implementation). I didn't worry about utilising nonces, personalisation strings, or additional input as they are all optional and, for an MVP, unnecessary.
Hash_DRBG
indicates how the internal state is managed.The pseudocode provided for internal state initialisation, seeding, and reseeding was relatively straightforward to implement. I tested as I was going along, but I was using "small" numbers—only around thirty-two bits in length. When I started trying the actual number generation—utilising 256-bit long hashes—I noticed I was losing a lot of accuracy. I trusted too deeply in JavaScript's number
type, not really thinking about its maximum integer. It turns out to be 253 - 1, which was enough for my initial testing, but nowhere near able to handle the operations my generation function performs. Thankfully, bigint
exists—which in V8
's engine has a maximum length of a whopping billion bits—and resolved all of my precision woes.
The next consideration was how to actually provide the entropy to the internal state. I needed at least 256 bits of entropy before the mechanism would instantiate, and also for every subsequent reseed. Since I fudged the number of entropic bits I was receiving, and Twitch chat fired messages off at a rapid pace, this quota was quickly met. Despite this, I felt—on the off chance all the chatters decide to observe a minute's silence—I should try to use pools to store excess entropic data, so if there is a situation where a significant amount of random numbers are requested coinciding with an entropic drought, the DRBG would still be covered for reseeding and remain unpredictable. There are two popular algorithms in pooling: the Yarrow algorithm and the Fortuna method of accumulation.
Yarrow utilises what its authors call a slow pool and a fast pool. When data is received from the noise source, it is added to a pool in an alternate fashion, i.e. the first input goes to the fast pool, second to the slow, third to the fast, and so on. The fast pool is used for frequent reseeding, whereas the slow pool is used more sparingly. A reseed occurs when the fast pool in total contains entropy above a threshold limit (e.g. 100 bits), and the slow pool is used to reseed when at least two entropy emissions are above a certain threshold.
Fortuna uses thirty-two pools; entropy is added to each cyclically for an even distribution. Pool zero (P0) takes the lead in determining when a reseed should occur, initiating one whenever it reaches the entropy size threshold. Its contents are used alone to reseed in the first instance, but each subsequent reseed starts pulling in the contents of other pools if 2i (i being the index of the pool) is a divisor of the number of reseeds, so P0 is used every reseed, P1 every other reseed, etc.
Fortuna has become very popular, being utilised by FreeBSD's implementation of /dev/random
, as well as by Apple in their OSes. This is likely due to its immense security against enumeration attacks; even if you had to reseed every hundred milliseconds, it would take nearly seven years before the thirty-second entropy pool is used. Based on this, I decided to go with it to provide the entropy to my RNG.
With the pool and internal state functionality in place, the final piece was linking them to the collection of entropy, and to provide a mechanism by which I could start generating random numbers. Using InversifyJS
, I can create an RNG object with the pool and internal state initialised, then find the most active current broadcasts with Twitch's API and join their chat channels using tmi.js
(the same way I did in part one). As the bot receives message events, it adds the time deltas to the pools, and the first seeding of the RNG would occur after pool zero reaches 256 bits of entropy. From there on out, the internal state would be seeded and random numbers are finally able to be requested. As data continues to be collected, reseeding occurs using different pools ensuring the numbers are always fresh.
I created a visual representation of the DRBG which can be viewed at twitch-rand
, showing the pools being filled and emptied, and how the internal state is manipulated.
After all of this work, it would be good to check the returned numbers for their randomness; this is a bittersweet proposition, as it may show the futility of this whole exercise. However, I feel like I have learned enough during this project that I'll be satisified no matter the results—eventually. As an initial test, I generated a million random numbers from twitch-rand
and ran it through a set of IID (independent and identically distributed) tests. The results in Figure 4 show that it (thankfully) passed, and the numbers coming out of twitch-rand
are at least IID. Out of the million numbers generated, only 258 were duplicates, and none duplicated more than once. I was initially surprised there were so many, as the RNG was set to generate numbers up to 232, but according to the birthday problem there is a very high chance of duplicates occurring, even across large numbers.
┌─────────┬─────────────────────────┬──────┬─────────────┬────────────┐│ (index) │ testName │ pass │ counterZero │ counterOne │├─────────┼─────────────────────────┼──────┼─────────────┼────────────┤│ 0 │ 'EXCURSION' │ true │ 460 │ 0 ││ 1 │ 'NUM_DIRECTIONAL_RUNS' │ true │ 1401 │ 5 ││ 2 │ 'LEN_DIRECTIONAL_RUNS' │ true │ 3981 │ 5950 ││ 3 │ 'NUM_INCREASE_DECREASE' │ true │ 7580 │ 20 ││ 4 │ 'LEN_RUNS_MEDIAN' │ true │ 287 │ 303 ││ 5 │ 'NUM_RUNS_MEDIAN' │ true │ 789 │ 3 ││ 6 │ 'AVG_COLLISION' │ true │ 6708 │ 0 ││ 7 │ 'MAX_COLLISION' │ true │ 7205 │ 0 ││ 8 │ 'PERIODICITY' │ true │ 4 │ 9996 ││ 9 │ 'COVARIANCE' │ true │ 4661 │ 0 │└─────────┴─────────────────────────┴──────┴─────────────┴────────────┘The data can be assumed to be independent and identically distributed.
iid-tests
.Other popular randomness tests include dieharder
and rng-tools
. These are extremely tough suites that stress an RNG to its limit, especially dieharder
which has nearly twenty different tests demanding an extreme amount of randomly generated numbers, and a lot of high quality entropy to ensure they're all sufficiently different—twitch-rand
categorically and unequivocally failed all of these tests. It's regrettable but expected, as towards the end of part one we recognised the weakness of our entropy source.
Throughout this project, I gained a lot of insight into how the random number generation we take for granted works. I hope that this glimpse was interesting and accessible—I realise that I have barely scratched the surface of the subject matter. A large amount of research and work is done in this area by incredibly smart people as part of both academic and private groups. Their findings and creations are responsible for ensuring the encryption systems we rely on are suitably difficult for bad actors to penetrate, keeping our digital secrets safe. Unfortunately, twitch-rand
is unlikely to be powering the latest and greatest in cryptographic algorithms, but that's something I've come to terms with.