Saturday, March 27, 2021

How Random

A lagniappe of working remotely the past year has been a daily opportunity to ponder randomness for just a moment as I enter the transitory passcode generated on my phone from an authentication app in order to complete a VPN connection to my network, a required measure of security implemented by my workplace.  Over the course of the past few years of using the app, my intuition was that close to every 6-digit passcode generated by it included at least 1 repeated digit.  Daily use only intensified my suspicion.  It seemed that every time I thought to consider the number being generated for me, at least one pair, and frequently 2 were involved.  Something about this didn't sit right with me.  Were randomly generated 6 digit numbers almost guaranteed to repeat a digit?  Or was this app simply not generating random numbers?   

This wasn't just an idle concern.  Years ago, when I first developed some programming skills that were shallow in comparison to others but deep enough to make me dangerous, I spent some of my spare time trying to conceive of ways to devise a random number generator, usually in the service of primitive game design.  I discovered that writing an algorithm to try to produce actual randomness was a great deal more complex than it seemed at first blush to my fertile imagination.  Most self-contained algorithms that I tried lacked something in the production of verifiably random-like sequences of numbers.  I had the most success incorporating some unplanned element of the run-time environment into my calculation, e.g., the evenness or oddness of the thousandth of a second that the program was invoked or of the ascii value of the third letter of the verbal representation of the product of both the tenths and hundredths of a second of the timestamp.

Fast forward through years of American decline to the point of our present situation of isolation.  Facing the mystery each weekday morning, I daily started to daydream about ways to explore the question of how random my authentication app's random number generator was.  My first thought was that the question was something I could tackle through logic alone, perhaps with the help of Excel formulas.  Of course, the purpose of the app being to connect me to work, it never failed that my ambition to solve the problem would be supplanted by my desire to remain gainfully employed, by doing actual work instead. Consequently, my barely started efforts would sit in cobwebs for months before the cycle would repeat, the struggle between ambition and reality playing itself out each time with the same predictable results. 

Today for some reason, with the end of the month approaching and a whole weekend lying before me, it occurred to me that my ability to reason could use some help.  A lightbulb went off and I realized that all I needed to do was to count every possible 6 digit number between 000000 and 999999 that contained at least one pair and then compare it to the incidence of pairs in the passcodes proffered by the authentication app.  Having limited time, I needed a shortcut with the counting, so I dusted off some Python knowledge and built a program to assist with the task. 

Since it had been a while since I'd used Python, to warm myself up I wrote a function to pad an integer n with up to 5 leading 0s, (e.g., render 1 as '000001', 223 as '000223' and so forth) put the digits into a list and then sort the list numerically*:

def AnalNum(n):
a = f"{n:06}"
l = [int(x) for x in str(a)]
l.sort()
return l

The sorting turned out not to be necessary but served mostly as acknowledgement of my realization that using this brute force method and thinking of each number as one of a million possible "hands" that could be "dealt" by the authenticator, I could abandon the complexities of factoring in the location in the string of each member of the pair that played havoc with my ADD when I was attempting to use logic alone.  

My next challenge was to count the instances of pairs, and while I was at it of triples, quadruples, quintuples and sextuples in a given 6-digit number n:

def AnalList(n):
l=AnalNum(n)
Ze,Si,Pr,Tr,Qr,Qt,Sx=0,0,0,0,0,0,0
item=0
while item<10:
Ct=l.count(item)
if Ct==1:
Si+=1
elif Ct==2:
Pr+=1
elif Ct==3:
Tr+=1
elif Ct==4:
Qr+=1
elif Ct==5:
Qt+=1
elif Ct==6:
Sx+=1
else:
Ze+=1
item+=1
L=[Si,Pr,Tr,Qr,Qt,Sx]
return L

Lastly, I needed a function to sequentially feed each digit from 000000 to 999999 into the above and then analyze the output as a means of categorizing each by the instance of tuples (pairs, triples, quadruples, etc.) in the string of digits. Writing the program I realized there were a finite number of exclusive possibilities for a 6-digit number.  To wit, it could have:

  • One and only one pair: e.g., 113456
  • One pair and a triple: 121216
  • One pair and a quadruple:121211
  • Two pairs: 121256
  • Three pairs: 121233
  • One and only one triple: 123226
  • Two triples:  121212
  • One quadruple: 123222
  • One quintuple: 122222
  • One sextuple: 222222
  • Six unique digits: 123456


In light of this, I realized I could get all the data I needed for my probability analysis with the population of 11 counts:

def AnalIsis(N):
OnePair, OnePairOneTriple, OnePairOneQuad, TwoPair, ThreePair, OneTriple, TwoTriple, OneQuadruple, OneQuintuple, OneSextuple, SixUnique = 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
q=0
while q<=N and q<1000000:
L=AnalList(q)
if L[0]==6:
SixUnique+=1
elif L[1]==3:
ThreePair+=1
elif L[1]==2:
TwoPair+=1
elif L[1]==1 and L[2]==1:
OnePairOneTriple+=1
elif L[1]==1 and L[3]==1:
OnePairOneQuad+=1
elif L[1]==1:
OnePair+=1
elif L[2]==2:
TwoTriple+=1
elif L[2]==1:
OneTriple+=1
elif L[3]==1:
OneQuadruple+=1
elif L[4]==1:
OneQuintuple+=1
elif L[5]==1:
OneSextuple+=1
q+=1
X=[SixUnique, OnePair, OnePairOneTriple, OnePairOneQuad, TwoPair, ThreePair, OneTriple, TwoTriple, OneQuadruple, OneQuintuple, OneSextuple]
return X

Miraculously, once my syntax errors and indents were worked out, I only needed to run the third function once with a parameter of 999999 to get the numbers I was looking for within seconds.  I compiled the results into a chart which gave me the probabilities for each category.  I now needed some data from the authentication app.  I prompted it 100 times to generate a sample of 100 of its randomly generated numbers† and plotted the results into my graph to compare with the actual probabilities (click to enlarge):

Counts of instances of specific repetitions of digits in 6-digit numbers between 000000 and 999999 compared with those of 100 passcodes generated by a VPN authentication app

Unsurprisingly none of the more exotic possibilities turned up in my sample.  But the actual probabilities to be found among the million 6-digit possibilities were in fact amply demonstrated to strongly favor at least one pair or other tuple among the digits; in fact, it could be expected to occur 85% of the time.   In my test, due to a particularly high number of cases with no repeated digits, it had occurred in only 76% of the numbers generated-- much less than my prediction of closer to 99%-- but otherwise across the board, my sample of 100 hewed remarkably closely to the demonstrated probabilities, an indication to my mind that the randomizer used by the app is pretty good, and proving once again that our intuition and aesthetic notions about what is random are not always correct (at least mine aren't).  

This exercise has also demonstrated to my mind a strange paradox.  Sure, for 100 attempts, the results reasonably agreed with the distribution of tuples across the million 6-digit numbers between 000000 and 999999.  You could imagine however a program designed to dole out each of the members of  the 11 categories in frequencies that clung deceptively closely to the probabilities in the above chart and that never repeated a number once in a million generations. What you would wind up with would be a guaranteed sequence of 1 million 6-digit numbers that fell perfectly into the probability percentages determined by brute force analysis of each in turn.  Curiously, after 1 million iterations assuming the application kept track of the precise sequence of its previous million, the sequence would repeat, and would do so indefinitely by virtue of the algorithm.  In short, it would be the opposite of random! My exercise with a sample of only 100 numbers would be unable to detect this!  In fact, it could be the case that my authentication app is simply repeating a carefully constructed sequence of 1 million 6 digit numbers that only appear to be random, but to an obsessive compulsive too perfect degree.  The only way to prove whether this is the case would be to keep track of my next million passcodes.

I think I can live with the mystery.

~~~~~~~~~~

* Note: It will be best to think of the python contained in this post as not so much exemplary of programming virtue as confessional of programming sin.

† Do not try at home.  As I discovered Monday morning, the consequence of treating an authentication app as a random number generator for a sample of significant size is being locked out of the network for too many attempts.


No comments:

Post a Comment