In the USA, the social security number (SSN) is often used to authenticate people over the phone. Let’s leave the general badness of this idea out of the current discussion and focus on the particularly bad idea I heard about recently.
In order to protect the SSN, many companies keep only the last four digits of the SSN on file and ask the caller to give those four digits. If they match, well, that authenticates the caller (you bet). This particular corporation, however, wanted to go further and only stored the sum of the last four digits. Callers where then asked to calculate the sum of their last four digits and only report that sum over the phone. The company in question apparently felt this improved the security of the SSN.
Now, think about this for a moment… ok, finished?
Let’s look at how this works:
Obviously, a number of 4 digit combinations map to the same sum. That could not have escaped the inventor of the scheme. Maybe he thought that guessing the sum would still be too hard, since there is just one chance in 37 of getting it right (0 being the minimum and 36 being the maximum).
But this isn’t true, either. Think about this: there is only one single 4-digit combination that sums to 0, namely “0000”. Similarly, there is only one single combination (“9999”) that sums to 36. There are just a few combinations that sum to 1 or to 35 (four each), and so on. But we have 10,000 combinations that map into 37 bins, so there must be a lot of combinations that map into the numbers in the middle, like 18.
Actually, the problem is well described by the “Central Limit Theorem”, which tells us that the distribution is very close to a Gaussian distribution:
Calculating this shows us that 670 of the 10,000 combinations sum to 18, which means that just guessing “18” gives you a 1/15 chance of being right. Actually, guessing any number between 14 and 22 gives you at least a 1/20 chance of being right. Remember that a 1/20 chance of being right means that you, on average, will have to try 10 times before getting it right. That is not much worse than asking the caller to give only one single digit from his SSN instead of four, which you could guess in an average of five tries.
This is the way I would expect the system to be exploited:
Caller: Hi, I’m Jack CeoMan, could you give me my password, please?
Support: Sure, give me the sum of the last four digits in your SSN, and I’ll do it.
Caller: Ummm… don’t have it with me, but I think, let’s see,… 18.
Support: No, that isn’t right.
Caller: How can adding be so difficult… damn it’s 20, unless I’m thinking of my wife’s SSN.
Support: Can’t be yours.
Caller: Oh, damn, wait, could this be it: 16?
Support: nope, you’d better go check your number and call back, sir.
Caller: ok, will do.
So, you can easily guess three times without raising suspicion. Since the organization is large, has a number of support staff and nobody knows anyone else by voice (else they would never have instituted the system, would they?), you can easily call back a couple of hours later and try again with someone else. Do this three times and you’ve got yourself a password.
Too easy, by far.
Moral of the story: don’t invent your own security protocols. You’re bound to make mistakes.
Note: I didn’t calculate the number using the central limit theorem. I took the easy way out and wrote a loop in Delphi. Yes, I’m ashamed of it, but it worked. A very elegant solution was proposed by Earl Fife:
Here is a more mathematical solution relying on C(n,k), the number of
ways of selecting k object from among n distinct objects. Note:
d_1 + d_2 + d_3 + d_4 is the sum of the 4 numbers, hence we are
interested in d_1+…+d_4=18.
The value of each d can be represented by that many 1s, so d=4 can be
veiwed as d=1,1,1,1, or more succinctly d=1111. And a 4 digit number
can be viewed as strings of 1s separated by +s, e.g.
2781=11+1111111+11111111+1. Since we are interested in 4-digit numbers
whose digital sum is 18, we have a total of 18 1s and 3 +s. Each
arrangement of them constitutes a unique number.
To select an arrangement of 18 1s and 3 +s, all we need to do is
designate which of the 18+3 positions contain the +s. The rest all get
1s, so there are C(18+3,3) = 1330 of them. This corresponds to the
number of integer solutions to d_1 + … + d_4 = 18, d_i >=0.
I.e., the computation would allow for there to be values of d exceeding
9, so we need to subtract situations in which one of the digits is 10,
or 11, or .., 18. In the case 10: solve 10 + d_2 + d_3 +d_4 = 18, i.e.
d_2 + d_3 + d_4 = 8 (there are C(8+2,2) of solutions, and since 10 could
have been in any of the d_i’s there are 4C(8+2,2) of them. Repeat for
the case d_i=11, etc.
C(18+3,3) – 4(Sum[C(18-i+2,2), 10<=i<=18]) = 1330 - 4(45 + 36 + 28 + 21 + 15 + 10 + 6 + 3 + 1) = 1330 - 4(165) = 670. Well, maybe it is not compact enough to be elegant, but it is more interesting than a loop.