Complexity vs simplicity in software

We have a lot of vulnerabilities in software, and their number doesn’t seem to diminish.

One of the major reasons we have all these vulnerabilities is that every software developer (or organization) needs to develop every darn little thing itself. IOW, the networking code, the user interaction, the database handling, etc., just to be able to sell the one good idea or process that is their own. A very small part of every product is what differentiates it from the others, while the rest is a rehash of what everyone does and has to do. Nobody likes doing all those parts, but we have to. And that’s where the vulnerabilities appear, in general.

Even though I don’t like the car analogy, I’m going to use it: it’s as if every car manufacturer had to create every high-level system itself, like the braking system, electrical system, interior, etc. They don’t; they buy those from a few suppliers that know how to do that stuff right and cheap. The equivalent of APIs and operating systems would be components like wipers, bolts, nuts, etc. The equivalent of the network stack would be the high-voltage inductor, while the network handling code would be the ignition system.

Actually, it’s a bad analogy. One of the problems we have is that no physical analogy is right for software systems, so we keep misleading ourselves with them. But still, that’s my analogy for now.

We can’t quite yet get to the same level of component reuse in software as car manufacturers can. So every software engineer, in principle, has to create his own ignition system, even if he’s only interested in building his own idea of a sleek sports car. And, sure, his ignition system won’t equal one made by Bosch. Duh.

Trying to teach him how to make perfect ignition systems won’t work. He’ll be bored and distracted from his main interest. So this has to change.

(All the “software components” stuff is/was about this, but nobody has gotten the abstraction level right yet. They haven’t found the right place to cut the cables, so to speak. My feeling is they’ve always tried to solve too many problems in too general a way with all these systems, including Corba, COM, VCL, what have you. Like I’d ever want to use a ready-made ignition system in a juice press. Which actually may not even have a fuse box, which the ignition system relies on. The analogy turns slightly ridiculous here…)

The other thing here is “complexity” vs “simplicity”. Complexity screws everything up. It causes exponentially increasing development times and bug counts. It makes for fragile systems and maintenance nightmares. But, you say, people want complex software… no they don’t. They want systems with complex behaviour, which is not the same as complex software. You can build very complex systems from a set of simple software units, if you do it right. That way the total complexity of the system increases linearly with size and function instead of exponentially.

This is what structured programming, OOP and all the rest are about: trying to build complex systems using simple software. The trick is reducing complex processes to interacting simple processes, then treating each of them as a simple subsystem. Surprisingly few development organisations seem to get this, and keep building complex software with these very techniques. Personally, I have a very hard time even imagining a system that actually has to be built using complex software and can’t be done with a collection of simple software parts. I don’t really think there are any.

Dividing a complex system into simple software parts, if done right, makes each part easily manageable, allows you to divide the development organization into smaller coherent parts, allows easier documentation, easier testing and bug resolution, easier replacement, etc, etc. It’s a boatload of Good.

The current paradigm change (yay, got to use “the word”!) that’s occurring due to multi-cores will actually help. The transition will be very painful (developers and designers seem to have great difficulty thinking of business processes in anything but a sequential fashion), but if we’re lucky, that may be just what’s needed. Pain is good.

Short version: you can’t confront “simplicity” with “complexity” without indicating what level of composition you’re talking about.

Bad SSN idea

In the USA, the social security number (SSN) is often used to authenticate people over the phone. Let’s leave the general badness of this idea out of the current discussion and focus on the particularly bad idea I heard about recently.

In order to protect the SSN, many companies keep only the last four digits of the SSN on file and ask the caller to give those four digits. If they match, well, that authenticates the caller (you bet). This particular corporation, however, wanted to go further and only stored the sum of the last four digits. Callers were then asked to calculate the sum of their last four digits and only report that sum over the phone. The company in question apparently felt this improved the security of the SSN.

Now, think about this for a moment… ok, finished?

Let’s look at how this works:

[Figure: summing the last four digits of an SSN]

Obviously, a number of 4-digit combinations map to the same sum. That could not have escaped the inventor of the scheme. Maybe he thought that guessing the sum would still be too hard, since there is just one chance in 37 of getting it right (0 being the minimum sum and 36 the maximum).

But this isn’t true, either. Think about this: there is only one single 4-digit combination that sums to 0, namely “0000”. Similarly, there is only one single combination (“9999”) that sums to 36. There are just a few combinations that sum to 1 or to 35 (four each), and so on. But we have 10,000 combinations that map into 37 bins, so there must be a lot of combinations that map into the numbers in the middle, like 18.

Actually, the problem is well described by the “Central Limit Theorem”, which tells us that the distribution is very close to a Gaussian distribution:

[Figure: distribution of the sum of four digits]

Calculating this shows us that 670 of the 10,000 combinations sum to 18, which means that just guessing “18” gives you roughly a 1/15 chance of being right. In fact, guessing any number between 14 and 22 gives you at least a 1/20 chance, and together those nine guesses cover more than half of all combinations. Remember that even a 1/20 chance of being right means you need, on average, only 20 tries. That is not much worse than asking the caller to give one single digit of his SSN instead of four, which you could guess in at most ten tries.
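The full distribution is easy to brute-force, much like the Delphi loop mentioned in the note at the end of this post; here is an equivalent sketch in Python:

```python
from collections import Counter
from itertools import product

# Tally the digit sum of every four-digit combination 0000..9999.
counts = Counter(sum(digits) for digits in product(range(10), repeat=4))

print(counts[18])                              # 670 combinations sum to 18
print(counts[0], counts[36])                   # 1 1 -- only "0000" and "9999"
print(sum(counts[s] for s in range(14, 23)))   # 5520 -- sums 14..22 cover 55% of SSNs
```

So an attacker who simply works down the list of the most likely sums has better than even odds after nine guesses.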

This is the way I would expect the system to be exploited:

Caller: Hi, I’m Jack CeoMan, could you give me my password, please?
Support: Sure, give me the sum of the last four digits in your SSN, and I’ll do it.
Caller: Ummm… don’t have it with me, but I think, let’s see,… 18.
Support: No, that isn’t right.
Caller: How can adding be so difficult… damn it’s 20, unless I’m thinking of my wife’s SSN.
Support: Can’t be yours.
Caller: Oh, damn, wait, could this be it: 16?
Support: nope, you’d better go check your number and call back, sir.
Caller: ok, will do.

So, you can easily guess three times without raising suspicion. Since the organization is large, has a number of support staff and nobody knows anyone else by voice (else they would never have instituted the system, would they?), you can easily call back a couple of hours later and try again with someone else. Do this three times and odds are you’ve got yourself a password.

Too easy, by far.

Moral of the story: don’t invent your own security protocols. You’re bound to make mistakes.

Note: I didn’t calculate the number using the central limit theorem. I took the easy way out and wrote a loop in Delphi. Yes, I’m ashamed of it, but it worked. A very elegant solution was proposed by Earl Fife:

Here is a more mathematical solution relying on C(n,k), the number of
ways of selecting k object from among n distinct objects. Note:
C(n,k)= n!/(k!(n-k)!)

d_1 + d_2 + d_3 + d_4 is the sum of the 4 numbers, hence we are
interested in d_1+…+d_4=18.

The value of each d can be represented by that many 1s, so d=4 can be
viewed as d=1,1,1,1, or more succinctly d=1111. And a 4-digit number
can be viewed as strings of 1s separated by +s, e.g.
2781=11+1111111+11111111+1. Since we are interested in 4-digit numbers
whose digital sum is 18, we have a total of 18 1s and 3 +s. Each
arrangement of them constitutes a unique number.

To select an arrangement of 18 1s and 3 +s, all we need to do is
designate which of the 18+3 positions contain the +s. The rest all get
1s, so there are C(18+3,3) = 1330 of them. This corresponds to the
number of integer solutions to d_1 + … + d_4 = 18, d_i >=0.

I.e., the computation would allow for there to be values of d exceeding
9, so we need to subtract situations in which one of the digits is 10,
or 11, or …, or 18. In the case 10: solve 10 + d_2 + d_3 + d_4 = 18, i.e.
d_2 + d_3 + d_4 = 8 (there are C(8+2,2) solutions), and since the 10 could
have been in any of the d_i’s there are 4C(8+2,2) of them. Repeat for
the case d_i=11, etc.

Final Result
C(18+3,3) - 4(Sum[C(18-i+2,2), 10<=i<=18]) = 1330 - 4(45 + 36 + 28 + 21 + 15 + 10 + 6 + 3 + 1) = 1330 - 4(165) = 670. Well, maybe it is not compact enough to be elegant, but it is more interesting than a loop.
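Fife’s inclusion-exclusion count can be checked directly; a sketch in Python using the standard-library `math.comb`:

```python
from math import comb

# Stars and bars: nonnegative integer solutions to d1+d2+d3+d4 = 18 ...
total = comb(18 + 3, 3)  # C(21,3) = 1330

# ... minus those where one digit is 10..18. (No two digits can both
# exceed 9 when the sum is only 18, so one round of subtraction suffices.)
overcount = 4 * sum(comb(18 - i + 2, 2) for i in range(10, 19))

print(total - overcount)  # 670, matching the brute-force loop
```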

Banks and (in)security

Phishing: setting up a false website, looking more or less like the bank’s site, and getting users to enter their username and password, so that the phisher can then log on to the bank himself and empty the user’s account.

Admittedly, the preceding paragraph took some liberties with the definition of phishing, but I’m discussing just banks now, so let’s leave it at that.

The preceding paragraph also assumed that the bank uses username/password logon, which is not always the case. Depending on how one logs on to the bank, a number of scenarios for phishing develop:

Username and password

These sites are ridiculous. You get the username and password and you can log on and transfer money to your heart’s delight. You can obtain the username and password through a lookalike site, pretending the logon failed and collecting the passwords that way. You get the user to go to your site by sending out email with a link to it, saying something like “your account has been suspended and you must log on again to enable you to keep using it” or some such drivel.

Banks with OTP scratch cards

Some banks issue cards with one-time-passwords (OTP). You have to log in using your username and the next code on the card. You get to see the next code by scratching off the silver layer covering it. In some cases, you need a second code to confirm transactions. In some cases, the transaction codes are not on the same card, or not in the same series, as the logon codes. That’s good, because a fake website that collects logon codes by “failing” to login twice in a row will only collect logon codes and no transaction codes.

Note that if the phishing site connects to the bank in the background, it can allow you to actually log on to the bank and perform all your transactions. Meanwhile, the phisher can modify your transactions, for instance by changing all the account numbers you send money to, to point to his own bank account, or that of an accomplice. You’ll never notice, until people start complaining about not getting paid. This kind of attack is a man-in-the-middle (MITM) attack.

Banks with hardware tokens, time-based

If you log on using a hardware token that changes its (usually) 6-digit number every minute, you don’t risk having someone get your password and use it behind your back. It’s only valid for a minute, after all. (There’s a “window” of a few codes on both sides, so it could be anything between five and ten minutes, actually, but that’s another story.)

But if you’re talking to a MITM, he’s just forwarding the logon screen from the bank to you, and your logon password from you to the bank, so the hardware token hasn’t helped you one bit. The same goes for the signing of the transaction, since it’s done the same way.

Hardware token with challenge/response

A lot of banks are starting to use challenge/response hardware tokens. This is probably the best method so far, but it still stinks. If you have a MITM, again he’s forwarding the bank’s pages to you and your responses back to the bank, so the hardware token makes no difference.

The hardware token is also used to sign the transactions. That is, a number is calculated from your transaction list, including account numbers and amounts, and that number is sent to you. You input that number into your hardware token and calculate a new number, a “signature”, that you send back to the bank. There is no way for the MITM attacker to change your actual transactions without the challenge number changing. Sounds great, right? Except… if your challenge number changes, how would you know? It’s just an eight-digit number, which reflects your transactions according to some undisclosed formula, so you have no way of knowing which transactions you’re actually signing! That’s ridiculous!

One bank I use makes the total amount of the transactions one part of the challenge. That is great! But… the account numbers aren’t part of it. So as long as the MITM doesn’t change the amounts, he can reroute the money wherever he desires.
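A toy illustration of the problem, sketched in Python (the challenge formula here is invented for illustration; the bank’s real formula is undisclosed): if the challenge depends only on the total amount, any rerouting that preserves the amounts produces the same challenge, so the user unknowingly signs the attacker’s version.

```python
import hashlib

def challenge(transactions):
    # Hypothetical challenge derived from the total amount only,
    # as with the bank described above.
    total = sum(amount for _account, amount in transactions)
    return hashlib.sha256(str(total).encode()).hexdigest()[:8]

original = [("12-3456-7", 100.00), ("98-7654-3", 250.00)]
rerouted = [("66-6666-6", 100.00), ("66-6666-6", 250.00)]  # MITM swaps the accounts

print(challenge(original) == challenge(rerouted))  # True -- same challenge, money gone
```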

Pretty Pictures

Some banks have gotten the brightest of ideas: let the user select one or several pictures he likes and present these to him before he logs on next time. A phisher wouldn’t know to show the right pictures, and the user would notice. Except it doesn’t work.

To be able to present these pictures to the user, the bank first needs at least a username. Well, what do you know, even the MITM can give that to the bank. So, the user sends his username to the MITM, who sends it to the bank, who sends the pretty pictures to the MITM, who passes them back to the user.

End result? That the user is now convinced that the MITM phishing site is authentic. The bank told him so, didn’t they?

Machine certificates

Some banks send their users certificates to store on their computers. The bank then has two strategies:

1. Some banks show the pretty pictures after reading the certificate. No certificate, no pretty pictures, and they fall back to other authentication methods.

2. Skandiabanken.se: you can’t log on without the certificate. But, of course, if you don’t have one, you can get one.

What’s my problem with these two methods? First, the fallback method is often a weak point. The other problem is with the certificate itself.

Skandiabanken relied on a machine certificate plus a regular username/password logon until December last year, when they suddenly and without warning sent out OTP scratch cards to everyone and added a one-time code to the login, overnight. They refuse to talk about what went wrong, but it’s obvious that the machine certificate was not enough security. Maybe it can be stolen? Maybe a trojan on the machine can use it to log on? I don’t know. But I do know that Skandiabanken does not trust machine certificates anymore, so neither should you.

So what does work, then?

I have a fair idea. First, forget about logons, they’re not that important. We can secure those with hardware tokens or scratch OTP. We can’t guarantee that there’s no MITM snooping on us, but we can guarantee he’s not stealing our money if we use digital signatures on human-readable bank orders. This is how that could work:

You log on to the bank site and enter all your transactions. At the end, the bank creates an email message for you by opening your mail agent (Outlook Express, Mac Mail, Eudora, Thunderbird, whatever), addressing the mail to itself, and entering as the body of the mail a completely readable list of transactions with account numbers, names, amounts, everything. The user then clicks the “Sign” button in the mail agent, which adds his digital signature to the message, and out it goes. A couple of minutes later, the bank retrieves the message, verifies the signature and executes the orders.

It’s that simple.
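A minimal sketch of the verification step, in Python, with an HMAC over a shared key standing in for the public-key signature a real mail agent would produce (the key, account numbers, names and amounts are all made up for illustration):

```python
import hashlib
import hmac

USER_KEY = b"stand-in for the user's private signing key"

orders = (
    "Pay 1200.00 to account 12-3456-7 (Alice Example)\n"
    "Pay  450.00 to account 98-7654-3 (Bob Example)\n"
)

def sign(message: str) -> str:
    # The user's mail agent signs the human-readable order list.
    return hmac.new(USER_KEY, message.encode(), hashlib.sha256).hexdigest()

def bank_accepts(message: str, signature: str) -> bool:
    # The bank recomputes the signature over the text it actually received.
    return hmac.compare_digest(sign(message), signature)

signature = sign(orders)
print(bank_accepts(orders, signature))               # True -- orders executed
tampered = orders.replace("12-3456-7", "66-6666-6")  # MITM reroutes the money
print(bank_accepts(tampered, signature))             # False -- rejected
```

The crucial point is that the thing being signed is the human-readable order list itself, so the user knows exactly what he is authorizing, and any MITM tampering invalidates the signature.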

The only thing standing in the way of this is getting digital signature software installed on end-users’ machines. And getting the digital keys and certificates set up. This is not rocket science and could easily be achieved if the banks put their minds to it. And if they stopped shrugging it off as something “too difficult”. It isn’t. It’s easy.

Note that it doesn’t matter how many MITM attacks you have in this scenario. The transactions are inviolate. You may need some technology to protect the user’s keys, though, like smart cards. But not even this is very hard to do.

Nothing else that I can see has a remote chance of standing up to MITM attacks.

We’re not home quite yet, though. We have to consider that the signing operation can be compromised. As long as the software that does the signing runs on a computer that can be compromised, so can that software. It may need to be hardened, it may need to use the Trusted Platform stuff, it may have to be taken off-system. But at least we’ve reduced the problem considerably, and by doing that have a fighting chance.

Just sticking more virtual keyboards, pictures, and magic phrases into logon pages isn’t doing anyone any good. It’s nothing but security theater, instilling a false sense of security.

PS: I just noticed that the domain “belovedbank.com” that I used in my example mail is free. Go for it, guys!

Proving You’re Worthy, Online

An often recurring problem online is how to prove you’re eligible to access a particular resource, if that resource is limited to people belonging to a certain group. This problem occurs, more abstractly, if the resource is managed by some organization that is not itself responsible for determining who is eligible. Examples: sites accessible to physicians, but not run by the organization that actually licenses physicians. Or sites accessible to holders of particular certifications, but not run by the certification authority. One of those problem children is the CISSP certification, since there are a number of resources exclusively for its holders.

The CISSP forum is a place where only CISSPs can read and write. ISC2, the organization that issues the CISSP certificate, takes care of registering users on that forum, so that way, the selection of participants is ensured. But there are other resources that are exclusively available to CISSPs, like being in the CISSP LinkedIn group or accessing CISSP specific Wiki sites or archives. These “third party” resources have no easy way to determine who is actually a CISSP and who is not, particularly since ISC2 is certainly not giving just anyone access to their database. Worse, many CISSPs want to register to those third party resources using an email address that is not necessarily known to ISC2, since many of us have multiple email addresses.

The reason I present a solution to this little problem in the form of a protocol is that it is probably of a more general interest. Just try the protocol and you’ll probably see a lot more uses for it.

I set up a fake authority site, which represents ISC2 in the above scenario and another fake site that promises to hand out weekly free beer over email to any bona fide CISSPs who register. (I still have to figure out the actual beer delivery logistics, so I’m not honouring my commitments in this respect.)

To test it out, first go to the “Free Beer” site and follow the instructions. The Free Beer site will redirect you to the “ISC2 site”, where you can get a digitally signed certificate that certifies that you’re a bona fide CISSP, so to speak.

Copy and paste that certificate back into the Free Beer site, have it verified and parsed, and you’re supplied with free beer every week until your CISSP cert expires. Or would be, if any of the above was true, which it isn’t.

You can just as well use the authority site to produce a signed statement that you can paste into an email. The advantage of that is that the receiver’s email program will automatically verify the signature as soon as the email is viewed, if it has the corresponding public key.

Now, go play with it. It’s fun.

I even provide you with the full source for all the pages, which is surprisingly little and simple code. You’ll find links to the source in the pages themselves.

The current implementation assumes you’ve got gpg (or pgp) and php installed on your webhost, but practically every decent webhost has, as far as I know. That’s the only requirement, actually.

Please note: I’m just using ISC2 and the CISSP certificate as an example here. That’s because I think they should use this protocol, but there are lots of others that could use it just as well.

Beware the Rise of the Appliances!

To test out different wikis, I got the obvious idea of downloading VMWare appliances preinstalled with one or the other of those wiki systems. Very easy to get running and easy to test. Once you have them, that is, since most of them are distributed using BitTorrent and many have few, if any, seeds. But then it struck me…

What if there’s malware in one of those appliances? I mean, you’re downloading not just an app or a few apps, you’re downloading an entire operating system, which you then proceed to run in a VM on one of your desktops. Probably inside your private or corporate network. Now, how smart is that?

Assume you try to protect yourself by not allowing the appliance access to your internal net, but give it its own NIC which hooks up to your DMZ segment. Even then, that appliance may run an exploit that can burrow into your host OS, and there’s no way you can detect that. Until it’s too late.

So, what is to be done? Use only a machine that’s not used for anything else? What’s the point of virtualization, then? Or not download any VM appliances at all? That’s tough. Or only download appliances from people you trust? I don’t know anyone that produces appliances like that, yet, so who would that be?

Mac developers with Windows attitudes

We all know by now that Mac users usually run as non-admins on their machines and what a good thing this is. Apps generally ask for admin credentials during install to get their setup done. Great stuff. I have just one (ok, maybe two) apps that don’t handle this right and these have to be installed under an admin account. So I have to copy the installer to /Users/Shared, switch to the admin user, install the app, switch back. A hassle, albeit a small one.

Anyway, I wrote an email to one of the vendors about this, and asked them to change this, see things the Apple Way and do right. This is the reply I got:

“Thanks for your feedback. This is a limitation of the installer software we use. The only alternative we have at present is to force every user to enter their admin password to install, and that seems an unacceptable compromise. We’re sorry for any inconvenience.”

Oh, man… that sounds just like a Windows developer to me. But these guys are Mac developers to the core! How can this be? What is the world coming to?

My reply, if you care:

“No, forcing every user to enter admin password is the exact RIGHT solution. That is what all other Mac software does. Don’t be Windows weenies; those are the ones who can’t handle “limited user accounts”, and see what happens then.

Believe me, you really should ask for admin credentials during install. Please, be a man (or woman) and do the right thing. Fix the install.”

Will they listen? I doubt it. Humanity gets what it deserves. Or deserves what it gets. Whatever.

I’ll go cry in my pillow now.

The MSDN credibility gap

I’ve been a longtime subscriber to MSDN magazine and its predecessor, MS Systems Journal, and I’ve always liked to read their stuff and learn. The last year or so, I haven’t read more than the columns at the very end, the editorial and maybe something by Michael Howard on security or John Robbins on debugging. For some reason, I don’t trust the rest of the mag.

iTunes and your inner human, if any

My wife just asked me if I was thinking of another woman.

Huh?

Seems I was playing “She’s always a woman to me” by Billy Joel for the third or fourth time in a row on the stereo, and she was looking for a meaning to it. Actually, I was testing my new Airport Express that I’d connected to the living room stereo and using my iTunes to send the music stream to it. Each time I tested, I just clicked on one of the first tracks in my list, and that happened to be just that track.

How is that list sorted? Well, it turns out it’s sorted on “Last Played” date and time. There’s also a column with “Play Count”. I very rarely play tracks on my iTunes, preferring to use the iPod, but those counts are updated from the iPod to the iTunes every time I connect the two.

So, where am I going with all this? The music we play, especially if we have large and diverse collections, is often a reflection of the mood we’re in. Since everything we play is registered, that means our mood is registered in a fairly direct way.

From my play counts and dates, it would be very easy to see if I’ve been doing exercise, since I use particular tracks with a good rhythm for that (Country & Western, Natalie Imbruglia, Michael Jackson, Eric Clapton’s rock numbers), or been programming (psychedelic & trance). If I’m down, the selection is different, etc. (If you were having an affair with another woman, I doubt you’d have the iPod on, but that’s beside the point.) Now all this gets registered.

PS: my wife just asked “are you?”, so it’s time to stop now.

Biological comparison nonsense

To me, this business with comparing malware and anti-measures in the IT security world with biological systems and in particular immune systems is nonsense on so many levels. People draw parallels with monoculture versus diversified cultures, and immunizing systems and so on. I say: Bah!

First, biological systems have no designer or design targets, no requirements specs, no whitepapers, no nothing. The only thing they have is a testing department. They also have gobs of time and material at their disposal. The entire evolutionary thing is based on “code monkeys” hacking out random code by the ton, then throwing it out on the “market” and expecting only a random small fraction to succeed.