The Semantics of Signing

When we apply a digital signature to a data structure, we only apply it to the data actually present in the structure. But most of that data is only meaningful in relation to external data tables and to the particular applications that process it, and both of these can change without affecting the signature on the data structure. This is a serious problem in many application areas, but in none more so than in medical informatics.

Digital signatures are usually used to sign textual documents. The simpler the technology behind these documents, the more meaningful the digital signature. For instance, if you sign a plain old ASCII email message, it’s very easy to validate the digital signature and to check what you’re actually saying in the message. You don’t need any particular application to see the signed contents.

But if you sign a Word document, you only need to open it with Microsoft Word and save it again to invalidate the digital signature: the re-saved file will differ enough from the original to make the signature fail. Also, unless you have the same version of Word that the document was written with, you can’t be sure what the data means. It is possible, though admittedly unlikely, that the same file would be rendered as different text by a future version of Word or by another word processor entirely. So, what did you actually sign? You signed a representation of the intended meaning as expressed by a certain version of a certain application, and that is all.

If you sign a bank transaction, you would probably sign a list of account numbers and amounts. In the far future, these account numbers may belong to somebody else or not be traceable at all, making the signature meaningless or, worse, deceptive. But a signature on a bank transaction doesn’t need to hold up for very long. Also, there is no doubt about which application produces and verifies it, so the question of interpretation doesn’t really arise.

But if you sign a medical document, you face a different and more difficult set of requirements. The rest of this article is about signatures on medical documents and the problems associated with them.

Time to live

Bank transactions and emails aren’t usually kept for long. If a digital signature on an email can’t be verified 20 years hence, it’s not such a big deal, assuming you can even find that email 20 years hence. If you can’t verify the signature on a contract 20 years hence, that may also be irrelevant. But if a medical document can’t be read or verified 20 years hence, it may be a real problem. Depending on country and laws, medical documents have to be available for at least 20 years after being created, or possibly until 20 years after the death of the patient. That translates to 100 years or more! But, honestly, whether it is 20 years or 100 doesn’t matter; both time-scales are beyond the horizon of any current technology except the most basic and trivial.

So we have to create digital signatures in a way that has a fighting chance of being usable at least 20 years from now. Or we have to re-sign these documents (add a signature to the signature) whenever technology changes enough to jeopardize the verifiability of the signatures. Such re-signing could be performed by a trusted third party of some kind.
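Below is a minimal sketch of such re-signing in Python. Each new layer signs the document together with all earlier signatures. As a result, a later layer (perhaps made with a newer or stronger hash function) vouches for the older ones even after they can no longer be trusted on their own. The layer format and signer names are hypothetical. HMAC stands in for a real public-key signature purely so the sketch runs on the standard library; HMAC is a shared-key MAC, not a digital signature.

```python
import hashlib
import hmac

# Stand-in for a real public-key signature: HMAC needs a shared secret, so it
# only illustrates the layering, not non-repudiation.
def sign(key: bytes, data: bytes, hash_name: str) -> str:
    return hmac.new(key, data, hash_name).hexdigest()

def covered_bytes(document: bytes, earlier_layers: list) -> bytes:
    """The bytes a layer signs: the document plus every earlier signature."""
    return document + "".join(layer["sig"] for layer in earlier_layers).encode()

def reseal(document: bytes, layers: list, signer: str, keys: dict, hash_name: str) -> list:
    """Return a new layer list with one more signature on top (hypothetical format)."""
    sig = sign(keys[signer], covered_bytes(document, layers), hash_name)
    return layers + [{"signer": signer, "hash": hash_name, "sig": sig}]

def verify_chain(document: bytes, layers: list, keys: dict) -> bool:
    """Check every layer against the document and the layers beneath it."""
    for i, layer in enumerate(layers):
        expected = sign(keys[layer["signer"]], covered_bytes(document, layers[:i]), layer["hash"])
        if not hmac.compare_digest(expected, layer["sig"]):
            return False
    return True

# Usage: the original signer, then a hypothetical archive re-signing later
# with SHA-512 instead of SHA-256.
keys = {"hospital": b"hospital-secret", "archive": b"archive-secret"}
report = b"Discharge summary, patient 123"
layers = reseal(report, [], "hospital", keys, "sha256")
layers = reseal(report, layers, "archive", keys, "sha512")
print(verify_chain(report, layers, keys))  # True
```

Because each layer covers the signatures beneath it, tampering with the document or with any earlier layer makes the chain fail.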

External code tables and concepts

Medical documents are exceptionally dependent on external context such as code tables. Patients are often identified by a national number or a hospital registration code, which have no meaning without a table to look them up in. Lab tests are identified by codes. Doctors and other staff can be identified by board license numbers. Documents refer to each other by codes and numbers. When you sign a document containing all this, you’re actually signing a bunch of codes and numbers that mean nothing without the external documents and tables they refer to.

This means that you have to include all the external tables needed for correct interpretation in the document, or at least in the signature on the document. But since those external documents and tables in turn depend on other external documents and tables, this gets out of hand very quickly.

One way of avoiding this explosion in required attachments is to include only checksums or signatures of the external tables. That is, each document contains a list of the tables and documents it refers to, with a checksum for each of them. Each code you use then refers to an entry in that list of external references. All of this should be part of the information in the document that is digitally signed. The drawback is that if even one of those tables can’t be located, the validity of the signed document comes into question. The advantage, however, is that the signature itself still verifies, and if the missing table isn’t crucial to the meaning of the document, the rest of the document can still be trusted.
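Here is a minimal sketch of this scheme in Python. It assumes a JSON document and SHA-256 checksums; both are choices made for illustration, not part of any standard. The references section is part of the signed bytes and maps each external table to its checksum. A verifier can later classify each reference as present and unchanged, missing, or changed, independently of whether the document’s own signature verifies. The table names and codes are hypothetical.

```python
import hashlib
import json

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_document(body: dict, tables: dict) -> bytes:
    """Serialize the body plus a checksum for every external table it relies on.
    The returned bytes are what would be signed."""
    references = {name: checksum(content) for name, content in tables.items()}
    doc = {"body": body, "references": references}
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()

def check_references(signed: bytes, available: dict) -> dict:
    """Classify each referenced table as 'ok', 'missing' or 'changed'."""
    references = json.loads(signed)["references"]
    status = {}
    for name, expected in references.items():
        if name not in available:
            status[name] = "missing"
        elif checksum(available[name]) != expected:
            status[name] = "changed"
        else:
            status[name] = "ok"
    return status

# Usage with two hypothetical code tables; each code in the body names the
# table it comes from.
tables = {"lab-codes": b"GLU;Glucose\n", "drug-codes": b"A01;Aspirin\n"}
signed = build_document({"tests": [["lab-codes", "GLU"]], "drugs": [["drug-codes", "A01"]]}, tables)
print(check_references(signed, {"lab-codes": tables["lab-codes"]}))
# {'drug-codes': 'missing', 'lab-codes': 'ok'}
```

A missing drug table leaves the lab results checkable, which is the partial trust the text describes.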

But if we refer to external tables, how do we identify them? Including a URL won’t help much 20 years into the future. One possibility is to include only a checksum of the table and leave it up to our successors to find some way of locating it. Another possibility is to attach every table you need to the document in its entirety, but as noted above, this could get huge in no time.
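If only a checksum is recorded, a successor who has found some archive of old tables can at least recognize the right one. Below is a minimal sketch in Python. It assumes the candidate tables are available as byte strings and that the recorded checksum is SHA-256. How the candidates are found is left open, as in the text.

```python
import hashlib
from typing import Iterable, Optional

def index_by_checksum(candidates: Iterable[bytes]) -> dict:
    """Map the SHA-256 hex digest of each candidate table to its contents."""
    return {hashlib.sha256(blob).hexdigest(): blob for blob in candidates}

def locate(recorded_checksum: str, index: dict) -> Optional[bytes]:
    """Return the table whose contents match the recorded checksum, if any.
    File names, URLs and dates play no part: only the contents count."""
    return index.get(recorded_checksum)

# Usage: three versions of a hypothetical code table turn up in an archive;
# only the one the document was signed against matches.
archive = [b"GLU;Glucose\n", b"GLU;Glucose (fasting)\n", b"GLU;Glutamine\n"]
recorded = hashlib.sha256(b"GLU;Glucose\n").hexdigest()
print(locate(recorded, index_by_checksum(archive)))  # b'GLU;Glucose\n'
```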

The Application Problem

Even if we found a solution to the problem of external code tables, we would still have the problem of interpretation inherent in applications. No data structure is entirely self-describing, and most applications produce data that is hardly self-describing at all. For example, if a record is deleted by setting its first byte to zero, it is only deleted in the eyes of an application that treats this zero byte as a deletion flag. If that record is read by another application, or by a later version of the same application, it may not be seen as deleted at all.

If you verify the digital signature on such a data structure, you can validly claim that the deleted record in it is either deleted or not deleted, depending entirely on the exact application and version you use to render the data into human-readable information.
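To make this concrete, here is a small Python sketch of two hypothetical readers of the same bytes. The record layout is invented for illustration: fixed 8-byte records, where a zero first byte means the record is deleted. Both readers see exactly the same bytes, so the same checksum or signature verifies for both, yet they disagree about what the data says.

```python
import hashlib

RECORD_SIZE = 8  # hypothetical fixed-size record layout

# Three medication records; the second was "deleted" by zeroing its first byte.
data = b"Aspirin\x00" + b"\x00arfarin" + b"Insulin\x00"

def records(raw: bytes) -> list:
    return [raw[i:i + RECORD_SIZE] for i in range(0, len(raw), RECORD_SIZE)]

def read_v1(raw: bytes) -> list:
    """Reader that treats a leading zero byte as a deletion flag."""
    return [r.rstrip(b"\x00").decode() for r in records(raw) if r[0] != 0]

def read_v2(raw: bytes) -> list:
    """Reader that knows nothing about the flag and only strips zero padding."""
    return [r.strip(b"\x00").decode() for r in records(raw)]

print(hashlib.sha256(data).hexdigest())  # one digest, whichever reader is used
print(read_v1(data))  # ['Aspirin', 'Insulin']
print(read_v2(data))  # ['Aspirin', 'arfarin', 'Insulin']
```

The "deleted" record reappears, slightly damaged, as soon as the second reader is used. Nothing in the signed bytes says which reading is the intended one.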

What this means is that you have to include the application in the signed block somehow: perhaps as a checksum on the application code, perhaps by including the application code itself in the signed block. But what value will this application have 20 years hence? There won’t be a machine that can run it, I’m sure. You won’t be able to reverse engineer it either. So there you are, with a verifiable signature on a block of data whose meaning you can’t determine.

A worse problem is that you risk not being able to recover medical records at all 20 years hence, unless you take extreme care to maintain old applications, or keep upgrading the data as applications evolve, adding new signatures to the old ones as the old signatures stop validating.

Solutions, then?

After making you miserable with all the problems digital signatures present, I owe you at least some indication of solutions. To appreciate the solution space, we have to separate two different uses of digital signatures:

  1. To allow receiving applications to verify the origin and integrity of the data
  2. To allow a human to sign a meaningful document, and others to verify in human-readable form what the original signer meant to say

I don’t see a way of accomplishing both objectives with a single technique, so I propose implementing two signature schemes and using them in parallel.

Application signatures

Applications need to apply a signature on data to make it possible for other applications to trace the origin of the data and to verify its integrity. In principle, the applied signatures should belong to the application producing the data, not to the person using the application, since that person has no direct control over what the application puts into the data or even what that data actually means. I think you should view this as the application certifying that the given form of the data is its interpretation of what the user wants it to output. The data should also identify the user, or the other application or service, that ordered the application to perform the operation and produce the data. Exactly how that user or preceding service is identified necessarily depends on the application itself.
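Below is a minimal sketch of such an application signature in Python. The envelope fields (application, version, requested_by, payload) are hypothetical. HMAC with an application-held key stands in for a real public-key signature, only so the example runs on the standard library. The point is what gets covered: the application’s own identity and the party that requested the operation are signed together with the payload.

```python
import hashlib
import hmac
import json

def canonical(obj) -> bytes:
    """Deterministic JSON encoding, so signer and verifier hash identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def seal(payload: dict, app_id: str, app_version: str, requested_by: str, app_key: bytes) -> dict:
    """Wrap the payload in an envelope signed by the application, not the user.
    HMAC stands in for the application's private-key signature."""
    body = {
        "application": app_id,
        "version": app_version,
        "requested_by": requested_by,  # user or upstream service; format is app-specific
        "payload": payload,
    }
    tag = hmac.new(app_key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": tag}

def verify(envelope: dict, app_key: bytes) -> bool:
    """Recompute the tag over the canonical body and compare in constant time."""
    expected = hmac.new(app_key, canonical(envelope["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])

# Usage with hypothetical identifiers.
key = b"lab-system-secret"
env = seal({"test": "GLU", "value": 5.4, "unit": "mmol/L"}, "example-lab-system", "4.2", "user:dr-jones", key)
print(verify(env, key))  # True
```

Changing who is recorded as the requester breaks the signature just as surely as changing the payload.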

These application signatures will almost certainly not be useful in the far future, but that doesn’t matter all that much. The intended receiving applications probably won’t exist anymore, anyway.

Human readable representation signatures

When a user causes an application to produce a document or data structure, the application must at the same time produce a bitmap representation of what the user actually intends to create and sign: a picture of a simulated paper document with the intended content. For instance, if the user creates a prescription for some drug, the bitmap should show a classic prescription form with all names and dosages written out in full. This is the bitmap that the user signs. No application will use this bitmap, but it is the only representation of the entire transaction that we can expect the user to stand behind, because it is the only one the user can understand directly.

Yes, I hear your protests: you still need an application to turn that bitmap back into human-readable form. But such an application is easy to verify as correct, because it doesn’t interpret the data in the light of any volatile external document or table.
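To give an idea of how small such a rendering application can be, here is a Python sketch of a complete renderer for one very simple bitmap format. The format is plain (P1) PBM from the Netpbm family, and the renderer draws black pixels as '#' and white as '.'. Choosing PBM is an assumption made for illustration; the text doesn’t prescribe a format, and a real system would more likely use a compressed one. Nothing in the renderer consults a code table or any other external source.

```python
def render_pbm_p1(data: bytes) -> list:
    """Render a plain PBM (P1) image as text rows: '#' = black (1), '.' = white (0)."""
    # Drop comments, which run from '#' to the end of the line.
    lines = [line.split("#", 1)[0] for line in data.decode("ascii").splitlines()]
    tokens = " ".join(lines).split()
    if len(tokens) < 3 or tokens[0] != "P1":
        raise ValueError("not a plain PBM (P1) image")
    width, height = int(tokens[1]), int(tokens[2])
    # Whitespace in the raster is ignored, so read the bits character by character.
    bits = "".join(tokens[3:])
    if any(c not in "01" for c in bits):
        raise ValueError("raster contains characters other than 0 and 1")
    if len(bits) != width * height:
        raise ValueError("raster does not match the declared size")
    return [
        "".join("#" if b == "1" else "." for b in bits[y * width:(y + 1) * width])
        for y in range(height)
    ]

# Usage: a 3x2 test image.
print("\n".join(render_pbm_p1(b"P1\n# 3x2 test image\n3 2\n1 0 1\n0 1 0\n")))
```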

This solution assumes the continued existence of the digital signature algorithm used. I have no idea how to eliminate this assumption, but if you have one, please let me know.

The solution also depends on the existence of a trustworthy application for signing the bitmap. But since this application can be used for any purpose where a bitmap needs to be signed, it isn’t too far-fetched to envision it as a trusted part of the operating system, so I don’t see this as a problem. It is also simple to check the signatures it creates on the bitmaps, which keeps that application honest.

The question remains: what do we do with the signed bitmap? I propose to keep it attached to the data structure wherever it goes and through all transformations. At a different level, the bitmap and signature themselves may be replaced by an identifier that allows them to be located, but that is a separate matter and outside the scope of this discussion.
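Below is a minimal sketch in Python of keeping the signed rendition attached while the machine-readable part is converted. The record layout and the conversion are hypothetical. The bitmap is a tiny plain PBM image standing in for a rendered prescription. HMAC with a user key stands in for the user’s real public-key signature. Whatever happens to the data part, the rendition and its signature travel unchanged and still verify.

```python
import hashlib
import hmac

def sign_rendition(bitmap: bytes, user_key: bytes) -> dict:
    """Attach the user's signature to the rendered image (HMAC as a stand-in)."""
    return {
        "format": "image/x-portable-bitmap",
        "bitmap": bitmap,
        "signature": hmac.new(user_key, bitmap, hashlib.sha256).hexdigest(),
    }

def verify_rendition(rendition: dict, user_key: bytes) -> bool:
    """Check the signature on the image alone; the data part plays no role."""
    expected = hmac.new(user_key, rendition["bitmap"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, rendition["signature"])

def transform(record: dict, convert) -> dict:
    """Convert the data part only; the signed rendition is carried over untouched."""
    return {"data": convert(record["data"]), "rendition": record["rendition"]}

# Usage: a hypothetical prescription record migrated to a new data format.
user_key = b"dr-jones-secret"
bitmap = b"P1\n3 2\n1 0 1\n0 1 0\n"
record = {"data": {"drug": "A01", "dose_mg": 100}, "rendition": sign_rendition(bitmap, user_key)}
migrated = transform(record, lambda d: {"medication_code": d["drug"], "dose": f"{d['dose_mg']} mg"})
print(migrated["data"])                                   # {'medication_code': 'A01', 'dose': '100 mg'}
print(verify_rendition(migrated["rendition"], user_key))  # True
```

This design keeps verification of what the user signed independent of every conversion the data undergoes afterwards.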

Conclusion

The message I’m trying to get across is that there is no point in standardizing exactly which algorithms and methods you will use to sign information in general, and medical information in particular, if you don’t spend enough time figuring out what you are actually going to sign. I can’t emphasize this enough, so here I go repeating it:

The signed block of data needs to make sense:

  • on its own, without other external documents or tables available
  • regardless of the computing environment
  • at least 20 years into the future
  • to a human being