learning: encrypting text – Musings on Languages, IT and other stuff

A while back, one of my friends on Twitter introduced me to NaNoGenMo, and late at night I started thinking about what could be done with such a project. As they cited 50,000 meows as an example of a possible successful project, I decided that a novel consisting of 50,000 words making absolutely no sense whatsoever was a possibility. The plan was to find a base text of 1,000 words and build an encryption algorithm that would generate 50 different encrypted versions of it, and that would be it. And I would write the encrypted text generator in Python.

The primary reason I do stuff like this is to learn, and very often you wind up learning more about how you look at a problem than about whatever programming language you use for the project. More often than not, you’ll find a little bit of functionality that you didn’t know existed. And if you are really lucky, looking at stuff like this opens doors for you to look at things in more detail.

I am not an encryption specialist (yet), so effectively I wanted to find some way of turning cleartext into something obviously encrypted, but without it being too easy to decode immediately. The angle of attack I specifically wanted to block was frequency analysis. (It’s at least 10 years since I looked at encryption techniques, but I remember some of it.)

So I looked at building an algorithm which amounted to reducing the number of output letters, but in a random manner. Each individual run of the encryption algorithm generated a random number, which was the total number of letters that could be used in encrypting the cleartext, and then, for each cleartext letter, generated another random number which represented the letter it would map to. I also generated a key recording the mapping of original letter to encrypted letter, performed some minimal hiding of reality, and ran the algorithm against some text. It worked beautifully and, what’s more, you really couldn’t tell from the output what had happened to the text.
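As a rough illustration of the idea described above, here is a minimal sketch. It is not the code from the project: the names `make_key` and `encrypt`, the pool-size bounds of 5 to 15 letters and the sample sentence are all assumptions made for the example, and the "minimal hiding of reality" step is left out.

```python
import random
import string

ALPHABET = string.ascii_lowercase
CLEAR = "the quick brown fox"  # hypothetical sample text


def make_key(rng, min_pool=5, max_pool=15):
    """Map each of the 26 letters onto a smaller, randomly chosen pool.

    The bounds on the pool size are assumptions for this sketch.
    """
    pool_size = rng.randint(min_pool, max_pool)   # how many output letters are allowed
    pool = rng.sample(ALPHABET, pool_size)        # which letters they are
    # Several clear letters will usually share one cipher letter.
    return {letter: rng.choice(pool) for letter in ALPHABET}


def encrypt(text, key):
    """Replace every letter via the key; anything else passes through unchanged."""
    return "".join(key.get(ch, ch) for ch in text.lower())


rng = random.Random(2014)                  # seeded so the run is repeatable
keys = [make_key(rng) for _ in range(3)]   # one fresh key per encrypted version
versions = [encrypt(CLEAR, k) for k in keys]

for version in versions:
    print(version)
```

Each call to `make_key` gives a different mapping, so the same base text comes out differently every time, which is what the 50 versions of the novel would need.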

Where I ran into a problem was in decrypting each piece of cipher text. When you reduce the number of available letters, several cleartext letters end up sharing the same cipher letter, so the mapping cannot simply be reversed. Working back from the smaller group of letters becomes a hard problem, particularly if you only have one example of the algorithm's output. I could not actually produce a piece of code that immediately decrypted any individual piece of cipher text. The key generated by the encryption algorithm was a one-way key. I have been pondering this problem in the meantime, and I’ve concluded that the algorithm may yet be breakable if you have several encrypted examples of the same text, plus the matching keys, and some willingness to mess about working out the different mappings. In short, while it’s relatively straightforward to encrypt the text, you need many examples of the algorithm generating different keys, plus the associated encrypted texts, to break back in. I have not yet implemented this but I will look at it as a later problem.
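One way the "several examples plus matching keys" idea could work is sketched below. This is a guess at an approach, not something from the project: for each position in the text, every key tells you which cleartext letters could have produced the cipher letter, and intersecting those sets across the samples narrows the possibilities. The helper names, the pool size of 8 and the test word are all hypothetical.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def make_key(rng, pool_size):
    """Many-to-one key: 26 clear letters onto pool_size cipher letters."""
    pool = rng.sample(ALPHABET, pool_size)
    return {letter: rng.choice(pool) for letter in ALPHABET}


def encrypt(text, key):
    return "".join(key.get(ch, ch) for ch in text.lower())


def preimages(key):
    """Turn a key around: cipher letter -> set of clear letters that map to it."""
    inverse = {}
    for clear_letter, cipher_letter in key.items():
        inverse.setdefault(cipher_letter, set()).add(clear_letter)
    return inverse


def candidates(cipher_texts, keys):
    """Per position, intersect the candidate sets given by each (text, key) pair."""
    inverses = [preimages(k) for k in keys]
    result = []
    for column in zip(*cipher_texts):
        if column[0] not in ALPHABET:      # spaces and punctuation were never encrypted
            result.append({column[0]})
            continue
        possible = set(ALPHABET)
        for ch, inverse in zip(column, inverses):
            possible &= inverse.get(ch, set())
        result.append(possible)
    return result


rng = random.Random(42)
clear = "cryptography"                      # hypothetical test word
keys = [make_key(rng, pool_size=8) for _ in range(6)]
ciphers = [encrypt(clear, k) for k in keys]

one_sample = candidates(ciphers[:1], keys[:1])
six_samples = candidates(ciphers, keys)
print([len(s) for s in one_sample])    # candidates per position from a single sample
print([len(s) for s in six_samples])   # candidates per position from all six samples
```

With a single sample, most positions still have several possible letters. Adding samples can only shrink each set, and the true letter always stays in it, but nothing guarantees every position gets down to exactly one candidate.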

In the meantime, key learning outcomes from this exercise:

Encryption algorithms are easier to design than encryption with matching decryption.
Python has a useful string translate function which I was happy to find. I have used something similar in assembler programming for changing encoding between different character sets.
After all this, when I started reading up on cipher algorithms again, I discovered that there is a pycipher library which implements a bunch of standard cipher algorithms. Even so, its existence does not mean I won’t, at some stage, have a look at implementing one of the Enigma or, possibly, one of the Lorenz ciphers just for the hell of it.
I want to read up on cryptography again. It’s been too long.
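For anyone who has not met the translate function mentioned above, here is a small sketch using `str.maketrans` and `str.translate`. To contrast with the one-way key, it uses a plain one-to-one substitution (a shuffled alphabet), which is an assumption for the example rather than what the project does; because no two letters share a cipher letter, a second table reverses it exactly.

```python
import random
import string

alphabet = string.ascii_lowercase

# A one-to-one key: the same 26 letters in a shuffled order.
letters = list(alphabet)
random.Random(7).shuffle(letters)
shuffled = "".join(letters)

# maketrans builds the lookup table; translate applies it in one pass.
encrypt_table = str.maketrans(alphabet, shuffled)
decrypt_table = str.maketrans(shuffled, alphabet)   # simply swap the arguments

clear = "it has been too long"
cipher = clear.translate(encrypt_table)
recovered = cipher.translate(decrypt_table)

print(cipher)
print(recovered)
```

Characters that are not in the table, such as spaces, are left alone. Of course a one-to-one substitution like this is exactly the kind of thing frequency analysis breaks, which is the trade-off the reduced-alphabet approach was trying to get around.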

The GitHub page for the project is here.