Code Reviews

This piece on code reviews landed in my email via an O’Reilly newsletter this morning.

I’ve posted a brief response to it but I wanted to discuss it a little further here. One of the core issues with some code reviews is that they focus on optics rather than depth. How does this code look?

There are some valid reasons for having cosmetic requirements in place. Variable names should be meaningful, but in this day and age, that doesn’t mean they also have to be limited to an arbitrary number of characters. If someone wants to be a twerp about it, they will find a way of being a twerp about it no matter what rules you put in place.

However, the core purpose of a code review should be to understand what a particular bit of code is doing and whether it does it in the safest way possible. If you're hung up on the number of tab spaces, you may well miss that. If you wind up with code that looks wonderful on the outside but is a 20 carat mess on the inside, well… your code review isn't helping you understand what the code is doing, and it isn't telling you whether it is safe.

So what I would tend to recommend, where bureaucratically possible, is that before any code reviewing is done, the coding standards themselves are reviewed to check that they are fit for purpose. Often, they are not.

It won’t matter how you review code if the framework for catching issues just isn’t there.

Git and open source, the victory or not

During the week, Wired published a piece under the title Github’s Top Coding Languages Show Open Source Has Won.

This is basically – and I am being diplomatic here – not what Github’s Top Coding Languages shows.

Fundamentally, for Github to show this, every piece of operational code would have to be on Github. It isn’t. I’d be willing to bet less than half of it is, and probably less than a quarter, but that’s a finger in the air guess. Most companies don’t have their code on Github.

What Github’s top ten coding languages show is that these are the ten most popular languages among people who post code to Github. Nothing more and nothing less.

I suspect Github know this. I really wonder why Wired does not.

Visual Studio Code

It might be possible to have too many code editors; I have a good few anyway. But MS have launched a new one, which is cross-platform (Windows, Mac and Linux) and which, at first sight, looks quite interesting. It is currently a preview release.

It reminds me vaguely of Brackets; I'm still not sure whether it will replace Sublime or Notepad++ in my heart. I like some of its navigation features, though.

Learning: encrypting text

A while back, one of my friends on twitter introduced me to NaNoGenMo and, late at night, I started thinking about what could be done with such a project. Since they cited 50,000 meows as an example of a possible successful project, I decided that a novel consisting of 50,000 words making absolutely no sense whatsoever was a possibility: find a base text of 1,000 words, build an encryption algorithm that would generate 50 different encrypted versions of the text, and that would be it. And I would write the encrypted text generator in Python.

The primary reason I do stuff like this is to learn, and very often you wind up learning more about how you look at a problem than about whatever programming language you use for projects like this. More often than not, you'll find a little bit of functionality that you didn't know existed. And if you are really lucky, looking at stuff like this opens doors for you to look at things in more detail.

I am not an encryption specialist (yet), so effectively I wanted to find some way of turning cleartext into something obviously encrypted, but without it being too easy to immediately decode. The angle of attack I specifically wanted to block was frequency analysis. (It's at least 10 years since I looked at encryption techniques, but I remember some of it.)
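
To put a shape on that: frequency analysis simply counts how often each letter appears in the ciphertext and compares the counts with the known letter frequencies of the language, so a straightforward one-to-one substitution falls very quickly. A rough sketch of the counting step in Python (the function here is mine, for illustration, not part of the project):

    from collections import Counter

    def letter_frequencies(text):
        # Count only the letters, ignoring case - the raw material of a frequency attack.
        letters = [ch for ch in text.lower() if ch.isalpha()]
        total = len(letters)
        return {ch: count / total for ch, count in Counter(letters).most_common()}

    # In English text, 'e' and 't' dominate; a one-to-one substitution keeps that shape,
    # so matching ciphertext frequencies against known language frequencies breaks it.
    print(letter_frequencies("the quick brown fox jumps over the lazy dog"))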

So I looked at building an algorithm which amounted to reducing the number of output letters, but in a random manner. Each individual run of the encryption algorithm generated a random number, which was the total number of letters that could be used in encrypting the cleartext, and then, for each cleartext letter, generated a random number representing which of those letters it would map to. I also generated a key of the mapping from original letter to encrypted letter, performed some minimal hiding of reality, and ran the algorithm against some text. It worked beautifully and, what's more, you couldn't see what had happened to the text.
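
A minimal sketch of that idea, assuming lowercase text only; the names and the key format are mine, reconstructed from the description above rather than taken from the project's code:

    import random
    import string

    def make_key(alphabet=string.ascii_lowercase, minimum=10):
        # Each run picks a random, reduced output alphabet...
        size = random.randint(minimum, len(alphabet) - 1)
        outputs = random.sample(alphabet, size)
        # ...and maps every cleartext letter to a randomly chosen output letter.
        # Several cleartext letters can land on the same output letter.
        return {letter: random.choice(outputs) for letter in alphabet}

    def encrypt(cleartext, key):
        # Letters are substituted; anything else (spaces, punctuation) passes through.
        return "".join(key.get(ch, ch) for ch in cleartext.lower())

    key = make_key()
    print(encrypt("fifty thousand words of nonsense", key))

Because the mapping is many-to-one, the key on its own cannot take you back to a unique cleartext, which is exactly the decryption problem described below.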

Where I ran into a problem was in decrypting each piece of ciphertext. When you reduce the number of available letters like this, regenerating the mapping from a smaller group of letters becomes a difficult problem, particularly if you only have one example of the encrypted output. I could not actually produce a piece of code that immediately decrypted any individual piece of ciphertext: the key generated by the encryption algorithm was a one-way key. I have been pondering this problem in the meantime, and I've concluded that the algorithm may yet be breakable if you have several examples of the same text encrypted, plus the matching keys and some willingness to mess around calculating the different encryption dimensions. In short, while it's relatively straightforward to encrypt the text, you need many examples of the algorithm generating different keys, plus the associated encrypted texts, to break back in. I have not yet implemented this, but I will look at it as a later problem.

In the meantime, key learning outcomes from this exercise:

  • encryption algorithms are easier to design than encryption with matching decryption
  • Python has a useful string translate function (str.translate) which I was happy to find; there's a short example after this list. I have used something similar in assembler programming for changing encoding between different character sets.
  • after all this, when I started reading up on cipher algorithms again, I discovered that there is a pycipher library which implements a bunch of standard cipher algorithms. Even so, its existence does not mean I won’t, at some stage, have a look at implementing one of the Enigma ciphers or, possibly, one of the Lorenz ciphers just for the hell of it.
  • I want to read up on cryptography again. It’s been too long.
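
As a quick illustration of the translate point above, str.maketrans builds a character-to-character mapping table and str.translate applies it in one call; the substitution alphabet here is an arbitrary example of mine, not the project's:

    # Build a translation table mapping one alphabet onto another, then apply it.
    table = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                          "qwertyuiopasdfghjklzxcvbnm")
    print("hello world".translate(table))  # -> itssg vgksr (unmapped characters pass through)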

The github page for the project is here.

Learning programming before going to university

Ryan Walmsley has a piece suggesting you shouldn’t learn programming before going to university. It’s worth a read.

Personally, I am not against people learning to code before they get to university. I am, however, not in favour of people who have no coding skills arriving at university and starting with Scratch. Scratch is a superb tool for teaching kids how to program, and a bit about how computers work. It is not, in my view, a suitable tool for adults on a specialist computing course. While I am not the biggest fan of Java (disclaimer: I have yet to review lambdas in Java 8 and this may make some of my frustration go away), and I recognise that some people have issues with the lack of static typing in Python, ultimately, once you get as far as university, you should at least start with tools you have a fighting chance of using in the income-earning world. And there are a lot of them. Not in the top ten is Scratch.

Like a lot of things, tools need to be used appropriately, and Scratch is an absolute winner in the sector it was designed for. But I have a book on my desk here that teaches kids how to program in Python, and if kids can do that, I see no reason why we need kid-level languages like Scratch at university level.

Mathematica on the Raspberry Pi

Seriously, I have a scary to-do list, but I finally got around to having a go with this the other day. It is very, very nice. If you're leaning towards a RaspPi and are interested in symbolic programming, it's a pretty good place to start. Worth remembering that a RaspPi is not scary fast (i.e., Mathematica on it is not hugely fast), but it comes across as something that a) is nice to work with and b) I will probably license on a bigger machine at some point.

Extraordinary claims require extraordinary evidence

Since sometime yesterday evening, my twitter feed has been lit up with claims that a computer has passed the Turing test for the first time. These claims have their roots in this press release from Reading University.

The details of the test and how it was carried out are thin on the ground. We do know from Reading’s press release that one of the judges was an actor and that in 33% of cases, the computer could not be distinguished from a human.

I have a couple of key questions.

  1. What language did the humans interact with the bot in? This is important because the bot is presented as a 13-year-old boy from Ukraine. If the interaction was in English then, for me, all bets are off.
  2. Where is the peer reviewed paper?

The Turing Test is, in many respects, iconic. If someone claims to have passed it, a press release is nowhere near adequate to support that claim. We need to know a lot more about the system concerned, how it works and how the test was conducted.