November 2014 – Musings on Languages, IT and other stuff

useful: password encryption

I’m just stashing a couple of pieces on password storage, security and encryption here which Pinterest Engineering posted in the last few days:

Part 1

Part 2

One of the problems I am looking at solving is password replacement. Pretty sure a lot of other people are doing so as well but…

Navigation stuff and research

I scribbled a quick overview of why I am interested in pre-war navigation aids here. It’s part of my ongoing research interests so stuff may or may not turn up as I get more information feeding that.

In other news, I have a bunch of other things which interest me so they will probably generate some stuff under the research interests link.

If you look at my CV or my About pages you will see that in 1998 I spent some time in Finland learning Finnish. It was four intense weeks of Finnish through the medium of Finnish, supported by the nice people in the Finnish Government who, at the time, funded two people to go to study at summer school in Rauma each year. I am not sure if they still do it and I don’t even know now how I found out about it. I was a temporary agent in Brussels at the time but I got the scholarship on the basis of being Irish.

Anyway, four weeks of intense study will only take you so far, and I did some follow-up work while I was still living in Belgium, but life took an interesting turn into computer programming and somehow, this habit of acquiring bits and pieces of human languages got set aside in favour of speaking directly to computers. I wrote assembler for a long time. And I learned Java, and I made projects happen with JavaScript, Python and R. And SQL.

I did a postgrad in computer science last year which finished up around September and usually, when I do something like that, I follow it up with some completely different activity. Instead of it being some form of craft work (my last postgrad was followed up by a stained glass and mosaics design course), I decided to pick up Finnish again.

Finnish is an interesting language in a lot of respects. It is a branch of the Finno-Ugric languages which are generally thought to be unrelated to the Indo-European languages (although I believe there is some research in philology questioning this). Unlike most languages, it has an entirely logical spelling system with no irregularities. It has some grammatical oddities and structurally, it has some serious idiosyncrasies. Above all, it is a highly compacted language. I remember some of the very basic stuff, but I have more or less forgotten the verb and noun rules.

Being prudent, I have picked up the books I bought 15 years ago to study Finnish, mainly because I know they are good books, and also because getting decent dictionaries seems to be harder now than it was then. And this includes going through the Akateeminen Kirjakauppa book store online. The other thing I am doing, which is linked to some rather traumatic memories involving German, is reading a news story a day with the aid of a dictionary. This is massively challenging for a couple of reasons: having forgotten the verb and noun rules, I can find it impossible to identify the root forms of either, and of course, it just takes a long time. I do, however, believe it is one of the more effective ways of broadening your vocabulary. It’s just not that easy.

In terms of language acquisition, some things are much easier now. I am not really going to talk about Duolingo (I have doubts about it as a learning tool for me anyway, and I don’t believe it offers Finnish), but about the simple availability of media. Even for Finnish, there is a substantial amount of material available through YouTube videos, for example. There are a number of radio stations available via TuneIn. YLE, the state broadcasting service, does a special Easy Finnish news report which is what I use as raw material for my reading exercises. At the weekend, I think I will be able to watch ski jumping and ice skating in Finnish.

When I was learning French, to get any media at all, I used to hide in a car which had a long wave receiver. At this point, it doesn’t massively matter how much I understand, only that the amount I am understanding is growing on an ongoing basis. My passive vocabulary will grow much more quickly than my active vocabulary and this is not all that surprising since this morning’s news story was about unemployment and part time work.

The interesting thing, from my point of view, is how much hard work goes into language acquisition. Being absolutely honest, it is harder work than learning programming languages.

And yet, in certain respects, it is very rewarding. One of the interesting things about Finnish is how sentences are structured and how that might suggest a completely different way of looking at the world. I find it fascinating purely from that point of view, never mind being able to converse with people in Stockmann when I go shopping there. In a lot of ways, I am really sorry I set it aside for so long. I am having fun with this.

learning: encrypting text

A while back, one of my friends on Twitter introduced me to NaNoGenMo and, late at night, I started thinking about what could be done with such a project. As they cited the possibility of 50,000 meows as an example of a possible successful project, I decided that a novel consisting of 50,000 different words making absolutely no sense whatsoever was a possibility. The plan was to find a base text of 1000 words and build an encryption algorithm that would generate 50 different encrypted versions of the text, and that would be it. And I would write the encrypted text generator in Python.

The primary reason I do stuff like this is for learning reasons and very often, you wind up learning more about how you look at a problem, rather than about whatever programming language you use for projects like this. More often than not, you’ll find a little bit of functionality that you didn’t know existed. And if you are really lucky, looking at stuff like this opens doors for you to look at things in more detail.

I am not an encryption specialist (yet) so effectively, I wanted to find some way of turning cleartext into something obviously encrypted but without it being too easy to immediately decode. The angle of attack I specifically wanted to block was frequency analysis. (It’s 10 years at least since I looked at encryption techniques but I remember some of it).

So I looked at building an algorithm which amounted to reducing the number of output letters, but in a random manner. Each individual run of the encryption algorithm generated a random number, which was the total number of letters that could be used in encrypting the cleartext, and then, for each cleartext letter, a further random number which determined which of those letters it would map to. I also generated a key of the mapping of original letter to encrypted letter, performed some minimal hiding of reality, and ran the algorithm against some text. It worked beautifully and what’s more, it was very obvious that you couldn’t see what had happened to the text.
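A minimal sketch of the scheme described above, and not the code from the project itself. The names make_key and encrypt, the pool size range of 5 to 15, the sample sentence and the fixed seed are all assumptions made for illustration, and the "minimal hiding of reality" step is left out because it is not described.

```python
import random
import string


def make_key(rng):
    """Build a many-to-one substitution key for the lowercase alphabet."""
    # Pick how many distinct output letters this run may use (fewer than 26).
    # The 5..15 range is an arbitrary choice for this sketch.
    pool_size = rng.randint(5, 15)
    pool = rng.sample(string.ascii_lowercase, pool_size)
    # Each cleartext letter maps to a randomly chosen letter from the pool,
    # so several cleartext letters can end up sharing one output letter.
    return {letter: rng.choice(pool) for letter in string.ascii_lowercase}


def encrypt(text, key):
    """Apply the key with str.translate; non-letters pass through unchanged."""
    table = str.maketrans(key)
    return text.lower().translate(table)


rng = random.Random(2014)  # fixed seed only so that a run is repeatable
key = make_key(rng)
clear = "the quick brown fox jumps over the lazy dog"
cipher = encrypt(clear, key)

print(cipher)
print(len(set(key.values())), "distinct output letters in this key")
```

Because the key folds 26 letters into a smaller pool, the letter frequencies of the output are a blend of several cleartext letters each, which is the property aimed at frequency analysis.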

Where I ran into a problem was in decrypting each piece of cipher text. When you reduce the number of available letters, several cleartext letters end up sharing the same cipher letter, so regenerating the original from a smaller group of letters becomes a difficult problem to fix, particularly if you only have one example of the algorithm’s output. I could not actually produce a piece of code that immediately decrypted any individual piece of cipher text. The key generated by the encryption algorithm was a one-way-only key. So I have been pondering this problem in the meantime and I’ve concluded that the algorithm may yet be breakable if you have several examples of the same text encrypted, plus the matching keys and some willingness to go messing calculating the different encryption dimensions. In short, while it’s relatively straightforward to encrypt the text, you need many examples of the algorithm generating different keys plus the associated encrypted texts to break back in. I have not yet implemented this but I will look at it as a later problem.
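One way the "several examples plus matching keys" idea might work, as a hedged sketch only: invert each key into sets of candidate cleartext letters, then intersect those sets position by position across the different runs. The helper names, the pool size of 8, the six runs and the sample word are all assumptions for illustration, and nothing here is taken from the project.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def make_key(rng, pool_size):
    """Many-to-one key: every letter maps into a small random pool."""
    pool = rng.sample(ALPHABET, pool_size)
    return {letter: rng.choice(pool) for letter in ALPHABET}


def preimages(key):
    """Invert a key: cipher letter -> set of cleartext letters it may hide."""
    inverse = {}
    for clear_letter, enc_letter in key.items():
        inverse.setdefault(enc_letter, set()).add(clear_letter)
    return inverse


def candidates(ciphertexts, keys):
    """For each position, intersect the candidate sets from every run."""
    inverses = [preimages(k) for k in keys]
    result = []
    for column in zip(*ciphertexts):
        possible = set(ALPHABET)
        for enc_letter, inverse in zip(column, inverses):
            possible &= inverse[enc_letter]
        result.append(possible)
    return result


rng = random.Random(50)  # fixed seed only for repeatability
clear = "finnish"
keys = [make_key(rng, pool_size=8) for _ in range(6)]
texts = [clear.translate(str.maketrans(k)) for k in keys]

one_run = candidates(texts[:1], keys[:1])
all_runs = candidates(texts, keys)
print([len(c) for c in one_run])   # candidates per position from one run
print([len(c) for c in all_runs])  # never more, and typically fewer
```

The true letter always survives the intersection, and each extra run can only shrink the candidate sets, so with enough runs most positions should collapse to a single letter. A single run on its own cannot do that, which matches the one-way behaviour described above.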

In the meantime, key learning outcomes from this exercise:

encryption algorithms are easier to design than encryption with matching decryption
Python has a useful string translate function which I was happy to find. I have used something similar in assembler programming for changing encoding between different character sets.
after all this, when I started reading up on cipher algorithms again, I discovered that there exists a pycipher library which implements a bunch of standard cipher algorithms. Even so, the existence of this does not mean I won’t, at some stage, have a look at implementing one of the Enigma or, possibly, one of the Lorenz ciphers just for the hell of it.
I want to read up on cryptography again. It’s been too long.

The GitHub page for the project is here.

Recommendations at Etsy

Robert Hall, one of the data engineers at Etsy, a large online craft marketplace, has written a comprehensive overview of how they manage recommendations. It’s a very interesting piece in that it’s quite open, particularly about something which could be considered commercially sensitive, and it touches on both the mathematics and the infrastructure side of things.

Read it here. The Etsy Code as Craft blog is well worth a read in ongoing terms.

An hour on GitHub

Leaving aside that this is really nice too, one of the things I like is that it mentions which tool was used, which means that if it’s something I’m unfamiliar with (which it is) I can go and have a look.

Declaration of interest: I have a GitHub account.

Christmas wishlist

Every time I check the Wolfram site, it is in the hope that this has arrived. Please, for Christmas…can I have the data science platform?

via Quartz: Satellites circling the earth

This is cool.

This is every active satellite orbiting earth

I really, really like creative visualisations like this.

Everyone should learn to code

This, from the Wall Street Journal.

It annoyed me, not because I disagree with the idea of people learning to code – I don’t – but because as a piece supporting the idea that people should learn to code, it has some glaring errors in it and doesn’t really support the idea that people should learn to code. Personally I think a lot of tech people should learn to communicate more effectively but a lot of them appear to think they don’t have to so let’s just explain why this piece is a problem.

The most important technological skill for all employees is being able to code. If, as Marc Andreessen once noted, “Software is eating the world,” knowing how to code will help you eat rather than be eaten. Understanding how to design, write and maintain a computer program is important even if you never plan to write one in business. If you don’t know anything about coding, you won’t be able to function effectively in the world today.

So, two major assertions here: the most important technological skill for all employees is being able to code and “if you don’t know anything about coding, you won’t be able to function effectively in the world today”.

These assertions are patently not true. To be frank, the most important technological skill for an employee, in my opinion, is the ability to describe what’s gone wrong on the screen in front of them. That’s also a communications issue but it does enable technology experts to help them. As for “if you don’t know anything about coding, you won’t be able to function effectively”, I strongly disagree with that and would suggest that ultimately, the problems lie with interface design, for which employees are, for the most part, not actually responsible.

You will inevitably work with people who program for a living, and you need to be able to communicate effectively with them. You will work with computers as a part of your job, and you need to understand how they think and operate. You will buy software at home and work, and you need to know why it works well or doesn’t. You will procure much of your information from the Internet, and you need to know what went wrong when you get “404 not found” or a “500 internal server error” messages.

Not one thing in this paragraph requires coding skills. It requires programmers to learn to communicate effectively and given a lot of them have trouble with the basic need to document what they are doing already, it’s a steep learning curve. With respect to software, again, how well it works depends on how well it is documented and designed. You do not need to be able to program to understand a 404 not found or a 500 internal server error.

Of course, being able to code is also extremely helpful in getting and keeping a job. “Software developers” is one of the job categories expected to grow the most over the next decade.

But not every employee is a software developer and nor should they be.

But in addition to many thousands of software professionals, we need far more software amateurs. McKinsey & Co. argued a few years ago that we need more than 1.5 million “data-savvy managers” in the U.S. alone if we’re going to succeed with big data, and it’s hard to be data-savvy without understanding how software works.

Data and programming are not the same things. Where data is concerned we frantically need people who get statistics, not just programming. IME, most programmers don’t get statistics at all. Teaching people to code will not fix this; it’s a tool to support another knowledge base.

Even if you’ve left school, it’s not too late. There are many resources available to help you learn how to code at a basic level. The language doesn’t matter.

Learn to code, and learn to live in the 21st century.

I’m absolutely in favour of people learning to think programmatically, and logically. But I don’t think it’s a requirement for learning to live in the 21st century. The world would be better served if we put more effort into learning to cook for ourselves.

I hate puff pieces like this. Ultimately, I mistrust pieces that suggest everyone should be able to code, particularly when coding salaries are low at the very time we are being told there’s a frantic shortage. I’ve seen the same happen with linguistic skills. There are a lot of good reasons to learn to code – but as with a lot of things, people need to set priorities in what they want to do and what they want to learn. Learning to write computer code is not especially different; learning to apply it to solving problems, on the other hand, takes a way of looking at the world.

I’d prefer it if we looked at teaching people problem solving skills. These are not machine dependent and they are sadly lacking. In the meantime, people who have never opened a text editor understand that 404 Not found does not mean they could fix their problems by writing a program.

Learning programming before going to university

Ryan Walmsley has a piece suggesting you shouldn’t learn programming before going to university. It’s worth a read.

Personally, I am not against people learning to code before they get to university. I am, however, not in favour of people who have no coding skills arriving at university and starting with Scratch. Scratch is a superb tool for teaching kids how to program, and a bit about how computers work. It is not a suitable tool for adults on a specialist coding course, in my view. While I am not the biggest fan of Java (disclaimer: have yet to review Lambdas in Java 8 and this may make some of my frustration go away), and I recognise that some people have issues with the lack of static typing in Python, ultimately, once you get as far as university, you should at least start with tools you have a fighting chance of using in the income earning world. And there are a lot of them. Not in the top ten is Scratch.

Like a lot of things, tools need to be used appropriately and Scratch is an absolute winner in the sector it was designed for. But I have a book on my desk here that teaches kids how to program in Python and if kids can do that, I see no reason why we need kids level languages like Scratch at university level.
