Cloud versus local

I had an odd dialog pop up in Microsoft Word this morning. The document I wanted to open, it said, had a syncing issue, and the SkyDrive and local copies of the document were different. I needed to choose which to retain.

I was not, it must be said, very happy about this, but chose local as I assumed that on the half dozen occasions I hit save last night, it saved locally first.

This turned out to be a mistake. The most recent local version was saved 3 hours before I finished work.

I was dropping a lot of image files into that document last night, so I saved it regularly. I also saved it when I shut up shop yesterday evening, but none of those saves appear to have been written to local disk.

This is a huge problem for me. I have an always-on connection, so connectivity isn’t generally an issue. I’m the only person accessing the SkyDrive, and I do it from two computers, both of which only I use. MS’s dialog told me another user had updated the document last night. That other user was me, on the same computer I am using now.

I’m not going to complain bitterly about the problems this is causing me; suffice it to say my day has suddenly become a whole lot worse than it was before I discovered this. But I do have to say this.

  • The dialog box, on telling me another user has updated the file, needs to tell me who that user is. I know in this case it was me, but in SkyDrive’s case, that’s not always going to be true. With shared documents, it’s almost guaranteed not to be me.
  • The dialog box, on telling me there’s an issue, needs at least to tell me which file is older. This really should be obvious to anyone.

I ran a completely unscientific straw poll this morning. On balance, more people expected the local copy to be more recent than the cloud copy, with some comments about exceptions around documents stored in a browser. So I have to say, the assumption that the local file was the most recent was not particularly inane; it’s what most people expect.

I’m not sure what the problem is, but the evidence I have right now suggests it’s tied to something Microsoft has done between SkyDrive and Office. I only know this because the folder concerned included other files, not from MS applications, which did get saved locally and did get synchronised correctly.

Right now, I’m faced with redoing a whole pile of work, which is not ideal. It’s only three hours’ worth and it’s a write-up, so it may take me significantly less time to redo, as I have most of the output or can get it very easily from the scripts that generate it (and some of that will have to be rerun, unfortunately).

The takeaway message from this is:

  • most people expect local versions of files to be updated before cloud versions, particularly if they are editing in locally installed software
  • if you’re telling them that their files are out of sync because another user has updated, you must tell them who that user is and you must give them the time stamps of both versions
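To make the second point concrete, here is a minimal sketch of the information I would have wanted that dialog to show. It is not how Word or SkyDrive actually work internally. The FileVersion record, its fields and the conflict_message function are all names I have made up for illustration, and it assumes the sync client knows who saved each copy, on which device, and when.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical record of one copy of a document (not a real SkyDrive type)."""
    location: str        # e.g. "local" or "cloud"
    modified_by: str     # who last saved this copy
    device: str          # which machine the save came from
    modified_at: datetime


def conflict_message(local: FileVersion, cloud: FileVersion) -> str:
    """Build the text a sync-conflict dialog could show: which copy is newer,
    by how much, and who saved each copy, where and when."""
    if local.modified_at == cloud.modified_at:
        headline = "Both copies have the same timestamp."
    else:
        newer, older = (
            (local, cloud) if local.modified_at > cloud.modified_at else (cloud, local)
        )
        gap = newer.modified_at - older.modified_at
        headline = f"The {newer.location} copy is newer by {gap}."

    details = [
        f"{v.location}: last saved {v.modified_at:%Y-%m-%d %H:%M} "
        f"by {v.modified_by} on {v.device}"
        for v in (local, cloud)
    ]
    return "\n".join([headline, *details])


if __name__ == "__main__":
    # Invented timestamps, roughly mirroring the three-hour gap described above.
    local = FileVersion("local", "me", "desktop", datetime(2020, 1, 1, 20, 0))
    cloud = FileVersion("cloud", "me", "desktop", datetime(2020, 1, 1, 23, 0))
    print(conflict_message(local, cloud))
```

With those two extra lines of detail in front of me, choosing which copy to keep would have been a five-second decision.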

I find it hard to believe that this occurred to no one working on this at Microsoft.

I could live with the cloud version being the more recent version if I was told that it was. Instead, the utterly useless dialog box I got didn’t tell me this. I know the other user involved in this case was me, on the same computer, and I can’t see why MS’s dialog can’t communicate this.

SNCF Hackathon Transilien

SNCF, the French national rail company, ran an open data hackathon last weekend. I didn’t know about it in advance, and my schedule at the moment meant there was no way I could have made it without a lot more notice, but it struck me as interesting that they did it.

You can find the SNCF’s open data page here. Transport for London do something similar and I’ve seen some very interesting map projects come out of that. I’ve also plaintively wailed for similar access to Dublin Bus’s data.

There are a lot of things different people can do with different data, and they don’t always work for your company. I think it’s interesting to see the transport companies doing something, and some very interesting stuff has been done with available data in the aviation sector too (Flight Radar 24 is a key example).

However, I hadn’t come across a company actually running an open hackathon on its own data, so this weekend’s event in Paris was an interesting development. It focussed on Transilien services, the commuter rail network in the Île-de-France area around Paris. I’ll keep an eye open for similar events and try to get there in the future if I can hang the small details together.


What do you love about programming?

Via a tweet from, I think, Kathy Sierra, in which she said this was the one interview question she had never been asked.

I started programming, a bit, when I was 13 and did it on and off until I was about 16. And then I stopped for 10 years. In 1999 I did an interview with a major Irish company which was looking for IT staff who did not, for various reasons, have to have a degree in computer science. I got through that process and, despite expecting to be put working on web technologies, I was sent for assembler training and then spent the next chunk of my life as an assembler programmer. Since then I have programmed a bit in Java, some in VB, some in R and now, occasionally, in Python and again in Java.

Programming is an interesting activity. I love starting off with a problem to solve, and I love thinking about how I might solve the problem given the available tools. When you’re learning a language, this leads to various interesting algorithms as you code around a lack of knowledge. Sometimes it leads to massively inelegant solutions, other times it leads to things of pure beauty. I love programming purely for the problem resolution aspect of it, the fact that I can sit down with nothing but a piece of paper and a task to accomplish. For me, programming is more the side of working out how to accomplish something rather than purely executing it in code. There are, if you like, many ways to do that – the hard bit is the working out not necessarily the coding.

I don’t, in general, mind debugging my own code mainly because I generally understand what it is I was trying to accomplish. You learn a lot from the way you look at problems when you’re trying to identify where you went wrong in trying to solve them. In this respect, programming is always a learning process.

What I love about coding is that it typically opens up the possible. What can we achieve tomorrow that we could not do today?

Comparative infographics

There is massive growth in the production of infographics of varying quality and, if you’re interested, there’s a Tumblr full of dodgy ones here. Mostly, that focuses on the quality of the graphic design and whether it accurately portrays the underlying data.

However, I want to consider one particular type of infographic and that is an infographic that purports to compare two entities. I find they can be problematic even if they are beautifully designed. The main underlying issue is data quality.

They can be done according to a lot of useful rules such as citing the source of the data you are using for comparison – but if they miss a key component of a comparative infographic, then no matter how beautiful they are, they are still of questionable merit. Each comparison must be a like with like comparison.

So, for example, a graphic seeking to compare social media penetration doesn’t get it right if it’s comparing Facebook page loads with Flickr image uploads. That’s beyond unfair; it’s a misleading comparison. My favourite one lately has been a comparison of London and Paris in which the cost of an average dinner out was compared with the cost of dinner out in one of Paris’s more exclusive restaurants, the greater London area was not compared with the greater Paris area, and the cost of a family ticket in Disneyland was compared with the annual number of visitors to Harry Potter World. On top of that, the prices cited were in two different currencies, making the comparison almost meaningless.
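The like with like rule is simple enough to check mechanically before a graphic is ever drawn. The following is a rough sketch only: the DataPoint record, its fields and the comparison_problems function are invented for this post, and the figures in the example are made up purely to show the check firing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical description of one figure used in a comparative infographic."""
    entity: str   # what is being described, e.g. "London"
    metric: str   # what was actually measured
    unit: str     # unit or currency of the value
    scope: str    # geographic or population scope of the measurement
    value: float


def comparison_problems(a: DataPoint, b: DataPoint) -> list[str]:
    """Return the reasons two data points are not a like with like comparison.
    An empty list means metric, unit and scope all match."""
    problems = []
    for field in ("metric", "unit", "scope"):
        left, right = getattr(a, field), getattr(b, field)
        if left != right:
            problems.append(f"{field} differs: {left!r} vs {right!r}")
    return problems


if __name__ == "__main__":
    # Invented values, loosely modelled on the London and Paris example above.
    london = DataPoint("London", "average dinner out", "GBP", "greater city area", 60.0)
    paris = DataPoint("Paris", "dinner at an exclusive restaurant", "EUR", "city centre", 250.0)
    for problem in comparison_problems(london, paris):
        print(problem)
```

It is deliberately strict: anything flagged here needs either converting to a common basis or dropping from the graphic.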

I have to ask how highly we can praise an infographic for being graphically beautiful if it is not informative because the underlying data is not useful for comparison’s sake. Ultimately, I would say the value of an infographic is linked to how informative the underlying data is and, where comparative graphics are concerned, whether suitable comparative datapoints have been used.

Ben Shneiderman’s 8 Golden Rules of Data Science

This popped up in my Twitter feed today. It’s a photograph of a slide from a talk given by Ben Shneiderman. I’m not sure I’d call them golden rules per se, but they are definitely a very decent framework to follow:

  • Choose actionable problems and appropriate theories
  • Consult domain experts and generalists
  • Examine data in isolation and contextually
  • Keep cleaning and add related data
  • Apply visualization and statistics: patterns, clusters, gaps, outliers, missing and uncertain data
  • Evaluate your efficacy, refine your theory
  • Take responsibility, own your failures
  • World is complex, proceed with humility.

Professor Shneiderman’s home page is here. The link to the tweet I picked all this up from is here, via Kirk Borne and Seth Grimes.