Passenger Air transport in Europe, 2004 to 2011

One of the things I wanted to do this year was experiment a little with Tableau, so I had been looking around for some data to play with. The above data comes to you courtesy of Eurostat and relates to passenger transport by air in Europe. I’ve covered the period 2004-2011, and I chose the countries shown because they had complete data for that period. A couple of other countries have incomplete data in the Eurostat tables, which you can find here.

I did a little bit of work with the data because I wanted to identify two underlying stories. The first one – which you can see in this display – takes the absolute passenger figures and divides them by the population of the countries concerned so that we get a measure for passenger flights per head of population. This is interesting because it can highlight a couple of things – necessity (see Ireland and Iceland for example which are relatively small countries with no other connection options to other countries), economic strength (see Switzerland) and, rather more difficult to measure, the importance of travel.
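As a rough illustration of the per-head calculation described above, here is a minimal sketch. The country names and figures are made up for the example and are not Eurostat values; the function name is also just something I picked.

```python
# Hypothetical figures for illustration only; these are NOT Eurostat values.
passengers = {"Country A": 23_000_000, "Country B": 2_000_000}
population = {"Country A": 4_500_000, "Country B": 320_000}


def passengers_per_head(passengers, population):
    """Divide annual air passengers by population for each country present in both tables."""
    return {
        country: passengers[country] / population[country]
        for country in passengers
        if population.get(country, 0) > 0
    }


per_head = passengers_per_head(passengers, population)

# Rank countries from most to fewest passenger journeys per head.
for country, value in sorted(per_head.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {value:.2f} passenger journeys per head")
```

The point of normalising like this is that a small country with a lot of air traffic can rank above a much larger one, which the absolute figures would hide.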

The other story – which parts of this dashboard hint at – is how economic performance impacts on air transport. For this, I will look to get GDP figures into the underlying data and graph them against passengers per head of population. Already, however, if you look at the time lines for Ireland and Iceland, there is a hint that there can be a major impact in this respect.

This is the first project I have undertaken with Tableau and I am using Tableau Public. It has been a sharp learning experience. One of the things which has struck me is that software can be erratic in how it handles dates. The underlying tables for this project are in Excel, and Excel does not handle years as dates. Tableau attempted to interpret the years as days since sometime in 1899. Fixing that is messy and potentially a logistical nightmare in the future. When I went to look at date formatting, I was stunned to see Excel didn’t allow me to format a year as a date. This is infuriating.
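To show what goes wrong when a bare year is read as a day count, here is a small sketch. It assumes the conventional day zero for Excel's 1900 date system (30 December 1899); whether Tableau uses exactly that epoch internally is my assumption, and both function names are invented for the example.

```python
from datetime import date, timedelta

# Assumed epoch: the conventional day zero for Excel's 1900 date system.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Interpret a number as a count of days since the assumed epoch."""
    return EXCEL_EPOCH + timedelta(days=serial)


def year_to_date(year):
    """Turn a bare year into an explicit date (1 January of that year)."""
    return date(int(year), 1, 1)


for year in (2004, 2011):
    print(year, "read as a day count:", serial_to_date(year),
          "| read as a year:", year_to_date(year))
```

Read as a day count, every year in the 2004-2011 range lands within about a week in mid-1905, which is why the time axis collapses. Converting each year to an explicit date before it reaches the charting tool is one possible workaround, not necessarily the tidiest one.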

However, I did get something out of the process: a lot of information on how to get data working in Tableau.

 

My way or the highway.

Via HBR

Alexandra Samuel wrote a piece on notetaking. It was quite breathtaking. It opened as follows:

I knew right away, when you walked in here with a paper notebook — a paper notebook! — I realized that this meeting was not going to be a good use of our time.

It caused what might best be described as a shitstorm, and seems to have topped out at 304 comments, the overwhelming majority of which were not in favour of the piece. She then wrote a piece about the reaction here where again, the majority of comments gently pointed out that a look in the mirror to remove the plank from her eye might be in order before criticising much of what was said to her.

I have a lot of problems with the piece, the key one being the attitude behind it: that her way, with her digital gadgets and toys, is better than anyone else’s way of organising aide memoires. The simple truth is, it probably isn’t. Certainly, it’s not a given that all the digital productivity tools in the world make you more productive. What tends to make you more productive, in my experience, is not feeling you have to justify every single little way of doing things.

I own a laptop, a tablet and a smartphone. The laptop runs Windows, the rest is iOS. I also have a lot of notebooks, not because they make me less productive but because they make me more productive. Basic day to day list? It’s faster to write it down and tick it off. Mindmapping? Quicker to pull out software. Reflecting on life in general? I’ve kept a journal since I was 19 years old. There’s something inherently more valuable about it than files which go missing and get corrupted.

There is a common view that the generations before us will have left more of a footprint via their books and their monuments than we will. We may produce hugely more information than they did, but we do not do very much to retain it. Already, data saved in the 1950s and 1960s is getting harder, and sometimes impossible, to retrieve because we just don’t have the technology. Our technology cycles are changing and information is dying, in some respects, faster than it did 100 years ago.

Important things to me, my life, and my feelings, go in notebooks.

What worried me most about the whole thing was not so much the piece as originally published, although I really do have to say that it came across as childish and condescending, but the overwhelming lack of understanding of why she might not be right. This came across in her replies to comments on the piece. For example, she really doesn’t get that a lot of companies, for legal and regulatory reasons, are just not allowed to use services like Evernote. It’s not a question of a manager being an old fogey whom she can write to and point out the error of their ways so that a bunch of people wind up with laptops and iPads.

As it happens, I don’t think that laptops and iPads enhance listening. My experience is that people who are typing are not processing information at all. I’m a very fast typist – I typically averaged 120wpm in English in my admin days. Alone of all my colleagues, I could type from live dictation. This means that as fast as you spoke, I typed. And as a special trick, I could type in English what you said to me in French.

For a good typist, the iPad keyboard is basically unworkable. Typing things puts a constraint on how you describe unstructured data. Most meetings consist of unstructured data; they consist of brainstorming, problem solving.

Being honest, were I to walk into a meeting with someone like Alexandra, weighed down by her laptop and her iPad, I’d wonder if she really had any interest in the meeting at all. Oh, it’s not because I think she’ll be checking her email or her Twitter or her Facebook while I’m describing whatever problem we are here to resolve. It’s because I know that people who are typing are not absorbing. This is why, perhaps, Alexandra needs the crutch of search and retrieval in her digital tools. People who remember more get more done.

I think Alexandra, in stating that you don’t have to remember things because it’s all in Evernote, has missed that minor detail.

I should note she has a book on Evernote as a tool available at the moment.

 

Your objective is to inform, not to look pretty but useless.

Via Stats Chat in the last week or two.

If you’re not willing to click through, Stats Chat have posted a donut graphic which some New Zealand paper has printed to display some data. Really, you should have a look and then decide whether the graphic accurately depicts the data that the paper’s figures appear to be giving.

One of the worst features – in my humble experience – of enhanced graphics capabilities of different software packages (I’m looking at you, Excel, you know I love you but…) is that people will insist on using them. Inappropriately, confusingly and just plain badly. It’s quite worrying in some respects.

So what would be an elite technology company then?

About a week ago, I had a discussion on twitter about this article.

Facebook is not an elite company

(from the San Francisco Chronicle)

The list is a short one. Usually, it includes Google, Amazon, Apple, Facebook and (debatably) Microsoft.

This is the interesting quote.

Different things are essential to different people. So I’d argue that in the grand scheme of things, I’d be severely discommoded without Microsoft and Google but life without Amazon, Apple, and Facebook, provided at least one bookshop was still open, would probably be well more than survivable.

For me, when we have conversations like this, I don’t like to see sentences like this, however:

For the sake of this scenario, we’re not talking about behind-the-scene all-stars like Nvidia, IBM and Intel, but the companies that people interact with every day.

The simple truth is people interact with IBM, Nvidia and Intel every day of their lives, but the crucial difference is they often don’t know it. In my view, if you took IBM away, you really wouldn’t have much left. You’d potentially have a banking system and aviation system in serious crisis. Pretending they are excluded just because people don’t load stuff up in a browser is missing the point if we’re trying to identify the elite companies; the ones we cannot do without. To some extent, there are replacements for every single product and producer on the list the article was willing to look at, but it’s not anywhere near so straightforward for the second list, the list we don’t want to talk about. The justification for including Amazon has nothing to do with its retail arm and everything to do with the fact that a lot of other sites are hosted on their AWS, for example, something which an awful lot of people don’t know. This puts them in the same box as the IBM systems underpinning the banks and many of the airlines. It’s what you don’t deal with on a day to day basis which is most critical. And that’s what makes the elite companies elite.

An open letter to Twitter

Hi,

Thanks for the promoted tweet from eToro. I seem to see them regularly.

I understand that you have a business. From my point of view, promoted tweets are little more than ads, or marketing junk. I’d like to be able to switch off promoted tweets from eToro. I’m just not interested.

I get the need to monetise your product. Google manages to ship me reasonably relevant advertising in my Gmail. You get a lot more information out of me, so... why do I get ads for Apple stock?

I read a piece Hilary Mason wrote the other day about interview questions for data science roles. She said she’d ask what, based on your knowledge of bit.ly’s data, you would do that they are not doing.

Well I don’t know for bit.ly to be honest. I don’t use the service quite enough to comment. However, where Twitter is concerned, I’d do a better job on contextualising the inline advertising. Take me. It’s clear from the accounts I follow, the links I follow, the posts I make, even my description that I have certain specialised interests….photography. Surf. Kitesurf. Computer related stuff. Travel.

Nowhere in my account is there any evidence that I am interested in eToro’s services. But I wouldn’t object to more relevant promoted tweets, so how about it? Are you working in that area at all?

 

yours,

 

Treasa

 

 

Why do you develop…

Some time ago, I had a conversation with a developer on the subject of rectifying a recurring issue. There was a straightforward fix a developer could apply to each occurrence of the issue. But the developer, who had also explained several times to one or two of the several users how to avoid it, wanted to punish the users: stop fixing the problems for them, and so compel them to make the effort to avoid the problem by following procedure. This might work if you’ve one or two users, but with more than that, I think it’s unrealistic. Much better to have the software protect against errors, particularly if it’s a known and recurring issue.

I’ve often replayed that conversation in my mind and realised that I don’t really like it as an idea. While no part of the world is perfect, and there are often underlying considerations, rather than telling users how to avoid problems procedurally, we should enable them not to cause the problem in the first place by either a) preventing it from happening at a coding level or b) automatically fixing it in some way. Failing that, we should provide them with a tool to fix the issue themselves.

I don’t think we should ever be in a zone whereby it’s considered acceptable to punish users via the software we’ve designed for them. We should be in a zone whereby we develop to protect them against themselves to some extent. Ultimately, a developer’s role is to help a user to accomplish some task. That includes making it easy for them to accomplish that task while making it hard for them to break accomplishing that task. Punishing them because your software design fails on the second part of that role is perhaps a little unfair.

 

Big Data. Many things to many different people

Late last night, I picked up a tweet from Hilary Mason, chief scientist with Bit.ly

I’m troubled by the increasing interpretation of “big data” to mean “data without the scientific method”. When did that happen?

This is an interesting question, made all the more difficult by the growing impression I have that the definition of Big Data is a very dynamic concept. What is big data?

The truth is, I think a lot of people aren’t sure. Hilary herself provides an interesting definition:

I prefer the big = “too big to analyze on one computer” definition

I have some mixed feelings. I don’t like the phrase big data; I never have because it comes over far too much like a marketing buzzword and less like some underlying concept. For various reasons, people have had cause to ask me what I understand by “big data” lately which indicates to me that it’s something that has come at people without them recognising what lies beneath it.

For me, when I have to describe what I see it as, I say this. I say “We generate a lot of data. From different activities within a given organisation. We allow some people to analyse certain specific areas of it because historically, we didn’t have so much data, and it was all subject specific. But things are different now. Our activities generate a lot more data and that data is very much interdependent sometimes. You may be a subject area specialist in one particular area of an organisation and you may only care about that particular area. But your organisation is much bigger than your area and it could be – often is the case for example – that marrying the data from your area with the data from other areas can have a huge impact on the way your organisation does business”. In other words, we have a lot of data now, some of it more voluminous than others, but the vast majority of organisations do not use their data to join the dots coherently.

When I read articles about big data, and data science, I too am troubled. I’m troubled by the impression I have that big data is somewhere we should be at without understanding why we should be at it and what we can get from it. There is a lack of clarity about what a data scientist actually does, and business is not generally good with a lack of clarity. Matters are not helped when the media helpfully supply articles about how data science is the next big thing or that being a data scientist is the sexiest new job going.

It probably is but this only serves to attract people who weren’t really interested in the first place.

I’m interested in the interpretation of numbers, what they mean, how we got to those numbers, where we can go with them. How they inform us. I equally got to where I am right now by recognising that a lot of people were very interested in drawing pretty pictures displaying numbers but not so interested in the validity of the numbers. I have seen bar charts comparing social media site usage which compared the number of Facebook page loads with the number of photo uploads on Flickr. You don’t need me to tell you this is not a valid comparison given that Flickr gets page loads and Facebook has photo uploads as well.

My big huge concern with big data is that people look at the big bit but not the data bit.

I can’t write this in 140 characters, by the way; that’s why we are here and not thrashing it out into the middle of the night on Twitter. Also, Hilary had a good go at it last night.

If you were an executive standing in front of me, I would ask if you ever measured your website response times against your website demands linked to, for example, your sending out a marketing email, or whether there were unexpected regional variations, or whether there are trends in the Google search terms bringing people to your website that indicate a questionable link somewhere or a business opportunity lost. You can call this big data if it makes you feel better, or data science. I tend to prefer data science because I suspect it is going to be around a lot longer than big data.

What matters to me is that you get the best possible information out of your data, regardless of how much or how little you have. One of the things that concerns me slightly about discussions on data science and big data is the lack of attention to basic skills in analytics. It is not just a case of running a SQL query and picking out the highest band of the ensuing bar chart. When we look at the skill sets required for data science, there tends to be a focus on computer programming (which is good, don’t get me wrong), but less importance attached to basic statistics.

When I got interested in this about 18 months ago, I had an understanding of what I wanted to do, and went back to college to get the maths and stats skills that are handy here. I’ve been programming for 12 years so I don’t worry too much about the necessary programming skills. What worries me a little is that we will wind up with a lot of people calling themselves data scientists on the basis of a few Python scripts and not a lot of understanding of the actual data.

When people focus on concerns about big data, they talk about the skills squeeze (see this New York Times piece, for example, and Jason Ward of EMC in Ireland on a similar subject) and not the actual underlying business of data science. A key issue is that data science is not just about dragging out the data into a spreadsheet, pressing a few buttons and going “hey presto”. Harvard Business Review had a useful piece on this which I recommend looking at.

Hilary Mason has an interesting piece on Getting Started With Data Science. Another place to start is probably to take a step back and try and describe what you’d want to do with a lot of data. What matters with data, no matter what scale it’s on, is how you interpret it. If you really want an example of why this is important, although it’s not data on a massive scale, Nate Silver’s work on the last US election is a good place to start, particularly given that he and a number of other analysts disagreed on the data interpretation. You need to recognise that data is to inform, not to be bent to your needs and that sometimes, that data will not tell you what you want to hear.

I’d agree with Hilary in that communication is a key skill which very often gets forgotten about, but this issue is not limited to data analytics.

So, this brings us back to what big data is, or is not, and whether it really means what we think it means. I really do think it means so many different things to so many people that as a label, it’s functionally useless. Hence, I’d prefer data science as a label, or data analytics. In this way, you can highlight that yes, methodology matters, and statistical skills will matter.

In essence, I would answer Hilary’s original question as follows: Big Data lost its underlying rigour when it became a product to be sold rather than a job to be done.

How obsessed is Ireland about property?

Brian Lucey flagged this on his twitter feed this morning.

If you don’t want to click through, yesterday he posted the same post twice to his blog; the sole difference being that the two pieces had different titles, one property related, one more general. I’d almost say celebrity mag styled actually but I could be being unfair – the dentist rarely has the end of year edition of Hello or VIP given I have my annual check up in October.

Anyway. The money quote is this:

By a margin of almost 5-1 the property titled post got more hits

The title of Brian’s piece asked “Just how obsessed is Ireland about property?” but aside from the quote above, he doesn’t actually draw a conclusion. I imagine he leaves it as an exercise to the reader, but by implication he seems to be suggesting that Ireland is obsessed about property by a margin of 5 to 1 over more generic blog posts.

I’m going to assume that Brian Lucey has his tongue stuck firmly in his cheek with this but I’m going to do a little spelling out here. You cannot draw any conclusions from the outcome of this experiment based on the information given by Brian in the relevant post.

Here’s why.

  1. We do not know what the sample size was. It is possible (unlikely, but even so) that Brian got six hits on his blog in total yesterday. The population of Ireland is circa 4.5 million, so it’s dangerous to extrapolate to the views of the population at large without knowing how large the sample was.
  2. We do not know what the sources of the hits were: (a) links from other websites, (b) links from Brian’s own Twitter account, or (c) links from Facebook, Google Plus, any of the main discussion forums or his RSS feed. This is troublesome because it means we cannot allow for possible bias. If, for example, the bulk of Brian’s hits came from his Twitter followers, it is not safe to assume that this is a random selection of Irish people, because (a) people who follow Brian’s Twitter feed are more likely to be interested in economic matters, and potentially property matters, as he speaks about property in the media quite often and a lot of his pieces for the Examiner are property based, and (b) social media users skew slightly away from the older population.

    In my view, people who follow Brian Lucey’s writings either on Twitter or through the Irish Examiner are more likely to be predisposed to an interest in Irish property than the population at large. Put simply, it is getting harder to get a random sample of the Irish population. The same goes for people who read NamaWinelake, by the way: it is a special interest site which draws people on account of that special interest. To get closer to a random sample, it would almost be better to post the two links on a forum dedicated to, say, GAA supporters, as that would remove the confounding variable of an already existing interest.

 

Here’s a useful primer on why this matters. One of the more famous wrong headlines in history is the Chicago Tribune’s announcement of Dewey’s victory in the 1948 US Presidential election, which rested in part on telephone polls. In 1948, access to a telephone was not uniform across the population and favoured the better off. The lesson is that if you do not have a valid sample, your conclusions cannot be guaranteed to be valid. In fact, it’s getting harder to do this in Ireland: someone I know once noted that 30 years ago, a random sample of mass goers in Ireland was probably pretty close to a reasonable random sample of the wider population. Because the population of mass goers has changed vis-a-vis the wider population, this is no longer the case.

All I can conclude from Brian’s piece is this. Given a choice between two posts yesterday, five sixths of those reading his blog chose the one most likely to be about property. Given the lack of information about the population reading his blog and the population at large, the size of the sample and the existing possibility of bias amongst people who read his blog, you cannot draw any conclusions about the wider population of Ireland.
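To make the sample size point concrete, here is a minimal sketch using the Wilson score interval for a proportion. This is my own choice of method, not anything from Brian’s post, and the hit counts are hypothetical; it also assumes each hit is an independent reader, which is itself questionable.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical hit counts, all with the same 5-to-1 split.
for property_hits, total_hits in ((5, 6), (50, 60), (500, 600)):
    low, high = wilson_interval(property_hits, total_hits)
    print(f"{property_hits}/{total_hits} hits: plausible share between {low:.2f} and {high:.2f}")
```

With six hits, the same 5-to-1 split is compatible with anything from under half to nearly everyone. A bigger sample narrows the range, but it only covers sampling error: it does nothing about the bias in who reads the blog in the first place.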

 

I’m pretty sure Brian knows this by the way, but one of the things which tends to concern me about Ireland is the lack of attention to detail regarding figures, numbers and statistics and how they are interpreted. Statistics can be twisted because the vast majority of people are not aware of their limitations in this area.

Useful data sources

This is a temporary directory of possible sources of data for data visualisation and data analysis projects.

Mainly it’s here (at the moment) because I have not yet identified another home for it. It will probably move to a page of its own later, and maybe out to the projects site. Blink and you’ll miss it.

Eurostat

open.gov

Statcentral Ireland

Pew Internet

Amazon AWS Public Datasets

R datasets list compiled by Vincent Arel-Bundock

Shish list of open data sets.

Datamarket

Kaggle.

 

 

 

 

Do you waste other people’s time?

When I went into my first commercial job at the age of 22, the company I was working for had also hired a new marketing executive. It wasn’t a big company. It had somewhat informal processes. And the first thing the new marketing executive commented on was that every single meeting started late.

Very few people have any slack in their schedules, and the vast majority of people cannot avoid meetings either, so meetings culture tends to have a huge knock on effect on how productive people are. Sometimes, I think people need to take a step back and ask whether their meetings etiquette has an impact on other people’s productivity. If I am trying to plan around a 30 minute meeting, does it matter if someone shows up late to that meeting?

Well yes. Very often, the meeting might not start until they arrive, if they are critical. And it may run over time as a result. If you are the person arriving late, you are wasting the time of the people waiting for you, and if as a result, your meeting runs overtime, it may have a serious knock on effect on your own schedule as you turn up later and later for meetings which run late. And this has a knock on effect on everyone else.

Think about it. You turn up late to a meeting. You waste, collectively, an hour of six people’s time. You have a knock on impact on the schedules of six other people who, if you’re lucky, aren’t actually trying to get to another meeting, but who may wind up back at their desks later than planned. That may mess up some of their time planning for the day, which will have a net negative impact on their productivity. It may cut the amount of time they have free to complete some task before another meeting in their schedule, or the amount of time free to do something you want from them. Meanwhile, you wander off to another meeting and do something similar to another six people. You personally could be responsible, by showing up late for your meetings, for huge amounts of lost working time and thus lost productivity for your employer. While still being amazingly busy.
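For what it’s worth, the arithmetic is easy to sketch. The hour of collective time works out if six people each wait ten minutes; the ten minutes, the four meetings a day and the five day week are all assumptions of mine for illustration.

```python
def minutes_lost(minutes_late, people_waiting):
    """Collective waiting time caused by one late arrival, in minutes."""
    return minutes_late * people_waiting


# Assumed figures: ten minutes late, six people waiting,
# four meetings a day, five working days a week.
per_meeting = minutes_lost(minutes_late=10, people_waiting=6)
per_day = per_meeting * 4
per_week = per_day * 5

print(f"Per meeting: {per_meeting} minutes of other people's time")
print(f"Per day:     {per_day / 60:.0f} hours")
print(f"Per week:    {per_week / 60:.0f} hours")
```

Change the assumptions to suit your own calendar; the multiplication is the part that matters.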

Your schedule is not yours alone. Because of the lack of slack in most modern companies, trying to do more with fewer people, your schedule is shared. If you mess up your schedule, you’re probably messing up the schedule of a lot of people around you as well.

Don’t be surprised if this has a knock on impact on their productivity.