Numbers have power

This morning, the front page of the Irish Examiner, which you can see here on Broadsheet (third one down), caught my attention for this headline:

46% back death penalty for child rape

The subheading is “Farmers take hardline on law and order”

The very first line of the piece underneath is as follows:

The death penalty should be introduced for the crime of raping a child, according to a national opinion poll.

There are several problems with this in my view. I like the Examiner a lot, and the journalist under whose byline this appears, Conall O Fatharta, has done quite a lot of interesting reporting in the last few months. But when you’re claiming that a national poll says that the death penalty should be introduced for the crime of raping a child (or, in fact, any crime), then two things are necessary:

  1. the proportion of people (nationally) who support that assertion should be greater than 50% (it’s not in this case, because already, the headline makes it clear that a majority do not); and
  2. the poll should be on the basis of a reasonable sample of the population at large. If you read the piece more closely, however, the poll was limited to farmers.

As such, the headline which the Examiner plonked on top of the story lacks some important detail, and given its position on the front page, that is deeply regrettable.

What do we know about the poll?

The poll was carried out on behalf of (or by) the Irish Examiner and the Irish Creamery Milk Suppliers Association. We do not know (from this report at least) how many farmers were surveyed, and this is important because the journalist has broken down the figures geographically.

Why does this matter?

It matters on several levels, but chief amongst them is that we cannot be certain that a subset of farmers of the ICMSA is representative of the nation as a whole, or, in fact, even of farmers in general. For example, a simple question we could ask is whether it was done on the basis of ICMSA membership and, in that case, given that the ICMSA represents predominantly dairy farmers, whether it is safe to say the output from a survey of dairy farmers applies to all other farmers.

An additional factor is that the CSO carries out a census of agriculture from time to time and there are a couple of pieces of information which are worth noting. The most recent report which I can find on the CSO’s website is for 2010, so, four years old. The press release is here, on the CSO’s website and it summarises the findings nicely. The full report is here.

There are a couple of key pieces of information in the summary which matter here:

  • More than half of all farm holders were aged 55 years or more. The number of farmers aged under 35 fell by 53% since 2000
  • One in eight (12.4%) farms is owned by a female.

Additionally, it is possible that the vast majority of farmers are rural dwellers but a greater proportion of the population are now urban dwellers. I have not found straight figures for that.

These farm figures do not match the population as a whole. If you look at the CSO census figures, only a third of the population of the country is over the age of 45, which means the proportion of the population which is over the age of 55 is smaller again.

Additionally, in 2011, at the time of the last census, more than 50% of the population were female. You can find the CSO’s population statistics by age from 2011 here.

Both the headline and the first line of the story give the impression initially that the results are nationally representative but as the survey was of farmers, the participants are age and sex skewed away from the shape of the population as a whole.

So, the subheading mentioned farmers; what actually is the problem here?

Three things. Firstly, we get our news from various sources, which means that pieces of information might get cut, such as on a Twitter feed which may not necessarily highlight that this is a Farming Spotlight piece. Not everyone might click through beyond the headline. This is particularly important as links get passed around. This, by the way, was the Examiner’s own tweet of its front page, and inline you will only see the top half.

Secondly, for me, a story which effectively boils down to “a sample of the population skewed by age, gender and urban/rural divide have this opinion which may or may not be representative of the population as a whole” really doesn’t belong as the top front headline on a national newspaper. In short, while it is, in passing, interesting, it isn’t really a major story.

Thirdly: the Irish Examiner has not provided any useful information (that I can find) in terms of the number of respondents, how the survey was carried out and what the estimated margin of error was. If you check any political poll reporting, the number of people surveyed along with the margin of error is always provided, along with an indication of when the poll was carried out. This is news at the moment, possibly because of the National Ploughing Championships but again, the statistical basis for the survey is missing.
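To show why the number of respondents matters, here is a rough sketch of the usual margin of error calculation for a reported proportion. The sample sizes are made up, because the report does not give one, and the formula assumes a simple random sample, which a survey of this kind may well not be.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a sample of size n.

    Assumes a simple random sample; z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


reported = 0.46  # the 46% from the headline

# Hypothetical sample sizes: the article does not say how many farmers were surveyed.
for n in (100, 500, 1000):
    moe = margin_of_error(reported, n)
    low, high = reported - moe, reported + moe
    print(f"n={n}: 46% +/- {moe:.1%} (from {low:.1%} to {high:.1%})")
```

The smaller the sample, the wider the range around the 46%, and a geographical breakdown makes each subgroup smaller again. Without the sample size, a reader cannot work any of this out.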

 

Be all and end alls: Natural Language Processing?

I have some doubts about the effectiveness of anything which depends heavily on natural language processing at the moment – I think there’s a lot of interest in the field but I don’t really think it has reached a point of dependability. One of the highest profile – I hesitate to use the word experiment – pieces of work this year, for example, included this comment:

Posts were determined to be positive or negative if they contained at least one positive or negative word, as defined by Linguistic Inquiry and Word Count software (LIWC2007) (9) word counting system, which correlates with self-reported and physiological measures of well-being, and has been used in prior research on emotional expression (7, 8, 10)

(Experimental evidence of massive-scale emotional contagion through social networks, otherwise known as the Facebook emotion study)

Anyway, the reason I am writing about this again today was that this piece from Forbes turned up in my twitter feed and the line which caught my eye was this:

Terms like “overhead bin” and “hate” in the same tweet, for example, might drive down an airline’s raking in the luggage category while “wifi” and “love” might drive up the entertainment category.

Basically, the piece is a bit of a puff piece for a company called Luminoso, and it has as its source this piece from Re/Code. Both pieces are talking about some work Luminoso did to rate airlines according to the sentiment they evoke on twitter.

If you look at the quote from the Facebook study above, the first thing that should stand out immediately is that under their stated criteria, it is clearly possible for a piece of text to be both positive and negative at the same time. All it has to do is feature one word from each of the positive and negative word lists. Without seeing their data, it is hard to make a call on how frequently that happened, whether they checked for it, whether they controlled for it, or whether they excluded such posts. The Forbes quote above is likewise worryingly simplistic in terms of understanding what needs to be done.
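A minimal sketch of the "at least one word" rule makes the problem concrete. The word lists here are invented for illustration and are not the LIWC2007 dictionaries, and the function is my reading of the quoted criterion rather than the study's actual code.

```python
import re

# Hypothetical word lists for illustration only; NOT the LIWC2007 dictionaries.
POSITIVE_WORDS = {"love", "great", "happy"}
NEGATIVE_WORDS = {"hate", "awful", "sad"}


def classify(post):
    """Flag a post as positive and/or negative if it contains at least one listed word."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return {
        "positive": bool(words & POSITIVE_WORDS),
        "negative": bool(words & NEGATIVE_WORDS),
    }


# One word from each list is enough to make a post both positive and negative.
print(classify("I love the wifi but hate the overhead bin"))

# Word counting also ignores negation: this is flagged as positive.
print(classify("I do not love this airline"))
```

Under this rule the first post counts as both positive and negative, and the second counts as positive even though it plainly is not.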

This is Luminoso’s description of their methodology. It doesn’t give away very much but given that they claim abilities in a number of languages, I really would not mind seeing more about how they are doing this.

Some comments on Apple’s latest PR

I don’t make a habit of blogging about gadgetry per se but there are a couple of comments I want to make about Apple’s latest lot of shenanigans.

Apple has cancelled the iPod Classic.

In all the screaming and howling about the watch and the iPhone 6 and its variants, and the payments ideas and all that, this is not getting anywhere near enough traction and discussion. I do not expect them to reverse this decision because notoriously, electronics companies do not actually listen to me.

My first Apple product was an iPod nano which proved to be inadequate on the storage front so was replaced with a Classic forthwith. It probably will not make Apple happy to know that I have had that same device for the last six years, particularly as I am on my third iPhone in the same period.

I like the Classic. It has enough storage. It does exactly what I need it to which is play music. And it does not need to be connected to the internet. The alternatives, the iPod Touch and the iPhone, top out at 64gig. Sure, you can access stuff through the cloud and that’s fine if you’re at home with Wifi, pay next to nothing for data and are not roaming. Maybe in the US this is actually a sane way of doing things and of course, the EU is working on getting rid of data roaming charges anyway but…frankly, there’s a stretch of the rail line between Dublin and Cork, around Tipperary, where the mobile signal is fairly limited. I listen to a lot of music and storage matters to me. But I also want to be able to carry that music around with me in my handbag and that means the 128gig iPad isn’t really a replacement option either.

So I am deeply, deeply unhappy with Apple over this move, and unhappy enough with Apple to look at my contact points with Apple (currently an iPhone, an iPad, the aforementioned iPod Classic and iTunes via a Windows machine) and see about replacing them with non-Apple equivalents. It will take a while, but there are likely to be some benefits, key amongst them, Apple will not be able to deliver music to me which I do not want.

I am not really a fan of free stuff that I don’t want and the latest U2 album is on the list of free stuff which I don’t want and which should not have appeared in my library without me asking for it.

The biggest selling U2 album is The Joshua Tree and it, apparently, has sold twenty five million copies. In no version of this universe is it likely that half a billion iTunes users wanted their new one and yet Apple gave it to us and yes, it’s sitting in my library.

You can look at all the technology stuff that Apple does, and then look at this promotional gimmick and wonder why they did it. Why did Apple feel the need to do this?

I really have no idea. You would have to assume that companies do stuff like this to support the bottom line but ultimately, U2’s last album, released in 2009, sold five million copies. Compared to the Joshua Tree, that is not stellar. Compared to half a billion people who suddenly find themselves with the new album…which they probably did not want…it’s pretty pathetic. On U2’s part, it screams of a need to be loved.

On Apple’s part, it screams of a company which finds itself having to do the sort of PR it has not traditionally needed, because the cachet of its own brand was enough, and which is demonstrating that it just does not know how to do it. U2 are not cutting edge. They’ve been around for 30 years. Classic rockers. Seriously, if Apple wanted someone who was on what I assume was their brand message, I’d have chosen Daft Punk. Of course, if Apple think that U2 is on their brand message, then I’m inclined to wonder what their future holds.