AIRO – Two-Tier property market

It’s not dated, so I am not absolutely certain when AIRO posted this to their site. It’s a graph of the changes in two segments of the Irish property market since 2005: Dublin, and National ex-Dublin.

It’s very interesting for a couple of reasons. It demonstrates that both the increase and the decrease in market prices were sharper in Dublin than in National ex-Dublin. This doesn’t totally surprise me – anecdotally there has always been some evidence to suggest that prices were behaving at more extreme levels in Dublin. It’s interesting to see how the graph lines cross (do click through – it’s worth it).

The data is from the CSO and, as far as I am aware, CSO data is limited to the mortgage market. This is interesting because there is some evidence to suggest that a lot of the market in Dublin, in particular, has been cash driven in recent months. Without having the CSO data in detail, and a cleaned-up extract from the Property Price Register, it would be hard to say for certain what the split was.
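For what it’s worth, if both datasets were to hand, the split could be roughly estimated by lining up monthly counts: all recorded sales from the Property Price Register against the CSO’s mortgage-backed transactions. A minimal sketch in Python with pandas follows; the file names and column names (sale_date, month, mortgage_sales) are assumptions for illustration, not the real headers.

```python
# Rough estimate of the cash vs mortgage split, assuming hypothetical
# extracts of the Property Price Register and a CSO mortgage series.
import pandas as pd

# All residential sales recorded in the Property Price Register (Dublin)
ppr = pd.read_csv("ppr_dublin_cleaned.csv", parse_dates=["sale_date"])

# CSO counts of mortgage-financed transactions, by month
cso = pd.read_csv("cso_mortgage_transactions_dublin.csv", parse_dates=["month"])

# Count all PPR sales per calendar month
ppr["month"] = ppr["sale_date"].dt.to_period("M")
all_sales = ppr.groupby("month").size().rename("all_sales")

# Line the two series up and estimate the cash-financed share
cso["month"] = cso["month"].dt.to_period("M")
merged = cso.set_index("month").join(all_sales, how="inner")
merged["cash_share"] = 1 - merged["mortgage_sales"] / merged["all_sales"]

print(merged["cash_share"].describe())
```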

The other regret I have about this data is that it only goes back as far as 2005. I’m mindful of sounding like an auld one, but there is some evidence to suggest that the period from about 1997 might be educational as well. I guess a lot depends on what data you have available to you.

Anyway, this was done in Tableau and there is some scope for playing around with it. I am glad AIRO did it – it’s a useful exercise, and perhaps there might be some scope for doing a county-by-county comparison. We have a lot more data on the property market now than we did even three years ago (yes, I have some programming under way for it myself), so information should be easier to come by, and if and when we get postcodes, the data will be cleaner up front.
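As a rough idea of what a county-by-county version could look like, here is a sketch along the same lines as the AIRO chart, indexing each county’s median Property Price Register sale price to its first quarter. Again, the file and column names (sale_date, county, price) are assumed for illustration.

```python
# Sketch of a county-by-county comparison from a cleaned Property Price
# Register extract; column names are assumptions, not the real headers.
import pandas as pd
import matplotlib.pyplot as plt

ppr = pd.read_csv("ppr_national_cleaned.csv", parse_dates=["sale_date"])

# Median sale price per county per quarter
by_county = (
    ppr.groupby(["county", pd.Grouper(key="sale_date", freq="Q")])["price"]
       .median()
       .unstack("county")
)

# Index each county to its first quarter so the lines are comparable,
# much like Dublin vs National ex-Dublin in the AIRO graph
indexed = by_county / by_county.iloc[0] * 100
indexed.plot(title="Median sale price by county, indexed to first quarter = 100")
plt.show()
```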

Book Review: The Signal and the Noise by Nate Silver

Over the semester break I spent some time ploughing through books which were on my to-read list. One of them was The Signal and the Noise by Nate Silver.

I kind of like Nate Silver’s writing, and I especially like his analysis, but I had started the book, gotten halfway through, got distracted and only picked it up again in January. So the review is more or less “I seem to remember this was fascinating” and “the content of this book should be fascinating but I’m not really sure I like it any more”.

I like numbers. I like playing with them. I like manipulating them. I’m not very good at them; I don’t have many regrets in life, but a maths and languages course up front might have been a better choice when I was 17, rather than pure maths.

I like that there is an increasing recognition that there is meaning in numbers and that the meaning needs to be interpreted. In many respects, that’s not that different to languages anyway. There is meaning in words; it has to be extracted; interpreted.

So to Nate Silver. Yes, he got the polls right in the last few US elections, and yes, he’s doing the start up thing with Five Thirty Eight now.

The focus of the book, to some extent, was the art of prediction and his reliance on Bayes. It featured some case studies – baseball and gambling are included (although I really do suggest that you have a look at Moneyball if you’ve any interest in the application of statistical inference and prediction to baseball numbers, as it’s a better read on that front). There was a section on meteorology which was fascinating. A key point he raises is perception and what people want from a weather forecast. Is it a weather forecast, or some entertainment?

One of the stories in it which fascinated me related to Deep Blue and the chess match with Garry Kasparov. What particularly interested me there was the idea that the computer behaved in a specific way based on a bug. But the way it behaved rattled Kasparov and caused some investigation into what the long-term outcome of that move could be.

I’m interested in machine learning so this is something which would catch my attention in a lateral way. We train computers to make decisions; sometimes it is not clear whether a given decision is based on a bug or some aspect of the training.

However, a couple of things annoyed me about the book. The Kindle edition has a frustrating number of typos. I can understand this in a scanned book, but I think it’s a bit unforgivable now. And there are a lot of places in the book where Nate Silver assumes he is writing for a uniquely US-based audience. I don’t think this was ever going to be a safe assumption for him.

A couple of sections of the book fascinated me in a way that led me back to subject-specific books, one of those subjects being earthquake prediction – we just aren’t good at it at all at the moment. As it is, I have a more than passing interest in earthquakes, volcanoes and rogue waves, so when I finished this, you can make an approximate guess what other books were on my reading list.

I’m inclined to say that The Signal and the Noise is a fascinating book and well worth reading. But it’s difficult to grade in terms of whether it is a five-star read, a four, or just average. I’m inclined to classify it as a book you should read, but be aware that it’s not a perfect read; there are elements of it which might annoy you. And you could skip it if you were so inclined. It is the sort of book that should help your Trivial Pursuit score and will open your mind. Oh, and you’ll probably be left with the impression that Nate Silver is brighter than you are, which isn’t always the most edifying feeling either.

So yeah, this is coming to you from the Raspberry Pi

About the most exciting thing I have to report at this point is that the wireless is now working on the Raspbian install which is an improvement over the last three times I’ve plugged in that particular SD card.

This is important because it means that I can finally start working in comfort at my desk rather than curled up on the living room floor.

I ran into two main issues:

  1. the wireless would not work in Raspbian
  2. two of the three keyboards I have at my disposal did not want to work effectively – I wound up with repeating letters, which made entering a password impossible. The third keyboard is working, and to facilitate that and the new monitor, a major desk reorganisation was required.

So okay, I’ve got a browser working on it; I can fire up Wolfram and Mathematica, and Python is installed. What next?

Well.

I am very shortly going to go and get one of the Raspberry Pi books and look into building a can’t-fail media centre, and I will write instructions about that here when it’s done and running.

I also want to try and build a weather station. And a robot. And I want to build snake on it as well but I think I may have code for that.

My reading list, for anyone who is interested, includes:

  • Raspberry Pi for Kids
  • Raspberry Pi User Guide
  • Raspberry Pi in Easy Steps and
  • Linux User Issue 134

There are also numerous websites. Raspberry Pi’s own website and Wolfram’s site, for example. I anticipate hours of endless fun.

Wedding Magazines and other thoughts.

Here’s some random information which might be worth looking at in some more detail.

On Saturday, I counted – sad person that I am – the number of wedding magazines on sale in Easons in Heuston Station. I did this basically because Irish Rail hadn’t told me what platform my train was going from, I didn’t feel like getting some food, and I was hanging around. There was a large display of them just inside the door, so they were easy to count, and it was tempting to do so when there seemed to be rather a lot of them.

So I can tell you the answer that I came up with was 13. I suppose if I had been really good I might have taken a photograph of the display. I can tell you that there were two subspecialisations, namely one on wedding flowers and one on wedding cakes. The rest were things like Bride, or Bridal Magazine. There was a surfeit of white. It was a bit overwhelming.

When I posted this to twitter, a couple of things happened. Someone knew there was a bridal show on at the RDS – news to me – and then this.

Damien Mulley told me there were approximately 21,000 weddings in the country each year.

Paul Savage told me that according to Facebook, 78,000 people were engaged.

Damien Mulley came back and noted that according to Facebook, 42,000 of those were female, aged 20 or older.

You can have a look at the conversation here.

The average circulation of the general Irish fashion mags like Image and Irish Tatler is around 25,000. I’m having serious problems getting any wider circulation figures and this distresses me – the JNRS is coming back at me with newspaper and newspaper-related circulation figures, but no magazines.

I can pick up some of the advertising rate cards for the Ireland-based magazines and I can tell you that for one of them, the bulk of their readership is in the 25-34 age bracket.

But the magazines in Ireland appear to be very coy about actual circulation figures.

In one respect, it might be an interesting exercise to:

  1. figure out what the picture of bridal magazines in Ireland has been for the last 15 years or so. Have we always sold 13 different magazines? What is the market entry and exit rate for them?
  2. Figure out how many copies they are selling every month. The cover price is somewhere in the region of €5.
  3. Figure out some way of comparing their advertising rate cards, which are not uniform in how they structure their charges (one way of normalising them is sketched just after this list).
  4. Figure out how they compare to the other women’s interest segment magazines.
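On point 3, one common way of putting non-uniform rate cards on a comparable footing is cost per thousand readers for a roughly equivalent slot, say a full page. A quick sketch; the titles, rates and readership figures below are made up purely for illustration.

```python
# Normalise dissimilar rate cards to cost per thousand readers (CPM)
# for a comparable full-page slot. All figures are invented.
rate_cards = [
    # (title, full-page rate in euro, claimed readership)
    ("Bridal Title A", 2400, 30_000),
    ("Bridal Title B", 1800, 18_000),
    ("Bridal Title C", 3100, 45_000),
]

for title, full_page_rate, readership in rate_cards:
    cpm = full_page_rate / (readership / 1000)
    print(f"{title}: €{cpm:.2f} per thousand readers")
```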

Why am I interested in this? Well, deep down I am wondering whether Ireland can sustain that many bridal magazines when it’s already having trouble sustaining its broadsheet newspapers. I’m also interested in seeing whether weddingsonline.ie has had an impact on the market in any direct or indirect way.
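A very crude back-of-envelope, using the figures above plus one guessed number (how many issues a couple might buy over the course of an engagement), shows why I wonder.

```python
# Back-of-envelope sizing of the Irish bridal magazine market.
# The copies-per-wedding figure is a pure guess, there to show the
# arithmetic rather than to be believed.
weddings_per_year = 21_000        # Damien Mulley's figure
titles_on_sale = 13               # the Easons count
cover_price = 5.0                 # roughly €5 an issue
copies_bought_per_wedding = 6     # assumption: a handful of issues per couple

copies_per_year = weddings_per_year * copies_bought_per_wedding
cover_revenue = copies_per_year * cover_price
monthly_sale_per_title = copies_per_year / titles_on_sale / 12

print(f"Estimated copies sold per year: {copies_per_year:,}")
print(f"Estimated cover-price revenue: €{cover_revenue:,.0f}")
print(f"Implied monthly sale per title: {monthly_sale_per_title:,.0f}")
```

With those (entirely guessable) inputs, the implied monthly sale per title comes out in the hundreds, a long way short of the roughly 25,000 circulation of the general fashion titles, which is part of why the sustainability question interests me.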

And of course, part of me is wondering about market segmentation in the glossy magazine market. Ireland has a population of around 4.5 million. It’s not, by any stretch of the imagination, a huge market. This is not just limited to the whole bridal magazine thing – we also produce a couple of other specialist-interest magazines, and the market for them is further supplied by imports from the UK and, in some cases, the US.

Finally – the comments from Paul and Damien when I discussed this on twitter the other day were interesting because they show that some ballpark information regarding the possible target cohort of this particular market segment could be obtained from other, social, sources.

So basically, if anyone has any idea how I might get granular circulation data to play with for all magazines on sale in the Irish market at the moment, I might be interested in setting aside some time to have a flute around it.

Care.data

If you’re in the UK at all, you may have heard of some discussion around something called care.data. The general idea is that all healthcare data would be centralised and that this repository would be made available to researchers. Such a repository would be massively useful for healthcare researchers.

So far so good. As someone with a great deal of interest in data, and how it can be best used to advance human society, you’d think I’d be wild about this idea. I’m not wild about the implementation and this is a pity.

The data, we are told, will be pseudonymised. This is the number one problem I have with it – it’s not actually properly anonymised. It comes with postcode data and NHS number. In the UK, postcode data can, in a lot of cases, be personally identifiable. This is wrong.

This is before you start asking questions about who gets to use the data. Plus, given the changes to the NHS organisation in the UK courtesy of the current government, you’d have to ask whether the data is even going to be as useful as it might have been 10 years ago under a centralised system.

So okay, I can knock it and be concerned. But I do believe something akin to it would be useful. Not necessarily directly profitable, but useful. So how could we implement it?

Well, there’s no reason why we can’t ask, straight out, why the postcode is relevant. It provides regional variation information. So one of the things we need to do is provide geographically classified data. Using postcodes to create a geographic classification which does not include the postcode itself is, or should be, straightforward enough. Ergo, the postcode issue can be dealt with.
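As a tiny illustration of what I mean, here is a sketch; the outward-code-to-region lookup is a made-up stand-in for whatever official geography lookup would actually be used, and the record fields are invented.

```python
# Replace a postcode with a coarser geographic classification before
# release. The lookup table here is illustrative only.
REGION_LOOKUP = {
    "SW1A": "London",
    "M1": "North West",
    "LS1": "Yorkshire and the Humber",
}

def to_region(postcode: str) -> str:
    """Map a full postcode to a region, dropping the postcode itself."""
    outward = postcode.strip().upper().split()[0]
    return REGION_LOOKUP.get(outward, "Unknown")

record = {"postcode": "SW1A 1AA", "diagnosis_code": "E11"}
released = {
    "region": to_region(record["postcode"]),
    "diagnosis_code": record["diagnosis_code"],
}
print(released)   # the released record carries a region, not a postcode
```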

The NHS number can be replaced with a different primary key which is not made available as part of the care.data database, but for which a conversion table exists linking back to the original data. Again, depending on the actual implementation of the data structures, this should be straightforward.
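Something along these lines, where the surrogate keys are random and the conversion table is held back from the released extract; the NHS number and record fields here are invented for illustration.

```python
# Swap the NHS number for a random surrogate key; keep the conversion
# table separately and never ship it with the research extract.
import uuid

conversion_table = {}   # NHS number -> surrogate key, held back from release

def surrogate_for(nhs_number: str) -> str:
    """Return a stable surrogate key for an NHS number."""
    if nhs_number not in conversion_table:
        conversion_table[nhs_number] = str(uuid.uuid4())
    return conversion_table[nhs_number]

source_record = {"nhs_number": "9434765919", "region": "North West",
                 "diagnosis_code": "E11"}
released_record = {
    "patient_key": surrogate_for(source_record["nhs_number"]),
    "region": source_record["region"],
    "diagnosis_code": source_record["diagnosis_code"],
}
print(released_record)   # contains the surrogate key only
```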

This deals with the data privacy side of things and one of the biggest issues I have with the current idea.

After that, we need to be aware that more data doesn’t always mean better or more accurate conclusions. Large datasets can amplify statistical errors which, given that we are talking about health datasets, matter a lot. They affect real people.

These errors are the type of errors where, for example, 1 in 100 cases might be misdiagnosed because a particular test isn’t 100% accurate.
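To put a number on that: with a rare condition, a test that is wrong only 1 time in 100 still throws up far more false positives than true positives once it is run across a large dataset. A rough worked example, with illustrative figures only.

```python
# How a "1 in 100 wrong" test behaves at scale. Prevalence and accuracy
# figures are illustrative, not taken from any real screening programme.
population = 1_000_000
prevalence = 0.001          # 1 in 1,000 actually have the condition
sensitivity = 0.99          # true positives among those who have it
false_positive_rate = 0.01  # the "1 in 100" error rate

have_condition = population * prevalence
true_positives = have_condition * sensitivity
false_positives = (population - have_condition) * false_positive_rate

ppv = true_positives / (true_positives + false_positives)
print(f"False positives: {false_positives:,.0f}")
print(f"Chance a positive result is genuine: {ppv:.1%}")
```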

Ultimately, I’m strongly in favour of this project, or, more to the point, a project like it, provided it comes with built-in data protection safeguards and is implemented to benefit healthcare rather than, for example, corporate health business interests. As matters stand, I’m inclined to feel that there are lacunae here at the moment.