Datagraphics: the property tax in Ireland.

According to the Irish Times, the Revenue Commissioners’ guideline map of property values (for self-assessment of the property tax we now have) has drawn a lot of criticism, mainly of the type “the values they are suggesting do not match reality on the ground”. See this report here.

I’m not, for now, going to go into any great detail on data quality and assessment of same. There are a lot of arguments to be had over that.

My issue is the map itself. Here, roughly speaking, is what it looks like (screengrabbed at 7pm) on my computer:

[Image: Dublin City area property tax valuations (caption: Dublin City - property values)]

I firmly believe that a graphic like this should be easy to read. This one isn’t, because the gradations between the different colours are so slight that it can be hard to identify exactly which of two bands a particular area falls into.

If I were doing something like this, I’d take bigger colour differentials for the different bands rather than a graduated scheme as used above.
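
As a rough sketch of what bigger colour differentials might mean in practice, the snippet below compares the smallest gap between neighbouring bands in two palettes. Both palettes are hypothetical (neither is taken from the Revenue map), and plain RGB distance is only a crude stand-in for how different two colours actually look to the eye.

```python
# Hypothetical palettes for five valuation bands. These are illustrative
# assumptions, not the colours used on the Revenue map.
GRADUATED = ["#fee5d9", "#fcbba1", "#fc9272", "#fb6a4a", "#de2d26"]
DISTINCT = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]


def hex_to_rgb(colour):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints."""
    colour = colour.lstrip("#")
    return tuple(int(colour[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance in RGB space (a crude proxy for visual difference)."""
    return sum((x - y) ** 2 for x, y in zip(hex_to_rgb(a), hex_to_rgb(b))) ** 0.5


def min_adjacent_distance(palette):
    """Smallest distance between any two neighbouring bands in the palette."""
    return min(rgb_distance(a, b) for a, b in zip(palette, palette[1:]))


if __name__ == "__main__":
    print("graduated:", round(min_adjacent_distance(GRADUATED), 1))
    print("distinct: ", round(min_adjacent_distance(DISTINCT), 1))
```

The trade-off is that a palette of clearly different hues no longer suggests low-to-high ordering by itself, so the legend has to do more of the work.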

The rush to apps…

A little while ago, I noticed that if I tried to open a link to a major property website in Ireland, it insisted on sending me to an unknown protocol and demanded that I use its app.

The website in question has a website. It may not be completely pretty on a mobile browser, but you know, sometimes I am in a hurry. And when I am opening a link from an email either in an email application which has a local browser or from web readable email in something like Chrome or Safari, I expect the link to open. I don’t expect to be told the browser doesn’t recognise the protocol and I don’t expect to be told that the company has an app and then be redirected to the app store to get it.

I expect the page to open.

I realise there has been some serious bandwagoning around app development – but the problem is this. We’re moving to web-based applications via a browser on desktops – slowly – but we’re getting there. To quote that nice Mr Randall Munroe, at xkcd:

But we seem to be moving in the other direction on mobile. I don’t want 100 applications on my phone. I don’t need an app for every individual company whose website I wish to browse. Already, at least one company that I can think of (but won’t name) has an app which doesn’t even include the key functionality I need from that company. And they are still pushing me to use their app.

This is not stepping forward. It’s stepping backwards. If this is the future, I really, really don’t want it.

I have a browser for a reason.  I expect to be able to browse data on the web in it. I expect not to need a proprietary application per company to get at their online store front.

IIHA2YD – Pinterest

Pinterest is another one of those services whose data I suspect would be very interesting to look at. For the most part, it’s starting to be picked up commercially as a gallery option for a lot of companies. It gets very heavy usage from the handcrafts sector which is how I encountered it first, and then also via the surf magazine world (where I saw it starting to get a lot of use).

Most of the people I know who actively use Pinterest for themselves are women. I’d be interested to see how much of that is based in reality or whether that’s limited to my particular circle of boards. For the limited number of boards I follow, there is a heavy increase in usage between about 6 in the evening and 11 at night. There are a lot of specialist boards (I follow one which handles antique ink bottles only).

Quite a few stores are starting to implement Pinterest links and this is interesting because one of the uses I have gotten out of it is as a visual shopping list. It also gets used as a bookmark service (see all the recipes that get bookmarked on it) and as a gallery (see how companies are using it to showcase their products).

So I’d like to map out how it gets used by different people and how it gets exploited commercially. There is some evidence to suggest that it drives serious sales for companies using their pinboards as store fronts and that it may be more effective than other social media in this respect. I’d be interested to see if it’s possible to get a global picture of this, and whether some sectors do better than other sectors in this respect. I’d like to have a look at international traffic flows and whether different cultures target different uses of the pinboards.

I’m also interested in how the pins are categorised. Unlike Flickr, for example, Pinterest doesn’t really use tags; it uses categories. I’m not certain how its search works, but my guess is that it’s part category based and part text content based. This is interesting because usage patterns suggest that repinners very often don’t change the text accompanying a particular pin.

I’m really going to see if I can figure out a way of structuring how I would do this if I had a chance. It would create one giant infographic I suspect.

The death of big data

A couple of people have tweeted links to this article in my stream this morning and a couple of comments in it stood out for me, particularly bearing in mind I’ve already considered the concept of big data. Money quote from that piece:

I’m troubled by the impression I have that big data is somewhere we should be at without understanding why we should be at it and what we can get from it.

Moving back to the piece from VentureBeat, one of the standout sentences for me was this one:

The phrase “big data” is now beyond completely meaningless.

I’ve never really liked the term big data because, from my point of view, it never was meaningful. And yet there are still people having conversations that go “what are we doing about big data”.

This is the wrong question. The question is “how do we best exploit the data we have, how do we improve the quality of the data we have”. Scale has very little to do with this when you think about it.

Data is all about the questions you ask of it.

I know you’ve got an app for that but

…it doesn’t do what I want it to do.

This rush to put out apps for mobile devices is completely futile if your app has less functionality than your website does. And continually insisting on telling me about your app, which is crippled compared to your website, is a futile exercise if you want to win my heart and mind. I’ve downloaded your app. It’s functionally useless for why I want to visit your website. If I visit your website from a mobile device, serve me the link I clicked on and stop giving me a page that says your app exists and I should download it. I ALREADY HAVE AND IT DOESN’T DO WHAT I NEED IT TO DO.

Have you got that? I click on a link in my email to a page on your website and I can’t get to it because you’ve blocked it with a demand to download your app.

There is no point in having a mobile app for the sake of having a mobile app.

Catching my eye…what is your job exactly…

Jeff Leek over at Simply Statistics interviewed one of Google’s statisticians, Nick Chamandy, a little while ago. You’ll find the interview here. He had an interesting comment on describing what it is he did and, more to the point, on ensuring more people got access to his kind of role by recognising that different fields use different language.

When posting job opportunities, we are cognizant that people from different academic fields tend to use different language, and we don’t want to miss out on a great candidate because he or she comes from a non-statistics background and doesn’t search for the right keyword. On my team alone, we have had successful “statisticians” with degrees in statistics, electrical engineering, econometrics, mathematics, computer science, and even physics. All are passionate about data and about tackling challenging inference problems.

I thought this was quite interesting because it represented a certain amount of out of the box thinking about what it is you want people to do. I can say this of course because I’m a language graduate working in IT – sometimes the talent isn’t roundly sorted by academia for you.

I think this tends to get forgotten now and again.

IIHA2YD: Etsy

Etsy is an online marketplace for handcrafted goods and related specialist objects. They present some of their business data here in their monthly Weather Report which is – in my opinion – quite a nice idea. I’d like to see more companies, and not just in the internet start-up sector, do something similar rather than just waiting for filing time.

Etsy was the first company I started thinking about for this project for various reasons – first of all, they totally drove their market and are still the market leaders globally in that zone despite some local competition in smaller market areas. What they do, they do very, very well. But they are not necessarily a high-profile company like, for example, the Netflixes and the Groupons. They do, however, have some interesting ideas in terms of organising their market and their staff. Their process of increasing their numbers of female engineers was a masterclass in not paying lip service to something they wanted to change.

According to Amazon, Etsy get their web log data processed on MapReduce. Etsy have actually blogged about that here, and it is well worth a read if you are interested in data analytics and the requirements of companies in the new economy.

But that doesn’t answer the question as to what I would do if I had access to their data and let’s be honest, Etsy are pretty hot in terms of dealing with their data themselves so whatever I suggest, they may well have it covered.

The first thing I would do is look at data for Etsy outside America. I’m interested in international sales. Sales from America to France, from France to Germany, from Australia to Italy. If you pushed me to the wall and said “Guess”, I’d be willing to assume that a significant proportion of Etsy’s business sales are intra-United States. I’m interested in the breakdown in sales outside that piece of their business because to some extent, that may well be where much of their growth comes from. Etsy has done some very interesting localisation of their site – see here (yes, they blogged about that too) – but I’d like to drill down into the numbers of pages they are serving in their locales (currently, in addition to English (US) and English (UK), they are providing localisation in German, French, Italian, Spanish and Dutch) and additionally what is getting hit by Google Translate, whether it is English pages or any of the other locales. From a currency point of view, they are providing pricing in significantly more currencies – I’m interested in seeing how the currencies line up with the languages and locales. Right now, Etsy recognises that I speak English, that I live in Ireland and that I like my prices in Euro. But I could have them in Thai baht if I wanted.

I’m interested in how Etsy’s non-US market is playing out. Whether there’s a dependency on English for those languages which do not have language content localised – for example Japanese, or whether much of it gets streamed through Google Translate, how much trade not featuring US sellers or buyers is happening, and what networks are cropping up again and again in those sales; whether there is an obvious leaning for many people in Japan to buy handcrafts from, say, Australia. Whether Italian products are going down particularly well in Denmark.

I’m interested in changing life for people who might buy products through the site. Part of this is by making it easier to identify lower and higher delivery charges – for example, intra Europe is less expensive than US-Europe. So I’d like to find a way of setting up search/product offerings in Etsy that can be done on the basis of likely postal charge. Currently, I don’t think this is possible – the search is limited on the basis of whether a product will be dispatched to your location or not, and not sorted according to possible cost – but it could be done by setting up banding based on the delivery charges in the store fronts, potentially. I’d also like it if, underlying, the system which serves storefront pages to possible customers could learn when a particular product created in one part of the world seems to have a particular following in another part of the world. I’d be interested to see what Etsy are doing in terms of localising demand beyond the need to serve products which can be dispatched to your country of location or not and whether this can be used to drive market penetration outside the US.
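
A minimal sketch of the banding idea, assuming each listing already carries a delivery charge to the buyer's location. The `Listing` class, the band thresholds and the function names are all hypothetical and have nothing to do with how Etsy's search actually works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Listing:
    """Hypothetical listing record; not Etsy's actual data model."""
    title: str
    price: float
    # Delivery charge to the buyer's location; None if the seller
    # does not dispatch there at all.
    delivery_charge: Optional[float]


# Assumed band upper limits in the buyer's currency; purely illustrative.
BANDS = [(5.0, "low"), (15.0, "medium")]


def delivery_band(charge: float) -> str:
    """Label a delivery charge as low, medium or high."""
    for upper, label in BANDS:
        if charge <= upper:
            return label
    return "high"


def search_by_delivery(listings: List[Listing]) -> List[Listing]:
    """Drop listings that will not ship, then sort cheapest delivery first."""
    deliverable = [item for item in listings if item.delivery_charge is not None]
    return sorted(deliverable, key=lambda item: (item.delivery_charge, item.price))
```

Given a scarf with a 4.50 delivery charge, a mug at 9.00 and an ink bottle at 18.00, this would list them in that order and label them low, medium and high, while a print that doesn't ship to the buyer's country would be left out.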

In summary then, I’m interested in Etsy’s non-US data. I’m interested in extra-US sales activities, I’m interested in measuring whether the localisation they have done so far is matching how their international markets are moving. I’m interested in using this data to tweak how products are served to potential customers, and I’m interested in enhancing the available information to a customer in terms of delivery issues, for example. I’m particularly interested to see how Etsy is doing in the UK compared to other non-English language locales on a similar scale (say Germany, France, Italy). I’m very interested to see how Etsy is doing in Japan and India and what the trends there have been over the last 2-3 years, for example. I want to see if particular locales are showing organic growth and I’m interested to see what the company is doing to drive growth outside the US heartland.

This is what I would do with some of Etsy’s data if I ever got my hands on it. Also, I’d implement a wishlist. Please can I have a wishlist.

_______________

ETA: Etsy’s localised newsletters are great and yes, they have some very decent localised search as well. I am completely impressed.

If I had access to your data…

Some time ago, Hilary Mason of Bit.ly did a blog post on the sort of questions she asked when she was recruiting data scientists. There was some interesting stuff there, and since then, other people have done similar things via LinkedIn, for example.

One of the ones Hilary raised went along the lines of “Well look, you know a bit about our data now, so, what would you do with it that we aren’t doing at the moment”.

I liked that question a lot and have been thinking about it since, particularly with a view to the data available to other companies – not just Bit.ly – and have decided to do the occasional blog post on what I’d do with available data in different companies. Hence, there will be the odd entry which starts IIHA2YD which will cover that. I see some benefits to this – it allows you to sit down and consider what sort of data companies might truly have. And because you are looking at it from a company perspective, it’s likely to be less siloed than if you were looking at it from the point of view of analytics in support of a particular function.

I foresee fun.

How college is going

Being back at university studying mathematics more or less for the hell of it is actually quite an interesting experience. The whole independent study thing is hard from time to time, but what’s hardest about it is you have to do actual rent paying work around it and somehow, study is more fun on occasion. I’m just done with a block looking at iteration and matrices which was quite interesting, and also, with a stats block dealing with time series. The thing about time series – in one respect – is that they get used a lot on a very superficial level by a lot of people…but in depth, there’s kind of a lot more, particularly in terms of predictive modelling.

I scored very, very well on both assignments linked to these modules and am about to move into calculus (again) and multivariates between the maths and the stats.

What people can’t quite get to grips with is that I’m actually doing this. Why, if you already have a degree and a couple of postgrads, and a job, would you go back and do something like maths? Maths is hard.

And it’s not like I need to.

This leads me to wonder about people’s motivation sometimes. When I look around, the people whose opinion I have, over the years, tended to value most, think that going back to college is a terrific thing, and that it’s awesome that I’m doing it. The ones who question the sanity of it, I have noticed, tend to be slightly more negative in their outlook about most of their daily life, and in particular, about the impact that decisions outside their control have on their lives. On balance, I wonder how many people assert control over their lives and how many just coast.

I was looking at maths courses for 2-3 years before I eventually signed up to the Open University. Dublin really only has one part-time option, which is the DIT, and at the time I eventually rejected it, I was pretty sure it wasn’t right for me. The Open University, while requiring a lot of independent time with the books, has proven to be more helpful. At the time I started the course, there were some reorganisations going on at work, and quite a lot of people were suggesting that I, maybe, wait and see.

I have come to the conclusion that sometimes, “wait and see” is a corrosive piece of advice. If, for example, I had waited a year in 2011 to see, the changes in funding for OU courses would have made it financially out of the question. Sometimes, you really need to identify the right decision for yourself regardless of what other people think.

I scored 94 in the last maths assignment. It’s probably the highest mark I have gotten in anything since I was about 17 years old, and I knew that the most I could score was 97 anyway. So I’m really, really pleased with this.

I don’t think waiting and seeing would have been the right thing to do. I’m very, very glad I did this even if it means I spend a lot of time curled up with numbers and symbols.