IIHA2YD – Pinterest

Pinterest is another one of those services whose data I suspect would be very interesting to look at. For the most part, it’s starting to be picked up commercially as a gallery option for a lot of companies. It gets very heavy usage from the handcrafts sector which is how I encountered it first, and then also via the surf magazine world (where I saw it starting to get a lot of use).

Most of the people I know who actively use Pinterest for themselves are women. I’d be interested to see how much of that is based in reality or whether that’s limited to my particular circle of boards. For the limited number of boards I follow, there is a heavy increase in usage between about 6 in the evening and 11 at night. There are a lot of specialist boards (I follow one which handles antique ink bottles only).

Quite a few stores are starting to implement Pinterest links and this is interesting because one of the uses I have gotten out of it is as a visual shopping list. It also gets used as a bookmark service (see all the recipes that get bookmarked on it) and as a gallery (see how companies are using it to showcase their products).

So I’d like to map out how it gets used by different people and how it gets exploited commercially. There is some evidence to suggest that it drives serious sales for companies using their pinboards as store fronts and that it may be more effective than other social media in this respect. I’d be interested to see if it’s possible to get a global picture of this, and whether some sectors do better than other sectors in this respect. I’d like to have a look at international traffic flows and whether different cultures target different uses of the pinboards.

I’m also interested in how the pins are categorised. Unlike Flickr, for example, Pinterest doesn’t really use tags; it uses categories. I’m not certain how its search works but my guess is it’s part category based and part text content based, and this is interesting because usage patterns suggest that repinners very often don’t change the text accompanying a particular pin.

I’m really going to see if I can figure out a way of structuring how I would do this if I had a chance. It would create one giant infographic I suspect.

The death of big data

A couple of people have tweeted links to this article in my stream this morning and a couple of comments in it stood out for me, particularly bearing in mind I’ve already considered the concept of big data. Money quote from that piece:

I’m troubled by the impression I have that big data is somewhere we should be at without understanding why we should be at it and what we can get from it.

Moving back to the piece from VentureBeat, one of the standout sentences for me was this one:

The phrase “big data” is now beyond completely meaningless.

I’ve never much liked the term big data because from my point of view, it never was meaningful. And yet there are still people having conversations that go “what are we doing about big data”.

This is the wrong question. The question is “how do we best exploit the data we have, how do we improve the quality of the data we have”. Scale has very little to do with this when you think about it.

Data is all about the questions you ask of it.

I know you’ve got an app for that but

…it doesn’t do what I want it to do.

This rush to put out apps for mobile devices is completely futile if your app has less functionality than your website does. And continually insisting on telling me about your app, which is crippled compared to your website, is a futile exercise if you want to win my heart and mind. I’ve downloaded your app. It’s functionally useless for why I want to visit your website. If I visit your website from a mobile device, serve me the link I clicked on and stop giving me a page that says your app exists and I should download it. I ALREADY HAVE AND IT DOESN’T DO WHAT I NEED IT TO DO.

Have you got that? I click on a link in my email to a page on your website and I can’t get to it because you’ve blocked it with a demand to download your app.






There is no point in having a mobile app for the sake of having a mobile app.

Catching my eye…what is your job exactly…

Jeff Leek over at Simply Statistics interviewed one of Google’s statisticians, Nick Chamandy, a little while ago. You’ll find the interview here. He had an interesting comment on describing what it is he does and, more to the point, on ensuring more people get access to his kind of role by recognising that different fields use different language.

When posting job opportunities, we are cognizant that people from different academic fields tend to use different language, and we don’t want to miss out on a great candidate because he or she comes from a non-statistics background and doesn’t search for the right keyword. On my team alone, we have had successful “statisticians” with degrees in statistics, electrical engineering, econometrics, mathematics, computer science, and even physics. All are passionate about data and about tackling challenging inference problems.

I thought this was quite interesting because it represented a certain amount of out of the box thinking about what it is you want people to do. I can say this of course because I’m a language graduate working in IT – sometimes the talent isn’t roundly sorted by academia for you.

I think this tends to get forgotten now and again.



Etsy is an online marketplace for handcrafted goods and related specialist objects. They present some of their business data here in their monthly Weather Report which is – in my opinion – quite a nice idea. I’d like to see more companies, and not just in the internet start-up sector, do something similar rather than just waiting for filing time.

Etsy was the first company I started thinking about for this project for various reasons – first of all, they totally drove their market and are still the market leaders globally in that zone despite some local competition in smaller market areas. What they do, they do very, very well. But they are not necessarily a high-profile company like, for example, the Netflixes and the Groupons. They do, however, have some interesting ideas in terms of organising their market and their staff. Their process of increasing their numbers of female engineers was a masterclass in not just paying lip service to something they wanted to change.

According to Amazon, Etsy get their web log data processed on MapReduce, and actually, Etsy have blogged about that here and it is well worth a read if you are interested in data analytics and the requirements of companies in the new economy.

But that doesn’t answer the question as to what I would do if I had access to their data and let’s be honest, Etsy are pretty hot in terms of dealing with their data themselves so whatever I suggest, they may well have it covered.

The first thing I would do is look at data for Etsy outside America. I’m interested in international sales. Sales from America to France, from France to Germany, from Australia to Italy. If you pushed me to the wall and said “Guess”, I’d be willing to assume that a significant proportion of Etsy’s business sales are intra-United States. I’m interested in the breakdown in sales outside that piece of their business because to some extent, that may well be where much of their growth comes from. Etsy has done some very interesting localisation of their site – see here (yes, they blogged about that too) – but I’d like to drill down into the numbers of pages they are serving in their locales (currently, in addition to English (US) and English (UK), they are providing localisation in German, French, Italian, Spanish and Dutch) and additionally what is getting hit by Google Translate, whether it is English pages or any of the other locales. From a currency point of view, they are providing pricing in significantly more currencies – I’m interested in seeing how the currencies line up with the languages and locales. Right now, Etsy recognises that I speak English, that I live in Ireland and that I like my prices in Euro. But I could have them in Thai baht if I wanted.

I’m interested in how Etsy’s non-US market is playing out. Whether there’s a dependency on English for those languages which do not have language content localised – for example Japanese, or whether much of it gets streamed through Google Translate, how much trade not featuring US sellers or buyers is happening, and what networks are cropping up again and again in those sales; whether there is an obvious leaning for many people in Japan to buy handcrafts from, say, Australia. Whether Italian products are going down particularly well in Denmark.
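As a rough sketch of how those recurring networks might be surfaced, the snippet below counts seller-to-buyer country pairs and drops anything with a US party on either side. The record layout and the sample rows are hypothetical; I have no idea how Etsy actually stores its sales data.

```python
from collections import Counter

# Hypothetical sale records as (seller_country, buyer_country) pairs.
# These rows are invented for illustration and are not real Etsy data.
sales = [
    ("US", "US"),
    ("AU", "JP"),
    ("IT", "DK"),
    ("AU", "JP"),
    ("US", "FR"),
    ("FR", "DE"),
]


def non_us_flows(records):
    """Count seller -> buyer country pairs where neither side is the US."""
    return Counter(
        (seller, buyer)
        for seller, buyer in records
        if seller != "US" and buyer != "US"
    )


flows = non_us_flows(sales)

# List the trade routes from most to least frequent.
for (seller, buyer), count in flows.most_common():
    print(f"{seller} -> {buyer}: {count}")
```

On real data, the top of that list would be the place to look for routes like Australia to Japan or Italy to Denmark turning up again and again.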

I’m interested in changing life for people who might buy products through the site. Part of this is by making it easier to identify lower and higher delivery charges – for example, intra Europe is less expensive than US-Europe. So I’d like to find a way of setting up search/product offerings in Etsy that can be done on the basis of likely postal charge. Currently, I don’t think this is possible – the search is limited on the basis of whether a product will be dispatched to your location or not, and not sorted according to possible cost – but it could be done by setting up banding based on the delivery charges in the store fronts, potentially. I’d also like it if, underlying, the system which serves storefront pages to possible customers could learn when a particular product created in one part of the world seems to have a particular following in another part of the world. I’d be interested to see what Etsy are doing in terms of localising demand beyond the need to serve products which can be dispatched to your country of location or not and whether this can be used to drive market penetration outside the US.
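A minimal sketch of the banding idea, assuming each listing carries a single delivery charge to the buyer's location. The band limits, field names and listings are all made up for illustration; real thresholds would have to come out of the store front data.

```python
from bisect import bisect_left

# Hypothetical upper limits (in euro) for the "low" and "medium" bands.
BAND_LIMITS = [5.0, 15.0]
BAND_NAMES = ["low", "medium", "high"]


def delivery_band(charge):
    """Return the band name for a delivery charge; limits are inclusive."""
    return BAND_NAMES[bisect_left(BAND_LIMITS, charge)]


# Invented listings; "delivery_charge" is an assumed field name.
listings = [
    {"title": "Ink bottle", "delivery_charge": 18.50},
    {"title": "Knitted scarf", "delivery_charge": 4.00},
    {"title": "Ceramic mug", "delivery_charge": 9.75},
]

# Sort search results by delivery charge, cheapest first, and label each one.
by_cost = sorted(listings, key=lambda item: item["delivery_charge"])
for item in by_cost:
    band = delivery_band(item["delivery_charge"])
    print(f"{item['title']}: {band} ({item['delivery_charge']:.2f})")
```

Banding rather than sorting on the exact charge would let a buyer filter on "low delivery cost" without the results being reordered by a few cents either way.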

In summary then, I’m interested in Etsy’s non-US data. I’m interested in extra-US sales activities, I’m interested in measuring whether the localisation they have done so far is matching how their international markets are moving. I’m interested in using this data to tweak how products are served to potential customers, and I’m interested in enhancing the available information to a customer in terms of delivery issues, for example. I’m particularly interested to see how Etsy is doing in the UK compared to non-English language locales on a similar scale (say Germany, France, Italy). I’m very interested to see how Etsy is doing in Japan and India and what the trends there have been over the last 2-3 years for example. I want to see if particular locales are showing organic growth and I’m interested to see what the company is doing to drive growth outside the US heartland.

This is what I would do with some of Etsy’s data if I ever got my hands on it. Also, I’d implement a wishlist. Please can I have a wishlist.


ETA: Etsy’s localised newsletters are great and yes, they have some very decent localised search as well. I am completely impressed.




If I had access to your data…

Some time ago, Hilary Mason of Bit.ly did a blog post on the sort of questions she asked when she was recruiting data scientists. There was some interesting stuff there, and since then, other people have done similar things via LinkedIn, for example.

One of the ones Hilary raised went along the lines of “Well look, you know a bit about our data now, so, what would you do with it that we aren’t doing at the moment”.

I liked that question a lot and have been thinking about it since, particularly with a view to the data available to other companies – not just Bit.ly – and have decided to do the occasional blog post on what I’d do with available data in different companies. Hence, there will be the odd entry which starts IIHA2YD which will cover that. I see some benefits to this – it allows you to sit down and consider what sort of data companies might truly have. And because you are looking at it from a company perspective, it’s likely to be less silo’d than if you were looking at it from the point of view of analytics in support of a particular function.

I foresee fun.

How college is going

Being back at university studying mathematics more or less for the hell of it is actually quite an interesting experience. The whole independent study thing is hard from time to time, but what’s hardest about it is you have to do actual rent paying work around it and somehow, study is more fun on occasion. I’m just done with a block looking at iteration and matrices which was quite interesting, and also, with a stats block dealing with time series. The thing about time series – in one respect – is that they get used a lot on a very superficial level by a lot of people…but in depth, there’s kind of a lot more, particularly in terms of predictive modelling.

I scored very, very well on both assignments linked to these modules and am about to move into calculus (again) and multivariates between the maths and the stats.

What people can’t quite get to grips with is that I’m actually doing this. Why, if you already have a degree and a couple of postgrads, and a job, would you go back and do something like maths? Maths is hard.

And it’s not like I need to.

This leads me to wonder about people’s motivation sometimes. When I look around, the people whose opinion I have, over the years, tended to value most, think that going back to college is a terrific thing, and that it’s awesome that I’m doing it. The ones who question the sanity of it, I have noticed, tend to be slightly more negative in their outlook about most of their daily life, and in particular, about the impact that decisions outside their control have on their lives. On balance, I wonder how many people assert control over their lives and how many just coast.

I was looking at maths courses for 2-3 years before I eventually signed up to the Open University. Dublin really only has one part time option which is the DIT and at the time I eventually rejected it, I was pretty sure it wasn’t right for me. The Open University, while requiring a lot of independent time with the books, has proven to be more helpful. At the time I started the course, there were some reorganisations going on at work, and quite a lot of people were suggesting that I, maybe, wait and see.

I have come to the conclusion that sometimes, “wait and see” is a corrosive piece of advice. If, for example, I had waited and seen for a year in 2011, the changes in funding for OU courses would have made it financially out of the question. Sometimes, you really need to identify the right decision for yourself regardless of what other people think.

I scored 94 in the last maths assignment. It’s probably the highest mark I have gotten in anything since I was about 17 years old and I knew that the maximum I could have scored was 97 anyway. So I’m really, really pleased with this.

I don’t think waiting and seeing would have been the right thing to do. I’m very, very glad I did this even if it means I spend a lot of time curled up with numbers and symbols.


Passenger Air transport in Europe, 2004 to 2011

One of the things I wanted this year was a little experimentation with Tableau so I had been looking around for some data to play with. The above data comes to you courtesy of Eurostat and it relates to passenger transport by air in Europe. I’ve covered the period 2004-2011 and the countries concerned because they provided complete data for the periods concerned. There are a couple of other countries with incomplete data in the Eurostat tables as well which you can find here.

I did a little bit of work with the data because I wanted to identify two underlying stories. The first one – which you can see in this display – takes the absolute passenger figures and divides them by the population of the countries concerned so that we get a measure for passenger flights per head of population. This is interesting because it can highlight a couple of things – necessity (see Ireland and Iceland for example which are relatively small countries with no other connection options to other countries), economic strength (see Switzerland) and, rather more difficult to measure, the importance of travel.
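The per-head measure is a plain division, sketched below. The country names and figures are placeholders I invented for the example and are not taken from the Eurostat tables.

```python
# Placeholder figures for illustration only; not Eurostat data.
passengers = {"Country A": 20_000_000, "Country B": 2_000_000}
population = {"Country A": 4_000_000, "Country B": 8_000_000}


def passengers_per_head(passengers, population):
    """Divide annual passenger counts by population, country by country."""
    return {
        country: passengers[country] / population[country]
        for country in passengers
        if country in population  # skip countries with no population figure
    }


per_head = passengers_per_head(passengers, population)
for country, value in per_head.items():
    print(f"{country}: {value:.2f} passenger flights per head")
```

A small country with a lot of traffic (Country A here) comes out well ahead of a larger one with less, which is the effect the display is meant to bring out.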

The other story – which parts of this dashboard hint at – is how economic performance impacts on air transport. For this, I will look to get GDP figures into the underlying data and graph them against passengers per head of population. Already, however, if you look at the time lines for Ireland and Iceland, there is a hint that there can be a major impact in this respect.

This is the first project I have undertaken with Tableau and I am using Tableau Public. It has been a sharp learning experience. One of the things which has struck me is that software can be erratic in how it handles dates. The underlying tables for this project are in Excel, and Excel does not handle years as dates. Tableau attempted to interpret the years as days since sometime in 1899. Fixing that is messy and potentially a logistical nightmare in the future. When I went to look at date formatting, I was stunned to see Excel didn’t allow me to format a year as a date. This is infuriating.
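To show what goes wrong, here is a small sketch that reads a bare year as a serial day count, assuming the common spreadsheet convention in which day 0 is 30 December 1899. I have not checked exactly which epoch Tableau used in my case, so treat that as an assumption. The second function is one possible workaround: build a real date for 1 January of the year before handing the data over.

```python
from datetime import date, timedelta

# Assumed serial-date epoch (day 0); a common spreadsheet convention.
SERIAL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Interpret a number as a count of days since the assumed epoch."""
    return SERIAL_EPOCH + timedelta(days=serial)


def year_to_date(year):
    """Turn a bare year into an explicit date (1 January of that year)."""
    return date(year, 1, 1)


for year in (2004, 2011):
    wrong = serial_to_date(year)   # what a serial-day reading produces
    right = year_to_date(year)     # what was actually meant
    print(f"{year}: read as serial -> {wrong}, intended -> {right}")
```

Under that assumption, every year in the 2004 to 2011 range lands somewhere in mid-1905, which is why the time axis comes out as nonsense until the years are converted.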

However, I got something out of this process which is a lot of information on how to get data working in Tableau for me.