
IIHA2YD: Etsy

Etsy is an online marketplace for handcrafted goods and related specialist objects. They present some of their business data here in their monthly Weather Report which is – in my opinion – quite a nice idea. I’d like to see more companies, and not just in the internet start-up sector, do something similar rather than just waiting for filing time.

Etsy was the first company I started thinking about for this project for various reasons – first of all, they totally drove their market and are still the market leaders globally in that zone despite some local competition in smaller market areas. What they do, they do very, very well. But they are not necessarily a high-profile company like, for example, the Netflixes and the Groupons. They do, however, have some interesting ideas in terms of organising their market and their staff. Their process of increasing their numbers of female engineers was a masterclass in not paying lip service to something they wanted to change.

According to Amazon, Etsy get their web log data processed on MapReduce, and actually, Etsy have blogged about that here and it is well worth a read if you are interested in data analytics and the requirements of companies in the new economy.

But that doesn’t answer the question as to what I would do if I had access to their data and let’s be honest, Etsy are pretty hot in terms of dealing with their data themselves so whatever I suggest, they may well have it covered.

The first thing I would do is look at data for Etsy outside America. I’m interested in international sales. Sales from America to France, from France to Germany, from Australia to Italy. If you pushed me to the wall and said “Guess”, I’d be willing to assume that a significant proportion of Etsy’s business sales are intra-United States. I’m interested in the breakdown in sales outside that piece of their business because to some extent, that may well be where much of their growth comes from. Etsy has done some very interesting localisation of their site – see here (yes, they blogged about that too) – but I’d like to drill down into the numbers of pages they are serving in their locales (currently, in addition to English (US) and English (UK), they are providing localisation in German, French, Italian, Spanish and Dutch) and additionally what is getting hit by Google Translate, whether it is English pages or any of the other locales. From a currency point of view, they are providing pricing in significantly more currencies – I’m interested in seeing how the currencies line up with the languages and locales. Right now, Etsy recognises that I speak English, that I live in Ireland and that I like my prices in Euro. But I could have them in Thai baht if I wanted.

I’m interested in how Etsy’s non-US market is playing out. Whether there’s a dependency on English for those languages which do not have language content localised – for example Japanese, or whether much of it gets streamed through Google Translate, how much trade not featuring US sellers or buyers is happening, and what networks are cropping up again and again in those sales; whether there is an obvious leaning for many people in Japan to buy handcrafts from, say, Australia. Whether Italian products are going down particularly well in Denmark.

I’m interested in changing life for people who might buy products through the site. Part of this is by making it easier to identify lower and higher delivery charges – for example, intra Europe is less expensive than US-Europe. So I’d like to find a way of setting up search/product offerings in Etsy that can be done on the basis of likely postal charge. Currently, I don’t think this is possible – the search is limited on the basis of whether a product will be dispatched to your location or not, and not sorted according to possible cost – but it could be done by setting up banding based on the delivery charges in the store fronts, potentially. I’d also like it if, underlying, the system which serves storefront pages to possible customers could learn when a particular product created in one part of the world seems to have a particular following in another part of the world. I’d be interested to see what Etsy are doing in terms of localising demand beyond the need to serve products which can be dispatched to your country of location or not and whether this can be used to drive market penetration outside the US.
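As a rough illustration of the banding idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the band limits, the band labels, the listing fields and the sample listings are hypothetical and have nothing to do with how Etsy actually stores or searches its data.

```python
from bisect import bisect_right

# Hypothetical band boundaries in euro. A charge equal to a limit
# falls into the next band up.
BAND_LIMITS = [5.0, 10.0, 20.0]
BAND_LABELS = ["low", "medium", "high", "very high"]


def postage_band(charge):
    """Return the label of the band that a delivery charge falls into."""
    return BAND_LABELS[bisect_right(BAND_LIMITS, charge)]


def sort_by_postage(listings):
    """Return the listings sorted by delivery charge, cheapest first."""
    return sorted(listings, key=lambda item: item["postage"])


# Made-up listings, each with the postage to one buyer's location.
listings = [
    {"title": "Ceramic mug", "ships_from": "US", "postage": 18.50},
    {"title": "Linen scarf", "ships_from": "FR", "postage": 4.20},
    {"title": "Oak spoon", "ships_from": "DE", "postage": 7.00},
]

for item in sort_by_postage(listings):
    print(item["title"], item["ships_from"], postage_band(item["postage"]))
```

The same band label could just as easily be used as a search filter as a sort key; which of the two would suit buyers better is exactly the kind of thing the data would have to answer.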

In summary then, I’m interested in Etsy’s non-US data. I’m interested in extra-US sales activities, and I’m interested in measuring whether the localisation they have done so far is matching how their international markets are moving. I’m interested in using this data to tweak how products are served to potential customers, and I’m interested in enhancing the available information to a customer in terms of delivery issues, for example. I’m particularly interested to see how Etsy is doing in the UK compared to non-English language locales on a similar scale (say Germany, France, Italy). I’m very interested to see how Etsy is doing in Japan and India and what the trends there have been over the last 2-3 years, for example. I want to see if particular locales are showing organic growth and I’m interested to see what the company is doing to drive growth outside the US heartland.

This is what I would do with some of Etsy’s data if I ever got my hands on it. Also, I’d implement a wishlist. Please can I have a wishlist.

_______________

ETA: Etsy’s localised newsletters are great and yes, they have some very decent localised search as well. I am completely impressed.

 

 

 

If I had access to your data…

Some time ago, Hilary Mason of Bit.ly did a blog post on the sort of questions she asked when she was recruiting data scientists. There was some interesting stuff there, and since then, other people have done similar things via LinkedIn, for example.

One of the ones Hilary raised went along the lines of “Well look, you know a bit about our data now, so, what would you do with it that we aren’t doing at the moment”.

I liked that question a lot and have been thinking about it since, particularly with a view to the data available to other companies – not just Bit.ly – and have decided to do the occasional blog post on what I’d do with available data in different companies. Hence, there will be the odd entry which starts IIHA2YD which will cover that. I see some benefits to this – it allows you to sit down and consider what sort of data companies might truly have. And because you are looking at it from a company perspective, it’s likely to be less silo’d than if you were looking at it from the point of view of analytics in support of a particular function.

I foresee fun.

How college is going

Being back at university studying mathematics more or less for the hell of it is actually quite an interesting experience. The whole independent study thing is hard from time to time, but what’s hardest about it is you have to do actual rent-paying work around it and somehow, study is more fun on occasion. I’m just done with a block looking at iteration and matrices which was quite interesting, and also with a stats block dealing with time series. The thing about time series – in one respect – is that they get used a lot at a very superficial level by a lot of people…but in depth, there’s kind of a lot more, particularly in terms of predictive modelling.

I scored very, very well on both assignments linked to these modules and am about to move into calculus (again) and multivariates between the maths and the stats.

What people can’t quite get to grips with is that I’m actually doing this. Why, if you already have a degree and a couple of postgrads, and a job, would you go back and do something like maths? Maths is hard.

And it’s not like I need to.

This leads me to wonder about people’s motivation sometimes. When I look around, the people whose opinion I have, over the years, tended to value most, think that going back to college is a terrific thing, and that it’s awesome that I’m doing it. The ones who question the sanity of it, I have noticed, tend to be slightly more negative in their outlook about most of their daily life, and in particular, about the impact that decisions outside their control have on their lives. On balance, I wonder how many people assert control over their lives and how many just coast.

I was looking at maths courses for 2-3 years before I eventually signed up to the Open University. Dublin really only has one part-time option which is the DIT and at the time I eventually rejected it, I was pretty sure it wasn’t right for me. The Open University, while requiring a lot of independent time with the books, has proven to be more helpful. At the time I started the course, there were some reorganisations going on at work, and quite a lot of people were suggesting that I, maybe, wait and see.

I have come to the conclusion that sometimes, “wait and see” is a corrosive piece of advice. If, for example, I had waited and seen for a year in 2011, the changes in funding for OU courses would have made it financially out of the question. Sometimes, you really need to identify the right decision for yourself regardless of what other people think.

I scored 94 in the last maths assignment. It’s probably the highest mark I have gotten in anything since I was about 17 years old and I knew that the max I’d be scored from was 97 anyway. So I’m really, really pleased with this.

I don’t think waiting and seeing would have been the right thing to do. I’m very, very glad I did this even if it means I spend a lot of time curled up with numbers and symbols.

 

Passenger Air transport in Europe, 2004 to 2011

One of the things I wanted this year was a little experimentation with Tableau so I had been looking around for some data to play with. The above data comes to you courtesy of Eurostat and it relates to passenger transport by air in Europe. I’ve covered the period 2004-2011 and only those countries which provided complete data for that period. There are a couple of other countries with incomplete data in the Eurostat tables as well which you can find here.

I did a little bit of work with the data because I wanted to identify two underlying stories. The first one – which you can see in this display – takes the absolute passenger figures and divides them by the population of the countries concerned so that we get a measure for passenger flights per head of population. This is interesting because it can highlight a couple of things – necessity (see Ireland and Iceland for example which are relatively small countries with no other connection options to other countries), economic strength (see Switzerland) and, rather more difficult to measure, the importance of travel.
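The calculation itself is a simple ratio. A minimal Python sketch, assuming nothing more than a passenger count and a population per country; the figures below are invented round numbers for illustration and are not the Eurostat values:

```python
def passengers_per_head(passengers, population):
    """Air passengers carried in a year divided by the country's population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return passengers / population


# Invented figures: (passengers carried, population). Not Eurostat data.
sample = {
    "Country A": (20_000_000, 4_000_000),
    "Country B": (60_000_000, 60_000_000),
}

ratios = {
    country: passengers_per_head(passengers, population)
    for country, (passengers, population) in sample.items()
}

for country, ratio in ratios.items():
    print(f"{country}: {ratio:.1f} passenger flights per head")
```

A small country with a lot of traffic, like the made-up Country A, comes out far ahead of a larger one carrying three times as many passengers, which is the effect the per-head measure is meant to bring out.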

The other story – which parts of this dashboard hint at – is how economic performance impacts on air transport. For this, I will look to get GDP figures into the underlying data and graph them against passengers per head of population. Already, however, if you look at the time lines for Ireland and Iceland, there is a hint that there can be a major impact in this respect.

This is the first project I have undertaken with Tableau and I am using Tableau Public. It has been a sharp learning experience. One of the things which has struck me is that software can be erratic in how it handles dates. The underlying tables for this project are in Excel, and Excel does not handle years as dates. Tableau attempted to interpret the years as days since sometime in 1899. Fixing that is messy and potentially a logistical nightmare in the future. When I went to look at date formatting, I was stunned to see Excel didn’t allow me to format a year as a date. This is infuriating.
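To show why a bare year goes wrong when it is read as a day count, here is a small Python sketch. It assumes the conventional epoch of 30 December 1899 for Excel-style date serials; that is my assumption about what happened, and I have not confirmed exactly what Tableau did internally. The function names are made up for the example.

```python
from datetime import date, timedelta

# Assumption: the conventional epoch for Excel-style date serials.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Interpret a number as a count of days since the epoch."""
    return EXCEL_EPOCH + timedelta(days=serial)


def year_to_date(year):
    """Represent a bare year as 1 January of that year."""
    return date(year, 1, 1)


print(serial_to_date(2004))  # the year 2004 misread as a day count
print(year_to_date(2004))    # the year 2004 as an explicit date
```

Read as a day count, 2004 lands on a date in 1905, a long way from the intended year. Converting each year to an explicit 1 January date before loading is one way around it.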

However, I did get something out of this process: a lot of information on how to get data working in Tableau.

 

My way or the high way.

Via HBR

Alexandra Samuel wrote a piece on notetaking. It was quite breathtaking. It opened as follows:

I knew right away, when you walked in here with a paper notebook — a paper notebook! — I realized that this meeting was not going to be a good use of our time.

It caused what might best be described as a shitstorm, and seems to have topped out at 304 comments, the overwhelming majority of which were not in favour of the piece. She then wrote a piece about the reaction here where again, the majority of comments gently pointed out that a look in the mirror to remove the plank from her eye might be in order before criticising much of what was said to her.

I have a lot of problems with the piece, the key one being the attitude with which she meets other people: the attitude that her way, with her digital gadgets and toys, is better than anyone else’s way of organising aide memoires. The simple truth is, it probably isn’t. Certainly, it’s not straightforward that all the digital productivity tools in the world make you more productive. What – in my experience – tends to make you more productive is not feeling you have to justify every single little way of doing things.

I own a laptop, a tablet and a smartphone. The laptop runs Windows, the rest is iOS. I also have a lot of notebooks, not because they make me less productive but because they make me more productive. Basic day to day list? It’s faster to write it down and tick it off. Mindmapping? Quicker to pull out software. Reflecting on life in general? I’ve kept a journal since I was 19 years old. There’s something inherently more valuable about it than files which go missing and get corrupted.

There is a common view that the generations before us will have left more of a footprint via their books and their monuments than we will. We may generally produce a huge amount more information than they did, but we do not do very much to retain it. Already, data saved in the 1950s and 1960s is getting harder, if not impossible, to retrieve because we just don’t have the technology. Our technology cycles are changing and information is dying, in some respects, faster than it did 100 years ago.

Important things to me, my life, and my feelings, go in notebooks.

What worried me most about the whole thing was not so much the piece as it was originally published, although I really do have to say that it came across as childish and condescending, but the overwhelming lack of understanding of why she might not be right. This came across in her replies to comments across the piece. For example, she really doesn’t get that a lot of companies, for legal and regulatory reasons, just are not allowed to use services like Evernote. It’s not a question of a manager being an old fogey that she can write to and point out the errors of their ways so that a bunch of people wind up with laptops and iPads.

As it happens, I don’t think that laptops and iPads enhance listening. My experience is that people who are typing are not processing information at all. I’m a very fast typist – I typically averaged 120wpm in English in my admin days. Alone of all my colleagues, I could type from live dictation. This means that as fast as you spoke, I typed. And as a special trick, I could type in English what you said to me in French.

For a good typist, the iPad keyboard is basically unworkable. Typing things puts a constraint on how you describe unstructured data. Most meetings consist of unstructured data; they consist of brainstorming, problem solving.

Being honest, were I to walk into a meeting with someone like Alexandra, weighed down by her laptop and her iPad, I’d wonder if she really had any interest in the meeting at all. Oh it’s not because I think she’ll be checking her email or her twitter or her Facebook while I’m describing whatever problem we are here to resolve. It’s because I know that people who are typing are not absorbing. This is why, perhaps, Alexandra needs the crutch of search and retrieval of her digital tools. People who remember more get more done.

I think Alexandra, in stating that you don’t have to remember things because it’s all in Evernote, has missed that minor detail.

I should note she has a book on Evernote as a tool available at the moment.

 

Your objective is to inform, and not look pretty but useless.

Via Stats Chat in the last week or two.

If you’re not willing to click through, Stats Chat have posted a donut graphic which some New Zealand paper has printed to display some data. Really, you should have a look and then decide whether the graphic actually accurately depicts the data that the paper’s figures appear to be giving.

One of the worst features – in my humble experience – of enhanced graphics capabilities of different software packages (I’m looking at you, Excel, you know I love you but…) is that people will insist on using them. Inappropriately, confusingly and just plain badly. It’s quite worrying in some respects.

So what would be an elite technology company then?

About a week ago, I had a discussion on twitter about this article.

Facebook is not an elite company

(from the San Francisco Chronicle)

The list is a short one. Usually, it includes Google, Amazon, Apple, Facebook and (debatably) Microsoft.

This is the interesting quote.

Different things are essential to different people. So I’d argue that in the grand scheme of things, I’d be severely discommoded without Microsoft and Google but life without Amazon, Apple, and Facebook, provided at least one bookshop was still open, would probably be well more than survivable.

For me, when we have conversations like this, I don’t like to see sentences like this, however:

For the sake of this scenario, we’re not talking about behind-the-scene all-stars like Nvidia, IBM and Intel, but the companies that people interact with every day.

The simple truth is people interact with IBM, Nvidia and Intel every day of their lives, but the crucial difference is they often don’t know it. In my view, if you took IBM away, you really wouldn’t have much left. You’d potentially have a banking system and aviation system in serious crisis. Pretending they are excluded just because people don’t load stuff up in a browser is missing the point if we’re trying to identify the elite companies; the ones we cannot do without. To some extent, there are replacements for every single product and producer on the list the article was willing to look at, but it’s not anywhere near so straightforward for the second list, the list we don’t want to talk about. The justification for including Amazon has nothing to do with its retail arm and everything to do with the fact that a lot of other sites are hosted on their AWS, for example, something which an awful lot of people don’t know. This puts them in the same box as the IBM systems underpinning the banks and many of the airlines. It’s what you don’t deal with on a day to day basis which is most critical. And that’s what makes the elite companies elite.

An open letter to Twitter

Hi,

Thanks for the promoted tweet from eToro. I seem to see them regularly.

I understand that you have a business. From my point of view, promoted tweets are little more than ads, or marketing junk. I’d like to be able to switch off promoted tweets from eToro. I’m just not interested.

I get the need to monetise your product. Google manages to ship me reasonably relevant advertising in my Gmail. You get a lot more information out of me so….why do I get ads for Apple Stock?

I read a piece Hilary Mason wrote the other day about interview questions for data scientists. She said she’d ask what, based on your knowledge of bit.ly’s data, you would do that they are not doing.

Well I don’t know for bit.ly to be honest. I don’t use the service quite enough to comment. However, where Twitter is concerned, I’d do a better job on contextualising the inline advertising. Take me. It’s clear from the accounts I follow, the links I follow, the posts I make, even my description that I have certain specialised interests….photography. Surf. Kitesurf. Computer related stuff. Travel.

Nowhere in my account is any evidence that I am interested in eToro’s services. But I wouldn’t object to more relevant tasting promoted tweets, so how about it? Are you working in that area at all?

 

yours,

 

Treasa

 

 

Why do you develop…

Some time ago, I had a conversation with a developer on the subject of rectifying a recurring issue. There was a straightforward fix a developer could apply to each occurrence of that issue, but the developer, who had also explained several times to one or two of the several users how to avoid the issue, wanted to punish the users: to stop fixing the problems for them and so compel them to avoid the problem by following procedure. This might work if you’ve one or two users but with more than that, I think it’s unrealistic. Much better to have the software protect against errors, particularly if it’s a known and recurring issue.

I’ve often replayed that conversation in my mind and realised that I don’t really like it as an idea. While no part of the world is perfect, and there are often underlying considerations, rather than telling users how to avoid problems procedurally, we should enable them not to cause the problem in the first place by either a) preventing it from happening at a coding level or b) automatically fixing it in some way. Failing that, providing them with a tool to fix the issue themselves.

I don’t think we should ever be in a zone whereby it’s considered acceptable to punish users via the software we’ve designed for them. We should be in a zone whereby we develop to protect them against themselves to some extent. Ultimately, a developer’s role is to help a user to accomplish some task. That includes making it easy for them to accomplish that task while making it hard for them to break accomplishing that task. Punishing them because your software design fails on the second part of that role is perhaps a little unfair.

 

Big Data. Many things to many different people

Late last night, I picked up a tweet from Hilary Mason, chief scientist with Bit.ly

I’m troubled by the increasing interpretation of “big data” to mean “data without the scientific method”. When did that happen?

This is an interesting question, made all the more difficult by the growing impression I have that the definition of Big Data is a very dynamic concept. What is big data?

The truth is, I think a lot of people aren’t sure. Hilary herself provides an interesting definition:

I prefer the big = “too big to analyze on one computer” definition

I have some mixed feelings. I don’t like the phrase big data; I never have because it comes over far too much like a marketing buzzword and less like some underlying concept. For various reasons, people have had cause to ask me what I understand by “big data” lately which indicates to me that it’s something that has come at people without them recognising what lies beneath it.

For me, when I have to describe what I see it as, I say this. I say “We generate a lot of data. From different activities within a given organisation. We allow some people to analyse certain specific areas of it because historically, we didn’t have so much data, and it was all subject specific. But things are different now. Our activities generate a lot more data and that data is very much interdependent sometimes. You may be a subject area specialist in one particular area of an organisation and you may only care about that particular area. But your organisation is much bigger than your area and it could be – often is the case for example – that marrying the data from your area with the data from other areas can have a huge impact on the way your organisation does business”. In other words, we have a lot of data now, some of it more voluminous than others, but the vast majority of organisations do not use their data to join the dots coherently.

When I read articles about big data, and data science, I too am troubled. I’m troubled by the impression I have that big data is somewhere we should be at without understanding why we should be at it and what we can get from it. There is a degree of unclarity about what a data scientist actually does. Business is not generally good with a lack of clarity. Matters are not helped when the media helpfully supply articles about how data science is the next big thing or that being a data scientist is the sexiest new job going.

It probably is but this only serves to attract people who weren’t really interested in the first place.

I’m interested in the interpretation of numbers, what they mean, how we got to those numbers, where we can go with them. How they inform us. I equally got to where I am right now by recognising that a lot of people were very interested in drawing pretty pictures displaying numbers but not so interested in the validity of the numbers. I have seen bar charts comparing social media site usage which compared the number of Facebook page loads with the number of photo uploads on Flickr. You don’t need me to tell you this is not a valid comparison given that Flickr gets page loads and Facebook has photo uploads as well.

My big huge concern with big data is that people look at the big bit but not the data bit.

I can’t write this in 140 characters, by the way; that’s why we are here and not thrashing it out into the middle of the night on twitter. Also, Hilary had a good go at it last night.

If you were an executive standing in front of me, I would ask if you ever measured your website response times against your website demands linked to – for example – your sending out a marketing email, or whether there were unexpected regional variations, or whether there are trends in the google search terms bringing people to your website that indicate a questionable link somewhere or a business opportunity lost. You can call this big data if it makes you feel better, or data science. I tend to prefer data science because I suspect it is going to be around a lot longer than big data.
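The first of those questions needs very little machinery to get started. A minimal Python sketch, assuming you already have response-time samples and know which days a marketing email went out; the function name, the data layout and the numbers are all hypothetical:

```python
from statistics import mean


def mean_response_by_campaign(samples):
    """Split (response_ms, campaign_day) samples and average each group.

    Returns (mean on campaign days, mean on other days); a group with
    no samples is reported as None.
    """
    on_days = [ms for ms, campaign_day in samples if campaign_day]
    off_days = [ms for ms, campaign_day in samples if not campaign_day]
    return (
        mean(on_days) if on_days else None,
        mean(off_days) if off_days else None,
    )


# Invented samples: (response time in milliseconds, was it a campaign day?)
samples = [(120, False), (130, False), (310, True), (290, True)]

on_mean, off_mean = mean_response_by_campaign(samples)
print("campaign days:", on_mean, "ms; other days:", off_mean, "ms")
```

A gap between the two averages is only a prompt to look closer, not a conclusion; whether it means anything is where the basic statistics come in.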

What matters to me is that you get the best possible information out of your data, regardless of how much or how little you have. One of the things that concerns me slightly about discussions on data science and big data is the lack of attention to basic skills in analytics. It is not just a case of running a SQL query and picking out the highest band of the ensuing bar chart. When we look at the skill sets required for data science, there tends to be a focus on computer programming (which is good, don’t get me wrong), but less importance attached to basic statistics.

When I got interested in this about 18 months ago, I had an understanding of what I wanted to do, and went back to college to get the maths and stats skills that are handy here. I’ve been programming for 12 years so I don’t worry too much about the necessary programming skills. What worries me a little is that we will wind up with a lot of people calling themselves data scientists on the basis of a few Python scripts and not a lot of understanding of the actual data.

When people focus on concerns about big data, they talk about the skills squeeze (see this New York Times piece, for example, and Jason Ward of EMC in Ireland on a similar subject) and not the actual underlying business of data science. A key issue is that data science is not just about dragging out the data into a spreadsheet, pressing a few buttons and going “hey presto”. Harvard Business Review had a useful piece on this which I recommend looking at.

Hilary Mason has an interesting piece on Getting Started With Data Science. Another place to start is probably to take a step back and try and describe what you’d want to do with a lot of data. What matters with data, no matter what scale it’s on, is how you interpret it. If you really want an example of why this is important, although it’s not data on a massive scale, Nate Silver’s work on the last US election is a good place to start, particularly given that he and a number of other analysts disagreed on the data interpretation. You need to recognise that data is to inform, not to be bent to your needs and that sometimes, that data will not tell you what you want to hear.

I’d agree with Hilary in that communication is a key skill which very often gets forgotten about, but this issue is not limited to data analytics.

So, this brings us back to what big data is, or is not, and whether it really means what we think it means. I really do think it means so many different things to so many people that as a label, it’s functionally useless. Hence, I’d prefer data science as a label, or data analytics. In this way, you can highlight that yes, methodology matters, and statistical skills will matter.

In essence, I would answer Hilary’s original question as follows: Big Data lost its underlying rigour when it became a product to be sold rather than a job to be done.