Yelp have a data analytics challenge at the moment.
http://www.yelp.ie/dataset_challenge/
May be worth a little of your time.
Eoghan McCabe and a bunch of his colleagues came to UCD Computer Science the other day to have a chat with some of the 4th years and postgrads about how opportunities were changing in Dublin compared, in particular, to how things were when he graduated.
I’m older than Eoghan, and I’m a bit unorthodox in that my background is not really computer science but I did take an unusual journey through life and spent more than a quarter of my life (but not quite a third) working on IBM big iron. But he had a message which resonated quite a bit in that the opportunities available to graduates today have broadened quite a bit compared to what was available less than 10 years ago, and even more say, compared to what was available 20 years ago.
This is true in a monumental way; but the way it gets discussed rarely focuses on those changes. The concept of starting your own business, and the question of innovations is pushed a lot more than it ever has been before – it seems like every third level college has some sort of incubator program in place now. The whole market of available jobs has changed – there are a lot more interesting small software firms springing up of which Intercom is obviously one, and there are a few more getting ready to push from America to Ireland like New Relic. The big institutional employers are basically not the only show in town and this is fundamentally important because people are not uniform and they tend to thrive in different environments. We have this tendency in humanity to go with the one size fits all approach in the face of overwhelming evidence that in fact, one size has never fitted all.
I’m not a fourth year – I have 20 years work experience under my belt and not all of it has been in the technology arena. But I do believe that when you have a widening of employment and employer culture, it fundamentally benefits society and supports general growth.
One thing which we did discuss however is the tendency of people to think that Silicon Valley can be recreated here, and the tendency of politicians in particular to think about recreating Silicon Valley in Ireland. I think this is unrealistic because mostly it rests on an incomplete understanding of what drives the Valley at the moment – and also, the fact that what drives the Valley has evolved over time. Possibly the weather helps a lot but a key feature which supports the structure in California is probably the finance.
So I do wish, sometimes, we could recognise that this, along with a friendlier approach to failure, are key components of how you drive a start up culture. The last time I heard a politician in Ireland discuss this, he just wanted to import more people to work here.
More than anything, however, I wish that we got shot of this idea of wanting to Be Like Something Else. I’m pretty sure the valley infrastructure won’t last forever; it’s not even that unique as there are similar things happening in the northeast United States, in Berlin, and to a lesser extent in London, in terms of funding interesting ideas. Something or someone will come along and seriously disrupt it; that’s what happens. Or, more possibly, a tech bubble will blow up.
In the meantime, the funding available to start ups in Ireland is on a small scale. When you consider the amount of investment money that went into property in 2006 – some 40% of lending for new developments was for buy to let investments – you have to wonder whether the issue is not that we lack the money to generate a start up scene of some description here, probably with a more limited utility focus, or idea factories, but that we misapply it.
So companies like Intercom wind up going to San Francisco to get funding. I do honestly believe that understanding this is important for generating a local start up culture.
On a related note, Eoghan made two remarks which I thought were worth remembering.
This, I think is good to know, even if you’re 25 years old.
On a completely unrelated note, there was something I really liked about Intercom before Eoghan and his colleagues came in to talk and that is that Code Kata ran there on a Wednesday morning. I made it in there one morning but I liked the idea of doing something like that not just from a networking point of view but from a diversity point of view – yes, there were mainly men there (I think I was the only woman the day I did go) – but because people from different companies tend to have different cultures. In many ways, it was illuminating.
The one question I hate to hear asked is “What are we doing about Big Data?”
Seriously, what are we doing about Big Data? There is no right answer to this question. What have you been doing with your data all along? Nothing? Managing it in silos?
No one should be asking “What are we doing about big data?”
The question is “How can we better exploit the data we have to improve our bottom line?”
Big Data is not an amorphous cloud. You might not even be a big data shop – are you really generating that much data? How much of it are you marrying together? What do you want to get out of it? Do you still expect to summarise it on a PowerPoint slide deck?
If someone were to ask me now, what are you doing about big data, here is what I would say first:
A lot of companies have neither, to be honest, and there is very little you can do with data if you do not have that overview. This – incidentally – is why data science is sexy. A data scientist isn’t someone who plays with big data – it’s someone who plays with all your data and does things with it you might not have imagined for the simple reason that, for example, all your data streams are kept separately.
If you have not got someone with a company wide overview, are you prepared to put someone in place who is not department specific? Someone who has access to all your data, and not just the data of one department? Are you going to break down the silos for your data?
Big data has a rather movable definition, but the definition I tend to work off is Hilary Mason’s: it’s data that one machine cannot handle on its own. After that, the worth is not in that it’s big, or you have a lot of it, but in what you do with it. I hate the word, but how you leverage it. The creativity does not lie in the extent of the data but the vision applied to it.
So, the next time someone asks, what are we doing about big data, what are you going to say?
Before I start into this piece properly, I want to make the following point absolutely crystal clear. None of what I say applies until we handle some primary skillsets adequately. They are as follows:
In other words, these three skill sets are the foundation for the education system.
Now. Back with the Year of Code.
The powers that be in the UK have decided to put in place an initiative called The Year of Code. You’ll find a few details here, so happy reading. The key motivation, apparently, is to fill a coding skills gap.
This bit, I thought, was interesting:
Such endeavours mark the build up to September, when computer coding will become a compulsory part of the curriculum for every child over five.
I am sure someone thinks this is a very good idea. I am not one of them. I do honestly think you’d get a lot further with teaching people to code – kids aged five – if you made sure they could read and write first. And count. Coding without some numeracy skills just isn’t going to happen. And this is from someone who has been pushing Scratch for 10 years. Scratch – by the way – is a computer programming language developed by MIT to help children to learn to program.
So. There have been comments about the Year of Code. Its public face did not do very well on BBC Newsnight during the week. She cannot program. And the discussion is full of comments about how easy it is to code. It is very easy to code when you are typing what is in front of you.
I bang on, from time to time, about data in itself being pointless if you don’t sit down and work out what questions you want to ask it. Programming has a similar dimension. Anyone can write – environment set up aside:
print("Hello World")
and that’s a program.
But I don’t spend my day whiling around writing strings to a screen. I use it – for example – to automate calculations I do frequently. I use it to run statistical analysis. In my entire life I have never spent one Saturday developing an application that answered a question I did not have. Some of those questions have been assignments, some of them are things for myself (there is a nice little R script under production to pull the figures for property sales in Cork apart). Some things have been websites. Programming and writing code has always had a planned output.
So I don’t necessarily think focussing on code is the primary thing you should be doing here. Focussing on problems people can solve, that’s a far more important skill. And you need elite communication skills to be able to do that.
Not a lot of people remember now, as they wander around with their iPhones and Androids, that 60 years ago, there wasn’t much in the way of computing power outside the government. The first commercial computer to come into Ireland was, as far as I am aware, bought for Aer Lingus, and in fact, one of the first commercial problems to be solved using computers was the whole airline reservation thing in America. Legend has it that issues in the manual process of booking tickets led to the boss of IBM and the boss of American Airlines winding up bumped off a flight, due to overbooking caused by failures to keep records in several airports aligned. So, over coffee, they were in a position to have a chat about how this could possibly be made more efficient, leading to fewer people getting bumped off. We think we have it bad now.
Anyway, the point of that story was here is a problem – chaotic air ticket bookings getting lost, duplicated, overbooked – and there is a man with a vision, a bunch of highly paid computer geniuses and some money – who allowed the problem to get (reasonably) resolved. Every day, someone has a problem, and someone fixes it.
When we focus on the response, and not the recognition of the problem first, we are not really teaching people to code. We’re teaching them to regurgitate. So being honest, focussing on code rather than problem analysis is probably a bad way to go. Doing it at age 5 when you’ve not fully covered literacy and numeracy, that’s not ideal either.
Moving back to the year of code, I don’t like what is essentially a PR initiative. The assertion that, for example, we can teach teachers to code in a day, is wildly inaccurate. You can’t. And yet, there are going to be courses doing just that.
I learned to code when I was 13 years old. A bit, that is. I learned some BASIC from a massively inspirational maths teacher who swiped a week out of his schedule to teach 29 thirteen-year-old girls to write some BASIC and again, to work out how you might break down a problem. I stopped when I was 14 for some reason and I started again when I was 27. I do honestly believe that children should learn to write programs but that this is not really practical without the supporting skills of reading, writing, numeracy and breaking down problems.
So the objective of this is to plead – in Ireland – please do not implement a PR exercise like this. Do something a bit more in-depth. Talk to the people who run with Coderdojo in Ireland – we are getting hundreds if not thousands of kids up and down the country into schools and halls on Saturdays – i.e. outside school hours – and identify what drives this; what makes them enthusiastic to do it. When you put money into getting 30 Raspberry Pis into a school, learn how to use them creatively. Treat the computer lab a bit like a woodwork lab, where things get tried and tested. Raspberry Pis are not expensive, and if one gets fried the odd time, so be it. They can very often be fixed by formatting the SD card holding their operating system. Load the lab up with stuff from Adafruit. IT and programming covers a multitude, including messing around with hardware (program up those Christmas lights and motion controlled webcams). These bits of hardware are not typically expensive – not in the way that Apple iPads are – but from a technical and programming point of view are enormously learner friendly. And teach kids the wider skills of recognising the computer equivalent of “I want to make a table, how do I achieve this”. Focus on the steps they make to do this rather than the end result.
This is a skill more valuable than anything. The one that doesn’t make you give up at the first hurdle.
Make this a general education policy. Not a PR push. And make it inspirational.
I see a lot of commentary about how some people aren’t talented for programming skills, and, indeed, for language skills. We don’t tend to tolerate this for reading any more (although we still do for basic numeracy and, in this country, foreign languages).
The simple truth is society changes and reading and writing become universal.
This can be true for analytic thinking and problem breakdown. And programming.
In the meantime, I’d favour teaching 15 year olds how to use Python to do maths calculations rather than a calculator but that’s just because that’s the way I do it. And Scratch. Don’t forget Scratch.
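To give a flavour of what I mean, here is a minimal sketch of the kind of thing a 15 year old might type instead of reaching for a calculator. The function names and the numbers are my own made-up illustrations, not taken from any curriculum, and the quadratic function assumes the first coefficient is not zero.

```python
import math


def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 as a sorted tuple.

    Assumes a is not zero. Returns an empty tuple when there are no real roots.
    """
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return ()
    root = math.sqrt(discriminant)
    # A set removes the duplicate when both roots coincide.
    return tuple(sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)}))


def compound_interest(principal, annual_rate, years):
    """Value of a lump sum after compounding once a year."""
    return principal * (1 + annual_rate) ** years


# x**2 - 5x + 6 = 0 factorises as (x - 2)(x - 3)
print(quadratic_roots(1, -5, 6))
# 1000 left at 5% a year for 10 years, rounded to the cent
print(round(compound_interest(1000, 0.05, 10), 2))
```

The point is less the arithmetic than the habit: the calculation is written down, can be read back, and can be rerun with different numbers.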
Yesterday, the world learned that Virgin Atlantic were planning to use Google Glass for their customer relations management. The world also learned that Virgin Atlantic were planning to use Sony SmartWatches as well but for some reason, that got sort of ignored. I don’t know why.
Virgin’s press release is here. If you do a little reading, you find out they are doing it in cooperation with SITA, whose press release is here.
Needless to mention, it generated a lot of notice, mostly about Google Glass, and not a lot about what Virgin Atlantic were actually doing. So the first question you really have to ask is are they doing anything particularly new on the business process point of view. The answer appears to be no.
Here’s the money quote from Virgin Atlantic:
Virgin Atlantic, working with air-transport specialist SITA, is the first in the industry to test how the latest wearable technology, including Google Glass, can best be used to enhance customers’ travel experiences and improve efficiency. From the minute Upper Class passengers step out of their chauffeured limousine at Heathrow’s T3 and are greeted by name, Virgin Atlantic staff wearing the technology will start the check-in process. At the same time, staff will be able to update passengers on their latest flight information, weather and local events at their destination and translate any foreign language information. In future, the technology could also tell Virgin Atlantic staff their passengers’ dietary and refreshment preferences – anything that provides a better and more personalised service. During the six week pilot scheme, the benefits to consumers and the business will be evaluated ahead of a potential wider roll-out in the future.
My emphasis. With one possible exception, Virgin staff are doing nothing new here from a business process point of view:
There is nothing really all that special here – if you like, the key difference is the method by which they are managing existing processes. Frankly, I doubt very much whether they are carrying out check ins using Google Glass – a bit more information from SITA would be nice in that respect because the language of the press release is interesting to say the least: “start the check in process”.
But all of this could be done using technology which has been around for a few years – and it is entirely possible that Virgin Atlantic are already doing it – using things like iPads, so again the question is: what is this adding?
SITA Labs have already done a lot of research in this area and some of the applications are nice. They have a press release here and it has some interesting stuff in it. This is an interesting quote in the context of the Virgin Atlantic story:
Travel documents and loyalty cards can be scanned by smart glasses.
But
Peters added: “Specifically, our research at SITA has shown that for any type of use in the air transport industry the technology needs to be more robust to avoid breakages and the cost will have to come down. The camera quality will also need to be enhanced. Currently it requires near perfect light conditions within the airport for scanning documents to be successful.”
This dates from October 2013 by the way and specifically, smart glasses were being looked at in the context of scanning barcodes. And they weren’t, at that time, up to the job on a day to day basis. It may be telling that the Virgin Atlantic trial focusses on a subset of passengers – a very small number.
SITA’s description of what the wearable devices are being used for is interesting:
Airline staff are equipped with either Google Glass or a Sony SmartWatch 2, which is integrated to both a purpose-built dispatch app built by SITA and the Virgin Atlantic passenger service system. The dispatch app manages all task allocation and concierge availability. It pushes individual passenger information directly to the assigned concierge’s smart glasses or watch just as the passenger arrives at the Upper Class Wing.
They really can only do this if they already know who the passenger is before they get to the Upper Class Wing, usually because they are arriving in a limo which Virgin Atlantic already know about.
So what do I think about this?
Well, based on all the available information, Google Glass is, at best, replicating existing utility. Now you could ask the question is it really necessary to do that when we’ve got paper and iPads and the like but that is not really the right question. The question is does it make the experience more efficient for both Virgin Atlantic and the passenger. That is open to debate, and it is open to debate for this particular quote (also highlighted above).
In future, the technology could also tell Virgin Atlantic staff their passengers’ dietary and refreshment preferences
Airlines already have to ensure their staff are aware of dietary requirements for passengers, for vegetarians for example. So the interesting thing is right now, Virgin Atlantic’s implementation of Google Glass doesn’t appear to be able to deal with this sort of information. One key reason for this – right now – is that Google Glass is not being implemented in the business processes that involve the need to have that information, which for the most part, is probably cabin service in the aircraft. It is possible that it might well be useful in the lounge service for Upper Class passengers – but this service is not available to all Virgin Atlantic customers. It remains to be seen whether they will implement the hardware in the cabin – my gut feeling is that it will require regulatory agreement so it’s not going to happen soon.
What is happening is they are accessing information already in their possession using a different device. Where once it was a computer, or possibly a tablet, it is now some sort of wearable device.
They are replicating existing processes. Whether there is a gain for them in so doing – their press release talks about the glamour of flying and I don’t see this having an impact on that – is open to debate, and it’s what a 6 week trial is all about.
They are not using the devices to collect new data in the customer interaction zone at this point in time, and this is an important point. And if they do, well, there are other considerations to take into account before implementing them.
Right now, I would take the view that Virgin Atlantic are fully aware of things like data retention legislation and data protection. I certainly would not assume that they are hopping down the road to matching passengers up with their dietary requirements using Google Glass because they already do that using good old fashioned data entry and in any case, they have not implemented a business process with Google Glass applying that type of data at this point in time.
I will be very interested to see how this trial works out – I must make a note to check with SITA’s social media channels in about 8 weeks’ time to know if they will at least provide some sort of feedback given that this caused quite the bit of noise.
Another data cry for help. I’m trying to identify all the state bodies and the members of their boards to do some membership analysis.
I have a list as follows:
It’s not a very long list so I know it’s non-exhaustive.
I’m interested in the state bodies I have missed like the Arts Council and other similar organisations and I am interested in a comprehensive list of the members of the boards for each of these organisations.
If you can suggest organisations I have missed, that would be a great start.
Thanks a million.
Hi folks,
This is the first of occasional small requests for help on the data front.
I have a tiny little poll going here on the subject of captchas. Three simple questions with yes/no answers.
I am keeping it open for another week or so and then I will publish some comments on the outcome.
Interim results suggest fewer than half my respondents are aware that reCaptchas are used to solve OCR problems. This is interesting.
Update Word – if you use it – with a couple of extra styles:
When you are creating the style in Word 2013, you can tell the software to use this in all new documents as well and make it part of the Word normal template or the default. This is useful if you don’t want to build a separate template.
The other thing which may be useful is something highlighting action points and completed action points. I tend to use bold and, again, different colours, and for the case of completed action points, strike through.
People handle work and coding differently – I tend to like to have a commentary file of what I am doing, what I am trying to do, where I am stuck, how I’ve resolved problems, for each project and this is to ensure I don’t have to build a brand new document with new styles every time. Useful information on customising Word is here – I don’t recommend doing everything he suggests but there are ways of making it more helpful for you. If you’re not familiar with styles, they are useful to be able to work with.
Declaration of interest: I am doing a lot of learning in the area of machine learning, classification, recommender and personalisation systems at the moment (at least compared to 3 months ago).
If you were to look at the recommendations which Amazon offer me in the area of books, you’d probably wonder a little about me. The two front runners, content wise, are ethnic recipe books, and machine learning related programming or algorithms.
I go through them every once in a while, usually late at night, and update them with useful information such as which of the recommendations I already own, and which I absolutely don’t want. And I might occasionally add something unexpected to my wishlist.
This has a fascinating impact on my recommendations. Last night, the addition of a single machine learning book to my wishlist had the net impact of dropping the number one recommendation, a cookery book called Jerusalem, down to number 6. A subsequent addition of an Edward Tufte datavisualisation book caused two new datavisualisation books to get into the top ten including one I had never heard of at number 3 (after Jerusalem got pushed down to number 6, Stephen Few wound up in number 1 with a book called Show me the Numbers). I haven’t decided yet whether I want Jerusalem or not either; I have over 100 cookbooks so theoretically, I can’t argue that I need it.
Deletions of books I wasn’t interested in usually resulted in the list just shuffling up a bit. Additions to the wishlist caused changes to the content of the list. From this I can conclude there’s a greater weight given to additions to the wishlist rather than deletion from the recommendation list. I would love to see the underlying datastructure and code for this. There’s this but it’s 10 years old and I have no doubt but that they’ve done a serious amount of work in the interim.
What does all this mean for the supposed content of this blog post? Well I realise that the Amazon data set relating to me is large and gathered over around 10 years at this stage, but deep down a part of me would like to do a little more research into it.
However, during the week, I was also considering recommender systems for less frequently used services and in particular, airlines.
Recommender systems work best if you have a decent picture of your individual customer at the point of loading up the site. Amazon does this using accounts. If you have a look at the airlines, in general, they have a mixed experience on that front. The majority of them offer you some form of registering, although not all; some of them allow you to connect your account to a frequent flier card, and some of them allow you to create an account.
However, I’m not sure how many of them compel you to create an account to book a flight directly with them. I’m pretty certain that the last few times I booked airline tickets, I did so without an account.
This is not necessarily an impediment to providing some personalisation services. While I do have a Hotels.com account, for example, they are well capable of remembering where I was last looking for hotels even if I haven’t signed in with my own account.
There is an issue, however, in that the airlines are already perceived to, perhaps, game that sort of idea by providing you higher charges the second time you look. This isn’t ideal from the point of view of endeavouring to provide any sort of personalisation and recommendation system.
The other key issue is how you provide personalisation services to a cohort that doesn’t buy airline tickets every other day (or at five past midnight when they can’t sleep). If you take any of the major airlines, they carry millions of passengers, and by definition, a lot of them have to be repeat customers courtesy of return ticketing, business travelling, family visits. The airline business got into the loyalty business early with the frequent flier cards but again, the picture of airline travel has changed a lot for a lot of the market since those things were invented. There is not necessarily a lot in common between your Netflix recommendations and your frequent flier points.
I have no doubt work is going on in this area – check this out from Rick Seaney in USA Today – however, what follows are some of my own thoughts on the subject.
There are a couple of things which I could see coming out of this.
Here’s something that would certainly buy my interest immediately if, for example, I was travelling to Paris every Monday morning and coming back on a Tuesday evening for business. Provide me a login that generates a page that has two buttons: Paris and Other. The Paris button could be prefilled with the most likely routing/timing options if they are available. Or, Sorry Miss Lynch, your usual flight is fully booked. Allow me to create another personalised button based on possible plans. For example, I might want to fly to oh, Malaga to go kitesurfing in Tarifa maybe six times a year. Let me build one of those so that my landing screen is Paris, Malaga and Other. Include sports equipment as an option by default in the Malaga booking. Learn enough about me to know that, for example, I have annual travel insurance, and don’t try to sell me more. Know enough about me to know that if I am flying into Nice, I’ll hire a car, but not if I fly into London. Even if I am not booking, it might be worth letting me build dreams on your site like this for three reasons:
You can make it clear you are not locking down a fare at that point, but you do get a picture of some of the possible bookings on that flight and this may have an impact on how you manage bookings on that route around those dates. While you’re at it, keep an eye on possible efforts to game your recommender system and identify it as a class of behaviour.
Based on the information I provide when I am booking, airlines can obtain enough data to do this, even without tying the behaviour to an account. However, right now, this is not the approach that they take.
But here’s something else you could do.
Suppose I click on my Malaga button and the flights for the dates I choose are full. Maybe there is some golf competition on there and you know this because you’re good at knowing when events are on but the average kitesurfer might not care about the European PGA. Or it’s the week before the school holidays. Or O’Reilly have decided to run a big technical conference down there. Any number of reasons, but the flight from, say, Dublin to Malaga is full. Or any flight to Malaga is full depending on where I am living.
If I, as an airline, know that a lot of kitesurfers take their kitesurfing gear to Tenerife, or, at least have built potential bookings, I could suggest Tenerife as an alternative – a targeted alternative (particularly if I am flying alone), with the practical date data already provided for Malaga filled into a new booking form. Or if Tenerife is your first choice, Lanzarote is a viable alternative. Or Faro. Or Madeira. Based on the time frame and the amount of money concerned, and whether you interline with anyone, you have endless opportunity here. Clearly someone going golfing in Portugal for four days is not going to want to fly 11 hours via London to somewhere in Italy – but someone going for 14 days might consider a non-direct option.
Of course to do this, you need to know that my sports gear is kitesurfing equipment. But this is not impossible. And of course, you’ll never ask if I want to bring kitesurfing equipment on my regular Monday morning trip to Paris because you know already I don’t. If I don’t have much of a direct history with you, the data you have on other people can be leveraged to build a feature set to classify me.
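As a rough sketch of the Malaga to Tenerife idea, the simplest version is a co-occurrence count: look at the other passengers who took the same gear to the full destination, and rank where else they took it. Everything below is hypothetical. The booking records, the field layout and the function name are invented for illustration, and a real airline system would work from far richer data than three-field tuples.

```python
from collections import Counter

# Hypothetical booking history: (passenger_id, destination, sports_gear)
BOOKINGS = [
    ("p1", "Malaga", "kitesurf"),
    ("p1", "Tenerife", "kitesurf"),
    ("p2", "Malaga", "kitesurf"),
    ("p2", "Lanzarote", "kitesurf"),
    ("p3", "Tenerife", "kitesurf"),
    ("p3", "Malaga", "kitesurf"),
    ("p4", "Malaga", "golf"),
    ("p4", "Faro", "golf"),
    ("p5", "Paris", None),
]


def suggest_alternatives(bookings, full_destination, gear, top_n=3):
    """Rank other destinations by how often passengers who took `gear`
    to `full_destination` also took the same gear there."""
    # Passengers who have carried this gear to the destination that is full.
    peers = {p for p, dest, g in bookings
             if dest == full_destination and g == gear}
    # Count where else those same passengers carried the same gear.
    counts = Counter(
        dest for p, dest, g in bookings
        if p in peers and g == gear and dest != full_destination
    )
    return [dest for dest, _ in counts.most_common(top_n)]


print(suggest_alternatives(BOOKINGS, "Malaga", "kitesurf"))
print(suggest_alternatives(BOOKINGS, "Malaga", "golf"))
```

With this toy data the kitesurfer is offered Tenerife ahead of Lanzarote, and the golfer is offered Faro. The regular Monday trip to Paris carries no gear, so it never feeds into either suggestion.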
The point I am trying to make here is that, publicly, there is a perception that airlines basically use whatever personalisation options they have to increase the fares by trapping you. Airline yield management is complex so with the best will in the world, it’s never likely to be quite that simple. But if airline personalisation tools made life easier for their customers, they might engender a lot more repeat business, particularly now. Obviously gaining that trust in a way which is not perceived to be creepy is going to be a challenge because it’s based on knowing a lot about your customers which is something a lot of people go out of their way to discourage. I mean, I know people who deliberately like to confuse Amazon about their taste in books and music – I’m not one but then you’re talking to someone who got pleasure out of checking out how her recommendations changed by updating her wishlist.
Another interesting thing which could be done with this sort of model of engaging with your customer, based on what you know about them, is telling them how many seats are available or grading the flight as commonly searched. Hotels.com does this with hotel rooms: two rooms left at this price. This is useful because while it may not cause me to book at that point in time, it’s hardly going to come as a shock to me that the price of a room in the George V in Paris has increased in the last two or three hours since I managed to get my travelling companion on the phone. It provides some trust. If my flight is rated Red for popular, I’ll know I am competing with, for example, 5000 Munster fans for that last seat on a flight the day before a match.
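A flight grading like that could be as simple as the sketch below. The thresholds, the Amber and Green grades and the idea of counting searches in the last hour are all my own assumptions for illustration; I have no idea how any airline or hotel site actually decides what to display.

```python
def grade_flight(seats_left, searches_last_hour,
                 scarce_seats=5, busy_searches=100):
    """Grade a flight 'Red', 'Amber' or 'Green' for display beside the fare.

    The default thresholds are made-up illustrative values.
    """
    scarce = seats_left <= scarce_seats
    busy = searches_last_hour >= busy_searches
    if scarce and busy:
        return "Red"
    if scarce or busy:
        return "Amber"
    return "Green"


def availability_message(seats_left, searches_last_hour):
    """Build the line of text a passenger would see next to the fare."""
    if seats_left == 0:
        return "Sorry, this flight is fully booked."
    grade = grade_flight(seats_left, searches_last_hour)
    return f"{grade}: {seats_left} seats left at this price"


# The day before a match: few seats, lots of people looking.
print(availability_message(2, 5000))
# A quiet midweek flight.
print(availability_message(60, 3))
```

The value is in the honesty of the message rather than the cleverness of the rule: the passenger sees why the price might move before it does.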
All of this is only possible if my customers trust me to use this data effectively to support them and not, specifically, to abuse them. I mean, if I assume someone who books every Monday morning will always book every Monday morning and start applying stealthy price increases to them that I do not necessarily apply to non-regular passengers, I will wind up with some public relations issues. And the loss of regular streams of income.
In summary, I believe it is possible to personalise the booking experience to the benefit of both passenger and airline. I can see that hotel booking agencies are already working in this area but I think there’s even more potential there. Even after the booking experience is personalised down to the nth degree, this information could have a huge impact on targeting promotional emails (which is something, in my experience, the hotels aren’t quite getting right yet).
I’ve started a datascience twitter feed for Ireland. Mainly I’ve done it out of frustration that there isn’t one and that searching for them isn’t always straightforward and also because that’s my own area of interest.
I’m particularly interested in those jobs with datascientist as their job title; I will consider other titles on a case by case basis – in particular if you’re looking for a data analyst to run simple reports, that won’t get listed here. I will look at machine learning related options and I will consider PhDs in the area of analytics and machine learning if you’ve got one. Key requirement is that they be based in Ireland.
The twitter feed is here: http://www.twitter.com/datascience_ie and if you’ve got one, either DM a link to the description to me or send a link via email to datascience [at] treasalynch.com