Falling out of love with Amazon

I remember a time when I used to love Amazon. It was back around the time when there was a lot less stuff on the web and it was an amazing database of books. Books, Books, Books.

I can’t remember when it ended. I find the relationship with Amazon has deteriorated into one of convenience more than anything; I need it to get books, but it’s doing an awful job of selling me books at the moment too. Its promises have changed; my expectations have risen and fallen accordingly. Serendipity is failing. I don’t know if it is me, or if it is Amazon.

But something has gone wrong and I don’t know if Amazon is going to be able to fix it.

There are a couple of problems for me, which I suspect are linked to the quality of the data in Amazon’s databases. I can’t be sure, of course; it could be linked to the decision-making gates in its software. What I do know is that it is something I really can’t fix.

Amazon’s search is awful. Beyond awful. Atrocious. A disaster. It’s not unique in that respect (I’ve already noted the shocking localisation failings for Google if you Are English Speaking But You Live In Ireland And Not The United States When Looking For Online Shops) but in terms of returning books which are relevant to the search you put in, it is increasingly a total failure. The more specific your search terms, the more likely you are to get what can only be described as a totally random best guess. So, for example, if I look for books regarding Early Irish History, search returns books on Tudor England, which are so far removed from what I want that it’s laughable. On 1 May 2015 (ie, day of writing) fewer than a quarter of the first 32 search results refer to Ireland, and only 1 of them is even remotely appropriate.

Even if you are fortunate enough to give it an author, search regularly returns books not by that author.

I find this frustrating at the best of times because it wastes my time.

Browsing is frustrating. The match between the categories and the books in those categories can be random. The science category is full of new age nonsense, which often sells very well, so the best sellers page becomes utterly useless. School books also completely litter the categories, particularly science. I have no way of telling Amazon that I live in Ireland and have no real interest in UK school books, or, in fact, any school books when I am browsing geography.

Mainly I shouldn’t have to anyway. They KNOW I live in Ireland. They care very much about me living in Ireland when it comes to telling me they can deliver stuff. They just keep trying to sell me stuff that really, someone in Ireland probably isn’t going to want. Or possibly can’t buy (cf the whinge about Prime Streaming video to come in a few paragraphs). Amazon is not leveraging the information it has on me effectively AT ALL.

The long tail isn’t going to work if I can’t find things accidentally because I give up having scrolled through too many Key Stage Three books.

Foreign Languages: Amazon makes no distinction between text books and, for want of a better word, non-text books in its Books in Foreign Languages section. So again, once you’ve successfully drilled down to – for example – German – you are greeted with primarily Learn German books and Dictionaries, probably because of the algorithm which prioritises best sellers.

How can I fix this?

Basically, Amazon won’t allow me to fix things or customise things such that I’m likely to find stuff that interests me more. I don’t know whether they are trying to deal with these problems in the background – it’s hard to say because well, they don’t tend to tell you.

But.

  1. It would be nice to be able to reconfigure Treasa’s Amazon. Currently, its flagship item is Amazon Prime Streaming Video, which is not available in Ireland. Amazon knows I am in Ireland. It generally advises me how soon it can deliver stuff to Ireland if I’m even remotely tempted to buy some hardcopy actual book. Ideally they wouldn’t serve their promotions for Amazon Prime Streaming Video, but if they have to inflict ads for stuff they can’t sell me, the least they could do is let me re-order the containers in which each piece of information appears. So I could prioritise books and coffee, which I do buy, over streaming video and music downloads, which I either can’t or don’t usually buy from Amazon.
  2. It would be nice to be able to set up favourite subject streams in books or music or dvds. I’d prefer to prioritise non-fiction over beach fiction, for example.
  3. I’d like to be able to do (2) for two other languages as well. One of the most frustrating things with the technology sector is the assumption of monolinguality. I’d LIKE to be able to buy more books in German; in fact, I’m actively TRYING to read more German for various reasons, and likewise for French.
  4. I don’t have the time to Fix This Recommendation. Each fix takes 2 clicks and features a pop up. As user interaction, it sucks. I’d provide more information for fixing the recommendations if I could click some sort of Reject from the main page and have them magically vanish. Other sites manage this.

But there are core problems with Amazon’s underlying data, I think. Search is so awful and so prone to bringing back wrong results that it can only be because metadata for the books in question is wrong or incomplete. If they are using text analysis to classify books based on title and description, it’s not working. Not only that, their bucket classification is probably too broad. Their history section includes a metric tonne of historical fiction, ie, books which belong in fiction and not in history. If humans are categorising Amazon’s books, they are making a mess of it. If machine learning algorithms are, they are making a mess of it.

There is an odd quirk in the sales-based recommender which means that I can buy 50 books on computer programming, but as soon as I buy one book of prayers as a gift for a relative, my recommender becomes heavily religion-focused and prayer books outplay programming books. Seriously: 1 prayer book to 50 programming books means you could probably temper the prayer books. Maybe if I bought 2 or 3 prayer books you could stop assuming it was an anomaly. This use of anomalous purchases to pollute the recommendations is infuriating and could be avoided if Amazon did not overweight rare purchases.
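Amazon doesn’t say how its recommender weights purchases, so the following is not its method. It is a minimal Python sketch of the fix suggested above, assuming purchase history is available as a list of category labels: each category’s share of purchases is damped until that category has been bought at least `min_support` times (a threshold of 3 is my assumption). The function name and input format are hypothetical.

```python
from collections import Counter


def category_weights(purchases, min_support=3):
    """Weight each category by its share of purchases, damping rare ones.

    purchases: list of category labels, one per purchase (hypothetical format).
    min_support: purchases needed before a category counts in full (assumed).
    """
    counts = Counter(purchases)
    total = sum(counts.values())
    weights = {}
    for category, n in counts.items():
        share = n / total
        # A category bought fewer than min_support times keeps only a fraction
        # of its share, so a single gift purchase can't take over.
        damping = min(1.0, n / min_support)
        weights[category] = share * damping
    return weights


# 50 programming books and one prayer book bought as a gift.
history = ["programming"] * 50 + ["prayer"]
weights = category_weights(history)
```

With one prayer book, that category keeps a third of its already small share; buy a third prayer book and the damping disappears, which is roughly the "stop assuming it was an anomaly" point above.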

I’m glad Amazon exists. But the service it has provided, particularly in terms of book buying, is nowhere near as useful as it used to be. Finding stuff I know I want is hard. Finding stuff I didn’t know I wanted but now I HAVE to have is downright impossible.

And this is a real pity because if finding stuff I wanted to buy were easier on the book front, I’d be happy to spend money on it. After all, the delivery mechanisms, by way of Kindle etc, have become far, far easier.

In search, Google’s localisation seems to be poor

Google are able to identify my location via useful clues like the GPS on my phone, and, I suppose, a reverse look up of the IP from which I connect to the internet sometimes. On my computer, Google knows exactly where I am, down to demonstrating my location when I open Google Maps, for example. There are additional clues: I’ve told it, in the past, that I am based in Ireland, and, mostly, when I run search, it is via Google.ie.

But it has become increasingly useless as far as finding outlets for online shopping. Today, I am looking for top spiral bound A4 notebooks; we’ll skip why exactly that is the case because it doesn’t matter. Google returns to me, as top search results, only companies in America. This problem is not unique to top spiral bound A4 notebooks; I have had similar frustrating experiences with art supplies. There could be a thousand stationery shops in the UK, Ireland, and most of Europe, and Google still seems to think that someone based in Ireland is going to order from companies in the United States of America.

I appreciate some of this is based on search engine optimisation carried out by the companies concerned, but given that Google’s sponsored links are generally regionally appropriate, or at least more so than the first 2 or 3 of its search results, it would help if the organic search results were also regionally appropriate.

There is a wider issue with Google in my experience, however; while it provides services in a large number of languages, and provides online translation facilities, it seems to mainly operate on the assumption that most of its users are monolingual. I generally have an issue with Google News on that front, and have basically set up a feed from Twitter to pull news from a number of different source languages. For all the media organisations which Google News serves, it doesn’t seem to cope well with the idea that people might be more than monolingual.

Bookmarks in Chrome

Google rolled out updated bookmarking to the main version of Chrome lately. It arrived on my desktop a couple of days ago.

I do not usually spend much of my time checking out Chrome betas so I was unaware that this functionality – I use the word reservedly – had been a part of Chrome betas for the last year. I could be obnoxious and say I don’t need it and I didn’t want it. But more to the point, I haven’t worked out what utility it adds for me at all.

I first discovered Google’s Chrome team had done something with bookmarking when someone on Facebook complained that it now took them 4 clicks to bookmark stuff. This was a warning.

I use bookmarks quite a bit. They are also reasonably well organised and I have an overview of how they are organised. All I really want from them is a list of websites and whatever their browser address button icon is.  Oh, and I want as much of an overview of them as possible.

This is not possible with what Google have done. They’ve replaced list of bookmarks with tiled icons of images pulled from the site. This is a chronic waste of space and vastly reduces the amount of useful information you get on a single page.  This is the default.

For anyone who has bookmarks sorted in folders, the folder list display now has a significant amount of white space between the folder names, effectively halving the number of folders that you can have an overview of at any one time.

In addition, they have gotten rid of the folder tree option, which, if you’ve actually organised your bookmarks into folders and subfolders, means there’s a lot of information you cannot access any more. Subfolders appear as tiles on the righthand side of the dividing bar instead.

Google have provided a list view. However, this still doesn’t give you the tree overview, and more to the point if you are clicking on subfolders, the change of display from one subfolder to its contents and vice versa is animated. It is enormously distracting and I hate it.

The interface for bookmarking items via the star icon in the browser bar has been changed and now includes a large image which is not exactly necessary, clutters the interface and wastes space.

There is a method under the hood where you can configure Chrome not to use this clinically insane change – it’s not a user enhancement – and I will apply it. But I cannot count on Google to leave that backdoor option in place.

It’s one thing to provide what they think is enhanced utility (and it is entirely likely that for some people the tiles display is useful; it just isn’t for me, as I prefer a tree list and a set of icons, and they don’t need to be animated). It’s another to take away the options people were already using.

Other complaints include the fact that you can no longer sort bookmarks alphabetically. Google expects you to search these things, you see.

Google have a product page where this is being discussed. Feedback is universally negative. They have said they want feedback through the gears icon in Bookmark Manager where apparently 25% of the feedback is positive. That’s still a lot of negative feedback.

Ultimately, Chrome is Google’s product, and they provided it for free, so yes, if they want to make changes that annoy the wider user community, they can. It is also unclear whether enough of the wider community is impacted by this. The extent to which people use bookmarks varies, and the underlying methods by which people use them vary. Google is happy enough to annoy a few million people when it suits them (Google Reader is a key example of that). Presumably, they are going somewhere with this that is not completely clear to Chrome users at the moment. I’d have to hope they are, because otherwise they’ve foisted change for change’s sake on us, reduced and wrecked usability, all for the sake of shiny and new.

The thing is, it’s possible that in fact, the sake of shiny and new is what drove this. The technology sector has forgotten that it’s basically a support industry and thinks it’s now a disruptive industry.

 

Large wave events in Ireland

A couple of years ago, Professor Frederic Dias and some of his colleagues published a list of large scale wave events around the coasts of Ireland, linked with various causes, and from the historical record. It’s an interesting paper, and if you have any interest at all in sea behaviour around Ireland, it’s worth a look to get a picture of some of the weather/wave related impacts on the country historically. This is linked to the Multiwave project, page here.

They could do with a little help if you’ve experienced any extreme wave events. They’d be grateful if you could fill out the form linked on this page (pdf) or this one (word)  and send it back to them.

Language learning

I found myself taking part in a discussion on language learning this morning and thought it might be worthwhile to drop in some things that are on my blog-later list. I may develop them in more depth later, but this is just an overview.

  1. on average, twice as many girls as boys study languages at school leaving stage, both in the Irish Leaving Certificate system and at A-level stage in England/Wales
  2. in absolute numbers, more students study higher level French in Ireland than study A-level French. A-level students have a higher average grade than HL Leaving certificate students and almost 30% get an A or higher at A-level, versus around 13% in Ireland.
  3. After French, the second most popular A-level foreign language is Spanish where the number of candidates is higher than for HL LC candidates.
  4. Spanish is the only language where there are more A-level candidates than HL LC candidates.
  5. The second most popular language for HL LC is German.
  6. HL LC statistics give figures for Italian; the A-level stats didn’t, but, interestingly, did give figures for Irish. If the Irish figures were higher than the Italian ones, then the figures for Italian are extremely low at A-level stage.
  7. Amazon has opened up its Kindle store to include significantly more foreign language literature than was previously the case.
  8. The internet makes access to foreign language media significantly easier than was previously the case
  9. Facebook allows you to customise your newsfeed sources to include foreign language media options more easily than Google does. Google News, however customisable it is, is still a fiasco in that respect. It is distinctly monolingual – so while I can easily pull in foreign sources, those foreign sources are still English language.

With respect to the A-level/HL LC comparison, there are serious difficulties in doing a qualitative comparison given differences between the two exam systems, viz, in terms of mandatory and de facto mandatory subjects. The Leaving Cert is a marginally less specialist set-up, and it is worth noting that the comparison figures above are specifically higher level figures and do not include the high number of students taking ordinary level. Students at LC level take 6 to 7 subjects whereas A-level usually tops out at 4. Irish, English and mathematics are de facto mandatory in Ireland (nearly every single student takes all three) and most university requirements include a minimum of some sort of a pass in a foreign language module. Hence, the motivations are different. This may be reflected in the average grades which, for A-level, are higher across the board.

Data sources:

  • HL Leaving certificate: www.examinations.ie
  • A-level: www.theguardian.com/news/datablog/2014/aug/14/a-level-results-2014-the-full-breakdown

David McWilliams and teachers

David McWilliams has a piece in today’s Irish Independent which basically suggests the problem with teachers is teachers themselves; mainly, that they are too stuck in the mud.

Debate on education in Ireland is rarely if ever balanced, and the current hassle-causing discussion is linked to changes to the Junior Cycle assessment. However, when you see people contributing to debate in a manner that includes “why don’t you leave it, oh yeah, you’d lose your three months’ holidays”, and “Overpaid, jobs for life”, my soul fades away a little. Arguments of this nature are ignorant. In Ireland, probably led on by the UK, we don’t trust teachers.

We don’t trust teachers.

We devalue their work, reject their contributions to debate on their industry and suggest their sole motivation is the easy life of three months’ holidays.

And then we throw Finland at them.

In not one column discussing changes locally and citing Finland as an example do we highlight the core features of the Finnish system that made it special. It is a highly equal system and its core objective was social equality. Becoming best in literacy and numeracy was a fringe benefit.

So here are some features of the Finnish system which perhaps Irish people should want to take on board.

There are practically no private schools. 

All those rugby playing schools who populate the elite in Ireland? Gone. Nothing. I personally have no objection to this  – I think it’s a good idea. What private schools exist in Finland are subject to a few rules which I don’t think would go down well here:

However, even in private schools, the use of tuition fees is strictly prohibited, and selective admission is prohibited, as well: private schools must admit all its pupils on the same basis as the corresponding municipal school.

Source:

Teachers have a great deal of autonomy.

Teachers are highly valued members of society and their contribution is recognised as valued. Certainly the Finns pay their teaching staff less than we do, but only ignorant people would fail to see that this is part of a whole. Living conditions in Finland tend to be better across the board and the last time I was there, it was also a noticeably less expensive place to live in. Comparing purely in monetary terms is something we really should learn not to do. However, one of the key things which they do not do in Finland is treat their teachers as leeches on the system. We could learn a lot from how the Finns in general treat their teachers. We are not anywhere close.

University education is free

If you are arguing that we should follow the Finnish method, then you can’t do it on a pick and mix basis. Our universities want to charge fees.

When David McWilliams suggests that teachers should be open to change, he is missing the point. They are interested in change. They are interested in change which they can effectively implement, which gives them autonomy. We don’t respect teachers in this country and we certainly don’t listen to them.

If I wanted to give an example, it’s worth looking at the respect rugby referees get from their players versus the respect, or lack of it, football referees get from their players. We, in Ireland, play football with our teachers, and not rugby.

Most of the debates on education in Ireland do not focus on the core objective of education. Nowhere, in any of the pieces I have read recently, does anyone advocating Finland highlight that the core objective of the Finnish system was to reduce social inequality. We want to emulate their numeracy and literacy and be the same as them without taking the hard decisions that they did. You could argue that yes, the Junior Certificate could be marked locally by teachers. But that doesn’t magically make us perform like Finland. It’s a cargo cult approach to education.

In the same respect, you cannot honestly claim to respect teachers if you 1) insist they’re only in it for the holidays, 2) suggest they quit if it’s not all that, and 3) impose change on them without understanding where their objections are coming from. They deal with the everyday price of teaching and its everyday challenges.

We also need a wider discussion on what we expect from our education system. We have tech companies complaining that our university grads are not skilled enough. We have business people screaming that we don’t have enough language skills while paying absolutely nothing for those skills which are apparently in short supply. We get fashionable demands like “every child must learn to code” and “we should be teaching kids foreign languages from the age of 4”. These are fads.

There are a bunch of core skills on which every other part of learning is built. We need to identify them and focus on them. Reading, writing and basic numeracy are amongst them. Critical thinking is another one which is sadly absent in a lot of discourse on education.

When I look at the education debate in Ireland, it strikes me as poverty stricken.

This article, which may or may not have formed some of McWilliams’ research for the piece linked above, is more nuanced.

Even in Finland, the reforms have met objections from teachers and heads – many of whom have spent their lives focusing on a particular subject only to be told to change their approach.

Finnish schools are obliged to introduce a period of “phenomenon-based teaching” at least once a year. These projects can last several weeks. In Helsinki, they are pushing the reforms at a faster pace with schools encouraged to set aside two periods during the year for adopting the new approach.

I honestly believe that we need to re-assess how we consider education. I’m not sure that doing it within the framework of criticising teachers for blocking reform is the most effective way of doing so.

Mercer Quality of Living Survey 2015

Much was made in Ireland, this morning, of the news that Dublin had rated highly in the annual Mercer Quality of Living Survey. This survey is carried out to provide some guidance for companies who are expatriating staff in terms of cost of living, suitable salaries for staff being relocated, and related matters. I have generally relocated myself so this is not something I have ever worried about but I had a look at the reports anyway.

Dublin came in joint 34th place with Boston. This placed it higher than London and New York, and depending on which reports you read, outranking either London or New York was the hook most of the media went with.

You can, with a little cooperation from Mercer, have a look at the data by clicking on “See Full List” on this page. So I did that because I wanted to have a closer look at the list and perhaps think a little more about whether, in fact, 34th place was good for Dublin or not. The only other Irish city on the list was Belfast and it came in at 63rd place.

One of the single most interesting things that struck me about the top end of the list was the prevalence of German speaking cities. The highest ranked English speaking city is Auckland. There are only two other English speaking cities in the top ten, namely Vancouver and Sydney which squeaks in at 10. Five cities are German speaking and of those five, three are in Germany and Switzerland.

No other country has more than one city in the top 10. Even if you stretch that out to the top 20 cities, Germany is still looking good.

Top 20 cities by country

Basically, a quarter of the top 20 cities ranked by quality of living, are in Germany. After that, if you stretch it out to include the top 50, the US squeezes in 8 cities. Germany still has 7. Australia has 6 which has to include pretty much all their major cities when you think about it.

Top 50 cities by country

Once I was done being surprised at the prevalence of German cities in the top ten, and Australian cities in the top 50, the other thing which caught my interest was that realistically, none of the top ranked cities were particularly big.

Here’s how they rank, left to right, in terms of population.

Top 50 cities by population

and here’s how they rank, left to right, in terms of population density.

Top 50 cities by density

There is a point to be noted about the population figures. If you look up population figures for most of these cities, you will find a number of figures, namely the figures for the city’s administrative area, and a metropolitan area figure. Taking Paris as an example, its population is 2.273 million inhabitants. The Paris metropolitan area, however, includes around 10 million people. For Tokyo, the difference is even more extreme: its population is given as around 13 million inhabitants; its metro area as 35 million.

That being said, Paris and Tokyo are two of only 6 cities in the top fifty cities in this ranking whose populations exceed two million. After you come to terms with the idea that the best quality of living standards are basically in German-speaking countries, the next point to be picked up is that the best quality of living is in comparatively low population cities. The highest ranked city with a population greater than 2 million is Sydney, which comes in at 10th place; the next highest is Melbourne.

An interesting feature about Sydney, and Melbourne, and in fact other English speaking new world cities (so Auckland, Ottawa and Brisbane as well) is that, compared to most of the other cities around them in the rankings, they have very low population densities. In terms of population density, the three high hitters are Geneva, Paris (which is way, way out in front) and Barcelona.

So while you could suggest that there is a quality of living premium to be gained from living in comparatively small cities by population, the same pattern doesn’t exist in terms of population density. The vast majority of cities come in with a population density below 5000 inhabitants per square kilometre and above 1000. There are notable outliers either side of that band. All six Australian cities come in below 500 inhabitants per square kilometre including both Sydney and Melbourne, the two biggest Australian cities featuring in the list.

What I do not have access to at this point is a detailed description of the features on which this ranking is calculated and that is a pity as I would be interested to see what those features were, and how weighted they were, and more to the point, whether all of them were necessary.

I would also be interested to see on what basis cities were selected for review. The populations for Bern and Geneva, for example, are below 200,000. The lowest ranked city is Baghdad. Manchester does not feature which is surprising bearing in mind that Aberdeen does and it has less than half the population. Of the five UK cities in the list as a whole (not just the top 50), two are in Scotland. Only two cities from France feature. It is hard to argue that quality of living wise, Nice comes in somewhere below Baghdad. It is clear that the choice of cities is not on the basis of population but given Mercer’s primary business, it may well be in terms of the cities they get inquiries about.

From an Irish point of view, you could ask whether Dublin is doing well coming in at 34th place. Coming in ahead of London and New York City might look good, except both London and New York are large cities and, as already noted, larger cities are not ranking very highly here. Without knowing what the basic criteria for the survey were, it is, to some extent, guesswork to identify where the gaps are in terms of improvement. I would suggest that, arguably, the following items could be addressed:

  • public transport
  • health system
  • cost of rental accommodation

Connection wise, Dublin is well connected with most of Europe and some key locations in North America. Culturally, it is reasonably well served, if not as well as some of the other cities on the list. Shopping wise it isn’t terrible. But then, this is true of cities ranked more highly on the list, like Brussels in 23rd place. Admittedly, accommodation and public transport, in my view, almost certainly should rate Brussels higher than Dublin.

If I were somewhere in Dublin City Council where policies get made and implemented, what would I want to do with this, if anything? Is it something useful to have under random news, or is there anything to be learned? Given the audience of Mercer reports, ie, companies relocating staff, and Ireland’s heavy dependency on foreign direct investment, is there anything to be noted here?

___________________________________________________

Ranking data available from Mercer

City and density data from Wikipedia

Density data not available for Kobe, Japan

 

Uber, Github and You’ve got to be kidding me

In major goof, Uber stored sensitive database key on public Github page.

via Ars Technica.

Disclosure: I have a Github account, on which I have stored very little. However, I do have a project going in the background to build a terminology database which will be mega simple (I like command lines) and which will have a MySQL database and an interactive Python script to get at its contents. One thing which has exercised my mind is making sure that, when I promote all this to Github (as I might, in case anyone else wants a simple terminology database), I remove my own database keys.
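For a script like that, the simplest way to avoid the problem is never to have the keys in the source at all. A minimal sketch, assuming the connection settings are read from environment variables; the `TERMDB_*` names and the defaults are hypothetical, so pick your own:

```python
import os


def db_config():
    """Read MySQL connection settings from environment variables.

    The TERMDB_* variable names are hypothetical. Because nothing secret is
    written in this file, it can go to a public repository as-is.
    """
    try:
        return {
            "host": os.environ.get("TERMDB_HOST", "localhost"),
            "user": os.environ["TERMDB_USER"],
            "password": os.environ["TERMDB_PASSWORD"],
            "database": os.environ.get("TERMDB_NAME", "terminology"),
        }
    except KeyError as missing:
        # Fail loudly rather than falling back to a hard-coded password.
        raise RuntimeError(f"Set the {missing} environment variable first") from None
```

The resulting dictionary can be passed straight to a MySQL client; with MySQL Connector/Python, for example, that would be `mysql.connector.connect(**db_config())`. The file itself then contains nothing worth stealing.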

But this is not a corporate product, or any sort of corporate code. Nobody’s personal data will be impacted if I forget (which I won’t).

In the meantime, Uber, which is probably the highest-profile start-up, and which has money being flung at it left, right and centre by venture capitalists, managed to put a database key up on Github.

I don’t understand this. Why is Uber database related information anywhere near Github anyway? If they are planning to sell this as a product, why would you put anything related to it on an open repository?

I like the idea of an online repository for my own stuff. I don’t actually love Github, but it’s easy enough to work with and, a bit like Facebook, everyone uses it. But that doesn’t mean any company should put its code there unless it is open sourcing that code, and even then, any such code really should be checked to ensure it doesn’t present any risk to the company’s security.

Database keys in an open repo: there really is no excuse for this regardless of whether you’re a corporate or an individual.
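Checking before publishing doesn’t need to be elaborate. A rough sketch of a pre-publication check in Python, assuming credentials would show up either as quoted assignments to names like `password` or `api_key`, or embedded in a `mysql://user:pass@host` URL. The patterns and the list of file types are illustrative assumptions and will miss plenty; dedicated secret scanners are far more thorough.

```python
import re
from pathlib import Path

# Illustrative patterns only; real secret scanners use many more rules.
SUSPICIOUS = [
    # password = "...", api_key: '...', token = "..." and similar
    re.compile(r"(?i)(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
    # credentials embedded in a connection URL
    re.compile(r"mysql://[^:\s]+:[^@\s]+@"),
]
EXTENSIONS = {".py", ".cfg", ".ini"}


def scan_text(text):
    """Return (line_number, line) pairs that look like hard-coded credentials."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SUSPICIOUS):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root="."):
    """Scan .py, .cfg and .ini files, plus any .env file, under root."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and (path.suffix in EXTENSIONS or path.name == ".env"):
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Running `scan_tree()` in the repository root and refusing to push if it returns anything is crude, but it would catch the obvious cases. Reading the password from the environment, rather than quoting it in the file, doesn’t trigger it.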

 

Language skills.

The Economist is shouting about the lack of language skills in the UK again. Their basic thesis is that the lack of language skills amongst UK workers costs economic growth. I’m not sure how much we can stand over that assertion, and the Economist admits as much:

This lack of language skills also lowers growth. By exactly how much is hard to say, but one estimate, by James Foreman-Peck of Cardiff University, puts the “gross language effect” (the income foregone because language barriers alter and reduce international trade) in 2012 as high as £59 billion ($90 billion), or 3.5% of GDP.

which suggests it’s basically educated guesswork.

For unrelated reasons, I had a look at CPL’s language vacancies yesterday and the one thing that interested me is how low the salaries are on average.

The simple issue is this: if we do not value language skills economically, people will not study to acquire those skills.

Comparatively, we value programming skills more highly, although they are significantly easier to come by: the amount of time required to get usefully acquainted with a programming language (including assembler) is significantly less than the amount of time required to get usefully acquainted with a foreign language.

Put simply, the return on effort in acquiring foreign language skills to a high level is low compared to the return on effort in acquiring programming skills.

I might have more sympathy for the idea that the economy is suffering from a lack of foreign language skills if salaries for those skills were increasing. The truth is they aren’t, really, because the skills are being imported.

A Magna Carta for Big Data

I need first to provide a disclaimer: I did my MSc in CompSci at University College Dublin, which is one of the universities hosting the Insight Centre. LinkedIn also sent me the vacancy for Oliver Daniels’ job several times as one for which I was suitable. I know some of the Insight people and I have a particular amount of respect for the senior ones I know in both UCD and UCC.

With that out of the way, Oliver Daniels wrote a piece for the Huffington Post which I have some reservations about.

The data industry has to stop seeing itself as Big Data. The term is loaded. When people talk about Big Pharma, they are talking about the pharmaceutical industry acting in its own best interests (and not yours), and when they talk about Big Ag, they are talking about the agricultural-industrial complex acting in its best interests, not yours and not the environment’s. Big X is never a positive label for X. It implies a behemoth which really has no interest in your interests. I hate the term Big Data for this reason. It has never really meant serious data analytics; it has been a marketing tool for people who genuinely aren’t interested in data, but in buzzwords. Big Data is turning toxic.

If you read Oliver Daniels’ piece about a Magna Carta for Big Data, it is obvious that he is not looking for a Magna Carta for you or me, but for the right of large scale data analytics companies to access and use your data. There are a lot of benefits to large scale analytics, but it is a stretch to call something a charter of rights when you have to give them access to your data in return for a promise not to sell it to AN Other Company. The example in the Daniels piece relates to health data specifically, and the risk of it being sold to insurance companies.

Unlike Oliver Daniels, I have always known my mother’s age, and indeed, my father’s age, so I won’t be using either as an emotional hook on which to demand that people make their data available. What I would like to see Insight, and other organisations trying to be active in health analytics, do is recognise that the vast majority of people, while not analytics experts, are not necessarily stupid. And I have issues with statements like this:

Healthcare has always been about data analytics, only now we have access to so much more data.

The thing is we don’t. We can certainly generate more data, but we don’t necessarily have the right to use it. When Oliver Daniels is talking about a Magna Carta for big data, he is looking for the right to use it, framed in a way that suggests my rights are protected. This might be viable if the data industry – and hardly any company is not a data company at this stage – had an even remotely sane record on not losing data.

There is no point in saying “and we promise your data won’t be released to AN Other Company you don’t approve of” when all over the world, vendors are getting hacked, losing data, losing laptops, spending a small fortune writing to customers suggesting they get their credit cards reissued, re-enacting U2 videos by beating their chests and being sorry. Really Sorry. Very, really sorry. We lost your data.

I have written in the past about the cost of harming individuals in the quest to get access to their health data.

Oliver Daniels writes:

We need the public to feel trust when they hand over details about their health.

Even if we were to take the view that “of course you can have everything you want; we trust you completely not to misuse the data”, the simple truth is that large scale data sites have already been hacked in highly public ways. I have correspondence from Adobe apologising for losing a lot of data. I have correspondence from any number of online data-centric companies explaining that they have allowed their perimeters to be breached. The data industry has simply not earned respect when it comes to protecting data in practice.

It would be an overarching, policy-led document that describes what we want, and don’t want, from Big Data. It is a document that would put citizens at the centre of the Big Data age, and ensure that the technology develops with democracy and human rights as guiding principles.

The Magna Carta was a document of rights, not a policy document. What Oliver Daniels wants is not so much a charter of rights for humanity as a bill of rights for Big Data (he uses the term; I think he should move away from it) to have access to humanity’s data. The current regulatory framework, piecemeal as it might be, errs on the side of the individual rather than the gathering of large datasets, particularly in Europe.

You know this is what he is looking for when he writes:

A Magna Carta for Data would not be a list of protectionist rules about privacy triggered by court cases and data infractions.

A Magna Carta for Data is not a Magna Carta for owners of data.

You know this when he says this:

The Magna Carta would not enshrine privacy measures that risk bringing enlightened data research to a standstill.

The core objective of this measure is not to balance the rights of the people who generate data against those of the companies and organisations that want to exploit it. It is to make it easier to get access to that data. And it relies on the argument that privacy concerns have already been left behind by big data.

I have a couple of issues with this. At this stage, I’d like senior managers who genuinely believe in the benefits of large scale data analytics to stop calling it Big Data. It is a toxic term with strongly negative connotations.

I also take issue with describing this as a Magna Carta for Data. This is a marketing metaphor and nothing else. It is not even appropriate in the context of trying to get people to give up some existing privacy rights – rights which are not negated just because you claim they are.

I would like the data industry to understand that, to date, it has already made demonstrable screw-ups, both in the private sector (Target and Adobe being two examples) and in the public sector (the NHS mess over attempting to sell care.data to the public).

I have a lot of time for data analytics and in particular, the machine learning side of things. I honestly believe there is a lot of insight to be gained from it. But equally, I believe that there is no God-given right of access to this data, and I’d like practitioners of big data to pay more attention to the fact that a lot of what they are trying to do has already been done by statisticians who recognise the underlying problems with large scale analytics. The fact that you’ve 10 billion records does not automatically mean you have a wholly representative sample or, indeed, a viable model. Tim Harford has an illustrative piece on this.
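To make the point about representativeness concrete, here is a small simulation. All the numbers are invented for illustration, and this is the generic selection-bias argument rather than the example in Harford’s piece: 30% of a hypothetical population of a million has some condition, but people with the condition are only a third as likely to end up in the dataset (20% versus 60%). The biased collection should then report roughly 0.3 × 0.2 / (0.3 × 0.2 + 0.7 × 0.6) = 0.06 / 0.48 = 12.5%, however many records it holds, while a simple random sample of 1,000 should land close to the true 30%.

```python
import random


def make_population(n, true_rate, rng):
    """Hypothetical population: True means the person has the condition."""
    return [rng.random() < true_rate for _ in range(n)]


def biased_sample(population, p_if_true, p_if_false, rng):
    """Self-selected collection: the chance of being recorded depends on
    the very attribute being measured, so the sample is not representative."""
    return [
        person
        for person in population
        if rng.random() < (p_if_true if person else p_if_false)
    ]


def share(sample):
    """Fraction of the sample with the condition (True counts as 1)."""
    return sum(sample) / len(sample)


rng = random.Random(2015)  # fixed seed so runs are repeatable
population = make_population(1_000_000, 0.30, rng)

big = biased_sample(population, 0.2, 0.6, rng)  # roughly 480,000 records
small = rng.sample(population, 1_000)           # simple random sample

print(f"true share:          {share(population):.3f}")
print(f"big biased sample:   {share(big):.3f} from {len(big):,} records")
print(f"small random sample: {share(small):.3f} from {len(small):,} records")
```

The big sample holds hundreds of times more records than the small one and is still badly wrong; collecting more records the same way only makes it more confidently wrong.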

I’ve done some work with large datasets. I’m fully aware of the benefits of being able to get a picture of the behaviour of system components over time, such as buses running ahead of or behind schedule. But I’m also aware of the risk of assuming technology gives us more exact pictures of reality. The garbage in, garbage out principle will always apply, as will the tagline of a cartoon I saw more than twenty years ago: "The beauty of computers is that you can screw up so much more precisely".

More than anything, I want people in the industry to stop playing with marketing tags like Magna Carta for Data and Big Data. Neither instils much confidence. I’d hate to see the benefits of health analytics killed by pretending these things can be simplified down to a Universal Declaration of Data Rights.

 
