A Magna Carta for Big Data

I need first to provide a disclaimer: I did my MSc in CompSci at University College Dublin, which is one of the universities providing a home to the Insight Centre. LinkedIn has also suggested Oliver Daniels’ job to me several times as a vacancy for which I was suitable. I know some of the Insight people, and I have a great deal of respect for the senior ones I know in both UCD and UCC.

With that out of the way, Oliver Daniels wrote a piece for the Huffington Post which I have some reservations about.

The data industry has to stop seeing itself as Big Data. The term is loaded. When people talk about Big Pharma, they are talking about the pharmaceutical industry acting in its own best interests (and not yours), and when they talk about Big Ag, they are talking about the agricultural-industrial complex acting in its own best interests, not yours and not the environment’s. Big X is never a positive label for X. It implies a behemoth which really has no interest in your interests. I hate the term Big Data for this reason. It has never really meant serious data analytics; it has been a marketing tool for people who genuinely aren’t interested in data, only in buzzwords. Big Data is turning toxic.

If you read Oliver Daniels’ piece about a Magna Carta for Big Data, it is obvious that he is not looking for a Magna Carta for you or me, but for the right of large-scale data analytics companies to access and use your data. There are a lot of benefits to large-scale analytics, but it is a stretch to call it a charter of rights when the deal is that you give them access to your data and they promise not to sell it to AN Other Company. The example in the Daniels piece relates to health data specifically, and the risk of that data being sold to insurance companies.

Unlike Oliver Daniels, I have always known my mother’s age, and indeed my father’s age, so I won’t be using either as an emotional hook on which to demand that people make their data available. What I would like Insight, and other organisations trying to be active in health analytics, to do is recognise that the vast majority of people, while not analytics experts, are not necessarily stupid. And I have issues with statements like this:

Healthcare has always been about data analytics, only now we have access to so much more data.

The thing is, we don’t. We can certainly generate more data, but we don’t necessarily have the right to use it. When Oliver Daniels talks about a Magna Carta for Big Data, he is looking for the right to use it, framed in a way that suggests my rights are protected. This might be viable if the data industry (and hardly any company is not a data company at this stage) had even a remotely sane record on not losing data.

There is no point in saying “and we promise your data won’t be released to AN Other Company you don’t approve of” when, all over the world, vendors are getting hacked, losing data, losing laptops, spending a small fortune writing to customers suggesting they get their credit cards reissued, and re-enacting U2 videos by beating their chests and being sorry. Really sorry. Very, really sorry. We lost your data.

I have already written in the past about the cost to individuals of getting things wrong in the quest for access to their health data.

Oliver Daniels writes:

We need the public to feel trust when they hand over details about their health.

Even if we were to take the view that “of course you can have everything you want; we trust you completely not to misuse the data”, the simple truth is that large-scale data sites have already been hacked in highly public ways. I have correspondence from Adobe apologising for losing a lot of data. I have correspondence from any number of online, data-centric companies explaining that they have allowed their perimeters to be breached. The data industry has simply not earned our trust when it comes to protecting data in practice.

It would be an overarching, policy-led document that describes what we want, and don’t want, from Big Data. It is a document that would put citizens at the centre of the Big Data age, and ensure that the technology develops with democracy and human rights as guiding principles.

The Magna Carta was a document of rights, not a policy document. What Oliver Daniels wants is not so much a charter of rights for humanity as a bill of rights for Big Data (his term; I think he should move away from it) to access humanity’s data. The current regulatory framework, piecemeal as it might be, errs on the side of the individual rather than on the side of gathering large datasets, in Europe in particular.

You can tell this is what he is looking for from this:

A Magna Carta for Data would not be a list of protectionist rules about privacy triggered by court cases and data infractions.

A Magna Carta for Data is not a Magna Carta for owners of data.

You can also tell when he says this:

The Magna Carta would not enshrine privacy measures that risk bringing enlightened data research to a standstill.

The core objective of this measure is not to balance the rights of the people who generate data against those of the companies and organisations which want to exploit that data. It is to make it easier to get access to that data. And it argues that big data has already left privacy concerns behind.

I have a couple of issues with this. At this stage, I’d like senior managers who genuinely believe in the benefits of large scale data analytics to stop calling it Big Data. It is a toxic term with strongly negative connotations.

I also take issue with describing this as a Magna Carta for Data. This is a marketing metaphor and nothing else. It is not even appropriate in the context of trying to get people to give up some existing privacy rights – rights which are not negated just because you claim they are.

I would like the data industry to understand that, to date, it has already made demonstrable screw-ups, both in the private sector (Target and Adobe are two examples) and in the public sector (the NHS mess of attempting to sell care.data to the public).

I have a lot of time for data analytics and, in particular, the machine learning side of things. I honestly believe there is a lot of insight to be gained from it. But equally, I believe that there is no God-given right of access to this data, and I’d like practitioners of big data to pay more attention to the fact that a lot of what they are trying to do has already been done by statisticians who recognise the underlying problems with large-scale analytics. The fact that you’ve got 10 billion records does not automatically imply you have a wholly representative sample or, indeed, a viable model. Tim Harford has written an illustrative piece on this.
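To make the point about sample size concrete, here is a small Python simulation. It is only a sketch under invented assumptions: a hypothetical population of one million people, 30% of whom have some condition, and a hypothetical collection process in which people with the condition are half as likely to end up in the dataset. The large dataset collected that way misses the true rate badly, while a simple random sample of just 1,000 people lands close to it.

```python
import random


def simulate(seed: int = 42):
    """Compare a large, biased dataset with a small, random one (all numbers hypothetical)."""
    rng = random.Random(seed)

    # Hypothetical population: 1,000,000 people, each with a 30% chance of having the condition.
    population = [rng.random() < 0.30 for _ in range(1_000_000)]
    true_rate = sum(population) / len(population)

    # Assumed biased collection: people WITH the condition are recorded 40% of the time,
    # people without it 80% of the time. This yields hundreds of thousands of records.
    big_biased = [p for p in population if rng.random() < (0.4 if p else 0.8)]
    biased_rate = sum(big_biased) / len(big_biased)

    # Simple random sample of only 1,000 people drawn from the whole population.
    small_random = rng.sample(population, 1_000)
    random_rate = sum(small_random) / len(small_random)

    return true_rate, len(big_biased), biased_rate, random_rate


if __name__ == "__main__":
    true_rate, n_big, biased_rate, random_rate = simulate()
    print(f"true rate:                 {true_rate:.3f}")
    print(f"biased dataset ({n_big:,} records): {biased_rate:.3f}")
    print(f"random sample (1,000 records):  {random_rate:.3f}")
```

Adding more records from the same biased process does not fix the error: the biased estimate settles around 0.12 / (0.12 + 0.56), roughly 0.18, however many records are collected. Only changing how the data is gathered does.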

I’ve done some work with large datasets. I’m fully aware of the benefits of being able to get a picture of the behaviour of system components over time, such as buses running ahead of or behind schedule. But I’m also aware of the risk of assuming technology gives us more exact pictures of reality. The garbage-in, garbage-out principle will always apply, as will the tagline of a cartoon I saw more than twenty years ago: “The beauty of computers is that you can screw up so much more precisely.”

More than anything, I want people in the industry to stop playing with marketing tags like Magna Carta for Data and Big Data. Neither instils much confidence. I’d hate to see the benefits of health analytics killed by pretending these things can be simplified down to a Universal Declaration of Data Rights.