Samaritans Radar is gone

The application was pulled on Friday 7 November. Here is the statement issued by the Samaritans on that occasion.

I am not sure how permanently gone it is, but this is worth noting:

We will use the time we have now to engage in further dialogue with a range of partners, including in the mental health sector and beyond in order to evaluate the feedback and get further input. We will also be testing a number of potential changes and adaptations to the app to make it as safe and effective as possible for both subscribers and their followers.

Feedback for the Radar application was overwhelmingly negative. There is nothing in this statement to suggest that the issue for the Samaritans is that there were problems with the app, only that some people were vocal about their dislike of it.

I really don’t know what to say at this stage. While I’m glad it has been withdrawn for now, I’m not really put at ease to know that the Samaritans have an interest in pushing it out there again. It was a fiasco in terms of app design and especially community interaction. There is nothing, absolutely nothing, to indicate that they saw the light about the technical issues with the application, the ethical issues with the app and the legal difficulties with asserting they weren’t data controllers for that app.

I hate this because a) it negatively affected a lot of people who might under other circumstances use Samaritans services and b) it makes the job of data scientists increasingly difficult. It is very hard to use a tool to do good things when the tool has been used to do bad things.

Word of the Day: Entlieben

In addition to the tech stuff, and the data stuff, and opinions linked to each, I have an interest in languages as well (this might explain one of the projects I have running in the background).

Given that I lived in Germany for a few extended periods between the ages of 19 and 23, it’s surprising that the first time I came across the word entlieben was this morning, particularly since entlieben perfectly describes something that has happened to me a few times in my life, and probably to most people.

If you go to the online Duden, the definition is given as:

aufhören [einander, jemanden] zu lieben

This can be translated as “stop loving [one another/someone]”

But I don’t think that’s quite the holy all of it in terms of atmosphere. I prefer the “fall out of love with” translation, which adds a little nuance that I think matters when we are discussing labelling feelings.

The opposite – incidentally (because, mostly, you have to do it first) – is verlieben. Interestingly, Duden defines that as:

von Liebe zu jemandem ergriffen werden

To be moved to love someone is the literal translation. Here, we would say “fall in love with”.

The verb lieben means to love or to like – a bit like French aimer, it covers a few bases, although both languages have closer equivalents to like in the indirect forms “Ça me plaît” and, specifically for German, “Das gefällt mir”. It’s interesting to note, by the way, that usage of the verb “like” in English functioned this way around five hundred years ago, per Shakespeare. But this is not a discussion of verbs describing the action of “being pleasing to”.

What is interesting – if you are of a systematic kind of mind – is the impact of prefixes on a root word like lieben, and how they can be used for similar effects on other root words. I’ve been aware of these for years – the ones that stand out from German language tuition at university are einsteigen, aussteigen and umsteigen, which respectively mean “get into” [a form of transport], “get off” [a form of transport] and “change from one to another” [form of transport].

I’ve seen the form ent– before in verbs like “entziehen“, to take away, withdraw. I’ve just never seen it used on the verb lieben before and despite the fact that it’s a straight application of an unmysterious system in the German language, it seems rather lyrical in a way that something de- does not in English.


Samaritans Radar, again

The furore refuses to die down and to be honest, I do not think the Samaritans are helping their own case here. This is massively important, not just in the context of the Samaritans’ application, but in the case of data analysis in the health sector in general. In my view, the Samaritans have got this terribly wrong.

If you’re not familiar with Samaritans Radar, here is how it works.

  • You may be on Twitter, and your account may have any number of followers.
  • Any one of those followers may decide that they like the idea of getting a warning in case any of the people THEY follow are suicidal.
  • Without obtaining permission from the people they follow, they download/install/sign up for Samaritans Radar, which will read the tweets that the people they follow post, run a machine learning algorithm against them, and tag any tweet as a potential cause for concern regarding a possible suicide attempt if it trips the algorithm.
  • The app will then generate an email to the person who installed it.
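
To make the moving parts of that list concrete, here is a minimal sketch of the flow in Python. Every name in it – the phrase list, the classifier, the alert function – is my own invention for illustration; Jam have not published how the real thing works.

```python
# Hypothetical sketch of the Radar flow described above.
# The phrase list, classes and function names are all illustrative, not the real app.
from dataclasses import dataclass

CONCERN_PHRASES = ["example phrase one", "example phrase two"]  # stand-in for the undisclosed list

@dataclass
class Tweet:
    author: str
    text: str

def looks_concerning(tweet: Tweet) -> bool:
    """Stand-in classifier: trips if any listed phrase appears in the tweet text."""
    text = tweet.text.lower()
    return any(phrase in text for phrase in CONCERN_PHRASES)

def radar_scan(subscriber_email: str, timeline: list) -> list:
    """Scan a subscriber's incoming timeline and 'send' an alert for each flagged tweet."""
    alerts = []
    for tweet in timeline:
        if looks_concerning(tweet):
            # In the real app this is an email to the subscriber; the tweet's
            # author is never asked and never notified.
            alerts.append(f"Alert to {subscriber_email}: @{tweet.author} may be struggling to cope")
    return alerts

if __name__ == "__main__":
    timeline = [Tweet("friend_a", "nothing remarkable here"),
                Tweet("friend_b", "example phrase one, as it happens")]
    print(radar_scan("subscriber@example.com", timeline))
```

The point of writing it out is that the entire decision to email a third party hinges on one yes/no answer from a classifier nobody outside the project has seen.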

In their blurb, the Samaritans make it clear that at no point will the person whose tweets are being processed be asked, or potentially even know that this is happening. As an added bonus, at the outset, their FAQ made it clear they didn’t want to let people out of having their tweets processed in this way without their consent or even knowledge. They had a whitelist for the occasional organisation whose language might trip the filter, but after that, if your friend or contact installed the application, you had no way out.

That last part didn’t last for long. They now accept requests to put your twitter id on what they call a whitelist but what is effectively an opt out list. And their performance target for getting you opted out is 72 hours. So you can be opted in instantly without your permission, but it may take three days to complete your request to get opted out, plus you get entered on a list. Despite not wanting anything to do with this.

There is a lot of emotive nonsense running around with this application, including the utterly depressing blackmailing line of “If it saves even one life, it’ll be worth it”. I’m not sure how you prove it saves even one life and against that, given the criticism about it, you’d have to wonder what happens if it costs even one life. And this is the flipside of the coin. As implemented, it could.

When I used to design software, I did so on the premise that software design should also mitigate against things going wrong. There are a number of serious issues with the current implementation of Samaritans Radar, and a lot of things which are unclear in terms of what they are doing.

  • As implemented, it seems to assume that the only people who will be affected by this are their target audience of 18-35 year olds. This is naive.
  • As implemented, it seems to assume that there is an actual friendship connection between followers and followees. Anyone who uses Twitter for any reason at all knows that this is wrong as well.
  • As implemented, it defaults all followees into being monitored while simultaneously guaranteeing data protection rights not to them but to their followers.
  • As implemented, it is absolutely unclear whether there are any geographical limitations on the reach of this mess. This matters because of the different data protection regulations in different markets. And that’s before you get to some of the criticisms regarding whether the app is compliant with UK data protection regulations.

So, first up, what’s the difference between what this app is doing and, for example, any market research analysis being done against Twitter feeds?

This app creates data about a user and it uses that data to decide whether to send a message to a third party or not.

Twitter is open – surely if you tweet in public, you imagine someone is going to read it, right? This is true, within limits. But there’s a difference between someone actively reading your Twitter feed and them getting sent emails based on keyword analysis. In my view, if the Samaritans want to go classifying Twitter users as either possibly at risk of suicide or not, they need to ask those Twitter users if they can first. They haven’t done that.
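
In code terms, asking first is a tiny change – something along these lines, where the consent record is entirely my own invention:

```python
# Illustrative only: gate all analysis on explicit opt-in consent.
OPTED_IN_HANDLES = {"user_who_said_yes"}   # stand-in for a real consent record

def may_process(author_handle: str) -> bool:
    """Only analyse tweets from people who have explicitly agreed."""
    return author_handle in OPTED_IN_HANDLES

print(may_process("user_who_said_yes"))  # True
print(may_process("everyone_else"))      # False: the tweet is never classified at all
```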

The major issue I have with this is that I am dubious about sentiment analysis anyway, particularly for the short texts that make up Twitter feeds.

Arguably, this is acting almost as a mental health related diagnostic tool. If we were looking to implement an automated diagnostic tool of any description in the area of health and medicine, it’s pretty certain that we would want it tested for very high accuracy rates. Put simply, when you’re talking about health issues, you really cannot afford to make too many mistakes. Bearing in mind that – for example – failure rates of around 1% in contraception make for lots of unplanned babies, a 20% misclassification rate on “possibly suicidal” could be seriously problematic. A large number of false positives means a lot of incorrect warnings.

Some people might argue that a lot of incorrect warnings is a small price to pay if even one life is saved. If you deal with the real world, however, what happens is that a lot of incorrect warnings cause complacency. False negatives are classifications where issues are missed. They may result in harm or death.

Statistical theory talks about Type 1 and Type 2 errors, which are effectively errors where something is classified incorrectly in one direction or the other. The rate of those errors matters a lot in health diagnosis. In my view, they should matter here, and if the Samaritans have done serious testing in this area, they should release the test results, suitably anonymised. If they did not, then the application was nowhere near adequately tested. Being honest, I’m really not sure how they might effectively test for false negatives using informed consent.
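
To put rough numbers on why those error rates matter, here is a back-of-the-envelope calculation. The figures are entirely invented – nobody has published the real ones – but they show what happens when a rare condition meets an imperfect classifier:

```python
# Illustrative only: invented numbers showing how a rare condition plus an
# imperfect classifier produces mostly false positives.
tweets_scanned   = 100_000
true_rate        = 0.001          # assume 1 in 1,000 tweets is genuinely concerning
sensitivity      = 0.95           # true positives caught (1 - Type 2 error rate)
false_pos_rate   = 0.05           # benign tweets wrongly flagged (Type 1 error rate)

truly_concerning = tweets_scanned * true_rate
true_positives   = truly_concerning * sensitivity
false_positives  = (tweets_scanned - truly_concerning) * false_pos_rate
precision        = true_positives / (true_positives + false_positives)

print(f"{false_positives:.0f} false alarms vs {true_positives:.0f} real hits")
print(f"Only {precision:.1%} of alerts point at a genuinely concerning tweet")
```

Even with a classifier that is “right 95% of the time”, the overwhelming majority of alerts in that scenario are false alarms, which is exactly the complacency problem described above.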

Ultimately, one point I would make is that sometimes, the world is not straightforward, and some things just aren’t binary. Some things exist on a continuum. This app, in my view, could move along the continuum from a bad thing to a good thing if the issues with it were dealt with. At the absolute best, you could argue that the application is a good thing done badly, spectacularly so in my view, since it may allow people who aren’t out for your good to monitor you and identify good times to harass you. The Samaritans’ response to that was to suggest making a complaint to Twitter if you get harassed. A better response would be to recognise this risk and mitigate against enabling such harassment in the first place.

Unfortunately, as things stand, if you want to prevent that happening, you have to ask the Samaritans to put you on a list. The app, as designed, defaults towards allowing the risk and assumes that people won’t do bad things. This may not be a good idea in the grand scheme of things. It would be better to design the app to prevent people from doing bad things.

The thing is, in the grand scheme of things, this matters a lot, not just because of this one app, but because it calls into question a lot of things around the area of datamining and data analysis in health care, be it physical or not.

If you wanted, you could re-write this app such that, for example, every time you posted a tweet about having fast food in any particular fast food company, concerned friends sent you an email warning you about your cholesterol levels. Every time you decided to go climbing, concerned friends could send you emails warning you how dangerous climbing is, and what might happen if you fell. Every time you went on a date, someone could send you a warning about the risk that your new date could be an axe-murderer. You’d have to ask if the people who are signing up to this and merrily automatically tweeting about turning their social net into a safety net would love it if their friends were getting warnings about the possibility that they might get raped, have heart attacks, get drunk, fall off their bikes, get cancer if they light up a cigarette, for example.

I personally would find that intrusive. And I really don’t know that twitter should default towards generating those warnings rather than defaulting towards asking me if I want to be nannied by my friends in this way. I’d rather not be actually. I quite like climbing.

The biggest issue I have with this, though, is that it is causing a monumentally negative discussion around machine learning and data analysis in the healthcare sector, such that it is muddying the water around discussions in this area. People like binary situations; they like black and white, where everything is either right or wrong. If I were working in the data sector in health care, looking into automated classification of any sort of input for diagnosis support, for example, I’d be looking at this mess in horror.

Already, a lot of voices against this application – which is horrifically badly designed and implemented – are also voicing general negativity about data analysis and data mining in general. And yet data mining has, absolutely, saved lives in the past. What John Snow did to identify the cause of the 1854 Broad Street cholera outbreak is pure data mining and analysis. Like any tool, data analysis and mining can be used for good and for bad. I spent a good bit of time looking at data relating to fatal traffic accidents in the UK last year and from that concluded that a big issue with respect to collisions was junctions with no or unmarked priorities.

So, the issue with this is not just that it causes problems in the sphere of analysing the mindset of various unsuspecting Twitter users and telling on them to their friends, but that it could have a detrimental impact on the use of data analysis as a beneficial tool elsewhere in healthcare.

So what now? I don’t know any more. I used to have a lot of faith in the Samaritans as a charity particularly given their reputation for integrity and confidentiality. Given some of their responses to the dispute around this application, I really don’t know if I trust them at the moment as they are unwilling to understand what the problems with the application are. Yes they are collecting data, yes they are creating data based on that data, and yes, they are responsible for it. And no they don’t understand that they are creating data, and no they don’t understand that they are responsible for it. If they did, they wouldn’t write this (update 4th November):

We condemn any behaviour which would constitute bullying or harassment of anyone using social media. If people experience this kind of behaviour as a result of Radar or their support for the App, we would encourage them to report this immediately to Twitter, who take this issue very seriously.

In other words, we designed this App which might enable people to bully you and if they do, we suggest you annoy Twitter about it and not us.

It’s depressing.

The other issue is that the Samaritans appear to be lawyering up and talking about how it is legal, and it’s not against the law. This misses a serious point, something which is often forgotten in the tech industry (i.e. do stuff first and ask forgiveness later): just because you can do something doesn’t mean you should do it.

Right now, I think the underlying idea of this application is good, but it is so badly implemented that it sits safely in the zone of a bad idea. Again, if I were the Samaritans, once the first lot of concerns started being voiced, I would have pulled the application and looked at the problems around consent to being analysed and having data generated and forwarded to followers. It’s obvious though that, up front, they thought it was a good idea to do this without consent, and you’d have to wonder why. I mean, in general terms, if you look at my Twitter feed, it’s highly unlikely (unless their algorithm is truly awful altogether) that anything I post would trip their algorithm. I’m not coming at this from the point of view of feeling victimised as someone who is at risk of getting flagged.

My issues, quite simply, are these:

  • It’s default opt-in without even informing Twitter users that they are opted in. The Samaritans have claimed that over a million Twitter feeds are being monitored thanks to 3,000 sign-ups. You’d have to wonder how many of those million Twitter accounts are aware that they might cause an email to be sent to a follower suggesting they might be suicidal.
  • The opt-out process is onerous and, based on the 72 hour delay they require, probably manual. Plus initially, they weren’t even going to allow people to opt out.
  • It depends on sentiment analysis, the quality of which is currently unknown.
  • The hysteria around it will probably have a detrimental effect on consent for other healthcare related data projects in the future.

The fact that you can ask the Samaritans to put you on a blocklist isn’t really good enough. I don’t want to have my name on any list with the Samaritans either which way.


EDIT: I fixed a typo around the Type 1 and Type 2 errors. Mea culpa for that. 


Seriously, Oracle…

Here’s a thing. I wanted to build a small utility to automate a task which would be handy, which I don’t need right now, but which I reckon would take about 8-10 hours to build in Python. So as I have some time, I’m doing it now.

For it to do what I want, I need the script to be able to read and write to a MySQL database. I chose that one because MySQL is open source and also because compared to Oracle 11g it uses fewer resources on my laptop. This is not going to be a big utility and I really don’t need serious heavy lifting at this point in time. But I do need the MySQL Python connector library.
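
For context, the library in question is the mysql-connector-python package, and the sort of thing I need it for is no more complicated than the following sketch (the database name, table and credentials are placeholders, not anything real):

```python
# Minimal read/write against an existing local MySQL instance.
# Database, table and credentials below are placeholders.
import mysql.connector

cnx = mysql.connector.connect(user="someuser", password="somepass",
                              host="127.0.0.1", database="utilitydb")
cur = cnx.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS notes (id INT AUTO_INCREMENT PRIMARY KEY, body TEXT)")
cur.execute("INSERT INTO notes (body) VALUES (%s)", ("hello from Python",))
cnx.commit()

cur.execute("SELECT id, body FROM notes")
for row in cur:
    print(row)

cur.close()
cnx.close()
```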

So far, so good. I don’t have the connector library installed, and need to go and get it from Oracle.

To do this, I need to sign into Oracle. Fine. Password forgotten, so password reset, nuisance, but there you go. It’s a fact of life with things like this.

Once signed in, oh wait, now I have to answer some survey. They want to know what I’m using it for, what industry sector, how many employees, what sort of application, and then they offer me a list of reasons for which they can contact me further. Not on the list is “You don’t need to contact me”.

I’m not trying to download MySQL. I already have it installed. I just want a library that will enable me to write some code to connect a Python script to an existing install.

Downloading a single library really should be a lot easier.

Samaritans Radar – the app that cares

The Samaritans have designed a new app that scans your friends’ Twitter feeds and lets you know when one or other of them might be vulnerable, so you can call them and maybe prevent them from committing suicide.

It has caused a lot of discussion, and publicly at least, the feedback to the organisation is not massively positive. I have problems with it on several fronts.

By definition, it sounds like it is basically doing some sort of sentiment analysis. Before we ever get into the details of privacy, consent, and all that, I would say “stop right there“.

Sentiment analysis is highly popular at the moment. My Twitter feed gets littered with promoted tweets for text mining subjects. It is also fair to say that its accuracy is not guaranteed. Before I’d even look at this application, I would want to know on what basis the application is assessing tweets as evidence of problems. We’ve seen some rather superficial sentiment analysis done in high profile (and controversial as a result) studies in the past, including that study by Facebook, for example. Accuracy in something like this is massively important and, unfortunately, I have absolutely no faith that we can rely on this to work.

According to the Samaritans:

Our App searches for specific words and phrases that may indicate someone is struggling to cope and alerts you via email if you are following that person on Twitter.

The Samaritans include a list of phrases which may cause concern, and, on their own, yes, they are the type of phrases which you would expect to cause concern. But it’s not clear how much more granular the underlying text analysis is, or on what basis their algorithm works. This is something which Jam, the digital agency responsible for this product, really, really should be far more open about.
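
To illustrate why the granularity matters, here is the crudest possible reading of “searches for specific words and phrases”. The phrases and tweets below are invented by me, but the failure mode is real: naive matching cannot tell context, idiom or quotation apart from genuine distress.

```python
# Invented phrases and invented tweets; the point is only that naive substring
# matching flags perfectly harmless text.
phrases = ["can't cope", "give up"]

tweets = [
    "I honestly can't cope with how good this cake is",
    "never give up on your allotment, folks",
]

for tweet in tweets:
    flagged = any(p in tweet.lower() for p in phrases)
    print(flagged, "->", tweet)   # both print True despite being entirely benign
```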

In principle, here is how the application works. A person signs up to it, and their incoming feed is constantly scanned until a tweet from one of their contacts trips the algorithm and the app generates an email to the person who signed up to say one of their contacts may be vulnerable.

This may come across as fluffy and nice and helpful and it is if you avoid thinking about several unpleasant factors.

  1. Just because a person signs up to Radar does not mean their friends signed up to have their tweets processed and acted upon in this way.
  2. Textual analysis is not always correct and there is a risk of false positives and false negatives.

Ultimately, my Twitter account is public and always has been. You will find stuff about photographs, machine learning, data analysis, the tech industry, the weather, knitting, lace making, general chat with friends. I’m aware people may scan it for marketing reasons. I’m less enthusiastic about the idea of people a) scanning it to assess my mental health and b) enabling decisions to be taken, without any consideration of whether I agree, that cause a friend or acquaintance to act on it.

It also assumes that everyone who actually follows me on Twitter is a close friend. This is an incredibly naive assumption given the nature of Twitter. 1,200 people follow me from the various worlds my life touches on, including data analysis and machine learning. Many of them are people I have never, ever met.

One of the comments on the Samaritans’ site about this is telling:

Unfortunately, we can’t remove individuals as it’s important that Radar is able to identify their Tweets if they need support.

Actually this isn’t true any more because a lot of people on Twitter made it clear they weren’t happy about having their tweets processed in this way.

Effectively, someone thought it was a good idea to opt a lot of people into a warning system without their consent. I can’t understand how anyone could miss the point so completely.

Anyway, now there is a whitelist you can use to opt out. Here’s how that works.

Radar has a whitelist of Twitter handles for those who would like to opt out of alerts being sent to their Twitter followers. To add yourself to the Samaritans Radar whitelist, you can send a direct message on Twitter to @samaritans. We have enabled the function that allows anyone to direct message us on Twitter, however, if you’re experiencing problems, please email: radar@samaritans.org

So, I’ve never downloaded Radar, I want nothing to do with it, but to ensure that I have nothing to do with it, I have to get my Twitter ID put on a list.

In technical terms, this is a beyond STUPID way of doing things. There’s a reason people do not like automatic opt-in on marketing mail, and that’s with companies they’ve dealt with. I have no reason to deal with the Samaritans, but now I’m expected to tell them they must not check my tweets for being suicidal, otherwise they’ll do it if just one of my friends signs up to Radar? And the app itself – how does it work: does it check the text or the user ID first? If the app resides on a phone, does it have to call home to the Samaritans every single time to check an updated list? What impact will that have on data usage?
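
Those ordering questions matter more than they might sound. A sketch of the two obvious designs follows, with the caveat that nobody outside Jam knows which one the real app uses and all of the code below is my own stub:

```python
# Two hypothetical orderings of the opt-out check. Everything here is stub code.

def looks_concerning(text: str) -> bool:
    return "example trigger" in text.lower()          # stand-in classifier

def send_alert(author: str, text: str) -> None:
    print(f"alert about @{author}: {text}")

def analyse_then_check(author: str, text: str, opt_out: set) -> None:
    """The text is analysed regardless; the opt-out list only suppresses the email."""
    if looks_concerning(text) and author not in opt_out:
        send_alert(author, text)

def check_then_analyse(author: str, text: str, opt_out: set) -> None:
    """The (ideally locally cached) opt-out list is consulted first; opted-out
    users' tweets are never analysed at all."""
    if author in opt_out:
        return
    if looks_concerning(text):
        send_alert(author, text)
```

Only the second version actually stops an opted-out user’s tweets being processed; the first merely stops their followers hearing about it. And if the list lives on the Samaritans’ servers rather than being cached locally, every tweet scanned means another call home.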

Ultimately, the first problem I have with this is that I’m dubious about relying on text analysis for anything at all, never mind mental health matters, and the second problem I have is that the Samaritans don’t appear to understand that just because my tweets are public does not mean I want an email sent to one of my friends suggesting they need to take action regarding my state of mental wellbeing.

The Samaritans have received a lot of negative feedback on Twitter about it. Various other blogs have pointed out that the Samaritans really should have asked people’s permission before signing them up to some early warning system that they might not even know exists, plus the annotating of tweets generates data about users which they did not give permission to have generated.

So they issued an updated piece of text trying to do what I call the “there there” act on people who are unhappy about this. It does nothing to calm the waters.

We want to reassure Twitter users that Samaritans does not receive alerts about people’s Tweets. The only people who will be able to see the alerts, and the tweets flagged in them, are followers who would have received these Tweets in their current feed already.

Sorry, not good enough. I don’t want alerts generated off the back of my tweets. Don’t do it. It’s bold. Also, don’t ask me to stop it happening, because I never asked for it to happen in the first place. It’s a bit Big Brother Is Watching You. It’s why, at some point, people will get very antsy about big data.

Having heard people’s feedback since launch, we would like to make clear that the app has a whitelist function. This can be used by organisations and we are now extending this to individuals who would not like their Tweets to appear in Samaritans Radar alerts.

The ability for individuals to opt out of this invasive drivel was not there by default (in fact they made it clear they didn’t want it) and now, to get out of it, they expect Twitter users to opt out. I have to make the effort to get myself out of the spider’s web of stupidity. The existence of a whitelist is not a solution to this problem. People should not have to opt out of something that they never opted into in the first place. Defaulting the entirety of Twitter into this was a crazy design decision. I’m stunned that Twitter didn’t pull them up on this.

It’s important to clarify that Samaritans Radar has been in development for well over a year and has been tested with several different user groups who have contributed to its creation, as have academic experts through their research. In developing the app we have rigorously checked the functionality and approach taken and believe that this app does not breach data protection legislation.

  • I want to see the test plans and reports. It sounds to me like the testing never included checking whether people wanted this in the first place.
  • Name the academics.
  • They cannot possibly claim to have rigorously checked the functionality and approach when almost the first change they’ve had to make is to broaden access to the whitelist.
  • Presumably the app is only available in the UK, but does it check whether the contacts are in the UK?

Those who sign up to the app don’t necessarily need to act on any of the alerts they receive, in the same way that people may not respond to a comment made in the physical world. However, we strongly believe people who have signed up to Samaritans Radar do truly want to be able to help their friends who may be struggling to cope.

Yes, but the point is that the app may not be fully accurate – I would love to know how they tested its accuracy rates, to be frank – and additionally, the people whose permission matters are not the people who sign up to Radar, but the people whose tweets get acted on. Suggesting “people may not do anything” is logically a stupid justification for this: the app is theoretically predicated on the idea that they will.

So here are two questions:

Do I want my friends getting email alerts in case I’m unlucky enough to post something which trips a text analysis tool which may or may not be accurate? The answer to that question is no.

Do I want to give my name to the Samaritans to go on a list of people who are dumb enough not to want their friends to check up on them in case things are down? The answer to that question is no.

I’m deeply disappointed in the Samaritans about this. For all their wailing that they talked to this expert and that expert, it’s abundantly clear that they don’t appear to have planned for any negative fallout. They claim to be listening and yet there’s very limited evidence of that.

You could argue that there needs to be serious research into how accurate the tool is at identifying people who need help; there also needs to be an understanding that even if, to the letter of the law in the UK, it doesn’t break data protection legislation, there are serious ethical concerns here. I’d be stunned if any mental health professional thought that relying on textual analysis of texts of 140 characters was a viable way of classifying a person as being in need of help or not, even if you could rely on textual analysis. This application, after all, is credited to a digital agency, not a set of health professionals.

If I were someone senior in the Samaritans, I’d pull the app immediately. It is brand damaging – and that may ultimately have fundraising issues as well. I would also talk to someone seriously to understand how such a public relations mess could have been created. And I would also ask for serious, serious research on the use of textual analysis in terms of identifying mental health states and without it, I would not have released this.

It is one of the most stupid campaigns I have seen in a long time. It is creepy and invasive and it depends on a technology which is not without its inadequacies here.

Someone should have called a halt before it ever reached the public.


Languages from a young age

I’m not entirely sure who dropped this in my twitter feed this morning but it caught my attention because it relates to teaching children foreign languages from the age of 3.

I am in favour of children learning languages from a young age and I am starting to do some research into how children acquire language for a separate reason anyway, but this concerned me:

When children join the preschool class of Moreton First at three years of age, they are exposed to four languages.

The four languages are English, French, Spanish and Chinese.

Catherine More, the head of Moreton First School, explicitly mentions research discussing the benefits of bilingualism, and I am fully in favour of that. However, bilingualism only works if it’s done properly. Quadrilingualism is not doing bilingualism properly.

From speaking to parents in bilingual households, I know that full fluency in two languages is hard work, and that is with the benefit of home contact. If I were looking to school a child in an environment where they were to get a linguistic advantage, I’d prefer it to be just one foreign language, but taught in a more in-depth manner.

Moreton First is a feeder school for Moreton Senior School. It would be interesting to test the fluency of children in the four languages as they progress through school.

Coding/Programming/Education…

Via twitter, I was pointed to this report on the RTE website this morning.

The takeaway message is:

A new survey has found a third of parents think computer coding is a more important skill to learn than Irish.

I’m getting wary of seeing pieces talking about computer coding rather than computer programming. Ultimately, there is a lot more to writing computer code than just knowing the syntax and I tend to consider coding to be the syntax part of things, and programming to be the wider scale of things.

But even if we leave that little quibble aside, I have problems with the whole idea of either/or when it comes to asking people what should be taught in school. I’m fully in favour of teaching children to program. There are a lot of tools to do this: MIT Scratch is one of the highest profile ones but depending on what age children you are talking about, Python and Java are also options, particularly the former in the context of Raspberry Pis.
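
For a sense of scale, the kind of first program a primary-age child produces in any of these environments is along these lines – Scratch does the same thing with drag-and-drop blocks rather than typed text:

```python
# A typical first exercise: ask a question, do a small sum, reply.
name = input("What is your name? ")
age = int(input("How old are you? "))
print("Hello", name + ", next year you will be", age + 1)
```

Even something that small leans on reading, typing and basic arithmetic.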

Realistically, we need to step back and look at core skills. When you are talking about primary school level, which we are here:

The findings are likely to bolster the arguments of those who say coding should be offered as part of the primary school curriculum, as it has been in Britain since the start of this school year.

the point remains that we are also dealing with literacy and numeracy issues at this stage too. I have written before on the UK’s policy – and I’d also add that while the authorities there made a lot of noise about this curriculum policy, they did not follow it up with much support for continuing professional development for teachers who were expected to go from teaching computer use to computer programming in a school year. Ultimately, when you start thinking about getting children to write computer programs, you need to also start thinking about the tools they will have available and what you expect them to achieve.

I am willing to bet that this survey did not actually talk about what the parents in question expected children to be able to do writing computer code at the age of 8 or 10.

One of the items which RTE reported on was this:

Of the 1,000 adults questioned, two-thirds said learning coding is equally important as maths, science and languages.

Leaving aside the fact that whoever wrote this needs to re-read things occasionally, the point is, writing computer programs depends on abilities in maths, science and language. In short, you cannot learn to write computer code without already having core skills in mathematics and communication. Logically, when it is dependent on a skill set, it cannot be as important as that skill set itself.

This is why this part makes me incandescent with rage:

One-third even think it’s more valuable than Irish, with one-fifth believing it is a more important skill than maths.

Every single computer science undergraduate course in the country will have a mathematics component. If you want them to get any value out of the growing sector, which is data analytics, mathematics is absolutely MANDATORY. There is no point in assuming that you know what you’re talking about in terms of education policy if you can agree with the statement “writing computer code is more important than mathematics” given that actually, it’s the other way around.

Put in that context, the one third who think computer coding is more important than Irish did not give the most annoying response to this survey.

In any case, there is also this:

And three quarters of people said they would avail of such classes if they were available in their area.

The thing is, they pretty much are. There are over 100 CoderDojo groups spread out across the island of Ireland – nearer to 150, actually. They are not all in Dublin.

So the question is, are they availing of the CoderDojo groups – I hesitate to call them classes as that sort of takes the fun out of things – or is this a throwaway “yeah, they don’t teach it in the school but at least if there were a CoderDojo around, we’d probably do this…”? I would have driven 30km to one when I was a child.

I have asked UPC via their PR and general twitter lines whether I can get a copy of the questions on this survey. I really would like to know what they looked like. Also a copy of the report would be useful.

Mathematica on the Raspberry Pi

Seriously, I have a scary to-do list but I finally got around to having a go with this the other day. It is very, very nice. If you’re leaning towards a RaspPi and are interested in symbolic programming, it’s a pretty good place to start. Worth remembering that a RaspPi is not scary fast (i.e. Mathematica on it is not hugely fast), but it comes across as something that a) is nice to work with and b) I will probably license on a bigger machine at some point.

Big data – is this really what we want?

If you think of the other places where Big is used to describe an industry, it’s not generally used by people who like the industry in question. Big Pharma. Big Agriculture. Big Other Things We’d Like To Scare You About.

But the data industry insists on talking about big data as the next big thing it is pushing, without considering that, equally, there are a lot of people pushing back against big data, with headlines like “How Big Data Can Tell A Lot About You Just from Your Zip Code”.

This is not good for data analytics. Any term which can be used to engender fear and nerves is not so much an asset as a liability.

There’s an apocryphal story about Target apparently identifying when a teenager was pregnant from her shopping habits, writing to her, her father finding out and getting into a rage with the local branch of Target and having to apologise. A number of people in the data industry have described it although I can’t actually find a source for it. A lot has been written about how retailers can learn a lot about you from your habits, however, which has an impact on which special offers you get when they deign to send you vouchers.

Some people find this a little bit creepy. This, together with news stories about What Your ZipCode Says About You and “What Big Data Knows About You” just reinforces this.

So a couple of things. We need to stop talking about Big Data. Big Data will come back to bite the analytics industry as consumers push back against what they perceive as a bit of spying and general creepiness. And we need to focus on the benefits to consumers of data analytics. It is not just a question of buying them off with extra vouchers. Pinterest, for example, is getting much better with recommendations for new boards (although once they get hold of an idea such as Treasa Likes Fountain Pens it takes weeks for them to realise I’m now following enough fountain pen boards). On the other hand, Amazon is not getting so much better with book recommendations lately.

The other problem I see with the label big data is that it allows people to avoid thinking about what they are really trying to achieve. The question “What are we doing about big data?” never comes across as anything other than “I read this in HBR and everyone’s on about it on my LinkedIn groups so we need to hop on this bandwagon”.

If you take a step back, it’s better to think about this question: “What data do we have, and are we using it effectively to support both ourselves and our customers?” It may be big, it may be small. Some of it may be system related – getting pages to load faster, for example – and some of it may be habit recognition related – prefilling forms for transactions which happen regularly, like, oh, flying to London every Monday morning.