Just briefly, given the general response to the Facebook emotional contagion article in PNAS a while back (an hour is a long time on the internet, let's face it), the question I would have to ask is this: is everyone at Facebook so attached to what they can do with their dataset that they no longer remember to ask whether they should be doing that stuff with their dataset?
A while back, I met a guy doing a PhD in data visualisation or something related, and he spoke at length about the amazing things that could be done with health data and how that data had to be freed up because it would benefit society so much. I've never really bought that idea, because the first thing you have to ask is this: do individuals get messed up if we release a whole pile of health data, and if so, to what extent are you willing to have people messed up?
What I'm leading to here is the question of groupthink and yesmenery. Ultimately, there comes a point where people are so convinced that they should do what they want that they are unwilling to listen to dissent. The outcry over Facebook's study has been rather loud, and yet it doesn't appear to have occurred to anyone who had anything to do with the study that people might find it a bit creepy, to say the least. It's not even a question of "oh, you know, our terms and conditions" or "oh, you know, we checked with Cornell's review board"; it's just straight up "is it creepy that we're trying to manipulate people's feelings here? Without telling them?"
I mean, I can't imagine any case in which the answer to that question is anything other than "Yes, yes it is creepy, and eugh." And yet it doesn't seem to have occurred to anyone connected with the study that it was kind of creepy and gross.
Once we get past that, what's being focussed on is the data science aspect, and I have a hard time swallowing that too. This was a psychological experiment, not a data science one. I mean, if you did a similar study with 40 people, you wouldn't call it a statistical experiment, would you? In many respects, the data science aspect is pretty irrelevant; it's a tool to analyse the data, not the core of the experiment in and of itself. A data science experiment might involve identifying the differences in outcome between using a dataset with 10,000 records and one with 10 million records, for example. Or identifying how much the processing time differs when the same analysis is run on one machine versus another.
Anyway, the two main points I want to take away from this are that a) it wasn't really a data science experiment, and b) sometimes you need to find people who are willing to tell you that what you are doing is ick, and you need to listen to them.
Thing is – and this is where we run into fun – what have they done that they haven’t told us about?