Migratory programmers

I don’t live in the US, nor have I any desire to do so. But Paul Graham has written a piece called “Let the Other 95% of Great Programmers In”.

It is another call which can be loosely translated as “Let the Tech Industry Do What it Likes”. The logic on which it is built is, however, questionable. He assumes that being a great programmer is an innate talent. It isn’t. It is far more a question of experience, conversation and training.

Paul’s logic rests on the fact that the US has roughly 5% of the world’s population and therefore can only have 5% of the world’s great programmers. This might be attractive logic if it weren’t already clear that the US tech industry rules out a lot of programmers as it is: its ethnic demographic breakdown is very different from that of the population as a whole. It’s possible that the US has identified very few of its own great programmers and let their talents go elsewhere in industry.

What the US technology industry appears to be poor at is identifying promising talent and nurturing it if that talent is not white or Asian and male. They might have an argument for letting in the other 95% of great programmers when their diversity figures match the general population; right now, they don’t.

For me, one of the core problems with Graham’s piece is that he never really defines what a great programmer is, or how you’d actually identify one and get them into the US technology industry. What if great programmers don’t want to go to the US? And that assumes someone can identify them correctly in the first place, which I don’t think is guaranteed either.

I’m not sure that a great programmer can easily be defined at all – which is why I am wary of pieces like Paul’s. Many things are context dependent. It’s entirely possible that the vast majority of great US-based programmers aren’t even in California but working somewhere else, because they have a different set of priorities from Paul Graham’s. Being blunt about it, I’ve no desire to go to the US, and certainly not to Silicon Valley. I’m not interested in two-hour commutes and horrifically high rents. Silicon Valley is all about money, and a lot of people – including great programmers – are all about life.

I think Paul Graham is trying to solve the wrong problem. If he wants access to more talent, he may have to move to it rather than trying to get it to move to him. The US is not the sole destination for great programmers, after all; great programmers can set up more or less wherever they want, and often do. It’s what I’d do. The great programmers of the world need the US less than the US needs them, and the US’s ability to identify them is already clearly questionable.

Maybe the issue is that the technology industry in the US is just not attracting the right people from within the US at the moment. Not everyone wants to cram into California either. Silicon Valley’s issues in attracting talent may have little to do with limits on immigration and a lot more to do with Silicon Valley itself.

Women and tokenism

This turned up on my Twitter feed this afternoon, and I wrote a piece and then chose not to publish it for various reasons, the primary one being that it didn’t really get my message through. It is by Maura, from Clinch, an HR start-up.

I did not go to Websummit this year, although arguably I had the time, if not the money. They attracted some controversy immediately beforehand owing to the usual PR gaffe (not uncommon in this country) of using women as eye candy for their Night Summit thing. They have a staff member working very hard to deal with their diversity issue, and per Maura’s piece, one of the things they did was invite a whole pile of women to come to Web Summit for free. I knew they were doing this but chose not to look into it, for a few reasons. I already knew that where the real action was – on the stage – women were conspicuous by their absence.

I don’t really think it’s fair to ask Maura if it’s not a bit ick being given a free ticket to something just for being a woman. Sure, there’s a quid pro quo in that Websummit gets to feel a bit fluffier about their diversity figures, which are, to be frank, pretty awful (but they are trying; Eamon Leonard wrote this on that subject back in September). The reasons for which people get free stuff vary, and often men get free stuff because they are men, though the underlying rationale isn’t as obvious. Being invited to play golf at expensive golf courses springs to mind, particularly when deals are being done. I don’t know if Websummit has done this and I’m not implying that they have; only that men receive things for free sometimes too, and very often it is because they are men. But it is unspoken.

So.

The problem for me is that even if you give a couple of hundred tickets to women to go to Websummit, it does the grand total of nothing to deal with the gender diversity issue that the conference has.

The highest profile woman speaker at Websummit in 2014 was Eva Longoria. She was interviewed by Jemima Khan. Eva Longoria was there because she funds start-ups. Jemima Khan was there as a publisher. The interview, or some of it, is to be found on YouTube.

But…and here’s the but…neither of them is a woman in tech. Yes, Eva Longoria got invited to talk about her philanthropy and how it relates to technology, but it’s entirely likely we wouldn’t have seen her there doing that if she didn’t happen to star in Desperate Housewives. And to be perfectly honest about it, if I had gone to Websummit, I would not have wanted to hear either Eva Longoria or Jemima Khan speak.

The speakers on my wishlist were Rachel Schutt, Padmasree Warrior, Anna Patterson, Tony Fadell, Gavin Andresen, Stefan Weitz, John Foreman, Andra Keay, and Moe Tanabian. You can see the list of speakers here, by the way (it refers to the 2014 list at the moment, and I’m pretty sure that will change later).

But that’s by way of an aside. Women are by and large missing from the stage of Websummit, and where they do turn up on stage, they are journalists, artists, surfers, marketers. Not that many of them are technologists. In fact, quite a lot of the speakers at Websummit aren’t technologists anyway – seriously, I wouldn’t necessarily go to a tech conference to hear AC Grayling speak, and if I want to hear someone speaking from Aldebaran Robotics, I’d prefer her to be a technology person rather than a marketing person. I recognise that Websummit was pushing Sports and Food Summits as well – I don’t necessarily agree with it, to be honest, but it’s their business.

There’s a wider, far wider discussion to be had on what constitutes a tech company at the moment because a lot of techpreneurs are running what might best be described as retail companies rather than tech companies. The only reason I consider Amazon a tech company at all is because of AWS. But that’s by way of an aside.

The question I would ask Maura, and people like her, and which I ask myself every day, is: would I stand up on that stage, or a stage like it, to talk about technology? I’m not interested in standing on a stage to talk about women in technology – that’s too meta and it achieves nothing anyway, other than to reinforce the problem rather than solve it.

I could take the easy way and say “I’d like Websummit to invite more women technology specialists (as opposed to ancillary support like marketing)”. But there is a hard way too and that is finding out how to be the kind of woman that I’d like to see more frequently on the stages of technology conferences. Talking about technology.

Everyone should learn to code

This, from the Wall Street Journal.

It annoyed me, not because I disagree with the idea of people learning to code – I don’t – but because, as a piece supporting that idea, it has some glaring errors in it and doesn’t really make its case. Personally I think a lot of tech people should learn to communicate more effectively, but a lot of them appear to think they don’t have to, so let’s just explain why this piece is a problem.

The most important technological skill for all employees is being able to code. If, as Marc Andreessen once noted, “Software is eating the world,” knowing how to code will help you eat rather than be eaten. Understanding how to design, write and maintain a computer program is important even if you never plan to write one in business. If you don’t know anything about coding, you won’t be able to function effectively in the world today.

So, two major assertions here: that the most important technological skill for all employees is being able to code, and that “if you don’t know anything about coding, you won’t be able to function effectively in the world today”.

These assertions are patently not true. To be frank, the most important technological skill for an employee, in my opinion, is the ability to describe what’s gone wrong on the screen in front of them. That’s also a communications issue, but it does enable technology experts to help them. As for “if you don’t know anything about coding, you won’t be able to function effectively”, I strongly disagree, and would suggest that ultimately the problems lie with interface design, which employees are for the most part not responsible for.

You will inevitably work with people who program for a living, and you need to be able to communicate effectively with them. You will work with computers as a part of your job, and you need to understand how they think and operate. You will buy software at home and work, and you need to know why it works well or doesn’t. You will procure much of your information from the Internet, and you need to know what went wrong when you get “404 not found” or a “500 internal server error” messages.

Not one thing in this paragraph requires coding skills. It requires programmers to learn to communicate effectively, and given that a lot of them already have trouble with the basic need to document what they are doing, it’s a steep learning curve. With respect to software, again, how well it works depends on how well it is documented and designed. You do not need to be able to program to understand a 404 Not Found or a 500 Internal Server Error.

Of course, being able to code is also extremely helpful in getting and keeping a job. “Software developers” is one of the job categories expected to grow the most over the next decade.

But not every employee is a software developer and nor should they be.

But in addition to many thousands of software professionals, we need far more software amateurs. McKinsey & Co. argued a few years ago that we need more than 1.5 million “data-savvy managers” in the U.S. alone if we’re going to succeed with big data, and it’s hard to be data-savvy without understanding how software works.

Data and programming are not the same thing. Where data is concerned, we desperately need people who get statistics, not just programming. In my experience, most programmers don’t get statistics at all. Teaching people to code will not fix this; code is a tool to support another knowledge base.

Even if you’ve left school, it’s not too late. There are many resources available to help you learn how to code at a basic level. The language doesn’t matter.

Learn to code, and learn to live in the 21st century.

I’m absolutely in favour of people learning to think programmatically, and logically. But I don’t think it’s a requirement for learning to live in the 21st century. The world would be better served if we put more effort into learning to cook for ourselves.

I hate puff pieces like this. Ultimately, I mistrust pieces that suggest everyone should be able to code, particularly at a time when coding salaries are low even as we are told there’s a frantic shortage. I’ve seen the same happen with linguistic skills. There are a lot of good reasons to learn to code – but like a lot of things, people need to set priorities about what they want to do and what they want to learn. Learning to write computer code is not especially difficult; learning to apply it to solving problems, on the other hand, takes a particular way of looking at the world.

I’d prefer it if we looked at teaching people problem-solving skills. These are not machine dependent and they are sadly lacking. In the meantime, even people who have never opened a text editor understand that a 404 Not Found does not mean they could fix their problem by writing a program.

 

SamaritansRadar is gone

The application was pulled on Friday 7 November. Here is the statement issued by the Samaritans on that occasion.

I am not sure how permanently gone it is, but this is worth noting:

We will use the time we have now to engage in further dialogue with a range of partners, including in the mental health sector and beyond in order to evaluate the feedback and get further input. We will also be testing a number of potential changes and adaptations to the app to make it as safe and effective as possible for both subscribers and their followers.

Feedback for the Radar application was overwhelmingly negative. There is nothing in this statement to suggest that the issue for the Samaritans is that there were problems with the app, only that some people were vocal about their dislike of it.

I really don’t know what to say at this stage. While I’m glad it has been withdrawn for now, I’m not really put at ease to know that the Samaritans have an interest in pushing it out there again. It was a fiasco in terms of app design and especially community interaction. There is nothing, absolutely nothing, to indicate that they saw the light about the technical issues with the application, the ethical issues, or the legal difficulties with asserting they weren’t data controllers for it.

I hate this because a) it negatively affected a lot of people who might, under other circumstances, use the Samaritans’ services and b) it makes the job of data scientists increasingly difficult. It is very hard to use a tool to do some good stuff when the tool has been used to do bad stuff.

Samaritans Radar, again

The furore refuses to die down and to be honest, I do not think the Samaritans are helping their own case here. This is massively important, not just in the context of the Samaritans’ application, but in the case of data analysis in the health sector in general. In my view, the Samaritans have got this terribly wrong.

If you’re not familiar with Samaritans Radar, here is how it works.

  • You may be on twitter, and your account may have any number of followers.
  • Any one of those followers may decide that they like the idea of getting a warning in case any of the people THEY follow are suicidal.
  • Without obtaining permission from the people they follow, they download/install/sign up for Samaritans Radar, which reads the tweets posted by the people they follow, runs a machine learning algorithm against them, and flags any tweet that trips the algorithm as a potential cause for concern regarding a possible suicide attempt (a rough sketch of the flow follows this list).
  • The app will then generate an email to the person who installed it.
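Roughly, the pipeline looks something like the sketch below. This is a minimal illustration of the flow as I understand it from the Samaritans’ own description; the function names and the crude keyword filter are mine, not theirs, and their actual algorithm is presumably more sophisticated.

```python
# Hypothetical sketch of the Radar flow described above.
# Names and the keyword "classifier" are illustrative only,
# not the Samaritans' actual code or algorithm.

CONCERN_KEYWORDS = {"hopeless", "can't go on"}  # stand-in for the real classifier


def classify_tweet(text):
    """Crude stand-in for the app's machine learning step."""
    return any(keyword in text.lower() for keyword in CONCERN_KEYWORDS)


def email_subscriber(subscriber, account, tweet):
    """Stand-in for the email sent to the person who installed the app."""
    print(f"To {subscriber}: @{account} may be at risk: {tweet!r}")


def radar_run(subscriber, followed_tweets, opt_out_list):
    # The followed accounts are never asked; only the opt-out list
    # (the so-called whitelist) removes them from processing.
    for account, tweets in followed_tweets.items():
        if account in opt_out_list:
            continue
        for tweet in tweets:
            if classify_tweet(tweet):
                email_subscriber(subscriber, account, tweet)


# One follower signs up; everyone they follow gets scanned without consent.
radar_run(
    subscriber="concerned_follower",
    followed_tweets={
        "friend_a": ["Great gig last night!"],
        "friend_b": ["Feeling hopeless about everything today."],
    },
    opt_out_list=set(),
)
```

The point of the sketch is where the consent check isn’t: the only thing standing between a followed account and the email is the opt-out list.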

In their blurb, the Samaritans make it clear that at no point will the person whose tweets are being processed be asked, or potentially even know, that this is happening. As an added bonus, at the outset their FAQ made it clear they didn’t want to let people opt out of having their tweets processed in this way – processing done without their consent or even their knowledge. They had a whitelist for the occasional organisation whose language might trip the filter, but after that, if your friend or contact installed the application, you had no way out.

That last part didn’t last for long. They now accept requests to put your Twitter ID on what they call a whitelist but what is effectively an opt-out list. And their performance target for getting you opted out is 72 hours. So you can be opted in instantly without your permission, but it may take three days to complete your request to get opted out, plus you get entered on a list. Despite not wanting anything to do with this.

There is a lot of emotive nonsense running around with this application, including the utterly depressing emotional-blackmail line of “If it saves even one life, it’ll be worth it”. I’m not sure how you prove it saves even one life, and against that, given the criticism of it, you’d have to wonder what happens if it costs even one life. That is the flipside of the coin, and as implemented, it could.

When I used to design software, I did so on the premise that software design should also mitigate against things going wrong. There are a number of serious issues with the current implementation of Samaritans Radar, and a lot of things which are unclear in terms of what they are doing.

  • As implemented, it seems to assume that the only people who will be affected by this are their target audience of 18-35 year olds. This is naive.
  • As implemented, it seems to assume that there is an actual friendship connection between followers and followees. Anyone who uses Twitter for any reason at all knows that this is wrong as well.
  • As implemented, it defaults all followees into being monitored while simultaneously guaranteeing data protection rights not to them but to their followers.
  • As implemented, it is absolutely unclear whether there are any geographical limitations on the reach of this mess. This matters because of the different data protection regulations in different markets. And that’s before you get to some of the criticisms regarding whether the app is compliant with UK data protection regulations.

So, first up: what’s the difference between what this app is doing and, for example, any market research analysis being run against Twitter feeds?

This app creates data about a user and it uses that data to decide whether to send a message to a third party or not.

Twitter is open – surely if you tweet in public, you imagine someone is going to read it, right? This is true, up to a point. But there’s a difference between someone actively reading your Twitter feed and them getting sent emails based on keyword analysis of it. In my view, if the Samaritans want to go classifying Twitter users as either possibly at risk of suicide or not, they need to ask those Twitter users first. They haven’t done that.

The major issue I have with this is that I am dubious about sentiment analysis anyway, particularly for short texts, which tweets are.

Arguably, this is acting almost as a mental health diagnostic tool. If we were looking to implement an automated diagnostic tool of any description in the area of health, it’s pretty certain that we would want it tested for very high accuracy. Put simply, when you’re talking about health issues, you really cannot afford to make too many mistakes. Bearing in mind that – for example – failure rates of around 1% in contraception make for lots of unplanned babies, a 20% failure rate in classifying tweets as possibly suicidal could be seriously problematic. A large number of false positives means a lot of incorrect warnings.

Some people might argue that a lot of incorrect warnings is a small price to pay if even one life is saved. If you deal with the real world, however, what happens is that a lot of incorrect warnings cause complacency. False negatives are classifications where issues are missed. They may result in harm or death.

Statistics theory talks about type 1 and type 2 errors – effectively, errors where something is classified incorrectly in one direction or the other (false positives and false negatives respectively). The rate of those errors matters a lot in health diagnosis. In my view, they should matter here, and if the Samaritans have done serious testing in this area, they should release the test results, suitably anonymised. If they did not, then the application was nowhere near adequately tested. Being honest, I’m really not sure how they might effectively test for false negatives using informed consent.
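To make the base-rate problem concrete, here is a purely illustrative calculation – the rates below are assumptions of mine for the sake of the example, not figures from the Samaritans or anyone else. The point is that when the thing you are screening for is rare, even a modest false positive rate drowns the genuine warnings.

```python
# Illustrative only: all rates below are assumed for the example,
# not measured figures from the Radar app.

tweets_scanned = 1_000_000
truly_at_risk_rate = 0.001    # assume 0.1% of scanned tweets genuinely signal risk
false_positive_rate = 0.05    # classifier wrongly flags 5% of benign tweets (type 1)
false_negative_rate = 0.20    # classifier misses 20% of genuine signals (type 2)

at_risk = tweets_scanned * truly_at_risk_rate
benign = tweets_scanned - at_risk

true_positives = at_risk * (1 - false_negative_rate)   # correct warnings
false_positives = benign * false_positive_rate         # incorrect warnings
missed = at_risk * false_negative_rate                 # cases the app never flags

print(f"Correct warnings:   {true_positives:,.0f}")    # 800
print(f"Incorrect warnings: {false_positives:,.0f}")   # 49,950
print(f"Missed cases:       {missed:,.0f}")            # 200
```

On those assumed numbers, roughly 98% of the warnings sent would be wrong – exactly the complacency problem described above – and 200 genuine cases would still be missed.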

Ultimately, one point I would make is that sometimes the world is not straightforward, and some things just aren’t binary. Some things exist on a continuum. This app, in my view, could move along the continuum from a bad thing to a good thing if the issues with it were dealt with. At the absolute best, you could argue that the application is a good thing done badly – spectacularly so, in my view – since it may allow people who aren’t out for your good to monitor you and identify good times to harass you. The Samaritans’ response to that was to suggest you make a complaint to Twitter if you get harassed. A better response would be to recognise this risk and mitigate against enabling such harassment in the first place.

Unfortunately, as things stand, if you want to prevent that happening, you have to ask the Samaritans to put you on a list. The app, as designed, defaults towards allowing the risk and assumes that people won’t do bad things. This may not be a good idea in the grand scheme of things. It would be better to design the app to prevent people from doing bad things.

The thing is, in the grand scheme of things, this matters a lot, not just because of this one app, but because it calls into question a lot of things around the area of datamining and data analysis in health care, be it physical or not.

If you wanted, you could re-write this app such that, for example, every time you posted a tweet about having fast food from any particular fast food company, concerned friends were sent an email warning you about your cholesterol levels. Every time you decided to go climbing, concerned friends could send you emails warning you how dangerous climbing is, and what might happen if you fell. Every time you went on a date, someone could send you a warning about the risk that your new date could be an axe-murderer. You’d have to ask if the people who are signing up to this and merrily, automatically tweeting about turning their social net into a safety net would love it if their friends were getting warnings about the possibility that they might get raped, have heart attacks, get drunk, fall off their bikes, or get cancer if they light up a cigarette, for example.

I personally would find that intrusive. And I really don’t know that twitter should default towards generating those warnings rather than defaulting towards asking me if I want to be nannied by my friends in this way. I’d rather not be actually. I quite like climbing.

The biggest issue I have with this, though, is that it is causing a monumentally negative discussion around machine learning and data analysis in the healthcare sector, such that it is muddying the water around discussions in this area. People like binary situations; they like black and white, where everything is either right or wrong. If I were working in the data sector in health care, looking into automated classification of any sort of input for diagnosis support, for example, I’d be looking at this mess in horror.

Already, a lot of voices against this application – which is horrifically badly designed and implemented – are also voicing general negativity about data analysis and data mining in general. And yet data mining has, absolutely, saved lives in the past. What John Snow did to identify the cause of the 1854 Broad Street cholera outbreak is pure data mining and analysis. Like any tool, data analysis and mining can be used for good and for bad. I spent a good bit of time looking at data relating to fatal traffic accidents in the UK last year, and from that concluded that a big issue with respect to collisions was junctions with no or unmarked priorities.

So, the issue with this is not just that it causes problems in the sphere of analysing the mindset of various unsuspecting Twitter users and telling on them to their friends, but that it could have a detrimental impact on the use of data analysis as a beneficial tool elsewhere in healthcare.

So what now? I don’t know any more. I used to have a lot of faith in the Samaritans as a charity particularly given their reputation for integrity and confidentiality. Given some of their responses to the dispute around this application, I really don’t know if I trust them at the moment as they are unwilling to understand what the problems with the application are. Yes they are collecting data, yes they are creating data based on that data, and yes, they are responsible for it. And no they don’t understand that they are creating data, and no they don’t understand that they are responsible for it. If they did, they wouldn’t write this (update 4th November):

We condemn any behaviour which would constitute bullying or harassment of anyone using social media. If people experience this kind of behaviour as a result of Radar or their support for the App, we would encourage them to report this immediately to Twitter, who take this issue very seriously.

In other words, we designed this App which might enable people to bully you and if they do, we suggest you annoy Twitter about it and not us.

It’s depressing.

The other issue is that the Samaritans appear to be lawyering up and talking about how the app is legal and not against the law. This misses a serious point, something which is often forgotten in the tech industry (where the habit is to do stuff first and ask forgiveness later), namely: just because you can do something doesn’t mean you should do it.

Right now, I think the underlying idea of this application is good but very badly implemented, and that puts it safely into the zone of a bad idea. Again, if I were the Samaritans, once the first lot of concerns started being voiced, I would have pulled the application and looked at the problems around consent to being analysed and having data generated and forwarded to followers. It’s obvious, though, that up front they thought it was a good idea to do this without consent, and you’d have to wonder why. In general terms, if you look at my Twitter feed, it’s highly unlikely (unless their algorithm is truly awful altogether) that anything I post would trip their algorithm, so I’m not coming at this from the point of view of feeling victimised as someone who is at risk of getting flagged.

My issues, quite simply, are this:

  • It defaults people in without even informing Twitter users that they have been opted in. The Samaritans have claimed that over a million Twitter feeds are being monitored thanks to 3,000 sign-ups. You’d have to wonder how many of those million Twitter accounts are aware that they might cause an email to be sent to a follower suggesting they might be suicidal.
  • The opt-out process is onerous and, based on the 72-hour delay they require, probably manual. Plus, initially they weren’t even going to allow people to opt out.
  • It depends on sentiment analysis, the quality of which is currently unknown.
  • The hysteria around it will probably have a detrimental effect on consent for other healthcare related data projects in the future.

The fact that you can ask the Samaritans to put you on a blocklist isn’t really good enough. I don’t want my name on any list with the Samaritans either way.

 

EDIT: I fixed a typo around the Type 1 and Type 2 errors. Mea culpa for that. 


Facebook and that study

Just briefly, given the general response to the Facebook emotional contagion article in PNAS a while back (an hour is a long time on the internet, let’s face it), the question I would have to ask is this: is everyone at Facebook so attached to what they can do with their dataset that they no longer remember to ask whether they should be doing that stuff with their dataset?

A while back, I met a guy doing a PhD in data visualisation or something related and he spoke at length about how amazing it was, what could be done with health data and how the data had to be freed up because it would benefit society so much. I’ve never really bought that idea because the first thing you have to ask is this: do individuals get messed up if we release a whole pile of health data, and if so, to what extent are you willing to have people messed up?

What I’m leading to here is the question of groupthink and yes-menery. Ultimately, there comes a point where people are so convinced that they should do what they want that they are unwilling to listen to dissent. The outcry over Facebook’s study has been rather loud, and yet it doesn’t appear to have occurred to anyone who had anything to do with the study that people might find it a bit creepy, to say the least. It’s not even a question of “oh, you know, our terms and conditions” or “oh, you know, we checked with Cornell’s review board”; it’s just straight up “is it creepy that we’re trying to manipulate people’s feelings here? Without telling them?”

I mean, I can’t ever imagine a case in which the answer to that question is anything other than Yes, yes it is creepy and eugh. And yet, it doesn’t seem to have occurred to anyone connected with it that it was kind of creepy and gross.

Once we get past that, what’s being focussed on is the data science aspect, and I have a hard time swallowing that too. This was a psychological experiment, not a data science one. I mean, if you did a similar study with 40 people, you wouldn’t call it a statistical experiment, would you? In many respects, the data science aspect is pretty irrelevant; it’s a tool to analyse the data and not the core of the experiment in and of itself. A data science experiment might involve identifying the differences in outcome between using a dataset with 10,000 records and a dataset with 10 million records, for example. Or identifying the scale of difference in processor speeds between running a data analysis on one machine versus another.

Anyway, the two main points I want to take away from this are that a) it wasn’t really a data science experiment and b) sometimes you need to find people who are willing to tell you that what you are doing is ick, and you need to listen to them.

Thing is – and this is where we run into fun – what have they done that they haven’t told us about?

 

Cloud versus local

I had an odd dialog from Microsoft Word this morning. The document I wanted to open, it said, had an issue syncing, and the SkyDrive and local copies of the document were different. I needed to choose which to retain.

I was not, it must be said, very happy about this, but chose local as I assumed that on the half dozen occasions I hit save last night, it saved locally first.

This turned out to be a mistake. The most recent local version was saved 3 hours before I finished work.

I was doing a lot of messing with dropping image files into that document last night so it was regularly saved. It was also saved when I shut up shop yesterday evening but none of those saves appeared to get written to local disc.

This is a huge problem for me. I have an always-on connection so connectivity isn’t generally an issue. I’m the only person accessing the SkyDrive, and I do it from two computers, both of which only I use. MS’s dialog told me another user had updated the document last night. That other user was me, on the same computer as I am using now.

I’m not going to complain bitterly about the problems this is causing me, suffice to say my day has suddenly become a whole lot worse than it was before I discovered this. But I do have to say this.

  • the dialog box, on telling me another user has updated the file, needs to tell me who that user is. I know in this case it was me, but in SkyDrive’s case, that’s not always going to be true. With shared documents, it’s almost guaranteed not to be.
  • The dialog box, on telling me there’s an issue, needs at least to tell me which file is older. This really should be obvious to anyone.

I ran a completely unscientific straw poll this morning. On balance, more people expected the local copy to be more recent than the cloud copy with some comments about exceptions around documents stored in a browser. So I have to say, the assumption that the local file was the most recent was not particularly inane – it’s what most people expect.

I’m not sure what the problem is, but the evidence I have right now is that it’s tied to something Microsoft have done between SkyDrive and Office. I only know this because the folder concerned included other files from non-Microsoft applications, which did get saved locally and did get synchronised correctly.

Right now, I’m faced with replicating a whole pile of work, which is not ideal. It’s only three hours and it’s write-up, and it’s possible it will take me significantly less time to redo, as I have most of the output or can get it very easily since I have the scripts generating it (though some of that will have to be rerun, unfortunately).

The take away message from this is:

  • most people expect local versions of files to be updated before cloud versions, particularly if they are editing in locally installed software
  • if you’re telling them that their files are out of sync because another user has updated them, you must tell them who that user is and you must give them the timestamps of both versions (a rough sketch of what such a prompt could show follows below)
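As a purely illustrative sketch, here is the sort of information such a conflict prompt could surface. The structure, field names and values are assumptions of mine for the example, not SkyDrive’s actual data model or API.

```python
# Hypothetical sketch of a sync-conflict prompt that names the user
# and shows both timestamps; not SkyDrive's actual implementation.

from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileVersion:
    location: str          # "local" or "cloud"
    last_modified_by: str  # which account saved this version
    last_modified_at: datetime


def describe_conflict(local: FileVersion, cloud: FileVersion) -> str:
    newer = local if local.last_modified_at >= cloud.last_modified_at else cloud
    return (
        f"Local copy: saved by {local.last_modified_by} at {local.last_modified_at}\n"
        f"Cloud copy: saved by {cloud.last_modified_by} at {cloud.last_modified_at}\n"
        f"The {newer.location} copy is the more recent one."
    )


print(describe_conflict(
    FileVersion("local", "me@example.com", datetime(2014, 11, 9, 18, 5)),
    FileVersion("cloud", "me@example.com", datetime(2014, 11, 9, 21, 12)),
))
```

Had the dialog shown even that much, it would have been obvious which copy to keep.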

I find it hard to believe that this occurred to no one working on this at Microsoft.

I could live with the cloud version being the more recent version if I was told that it was. Instead, the utterly useless dialog box I got didn’t tell me this. I know the other user involved in this case was me, on the same computer, and I can’t see why MS’s dialog can’t communicate this.