Every time I check the Wolfram site in the hope that this has arrived. Please, for Christmas…can I have the data science platform?
This is cool.
I really, really like creative visualisations like this.
It annoyed me, not because I disagree with the idea of people learning to code – I don’t – but because, as a piece supporting the idea that people should learn to code, it has some glaring errors in it and doesn’t actually support its own case. Personally, I think a lot of tech people should learn to communicate more effectively, but many of them appear to think they don’t have to, so let’s explain why this piece is a problem.
The most important technological skill for all employees is being able to code. If, as Marc Andreessen once noted, “Software is eating the world,” knowing how to code will help you eat rather than be eaten. Understanding how to design, write and maintain a computer program is important even if you never plan to write one in business. If you don’t know anything about coding, you won’t be able to function effectively in the world today.
So, two major assertions here: the most important technological skill for all employees is being able to code and “if you don’t know anything about coding, you won’t be able to function effectively in the world today”.
These assertions are patently not true. To be frank, the most important technological skill for an employee, in my opinion, is the ability to describe what’s gone wrong on the screen in front of them. That’s also a communications issue, but it does enable technology experts to help them. As for “if you don’t know anything about coding, you won’t be able to function effectively”, I strongly disagree, and would suggest that ultimately the problems lie with interface design, which, for the most part, employees are not responsible for.
You will inevitably work with people who program for a living, and you need to be able to communicate effectively with them. You will work with computers as a part of your job, and you need to understand how they think and operate. You will buy software at home and work, and you need to know why it works well or doesn’t. You will procure much of your information from the Internet, and you need to know what went wrong when you get “404 not found” or a “500 internal server error” messages.
Not one thing in this paragraph requires coding skills. It requires programmers to learn to communicate effectively and given a lot of them have trouble with the basic need to document what they are doing already, it’s a steep learning curve. With respect to software, again, how well it works depends on how well it is documented and designed. You do not need to be able to program to understand a 404 not found or a 500 internal server error.
Of course, being able to code is also extremely helpful in getting and keeping a job. “Software developers” is one of the job categories expected to grow the most over the next decade.
But not every employee is a software developer and nor should they be.
But in addition to many thousands of software professionals, we need far more software amateurs. McKinsey & Co. argued a few years ago that we need more than 1.5 million “data-savvy managers” in the U.S. alone if we’re going to succeed with big data, and it’s hard to be data-savvy without understanding how software works.
Data and programming are not the same thing. Where data is concerned, we urgently need people who understand statistics, not just programming. In my experience, most programmers don’t get statistics at all. Teaching people to code will not fix this; coding is a tool to support another knowledge base.
Even if you’ve left school, it’s not too late. There are many resources available to help you learn how to code at a basic level. The language doesn’t matter.
Learn to code, and learn to live in the 21st century.
I’m absolutely in favour of people learning to think programmatically, and logically. But I don’t think it’s a requirement for learning to live in the 21st century. The world would be better served if we put more effort into learning to cook for ourselves.
I hate puff pieces like this. Ultimately, I mistrust pieces that suggest everyone should be able to code, particularly at a time when coding salaries are low even as we are being told there’s a desperate shortage. I’ve seen the same happen with linguistic skills. There are a lot of good reasons to learn to code – but like a lot of things, people need to set priorities in what they want to do and what they want to learn. Learning to write computer code is not especially difficult; learning to apply it to solving problems, on the other hand, takes a particular way of looking at the world.
I’d prefer it if we looked at teaching people problem-solving skills. These are not machine-dependent, and they are sadly lacking. In the meantime, people who have never opened a text editor already understand that “404 Not Found” does not mean they could fix their problem by writing a program.
Ryan Walmsley has a piece suggesting you shouldn’t learn programming before going to university. It’s worth a read.
Personally, I am not against people learning to code before they get to university. I am, however, not in favour of people who have no coding skills arriving at university and starting with Scratch. Scratch is a superb tool for teaching kids how to program, and a bit about how computers work. It is not, in my view, a suitable tool for adults on a specialist coding course. While I am not the biggest fan of Java (disclaimer: I have yet to review lambdas in Java 8, and this may make some of my frustration go away), and I recognise that some people have issues with the lack of static typing in Python, ultimately, once you get as far as university, you should at least start with tools you have a fighting chance of using in the income-earning world. And there are a lot of them. Not in the top ten is Scratch.
Like a lot of things, tools need to be used appropriately, and Scratch is an absolute winner in the sector it was designed for. But I have a book on my desk here that teaches kids how to program in Python, and if kids can do that, I see no reason why we need kid-level languages like Scratch at university level.
I really have a lot of things to catch up on, but a couple of weeks ago, a piece on the Business Insider site caught my eye. It suggested that if you wanted to work for Google, you needed to know Matlab. They attributed the comment to a guy called Jonathon Rosenberg.
This caused some discussion on twitter in the days afterwards. Mostly, people found it difficult to believe, particularly when Google uses a bunch of other tools, including my personal choice for a lot of data analysis, R.
I am not sure that Matlab is a mandatory requirement to work at Google; it doesn’t necessarily turn up on any of their job ads that I might be interested in, but in some respects, I can understand why A N Company might do something like this. It’s a little sorting mechanism. The point I found most interesting about the piece above was less that Google was looking for Matlab, and more that the writers of the piece had never heard of Matlab.
I was once interviewed about modern web technology and how it might benefit the company concerned way back in the early days of the web becoming a consumer sales channel. My view of the discussion ultimately wasn’t that they wanted me to work on their web interfaces (not at that stage anyway), but they wanted to see what my ability to learn about new stuff was. It may well be that if you go to work for Google in some sort of research job, you’ll use Matlab. Or, more probably, you’ll learn a bunch of other things in the area that you are working.
Either way, comments like Rosenberg’s may or may not be official hiring policy, but it’s often worth considering that they are asking a broader question: less “Can you use Matlab?” and more “Can you prove to us that you can develop in whatever direction we throw you?”
And if you haven’t heard of Matlab, the chances are you can’t.
One of the things which a lot of people don’t actually know about me is that I trained as an interpreter in my twenties. I have a diploma from the University of Westminster, which, at the time, was the leading interpreting school in the United Kingdom. While I don’t interpret any more, I’m still interested in it on a tangential basis, and that’s why I found this article from Mosaic very interesting yesterday. I’ve always wondered how it can be possible to carry out simultaneous interpreting, even as I was doing it. A lot of it is practice-related, and technique/strategy building. In certain respects, I found it a lot like playing music. It’s a skill you learn by doing, not so much by understanding how it works inside your mind. And yet:
The caudate isn’t a specialist language area; neuroscientists know it for its role in processes like decision making and trust. It’s like an orchestral conductor, coordinating activity across many brain regions to produce stunningly complex behaviours.
I strongly recommend reading the piece – even aside from the whole question of interpreting, the piece brings up some interesting information in the area of the neurosciences. I wasn’t familiar with the site before now, but it had an interesting collection of science writing on it from a number of different fields in the life science sector so the interpreting piece aside, I (so far) find it a valuable resource.
One of the aspects of programming life that most software developers will talk about, in terms of getting anything done, is flow. When you’re in a zone where everything is just working together nicely, the problem solving is happening, it’s you and the code and the phone isn’t ringing. There’s a space I used to get into in interpreting – I miss it a lot – which is broadly similar; I called it the zone. I imagine other people approach it differently, because like most effects, it can be quite personal. I actually did an interpreting test for the first time in more than ten years last year, and while it didn’t go perfectly for me, I did, in the course of practice, hit that zone a couple of times. I’d love to see what my brain activity looks like when I hit it; it’s a place where you have to fight for nothing mentally.
There are a couple of different paths into a career as a conference interpreter. The University of Westminster cancelled the course I did a number of years ago and appears to have replaced it with an MA in Translating & Interpreting. In Ireland, there appears to be a course at the National University in Galway, and in the UK, there are joint translation/interpreting courses at the University of Bath, the University of Leeds, London Metropolitan University, the University of Manchester, the University of Salford and Heriot-Watt University in Edinburgh. Outside the English-speaking colleges, there are options in France and Belgium at ESTI and ISTI, and in Germany at Hamburg and Heidelberg (at least). These courses are postgraduate courses, so fees are very obviously going to be a factor to consider.
Ultimately, the two big employers of interpreters in the world are the United Nations and the European Union institutions.
From the point of view of what you need to go down the road of interpreting, the obvious ones are a) a very strong command of your mother tongue and b) comprehensive understanding of two other languages.
You also need the ability to research and get up to speed with various different fields of expertise. The one which used to make my blood run cold during my training was any discussion of European fisheries policy, as fish species in English were an ongoing hassle, never mind fish species in French and German.
In many respects, it’s a career which gives you access to learn about a lot of other different areas; I’d be happy to go back. But I’d also like to look at breaking down the challenges in automating it, and that’s a really hard problem to solve, not least because we haven’t solved machine translation very effectively either, although a lot of work is happening in the area. This isn’t because I would like to see a bunch of interpreters lose their jobs – they shouldn’t, because for all that we might get the actual words automatically translated, we would still be missing a lot of the non-verbal nuances and cultural markers that come not directly from the words themselves, but from how they are used and marked with non-verbal cues. Computers don’t get irony or sarcasm.
One of the reasons I really like the Mosaic piece is that it provides some useful other references for you to carry out your own research. With respect to science writing online, this is really helpful. I have to say kudos to them.
Friday 7 November saw Professor David Spiegelhalter talking about risk at the Royal Irish Academy. If you’re not familiar with him, his site is here, he occasionally pops up on BBC Radio 4’s More or Less and other interesting places.
Risk is an interesting thing because humans are appallingly bad at assessing it. Ultimately, the core of Professor Spiegelhalter’s talk focused on calculating risk (yes, there is a micromort unit of measurement) and more specifically, communicating it in human friendly terms. This is not to suggest statisticians are not human; only that they have a language (we have a language) that isn’t always at one with general understanding.
This isn’t the only problem either – humans appear to be very good at not worrying about non-immediate risks as well. So this presents a number of challenges in terms of decision making behaviour on the part of people.
Talks like this can be massively entertaining if done well; less so if done badly. One of the striking features of the evening was the contrast between Professor Spiegelhalter’s talk and Patrick Honohan’s response, which focused on difficulties in risk assessment in the financial sector. I took a slightly dim view of the response, on the basis that every single banking ad makes it clear that the value of your home (or assets) can go down as well as up, and did so for most of the 2000s in this country; so it isn’t so much that we didn’t understand the risk – many people just did not want to accept it. In certain respects, it has a lot in common with people who find it hard to live healthily now for benefits sixty years down the line. If I had to choose who got their message across more effectively, by some distance it was Professor Spiegelhalter.
Talks of this nature interest me, particularly as they relate to numbers and numeracy, and in this case, to risk. People are never particularly good at probability and chance, despite all that Monopoly board training each Christmas. Ultimately, the impression I got from the talk is that the debate has moved on somewhat from “what is the risk of [X bad or good thing] happening” to “how do we effectively communicate this risk”. It’s interesting – in a tangential way – that we are swimming in methods of communicating things these days, between online streaming, social media feeds and many online publishing platforms, and still, with science and numbers, we are only finding the right narrative for engagement in a hit-or-miss manner. Professor Spiegelhalter delivers his talk in an excellent manner. It is a pity that more people will not get to hear it.
On a related note, if you’re interested in talks of a science and maths flavour, the RIA and the Meteorological Society are prone to organise such things on the odd occasion. Check their websites for further information.
The application was pulled on Friday 7 November. Here is the statement issued by the Samaritans on that occasion.
I am not sure how permanently gone it is, but this is worth noting:
We will use the time we have now to engage in further dialogue with a range of partners, including in the mental health sector and beyond in order to evaluate the feedback and get further input. We will also be testing a number of potential changes and adaptations to the app to make it as safe and effective as possible for both subscribers and their followers.
Feedback for the Radar application was overwhelmingly negative. There is nothing in this statement to suggest that the issue for the Samaritans is that there were problems with the app, only that some people were vocal about their dislike of it.
I really don’t know what to say at this stage. While I’m glad it has been withdrawn for now, I’m not really put at ease to know that the Samaritans have an interest in pushing it out there again. It was a fiasco in terms of app design and especially community interaction. There is nothing, absolutely nothing, to indicate that they saw the light about the technical issues with the application, the ethical issues with the app and the legal difficulties with asserting they weren’t data controllers for that app.
I hate this because a) it negatively affected a lot of people who might under other circumstances use Samaritans services and b) it makes the job of data scientists increasingly difficult. It is very hard to use a tool to do some good stuff when the tool has been used to do bad stuff.
In addition to the tech stuff, and the data stuff, and opinions linked to each, I have an interest in languages as well (this might explain one of the projects I have running in the background).
Given the fact that I lived in Germany for a few extended periods between the ages of 19 and 23, it’s surprising that the first time I came across the word entlieben was this morning, in particular since entlieben perfectly describes something that’s happened to me a few times in my life, and probably to most people.
If you go to online Duden, the definition is given as:
aufhören [einander, jemanden] zu lieben
This can be translated as “stop loving [one another/someone]”
But I don’t think that quite captures the whole of it in atmosphere. I prefer the translation “fall out of love with”, which adds a little nuance that I think matters when we are discussing labelling feelings.
The opposite – incidentally (because, mostly, you have to do it first) – is verlieben. Interestingly, Duden defines that as:
von Liebe zu jemandem ergriffen werden
The literal translation is “to be moved to love someone”. Here, we would say “fall in love with”.
The verb lieben means to love or to like – a bit like its French counterpart, it covers a few bases, although both languages have closer equivalents to like in the indirect forms “Ça me plaît” and, specifically for German, “Das gefällt mir”. It’s interesting to note, by the way, that the verb “like” in English functioned this way around five hundred years ago, per Shakespeare. But this is not a discussion of verbs describing the action of “being pleasing to”.
What is interesting – if you are of a systematic kind of mind – is the impact of prefixes on a root word like lieben, and how they can be used for similar effects on other root words. I’ve been aware of these for years – the ones that stand out from German language tuition at university are einsteigen, aussteigen and umsteigen, which respectively mean “get into” [a form of transport], “get off” [a form of transport] and “change from one to another” [form of transport].
I’ve seen the form ent- before in verbs like “entziehen“, to take away, withdraw. I’ve just never seen it used on the verb lieben before and despite the fact that it’s a straight application of an unmysterious system in the German language, it seems rather lyrical in a way that something de- does not in English.
The furore refuses to die down and to be honest, I do not think the Samaritans are helping their own case here. This is massively important, not just in the context of the Samaritans’ application, but in the case of data analysis in the health sector in general. In my view, the Samaritans have got this terribly wrong.
If you’re not familiar with Samaritans Radar, here is how it works.
- You may be on twitter, and your account may have any number of followers.
- Any one of those followers may decide that they like the idea of getting a warning in case any of the people THEY follow are suicidal.
- Without obtaining permission from the people they follow, they download/install/sign up for Samaritans Radar, which will read the tweets that the people they follow post, run a machine learning algorithm against them, and tag tweets as a potential cause for concern about a possible suicide attempt if the algorithm trips.
- The app will then generate an email to the person who installed it.
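The steps above can be sketched roughly in code. This is a minimal illustration of the workflow as described, not the Samaritans’ actual implementation: the function names are hypothetical, and the naive keyword check stands in for whatever machine learning classifier the real app used.

```python
# Hypothetical sketch of the Radar workflow described above.
# None of these names come from the actual app.

def is_concerning(tweet_text, keywords=("so alone", "want to end it")):
    """Stand-in for the app's classifier: here, a naive keyword check."""
    text = tweet_text.lower()
    return any(keyword in text for keyword in keywords)

def radar_check(followed_tweets, subscriber_email, send_email):
    """For each (user, tweet) pair from accounts the subscriber follows,
    flag the tweet and email the subscriber if the classifier trips.
    Note what is absent: the followed users are never asked or notified."""
    alerts = []
    for user, tweet in followed_tweets:
        if is_concerning(tweet):
            send_email(subscriber_email,
                       f"Radar alert: @{user} may need support: {tweet!r}")
            alerts.append((user, tweet))
    return alerts
```

The structural problem is visible even in the sketch: the only party with any agency in this loop is the subscriber; the people being classified appear only as data.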
In their blurb, the Samaritans make it clear that at no point will the person whose tweets are being processed be asked, or potentially even know that this is happening. As an added bonus, at the outset, their FAQ made it clear they didn’t want to let people out of having their tweets processed in this way without their consent or even knowledge. They had a whitelist for the occasional organisation whose language might trip the filter, but after that, if your friend or contact installed the application, you had no way out.
That last part didn’t last for long. They now accept requests to put your twitter id on what they call a whitelist but what is effectively an opt out list. And their performance target for getting you opted out is 72 hours. So you can be opted in instantly without your permission, but it may take three days to complete your request to get opted out, plus you get entered on a list. Despite not wanting anything to do with this.
There is a lot of emotive nonsense running around with this application, including the utterly depressing blackmailing line of “If it saves even one life, it’ll be worth it”. I’m not sure how you prove it saves even one life and against that, given the criticism about it, you’d have to wonder what happens if it costs even one life. And this is the flipside of the coin. As implemented, it could.
When I used to design software, I did so on the premise that software design should also mitigate against things going wrong. There are a number of serious issues with the current implementation of Samaritans Radar, and a lot of things which are unclear in terms of what they are doing.
- As implemented, it seems to assume that the only people who will be affected by this are their target audience of 18-35 year olds. This is naive.
- As implemented, it seems to assume that there is an actual friendship connection between followers and followees. Anyone who uses Twitter for any reason at all knows that this is wrong as well.
- As implemented it defaults all followees into being monitored while simultaneously guaranteeing data protection rights not to them but to their followers.
- As implemented, it is absolutely unclear whether there are any geographical limitations on the reach of this mess. This matters because of the different data protection regulations in different markets. And that’s before you get to some of the criticisms regarding whether the app is compliant with UK data protection regulations.
So, first up, what’s the difference between what this app is doing and, for example, any market research analysis being done against twitter feeds?
This app creates data about a user and it uses that data to decide whether to send a message to a third party or not.
Twitter is open – surely if you tweet in public, you imagine someone is going to read it, right? This is true within a limit. But there’s a difference between someone actively reading your twitter feed and them getting sent emails based on keyword analysis. In my view, if the Samaritans wants to go classifying Twitter users as either possibly at risk of suicide or not, they need to ask those Twitter users if they can first. They haven’t done that.
The major issue I have with this is that I am dubious about sentiment analysis anyway, particularly for short texts, which tweets are.
Arguably, this is acting almost as a mental health related diagnostic tool. If we were looking to implement an automated diagnostic tool of any description in the area of health, it’s pretty certain that we would want it tested for very high accuracy rates. Put simply, when you’re talking about health issues, you really cannot afford to make too many mistakes. Bearing in mind that – for example – failure rates of around 1% in contraception make for lots of unplanned babies, a failure rate of 20% in classifying people as possibly suicidal could be seriously problematic. A large number of false positives means a lot of incorrect warnings.
Some people might argue that a lot of incorrect warnings is a small price to pay if even one life is saved. If you deal with the real world, however, what happens is that a lot of incorrect warnings cause complacency. False negatives are classifications where issues are missed. They may result in harm or death.
Statistics theory talks about type 1 and type 2 errors, which effectively are errors where something is classified incorrectly in one direction or the other. The rate of those errors matters a lot in health diagnosis. In my view, they should matter here, and if the Samaritans have done serious testing in this area, they should release the test results, suitably anonymised. If they did not, then the application was not anywhere near adequately tested. Being honest, I’m really not sure how they might effectively test for false negatives using informed consent.
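To make the false-positive arithmetic concrete, here is a back-of-the-envelope sketch. The base rate, sensitivity and specificity figures are purely illustrative assumptions on my part, not measured properties of the app; the point is how badly a low base rate punishes even a reasonable-looking classifier.

```python
# Illustrative base-rate arithmetic for a binary classifier.
# All figures are assumptions for the sake of example.

def classifier_outcomes(n_tweets, base_rate, sensitivity, specificity):
    """Expected counts for a classifier over n_tweets, where base_rate
    is the fraction of tweets that genuinely signal distress."""
    positives = n_tweets * base_rate
    negatives = n_tweets - positives
    true_pos = positives * sensitivity          # correctly flagged
    false_neg = positives * (1 - sensitivity)   # missed: a type 2 error
    false_pos = negatives * (1 - specificity)   # wrongly flagged: a type 1 error
    true_neg = negatives * specificity
    return true_pos, false_pos, false_neg, true_neg

# Suppose 1 tweet in 1,000 genuinely signals distress, and the classifier
# is 80% sensitive and 95% specific, applied to a million tweets:
tp, fp, fn, tn = classifier_outcomes(1_000_000, 0.001, 0.80, 0.95)
precision = tp / (tp + fp)   # fraction of alerts that are genuine
```

Under these assumed numbers, you get 800 genuine alerts drowned in roughly 49,950 false ones: fewer than 2 in 100 alerts would be real, which is exactly the complacency recipe described above, while 200 genuine cases are still missed entirely.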
Ultimately, one point I would make is that sometimes, the world is not straightforward, and some things just aren’t binary. Some things exist on a continuum. This app, in my view, could move along the continuum from a bad thing to a good thing if the issues with it were dealt with. At the absolute best, you could argue that the application is a good thing done badly, spectacularly so in my view, since it may allow people who aren’t out for your good to monitor you and identify good times to harass you. The Samaritans’ response to that was to make a complaint with Twitter if you get harassed. A better response would be to recognise this risk and mitigate against enabling such harassment in the first place.
Unfortunately, as things stand, if you want to prevent that happening, you have to ask the Samaritans to put you on a list. The app, as designed, defaults towards allowing the risk and assumes that people won’t do bad things. This may not be a good idea in the grand scheme of things. It would be better to design the app to prevent people from doing bad things.
The thing is, in the grand scheme of things, this matters a lot, not just because of this one app, but because it calls into question a lot of things around the area of datamining and data analysis in health care, be it physical or not.
If you wanted, you could re-write this app such that, for example, every time you posted a tweet about having fast food from any particular fast food company, concerned friends were sent an email warning you about your cholesterol levels. Every time you decided to go climbing, concerned friends could send you emails warning you how dangerous climbing is, and what might happen if you fell. Every time you went on a date, someone could send you a warning about the risk that your new date could be an axe-murderer. You’d have to ask whether the people who are signing up to this, and merrily automatically tweeting about turning their social net into a safety net, would love it if their friends were getting warnings about the possibility that they might get raped, have heart attacks, get drunk, fall off their bikes, or get cancer if they light up a cigarette, for example.
I personally would find that intrusive. And I really don’t know that twitter should default towards generating those warnings rather than defaulting towards asking me if I want to be nannied by my friends in this way. I’d rather not be actually. I quite like climbing.
The biggest issue I have with this, though, is that it is causing a monumentally negative discussion around machine learning and data analysis in the healthcare sector, such that it is muddying the water around discussions in this area. People like binary situations; they like black and white, where everything is either right or wrong. If I were working in the data sector in health care, looking into automated classification of any sort of input for diagnosis support, for example, I’d be looking at this mess in horror.
Already, a lot of voices against this application – which is horrifically badly designed and implemented – are also voicing general negativity about data analysis and data mining in general. And yet data mining has, absolutely, saved lives in the past. What John Snow did to identify the cause of the 1854 Broad Street cholera outbreak is pure data mining and analysis. Like any tool, data analysis and mining can be used for good and for bad. I spent a good bit of time looking at data relating to fatal traffic accidents in the UK last year, and from that concluded that a big issue with respect to collisions was junctions with no or unmarked priorities.
So, the issue with this is not just that it causes problems in the sphere of analysing the mindset of various unsuspecting Twitter users and telling on them to their friends, but that it could have a detrimental impact on the use of data analysis as a beneficial tool elsewhere in healthcare.
So what now? I don’t know any more. I used to have a lot of faith in the Samaritans as a charity particularly given their reputation for integrity and confidentiality. Given some of their responses to the dispute around this application, I really don’t know if I trust them at the moment as they are unwilling to understand what the problems with the application are. Yes they are collecting data, yes they are creating data based on that data, and yes, they are responsible for it. And no they don’t understand that they are creating data, and no they don’t understand that they are responsible for it. If they did, they wouldn’t write this (update 4th November):
We condemn any behaviour which would constitute bullying or harassment of anyone using social media. If people experience this kind of behaviour as a result of Radar or their support for the App, we would encourage them to report this immediately to Twitter, who take this issue very seriously.
In other words, we designed this App which might enable people to bully you and if they do, we suggest you annoy Twitter about it and not us.
The other issue is that the Samaritans appear to be lawyering up and talking about how it is legal, and not against the law. This misses a serious point, something which is often forgotten in the tech industry (i.e. do stuff first and ask forgiveness later): just because you can do something doesn’t mean you should do it.
Right now, I think the underlying idea of this application is a good idea but very badly implemented and that puts it safely into the zone of a bad idea right now. Again, if I were the Samaritans, once the first lot of concerns started being voiced, I would have pulled the application and looked at the problems around consent to being analysed and having data generated and forwarded to followers. It’s obvious though that up front, they thought it was a good idea to do this without consent and you’d have to wonder why. I mean, in general terms, if you look at my twitter feed, it’s highly unlikely (unless their algorithm is truly awful altogether) that anything I post would flag their algorithm. I’m not coming at this from the point of view of feeling victimised as someone who is at risk of getting flagged.
My issues, quite simply, are this:
- it’s default opt in without even informing Twitter users that they are opted in. The Samaritans have claimed that over a million twitter feeds are being monitored thanks to 3000 sign ups. You’d have to wonder how many of those million twitter accounts are aware that they might cause an email to be sent to a follower suggesting they might be suicidal.
- the opt-out process is onerous and, based on the 72 hour delay they require, probably manual. Plus initially, they weren’t even going to allow people to opt out.
- It depends on sentiment analysis, the quality of which is currently unknown.
- The hysteria around it will probably have a detrimental effect on consent for other healthcare related data projects in the future.
The fact that you can ask the Samaritans to put you on a blocklist isn’t really good enough. I don’t want to have my name on any list with the Samaritans either which way.
EDIT: I fixed a typo around the Type 1 and Type 2 errors. Mea culpa for that.