machine learning – Musings on Languages, IT and other stuff

Some comments on the march of technology in interpreting

Troublesome Terps did a podcast on remote interpreting about a month ago which I finally found time to listen to yesterday. I won’t go into it in too much detail but a couple of things struck me during the conversation which I wanted to tease out as someone who is a trained interpreter, who likes the actual activity of interpreting simultaneously, and who has a bit of experience working in IT, in fact, quite a bit more than working as an interpreter.

When I listened to the piece, what struck me was not so much Jonathan Downie’s discussion of value add – this ties in with a view I’ve expressed elsewhere about how the money in the language industry is not actually in the language bit of the industry per se – as the fact that the conversation made me think of two companies putting effort into the self driving sector, namely Tesla and Uber. Both potentially have a view to running a fleet of self driving cars carrying out the work currently done mainly by cabbies. In the meantime, Tesla are selling you cars and learning from your driving habits and Uber are learning from your public transportation needs.

We haven’t really solved machine translation adequately yet. But it has reached a stage where it is considered “enough” by people who are generally ill qualified to assess whether it is in fact “enough” for their market. Output from Google Translate is considered more than enough by lots of people every day who run newspaper articles through it to get a gist. At least one, if not two, crowdfunding campaigns are pushing simultaneous interpreting systems, often pushing their AI and machine learning credentials to sound attractive. In my view, the end game with remote interpreting is less likely to be industrial parks full of interpreting booths or home interpreting systems, and more likely to be automated interpreting. Remote interpreting allows the expectation of quality to shift.

We would laugh if any human translator translated Ghent to Cork and yet, I have seen Google Translate do this. We would also not pay the human translator for such egregious errors. But Google is free, so meh. We tolerate it and we use it much more than we ever used human translators.

At some point, after the remote interpreting system, someone is going to AI their marketing speech about an interpreting system which cuts out the need for interpreters because Machine Learning System Blah. Both voice recognition and machine learning need to improve radically across all languages to match humans, but if we first bring about a situation where lower standards are tolerated (or cannot be identified) then selling a lesser quality product to the consumers of interpreting services becomes easier.

Much of what remote interpreting is bringing now is basically nothing to interpreters – I have a vision of three interpreters handling a conference somewhere in Frankfurt from their kitchens in South Africa, Berlin and somewhere in Clare, and they cannot really talk to each other in terms of who will take what slots, whether someone will catch a bunch of numbers or run out and get a few bottles of water. It seems to me that a lot of what remote interpreting is about forgets that a lot of conference interpreting is not about one person doing some interpreting; it’s about a team of people who need contact and coordination in real time. A lot of remote interpreting is around “this market is ripe for disruption” but the disruption is not necessarily being driven by people who know much about what the service actually involves. It misses a lot of context and perhaps it needs to do that because ultimately, the endgame may not be about remote interpreting but about non-human interpreting.

Hysteria of Hype

Somewhere around the web, there’s a model of hype which generally pins down where we are in terms of a hype cycle. I have not the time to go looking for it now but put simply, it has a bunch of stages. I have decided it is too complicated for the tech sector.

Basically, the point at which you start seeing comments around X is the next big thing is the point at which something else is the next big thing. Sounds contradictory? Well yeah, it is.

Most people talking about the next big thing being X tend not to know a whole lot about X. Their primary objective is to make money off X. They do not really care what X achieves, so long as it makes them money.

From five years ago up to, oh, I don’t know, the middle of 2014 or early 2015 sometime, Big Data Was The Next Big Thing. Being blunt about it, there has been very little obvious Life Changing going on courtesy of Big Data and that is because by the time people started screaming about big data in the media and talking about how it was the future, it had ceased to be the future in the grand scheme of things. Artificial intelligence and machine learning, now they are the next big thing.

I have to declare an interest in machine learning and artificial intelligence – I wrote my masters dissertation on the subject of unsupervised machine learning and deep learning. However, I am still going to say that machine learning and artificial intelligence a) are a long way short of what we need them to be to be the next big thing and b) were the next big thing at the time everyone was saying that big data was the next big thing.

It is particularly galling because of AlphaGo and the hysteria that engendered. Grown men talking about how this was the N.

Right now, artificial intelligence is still highly task limited. Sure it is fantastic that a machine can beat a human being at Go. In another respect, it isn’t even remotely special. AlphaGo was designed to do one thing, it was fed with data to do one thing. Go, and chess to some extent, are the same thing as brute forcing a password. Meanwhile, the processes designed to win games of Go and chess are not generally also able to learn to be fantastic bridge players, for example. Every single bit of progress has to be eked out, at high costs. Take machine translation. Sure, Google Translate is there, and maybe it opens a few doors, but it is still worse than a human translator. Take computer vision. It takes massive deep learning networks to even approximate human performance for identifying cats.

I’m not writing this to trash machine learning, artificial intelligence and the technologies underpinning both. I’m saying that when we have a discussion around AI and ML being the next big thing, or Big Data being the next thing, we are having the equivalent of looking at a 5 year old playing Twinkle Twinkle Little Star and declaring he or she will be the next Yehudi Menuhin. It doesn’t work like that.

Hype is dangerous in the tech sector. It overpromises and then screams blue murder when delivery does not happen. Artificial intelligence does not need this. It’s been there before with the AI winter and the serious cuts in research. Artificial intelligence doesn’t need to be picked on by the vultures looking for the next big thing because those vultures aren’t interested in artificial intelligence. They are only interested in the profitability of it. They will move on when artificial intelligence fails to deliver. They will find something else to hype out of all proportion. And in the meantime, things which need time to make progress – and artificial intelligence has made massive jumps in the last 5 or 6 years – will be hammered down for a while.

For the tl;dr version, once you start talking about something being the next big thing, it no longer is.

The invisible conduit of interpreting

Jonathan Downie made an interesting comment on his twitter this morning.

Interpreting will never be respected as a profession while its practitioners cling to the idea that they are invisible conduits.

Several things occurred to me about this and in no particular order, I’m going to dump them out here (and then write in a little more detail how I feel about respect/interpreting)

1. Some time ago I read a piece on the language industry and how much money it generated. The more I read it, the more I realised that there was little to no money in providing language skills; the money concentrated itself in brokering those skills. In agencies who buy and sell services rather than people who actually carry out the tasks. This is not unusual. Ask the average pop musician how much money they make out of their activities and then check with their record company.
2. As particular activities become more heavily populated with women, the salary potential for those activities drops.
3. Computers and technology.

Even if you dealt with 1 and 2 – and I am not sure how you would – one of the biggest problems that people providing language services now have is the existence of free online translation services. For interpreters, given the ongoing confusion between translation and interpreting, the existence of Google Translate and MS’s Skype Translate will continue to undermine the profession.

However, the problem is much wider than that. There are elements of the technology sector who want lots of money for technology, but want the content that makes that technology salable for free. Wikipedia is generated by volunteers. Facebook runs automated translation and requests correction from users. Duolingo’s content is generated by volunteers and their product is not language learning, it is their language learning platform. In return, they expect translation to be carried out.

All of this devalues the human element in providing language skills. The technology sector is expecting it for free, and it is getting it for free, probably from people who should not be doing it either. This has an interesting impact on the ability of professionals to charge for work. This is not a new story. Automated mass production processes did it to the craft sector too. What generally happens is we reach a zone where “good enough” is a moveable feast, and it generally moves downwards. This is a cultural feature of the technology sector:

The technology sector has a concept called “minimum viable product”. This should tell you all you need to know about what the technology sector considers as success.

But – and there is always a but – the problem is not what machine translation can achieve – but what people think it achieves. I have school teacher friends who are worn out from telling their students that running their essays through Google Translate is not going to provide them with a viable essay. Why pay for humans to do work which costs a lot of money when we can get it a) for free or b) for a lot less via machine translation?

This is the atmosphere in which interpreters, and translators, and foreign language teachers, are trying to ply their profession. It is undervalued because a lower quality product which supplies “enough” for most people is freely and easily available. And most people are not qualified to assess quality in terms of content, so they assess on price. At this point, I want to mention Dunning-Kruger because it affects a lot of things. When MH370 went missing, people who work in aviation comms technology tried in vain to explain that just because you had a GPS on your phone, didn’t mean that MH370 should be locatable in a place which didn’t have any cell towers. Call it a little knowledge is a dangerous thing.

Most people are not aware of how limited their knowledge is. This is nothing new. English as She is Spoke is a classic example dating from the 19th century.

I know well who I have to make.

My general experience, however, is that people monumentally over estimate their foreign language skills and you don’t have to be trying to flog an English language phrasebook in Portugal in the late 19th century to find them…

All that aside, though, interpreting services, and those of most professions, have a serious, serious image problem. They are an innate upfront cost. Somewhere on the web, there is advice for people in the technology sector which points out, absolutely correctly, that information technology is generally seen as a cost, and that if you are working in an area perceived to be a cost to the business, your career prospects are less obvious than those who work in an area perceived to be a revenue generating section of the business. This might explain why marketing is paid more than support, for example.

Interpreting and translation are generally perceived as a cost. It’s hard to respect people whose services you resent paying for and this, for example, probably explains the grief with court interpreting services in the UK, and why teachers’ and health sector salaries are being stamped on while MPs are getting attractive salary improvements. I could go on but those are useful public examples.

For years, interpreting has leaned on an image of discretion, a silent service which is most successful if it is invisible. I suspect that for years, that worked because of the nature of people who typically used interpreting services. The world changes, however. I am not sure what the answer is although as an industry, interpreting needs to focus on the value add it brings and why the upfront cost of interpreting is less than the overall cost of pretending the service is not necessary.

Future work

Via twitter yesterday, I was pointed to this piece on one of the WSJ’s blogs. Basically it looks at the likelihood that a given job type might or might not be replaced by some automated function. Interestingly, the WSJ suggested that the safest job might be amongst the interpreter/translation industry. I found that interesting for a number of reasons so I dug a little more. The paper that blogpost is based on is this one, from Nesta.

I had a few problems with it so I also looked back at this paper which is earlier work by two of the authors involved in the Nesta paper. Two of the authors are based at the Oxford Martin institute; the third author of the Nesta paper is linked with the charity Nesta itself.

So much for the background. Now for my views on the subject.

I’m not especially impressed with the underlying work here: there’s a lot of subjectivity in terms of how the underlying data was generated and in terms of how the training set for classification was set up. I’m not totally surprised that you would come to the conclusion that the more creative work types are more likely to be immune to automation for the simple reason that there are gaps in terms of artificial intelligence on a lot of fronts. But I was surprised that the outcome focused on translation and interpreting.

I’m a trained interpreter and a trained translator. I also have postgraduate qualifications in the area of machine learning with some focus on unsupervised systems. You could argue I have a foot in both camps. Translation has been a target of automated systems for years and years. Whether we are there yet or not depends on how much you think you can rely on Google Translate. In some respects, there is some acknowledgement in the tech sector that you can’t (hence Wikipedia hasn’t been translated using it) and in other respects, that you can (half the world seems to think it is hilariously adequate; I think most of them are native English speakers). MS are having a go at interpreting now with Skype. As my Spanish isn’t really up to scratch I’m not absolutely sure that I’m qualified to evaluate how successful they are. But if it’s anything like machine translation of text, probably not adequately. Without monumental steps forward in natural language processing – in lots of languages – I do not think you can arrive at a situation where computers are better at translating texts than humans and in fact, even now, to learn, machine translation systems are desperately dependent on human translated texts.

The interesting point about the link above is that while I might agree with the conclusions of the paper, I remain unconvinced by some of the processes that delivered those conclusions. To some extent, you could argue that the processes that get automated are the ones that a) cost a lot of people a lot of money and b) are used often enough to be worth automating. It is arguable that for most of industry, translation and interpreting are less commonly required. Many organisations just get around the problem by having an in house working language, for example, and most organisations outsource any unusual requirements.

The other issue is that around translation, there has been significant naiveté – and I believe there continues to be – in terms of how easy it is to solve this problem automatically. Right now we have a data focus and use statistical translation methods to focus on what is more likely to be right. But the extent to which we can depend on that tends to come down to the available data, and that varies in quantity and quality across language pairs. Without solving the translation problem, I am not sure we can really solve the interpreting problem either given issues around accent and voice recognition. For me, there are core issues around how we enable language for computers and I’ve come to the conclusion that we underestimate the non-verbal features of language such that context and cultural background are lost for a computer which has not acquired language via interactive experience (btw, I have a script somewhere to see about identifying the blockages in terms of learning a language). Language is not just 100,000 words and a few grammar rules.
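To make the data dependence concrete, here is a toy sketch of the "what is more likely to be right" idea: count how often each source word was paired with each target word, and pick the most frequent. It is a deliberately crude relative-frequency model and not how any production system works; the function names and the tiny German to English sample are invented purely for illustration.

```python
from collections import Counter, defaultdict


def translation_table(aligned_pairs):
    """Estimate P(target | source) by relative frequency from word pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: n / total for t, n in targets.items()}
    return table


def best_translation(table, source):
    """Return the most frequent target word, or None if the word was never seen."""
    candidates = table.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical, hand-made sample data (illustration only).
pairs = [
    ("Bank", "bank"), ("Bank", "bank"), ("Bank", "bank"), ("Bank", "bench"),
    ("Schloss", "castle"),  # seen once, so "lock" is never even a candidate
]
table = translation_table(pairs)
print(best_translation(table, "Bank"))     # majority choice among seen pairs
print(table["Schloss"])                    # full confidence from one example
print(best_translation(table, "Fenster"))  # unseen word: nothing to offer
```

With only one example for "Schloss", the model is completely certain of "castle" and has no idea that "lock" exists, which is the problem with thin data for a language pair in miniature.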

So, back to the question of future work. Technology has always driven changes in employment practices and it is fair to say that the automation of boring repetitive tasks might generally be seen as good as it frees people up for higher level tasks, when that’s what it does. The papers above have pointed out that this is not always the case; that automation occasionally generates more low level work (see for example mass manufacture versus craft working).

The thing is, there is a heavy, heavy focus on suggesting that jobs disappearing through automation of vaguely creative tasks (tasks that involve a certain amount more decision making for example) might be replaced with jobs that serve the automation processes. I do not know if this will happen. Certainly, there has been a significant increase in the number of technological jobs, but many of those jobs are basically irrelevant. The world would not come to a stop in the morning if Uber shut down, for example, and a lot of the higher profile tech start ups tend to be targeting making money or getting sold rather than solving problems. If you look at the tech sector as well, it’s very fluffy for want of a better description. Outside jobs like programming, and management, and architecture (to some extent), there are few recognisable dream jobs. I doubt any ten year old would answer “business analyst” to the question “What do you want to do when you grow up”.

Right now, we see an excessive interest in disruption. Technology disrupts. I just think it tends to do so in ignorance. Microsoft, for example, admit that it’s not necessary to speak more than one language to work on machine interpreting for Skype. And at one point, I came across an article regarding Duolingo where they had very few language/pedagogy staff particularly in comparison to the number of software engineers and programmers, but the target for their product was to a) distribute translation as a task to be done freely by people in return for free language lessons and b) provide said free language lessons. The content for the language lessons is generally driven by volunteers.

So the point I am driving at is that creative tasks which feature content creation, for example carrying out translation tasks or providing appropriate learning tools, are not valued by the technology industry. What point is there training to be an interpreter or translator if technology distributes the tasks in such a way that people will do them for free? We can see the same thing happening with journalism. No one really wants to pay for it.

And at the end of the day, a job which doesn’t pay is a job you can’t live on.

Falling out of love with Amazon

I remember a time when I used to love Amazon. It was back around the time when there was a lot less stuff on the web and it was an amazing database of books. Books, Books, Books.

I can’t remember when it ended. I find the relationship with Amazon has deteriorated into one of convenience more than anything; I need it to get books, but it’s doing an awful job of selling me books at the moment too. Its promises have changed, my expectations have risen and fallen accordingly. Serendipity is failing. I don’t know if it is me, or if it is Amazon.

But something has gone wrong and I don’t know if Amazon is going to be able to fix it.

There are a couple of problems for me, which I suspect are linked to the quality of the data in Amazon’s databases. I can’t be sure of course – it could be linked to the decision making gates in its software. What I do know is it is something I really can’t fix.

Amazon’s search is awful. Beyond awful. Atrocious. A disaster. It’s not unique in that respect (I’ve already noted the shocking localisation failings for Google if you Are English Speaking But You Live In Ireland And Not The United States When Looking For Online Shops) but in terms of returning books which are relevant to the search you put in, it is increasingly a total failure. The more specific your search terms, too, the more likely you are to get what can only be described as a totally random best guess. So, for example, if I look for books regarding Early Irish History, then search results featuring books on Tudor England are so far removed from what I want that it’s laughable. On 1 May 2015 (ie, day of writing) fewer than a quarter of the first 32 search results refer to Ireland, and only 1 of them is even remotely appropriate.

Even if you are fortunate enough to give them an author, they regularly return results for books not by that author.

I find this frustrating at the best of times because it wastes my time.

Browsing is frustrating. The match between the categories and the books in those categories can be random. The science category is full of new age nonsense and it often is very much best selling so the best sellers page becomes utterly useless. School books also completely litter the categories, particularly in science. I have no way of telling Amazon that I live in Ireland and have no real interest in UK school books, or, in fact, any school books when I am browsing geography.

Mainly I shouldn’t have to anyway. They KNOW I live in Ireland. They care very much about me living in Ireland when it comes to telling me they can deliver stuff. They just keep trying to sell me stuff that really, someone in Ireland probably isn’t going to want. Or possibly can’t buy (cf the whinge about Prime Streaming video to come in a few paragraphs). Amazon is not leveraging the information it has on me effectively AT ALL.

The long tail isn’t going to work if I can’t find things accidentally because I give up having scrolled through too many Key Stage Three books.

Foreign Languages: Amazon makes no distinction between text books and, for want of a better word, non-text books in its Books in Foreign Languages section. So again, once you’ve successfully drilled down to – for example – German – you are greeted with primarily Learn German books and Dictionaries, probably because of the algorithm which prioritises best sellers.

How can I fix this?

Basically, Amazon won’t allow me to fix things or customise things such that I’m likely to find stuff that interests me more. I don’t know whether they are trying to deal with these problems in the background – it’s hard to say because well, they don’t tend to tell you.

But.

1. It would be nice to be able to reconfigure Treasa’s Amazon. Currently, its flagship item is Amazon Prime Streaming Video, which is not available in Ireland. Amazon knows I am in Ireland. It generally advises me how soon it can deliver stuff to Ireland if I’m even remotely tempted to buy some hardcopy actual book. Ideally they wouldn’t serve their promotions for Amazon Prime Streaming Video, but if they have to inflict ads for stuff they can’t sell me, the least they could do is let me re-order the containers in which each piece of information appears. So I could prioritise books and coffee, which I do buy, over streaming video and music downloads, which I either can’t or don’t usually buy from Amazon.
2. It would be nice to be able to set up favourite subject streams in books or music or dvds. I’d prefer to prioritise non-fiction over beach fiction, for example.
3. I’d like to be able to do (2) for two other languages as well. One of the most frustrating things with the technology sector is the assumption of monolinguality. I’d LIKE to be able to buy more books in German, in fact I’m actively TRYING to read more German for various reasons, and likewise for French.
4. I don’t have the time to Fix This Recommendation. Each fix takes 2 clicks and features a pop up. As user interaction, it sucks. I’d provide more information for fixing the recommendations if I could click some sort of Reject from the main page and have them magically vanish. Other sites manage this.

But there are core problems with Amazon’s underlying data I think. Search is so awful and so prone to bringing back wrong results, it can only be because metadata for the books in question is wrong or incomplete. If they are using text analysis to classify books based on title and description, it’s not working. Not only that, their bucket classification is probably too broadbased. Their history section includes a metric tonne of historical fiction, ie, books which belong in fiction and not in history. If humans are categorising Amazon’s books, they are making a mess of it. If machine learning algorithms are, they are making a mess of it.

There is an odd quirk in the sales based recommender which means that I can buy 50 books on computer programming but as soon as I buy one, oh, book of prayers as a gift for a relative, my recommender becomes highly religious focused and prayer books outplay programming books. Seriously: 1 prayer book to 50 programming books means you could probably temper the prayer books. Maybe if I bought 2 or 3 prayer books you could stop assuming it was an anomaly. This use of anomalous purchases to pollute the recommendations is infuriating and could be avoided by Amazon not overly weighting rare purchases.
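As a rough illustration of what "not overly weighting rare purchases" could look like, here is a minimal sketch. I have no idea how Amazon's recommender actually works; the function name, the `min_count` threshold and the category labels are all hypothetical. The idea is simply that a category only starts to influence recommendations once it has been bought from more than once, and after that it counts in proportion to its share of the purchase history.

```python
from collections import Counter


def category_weights(purchases, min_count=2):
    """Return each category's share of the purchase history.

    Categories with fewer than `min_count` purchases are treated as
    possible one-offs (a gift, say) and are left out entirely.
    `min_count` is an illustrative threshold, not a known Amazon setting.
    """
    counts = Counter(purchases)
    kept = {cat: n for cat, n in counts.items() if n >= min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in kept.items()}


# 50 programming books and one prayer book bought as a gift.
history = ["programming"] * 50 + ["prayer"]
print(category_weights(history))

# After three prayer books it is no longer treated as an anomaly,
# but it still only gets a share proportional to what was bought.
print(category_weights(["programming"] * 50 + ["prayer"] * 3))
```

A hard threshold is the bluntest possible choice; a gentler version would shrink the weight of rare categories rather than dropping them, but the principle is the same.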

I’m glad Amazon exists. But the service it has provided, particularly in terms of book buying, is nowhere near as useful as it used to be. Finding stuff I know I want is hard. Finding stuff I didn’t know I wanted but now I HAVE to have is downright impossible.

And this is a real pity because if the whole finding stuff I wanted to buy was easier on the book front, I’d be happy to spend money on it. After all, the delivery mechanisms, by way of Kindle etc, have become far, far easier.

Flickr: Park or Bird

This made me love Flickr when I found out about it yesterday.

I’m just really sorry my life is such that it happened long after I did some research into deep learning for my dissertation this summer. I’d have given anything to quote XKCD in it.