For personal reasons, I was unable to attend the recent conference on machine translation held in Ede in the Netherlands as I had planned. A colleague, Diane McCartney, was there and has graciously consented to share her impressions and extensive notes from the day, which will appear here in several blog posts, of which this is the second part. It was a busy day.
***
In my opinion, Dr. O’Brien’s presentation was the highlight of the day. I still have a hard time believing that someone who doesn’t believe in MT as it is being shoved down our throats was invited to the conference. But boy, I’m sure everyone in the room was glad they had chosen this particular workshop. It was so encouraging to see that research is being done by people who are interested in the research and not in selling vaporware.
She started with a short introduction to MT and its history. MT, we learned, has only really taken off in the last ten years, when rule-based systems and statistics-based systems were married to create a hybrid paradigm. Rule-based systems consist of coding dictionaries, creating rules and ensuring the rules do what they’re supposed to do. Translation memories (TMs) provide the data-driven corpora used to build a statistics-based engine. The quality of the MT’s “training,” which is done by editing translated segments, is crucial to the quality of the output.
Symantec, which uses Systran, funded the MT research at Dublin City University, but as Dr. O’Brien says herself, she was not there to blow the Symantec or Systran horn but to give us a picture of MT that is based on a real scenario in a real, live environment. Symantec uses Systran because it enables them to quickly translate virus alerts. An engineer in Latvia, for example, doesn’t need a highly polished translation but a set of understandable instructions he needs to carry out. Here, the accuracy of the translation outweighs its style. This is a perfect example of “fit for purpose,” which is taught in translation theory and implies that a translation has to be accurate rather than polished. Symantec uses MT successfully because they know what they want, have taken the pre-processing steps, have involved the engineers and translators in the process and have implemented guidelines for writing for machine translation.
Dr. O’Brien ran her post-editing test in French and Spanish and used the LISA QA metric to assess it. The test was run with a good terminology database and a good MT. The results for French and Spanish were very similar, but would have varied if other, not so well-prepared, MT engines had been used. She pointed out that quality may be subjective but that we would probably all agree that “good quality” generally means a translation that accurately reflects the meaning of the source text and that one could rely on if one’s life were in danger. She also pointed out that Asian languages will produce different errors than Western European languages because the markers are different.
Quality being the hot topic of the day, she overtly disagreed with Renato’s statement and explained that she would talk a lot about quality. According to the research, the highest quality is achieved when there is a fit between the source text and the contents of the MT. Domain-driven engines are more successful than engines based on generic data. The assumption used to be the more data, the higher the quality, but new research has shown that the quality rather than the quantity of the data is crucial and that pre-processing steps are essential! If she didn’t have our full attention, she sure had it now!
So what does the post-editing challenge consist of? It consists of, well, trained bilingual translators fixing errors in a combined MT environment. MT developers are talking about monolingual post-editing, but no one really thinks that is a good idea because there is no way of checking the accuracy of a translation if the person reviewing the text doesn’t speak the source language. Throughout her presentation, Dr. O’Brien points out time and time again that tight control is the key in every area that touches on MT and that quality issues can and should be tackled at the source.
We also learned that there are in fact several levels of post-editing:

Fast post-editing - Also referred to as gist post-editing, rapid post-editing or light post-editing. It consists of essential corrections only and therefore has a quick turnaround time.

Conventional post-editing - Also referred to as full post-editing. It consists of making more corrections, which results in higher quality but a slower turnaround time.
These levels are problematic because there are no standard definitions for the terms and no agreement on what each level means, and this creates a mismatch of expectations. A good way of defining which level of post-editing a customer needs is to discuss:
Volume - How many words/pages?

Turnaround time - How much time has been planned for post-editing?

Quality - How polished does the translation have to be?

User requirements - Who are the readers and why will they be reading it?

Perishability - Time in the sense of when the translation is really needed

Text function - What is the purpose of the text?
The distinction between light and full post-editing is in fact useful. The key to determining the level of post-editing needed depends on the effort involved, meaning the quality of the initial MT and the level of output quality expected. However, the customer may not know what they want themselves and may therefore be disappointed by what they get. It should, however, be clear whether the customer wants “good enough” quality, or quality that is similar or equal to human translation.
The nature of the post-editing task will vary depending on whether the quality of the output is good. If the quality is good, post-editing will consist mainly of minor changes, such as capitalization, numbers, gender, style and maybe a few sentences that need retranslating. If the quality is bad, the situation is reversed and post-editing will consist mainly of major changes, meaning more sentences that need retranslating and a few minor changes such as capitalization, numbers, gender etc.
There are many ways of measuring the quality of MT, some of which are more useful for post-editing and localization processes than others. The quality metric example in Dr. O’Brien’s presentation is the one used by Symantec. There are, however, other metrics such as General Text Matcher (GTM) and Translation Edit Rate (TER). The post-task edit distance is measured by comparing raw MT output to the post-edited segment and gives a score based on the number of insertions, deletions, shifts, etc. Whichever metric is used, it is important to remember that quality issues can be tackled at the content creation and pre-processing stages.
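As a rough illustration of the edit-distance idea (not Symantec's actual metric), here is a minimal TER-style sketch: a word-level Levenshtein distance between the raw MT output and the post-edited version, normalised by the length of the post-edited segment. The example sentences are invented, and real TER also counts block shifts, which this sketch omits.

```python
# Word-level edit distance between raw MT output and its post-edited
# version: a simplified, TER-style score (real TER also counts shifts).
def edit_distance(a, b):
    # Classic Levenshtein dynamic program over word tokens.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def post_edit_score(raw_mt, post_edited):
    raw, edited = raw_mt.split(), post_edited.split()
    # Normalise by the length of the post-edited segment, so the score
    # reads roughly as "edits per reference word".
    return edit_distance(raw, edited) / max(len(edited), 1)

raw = "the virus have been detect on you machine"
fixed = "the virus was detected on your machine"
print(round(post_edit_score(raw, fixed), 2))  # → 0.57
```

A lower score means the post-editor had to change less, which is the intuition behind using edit distance to measure post-editing effort.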
In order to get around the cost and subjectivity of evaluating translation output, IBM developed the BLEU score. This metric takes a raw MT sentence and compares it to a human translation, which serves as the gold standard. It only measures the similarity between the two, however, not the quality, and it only works in conjunction with a reference translation. MT providers all have BLEU scores and compare them with each other, but they are only useful for system development and comparison – they are not meaningful for the post-editing effort.
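To make the "similarity, not quality" point concrete, here is a toy BLEU-style score: clipped n-gram precision against a single human reference, with a brevity penalty. The sentences are invented; real BLEU uses up to 4-grams, is computed over a whole corpus, and has smoothing variants this sketch ignores.

```python
# Toy BLEU-style score: clipped unigram/bigram precision against one
# human reference translation, combined with a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # "Clipped" counts: a candidate n-gram is credited at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

mt = "the cat sat on mat"
human = "the cat sat on the mat"
print(round(bleu(mt, human), 2))  # → 0.71
```

Note that a fluent but wrong translation can still score well if it shares n-grams with the reference, which is exactly why BLEU measures similarity rather than quality.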
An alternative to BLEU scores is confidence scores, which the MT engine generates from its own internal probabilities – an estimate of its own confidence that it has produced a good-quality translation.
In terms of productivity, research has shown that post-editing is faster than translating and that throughput rates vary between 3,000 and 9,000 words a day. However, comparisons are often made on first-pass translation versus post-editing, i.e. there is no revision. There will always be individual variations in speed that will differ across systems and languages. Experiments on post-editing using keyboard-logging software show that post-editing involves less typing than translation, which probably matters more in terms of RSI than speed because translators are generally fast typists.
The cognitive effort required by translation and editing is rarely considered in research. However, translators report being more tired after post-editing and find post-editing more tedious, probably because they have to correct something they wouldn’t have written in the first place.
Dr. O’Brien didn’t spend much time on pricing, but she did make it clear that a whole new pricing model will have to be developed for MT post-editing. In her opinion, structured feedback to the system owner should be paid for and translators should be involved in the development of the system, terminology management, dictionary coding etc.
New generations of translators will benefit the most from post-editing because they will have grown up with technology and social networks and will be more flexible in terms of quality. Research suggests that students can learn about translation through post-editing.
***
Diane McCartney was born in California and raised in Germany where she attended a French-German school. She set up the translation department at ASK Computer Systems, where she used a UNIX program to prepare text for translation and review. Today she is based in the Netherlands and has been running her own company since 1997.
I wasn't at the conference this year, but from these posts I draw the same conclusion I brought home last year: nothing wrong with using MT if you do it right, and in the right setting. But recruiting the people who *can* do it right and also *want* to do it will be a major problem.
Really good proofreaders (or post-editors, what's in a name) are few and far between, be it for human translation or for MT.
@Susan: I come to a slightly different conclusion based on this presentation. The cases where MT can be of any real value are so limited and will remain so limited that the technology is largely irrelevant to translators of quality. And as Jost Z's experience as an MT post-editor and that of others have shown, exposure to bad texts (be they produced by MT or by translators with tails) has a poisonous influence on one's own writing and is best avoided.
The "success" of this whole MT endeavor lies with convincing good translators to waste themselves correcting this spew. If there is in fact the shortage of good translators claimed (one of the justifications used for MT), then these people have far better alternatives now, don't they? So what you are left with for post-editing is those with broken confidence (thus the TAUS campaign of intimidation and others) and monkeys. I wish the MT salesmen rotsa ruck as they pick as many corporate pockets as they can before prospects realize that the ROI just ain't there. Except possibly with something like that pilot program in Texas that the new language services venture is doing.
Hi Susan,
I'd actually like to understand who you're referring to by 'you'. If there's one thing I took away from this conference, it's that MT as MT should be used is neither intended for translators nor for translation agencies, and that its usefulness is very limited. Or do you see yourself rewriting your customers' documents so they meet the strict criteria needed to get anything remotely useful out of an MT engine? Or doing all the legwork required to create a clean, domain-specific environment and code all the dictionaries? Pre-processing and controlled environment seem to be the keywords here.
Hi Kevin,
Ah, but I'm not arguing that the technology is relevant to translators of quality. Rather the other way round: translators of quality are the prerequisite to make the technology work. But alas, they are unlikely to be very interested "en masse" - and it's that mass that's needed to "give MT the wings to fly", to copy some of the high-blown rhetoric.
Diane, my use of "you" was generic. I can definitely see that there *is* a market for MT, but like you said it's far smaller than what various people would want to make their audiences believe (or should I spell "make-believe" here?). I'm thinking of software help stuff, smart reuse of standardized document translations (e.g. notarial law, which is a lot of template work) and not much more. Unfortunately, that distinction is all too often skipped in discussions on MT.
As for doing the legwork: let's say that we have a setting in which MT, in its current state, could be applied. Then if you rename that legwork and call it "linguistic consultancy", meaning I'd be involved from the outset in trying to work out a routine for a large client that would result in lower translation costs through the use of controlled writing and MT, I'd probably be quite excited about that, yes.
Why?
I like nerds (the engine programmers), I like not only language, but also linguistics (computational or not) and I like seeing what magic you *can* perform with MT. I understand figures, I understand enough of how software works and I am a merciless proofreader. I'm pragmatic enough to be satisfied with "good enough". Plus, I know how to explain the linguistics to the software guys and the software side to the language lovers.
And hey, I'm commercial enough to know that that's a rare skillset... So, sure, count me in. And if the results are good, I'd even be happy to sell the stuff. But one condition: only if we're talking about "what's available in MT technology, and working, right now" and *not* if we're talking about "what'll be really, really fantastic in the future". I don't see those developments progressing quite as fast!
And no, don't count me in as a proofreader day in day out, or as a dedicated source text editor. But then again, I'd also be horrified at the prospect of translating day in day out, and it seems enough people see that as their calling and are really happy with that.
It takes every kind of people... and that holds true for MT as well.
What I'm saying is, I don't see enough of these people willing to jump on the MT bandwagon. And if MT sellers opt for a "plan B" with second-rate proofreaders, the whole scheme falls apart, even if they only apply MT in an MT-suitable setting. In that respect, MT is not much different from inferior work by human translators - if you work with an equally inferior proofreader, it's never going to be any good.
So in the end, you still need a good human proofreader.