... but with over 4 billion of them, that interpretation of the News on the Web corpus at Brigham Young University would be plausible. BYU is known for the high-quality research corpora it makes available to the public. The news corpus grows by about 10,000 articles each day, and its content can be searched online or downloaded.
The results are displayed in a highlighted keyword-in-context (KWIC) hit list, with the source publications indicated in the "CONTEXT" column:
As a legal translator, I find the BYU corpus of US Supreme Court Opinions more useful. It displays results in a similar manner:
It is difficult or impossible to configure a direct search of these corpora using memoQ Web Search, IntelliWebSearch or similar integrated web search features in translation environments. However, these tools can still be used as a shortcut to open the URL, and the search string can be entered once the site has been accessed. Since I perform searches like this to study context only infrequently, a standalone shortcut in IntelliWebSearch serves me best. If I were using this to study usage in a language I don't master very well, like Portuguese (yes, there is a Portuguese corpus at BYU - actually two of them, one historical), I might instead include the URL in the set of sites which open every time I invoke memoQ Web Search, or in a larger set of terminology-related sites in an IntelliWebSearch group.
One great benefit of using such corpora as a language learner is that context and collocations (words that occur together with a particular word or phrase) can be studied easily - better than with dictionaries - enabling one to sound a bit less like an idiot in a second, third, fourth or fifth language. Or, for many perhaps, even in their first language :-)
Jun 24, 2017
The multilingual toolkit for getting a date in Swahili
Some time ago, I was asked by IAPTI to provide some technical support for a developing effort to assist professional translators in various African regions. The flame of the Translators Without Borders center established a few years ago in Kenya has apparently sputtered out due to an incredibly silly anti-business model which undermined local professionals, so various initiatives were launched to help translators in the region grow stronger together and improve their professional practice.
Since memoQ is perhaps the best tool for managing the challenges of expert translation across the widest range of languages and conditions, I considered how I might contribute to solving some of these problems and reduce the frustrations of language barriers in Africa. I thought of all the business travelers there, as well as the NGOs and representatives of governments around the world who want a piece of what's there. All alone, strangers in a strange land, sweltering in some Nairobi hotel, how can these people even get a date in Swahili?
Once again, it's Kilgray to the rescue... with memoQ's auto-translation rules!
Using the various methods I have developed and published for planning and specifying auto-translation rules, I assembled an expert team for translation in Swahili, Arabic, Hebrew, English, German, Portuguese, Spanish, French, Russian, Hungarian, Dutch, Finnish, Polish and Greek to draft the rules for getting long dates in Swahili.
And using the Cretinously Uncomplicated Process for Identifying Dates (CUPID), these results can be transmogrified quickly to support lonely translators working from German, French and English into Arabic or from German, French, English and Spanish into Portuguese, for example, or in any combination of the languages applied for Swahili dates or others as needed.
With memoQ and regex-based auto-translation, you'll never be stuck for a quality-controlled date in any language!
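For the curious, here is a minimal, hypothetical sketch of the kind of pattern-and-replacement logic such a rule encodes, written as a Python regex rather than in memoQ's own auto-translation rule format. The English-to-Swahili month mapping uses the standard Swahili month names; the real rulesets cover a much wider variety of date formats.

```python
import re

# Hypothetical illustration: map an English long date like "25 June 2017"
# to its Swahili form ("25 Juni 2017"). The month names below are standard
# Swahili forms; the formats handled here are deliberately simple.
MONTHS_EN_SW = {
    "January": "Januari", "February": "Februari", "March": "Machi",
    "April": "Aprili", "May": "Mei", "June": "Juni", "July": "Julai",
    "August": "Agosti", "September": "Septemba", "October": "Oktoba",
    "November": "Novemba", "December": "Desemba",
}

DATE_RE = re.compile(r"\b(\d{1,2})\s+(" + "|".join(MONTHS_EN_SW) + r")\s+(\d{4})\b")

def to_swahili_date(text: str) -> str:
    """Replace English long dates with their Swahili equivalents."""
    return DATE_RE.sub(
        lambda m: f"{m.group(1)} {MONTHS_EN_SW[m.group(2)]} {m.group(3)}", text
    )

print(to_swahili_date("Signed on 25 June 2017 in Nairobi."))
# -> "Signed on 25 Juni 2017 in Nairobi."
```

Roughly the same substitution logic is what a memoQ auto-translation rule expresses with its regex patterns, ordered replacements and translation pair lists.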
Germany needs Porsches! And Microsoft has the Final Solution....
So he was left with no choice but to cut overhead using the latest technologies. Microsoft to the rescue! With Microsoft Dictate, his crew of intern sausage technologists now speak customer texts into high-quality microphones attached to their Windows 10 service stations, and these are translated instantly into sixty target languages. As part of the company's ISO 9001-certified process, the translated texts are then sent for review to experts who actually speak and perhaps even read the respective languages before the final, perfected result is returned to the customer. This Linguistic Inspection and Accurate Revision process is what distinguishes the value delivered by Globelinguatrans GmbHaha from the TEPid offerings of freelance "translators" who won't get with the program.
But his true process engineering genius is revealed in Stage Two: the Final Acquisition and Revision Technology Solution. There the fallible human element has been eliminated for tighter quality control: texts are extracted automatically from the attached documents in client e-mails or transferred by wireless network from the Automated Scanning Service department, where they are then read aloud by the latest text-to-speech solutions, captured by microphone and then rendered in the desired target language. Where customers require multiple languages, a circle of microphones is placed around the speaker, with each microphone attached to an independent, dedicated processing computer for the target language. Eliminating the error-prone human speakers prevents contamination of the text by ums, ahs and unedited interruptions by mobile phone calls from friends and lovers, so the downstream review processes are no longer needed and the text can be transferred electronically to the payment portal, with customer notification ensuing automatically via data extracted from the original e-mail.
Major buyers at leading corporations have expressed excitement over this innovative, 24/7 solution for globalized business and its potential for cost savings and quality improvements, and there are predictions that further applications of the Goldberg Principle will continue to disrupt and advance critical communications processes worldwide.
Articles have appeared in The Guardian, The Huffington Post, The Wall Street Journal, Forbes and other media extolling the potential and benefits of the LIAR process and FARTS. And the best part? With all that free publicity, my friend no longer needs his sales staff, so they are being laid off and he has upgraded his purchase plans to a Maserati.
The other sides of Iceni in Translation
The integration of the online TransPDF service from Iceni in memoQ 8.1 has raised the profile of an interesting company whose product, the Infix PDF Editor, has been reviewed before on this blog. TransPDF is a free service which extracts text content from PDF files, converts it to XLIFF for translation in common translation environments, and then re-integrates the target text from the translated XLIFF to create a PDF file in the target language.
This is a nice thing, though its applicability to my personal work is rather limited, as not many of my clients would be enthusiastic if I were to deliver PDF files as my translation results. Sometimes that fits, sometimes not. And of course, some have raised the question of whether using this online service is compatible with certain non-disclosure restrictions.
I think it's a good thing that Kilgray has provided this integration, and I hope others follow suit, but for the cases where TransPDF doesn't meet the requirements of the job, it is useful to remember Iceni's other options for preparing text for translation.
Translatable XML or marked-up text export
For as long as I can remember, the Infix PDF Editor has offered the option to export text locally on your own computer (avoiding potential non-disclosure agreement violations) so that it can be translated and then re-imported later to make a PDF in the target language. Only the location of this option in the menus has changed; the menu choices for the current version 7 are shown below.
This solution suffers from the same problem as the TransPDF service: not everyone will be happy with the translation delivered as a PDF, as this complicates editing a little. However, I find the XML extract very useful for putting the content of PDF files into a LiveDocs corpus for reference or term extraction. The fact that Infix ignores password protection on PDFs is also helpful sometimes.
"Article" export
The Article Tool of the Iceni Infix PDF Editor enables text blocks on different pages of a PDF file to be marked, linked and extracted in various translatable formats such as RTF or HTML. The quality of the results varies according to the format.
Once "articles" are defined, they are exported via the command in the File menu:
The RTF export has some problems, as this view in Microsoft Word with the formatting characters made visible reveals:
However, the Simple HTML export opened in Microsoft Word shows no such troubles (and can be saved in RTF, DOCX or other formats):
Use of the article export feature requires a license for the Infix PDF Editor, unlike the XML or marked-up text exports for translation. In demo mode, random characters are replaced by an "X" so that one can see how the function works without receiving any unjust enrichment from it. However, this feature has significant value for the work of translators and is well worth the investment, as the results are typically better than those obtained by running OCR software on a "live" (text-accessible) PDF file.
But wait... there's more!
Version 7 also has an OCR feature:
I tested it briefly on some scanned Portuguese help-wanted ads that I'll probably use for a corpus linguistics lesson this summer; the results didn't look too awful, all things considered. This feature is worth a closer look as time permits, though it is unlikely to replace ABBYY FineReader as my tool of choice for "dead" PDFs.
Jun 23, 2017
Terminology output management with SDL MultiTerm
I have always liked SDL MultiTerm Desktop - since long before it was an SDL product, back when it came as part of the package with my Trados Workbench version 3 license.
Then, as now, Trados sucked as a working tool, so I soon switched to Atril's Déja Vu for my translation work, and after 8 or 9 years to memoQ, but MultiTerm has continued to be an important working tool for my language service business. I extract and manage my terminology with memoQ for the most part, but when I want a high-quality format for sharing terminology with my clients' various departments, there is currently no reasonable alternative to MultiTerm for producing good dictionary-style output.
Terminology can be exported from whatever working environment you maintain it in, and then transferred to a MultiTerm termbase using MultiTerm Convert or other tools. In the case of memoQ, there is an option to output terms directly to "MultiTerm XML" format:
Fairly simple; there are no options to configure. Just select the radio button for the MultiTerm export format at the top of any memoQ term export dialog. And what do you get?
Three files result. The XML file with the actual term data and the XDT file with the termbase specifications are the important ones; the latter is used to create the termbase in SDL MultiTerm. If you have an existing termbase to use in MultiTerm, you won't need the XDT file, though if that termbase is not based on Kilgray's XDT file, there may be some mapping complications in the term import from the XML file.
Now let's create a termbase in SDL MultiTerm 2017 Desktop:
Give it a name:
When the termbase wizard starts, choose the option to load an existing termbase definition and select the XDT file created by memoQ:
At the end of the process you will have an empty Multiterm termbase into which the data in the XML file are imported:
Now you'll have an SDL Multiterm termbase with the glossary content exported from memoQ. This is a process which can be carried out when sharing terminology with a colleague who uses SDL Trados Studio for translation, for example. If they don't know how to use the import functions of SDL Multiterm or you want to save them the bother of doing so, just share the SDLTB file.
Now that the glossary is in Multiterm it can be exported in various formats which can be helpful to people who prefer the data in a more generally accessible format. Please note that this is not done using the export functions under the File menu! SDL Multiterm is a program originally developed by German programmers, who have their own Konzept of Benutzerfreundlichkeit. Even in the hands of Romanian developers, it's still kinda weird. The desired functions are found in the Termbase Management area of course:
In keeping with the German Benutzerfreundlichkeitskonzept, the command to generate the desired output is Process, of course.
There are a number of pre-defined output templates included with MultiTerm. I usually use a version of the "Word Dictionary" export definition, which produces a two-column RTF file; by default this gives output like the following:
I prefer something a little different, so I have prepared various improved versions of this output definition. I usually edit the text, adjust the column breaks as needed and clean up any garbage (like redundant initial letters caused by accented vowels in a language like Portuguese), then slap a cover page on the file and make a PDF of it or create a nice printed copy, possibly with other page size formatting. Here is an example:
Example PDF dictionary output
Other possible output formats include HTML, which can be useful for term access on an intranet, for example. Custom definitions can be created by cloning and editing an existing definition; these are specific to a given termbase. If you want to apply a custom export definition to another termbase, export it as an XDX file and then load it for the other termbase. The definition file used to generate the example above is available here.
One essential weakness of the SDL export definition which has always annoyed me is the failure to include the last word on the page in the header as most proper dictionaries do. I addressed this in the definition with my limited knowledge of RTF coding, but the change can be made manually in Microsoft Word too, for example, by copying and pasting the SortTerm field and editing it to add the \l argument:
There are, of course, other and possibly better ways to get nice output formats from memoQ glossaries or from termbases in other tools. One approach with memoQ is to create XSL scripts to process the MultiTerm XML output from memoQ. For years I have been hoping that Kilgray would create a simple extension to the term export dialog in memoQ which would allow XSL scripts to be chosen and a transformation applied when the data are exported. It really is a shame that, after more than a decade, the best translation environment tool available - memoQ - still cannot match the excellent formatted output that my clients and I have enjoyed with MultiTerm since I first started using that program 17 years ago!
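To illustrate the idea, here is a minimal sketch of the sort of transformation such a script would perform, done in Python rather than XSL: it reads a MultiTerm XML export and writes a plain two-column HTML glossary. The element names (conceptGrp, languageGrp, termGrp, term) and language codes are assumptions based on typical MultiTerm exports; check them against your own file before relying on this.

```python
import xml.etree.ElementTree as ET
from html import escape

def multiterm_xml_to_html(xml_path: str, html_path: str,
                          src: str = "DE", trg: str = "EN") -> None:
    """Write a simple two-column HTML table from a MultiTerm XML export.

    Assumes the conceptGrp/languageGrp/termGrp/term structure seen in
    typical MultiTerm exports; adjust the element names if yours differ.
    """
    tree = ET.parse(xml_path)
    rows = []
    for concept in tree.iter("conceptGrp"):
        entry = {}
        for lang_grp in concept.findall("languageGrp"):
            lang_el = lang_grp.find("language")
            if lang_el is None:
                continue
            lang = (lang_el.get("lang") or "").upper()
            term = lang_grp.findtext("termGrp/term", default="")
            entry.setdefault(lang, term)  # keep the first term per language
        if entry.get(src) and entry.get(trg):
            rows.append(f"<tr><td>{escape(entry[src])}</td>"
                        f"<td>{escape(entry[trg])}</td></tr>")
    with open(html_path, "w", encoding="utf-8") as out:
        out.write("<table>\n" + "\n".join(rows) + "\n</table>\n")

multiterm_xml_to_html("glossary.xml", "glossary.html")  # illustrative file names
```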
Jun 22, 2017
Translation alignment: the memoQ advantage
The basic features for aligning translated texts in memoQ are straightforward and can be learned easily from Kilgray documentation such as the guides or memoQ Help. However, there are three aspects of alignment in memoQ which I think are worth particular attention and which distinguish it in important ways from alignments performed with other translation environment tools or aligners.
The first is memoQ's ability to determine alignment pairs automatically based on the similarity of file names. This means, for example, that large numbers of files can be aligned automatically, with the source and target documents matched by filename - either individual files or entire folders containing perhaps hundreds of files. Thus if the source files are in one folder, the translated files in the target language are in another folder, and the source and target file names are similar, the alignment of a great many files can be set up and run in a matter of minutes. Note in the example screenshot that different file types may be aligned with each other.
Aligning the content of two folders with source and target documents; automatic pairing by name
The second important difference with alignment in memoQ is that it is really not necessary to feed the aligned content to a translation memory. memoQ LiveDocs alignments essentially function as a translation memory in the LiveDocs corpus, with one important difference: by right-clicking matches in the translation results pane or a concordance hit list, the aligned document can be opened directly and the full context of the content match can be read. A match or concordance hit found in a traditional translation memory is an isolated segment, divorced from its original context, which can be critical to understanding that translated segment. LiveDocs overcomes this problem.
A third advantage of alignment in memoQ is that, unlike environments in which aligned content can only be used after it is fed to a translation memory, a great deal of time can be saved by not “improving” the alignment unless its content has been determined to be relevant to a new source text for translation. If an analysis shows that there are significant matches to be found in a crude/bulk alignment, the specific relevant alignments can be determined and the contents of these finalized while leaving irrelevant aligned documents in an unimproved state. Should these unimproved alignments in fact contain relevant vocabulary for concordance searches, and if a concordance hit from them appears to be misaligned, opening the document via the context menu usually reveals the desired target text in a nearby segment.
Jun 16, 2017
Troubleshooting memoQ light resource import problems
The other day I sent a friend some updated auto-translation rules for currency expressions; a short time later I received a message that they would not import into memoQ. The error message displayed was the following:
Now the problem here might seem obvious, but the name of the file I sent was nothing like the name of any rule she already had installed.
In the example shown below, the source of the trouble is more obvious, but if there are a lot of resources in the list shown in the Resource Console or elsewhere, the conflict between the name in the import dialog and an existing resource name in the list might not stand out so clearly....
In the MQRES file (for the memoQ light resource), the "trouble spot" is in the XML header at the top of the file. This can be seen by opening the file in any text editor (in this case I used Notepad++ to show line numbering):
In this case, the fifth line contains the name that will be applied to the resource after it is imported. The <Name> tags are found in all kinds of memoQ light resources, and the same problem will occur whenever a name conflict is found during import. Here is an example from a memoQ ignore list (used to exclude certain words from error indications by the spellchecking functions):
There are a couple of ways to avoid or correct these problems:
- First of all, when a ruleset is edited, the text enclosed by the Name tags should be altered (see the sketch after this list). It's probably a good idea to update the Description as well. The FileName is actually ignored and need not be updated; a mismatch with the actual name of the MQRES file will not cause any trouble on import.
- When importing a light resource, you can always change the information read from the Name and Description tags of the MQRES file. This avoids the conflict.
- The name and description of an existing light resource can be edited via the Properties of the resource in the Resource Console or under Project home > Settings. Accessing the resource via memoQ Options will currently (as of version 8.1) not show the Properties.
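For the first approach, a minimal sketch of a script that inspects and changes the <Name> element of an MQRES file is shown below. The element names follow the screenshots above, and the file and resource names are purely illustrative; if your MQRES file declares an XML namespace, the element lookups would need to be adjusted.

```python
import xml.etree.ElementTree as ET

def rename_light_resource(path: str, new_name: str, out_path: str) -> None:
    """Report the embedded <Name> of an MQRES light resource and write a
    renamed copy, so the copy can be imported without a name conflict."""
    tree = ET.parse(path)
    root = tree.getroot()
    name_el = root.find(".//Name")
    if name_el is None:
        raise ValueError("No <Name> element found; is this really an MQRES file?")
    print(f"Current resource name: {name_el.text}")
    name_el.text = new_name
    desc_el = root.find(".//Description")
    if desc_el is not None and desc_el.text:
        desc_el.text += " (renamed copy)"
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

# Illustrative names only:
rename_light_resource("CurrencyRules.mqres", "Currency rules v2", "CurrencyRules_v2.mqres")
```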
memoQ's "light resources" - the portable configurations and information lists to assist various translation tasks - are one of the environment's greatest strengths, but the generally bad state of the associated editing tools and unhelpful error handling continue to cause a lot of unnecessary confusion among users. Key people at Kilgray are not unaware of this problem, and for years there has been a debate regarding new features versus actual usability of the features already present. When you encounter difficulties like the one described above - or other troubles using this generally excellent, leading translation assistance tool - it is important to communicate your concerns to Kilgray Support (support@kilgray.com).
Without appropriate feedback from the wordface, there is often really no way for the designers and product engineers to understand and prioritize the challenges of usability. I can understand the reluctance to take such action among those who have used other tools for many years, where it was clear that their requests for bug fixes or other improvements were largely ignored, but it really does make a difference, though not always on a time scale of hours or days. Weeks, months, sometimes years may pass before important changes are made, but usually this is because the urgency of the matter has not been communicated with sufficient clarity, or because there are in fact more pressing matters which require attention. But serious matters are seldom ignored by those responsible, as nine years as a satisfied user have shown me.
Jun 11, 2017
On a TEUR with German financial translation
Currency expressions occur in great variety in German financial translation, and it is often a great nuisance to type and check the corresponding expressions, correctly formatted, in the target language. One group of such expressions are those involving thousands of euros, typically written in German as "TEUR". However, depending on the proclivities of the source text author, other forms such as T€, kEUR or k€ may be encountered.
On the target side, clients might want to see figures like "TEUR 1.352" rendered in a number of ways: perhaps EUR 1,352 thousand, perhaps €1,352k, perhaps something else.
I have described before how to map out source and target equivalents when developing auto-translation rules or regex-based quality checking instruments, using that mapping as the basis for development specifications and case testing, and how to document the structure and reasoning of the respective rules.
Here you can download an example of possible solutions to the specific problem described above. The ZIP archive contains two different rulesets, one for each of the English target text formats cited above; these may be adapted to fit the particular requirements of a client as needed.
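As a rough illustration of what one of these rules does (this is not the downloadable memoQ ruleset itself), the following Python sketch converts the German forms listed above into the first English format. The patterns cover only simple whole-thousand amounts and would need extension for decimals, ranges and similar cases.

```python
import re

# Convert German thousand-euro expressions ("TEUR 1.352", "T€ 1.352",
# "kEUR 1.352", "k€ 1.352") to "EUR 1,352 thousand", swapping the
# German thousands separator for the English one.
TEUR_RE = re.compile(r"\b(?:TEUR|T€|kEUR|k€)\s*(\d{1,3}(?:\.\d{3})*)")

def teur_to_english(text: str) -> str:
    return TEUR_RE.sub(
        lambda m: f"EUR {m.group(1).replace('.', ',')} thousand", text
    )

print(teur_to_english("Der Umsatz betrug TEUR 1.352 im Geschäftsjahr."))
# -> "Der Umsatz betrug EUR 1,352 thousand im Geschäftsjahr."
```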
There are, of course, quite a number of other currency expressions one routinely encounters when translating financial texts or other business documents, and the diversity of client preferences for target language formats can be considerable. In many cases, it is worthwhile to document which rulesets correspond to which client's preference, perhaps even including client names in the filename to keep things straight. Thus "KPMG_TEUR-to-English" might be the ruleset name for the client KPMG's preference for how to translate those particular expressions to English.
Busy financial translators who use memoQ and who have discovered the benefits of rulesets like these tell me time and again how many hours or days of effort are saved routinely by using tools like these in translation and subsequent quality checks. They are a "secret weapon" in an often competitive environment with a lot of short, stressful deadlines.
Those who wish to have rulesets of their own to handle the specific requirements of their clients can turn to a number of sources for help. Kilgray's Professional Services department can develop custom rules, as can competent consultants such as Marek Pawelec or yours truly. One caveat: in hiring development experts for memoQ tools based on regular expressions (regex), it is generally a good idea to work with consultants whose primary focus is memoQ. Regular expressions are used in many other environments, such as ApSIC Xbench and SDL Trados Studio (as well as many others having nothing to do with translation), but without an intimate, daily working acquaintance with memoQ, developers are often unable to identify the best approaches for the memoQ environment. It is then all too possible to spend a lot of money on custom work which proves to be unusable - for example, because complex rules take many minutes to load each time a project or document is opened, the developer having failed to break the problem down efficiently into its component parts. Done right, however, these rulesets are an investment which can pay enormous dividends for many specialist translators.
Jun 6, 2017
Build your own online reference TM for a team or anyone!
In the past, I have published several articles describing the use of free Google Sheets as a means of providing searchable glossaries on the Internet. This concept has continued to evolve, with current efforts focused on the use of forms and Google's spreadsheet service API to provide even more free, useful functionality.
On a number of occasions I have also mentioned that the same approaches can be used for translation memories to be shared with people having different translation environments, including those working with no CAT tools at all. However, the path to get there with a TM might not be obvious to everyone, and the effort of finding good tools to handle the necessary data conversions can be frustrating.
I've put up a demonstration TM in Portuguese and English here: https://goo.gl/LXXgmf
Here is a selection from the same data collection, selecting for matches of the Portuguese word 'cachorro': https://goo.gl/9KJils
This uses the same parameterized URL search technique described in my article on searchable glossaries.
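For readers who want to build such links themselves, here is one way to construct a parameterized lookup URL with Python, using Google's spreadsheet query endpoint. The sheet ID and column layout are placeholders, not those of the demo TM, and the technique described in the earlier glossary article may differ in its details.

```python
from urllib.parse import quote

# Placeholder ID of a published Google Sheet containing the TM data;
# column A is assumed to hold Portuguese, column B English.
SHEET_ID = "YOUR_SHEET_ID_HERE"

def tm_query_url(search_term: str, column: str = "A") -> str:
    """Build a Google Visualization query URL that returns matching rows as HTML."""
    query = f"select A, B where {column} contains '{search_term}'"
    return (f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/gviz/tq"
            f"?tqx=out:html&tq={quote(query)}")

print(tm_query_url("cachorro"))
```

A URL built this way can be dropped into memoQ Web Search, IntelliWebSearch or a plain browser bookmark, with the search term substituted at lookup time.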
A translation memory in a Google Sheet has a few advantages:
- It can be made accessible to anyone or to a selected group (using Google's permission scheme)
- It can be downloaded in many formats for adding to a TM or other reference source on a local computer
- Hits can also be read in context if the TM content is in the order in which it occurs in the translated documents. This is an advantage that, among commercial translation environment tools, is currently offered only by memoQ LiveDocs corpora.
Web search tools of many kinds can be configured easily to find data in these online Google Sheet "translation memories" - SDL Trados Studio, OmegaT and memoQ are among those tools with such facilities integrated, and IntelliWebSearch can bridge the gap for any environment that lacks such a thing.
But... how do you go from a translation memory in a CAT tool to the same content in a Google Sheet? This can be confusing, because many tools do not offer an option to export a TM to a spreadsheet or delimited text file. Some suggestions are found in an old PrAdZ thread, but I found a more satisfactory way of dealing with the problem.
A few years ago, the Heartsome Translation Studio went free and Open Source. It contains some excellent conversion tools. I downloaded a copy of the Heartsome TMX Editor (the available installers for Windows, Mac and Linux are here) and used it to convert my TMX file.
The result was then uploaded to a public directory on my personal Google Drive, and the URL was noted for building queries. Fairly straightforward.
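For those who prefer a scriptable route instead of (or alongside) Heartsome, a minimal Python sketch along the following lines will pull the segments out of a TMX file and write a tab-delimited file which Google Sheets can import directly. The language codes are assumptions and should be changed to match the xml:lang values actually used in your TMX.

```python
import csv
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_tsv(tmx_path: str, tsv_path: str,
               src_lang: str = "pt", trg_lang: str = "en") -> None:
    """Write source/target segment pairs from a TMX file to a tab-delimited file."""
    tree = ET.parse(tmx_path)
    with open(tsv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow([src_lang, trg_lang])
        for tu in tree.iter("tu"):
            segs = {}
            for tuv in tu.findall("tuv"):
                lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
                seg = tuv.find("seg")
                if seg is None:
                    continue
                # itertext() flattens any inline markup inside the segment
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
            if segs.get(src_lang) and segs.get(trg_lang):
                writer.writerow([segs[src_lang], segs[trg_lang]])

tmx_to_tsv("memory.tmx", "memory.tsv")  # illustrative file names
```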
The Heartsome TMX Editor seems like it might be a useful tool to replace Olifant as my TMX editor. While the TM editor in my tool of choice (memoQ) has improved in recent years, it still does not do many things I require, and some of this functionality is available in Heartsome.
Jun 5, 2017
Optimizing term properties for many entries in a memoQ termbase
Terminology: On my wishlist: an easier way to deal with termbases imported into MemoQ in Studio packages. Especially annoying: the habit of many Studio users capitalizing termbase entries & thus torpedoing recognition. It would seem the default setting in MultiTerm is fuzzy matching.
memoQ is noted for its compatibility with SDL Trados Studio files and projects; with the latest release of memoQ (version 8.1) there is apparently full compatibility with tracked changes in SDLXLIFF files and with Studio's translation quality assurance. However, a few little points remain to satisfy some users.
The opening comment is from a colleague who seems to experience less than optimal matching for terminology imported as part of a memoQ project from an SDL Trados Studio package (SDLPPX). The solution to this person's frustration is fairly simple, however, and it is useful in many other cases where the properties of terms in a memoQ glossary are not well optimized.
Many people are unaware of the fact that it is possible to change any of the term properties for a large number of terms at once. To do this, simply open the memoQ termbase for editing and select the terms to change. Multiple selections can be made by holding down the Shift key and clicking on the desired range of rows or by using the control key to mark individual selections. Then simply set the desired property (such as fuzzy matching) and the change will be applied to all of the selected terms.
About four years ago, when fuzzy term matching was introduced by Kilgray, I made a short video about this. The memoQ interface has changed a little since then, but the procedure works just as well today:
Technology for Legal Translation
Last April I was a guest at the Facultad de Derecho of the University of Buenos Aires, where I had an opportunity to meet students and staff from the law school's integrated degree program for certified public translators and to speak about my use of various technologies to assist my work in legal translation. This post is based loosely on that presentation and a subsequent workshop at the Universidade de Évora.
Useful ideas seldom develop in isolation, and to the extent that I can claim good practice in the use of assistive technologies for my translation work in legal and other domains it is largely the product of my interactions with many colleagues over the past seventeen years of commercial translation activity. These fine people have served as mentors, giving me my first exposure to the concepts of platform interoperability for translation tools, and as inspirations by sharing the many challenges they face in their work and clearly articulating the desired outcomes they hoped to achieve as professionals. They have also generously and frequently shared with me the solutions that they have found and have often unselfishly shared their ideas on how and why we should do better in our daily practice. And I am grateful that I can continue to learn with them, work better, and help others to do so as well.
A variety of tools for information management and transformation can benefit the work of a legal translator in areas which include but are not limited to:
- corpus utilization,
- text conversion,
- terminology management,
- diverse information retrieval,
- assisted drafting,
- dictated speech to text,
- quality assurance,
- version control and comparison, and
- source and target text review.
Though not exhaustive, the list above can provide a fairly comprehensive basis for educating future colleagues and for the continued professional development of those already active as legal translators. With any of the technologies discussed below, however, it is important to remember that the driving force is not the hardware and software of our technical devices but rather the human mind and its understanding of the subject matter and of the needs of the particular task or work process in the legal domain. No matter how great our experience, there is always something more to be learned, and often the best way to do this is to discuss the challenges of technology and workflow with others and keep an open mind for promising new approaches.
Reference texts of many kinds are important in legal translation work (and in other types of translation too, of course). These may be monolingual or multilingual texts, and they provide a wealth of information on subject matter, terminology and typical usage in particular contexts. These collections of text – or corpora – are most useful when the information found in them can be read in context rather than isolation. Translation memories – used by many in our work – are also corpora of a kind, but they are seriously flawed in their usual implementations, because only short segments of text are displayed in a bilingual format, and the meaning and context of these retrieved snippets are too often obscure.
An excerpt from a parallel corpus showing a treaty text in English, Portuguese and Spanish
The best corpus tools for translation work allow concordance searches in multiple selected corpora and provide access to the full context of the information found. Currently, the best example of integrated document context with information searches in a translation environment tool is found in the LiveDocs module of Kilgray's memoQ.
A memoQ concordance search with a link to an "aligned" translation
A past translation and its preview stored in a memoQ LiveDocs corpus, accessed via concordance search
A memoQ LiveDocs corpus has all the advantages of the familiar "translation memory" but can include other information, such as previews of the translated work as well. It is always clear in which document the information "hit" was found, and corpora can also include any number of monolingual documents in source and target languages, something which is not possible with a traditional translation memory.
In many cases, however, much context can be restored to a traditional translation memory by transforming it into a "document" in a LiveDocs corpus. In most cases, substantial portions of the translation memory will have their individual segment records stored in document order; if that content is exported as a TMX file or tab-delimited text file and then imported as a bilingual document in a LiveDocs corpus, the result will be almost as if the original translations had been aligned and saved, and from a concordance hit one can open the bilingual content directly and read the parts before and after the text found in the concordance search.
Legal translation can involve text conversion in a broad sense. Legal translators must often deal with hardcopy or faxed material or scanned files created from these. Often documents to translate and reference documents are provided in portable document format (PDF), in which finding and editing information can be difficult. Using special software, these texts can be converted into documents which can be edited, and portions can be copied, pasted and overwritten easily, or they can be imported into translation assistance platforms such as SDL Trados Studio, Wordfast or memoQ. (Some of these environments include integrated facilities for converting PDF texts, but the results are seldom as suitable for work as PDF or scanned files converted with optical character recognition software such as ABBYY FineReader or OmniPage.)
Software tools like ABBYY FineReader can also convert "dead" scanned text images into searchable documents. This even works with poor contrast or colored images in the background, making it easier, for example, to look for information in mountains of scanned documents used in legal discovery. Text-on-image files like the example shown above preserve the layout and image context completely, so the text can be read in the best possible way. I first discovered and used this option while writing a report for a client in which I had to reference sections of a very long, scanned policy document from the European Parliament. It was driving me crazy to page through the scanned document to find information I wanted to cite but where I had failed to make notes during my first reading. Converting that scanned policy document to a searchable PDF made it easy to find what I needed in seconds and to cite page numbers accurately. Where there is text on pictures, difficult contrast or other complications, this is often far better for reference purposes than conversion to an MS Word document, for example, where the layout is likely to become garbled.
Software tools for translation can also make text in many other original formats accessible to translators in an ergonomically simpler form, ensuring, where necessary, that no text is missed because of a complicated layout or because it sits in an easily overlooked footnote or margin note. Text import filters in translation environments make it easy to read and translate the words in a uniform working environment, with many reference tools and other help available, and then render the translated text back into its original format or some more useful bilingual format.
An excerpt of translated patent claims exported as a bilingual table for review
Technology also offers many possibilities for identifying, recording and controlling relevant terminology in legal translation work.
Large quantities of text can be analyzed quickly to find the most frequent specialist vocabulary likely to be relevant to the translation work, and these terms can be saved in project glossaries, often enabling the work to be organized better, with much of the clarification of terms taking place prior to translation. This is particularly valuable in large projects where it may be advisable to ensure that a team of translators all use the same terms in the target language to avoid possible confusion and misunderstanding.
Glossaries created in translation assistance tools can provide terminology hints during work and even save keystrokes when linked to predictive, "intelligent" writing features.
Integrated quality checking features in translation environments enable possible deviations of terminology or other issues to be identified and corrected quickly.
Technical features in working software for translation allow not only desirable terms to be identified and elaborated; they also enable undesired terms to be recorded and avoided. Barred terms can be marked as such while translating or automatically identified in a quality check.
Technical tools enable terminology to be shared in many different ways. Glossaries in appropriate formats can be moved easily between different environments to share them with others on a team which uses diverse technologies; they can also be output as spreadsheets, web pages or even formatted dictionaries (as shown in the example above). This can help to ensure consistency over time in the terms used by translators and attorneys involved in a particular case.
A patent glossary exported from memoQ and then made into a PDF dictionary via SDL Trados MultiTerm
There are also many different ways that terminology can be shared dynamically in a team. Various terminology servers available usually suffer from being restricted to particular platforms, but freely available tools like Google Sheets coupled with web look-up interfaces and linked spreadsheets customized for importing into particular environments can be set up quickly and easily, with access restricted to a selected team.
The links in the screenshot above show a simple example using some data from SAP. There is a master spreadsheet where the data is maintained and several "slave" sheets designed for simple importing into particular translation environment tools. Forms can also be used for simplified data entry and maintenance.
If Google Sheets do not meet the confidentiality requirements of a particular situation, similar solutions can be designed using intranets, extranets, VPNs, etc.
Technical tools for translators can help to locate information in a great variety of environments and media in ways that usually integrate smoothly with their workflow. Some available tools enable glossaries and bilingual corpora to be accessed in any application, including word processors, presentation software and web pages.
Corpus information in translation memories, memoQ LiveDocs or external sources can be looked up automatically or in concordance searches based on whole or partial content matches or specified search terms, and then useful parts can be inserted into the target text to assist translation. In some cases, differences between a current source text and archived information are highlighted to assist in identifying and incorporating changes.
Structured information such as dates, currency expressions, legal citations and bibliographical references can also be prepared for simple keystroke insertion in the translated text or automated quality checking. This can save many frustrating hours of typing and copy revision. In this regard, memoQ currently offers the best options for translation with its "auto-translation" rulesets, but many tools offer rules-based QA facilities for checking structured information.
Voice recognition technologies offer ergonomically superior options for transcription in many languages and can often enable heavy translation workloads with short deadlines to be handled with greater ease, maintaining or even improving text quality. Experienced translators with good subject matter knowledge and voice recognition software skills can typically produce more finished text in a day than the best post-editing operations for machine pseudo-translation - with the difference that the text produced by human dictation is actually usable in most situations, while the "gloss" added to machine "translations" is at best lipstick on a pig.
Reviewing a text for errors is hard work, and a pressing deadline to file a brief doesn't make the job easier. Technical tools for translation enable tens of thousands of words of text to be scanned for particular errors in seconds or minutes, ensuring that dates and references are correct and consistent, that correct terminology has been used, et cetera.
The best environments even offer sophisticated facilities for tracking changes, comparing source and target text versions, and reviewing historical revisions to a translation at the sentence level. And tools like SDL Trados Studio or memoQ enable a translation and its reference corpora to be updated quickly and easily by importing a modified (monolingual) target text.
When time is short and new versions of a source text may follow in quick succession, technology offers possibilities to identify differences quickly, automatically process the parts which remain unchanged and keep everything on track and on schedule.
For all its myriad features, good translation technology cannot replace human knowledge of language and subject matter. Those claiming the contrary are either ignorant or often have a Trumpian disregard for the truth and common sense and are all too eager to relieve their victims of the burdens of excess cash without giving the expected value in exchange.
Technologies which do not assist translation experts to work more efficiently or with less stress in the wide range of challenges found in legal translation work are largely useless. This really does include machine pseudo-translation (MpT). The best “parts” of that swindle are essentially the corpus matching for translation memory archives and corpora found in CAT tools like memoQ or SDL Trados Studio, and what is added is often incorrect and dangerously liable to lead to errors and misinterpretations. There are also documented, damaging effects on one’s use of language when exposed to machine pseudo-translation for extended periods.
Legal translation professionals today can benefit in many ways from technology to work better and faster, but the basis for this remains what it was ten, twenty, forty or a hundred years ago: language skill and an understanding of the law and legal procedure. And a good, sound, well-rested mind.
*******
Further references
Speech recognition
Dragon NaturallySpeaking: https://www.nuance.com/dragon.html
Tiago Neto on applications: https://tiagoneto.com/tag/speech-recognition
Translation Tribulations – free mobile speech recognition for many languages: http://www.translationtribulations.com/2015/04/free-good-quality-speech-recognition.html
Circuit Magazine - The Speech Recognition Revolution: http://www.circuitmagazine.org/chroniques-128/des-techniques
The Chronicle - Speech Recognition to Go: http://www.atanet.org/chronicle-online/highlights/speech-recognition-to-go/
The Chronicle - Speech Recognition Is in Your Back Pocket (or Wherever You Keep Your Mobile Phone): http://www.atanet.org/chronicle-online/none/speech-recognition-is-in-your-back-pocket-or-wherever-you-keep-your-mobile-phone/
Document indexing, search tools and techniques
Archivarius 3000: http://www.likasoft.com/document-search/
Copernic Desktop Search: https://www.copernic.com/en/products/desktop-search/
AntConc concordance: http://www.laurenceanthony.net/software/antconc/
Multiple, separate concordances with memoQ: http://www.translationtribulations.com/2014/01/multiple-separate-concordances-with.html
memoQ TM Search Tool: http://www.translationtribulations.com/2014/01/the-memoq-tm-search-tool.html
memoQ web search for images: http://www.translationtribulations.com/2016/12/getting-picture-with-automated-web.html
Upgrading translation memories for document context: http://www.translationtribulations.com/2015/08/upgrading-translation-memories-for.html
Free shareable, searchable glossaries with Google Sheets: http://www.translationtribulations.com/2016/12/free-shareable-searchable-glossaries.html
Auto-translation rules for formatted text (dates, citations, etc.)
Translation Tribulations, various articles on specifications, dealing with abbreviations & more: http://www.translationtribulations.com/search/label/autotranslatables
Marek Pawelec, regular expressions in memoQ: http://wasaty.pl/blog/2012/05/17/regular-expressions-in-memoq/