Dec 31, 2016

The Dark Secret of memoQ interface switch-hitters

The memoQ world looks very different in the typical view of a translator versus that of a project manager.

Chaos Central: the memoQ Translator Pro dashboard
Ready to work: the project manager interface of memoQ
These differences sometimes lead to difficulties in training or documenting procedures for others. It can be quite annoying, for example, for the students in a university translation technology course to watch their professor demonstrate a technique in the project manager version as they stare at their local copies of memoQ Translator Pro in the computer lab and try to repeat the steps in that different environment.

Project managers who never or seldom use the Translator Pro version often sow unintended confusion when they explain to a translator who isn't a memoQ power user how to perform certain tasks; it really would help if both were looking at the same setup.

I used to spend a lot of annoying time changing the license in my copy of memoQ when I needed to change from one edition to another for purposes of teaching or writing help instructions for the blog or elsewhere. No longer.

A few years ago, memoQ developers and testers got tired of the same problem, so Kilgray made it possible to switch easily back and forth between the edition views. The only catch, as far as I know, is that you must be using a PM license, such as the one that is included with the memoQ cloud service account.

Switch-hitting with memoQ is very simple, just a few little steps. These are:
  1. Create an XML file named ClientDevConfig.xml with the following content: 
  2. Copy this file into the path: C:\ProgramData\MemoQ. Note that ProgramData is generally a hidden folder, so you'll have to deal with that. If I have to tell you how, you probably shouldn't be doing this :-)
  3. Restart memoQ. There will be a new menu group at the far right of the ribbon menu. Perhaps you noticed that already in the screenshots above. This addition may appear in two different forms:


That's it. So if you are a memoQ trainer or a project manager who needs to explain something to a frustrated translator in a visual context they can relate to, you have another tool to help you with that. This has been a great boon to me for the past few years, and I am very glad to have it in my tattered bag of tricks.

Dec 28, 2016

Go Figure (with memoQ!)

When translating patents, legal briefs, reports, manuals and many other kinds of documents I inevitably encounter figure references to photographs and illustrations in the text as well as the labeled captions for these. In this morning's translation of a petition in a nullity suit, one such reference takes the form in Verbindung mit Figur 1,  but it might just as well appear as

Fig. 1
Fig 1
Abb. 1
or
Abbildung 1

in this or some other text; in documents with multiple and/or sloppy authors I might even find a mix of all these in the same text.

As I value consistency in writing even when the client might not care, I try to translate all of these to the same form in English where it makes sense to do so. That might be Figure 1 or Fig. 1 depending on the situation and the style guide stipulated for the project.

But when I finish the 10,000 or so words for this job and need to do my final check before sending it to the client, I expect to be a little tired, and I want to use my attention and energy to focus on the accuracy and reading comfort of my translation. In doing so I tend to miss little details like the occurrence of "Fig. 1" on page 32 as opposed to "Figure 1" on the other 40 pages. That is why I use the QA feature of memoQ to check the consistency with which I have translated the figure references as well as other matters such as the accurate use of special terminology for the project.

The specific feature I use here for quality assurance is


an auto-translation rule set (aka "autotranslatables"), which is highlighted and selected in the screenshot of the project's settings above.

As I have stated many times before, autotranslatables should be used, but not created by the average translator. Aside from the fact that the regular expressions involved are not particularly easy even for most of the nerds among us, there are a lot of little subtleties that make the difference between a well-functioning rule set and annoying garbage, and even the "experts" struggle with this for sophisticated rules.

But the present example of Figure mapping is a comparatively simple case which can illustrate the principles and some of the "risks" to mere mortals.



My rule set for mapping figures from many German forms to a particular English form consists of a single rule.

All of the possibilities that I expect in German are compiled in a list, along with the English expression for each, and this translation pair list is named #figurelist# and is found on the corresponding dialog tab in the memoQ rule set editor for autotranslatables. (I usually edit rules externally in Notepad++ where I can comment them liberally, but in this case I felt no need to do so.) This named list is used as a variable in the regular expression for the rule to describe a source text match.

(#figurelist#)\.?\s+?\b(\d+)\b

Jeepers. That regex for the source text looks complicated, doesn't it? Wouldn't (#figurelist#) \d+ be just as good? After all, it seems to work just fine. Well, except that the list would need a few extra entries to account for abbreviations with and without periods.

No. "(#figurelist#) \d+" is total, incompetent crap. Here are some reasons why:
  • It is more efficient to express the possibility of a period after the text for "Figure" with the regex "\.?",  because you'll never have to worry about abbreviations with or without periods in your lists. Mine will get longer, as I'll probably expand these rules to cover Portuguese as well and use the same rule for both Portuguese and German sources.
  • There may be extra spaces or a different kind of space after the Figure expression. Simply typing a standard space after the (#figurelist#) group means that exactly one ordinary space must be present to match. If someone typed a non-breaking space (a reasonable thing to do to keep both parts of "Figure 1" on the same line), the rule will not work! Using \s+? to express 1 to n whitespace characters of any kind after "Fig." or whatever is the right way to go. Note that \s+? still requires at least one whitespace character; to match a reference with no space at all, such as "Abb.1", you would need \s* instead.
  • If you test the "simple" crappy regex, you'll also find that "Abb. 14" gives two results: Figure 1 and Figure 14. That is because the rule does not stipulate that the second part must be a whole "word", so the substring match with the first character also gives a result. Bad, bad, bad. The chaos that this sort of mistake can cause with more complex rules like currency expressions used in important financial translations is frightening.
The regex for the result also appears more complex than it should be, but there is a reason behind that as well. Instead of the simple $1 $2 (first group followed by a space followed by the second group), I specified output with a non-breaking space, because it looks rather unfortunate to have a line wrap in the middle of the expression for a figure. One sees that a lot, because it's a nuisance to remember to type non-breaking spaces all the time on the keyboard. This rule can also be used to check the use of the non-breaking space; an ordinary space will generate a warning when the memoQ QA profile is run with the autotranslatables check activated.
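For those who want to poke at the pieces of the source pattern outside of memoQ, here is a minimal sketch in Python. It is only an approximation: memoQ runs its rules on a different regex engine, the expansion of #figurelist# into a simple alternation is an assumption about how the named list behaves, and the names FIGURE_LIST, FIGURE_RE, NAIVE_RE and normalize_figures are invented for this illustration.

```python
import re

# Hypothetical stand-in for the #figurelist# translation pair list.
FIGURE_LIST = {
    "Abbildung": "Figure",
    "Abb": "Figure",
    "Figur": "Figure",
    "Fig": "Figure",
}

# Longest entries first, so that "Abbildung" is tried before "Abb".
_ALTERNATION = "|".join(sorted(map(re.escape, FIGURE_LIST), key=len, reverse=True))

# The rule from the text: optional period, one or more whitespace
# characters of any kind, then a whole number between word boundaries.
FIGURE_RE = re.compile(rf"({_ALTERNATION})\.?\s+?\b(\d+)\b")

# The "simple" version: exactly one ordinary space, no optional period.
NAIVE_RE = re.compile(rf"({_ALTERNATION}) \d+")

NBSP = "\u00a0"  # non-breaking space used in the output


def normalize_figures(text):
    """Rewrite every matched figure reference as 'Figure<NBSP>n'."""
    return FIGURE_RE.sub(
        lambda m: f"{FIGURE_LIST[m.group(1)]}{NBSP}{m.group(2)}", text
    )


if __name__ == "__main__":
    for sample in ["siehe Abb. 14", "Abbildung\u00a03", "Fig  7", "Abb.1"]:
        print(repr(sample), "->", repr(normalize_figures(sample)))
```

In this sketch the naive pattern misses "Abb. 14" (because of the period) and "Abbildung" followed by a non-breaking space, while the fuller pattern catches both. A reference typed with no space at all, like "Abb.1", is left untouched by either one.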

There are many ways in which regular expression rule sets can enhance the user experience and the quality of translation results when working in memoQ. It is not hard to use these rules, but it is beyond most users to create and maintain their own rule sets. Therefore
  • Kilgray should include more useful examples of rule sets (in addition to the very helpful number rules) in future releases of memoQ
  • The average user should ask the help of Kilgray Support for simple rules they need (in most cases this would fall under the usual commitment of paid support and maintenance for the year)
  • memoQ users should work with Kilgray's Professional Services department or other competent consultants to devise robust rule sets to boost their translation and quality assurance productivity. Beware of casual advice found in forums or social media; much of it does not consider issues like the problems described above despite the aggressive insistence one might see for a particular "solution". Truly, you get what you pay for :-)

Post scriptum:
An yet ye hack by night and sun, the work of regex be never done.
Of course something was forgotten in the example here. The myriad styles and customs of source text authors will inevitably offer up challenging variants to break your well-crafted rules. Today's is a text full of figure references like Abbildung 4.12, which would refer to the twelfth figure in the fourth chapter. For this the modified rule might be 

(#figurelist#)\.?\s+?(\b\d+\.?\d+?\b) 

Or perhaps not quite. Try it and you'll see a few problems. This is just another example of why it is good to make use of professional resources to help you with these challenges and to have a systematic way of recording and elaborating them. I'll explain more about such an effective system for planning and documentation in a future article. I've noticed that the "experts" in the translation field often care little for the usual standards of project specification, perhaps because they are sick and tired of translation projects with so many specification documents for those who know better.
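A quick way to see some of those problems is to run the number part of the modified rule against a few samples. The sketch below uses Python as a convenient test bench, which is an approximation of what memoQ does. The PREFIX alternation is a hypothetical stand-in for the (#figurelist#) part, and ALTERNATIVE_RE is just one possible repair offered for comparison, not a finished rule.

```python
import re

# Hypothetical stand-in for (#figurelist#)\.?\s+? from the rule above.
PREFIX = r"(?:Abbildung|Abb|Figur|Fig)\.?\s+?"

# Number part exactly as written in the post scriptum.
POSTSCRIPT_RE = re.compile(PREFIX + r"(\b\d+\.?\d+?\b)")

# One possible alternative: digits, optionally followed by ".digits".
ALTERNATIVE_RE = re.compile(PREFIX + r"(\b\d+(?:\.\d+)?\b)")


def figure_number(pattern, text):
    """Return the captured figure number, or None if the rule does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in ["Abbildung 4.12", "Abbildung 12", "Abbildung 1", "Abbildung 4."]:
        print(
            sample,
            "| post scriptum:", figure_number(POSTSCRIPT_RE, sample),
            "| alternative:", figure_number(ALTERNATIVE_RE, sample),
        )
```

Because \d+ and \d+? each demand at least one digit, the post scriptum pattern needs two digits in total, so a plain "Abbildung 1" no longer matches at all. Making the whole ".digits" part optional as a group avoids that in this sketch, though real texts will surely turn up further variants.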

Dec 27, 2016

Free shareable, searchable glossaries for collaboration with anyone

Some years ago I suggested a procedure using Google spreadsheets for glossary collaboration in projects. Many people do this sort of thing now.

What I do not think most are doing, however, is accessing these web-based term lists efficiently as terminology resources in their work. It's hard to compete with the efficiency of integrated termbases, TMs, web search features, etc.

... unless of course you integrate a web search for those online spreadsheets which returns just the few data of interest.

Matches found for German "ladepresse" in a glossary of a few thousand hunting terms
This is fairly straightforward using Google's visualization API with a simple query. A parameterized URL can be built to perform custom searches of your own data or data shared by colleagues or clients. "Canned" queries can be easily incorporated in custom searches from many tools, including memoQ Web Search, IntelliWebSearch and others.


Building a custom search URL for your Google spreadsheet is fairly simple. In the example above it consists of three parts:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

The middle part, /gviz/tq?tqx=out:html&tq=, invokes the Google visualization API and specifies that the query results be returned as HTML (for display in a browser). The query language is similar to SQL, but if you use a prepared query for a given spreadsheet table structure, you don't need to learn any of that. Queries can be made which also return definitions, images, context examples or anything else that might reside in columns of interest in the online spreadsheet.
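As a rough illustration of how the three parts fit together, here is a small Python sketch that assembles such a URL. The spreadsheet address is a placeholder, the column letters A and B are assumptions about where the source and target terms live, and the function name glossary_search_url is invented for the example; the query itself is only one plausible way to do a case-insensitive "contains" search.

```python
from urllib.parse import quote

# The fixed middle part of the URL described above.
GVIZ_SUFFIX = "/gviz/tq?tqx=out:html&tq="


def glossary_search_url(sheet_url, term, source_col="A", target_col="B"):
    """Build a query URL returning the rows whose source column contains term.

    sheet_url is the base URL of the shared spreadsheet; source_col and
    target_col are assumed column letters for this illustration.
    """
    # Lowercase the term and drop apostrophes so it cannot break the quoting.
    safe_term = term.lower().replace("'", "")
    query = (
        f"select {source_col}, {target_col} "
        f"where lower({source_col}) contains '{safe_term}'"
    )
    # Percent-encode the query so spaces and quotes survive in the URL.
    return sheet_url.rstrip("/") + GVIZ_SUFFIX + quote(query)


if __name__ == "__main__":
    # Placeholder base URL; substitute the address of a real shared sheet.
    base = "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID"
    print(glossary_search_url(base, "Ladepresse"))
```

In a web search tool the search term would normally be filled in by the tool's own placeholder rather than by a script, but the structure of the finished URL is the same.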

Using a tool like IntelliWebSearch or integrated extensions of OmegaT, memoQ and other tools, users working with any sort of tools can share a live glossary. Google Spreadsheets also have some permissions/security features which can be investigated if needed.

Of course other data can be shared this way, including TMs or XLIFF data as well as monolingual information. A little study of the relevant Google documentation reveals many possibilities :-)

Getting the picture with automated web searches

Like many other translators, I have come to appreciate the value and the complications of Internet searches in my work. As the garbage accumulated on the World Wide Web grows ever deeper, focused searches are more important than ever to get past the noise to find the information required, then get back to work.

Integrated tools for focused searches on multiple web sites are popular with many. IntelliWebSearch (IWS), memoQ Web Search and similar tools can be an enormous boost to productivity. But I doubt that many people give much thought to optimizing that possibility in general or for particular jobs.

Google searches are very popular. The Advanced Search features are particularly useful. For example, I find translating Austrian legal texts to be difficult sometimes, because an ordinary Google search of relevant legal terms yields too much interference from sites in other German-speaking countries. However, a search configured like this:

https://www.google.com/search?as_epq=schwerer+Betrug&lr=lang_de&as_sitesearch=www.jusline.at

will yield only results in German from the Austrian site Jusline, which is very helpful if I am looking for the specific definition of "schwerer Betrug" in the jurisprudence of that country.

Similarly, a financial translator working with Austrian texts might use a search like

https://www.google.pt/search?as_epq=Umlage&as_sitesearch=www.afrac.at

In my technical work, very often I must look for images of a component or process described. For a long time I did this inefficiently: searched Google and then clicked the Images link and waded through the chaos to find what I needed. But if I am translating the catalog of the hunting supplier Frankonia, that's stupid. I can do a very specific search like this:

https://www.google.com/search?q=wildbergehaken+site:frankonia.de&tbm=isch

which will open a Google Images search directly (that's what the argument tbm=isch does), using only pictures culled from the site of the retailer whose material I am working on.

An image search using Wikipedia.org can often be very helpful to identify an unknown term and navigate to related articles in various languages. For example, a person encountering an unknown word in Russian might use this search:

https://www.google.com/search?as_epq=собака&as_sitesearch=wikipedia.org&tbm=isch

and quickly see what the term is about.
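The searches shown so far all follow one pattern, which can be sketched in a few lines of Python. The function name google_search_url and its arguments are invented for this illustration; the URL parameters themselves (as_epq, lr, as_sitesearch, tbm) are the ones used in the examples above.

```python
from urllib.parse import urlencode


def google_search_url(phrase, site=None, language=None, images=False,
                      host="www.google.com"):
    """Assemble a focused Google search URL like the examples above."""
    params = {"as_epq": phrase}            # exact phrase to look for
    if language:
        params["lr"] = language            # e.g. "lang_de" for German results
    if site:
        params["as_sitesearch"] = site     # restrict results to one site
    if images:
        params["tbm"] = "isch"             # go straight to the image results
    return f"https://{host}/search?" + urlencode(params)


if __name__ == "__main__":
    print(google_search_url("schwerer Betrug", site="www.jusline.at",
                            language="lang_de"))
    print(google_search_url("собака", site="wikipedia.org", images=True))
```

The first call reproduces the Jusline search; the second yields the Wikipedia image search with the Cyrillic term percent-encoded, which browsers treat the same as the readable form.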


The search results above were obtained with memoQ Web Search, where I have the Wikipedia image search preconfigured:


Astute readers may notice the slight difference in syntax between the search in the screenshot and the Russian example I gave. There is more than one way to skin a cat with web searches. Or a dog in this case. To restrict searches to the wiki for one particular language just add the prefix for that subsite to the URL, de.wikipedia.org for German, for example.

If you need to do such searches from many different applications under Windows, IntelliWebSearch might be a better choice for the preconfigured searches. I think it also handles a lot of tabs better, and it uses the ordinary browser setup instead of the more restricted options of memoQ's integrated mini-browser. I don't really like the fact that IWS keeps adding tabs to the browser, so I close it between searches, and to avoid messing up other work I am doing in Chrome (my default browser), I configure IWS to use another browser like Opera or Microsoft Edge.

Anyone who would like the light resource file for one of my German/English profiles for memoQ's web search can get it here. It includes the image search in Wikipedia and has a number of (mostly deactivated) custom search tabs useful for intellectual property translation. A few of the searches are for engines which require manual input of terms, but I find it convenient to have these on a tab for quick access.

Dec 26, 2016

The challenge of too many little files to translate

It seems to me that most translators face this challenge eventually: a customer has many small files of some kind - tiny web pages perhaps or other content snippets in XML, text or Microsoft Word files or perhaps even in some bizarre proprietary format - and wants them translated.

Imagine a dictionary project with thousands of words with their definitions, each "entry" being stored in a separate text file. How would you translate that efficiently?

The brute force method of opening and translating each file individually is not very satisfactory. Not only does this take a long time, but when I have tried foolishness like that I tend to overlook some files and spend far too much time checking to ensure that nothing has been overlooked. And QA measures like spellchecking? Let's change the subject....

Some translation tools offer the possibility to "glue" the content of the little files together and then (usually) "unglue" them later to reconstitute the original structure of little files, now translated.

Other tools offer various ways to combine content in "views" to allow translation, editing, searching and filtering in one big pseudofile. This is very convenient, and this is the method I use most often in my work with memoQ or SDL Trados Studio after learning its virtues earlier as a Déjà Vu user.

Unusual file formats can often be dealt with the same way after some filter tweaking or development. But sometimes....

... there are those projects from Hell where you have to ask yourself what the customer was smoking when he structured his data that way, because some other way would be so much more practical and convenient... for you. Ours is generally not to question why some apparently insane data structure was chosen but to deal with the problem as efficiently as possible within budget and charge appropriately for any extra effort incurred. Hourly fees for translation rather than piece rates certainly have a place here.

Sometimes there is a technical solution, though it may not be obvious to most people. For example, in the case presented to me by a colleague on Christmas Eve


the brief was to write the translation in language XX in the empty cell in that column of the 3x2 table embedded in a DOCX file. There were hundreds of these files, each containing a single word to translate.

If these were Excel or delimited text files, a simple solution would have been to use the Multilingual Delimited Text Filter for memoQ and specify that the first row is a header. But that won't fly (yet) for MS Word files of any kind.

In the past when I have had challenging preparation to do in RTF or Microsoft Word formats - such as when only certain highlighted passages are to be translated and everything else is ignored - I have created macros in a Microsoft Office application to handle the job.

But this case was a little different. The others were always single files, or just a few files where individual processing was not inconvenient. And macro solutions often suffer from the difficulty that most mere mortals fear to install macros in Microsoft Word or Excel or simply have no idea how to do so.

So some kind of bulk external processing is called for. In this case, probably with a custom program of some kind.

I usually engineer such solutions with a simple scripting language - a dialect of the BASIC language which I learned some 45 years ago - using a free feature which is part of the Microsoft Windows operating system: Windows Script Host. And one-off, quick-and-dirty solutions with these tools do not require a lot of skill. The components of many solutions can be found on Microsoft Help pages or various internet forums with a little research if you have only a vague idea of what to do.

In this case, the tasks were to
  1. Select the files to process (all 272 of them)
  2. Open each file, copy the English word into the empty cell next to it
  3. Hide all the other text in the file so that it can be excluded from an import into a working tool like Déjà Vu, memoQ or SDL Trados Studio (using the options for importing Microsoft Word files in this case; the defaults usually ignore hidden text on import)
After that the entire folder structure of files could be imported into most professional translation support environments and all 300 or so words to translate could be dealt with in a single list view.

A more detailed definition of the technical challenge would include the fact that to manipulate data in some way in a Microsoft Office file format, the object model for the relevant program would probably have to be used in programming (for XML-based formats there are other possibilities that some might prefer).

Microsoft kindly makes the object models of all its programs available, usually for free, and there is a lot of documentation and examples to support work with them. That may in fact be a problem: there is a lot of information available, and it is sometimes a challenge to filter it all intelligently.

In this case, I needed to use the Microsoft Word object model. It also conveniently provided the methods I needed to create the selection dialog for my executable script file. The method I knew from the past and wanted to use at first is only available to licensed developers, and I am not one of these any more.

It is easy to find examples of table manipulation and text alteration techniques in Microsoft Word using its object model in VBScript or some other Microsoft Basic dialect like Visual Basic for Applications (VBA). The casual dabbler in such matters might run into some trouble using these examples if there is no awareness of differences between these dialects; trouble is often found where VBA examples that declare variables by type (example: "Dim i as Integer") occur. Declarations in VBScript must be untyped (i.e. "Dim i"), so a few changes are needed.

In this case, the quick and simple solution (' documentary comments are delimited by apostrophes and marked green) to make the files import-ready was:

' We have a folder full of DOCX files, each containing
' a three-column table where COL1 ROW2 needs to be copied to COL2 ROW2
' and then the COL1 ROW2 and other content needs to be hidden.

Option Explicit

Dim fso
Dim objWord
Dim WshShell
Dim File
Dim objFile
Dim fileCounter
Dim wrd ' Word app object
Dim oFile  ' Word doc object
Dim oCell1  ' first cell of interest in the table
Dim oCell2  ' second cell of interest in the table
Dim oCellx1  ' other uninteresting text
Dim oCellx2  ' other uninteresting text
Dim oCellx3  ' other uninteresting text 
Dim oCellx4  ' other uninteresting text 

fileCounter = 0

'set the type of dialog box you want to use
'1 = Open
'2 = SaveAs
'3 = File Picker
'4 = Folder Picker
Const msoFileDialogOpen = 1

Set fso = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")
Set WshShell = CreateObject("WScript.Shell")

'use the path selected in the SelectFolder method
'set the dialog box to open at the desired folder
objWord.ChangeFileOpenDirectory("c:\")

With objWord.FileDialog(msoFileDialogOpen)
   'set the window title to whatever you want
   .Title = "Select the files to process"
   .AllowMultiSelect = True
   'Get rid of any existing filters
   .Filters.Clear
   'Show only the desired file types
   .Filters.Add "All Files", "*.*"
   .Filters.Add "Word Files", "*.doc;*.docx"
         
   '-1 = Open the file
   ' 0 = Cancel the dialog box
   '-2 = Close the dialog box
   'If objWord.FileDialog(msoFileDialogOpen).Show = -1 Then  'long form
   If .Show = -1 Then  'short form
      'Set how you want the dialog window to appear
      'it doesn't appear to do anything so it's commented out for now
      '0 = Normal
      '1 = Maximize
      '2 = Minimize
      'objWord.WindowState = 2

      'the Word dialog must be a collection object
      'even with one file, one must use a For/Next loop
      '"File" returns a string containing the full path of the selected file
     
      For Each File in .SelectedItems  'short form
         'Change the Word dialog object to a file object for easier manipulation
         Set objFile = fso.GetFile(File)
         Set wrd = GetObject(, "Word.Application")  ' the Word instance already running
         wrd.Visible = False
         wrd.Documents.Open objFile.Path
         Set oFile = wrd.ActiveDocument

         Set oCell1 = oFile.Tables(1).Rows(2).Cells(1).Range  ' EN text
         oCell1.End = oCell1.End - 1  ' leave out the end-of-cell marker
         Set oCell2 = oFile.Tables(1).Rows(2).Cells(2).Range  ' Target (XX)
         oCell2.End = oCell2.End - 1
         oCell2.FormattedText = oCell1.FormattedText  ' copies EN>XX
         oCell1.Font.Hidden = True  ' hides the text in the source cell

         ' hide the other cell texts (nontranslatable) now
         Set oCellx4 = oFile.Tables(1).Rows(2).Cells(3).Range
         oCellx4.Font.Hidden = True
         Set oCellx1 = oFile.Tables(1).Rows(1).Cells(1).Range
         oCellx1.Font.Hidden = True
         Set oCellx2 = oFile.Tables(1).Rows(1).Cells(2).Range
         oCellx2.Font.Hidden = True
         Set oCellx3 = oFile.Tables(1).Rows(1).Cells(3).Range
         oCellx3.Font.Hidden = True

         ' -1 = wdSaveChanges: save the edits explicitly instead of relying on a prompt
         oFile.Close -1
         Set wrd = Nothing

         fileCounter = fileCounter + 1
      Next
   Else 
   End If
End With 

'Close Word
objWord.Quit

' saying goodbye
msgbox "Number of files processed was: " & fileCounter



The individual files look like the above screenshot (all text in the top row is hidden, so the entire row is invisible, including its bottom border line) after processing with the script, which is saved in a text file with a *.vbs extension (it can be launched under Windows by double-clicking):


Of course the script could be made much shorter by declaring fewer variables and structuring in a more efficient way, but this was a one-off thing where time was of the essence and I just needed to patch something together fast that worked. If this were a routine solution for a client I would be a bit more professional, lock the screen view, change to some sort of "wait cursor" during processing or show a progress bar in a dialog and all the other trimmings that one expects from professional software these days. But professional software development is a bit of a bore after so many decades, and I haven't got the patience to see the same old stupid mistakes and deceits practiced by yet another generation of technowannabe world rulers. I just want to solve problems like this so I can get back to my translations or go play with the dogs and feed the chickens.

But before I could do that I had to save my friend from the Hell of manually unhiding all that table text after his little translation was finished, so I put another 5 minutes (or less) of effort into the "unhiding" script:

Option Explicit

Dim fso
Dim objWord
Dim WshShell
Dim File
Dim objFile
Dim fileCounter
Dim wrd 
Dim oFile  
Dim oCell1  ' source text cell in the table
Dim oCellx1  ' other uninteresting text
Dim oCellx2  ' other uninteresting text
Dim oCellx3  ' other uninteresting text 
Dim oCellx4  ' other uninteresting text 

fileCounter = 0

Const msoFileDialogOpen = 1

Set fso = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")
Set WshShell = CreateObject("WScript.Shell")

objWord.ChangeFileOpenDirectory("c:\")

With objWord.FileDialog(msoFileDialogOpen)
   .Title = "Select the files to process"
   .AllowMultiSelect = True
   .Filters.Clear
   .Filters.Add "All Files", "*.*"
   .Filters.Add "Word Files", "*.doc;*.docx"
   If .Show = -1 Then  
      For Each File in .SelectedItems
         Set objFile = fso.GetFile(File)
         Set wrd = GetObject(, "Word.Application")  ' the Word instance already running
         wrd.Visible = False
         wrd.Documents.Open objFile.Path
         Set oFile = wrd.ActiveDocument

         ' unhide the source cell and the other (nontranslatable) cells
         Set oCell1 = oFile.Tables(1).Rows(2).Cells(1).Range
         oCell1.Font.Hidden = False
         Set oCellx4 = oFile.Tables(1).Rows(2).Cells(3).Range
         oCellx4.Font.Hidden = False
         Set oCellx1 = oFile.Tables(1).Rows(1).Cells(1).Range
         oCellx1.Font.Hidden = False
         Set oCellx2 = oFile.Tables(1).Rows(1).Cells(2).Range
         oCellx2.Font.Hidden = False
         Set oCellx3 = oFile.Tables(1).Rows(1).Cells(3).Range
         oCellx3.Font.Hidden = False

         ' -1 = wdSaveChanges: save the edits explicitly instead of relying on a prompt
         oFile.Close -1
         Set wrd = Nothing

         fileCounter = fileCounter + 1
      Next
   Else 
   End If
End With 

objWord.Quit
msgbox "Number of files processed was: " & fileCounter

Dec 15, 2016

Validating Roman numerals in translation QA

The issue of Roman numerals in my translation work has been at the back of my mind for a few years now, but the pain level had not been such that I got around to dealing with it. It comes up time and again in legal translation work: references to the "X. Senat" or the like which mess up segmentation (and require a bit of regex to do a new segmentation rule); references to "Art. VII" of some law (I need to catch the typos like "VIII"); source text errors like "VIIII"; and of course dates like MCMXXIV, etc. and century references.

For simple matters I used regex which would capture and reproduce "Roman numerals", but erroneous data using the right letters would also be accepted:

[MDCLXVI]+

That is, of course, rather useless for QA which checks the correctness of the expression in the source text. So with a bit of thought I came up with:


Without the word border syntax ("\b"), non-standard expressions like "VIIII" might appear to be validated in the interface of memoQ, for example, because the whole expression would be marked green in the source text, and one might not notice that it was resolved into "VIII" and "I".

These expressions can be used in various ways in any CAT tool that supports regular expressions, such as SDL Trados Studio or memoQ.

If you want this typing aid and QA tool as a memoQ autotranslatable (along with a little demo data file), you can get it here.

Dec 13, 2016

The irregularities of regular expressions in #memoQ


Sometime back in the time-distant swamps where memoQ evolved, regex mysteriously became part of the software's virtual genes. It was unclear, exactly, which third-party engine or bacterial life form had been its source, and solution developers were often at a loss to know which advanced syntax would work or not unless they tried (and very often failed).

Many of us begged and pleaded for some kind of definitive documentation of allowed syntax for memoQ's regular expressions, which are an important feature for filtering (in recent versions), segmentation rules, special text import filters, autotranslatables rules and probably a few other things I've forgotten. But begging, threats - even bribery - led to no useful reference information, just some useless suggestions to read beginner's tutorials for other dialects somewhere on the Web.

Then, quite by accident, I learned yesterday that Kilgray uses the engine in Microsoft's .NET framework. Doh. Who'da thunk? Now, at last, I can get some definitive syntax information to help me solve more sophisticated problems for legal reference formats and other challenges in my translations with memoQ.

Even with accurate syntax guidance (at last!!!), regex development with memoQ is often not a simple matter. The integrated editors are often useless, especially for things like complex autotranslatables, where the bad feature of changing the order of rules after an edit can kill a ruleset. (It was long claimed by Kilgray Support that rule order does not matter, which is patently untrue. They simply did not look at the right test cases.)

Good code of any kind should usually be documented to facilitate maintenance. This is simply not possible with the editors for regex integrated in memoQ. So instead, I do all my rule-writing work in an external editor (such as Notepad++), where I can add extensive <!-- comments so I know what the heck I did when I have to revise the rules later --> and import the rulesets for testing into a memoQ project with appropriate test data included as "translation" documents. The hardest part of this workflow is remembering to enable the imported ruleset I want to test under Project home>Settings>Auto-translation rules; often I forget and think I really screwed up until I go back to the settings and mark the checkbox by the rules to test. Keep a lot of carb sources at your desk when you do regex work. Your brain will need them.

A lot of memoQ users think that regex is irrelevant to their working lives, but for hardcore financial and legal translators at least, this is an entirely mistaken idea. Correctly constructed rules can save much time and a lot of frayed nerves dealing with citations, dates, currency expressions and more, and the rules also decrease QA time while increasing accuracy.

I have quite a number of custom rulesets I have put together for my work and for some colleagues and clients. Regex is hard shit, no matter what anyone tells you. I have programmed computers in a host of languages since 1970 more or less and used to be known for a good memory for syntax rules, but I find regex so non-intuitive at anything more than a very basic level that if I use it only a few times a year, I have to re-learn it nearly every time. That's no fun. So the key to mastering regex is not to learn it. The massahs usually don't know sheet about workin' the fields, but if they are going to survive in this competitive world, they'll know which specialist to put on the job and reward him or her appropriately. Get to know a competent consulting specialist for memoQ regex, like colleague Marek Pawelec, and let that person's expertise save you many hours of typing and QA, not to mention undetected errors.

Kilgray also established a Professional Services department at last not long ago, and that team can also help you with these and other problems for optimizing the use of translation technologies. This is very often a better option than using consultants primarily focused on SDL solutions who do a bit of memoQ on the side, because even the best of these are often not really aware of the best approaches to use, and the consequences of this are sometimes dire. Are they at the memoQ wordface nearly every day, dealing with a wide range of challenges that push the technical envelope of the software to its limits? Or would they really rather do a beginner's workshop for SDL Trados Studio 2017 and show you all the cool features that memoQ has had for years and they probably never learned very well anyway? If it's not the first case, caveat emptor no matter the source.

Nov 20, 2016

Sweet Greek olives come to Portugal

The Good Doctor is widely travelled, and brings back to Portugal many interesting culinary ideas from around the world, using these to complement the traditions of her native land. So when I began to harvest olives from her trees to pickle for the coming year, she looked a little skeptically at the plastic water bottles full of crushed and slit olives and asked me, "Why don't you make sweet Greek olives?"

I had never heard of those before, and she could not tell me much about them except that she had bought some in a shop while driving through Greece some years ago, and they were rather good, so she would prefer that I make some of those instead of the usual spiced pickles all the local farmers do. OK, I said, and began to look for information on the Internet. Nothing useful was found in searches using terms in English, German and Portuguese. I found some pages talking about candied olives made from pickled ones, but nothing useful describing the process starting with fresh olives.

What to do? I asked a Greek colleague for help, and a few minutes later, she sent me a link to a web page in Greek which describes making sweet olives and olive jam.

Since I can't do much more with Greek than sound the words out and search my brain for possible derivatives in a language I know, it wasn't clear to me if I needed to work with any particular sort of olives, and I thought the suggested extraction time to remove the bitter elements from the raw olives was optimistic at best, so I took notes and prepared to "transcreate" the recipe for the olives I have available (based on my past experience picking them) and my own preferred approach to scaling recipes. Thus I arrived at the following recipe:

Azeitonas doces de Elvas
  1. Gather ripe, dark olives, de-stem and rinse them, then place them in clean one- to two-liter plastic bottles. Fill the bottles with fresh, cold water and cap them.
  2. Change the water daily for about two weeks, testing the bitterness of the olives until it is reduced to an acceptable level. The time needed will vary according to the olive variety, the degree of ripeness and your personal taste. The Greek recipe this one is based on suggests four or five days with daily water changes, but that is simply too little time for my olives and my taste.
  3. After the olives are debittered, cut the tops of the plastic bottles to remove the olives. Then use a de-pitter (a descaroçador de cerejas - a cherry pitter - will do the job) to remove the pits from the olives.
  4. Weigh the olives and place them in a saucepan or small pot.
  5. Add the same weight of water to the pan (so for 600 g of de-pitted olives, add 600 ml water).
  6. Add sugar to the pot amounting to 40% of the weight of the olives (which would be 240 g sugar for 600 g olives).
  7. Bring to a hard boil on high heat, and let the mixture boil for 20 minutes, with occasional stirring. Then remove from heat and allow to rest overnight.
  8. The next day, add more sugar to the pot - 20% of the weight of the olives (so another 120 g of sugar if you are working with 600 g of de-pitted olives). 
  9. Boil the mixture hard for another 20 minutes until the syrup thickens. Then remove from heat.
  10. Can the sweet olives in sterilized jars following the usual hygienic procedures or serve them fresh, warm or cold.
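
For anyone scaling the recipe to a different batch, the proportions in steps 5, 6 and 8 can be worked out with a few lines of Python. This is only a sketch of the arithmetic above; the function name and the dictionary keys are made up for illustration, and it assumes 1 g of water is close enough to 1 ml.

```python
def sweet_olive_quantities(olive_weight_g):
    """Scale the recipe from the weight of the de-pitted olives, in grams."""
    return {
        # Step 5: the same weight of water (assuming 1 g of water is about 1 ml)
        "water_ml": olive_weight_g,
        # Step 6: sugar on the first day, 40% of the olive weight
        "sugar_day_1_g": olive_weight_g * 40 / 100,
        # Step 8: sugar on the second day, another 20% of the olive weight
        "sugar_day_2_g": olive_weight_g * 20 / 100,
    }


batch = sweet_olive_quantities(600)
print(batch)
```

For 600 g of olives this gives the same figures as the recipe: 600 ml water, 240 g sugar on the first day and 120 g on the second.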



Nov 12, 2016

Trump this!


The Lay of the Politics Waged by Donald

I am the Trump, o hear my cry!
I'll fight for you to love my Lie.
I'll build a Wall to bend you over,
then take my turn with Vlad and Rover.
Injecting Hope in your back end,
I'll screw you green, but I'm your Friend.

I won't pay tax: I'm not a chump,
no plebe like you, I am the Trump!
I make the jobs, you do the work,
you working slobs, and like a jerk,
I'll keep your pay, 'cause it's my perk.

A plastic wife like mine ain't cheap,
nor my pet dog, Slick Mike the veep,
and Master Vlad, he wants his cut,
I'll keep the cash, he'll take your butt.
Russian winters are so, so cold,
but rampant bears are hot and bold!


In politics I make my luck
by giving Vlad a timely suck
and spread his word so true and pure,
like finest vodka from manure!

I am the Trump! O hear my cry!
I have the codes: prepare to die!




Oct 21, 2016

A day in the life....


One of the things I enjoy most about professional translation is the range of activities and subject matters that one can encounter, even as a specialist in a few domains. I can't say the work is never boring, but when it does drift that way, very suddenly it isn't any more. Quite unpredictably.

Yesterday I typed translations. A bit more than expected after two sets of PowerPoint slides - a small one to translate from German and another to edit the rather acceptable English - turned out to have about 8,000 words of highly specialized slide notes about military command and control structures and the technology of fighting forest fires. (Note to self: no matter how busy you are, always import those presentations into memoQ with the options set to extract every kind of text as well as the bitmap graphics if you have to translate those too. Then do a word count! Appearances can be deceiving.)

Yesterday I dictated translations. The job started out as a bunch of text fragments from slides, where context über alles was the rule, lots of terminology required research, and voice recognition offered no particular advantages, then suddenly it became the translation of a rather long lecture using all that new terminology, and the deadline was tighter than thumbscrews operated by an angry ex-girlfriend. Dragon NaturallySpeaking to the rescue. Not only was this necessary to finish the text in a long workday rather than most of a week, but the more natural style of translation by dictation suited the purpose of the translated presentation particularly well. I could imagine myself in the room with equipment vendors, military commanders, firefighting specialists and freight forwarders, talking about the challenges faced and the technology required to avoid the tragedies of an out-of-control firestorm. And the words came out, transcribed from my voice directly into the target text fields of memoQ, exactly as they should be spoken to that audience. And at the end of that long day my hands still had feeling in them, which would not have been the case if I had typed even a third of the text.

Yesterday I made a specialized glossary to share with a presenter who will travel halfway around the world to lecture with the slides I translated for his talk. Long ago I discovered that the way I produce translations has the potential to provide additional benefits for those who will use my work. Sales representatives might need to write letters to their prospects, discussing their products in a language not mastered as a native, and the vocabulary from my work may help them to improve communication and avoid confusion that might result from using incorrect or simply different words to describe the same stuff. Or an attorney might need a quick overview of the language I used to translate the pleading she intends to file, to ensure that it is consistent with previous efforts and will not complicate discussions with her client. The terminology I research and record for each translation can be exported and reformatted quickly to produce glossaries or more complex dictionaries in a variety of formats suited for purpose. Little time and often a lot of benefits for my clients.

Yesterday I translated bitmap graphics and not only had to deal with the editing tools for that but also had to consider the best strategy for transforming the original German graphics into English ones. Would those charts be translated again into other languages? Would the graphics be re-used in other types of documents, so that I should consider ease of portability in my approach to the translation? And how the Hell do I actually use that new bitmap graphics transcription and substitution for Microsoft Office files which was added to memoQ some time ago and sort out the five charts to translate from the fifty to ignore? (Maybe I should blog the solutions some day.)

And yesterday I was asked to write summaries of large, badly scanned articles so that the equipment manufacturer would understand how its latest technology was discussed by German reviewers. As a kid I had a silly fantasy about getting paid to read, and this is just one of the many ways it unexpectedly came true. But before I could get that far, these scanned files needed to be reworked so that they could be read and searched on the screen, so, as I described in a guest post on another blog some years ago, I converted them to searchable PDF/A with ABBYY FineReader, which in this case also reduced their size by about 75%. The video below also shows how this works. Strangely, when I describe this procedure to other translators, many of them don't get it, and they go on about converting PDF files into editable MS Word files or plain text, or, God help them, something really stupid like importing PDF files directly into a CAT tool for translation, though none of this really relates to my purpose. Conversions often contain errors, and many texts are harder to interpret when the context of an accurate layout is lost. So "text-on-image" PDF files for translation reference to the original source files are often critical, and for files to summarize or consult sporadically for reference (with many pages to look at and essentially nothing to translate), a searchable PDF is the gold standard for efficient work.


In the course of that day I had to work with two computers linked by remote access using four networks at various times, working in German, English and Portuguese (the latter mostly involving questions to the housekeeper on how to do an online pizza delivery order so I could stay in the office and keep working). I used well over a dozen software applications for necessary tasks. These, and the environments in which they operate, must be balanced carefully for efficient work. And even after some months in my new office, the balance isn't quite as good as I've had it before, and more attention to ergonomics is required.

Some colleagues are nostalgic for the "good old days" when they received a stack of paper to translate and sent off another stack of paper when the work was done, and they had a filing cabinet or a shelf of notebooks full of old work to use as reference material, and boxes of index cards stuffed full of scribbled notes on terminology next to seldom-dusty specialist dictionaries prepared by presumed experts, often full of marginalia commenting on errors or omissions and stuffed with papers bearing other scribbled notes. Not me. Since the day 30 years ago when I laboriously typed a text file full of file folder numbers and content descriptions for my research work and personal papers I have been a big believer in electronic retrieval of information wherever possible, and I miss retyping botched pages just as little as I miss the lines in the post office or the stress of dealing with delivery services.

I suspect that some feel a loss of control with the advent of new technologies in an old profession, and certainly the changes in the business environment for translation since the days of the typewriter often require a very different mentality to survive and thrive. What that mentality is, exactly, is a matter of healthy debate and often misunderstanding - again, because of the great diversity of the profession and the professionals and unprofessionals in it.

The greatest challenges of new technologies that I find are the same as those faced in many other kinds of work and in modern life in general. Filtering the overabundance of input for the few things that are truly of use or interest and maintaining focus and calm amidst omnipresent distractions. Not relying too much on technologies that are far more fallible than most people, even experts, realize or acknowledge. And remembering that a fool with a tool, however many features and failsafes it may offer, remains a fool.

Oct 8, 2016

SDL Trados Roadshow in Lisbon on November 16th!

Next month on Wednesday, November 16th, the SDL roadshow featuring the latest release of SDL Trados Studio will be coming to Lisbon, Portugal. The all-day event is free of charge, but registration is required.

A full afternoon of training on the SDL Trados Studio translation environment is included in the day. Even if you live and work in a country other than Portugal, this is an excellent opportunity to be briefed on one of the leading technologies for efficient translation work and then take a very long weekend to enjoy Europe's capital of cuisine and culture.

See you there?

Oct 7, 2016

Time enough for words

Ursula,
We lack the words, you say,
to describe the journey
through the dark borderlands
at the end of our time,
as once the want of words
for the ordinary
light which fills half the sky
cast shadows on our lives.
But words were there, waiting,
for untrained ears to hear.
School us now in the sounds
of old life's dialect.

Aug 24, 2016

memoQ autotranslatables: a partial antidote for drudgery

I'm currently working on a stack of legal pleadings for a patent nullity suit – lots of "urgent" words to churn by the end of the week. And after 10,000 or so of them, I got pretty damned tired of typing out the translation of text citations of the form "Spalte 7, Zeilen 34 bis 45" as "Column 7, Lines 34 to 45".

In fact, it was really starting to piss me off. In such situations, I try not to get mad but to get an autotranslatable ruleset instead. This is perhaps one of the most under-utilized productivity tools in memoQ.


So the next time I ran into a text that fit that format, the translation was offered as an autocompletable phrase as soon as I typed the first letter:


Of course life isn't usually that simple, at least not life with technology. And authors? Well, they seem to believe firmly in the old saying that "consistency is the hobgoblin of little minds". So of course the text also includes lots of references in the form "Spalte 7, Zeilen 34 - 45", with or without spaces around the hyphen. No problem, just add a rule for that (or if you are more clever, edit the single rule to cover the variations):
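
As a rough idea of what one rule covering the variations might look like, here is a sketch in Python. It is a guess based on the formats quoted above, not the exact rule I use in memoQ, and Python's regex dialect is not the .NET one that memoQ runs on; in particular, Python writes the groups in the replacement as \1, \2, \3 where a memoQ target rule would use $1, $2, $3. The names SOURCE_RULE and TARGET_RULE are just labels for this example.

```python
import re

# One source pattern for "bis" and for a hyphen with or without spaces around it.
# (?:bis|-) is a group that is matched but not numbered, so the three
# numbers remain groups 1, 2 and 3.
SOURCE_RULE = r"Spalte\s+(\d+),\s*Zeilen\s+(\d+)\s*(?:bis|-)\s*(\d+)"

# Python replacement syntax; the memoQ equivalent would use $1, $2, $3.
TARGET_RULE = r"Column \1, Lines \2 to \3"

samples = [
    "Spalte 7, Zeilen 34 bis 45",
    "Spalte 7, Zeilen 34 - 45",
    "Spalte 7, Zeilen 34-45",
]

results = [re.sub(SOURCE_RULE, TARGET_RULE, s) for s in samples]
for line in results:
    print(line)
```

All three samples come out as "Column 7, Lines 34 to 45".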



Now I am not one to advocate that the unwashed masses of translators – or even the washed ones – run out and learn to write regular expressions. I've programmed more computer languages and systems than I can possibly remember for about 45 years now, and I can't keep most of the autotranslatable rules in my head if I don't use them for a week or more after yet-another-refresher, so it would be stupid and hypocritical of me (or just bloody naive) to expect most people to mess with nerdy shit like this. But....

... a few simple rules and a couple of nice "recipe templates" to start can go a long way. And sometimes it pays not to be too clever; I have one highly sophisticated set of rules for complex legal citations that was written by a professional programmer, and it's unusable. Takes minutes to load even on a very fast computer, which is a huge pain in the backside every time a project is opened in memoQ. My more verbose, brute force approach to legal reference autotranslation may not be elegant, but it loads much faster and covers 90% or more of what I encounter. Maybe a case where it's smart to be a little stupid.

There are lots of good tutorials out there on regex (regular expressions), including a few YouTube webinar videos from Kilgray, the memoQ Help, a few chapters in old books of mine, discussions in the Yahoogroups lists and more.

The examples above require the knowledge of only a few rules:
  • Chunks of the source text to be analyzed are grouped in parentheses. In the examples shown, those groups are merely where numbers occur.
  • Numbers are represented by the escape code "\d". If there might be more than one digit, add a plus sign: \d+.
  • Spaces are represented by the escape code "\s". In the rules you can usually just type a space instead, but if you have to cover cases where it might be missing or where more than one might have been typed (usual sloppiness), then use the escape code, followed by an asterisk, which means "zero or more" of whatever it is put after: \s*.
  • For the rest of the text to match, you can usually type it just the way it occurs as I have done above. For the target translation rules, you can usually just type the literal text you want, with the groups represented by the numerical order in which they occur, preceded by a dollar sign. So the first group (parentheses set) in the source is $1, the second is $2, etc. Of course the order can be changed in the target; it's just not necessary in this case, but in autotranslatable rules for dates this happens rather often.
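
The few rules above can be tried out away from memoQ with a short Python sketch. Some assumptions: Python's regex engine is not the .NET engine memoQ uses (the simple elements here behave the same way in both), the helper apply_rule is a made-up name, and it only converts the $1-style group references into the \g<1> form Python expects. The date pattern is an invented example of reordering groups, not one of my production rules.

```python
import re


def apply_rule(source_rule, target_rule, text):
    """Apply a source/target rule pair written with $1, $2, ... group references."""
    # Convert $1, $2, ... into Python's \g<1>, \g<2>, ... replacement syntax.
    python_target = re.sub(r"\$(\d+)", r"\\g<\1>", target_rule)
    return re.sub(source_rule, python_target, text)


# Groups in parentheses, \d+ for one or more digits, literal text for the rest.
citation = apply_rule(
    r"Spalte (\d+), Zeilen (\d+) bis (\d+)",
    "Column $1, Lines $2 to $3",
    "siehe Spalte 7, Zeilen 34 bis 45",
)
print(citation)

# Changing the group order in the target, as often happens with dates.
# The \. matches a literal period.
date = apply_rule(r"(\d+)\.(\d+)\.(\d+)", "$3-$2-$1", "24.08.2016")
print(date)
```

The first call keeps the groups in their original order; the second turns a day.month.year date into year-month-day simply by listing $3, $2 and $1 in that order.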
Not only will the little rules I wrote for this big job save me a lot of typing, I can also use them in a QA profile to check that I have made no errors by switching numbers, missing a space or anything else in my translation. That is done by marking the appropriate checkbox on the first tab of the QA profile you plan to use:
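
The logic of such a check is easy to picture outside memoQ, too. The Python sketch below is not how memoQ's QA module is implemented; it merely illustrates the idea, with a made-up function name and a rule pair guessed from the citation formats discussed above (Python writes the groups as \1, \2, \3 rather than $1, $2, $3).

```python
import re

SOURCE_RULE = r"Spalte\s+(\d+),\s*Zeilen\s+(\d+)\s*(?:bis|-)\s*(\d+)"
TARGET_RULE = r"Column \1, Lines \2 to \3"


def missing_citations(source_segment, target_segment):
    """Return the expected citation translations that are absent from the target."""
    expected = [
        match.expand(TARGET_RULE)
        for match in re.finditer(SOURCE_RULE, source_segment)
    ]
    return [phrase for phrase in expected if phrase not in target_segment]


source = "Vgl. Spalte 7, Zeilen 34 - 45."
good = missing_citations(source, "Cf. Column 7, Lines 34 to 45.")
bad = missing_citations(source, "Cf. Column 7, Lines 43 to 45.")
print(good)
print(bad)
```

An empty list means every citation in the source was found in the expected form in the target; anything else, such as the switched digits in the second call, is reported.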


Perhaps such things are worth a little effort in your projects once in a while....