Oct 18, 2023

An Unfiltered Look at memoQ Filters (webinar, 19 October 2023, 15:00 CET)


 

This presentation and discussion covered some of the challenges and opportunities in improving memoQ project workflows through correct filter choice and design. Filters in memoQ have many different aspects, and the right choices for a given translatable file or project are not always clear; different options may offer particular advantages depending on your situation.

Cascading filters, an important feature for dealing with complex source texts, were also part of the talk: not just the basics, but also examples of going beyond what the visible memoQ features allow in order to do "the impossible". This session is part of the weekly open office hours for the course "memoQuickies Resource Camp", but everyone is welcome to attend these talks regardless of enrollment status. Those interested in full access to all the course resources and teaching may enroll until the end of January 2024.

To join sessions for the October and November office hours, register here.

After registering, you will receive a confirmation email containing information about joining the meeting. 

Here is an edited recording of the October 19th session; a time-coded index is available in the video description on YouTube:

Oct 5, 2023

What's wrong with my segmentation (in translation)?

The fifth open office hours session for the self-guided online course "memoQuickies Resource Camp" discussed segmentation problems in documents imported into translation environments such as memoQ, Trados Studio, Phrase, CafeTran Espresso, etc., and various ways to identify these issues so that they can be corrected.

Segmentation problems waste enormous amounts of time, and bad segmentation rules are a plague on the translation and localization service community. Unfortunately, nearly all the rules I have seen, in every working environment, simply suck sewage. memoQ's rules usually suck less, but still...

This week's talk presented, among other things, some methods for identifying segmentation trouble spots quickly and easily using special regular expressions that describe patterns common in badly segmented text. A Regex Assistant library has also been provided (and will be updated during the course period) to help with all of this.
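To illustrate the general idea of hunting for trouble spots with regular expressions, here is a minimal Python sketch. It is not taken from the course's Regex Assistant library, and memoQ itself uses .NET regex syntax rather than Python's. The pattern names, the abbreviation list and the function `find_trouble_spots` are hypothetical examples. Each pattern is a heuristic that will also produce some false positives.

```python
import re

# Hypothetical heuristic patterns for illustration only; they are NOT the
# entries of the course's Regex Assistant library.
TROUBLE_PATTERNS = {
    # Segment ends in a common abbreviation: the rules probably split after it.
    "ends_with_abbreviation": re.compile(
        r"\b(?:Art|No|Nr|Dr|Mr|Mrs|e\.g|i\.e|etc|vs)\.$"
    ),
    # Segment starts with a lowercase letter (Latin-1 range only, a
    # simplification): probably the tail of a sentence split too early.
    "starts_lowercase": re.compile(r"^[a-zà-öø-ÿß]"),
    # Segment ends in a one- or two-digit number plus a period, e.g. an
    # ordinal date ("19.") or a list number that triggered a false break.
    "ends_with_number_period": re.compile(r"(?:^|\s)\d{1,2}\.$"),
}


def find_trouble_spots(segments):
    """Return (index, pattern name, segment text) for each suspicious segment."""
    hits = []
    for i, seg in enumerate(segments):
        text = seg.strip()
        for name, pattern in TROUBLE_PATTERNS.items():
            if pattern.search(text):
                hits.append((i, name, text))
    return hits
```

Given segments such as `["The provisions of Art.", "12 shall apply."]`, the first one is flagged because the rules apparently broke the sentence after the abbreviation. In practice, the segment list might come from a text or spreadsheet export of a translation document.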

The video and related course pages will remain completely open to the public, with downloads available, at least through the end of 2023. After that the pages and resources may be taken down for updates and reorganization in other courses.

The video recording of the lecture "What's wrong with my segmentation?" can be watched on YouTube (embedded below); course participants can also download it from the course page by clicking the "segmentation rules" icon at the top of this article.


An important part of checking the performance of your segmentation rules, and possibly improving them, is having a good sample of test data. One of my favorite sources for this is the archive of the European Commission's Directorate-General for Translation (DGT), where EU legislation and other important information is available as a parallel corpus in all the official languages of the Union.

I have downloaded part of the 2022 DGT distribution and prepared a number of monolingual and bilingual corpora (about 2.6 million words, approximately 150,000 TUs) for EU languages and language pairs. I have also published information on my method so that others can reproduce it for the languages that interest them.
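For readers who want to experiment before consulting the published method, here is a rough Python sketch that pulls bilingual segment pairs out of a TMX file. It is not the author's procedure. It assumes well-formed TMX that ElementTree can parse, and because language attributes vary between exports, it accepts either the TMX 1.4 `xml:lang` attribute or a plain `lang` attribute. The function names `extract_pairs` and `pairs_from_text` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Fully qualified name ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_base_language(tuv):
    """Return the base language code of a <tuv>, e.g. "EN" for "EN-GB".

    Accepts xml:lang (TMX 1.4) or a plain "lang" attribute; supporting both
    is an assumption about the input, not a DGT specification.
    """
    lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return lang.split("-")[0].upper()


def extract_pairs(tmx_source, src_lang, tgt_lang):
    """Yield (source, target) texts from a TMX file path or binary file object.

    A translation unit is kept only if it has a non-empty segment in both
    languages; inline markup inside <seg> is flattened to plain text.
    """
    wanted = {src_lang.upper(), tgt_lang.upper()}
    root = ET.parse(tmx_source).getroot()
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            text = "".join(seg.itertext()).strip()
            base = tuv_base_language(tuv)
            if base in wanted and text:
                texts.setdefault(base, text)  # keep the first variant found
        if wanted <= texts.keys():
            yield texts[src_lang.upper()], texts[tgt_lang.upper()]


def pairs_from_text(xml_text, src_lang, tgt_lang):
    """Convenience wrapper for TMX content that is already held in a string."""
    data = io.BytesIO(xml_text.encode("utf-8"))
    return list(extract_pairs(data, src_lang, tgt_lang))
```

A monolingual corpus is then just the first (or second) element of each pair, and a tab-delimited bilingual file can be written by joining each pair with a tab character.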

Oct 1, 2023

Bring the lightning.

Quality is a slippery notion, especially when discussing it with those whose ethical approaches to providing services are even slipperier. According to one well-known figure in the trashlation sector, "Quality doesn't matter". Knowing that individual as I do, I am aware that this utterance was intended as a provocation, and that it is likely backed by some almost-persuasive sleight of hand involving differing definitions and whatnot. Given the variability in the human emotional perception of quality (as with obscenity, I cannot define quality, but I know it when I experience it), the many attempts one sees to quantify it in language services seem all the more absurd.

All the myriad process definitions, ISO certifications, stamps and seals of sinlessness, diplomata, grants of honoris causa et cetera cannot transform the humble lightning bug into a Bolt of Zeus.

Nor are Large Language Models (LLMs) capable of such linguistic transubstantiation, but rather the opposite. The predictive practices at their core could take a training feed of all the world's great literature (and likely already have), and yet the output would be nothing more than an insipid averaging of the basest mediocrities. Only the basest of the mediocre could mistake such text for objectively good quality.

Were we to plot the degree of enthusiasm for AI as the "future" of trashlation against the degree of actual understanding and competence for good language, the graph would look something like this:

But a recent article in The Economist suggests a better way. Curiously, it is a process I resort to myself when the greatest subtlety and balance are needed in a work, for example in the translation of good poetry, or a letter of condolence occasioned by the loss of a belovèd child.

Back to pen on paper, where the pressure of the nib is an expression in itself, as are the sweeping flourish of a final letter and a well-executed ligature.

"But that's ABSURD!!!" some might protest, glancing nervously at their smartphone timers counting down to the next due delivery of linguistic sausage. Much too slow, some might think. But is it? Really?

"But you need to run QA and you can't do that with a sheet of scribbles on paper!" some might suggest, more reasonably. Ah, but I can: I merely dictate the text, which I will already have read aloud time and again as I refined the words and their rhythm, and then, in good electronic form, all the slings and arrows of outrageous regex are my quality arsenal.

We have a slow food movement. Perhaps if we want more delicious, digestible, properly communicative words in our translated lives, we should slow the fuck down and let them crystallize, with exquisite subconscious fractal creativity, to form bolts of emotion and understanding that pierce the veil between this world and others as they flash across a page.

As the morlocks cower in their caves and hovels, tapping tiny tablets in their claws, prompting their artificial gods to take this terror of meaning from their shriveled world.