Jan 7, 2012

Translation tool concordances compared

A recent experience when tutoring a new memoQ user started me thinking about the way concordance searches work in various translation environment tools and how the results are displayed. The user, who was quite experienced with OmegaT, kept telling me that memoQ could not find examples of a term's use in the TM and she had to do all her searches in OmegaT. I was somewhat puzzled by that, and when I looked at her screen with the memoQ concordance dialog, I saw something like this:

The memoQ version 5 concordance dialog
Looks like the term ("Inverkehrbringen") was found. So what was the problem? For years she had looked at this concordance view:

The OmegaT concordance dialog
The differences in layout and the lack of highlighting of the key term (which memoQ aligns in the center of the concordance window, in the ancient KWIC or "keyword in context" display tradition) were unexpected and confusing to the new user.
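To make the difference between the two display styles concrete, here is a minimal Python sketch. It is not how memoQ or OmegaT actually implement their concordance views. The `kwic` function lines every hit up so that the term starts in the same column, as in the memoQ window. The `highlight` function keeps each segment whole and marks the term instead, which is closer to what the OmegaT user was used to. The function names, the `[[...]]` markers and the sample segments are all hypothetical.

```python
import re

def kwic(segments, term, width=30):
    """KWIC layout: one row per hit, with the term starting at the same column in every row."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    rows = []
    for seg in segments:
        for m in pattern.finditer(seg):
            left = seg[max(0, m.start() - width):m.start()]
            right = seg[m.end():m.end() + width]
            # Right-justify the left context so the term always begins at column width + 1
            rows.append(f"{left:>{width}} {m.group(0)} {right}")
    return rows

def highlight(segments, term, mark=("[[", "]]")):
    """Highlighted layout: each segment kept whole, with every hit wrapped in markers."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [pattern.sub(lambda m: f"{mark[0]}{m.group(0)}{mark[1]}", seg) for seg in segments]

if __name__ == "__main__":
    # Hypothetical sample segments, not taken from any real TM
    segments = [
        "Das Inverkehrbringen ist untersagt.",
        "Vor dem Inverkehrbringen muss der Hersteller die Konformität prüfen.",
    ]
    for row in kwic(segments, "inverkehrbringen", width=12):
        print(row)
    for seg in highlight(segments, "inverkehrbringen"):
        print(seg)
```

The KWIC rows make it easy to scan what comes immediately before and after the term, but they clip the context at a fixed width. The highlighted version keeps the whole segment readable at the cost of the term wandering around the screen.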

This inspired me to have a look at how various other tools display concordance results. I was not very happy with some of what I discovered, especially with some of today's leading commercial tools. I looked at Translator's Workbench (TWB) translation memories in SDL Trados 2007, concordancing in SDL Trados Studio 2009, Wordfast Pro (a very limited test because of the demo license and my inability to load my TMX test data), memoQ and OmegaT.

In terms of overall performance, the best results were obtained with OmegaT and "Trados Classic" (2007). Searching a huge TM gave results in a flash. Concordance searches with SDL Trados Studio 2009, on the other hand, really sucked with a big TM (EU data, about 400,000 TUs). I vacuumed my entire apartment and fed the dog while I waited for the result, and I wasn't even told how many hits were found. Unfortunately, my favorite working environment, memoQ, performed worst with the same big data set: it simply gave an error message. Further testing revealed that this error was due to the very large number of hits. (This would have been obvious had I bothered to read the dialog title in the first place.)

memoQ error message from too many concordance hits
So it looks like some development attention may need to be directed here. (Update: Kilgray's developers are actively working to remove this restriction.) Of all the tools I was able to test with the large TM, memoQ was the only one to fail this way. My personal TM, with about 10 years of my work in it, is nearly as large as my German/English EU legal test database, but concordance searches in it using memoQ are not unduly slow.

Other concordance views looked like this:

The concordance in SDL Trados 2007 - hits limited compared to OmegaT (see above)
SDL Trados 2009 - perhaps the easiest to read, but slower than molasses
Wordfast Pro - format not bad, but the test was limited due to the demo license
The Déjà Vu X concordance hasn't changed significantly in appearance in the latest version (DVX2). Once again, Victor Dewsbery was kind enough to provide me with screenshots of the two "scan" options for searching the translation memory. The initial scan produces only fairly close matches, while the "power scan" is more like the usual concordance, with the term embedded in a larger body of text (the non-matching parts being crossed out).

DVX2 scan (first click)



DVX2 Power Scan (second click)

I do have a license for the older version of DVX, but I didn't attempt any stress testing. While its performance with large TMs has always been good (my personal "Big Mama" is about 330,000 TUs), import and export of such data volumes are painfully slow. We're talking overnight. I hope the new version is better in that respect. On that score I must really give kudos to the OmegaT developer: loading the TM was even faster than with Trados Workbench, which for me has always been a benchmark of speed to aspire to. All you have to do to add a TMX file to the TM of an OmegaT project is to drop it in the "TM" folder of the project. Very nice :-)

I also received a screenshot of a search in Transit NXT from colleague Hans Lenting in the Netherlands. He searched for the term "Inverkehrbringen" in the German/Dutch EU dataset from the DGT:

STAR Transit NXT concordance search
As you can see, there are many ways to display data from a concordance search. Which do you find easiest to deal with? Personally, I love the insertion features of the memoQ concordance, but for readability I think some of the other tools are better. And I do like to know how many results I can expect from my data, and I might even want to view them all.
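For anyone curious about what a concordance search does at its simplest, here is a minimal Python sketch that scans a whole TMX file, returns every hit and so also tells you how many there are. It assumes a bilingual TMX with one source and one target variant per translation unit, and it uses a plain case-insensitive substring match. The commercial tools discussed above surely use indexes and fuzzier matching, so this is an illustration, not a description of any of them. The function name, its parameters and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def concordance(tmx_source, term, src_lang="de"):
    """Return every (source, target) pair whose source segment contains term (case-insensitive).

    Assumes a bilingual TMX: one <tuv> in src_lang and one in the target language per <tu>.
    """
    needle = term.lower()
    hits = []
    # iterparse reads the file incrementally; clearing each <tu> afterwards keeps memory down
    for _, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag != "tu":
            continue
        src = tgt = None
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            if lang.startswith(src_lang.lower()):
                src = text
            else:
                tgt = text
        if src is not None and needle in src.lower():
            hits.append((src, tgt))
        elem.clear()  # free the unit's children once it has been checked
    return hits

if __name__ == "__main__":
    # Tiny hypothetical TMX sample, not real EU data
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header srclang="de-DE" datatype="plaintext" segtype="sentence"
 adminlang="en" o-tmf="none" creationtool="none" creationtoolversion="0"/>
<body>
<tu><tuv xml:lang="de-DE"><seg>Das Inverkehrbringen ist untersagt.</seg></tuv>
<tuv xml:lang="en-GB"><seg>Placing on the market is prohibited.</seg></tuv></tu>
<tu><tuv xml:lang="de-DE"><seg>Der Hersteller haftet.</seg></tuv>
<tuv xml:lang="en-GB"><seg>The manufacturer is liable.</seg></tuv></tu>
</body></tmx>"""
    hits = concordance(io.BytesIO(sample), "inverkehrbringen")
    print(f"{len(hits)} hit(s)")
    for src, tgt in hits:
        print(src, "|", tgt)
```

Because every hit is collected, the total count comes for free. A naive full scan like this one would be far too slow to rerun on a 400,000-TU file for every query, which is presumably why real tools build an index first.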

10 comments:

  1. An interesting comparison, Kevin. I'm mostly very happy with memoQ's concordance capability, though there is one enhancement I would dearly love: the ability to filter and/or order concordance hits by TM.

  2. You and a lot of other people, Rob. That's one of the top wishes I have heard for a while. A little dropdown checkbox with all the active TMs and corpora, similar to the mechanism for termbase write selections, would be a great delight. The insertion functions are the most useful I have seen anywhere. But I do think that the visual representation of the info could be improved. Add highlighting instead of aligning the key word/phrase at the very least.

  3. Thanks for taking the time to carry out this comparison, Kevin. The superior concordance function is really the only thing that I miss from my Trados Classic. And the weird presentation of the search term in memoQ's function takes a lot of getting used to.

  4. I have been asking the guys/girls at memoQ to change the design of the memoQ concordance window for quite a while (in the memoQ list and via numerous Support feature requests), and they keep promising that they are going to change it and/or offer us an 'alternative table-like display', but so far it has remained exactly the same.

    See e.g.:

    - [Gabor:] 'Ultimately, memoQ's concordance window will have an alternative two-column layout, as you mention below. That will not happen in 4.5 though.' [Mon Oct 18, 2010 3:34 pm] &

    - 'Hello Michael, It's still coming. :) Gergely' [Wed Aug 10, 2011 1:22 pm]

    and...

    http://tech.groups.yahoo.com/group/memoQ/messages/15784?threaded=1&m=e&var=1&tidx=1
    http://tech.groups.yahoo.com/group/memoQ/message/21374
    http://tech.groups.yahoo.com/group/memoQ/message/16404
    http://wordbook.nl/screenshots/Concordance.jpg

    Let's hope it finally makes it into one of the next versions or editions!

    I think it's time Kilgray stopped adding cool new features and put some serious work into improving various aspects of the UI that we have been begging them to fix/change for a long time already (such as fixing the strange old-fashioned concordance window), and addressing basic missing functionality such as making it possible to REMOVE DUPLICATES FROM OUR %$#*!! term bases!!!

    Michael

  5. That bit about removing duplicates from termbases is more important than ever now that the term extraction function has been introduced. It is far too easy to load up the termbase with duplicates now. Actually, a whole suite of maintenance functions for TMs and termbases is desperately needed; the world can't rely forever on the free Okapi tools. I don't even think they are maintained any more.

  6. mmmh...
    I suspect that CAT tools are simply not the most suitable tools for concordance searches.

    Using LogiTerm Pro (latest release) instead, with my zillions of TUs, I get more results in just a couple of minutes.

    Admittedly, I then have to waste some clicks and time on copying and pasting (unless I use MS Word), but LogiTerm Pro is still unrivalled in that respect.

  7. Thank you for your good comment about LogiTerm Pro.

    Éric Théberge
    DÉVELOPPEMENT DES AFFAIRES/BUSINESS DEVELOPMENT
    Terminotix inc.

  8. FWIW, memoQ 6.0 (no official release date yet) will finally have a "new" concordance window, with the option to use either the new layout or the "old" one.

  9. That is excellent news, Denis. I was talking to a friend last night about this very issue, and she was complaining about various difficulties with the current concordance.

  10. If you are open to experiments and like to try new ideas, I suggest a collaborative translation platform, http://poeditor.com/. It saves a lot of time, as it is organized quite well. It is useful for translating websites or mobile phone apps, and it offers quite a few export options. Enjoy.


Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)