Mar 23, 2010

The state of OmegaT


Some time ago I asked Marc Prior, the head of the OmegaT project, if he would like to share some information on developments with this popular freeware translation environment tool, and he has now kindly obliged. Although I need a bit more for my commercial work, this tool has some interesting aces up its sleeve. I seem to recall it working last year for Amharic, for example, where leading commercial tools bit the dust. Now I yield the virtual stage to Marc....

Those of you who follow this blog will know that one of Kevin's interests is CAT tools and translation memory. In particular, he has gone to some lengths in the past to facilitate compatibility between the various tools available (particularly where Déjà Vu is involved), and he and I have had some long discussions concerning OmegaT's compatibility. I recently pointed out to him that OmegaT had seen quite considerable development over the last year or so, and invited him to take another look at it. Instead, he suggested that I summarize the changes here for readers' benefit. So here I am.

Most (but not all) of the programming work on OmegaT over the last year has been done by Alex Buloichik. A huge number of changes have been made; many are minor but may be very important to some people, such as the addition of a LaTeX filter or support for three-digit ISO language codes. The description below is limited to the most significant changes in terms of new functions.

Some of the features may not yet be available in the “stable” version (which is offered for download by default). Users are encouraged to try what the OmegaT team describes as the “beta version”; the development team takes great care to ensure that the code in this version is also stable (some of the commercial competition could learn some lessons from OmegaT in this regard), and the “beta” status refers primarily to the fact that the accompanying documentation is not up to date.

Notable new features:

·        TransTips: when a term in the current segment is found in the glossary, the term is underlined in the editor pane to draw the translator's attention to it. Right-clicking on the underlined term calls up the available target terms, and clicking on a selected term inserts it into the translation.

·        Selecting text in the glossary pane and right-clicking on it also causes it to be inserted.

·        Automatic update: changes made “on the fly” to the glossary take effect immediately; it is no longer necessary to reload the project in order to see them.

·        CSV format: the glossary function now accepts files in .csv format, making it easier (among other things) to use Excel to manage glossaries.

·        Stemming: “stemming” refers to the process whereby words (in the source text) are reduced to their stems. This enhances fuzzy matching, and also means that glossary terms are displayed even if an inflected form is encountered. The stemming feature (termed “tokenizers”) requires installation of a separate plug-in.

·        Machine translation: if the relevant option is selected, OmegaT sends the current segment to Google Translate and displays the resulting translation in a separate window. Translators can then make of that what they will. For those who love MT, it provides an “instant translation”. For those who think that MT is a waste of time, it provides ammunition!

·        Tags: we all hate them, especially when they don't have any useful function. Where a segment contains numerous superfluous tags, a shortcut now enables them all to be inserted at once (for example at the end of the segment).

·        Dictionaries: a window is now provided in which content can be displayed from external, third-party dictionaries such as StarDict and Lingvo DSL.

·        Translation units: these now contain information on the author and the change date, and can be searched for using this information.

·        Interface for scripting languages: the source and target texts of the current segment are now exported to plain-text files, enabling people with programming skills (even if only rudimentary) to add new features of their own.

·        Performance: various performance enhancements have been made. In particular, data for matching is now computed on demand, rather than during project loading, making project loading much faster.

·        Spelling checker: misspelled words are underlined.
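
To illustrate how the CSV glossary and stemming features described above can work together, here is a minimal Python sketch. It is not OmegaT's code and does not reproduce its tokenizer plug-in: the suffix rules are deliberately crude, the glossary content is invented, and all names (`naive_stem`, `load_glossary`, `find_terms`) are hypothetical. It only shows the principle that reducing words to stems lets a glossary entry match an inflected form in the source segment.

```python
import csv
import io
import re

# Hypothetical glossary in CSV form: source term, target term.
GLOSSARY_CSV = """translation memory,Translation-Memory
segment,Segment
glossary,Glossar
"""

# Crude (suffix, replacement) rules; a real stemmer is far more careful.
SUFFIX_RULES = (("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", ""))


def naive_stem(word):
    """Lower-case a word and strip the first matching suffix."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def stems(text):
    """Split text into words and stem each one."""
    return [naive_stem(w) for w in re.findall(r"\w+", text)]


def load_glossary(csv_text):
    """Read (source, target) pairs from CSV text, skipping short rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def find_terms(segment, glossary):
    """Return glossary entries whose stemmed source term occurs in the segment."""
    seg = stems(segment)
    hits = []
    for source, target in glossary:
        term = stems(source)
        n = len(term)
        # Compare the term's stems against every window of the same length.
        if term and any(seg[i:i + n] == term for i in range(len(seg) - n + 1)):
            hits.append((source, target))
    return hits


if __name__ == "__main__":
    glossary = load_glossary(GLOSSARY_CSV)
    text = "The glossaries are checked against all translation memories."
    for source, target in find_terms(text, glossary):
        print(source, "->", target)
```

In this sketch the plural forms “glossaries” and “translation memories” still find the singular glossary entries, which is the effect the stemming feature aims for.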

Incidentally: the OmegaT project is run by volunteers, and welcomes offers of support, particularly from translators willing to help translate the program, the documentation or the website into their own languages.

4 comments:

  1. As a relatively long-term fan of OmegaT, I'm happy to hear of these continued developments to the software.

    Thank you to the team for your great work.

    If I can help with testing, I'd be happy to help.

  2. Kevin, I really appreciate your sharing your experience(s) with us. I learned a lot by reading your blogs. Thank you.

    - John Bunch

  3. Hi Kevin, thanks for this great summary of OmegaT's recent feature additions.
    I hadn't thought of using the plain-text export function for scripting; I will look into it.

  4. Thank you very much for this great tool!

