Over the past few days I have been updating some training documentation and running a lot of tests on tagged files as part of this. During this work, I have been struck time and again by the differences in the tags "found" by different tools working with the same file. Sometimes one tool looks better than another, but the patterns are not always consistent. What is most consistent is the ability of CodeZapper to clean up the files in various versions of Microsoft Word and make the tag structures appear a little more uniform.
Here's an example of the same DOCX file "unzapped" in several tools:
Import into memoQ 5, as-is, no tag clean-up. Previous versions of the same file showed more tags in places.
SDL Trados Studio 2009 before tag clean-up.
TagEditor in SDL Trados 2007 before tag clean-up.
Initially, OmegaT would not import that particular DOCX without a tag cleanup. I reported the problem to the developers, who upgraded the filter to handle a previously unfamiliar character in the internal paths of the ZIP file (a DOCX file is really just a renamed ZIP package, like many other file types). See http://tech.groups.yahoo.com/group/OmegaT/message/23931 for information on the new release. Opening, editing and re-saving the troublesome file also enabled it to be imported, even without the bugfix in the latest version. Users should keep that trick in mind if they encounter a similar problem. I've had to do the same with other tools in the past, so this is probably a good general tip regardless of what tool you use. When I downloaded and tested the latest standard release of OmegaT (2.3.0_4), the tag structure looked fine; no zapping of the DOCX was necessary in this case.
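Since a DOCX is just a ZIP package, its internal paths are easy to inspect when an import fails. Here is a minimal Python sketch of that idea. It is not what the OmegaT filter does. The "safe" character set and the helper names are my own assumptions for illustration, and I don't know which character caused the original problem.

```python
import io
import string
import zipfile

# Assumption: a conservative set of characters commonly seen in DOCX part names.
SAFE_CHARS = set(string.ascii_letters + string.digits + "/._-[]")


def list_package_parts(docx):
    """Return the internal paths stored in a DOCX (ZIP) package.

    `docx` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def unusual_parts(names):
    """Return the internal paths containing characters outside SAFE_CHARS."""
    return [name for name in names if not set(name) <= SAFE_CHARS]


def build_demo_package():
    """Build a tiny in-memory ZIP with DOCX-like paths (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/media/image 1.png", b"")
    return buffer
```

Calling `unusual_parts(list_package_parts("some_file.docx"))` on a file that refuses to import would show whether any internal path looks out of the ordinary; the file name here is only a placeholder.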
After treatment with CodeZapper, the file looked the same in memoQ (where the extra tags weren't present in the first place, though one can't count on things always being this way). The view in Trados Studio and TagEditor improved significantly, though both still showed more tags than memoQ, and OmegaT accepted the DOCX after tag cleaning.
SDL Trados Studio 2009 import of the DOCX file after tag cleanup with CodeZapper.
SDL Trados 2007 TagEditor import of the DOCX file after tag cleanup with CodeZapper.
OmegaT import of the DOCX file after tag cleanup with CodeZapper (OmegaT 2.3.0_3).
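For a rough, tool-independent impression of how fragmented a DOCX is before and after cleanup, one can count the text runs in each paragraph of `word/document.xml`. The Python sketch below does only that. It is an approximation under my own assumptions: every translation tool merges runs in its own way, so these numbers will not match the tags any particular tool displays, and the demo document is made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_per_paragraph(docx):
    """Return the number of runs (w:r) directly inside each paragraph (w:p).

    Runs nested in other elements (hyperlinks, for example) are not counted.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [len(paragraph.findall(f"{W}r")) for paragraph in root.iter(f"{W}p")]


def build_demo_docx():
    """Build a minimal in-memory package with one fragmented paragraph (hypothetical)."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Frag</w:t></w:r><w:r><w:t>men</w:t></w:r>"
        "<w:r><w:t>ted</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Clean</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer
```

Comparing the counts for the same file before and after zapping gives a quick sanity check that the cleanup actually reduced the fragmentation.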
It is important to consider that superfluous tags mean wasted work time with formatting and QA corrections, and perhaps even a higher risk of file failure (such as the inability to import the file at all into one tool). This is why, for some time now, I and others have advocated modifying the costing of volume-based translation work to include the number of tags. This requires, of course, that you have access to a counting tool which reports the number of tags (SDL Trados Studio does this, Atril's Déjà Vu has long offered this feature, and memoQ even allows you to assign a word or character "weight" to tags for counting purposes). This is the only fair way I know of to account for the extra work (besides time-based charges). Consider that everyone is affected: translators, reviewers and project managers! I've had to talk more than one of the last group through "tag rescue" techniques after hours.
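To make the tag-weighted costing concrete, here is a small Python sketch of the arithmetic. The function names, the weight of 0.25 words per tag and the rate are purely hypothetical figures for illustration, not values taken from any tool or price list.

```python
def weighted_volume(words, tags, tag_weight):
    """Return the billable volume: words plus tags converted at a word weight."""
    return words + tags * tag_weight


def job_price(words, tags, tag_weight, rate_per_word):
    """Return the price for a job, rounded to two decimal places."""
    return round(weighted_volume(words, tags, tag_weight) * rate_per_word, 2)
```

With 1000 words, 80 tags and a weight of 0.25, the billable volume is 1000 + 80 × 0.25 = 1020 words. The same text with only 10 tags after cleanup would come to 1002.5 words, which shows how tag clutter translates directly into cost.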
Perhaps it is worth considering as well that cleaner tagging will also improve "leverage" (match quality) in translation memories. So if a tool consistently offers cleaner tag structures for a variety of source formats, working efficiently with that tool to manage projects will save time and money, on top of the time and money saved by using the CodeZapper macros on MS Word files.
Perhaps it's worth adding that CodeZapper is now integrated into DVX2. So Dave's genius is now enhanced by Daniel B's brilliance, and I get the best of both.
Thanks for confirming that, Victor. I had heard a rumor about its integration recently, and I think a move like that is long overdue. In fact, I think it would be an excellent idea in general for translation environment tools which import MS Office files to enable any MS Word VBA macro(s) to be run for pre-processing. This would simplify the handling of "external views" by other environments.