An exploration of language technologies, translation education, practice and politics, ethical market strategies, workflow optimization, resource reviews, controversies, coffee and other topics of possible interest to the language services community and those who associate with it. Service hours: Thursdays, GMT 09:00 to 13:00.
May 5, 2012
memoQuickie: fixing source segmentation from abbreviations
Do you see segmentation like the above in your projects? Annoying, right? This is easy to fix in memoQ.
Go to Tools > Options... > Default resources > Segmentation rules (in the row of icons):
Select the language (including sublanguage if relevant) and select the editable rule set, then click Edit.
On the tab for custom lists, add the offending abbreviations to the #abbr_short# list.
Re-import the document(s) on the Translations > Documents list of the Project home tab. The number of segments in my document was reduced from 197 to 134, because it was so laden with academic titles. Since I use versioning, any previously translated segments can be recovered quickly by Operations > X-Translate...
Sometimes it seems that the abbreviations I added aren't fixing the segmentation. In those cases I have usually switched to a sublanguage whose rule set does not contain them.
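For readers who want to see why an abbreviation list changes the segment count, here is a minimal Python sketch of the general idea. It is not memoQ's rule engine, and the names `segment` and `rule_sets` are hypothetical, invented for this illustration. It assumes a very simple rule (break after sentence-ending punctuation followed by whitespace) and an exception list that suppresses the break. It also mimics the sublanguage pitfall by keeping a separate list per language code.

```python
import re

def segment(text, abbreviations=()):
    """Break after '.', '!' or '?' followed by whitespace, unless the
    word ending at that point is a listed abbreviation.
    Simplified illustration only; not memoQ's actual rule syntax."""
    abbr = {a.lower() for a in abbreviations}
    segments = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Text from the current segment start up to and including the punctuation
        candidate = text[start:match.start() + 1]
        words = candidate.split()
        last_word = words[-1].lower() if words else ""
        if last_word in abbr:
            continue  # listed abbreviation: no break here
        segments.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

# Hypothetical per-language lists: the abbreviations were entered for
# "de" only, so a project set to "de-DE" does not benefit from them.
rule_sets = {
    "de": ["Prof.", "Dr."],
    "de-DE": [],
}

sample = "Prof. Dr. Schmidt gave the lecture. It started at nine."
print(segment(sample, rule_sets["de-DE"]))  # breaks at Prof. and Dr.
print(segment(sample, rule_sets["de"]))     # titles stay in one segment
```

With the empty list, the sample splits into four segments. With the abbreviations in place it splits into two, which is the same effect, on a small scale, as the drop from 197 to 134 segments described above.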
I have this problem all the time and didn't know how to fix it - thanks for the tip.
Thank you for this tip. It's very useful for me.
I wish it were that simple. Funnily, I've got the very same segmentation break at Prof. Dr. No biggie, except if you're working for a German university. And no matter what I try, it still breaks at Prof. and at Dr. (I went through the motions you described, double-checking that the languages are correct: German AND German (Germany), just to make sure, although the project is set to German (Germany).) Not sure what I can do about this. I think part of the problem might be that I have to run cascading filters to prepare the source texts (TYPO3 XML filter, then HTML, then Regex tagger), so maybe memoQ just skips the segmentation step at the end. Just a hunch. I seem to remember that "ca." was also segmented last time, and just now I saw that it is included in the default segmentation rules.
Apart from that, I just wanted to say that I've been a regular reader of your blog posts (and pre-blog posts in ProZ), and it's given me plenty of great ideas throughout my freelance years. Plenty of great advice there and usually a highly enjoyable read.
Interesting problem, Andrew. I'll have to test whether cascading filters have any effect in cases where I can design a simulation. The specific case you mention here interests me, as I have a current project question involving TYPO3 exports. Would you mind dropping me a private mail? Perhaps you may know the information I seek.
Oh yes... also wanted to say that I hope you reported this trouble to Kilgray Support. That is very important. I hear a lot of problems with various memoQ modules from colleagues, but in most cases nothing has been said to the team that makes the software. It's unreasonable to expect improvement if you haven't done your part to inform them of the problem. If enough people raise an important point it will probably get attention sooner. Like any sensible provider, Kilgray makes many of its development decisions based on the perception of how many people actually want the solution.