The need for proper documentation of such efforts does not end there, however. It is very important, especially for more complex sets of rules, that there be clear documentation of the purpose and logic of the rules developed, and that this documentation be present
- in the rules themselves (as comments) and
- in external documents to be used as references for troubleshooting, maintenance and further development.
Auto-translation rules and other resources using regular expressions should not be written and maintained for the long run in memoQ itself or in any other environment which does not allow thorough commenting of the regular expressions used. Without comments, it is simply too easy to destroy functioning rules by forgetting why they were written a certain way once upon a time. An environment that supports comments also allows old rules to be "commented out" (disabled, but still available for reference or later re-use) while new versions are tested. That is basically impossible with memoQ's internal resource editors at the present time. And to make matters worse, if auto-translation rules are edited inside memoQ, their order changes, sometimes with dire consequences if functionality depends on the rule order. Try sorting out problems like that in a set of 70 or so rules.
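To illustrate what thorough commenting and "commenting out" can look like in an external environment, here is a minimal sketch in Python. It is not memoQ syntax, and the date rule and the names `DATE_RULE` and `convert` are invented purely for the example; Python's `re.VERBOSE` flag simply happens to be one convenient way to keep comments next to each part of an expression.

```python
import re

# Hypothetical rule: convert dates like 31.12.2023 to ISO form (2023-12-31).
# re.VERBOSE lets whitespace and "#" comments live inside the pattern itself.
DATE_RULE = re.compile(
    r"""
    \b
    (?P<day>\d{1,2})     # day, one or two digits
    \.                   # literal dot separator
    (?P<month>\d{1,2})   # month, one or two digits
    \.
    (?P<year>\d{4})      # four-digit year only; two-digit years left alone on purpose
    \b
    """,
    re.VERBOSE,
)

# Older version, "commented out": disabled, but kept for reference while
# the new version above is being tested.
# DATE_RULE = re.compile(r"(\d\d)\.(\d\d)\.(\d\d\d\d)")


def convert(text):
    """Rewrite every matching date in text as YYYY-MM-DD."""
    def to_iso(m):
        return "{}-{:02d}-{:02d}".format(
            m.group("year"), int(m.group("month")), int(m.group("day"))
        )
    return DATE_RULE.sub(to_iso, text)
```

A note like "two-digit years left alone on purpose" is exactly the kind of detail that gets forgotten a year later and then "fixed" with unpleasant results.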
Click to access a PDF sample of my rule development record (2 pages)
The graphic above is one example of how I maintain my personal records of some work developing regular expressions. I usually include
- descriptions of all information recorded
- a specific example on which I will base the general rule
- a simple ("fragile") version of the rule part (source input and target output) with only the most essential elements; this is not error-tolerant, but it is the easiest to understand and the first place to look if something isn't working as I would like it to
- more robust variations which take into account differences in spacing, punctuation, etc. or include things like non-breaking spaces that might be desired in the output (this can get cluttered and hard to read)
- color-marking for easier identification of some elements
- comments about why things are written as they are or about possible improvements or problems
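The difference between a simple ("fragile") version and a more robust variation can be sketched as follows. This is only an illustration in Python under invented assumptions: the currency rule, the names `FRAGILE`, `ROBUST`, `fragile` and `robust`, and the choice of output format are all hypothetical and not taken from my actual records.

```python
import re

NBSP = "\u00A0"  # non-breaking space wanted in the output

# Fragile version: only the essential elements, exactly one ordinary space.
# Easy to read, and the first place to look when something goes wrong.
FRAGILE = re.compile(r"(\d+),(\d\d) EUR")

# Robust version: tolerates missing or repeated spaces, non-breaking
# spaces in the source, and the euro sign in place of "EUR".
ROBUST = re.compile(r"(\d+),(\d\d)[ \u00A0]*(?:EUR|€)")


def fragile(text):
    """12,50 EUR -> EUR 12.50 (ordinary space in the output)."""
    return FRAGILE.sub(r"EUR \1.\2", text)


def robust(text):
    """Same conversion, but error-tolerant and with a non-breaking space."""
    return ROBUST.sub(
        lambda m: "EUR" + NBSP + m.group(1) + "." + m.group(2), text
    )
```

Keeping both versions side by side in the record means the readable one documents the intent, while the cluttered one does the real work.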
This record is a template of sorts from which rules can be assembled very quickly or re-purposed for other languages or formats in a way that is easy to follow and makes mistakes easy to catch. Such records are also helpful if the rules are to be shared with other developers or maintained by someone else.
My example is certainly not the final word in project documentation for such efforts; it is simply part of a set of personal tools to help me work more efficiently with the limited time I have. Professional development and consulting organizations often have far more extensive and detailed systems of project documentation; when I was part of one such shop nearly 20 years ago, my (downloadable) 2-page example might easily have filled twenty pages of very important-looking professional technobabble. Life's too short for shit like that anymore.
But if you value your time as a developer or your investment as one who hires others to develop such useful rules, it pays big dividends in most cases to demand some sort of clear, systematic and accurate record of how your special rules, filters, etc. were developed so that they can be maintained and improved in the future.
I highly recommend RegexBuddy as an excellent tool for crafting regex, with its wizard and well-commented rules library, featuring color coding. And I fully agree with you on the benefit of inserting comments in the rule itself.
I think that is a favorite resource for Paul Filkin too. Paul has written a lot of nice regex tutorials (like this one: https://multifarious.filkin.com/2012/08/24/regex-pt1/) and he recommends it too.
Very interesting! Thank you!