Imagine a dictionary project with thousands of words with their definitions, each "entry" being stored in a separate text file. How would you translate that efficiently?
The brute force method of opening and translating each file individually is not very satisfactory. Not only does this take a long time, but whenever I have tried foolishness like that, I have tended to overlook some files and spent far too much time checking that nothing was missed. And QA measures like spellchecking? Let's change the subject....
Some translation tools offer the possibility to "glue" the content of the little files together and then (usually) "unglue" them later to reconstitute the original structure of little files, now translated.
Other tools offer various ways to combine content in "views" to allow translation, editing, searching and filtering in one big pseudofile. This is very convenient, and this is the method I use most often in my work with memoQ or SDL Trados Studio after learning its virtues earlier as a Déjà Vu user.
Unusual file formats can often be dealt with the same way after some filter tweaking or development. But sometimes....
... there are those projects from Hell where you have to ask yourself what the customer was smoking when he structured his data that way, because some other way would be so much more practical and convenient... for you. Ours is generally not to question why some apparently insane data structure was chosen but to deal with the problem as efficiently as possible within budget and charge appropriately for any extra effort incurred. Hourly fees for translation rather than piece rates certainly have a place here.
Sometimes there is a technical solution, though it may not be obvious to most people. For example, in the case presented to me by a colleague on Christmas Eve, the brief was to write the translation in language XX in the empty cell in that column of the 3x2 table embedded in a DOCX file. There were hundreds of these files, each containing a single word to translate.
If these were Excel or delimited text files, a simple solution would have been to use the Multilingual Delimited Text Filter for memoQ and specify that the first row is a header. But that won't fly (yet) for MS Word files of any kind.
In the past when I have had challenging preparation to do in RTF or Microsoft Word formats - such as when only certain highlighted passages are to be translated and everything else is ignored - I have created macros in a Microsoft Office application to handle the job.
But this case was a little different. The others were always single files, or just a few files where individual processing was not inconvenient. And macro solutions often suffer from the difficulty that most mere mortals fear to install macros in Microsoft Word or Excel or simply have no idea how to do so.
So some kind of bulk external processing is called for. In this case, probably with a custom program of some kind.
I usually engineer such solutions with a simple scripting language - a dialect of the BASIC language which I learned some 45 years ago - using a free feature which is part of the Microsoft Windows operating system: Windows Script Host. One-off, quick-and-dirty solutions with these tools do not require a lot of skill. With a little research, the components of many solutions can be found on Microsoft Help pages or various internet forums, even if you have only a vague idea of what to do.
In this case, the tasks were to
- Select the files to process (all 272 of them)
- Open each file, copy the English word into the empty cell next to it
- Hide all the other text in the file so that it can be excluded from an import into a working tool like Déjà Vu, memoQ or SDL Trados Studio (using the options for importing Microsoft Word files in this case; the defaults usually ignore hidden text on import)
After that the entire folder structure of files could be imported into most professional translation support environments and all 300 or so words to translate could be dealt with in a single list view.
A more detailed definition of the technical challenge would note that manipulating data in a Microsoft Office file format usually means programming against the object model of the relevant application (for XML-based formats such as DOCX there are other possibilities that some might prefer).
Microsoft kindly makes the object models of all its programs available, usually for free, and there is a lot of documentation and examples to support work with them. That may in fact be a problem: there is a lot of information available, and it is sometimes a challenge to filter it all intelligently.
In this case, I needed to use the Microsoft Word object model. It also conveniently provided the methods I needed to create the selection dialog for my executable script file. The method I knew from the past and wanted to use at first is only available to licensed developers, and I am not one of these any more.
It is easy to find examples of table manipulation and text alteration techniques in Microsoft Word using its object model in VBScript or some other Microsoft Basic dialect like Visual Basic for Applications (VBA). The casual dabbler might run into trouble with these examples if unaware of the differences between the dialects, most often where VBA examples declare variables by type (example: "Dim i As Integer"). Declarations in VBScript must be untyped (i.e. "Dim i"), so a few changes are needed.
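To make the dialect difference concrete, here is a minimal sketch of adapting a VBA-style fragment for VBScript. The loop and the variable names (i, total) are invented for illustration and are not taken from the project.

```vbscript
' VBA original (will NOT run under Windows Script Host as written):
'   Dim i As Integer
'   Dim total As Long
'   For i = 1 To 3
'       total = total + i
'   Next i

' VBScript adaptation: drop the "As <type>" clauses; every variable is a Variant
Option Explicit
Dim i
Dim total
total = 0
For i = 1 To 3
    total = total + i
Next    ' VBScript does not accept a counter name after Next

' Office constants are not predefined outside VBA,
' so a script must declare the ones it uses itself
Const msoFileDialogOpen = 1

WScript.Echo "Total: " & total    ' displays Total: 6
```

Saved in a text file with a .vbs extension, the fragment can be run by double-clicking it or with cscript from a command prompt.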
In this case, the quick and simple solution to make the files import-ready was the following script (documentary comments begin with an apostrophe):
' We have a folder full of DOCX files, each containing
' a three-column table where COL1 ROW2 needs to be copied to COL2 ROW2
' and then the COL1 ROW2 and other content needs to be hidden.
Option Explicit
Dim fso
Dim objWord
Dim WshShell
Dim File
Dim objFile
Dim fileCounter
Dim wrd ' Word app object
Dim oFile ' Word doc object
Dim oCell1 ' first cell of interest in the table
Dim oCell2 ' second cell of interest in the table
Dim oCellx1 ' other uninteresting text
Dim oCellx2 ' other uninteresting text
Dim oCellx3 ' other uninteresting text
Dim oCellx4 ' other uninteresting text
fileCounter = 0
'set the type of dialog box you want to use
'1 = Open
'2 = SaveAs
'3 = File Picker
'4 = Folder Picker
Const msoFileDialogOpen = 1
Const wdSaveChanges = -1 ' Word constant: save changes when closing
Set fso = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")
Set WshShell = CreateObject("WScript.Shell")
'use the path selected in the SelectFolder method
'set the dialog box to open at the desired folder
objWord.ChangeFileOpenDirectory("c:\")
With objWord.FileDialog(msoFileDialogOpen)
    'set the window title to whatever you want
    .Title = "Select the files to process"
    .AllowMultiSelect = True
    'Get rid of any existing filters
    .Filters.Clear
    'Show only the desired file types
    .Filters.Add "All Files", "*.*"
    .Filters.Add "Word Files", "*.doc;*.docx"
    '-1 = Open the file
    ' 0 = Cancel the dialog box
    '-2 = Close the dialog box
    'If objWord.FileDialog(msoFileDialogOpen).Show = -1 Then 'long form
    If .Show = -1 Then 'short form
        'Set how you want the dialog window to appear
        'it doesn't appear to do anything so it's commented out for now
        '0 = Normal
        '1 = Maximize
        '2 = Minimize
        'objWord.WindowState = 2
        'the Word dialog must be a collection object
        'even with one file, one must use a For/Next loop
        '"File" returns a string containing the full path of the selected file
        For Each File in .SelectedItems 'short form
            'Change the Word dialog object to a file object for easier manipulation
            Set objFile = fso.GetFile(File)
            Set wrd = GetObject(, "Word.Application")
            wrd.Visible = False
            wrd.Documents.Open objFile.Path
            Set oFile = wrd.ActiveDocument
            Set oCell1 = oFile.Tables(1).Rows(2).Cells(1).Range ' EN text
            oCell1.End = oCell1.End - 1 ' exclude the end-of-cell marker
            Set oCell2 = oFile.Tables(1).Rows(2).Cells(2).Range ' Target (XX)
            oCell2.End = oCell2.End - 1
            oCell2.FormattedText = oCell1.FormattedText ' copies EN>XX
            oCell1.Font.Hidden = True ' hides the text in the source cell
            ' hide the other cell texts (nontranslatable) now
            Set oCellx4 = oFile.Tables(1).Rows(2).Cells(3).Range
            oCellx4.Font.Hidden = True
            Set oCellx1 = oFile.Tables(1).Rows(1).Cells(1).Range
            oCellx1.Font.Hidden = True
            Set oCellx2 = oFile.Tables(1).Rows(1).Cells(2).Range
            oCellx2.Font.Hidden = True
            Set oCellx3 = oFile.Tables(1).Rows(1).Cells(3).Range
            oCellx3.Font.Hidden = True
            oFile.Close wdSaveChanges ' save and close this document without prompting
            Set wrd = Nothing
            fileCounter = fileCounter + 1
        Next
    Else
    End If
End With
'Close Word
objWord.Quit
' saying goodbye
msgbox "Number of files processed was: " & fileCounter
After processing with the script, the individual files look like the screenshot (all text in the top row is hidden, so the entire row is invisible, including its bottom border line). The script itself is saved in a text file with a *.vbs extension and can be launched under Windows by double-clicking.
Of course the script could be made much shorter by declaring fewer variables and structuring it more efficiently, but this was a one-off thing where time was of the essence and I just needed to patch something together fast that worked. If this were a routine solution for a client I would be a bit more professional: lock the screen view, change to some sort of "wait cursor" during processing or show a progress bar in a dialog, and add all the other trimmings that one expects from professional software these days. But professional software development is a bit of a bore after so many decades, and I haven't got the patience to see the same old stupid mistakes and deceits practiced by yet another generation of technowannabe world rulers. I just want to solve problems like this so I can get back to my translations or go play with the dogs and feed the chickens.
But before I could do that I had to save my friend from the Hell of manually unhiding all that table text after his little translation was finished, so I put another 5 minutes (or less) of effort into the "unhiding" script:
Option Explicit
Dim fso
Dim objWord
Dim WshShell
Dim File
Dim objFile
Dim fileCounter
Dim wrd
Dim oFile
Dim oCell1 ' source text cell in the table
Dim oCellx1 ' other uninteresting text
Dim oCellx2 ' other uninteresting text
Dim oCellx3 ' other uninteresting text
Dim oCellx4 ' other uninteresting text
fileCounter = 0
Const msoFileDialogOpen = 1
Const wdSaveChanges = -1 ' Word constant: save changes when closing
Set fso = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")
Set WshShell = CreateObject("WScript.Shell")
objWord.ChangeFileOpenDirectory("c:\")
With objWord.FileDialog(msoFileDialogOpen)
    .Title = "Select the files to process"
    .AllowMultiSelect = True
    .Filters.Clear
    .Filters.Add "All Files", "*.*"
    .Filters.Add "Word Files", "*.doc;*.docx"
    If .Show = -1 Then
        For Each File in .SelectedItems
            Set objFile = fso.GetFile(File)
            Set wrd = GetObject(, "Word.Application")
            wrd.Visible = False
            wrd.Documents.Open objFile.Path
            Set oFile = wrd.ActiveDocument
            ' unhide everything except the translated cell, which was never hidden
            Set oCell1 = oFile.Tables(1).Rows(2).Cells(1).Range
            oCell1.Font.Hidden = False
            Set oCellx4 = oFile.Tables(1).Rows(2).Cells(3).Range
            oCellx4.Font.Hidden = False
            Set oCellx1 = oFile.Tables(1).Rows(1).Cells(1).Range
            oCellx1.Font.Hidden = False
            Set oCellx2 = oFile.Tables(1).Rows(1).Cells(2).Range
            oCellx2.Font.Hidden = False
            Set oCellx3 = oFile.Tables(1).Rows(1).Cells(3).Range
            oCellx3.Font.Hidden = False
            oFile.Close wdSaveChanges ' save and close this document without prompting
            Set wrd = Nothing
            fileCounter = fileCounter + 1
        Next
    Else
    End If
End With
objWord.Quit
msgbox "Number of files processed was: " & fileCounter
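The remark that the script could be made much shorter can be illustrated with a rough sketch that handles both hiding and unhiding through one helper. This is not the version that was actually used, and it has not been run against the original files. The names SetHidden, CopySourceToTarget, wordApp and path are hypothetical, and the sketch assumes the same layout as above: the first table in each document has two rows and three columns with no merged cells.

```vbscript
Option Explicit
Const msoFileDialogOpen = 1
Const wdSaveChanges = -1

Dim wordApp, path, doc, count
count = 0
Set wordApp = CreateObject("Word.Application")

With wordApp.FileDialog(msoFileDialogOpen)
    .Title = "Select the files to process"
    .AllowMultiSelect = True
    If .Show = -1 Then
        For Each path In .SelectedItems
            Set doc = wordApp.Documents.Open(path)
            CopySourceToTarget doc      ' omit this call when unhiding
            SetHidden doc, True         ' pass False to unhide after translation
            doc.Close wdSaveChanges     ' save and close without prompting
            count = count + 1
        Next
    End If
End With

wordApp.Quit
MsgBox "Number of files processed was: " & count

' Copies the source word (row 2, column 1) into the target cell (row 2, column 2)
Sub CopySourceToTarget(d)
    Dim src, tgt
    Set src = d.Tables(1).Cell(2, 1).Range
    src.End = src.End - 1               ' exclude the end-of-cell marker
    Set tgt = d.Tables(1).Cell(2, 2).Range
    tgt.End = tgt.End - 1
    tgt.FormattedText = src.FormattedText
End Sub

' Hides or unhides every cell of the 2x3 table except the target cell
Sub SetHidden(d, hideText)
    Dim r, c
    For r = 1 To 2
        For c = 1 To 3
            If Not (r = 2 And c = 2) Then
                d.Tables(1).Cell(r, c).Range.Font.Hidden = hideText
            End If
        Next
    Next
End Sub
```

Looping over row and column indexes replaces the five separately declared cell variables, and because the same five cells are hidden before import and unhidden afterwards, one helper with a True/False switch covers both scripts.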
Very neat solution, Kevin. People send you the fiddliest of files :)
Great job again, Kevin!