Use Wf Classic's TMs in another CAT tool
Thread poster: trhanslator (X)
Jan 11, 2013

The simplicity of Wordfast Classic's Translation Memories is beautiful. Almost no overhead, editable in any text editor. Sorting and manipulating them in spreadsheet software is very easy.

The source language is saved in the fifth column, the target language in the seventh. A time stamp is saved in the first column. The number of TUs is saved in the header.

I'm examining a workflow to use Wf C's TMs with another CAT tool. I've manually added/modified some TUs in a Wf C TM, updated the TU counter, reloaded the TM in Wf C and executed a Reorganization of the TM. I didn't encounter any problems: Wf C could use the manipulated TM when translating my sample document.
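
For anyone scripting the manual step of updating the TU counter, here is a minimal sketch. It assumes the layout described above and visible in the samples later in this thread: a tab-delimited file whose first line is a header of %-prefixed fields, one of which is `%TU=` followed by an eight-digit count. The function name `update_tu_count` is made up for illustration, and the sketch works on already-decoded text, so reading and writing the file in the right encoding is left to the caller.

```python
import re

def update_tu_count(tm_text: str) -> str:
    """Return tm_text with the header's %TU= field set to the number of TU lines."""
    lines = tm_text.split("\n")
    if not lines[0].startswith("%"):
        raise ValueError("first line does not look like a Wordfast TM header")
    # Assumption: every non-empty line after the header is one translation unit.
    tu_count = sum(1 for line in lines[1:] if line.strip())
    # Assumption: the counter is zero-padded to eight digits, as in %TU=00000002.
    # If the header has no %TU= field, it is left unchanged.
    lines[0] = re.sub(r"%TU=\d+", f"%TU={tu_count:08d}", lines[0], count=1)
    return "\n".join(lines)
```

Splitting on "\n" only (rather than on every Unicode line separator) is deliberate, so that unusual control characters inside a segment do not get mistaken for line breaks.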

My question is: Are there any issues to be expected in this workflow?


 
Samuel Murray
Netherlands
Local time: 10:51
ProZ.com member since 2006
English to Afrikaans
TU count is not necessary, IMO Jan 11, 2013

trhanslator wrote:
Sorting and manipulating in a spreadsheet software is very easy.


I have had issues with not having the correct number of columns in the file, though.

A time stamp is saved in the first column.


Make sure you experiment comprehensively with time stamps, because their format is not as simple as it may seem at first glance (sorry, I can't remember all the things I discovered about it).

The number of TUs is saved in the header.


As far as I know, the number of TUs need not be correct in the header. I regularly add segments to TMs in a text editor without updating the TU counter in the header.
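
The two pitfalls mentioned in this post (wrong column count, unexpected time stamp formats) can be checked before loading an edited TM. This is only a sketch: it assumes at least seven tab-delimited columns per TU and a time stamp of the form YYYYMMDD~HHMMSS, which is the only form visible in this thread. Other forms may well exist, as noted above, so a reported "problem" is a prompt to look, not proof of an error. The name `check_tu_line` is hypothetical.

```python
from datetime import datetime

# Assumption: date, user, counter, source lang, source text, target lang, target text.
MIN_COLUMNS = 7

def check_tu_line(line: str) -> list[str]:
    """Return a list of problems found in one tab-delimited TU line (empty if none)."""
    problems = []
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < MIN_COLUMNS:
        problems.append(f"expected at least {MIN_COLUMNS} columns, found {len(fields)}")
    try:
        # Assumption: YYYYMMDD~HHMMSS, e.g. 20130113~084022.
        datetime.strptime(fields[0], "%Y%m%d~%H%M%S")
    except ValueError:
        problems.append(f"unrecognised time stamp: {fields[0]!r}")
    return problems
```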


 
Dominique Pivard
Local time: 11:51
Finnish to French
Excel and TMX are potential pitfalls Jan 11, 2013

trhanslator wrote:
The simplicity of Wordfast Classic's Translation Memories is beautiful. Almost no overhead, editable in any text editor. Sorting and manipulating in a spreadsheet software is very easy.

First of all, you don't have to say "Wordfast Classic": Wordfast is sufficient, since Pro and Anywhere also use the same format.
You have to be careful when opening a Wordfast TM in Excel: make sure the source and target columns are imported as text.

trhanslator wrote:
My question is: Are there any issues to be expected in this workflow?

Does your workflow involve conversions to and from TMX? If so, there can be problems ahead. For instance, it is almost impossible to import a large TM created with Wordfast Classic into memoQ: as soon as memoQ encounters a unit that contains an invalid XML character, it will stop importing the rest of the TM. And there will be invalid XML characters in a TM exported from Wordfast Classic.


 
Samuel Murray
Netherlands
Local time: 10:51
ProZ.com member since 2006
English to Afrikaans
Slightly off-topic: invalid XML characters Jan 11, 2013

Dominique Pivard wrote:
You have to be careful when opening a Wordfast TM in Excel: make sure the source and target columns are imported as text.


Yes, I forgot: when I open a WFC TM in Excel, I do so by pasting it from a plaintext editor such as Akelpad into a new worksheet, and not by opening it directly.

As soon as memoQ encounters a unit that contains an invalid XML character, it will stop importing the rest of the TM. And there will be invalid XML characters in a TM exported from Wordfast Classic.


I also find invalid XML in TMX files created by Trados (created by clients). When I try to convert such TMX files to e.g. WFC TMs using either Wordfast itself, the wf2tmx utility or Okapi, those characters give me grief. For this reason I run such TMX files first through a little script that I wrote, that attempts to neutralise such characters and entities:

http://wikisend.com/download/183512/tmxfixerbasic.zip
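
For readers who cannot fetch the linked script, the general idea can be sketched as follows. This is not the contents of that script, just one possible approach: it removes raw characters outside the ranges XML 1.0 allows, and numeric character references (such as `&#x1F;`) that point to such characters. The function names are invented for this example, and dropping characters silently is a design choice; replacing them with a visible marker may be safer for review.

```python
import re

# Raw characters outside the XML 1.0 "Char" ranges.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Numeric character references such as &#x1F; or &#31;
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")

def _is_valid_codepoint(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def neutralise_invalid_xml(text: str, replacement: str = "") -> str:
    """Replace raw characters and numeric references that XML 1.0 forbids."""
    def fix_ref(match: re.Match) -> str:
        body = match.group(1)
        cp = int(body[1:], 16) if body.startswith("x") else int(body)
        # Keep references to valid characters exactly as they were written.
        return match.group(0) if _is_valid_codepoint(cp) else replacement
    text = _INVALID_XML_CHARS.sub(replacement, text)
    return _CHAR_REF.sub(fix_ref, text)
```

Usage would be along the lines of reading the TMX file as text, passing it through `neutralise_invalid_xml`, and writing the result to a new file, keeping the original as a backup.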


 
trhanslator (X)
Thread poster
Fixer is nice Jan 11, 2013

Samuel Murray wrote:

I also find invalid XML in TMX files created by Trados (created by clients). When I try to convert such TMX files to e.g. WFC TMs using either Wordfast itself, the wf2tmx utility or Okapi, those characters give me grief. For this reason I run such TMX files first through a little script that I wrote, that attempts to neutralise such characters and entities:

http://wikisend.com/download/183512/tmxfixerbasic.zip



Nice solution. Thanks!


 
trhanslator (X)
Thread poster
Thanks for the warning! Jan 11, 2013

Dominique Pivard wrote:

First of all, you don't have to say "Wordfast Classic": Wordfast is sufficient, since Pro and Anywhere also use the same format.
You have to be careful when opening a Wordfast TM in Excel: make sure the source and target columns are imported as text.


Thanks for the warning!

I was indeed assuming that Wf Pro uses databases for storing the TM, but it is indeed a simple tab-delimited text file. Since Wf Pro can do a trick that Wf Classic cannot, where are the tag positions saved? And all three Wf flavors can use the same TM format without any losses? Great!


 
Dominique Pivard
Local time: 11:51
Finnish to French
How WFP stores tags, how WFC displays them Jan 12, 2013

trhanslator wrote:
I was indeed assuming that Wf Pro uses databases for storing the TM. It is indeed a simple tab-delimited text file. Since Wf Pro can do a trick that Wf Classic cannot: where are the tag positions saved? And all three Wf flavors can use the same TM format without any losses? Great!

No, WFP just uses a different index file format, but the TM itself (file with the .TXT extension) is the same.

If you translate this in WFP:

[image not preserved]

You will see this in the TM:

[image not preserved]

And if you perform a concordance search in WFC, the results will be displayed in a slightly more user-friendly way:

[image not preserved]


 
trhanslator (X)
Thread poster
Wf Pro index file Jan 12, 2013

I take it that this index with the precious info about the tag location is built on the fly during the translation process?

What will happen when your index gets corrupted: Can Wf Pro recreate it, with the tag position info?

Since Wf C doesn't save tag positions in an index, one cannot switch between editors during one large project?


 
Samuel Murray
Netherlands
Local time: 10:51
ProZ.com member since 2006
English to Afrikaans
I'm trying... Jan 12, 2013

trhanslator wrote:
I take it that this index with the precious info about the tag location is built on the fly during the translation process? What will happen when your index gets corrupted: Can Wf Pro recreate it, with the tag position info?


I'm not a WFC expert and I'm not a programmer, but I'm quite confused by your talk of a tag location index. Why would such an index be necessary? The positions of the tags are clear to see in the source file, and they are also in the TM, and all that needs to be done is to match their location from the TM to their location in the source file, for every segment, individually, when needed. Why would an index be needed?

Since Wf C doesn't save tag positions in an index, one cannot switch between editors during one large project?


I'm not sure what you mean by "editors" -- do you mean different CAT tools with which you want to translate portions of the TXML file(s)? If you want to continue translating the TXML file in another tool after you've translated half of it, simply open the partially translated TXML file in the other CAT tool. Optionally, also open the TM that you used in the previous CAT tool, converted if necessary to a format that the other tool can understand. Or... what is it that I don't understand here?


 
nrichy (X)
France
Local time: 10:51
French to Dutch
Trying too Jan 12, 2013

Samuel Murray wrote:

trhanslator wrote:
I take it that this index with the precious info about the tag location is built on the fly during the translation process? What will happen when your index gets corrupted: Can Wf Pro recreate it, with the tag position info?


I'm not a WFC expert and I'm not a programmer, but I'm quite confused by your talk of a tag location index. Why would such an index be necessary? The positions of the tags are clear to see in the source file, and they are also in the TM, and all that needs to be done is to match their location from the TM to their location in the source file, for every segment, individually, when needed. Why would an index be needed?


Same answer as Samuel. I don't understand your idea of a tag location index.

Since Wf C doesn't save tag positions in an index, one cannot switch between editors during one large project?


The TM is ONE text file. You can switch (on the same project) between WF Pro and WF Classic: segments will be added independently of the CAT you are using. WF Cl doesn't add tags, WF Pro does. If you want to switch to another CAT tool, you'll have to export the TM to .tmx format, so you'll have TWO TM files, and segments will be added to one or the other depending on the tool you are using.

Back to your first question:
My question is: Are there any issues to be expected in this workflow?


Yes, Excel is not the right tool to correct or sort the TM, because some cells will be filled with #### or date formats will change. Use a text editor for small corrections (be sure to correct all segments), or WF Classic's own TM editor (second icon from the right), or Olifant.

Export: the main problem is that WF Classic is very "elastic": it accepts everything and will export all TMs without problems into .tmx, the exchange format. Problems may occur when importing this .tmx file into another CAT. Typically these issues occur in the language pairs, for instance if you mixed up EN-US and EN-GB as a source language or as a target language. If you encounter this problem, check and correct the language pairs in the whole file (sort the file on source language or on target language, for instance).
Sometimes there is an issue with "old" TMs created with WF 3 or 4 (more than five years old). Olifant will detect these and will enable you to skip some segments.
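
Instead of sorting the file to spot mixed-up language codes, the pairs can simply be counted. A minimal sketch, assuming the layout described at the top of this thread, with the language codes in the columns just before the source and target text (columns 4 and 6, counting from 1); the function name is made up for illustration.

```python
from collections import Counter

def language_pair_counts(tm_text: str) -> Counter:
    """Count (source language, target language) pairs over all TU lines."""
    pairs = Counter()
    for line in tm_text.split("\n")[1:]:  # skip the header line
        fields = line.rstrip("\r").split("\t")
        if len(fields) >= 7:
            # Assumption: language codes are in columns 4 and 6 (1-based).
            pairs[(fields[3], fields[5])] += 1
    return pairs
```

A TM that mixes EN-US and EN-GB as source language will show two different pairs in the result, each with its number of segments, which tells you how much there is to correct.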


 
Samuel Murray
Netherlands
Local time: 10:51
ProZ.com member since 2006
English to Afrikaans
@OP, re: nRichy Jan 12, 2013

nrichy wrote:
You can switch (on the same project) between WF Pro and WF Classic: segments will be added independently of the CAT you are using.


By the way, there used to be a bug where WFP TMs were in UTF-8 and WFC TMs were in UTF-16LE, and if WFC tried to open a WFP TM, it would simply delete all entries in it without making a backup. Has this bug been fixed anywhere?
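
A cautious way to avoid feeding a TM in the wrong encoding to either program is to look at the first bytes of the file before opening it. The sketch below only checks for a byte order mark. Whether a given Wordfast version writes one is not stated in this thread, so a result of "unknown" is quite possible and means the file has to be inspected some other way. The function name is hypothetical.

```python
import codecs

def guess_tm_encoding(raw: bytes) -> str:
    """Guess a TM file's encoding from its byte order mark, if it has one."""
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8 (with BOM)"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return "unknown (no BOM)"
```

Reading just the first few bytes of the file in binary mode is enough as input.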

WF Cl doesn't add tags, WF Pro does.


Well (and you probably meant to say this), WFC does add tags, if there are tags in the source text. Since e.g. MS Word files are translated in WFC via WYSIWYG (and not via tagged text, as most other CAT tools do it, including WFP), there are no "tags" in an MS Word file in WFC.

So, if you translate an identical MS Word file in both WFC and WFP, the TM from the one translation will not be a perfect 100% match for the other, because the source formats are different -- in WFC, the source format is the actual MS Word file, and in WFP the source format is the TXML file that is generated from the MS Word file.

My previous answer about this issue assumed that you translate a tagged format in both WFC and WFP (e.g. an HTML file, or a marked-up XML file). If you translate e.g. two identical HTML files in WFC and in WFP, then both TMs will have tags in them (and as far as I know, with HTML at least, the tags will be roughly the same in both WFP and WFC). However, other CAT tools may have defined the tags in a different way (e.g. they might treat "<i><b>xyz</b></i>" as having four tags instead of two), which means that the tags will be different, even if an identical HTML file was used as the source text... which means that the old TM won't be a perfect match.
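
The "four tags instead of two" difference can be shown with a toy example. This is only an illustration of the two counting conventions, not how any particular CAT tool actually parses HTML; both function names are invented.

```python
import re

def count_individual_tags(segment: str) -> int:
    """Count every tag separately."""
    return len(re.findall(r"<[^>]+>", segment))

def count_tag_runs(segment: str) -> int:
    """Count each run of adjacent tags as a single placeholder."""
    return len(re.findall(r"(?:<[^>]+>)+", segment))

segment = "<i><b>xyz</b></i>"
print(count_individual_tags(segment))  # 4
print(count_tag_runs(segment))         # 2
```

A TM built under one convention stores a differently tagged source segment than a tool using the other convention expects, which is enough to turn an exact match into a fuzzy one.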

Still, all of this talking brings me no nearer to the question of what that mysterious tag location index would be.

Typically these issues occur in the language pairs, for instance if you mixed up EN-US and EN-GB as a source language or as a target language. If you encounter this problem, check and correct the language pairs in the whole file (sort the file on source language or on target language, for instance).


For that, I also use a script of my own (you need AutoIt to use it):

http://wikisend.com/download/340982/WfTM2anon.zip



[Edited at 2013-01-12 15:39 GMT]


 
trhanslator (X)
Thread poster
Wf Classic TM and Wf Pro TM are different Jan 13, 2013

Samuel Murray wrote:

Well (and you probably meant to say this), WFC does add tags, if there are tags in the source text. Since e.g. MS Word files are translated in WFC via WYSIWYG (and not via tagged text, as most other CAT tools do it, including WFP), there are no "tags" in an MS Word file in WFC.

So, if you translate an identical MS Word file in both WFC and WFP, the TM from the one translation will not be a perfect 100% match for the other


I can confirm that now. I've translated a Word DOCX document with two sentences in both Wf versions.

The Wf Pro TM:

%20130113~083306 %User ID,HA,HA Hans %TU=00000002 %DE-DE %Wordfast TM v.546/00 %NL-NL %----------- .
20130113~084022 Hans 0 DE-DE &tA;Das hier ist &tB;fett&tC; und &tD;kursiv&tE;. NL-NL &tA;Dit hier is &tB;vet&tC; en &tD;cursief&tE;.
20130113~084022 Hans 0 DE-DE &tA;Das hier ist &tB;kursiv&tC; und &tD;fett&tE;. NL-NL &tA;Dit hier is &tB;cursief&tC; en &tD;vet&tE;.


The Wf Classic TM:

%20130113~084646 %HA (Hans) %TU=00000002 %DE-DE %Wordfast TM v.6.03t/00 %NL-NL %---inf.nl
20130113~084720 HA 0 DE-DE Das hier ist fett und kursiv. NL-NL Dit hier vet en cursief. EL ST 13-17:9-12|22-28:16-23|
20130113~084737 HA 0 DE-DE Das hier ist kursiv und fett. NL-NL Dit hier is cursief en vet. EL ST 24-28:23-26|13-19:12-19|


As you can see, the Wf Pro TM does have placeholders whereas the Wf Classic TM doesn't.

And indeed, when I use the WFP TM on a copy of the document in WFC, I can find matches in the concordance, but I don't get full matches (since WFC cannot process the placeholders).
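
To see how the two TMs above relate, the placeholders can be stripped from the WFP segments. A minimal sketch, assuming placeholders always have the form `&t` plus one capital letter plus `;`, which is all that the sample shows; the function name is made up. Note that this throws away the formatting information, so it is useful for comparison or for building a plain-text copy, not as a lossless conversion.

```python
import re

# Assumption: placeholders look like &tA; &tB; ... as in the WFP sample above.
PLACEHOLDER = re.compile(r"&t[A-Z];")

def strip_placeholders(segment: str) -> str:
    """Remove WFP-style placeholders from a TM segment."""
    return PLACEHOLDER.sub("", segment)

wfp_source = "&tA;Das hier ist &tB;fett&tC; und &tD;kursiv&tE;."
print(strip_placeholders(wfp_source))  # Das hier ist fett und kursiv.
```

After stripping, the WFP source segment is identical to the WFC one, which suggests the placeholders alone account for the missing full matches.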

Sorry for the confusion about the tag index – I should have looked at the WFP TM first (which I didn't, since I assumed it was a DB rather than a text file).

Thanks for all explanations!


 
Dominique Pivard
Local time: 11:51
Finnish to French
The index file is totally disposable Jan 13, 2013

trhanslator wrote:
I take it that this index with the precious info about the tag location is built on the fly during the translation process?

What will happen when your index gets corrupted: Can Wf Pro recreate it, with the tag position info?

Since Wf C doesn't save tag positions in an index, one cannot switch between editors during one large project?

The index file can (and will) be re-created from scratch at any time, in both Classic and Pro. Everything that matters is contained in the .txt file. This is the one you should back up on a regular basis (preferably, incrementally).


 
Dominique Pivard
Local time: 11:51
Finnish to French
Excel Jan 13, 2013

nrichy wrote:
Yes, Excel is not the right tool to correct or sort the TM, because some cells will be filled with #### or date formats will change.

Actually, you won't have that problem if you are careful to import the fields of your TM as text. TM size can be a problem with older versions of Excel: Excel 2003 and earlier support only 65,536 rows per sheet, whereas Excel 2007 and later support 1,048,576 rows.


 
Samuel Murray
Netherlands
Local time: 10:51
ProZ.com member since 2006
English to Afrikaans
Yes, and/but... Jan 13, 2013

trhanslator wrote:
Samuel Murray wrote:
WFC does add tags, if there are tags in the source text. ... So, if you translate an identical MS Word file in both WFC and WFP, the TM from the one translation will not be a perfect 100% match for the other...

I can confirm that now. I've translated a Word DOCX document with two sentences in both Wf versions.


Yes. Another way of putting it is to say that you did not translate the same document in the two programs. You may think that you translated the same MS Word file in both WFC and WFP, but the fact is that you didn't translate an MS Word file in WFP. What you translated in WFP was not an MS Word file, but a TXML file (...which was created after processing the MS Word file). TXML is a tagged format, hence the tags in the TM.

The same logic applies to using e.g. Trados 2009/11 or many other CAT tools these days. Most of those programs can't translate MS Word files, but they can translate the intermediary file format to which they convert the MS Word file.

By the way, there used to be a version of WFC that kept track of formatting within segments, by adding attributes to the TM (though not inline), but I'm not sure what happened to that feature.

And indeed, when I use the WFP TM on a copy of the document in WFC, I can find matches in the concordance, but I don't get full matches (since WFC cannot process the placeholders).


Just to make sure we understand each other: WFC can process placeholders, if placeholders exist in the document you're translating.

The reason you don't get 100% matches is this: the source text of the translation unit in the TM is not 100% the same as the source text in the document you're translating. The source text in the TM contains tags, and the file that you're translating *does not* contain tags (in WYSIWYG mode, which is the mode in which WFC works), and therefore it is not a 100% match.


 

