Cleaning a large translation memory - WF Pro 3
Thread poster: romine
romine
romine
Local time: 16:00
English to German
+ ...
Sep 26, 2019

We are trying to clean our main translation memories, starting with one of approx. 50,000 segments, to make things faster and more efficient.

What is the best way to go about this? In the past, I always just opened the txt in Excel and deleted or adjusted the redundant or faulty segments but I feel like this is neither efficient nor a particularly safe way to do it.

Is there a way to get the TM administration perspective to work properly? When set to the default of only retrieving 100 segments at a time, it doesn't seem to let me filter the segments in any kind of practical way. When I try to set the segment limit to a higher number (no matter if it's 1,000 or 50,000) the TM won't load.

Assuming that I can't get the TM administration perspective to work, what is the best way to remove all tags/placeables? Just remove them manually using copy & paste in the txt file?

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. Am I right in assuming that Wordfast won't retrieve these segments for matches anyway and I can just delete them from the txt? Why are they even still in the txt? I always thought that, if I update a segment previously committed to the TM, only the updated version is kept in the TM?

I should also add that I cannot download any external tools to help me with this as my company has a very strict policy about that and it would probably take weeks for them to approve a new tool.

Does anyone have any recommendations on how to go about this? We are on the most recent version of WF Pro 3 (3.4.14).

Thank you in advance!


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 16:00
Member (2006)
English to Afrikaans
+ ...
@Romine Sep 26, 2019

romine wrote:
We are trying to clean our main translation memories, starting with one of approx. 50,000 segments, to make things faster and more efficient. ...
I should also add that I cannot download any external tools to help me with this as my company has a very strict policy about that and it would probably take weeks for them to approve a new tool.


I have never had good experiences with Wordfast's TM editors.

If you were able to download other utilities, I would have recommended that you try Okapi Olifant and Xbench, both of which I have used in the past to edit WF TMs.

Assuming that I can't get the TM administration perspective to work, what is the best way to remove all tags/placeables? Just remove them manually using copy & paste in the txt file?


Yes, I think that for tags it should be fairly straightforward to just do find/replace on the text file itself. You can also open it in Excel, or you can open it in a text editor and then copy it to Excel.
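If a script is an option (the thread does not say whether Python is available on the poster's machine), the find/replace could be sketched roughly as below. Everything here is an assumption for illustration: the names `TAG_PATTERN`, `strip_tags` and `strip_tags_in_line` are hypothetical, the tag pattern is a placeholder that must be adapted to how tags actually look in your own TM file, and the seven-field tab-separated layout (source in the fifth field, target in the seventh) should be checked against your file before use.

```python
import re

# ASSUMPTION: placeholder pattern for tags such as {ut1}. Inspect your own
# TM file and adjust this to the tag notation it really contains.
TAG_PATTERN = re.compile(r"\{[a-z]+\d+\}")


def strip_tags(segment, pattern=TAG_PATTERN):
    """Remove every tag match and tidy up the spaces left behind."""
    without_tags = pattern.sub("", segment)
    return re.sub(r" {2,}", " ", without_tags).strip()


def strip_tags_in_line(line, pattern=TAG_PATTERN):
    """Strip tags from the source and target fields of one TM line.

    ASSUMPTION: fields are tab-separated, with the source segment in the
    fifth field and the target segment in the seventh. Lines that start
    with "%" or have fewer than seven fields are returned unchanged.
    """
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending
    fields = body.split("\t")
    if body.startswith("%") or len(fields) < 7:
        return line
    for index in (4, 6):
        fields[index] = strip_tags(fields[index], pattern)
    return "\t".join(fields) + ending


# Made-up sample line, for illustration only.
sample_line = "20190926~120000\tRO\t0\tEN-US\tPress {ut1}OK{ut2}\tDE-DE\t{ut1}OK{ut2} drücken\n"
cleaned_line = strip_tags_in_line(sample_line)
```

Working on a copy of the TM and comparing a few lines before and after is a sensible precaution, since a pattern that is too broad would also remove legitimate text in curly brackets.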

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. Am I right in assuming that Wordfast won't retrieve these segments for matches anyway and I can just delete them from the txt? Why are they even still in the txt? I always thought that, if I update a segment previously committed to the TM, only the updated version is kept in the TM?


Yes, TUs (translation units) whose date starts with "xx" are marked for deletion. There are processes in Wordfast that will actually delete them, but until you use those processes, the TUs are simply marked for deletion.

As to why they exist at all: whether Wordfast updates an existing TU or creates a new one (and marks the old one for deletion) depends on the TM settings in Wordfast, for example on whether your current user ID is different from the user ID of the TU.


[Edited at 2019-09-26 19:33 GMT]


 
romine
romine
Local time: 16:00
English to German
+ ...
TOPIC STARTER
@Samuel Sep 26, 2019

Thank you very much, and sorry, of course I meant to say find/replace and not copy/paste for the tags.

Regarding the units marked for deletion, how can I use the processes to delete them? If I find xx marked segments in the txt can I just delete them there directly or will that corrupt the TM?


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 16:00
Member (2006)
English to Afrikaans
+ ...
@Romine Sep 27, 2019

romine wrote:
1. Regarding the units marked for deletion, how can I use the processes to delete them?
2. If I find xx marked segments in the txt can I just delete them there directly or will that corrupt the TM?


I don't know what the process in WFP3 is to delete them. But if you view the file in a Unicode-aware text editor with word wrap disabled, you can delete any line (i.e. the whole line) that starts with "xx", and the rest of the TM will be safe. Opening and deleting lines in Excel is also safe, except that if Excel believes a cell contains a formula, it will corrupt that particular cell (and that will affect that particular TU, but it won't affect the rest of the TM).
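For those who would rather not delete 9,000 lines by hand, the same line-by-line deletion can be sketched in Python, assuming Python is already installed (the thread does not confirm this). The function names `split_marked` and `clean_file` are hypothetical, and the file encoding is deliberately left as a required argument because it depends on how your TM was saved. The sketch writes to a new file, so the original TM stays untouched as a backup.

```python
def split_marked(lines, marker="xx"):
    """Separate lines that start with the deletion marker from the rest."""
    kept, dropped = [], []
    for line in lines:
        if line.startswith(marker):
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


def clean_file(src_path, dst_path, encoding, marker="xx"):
    """Copy src_path to dst_path without the marked lines.

    ASSUMPTION: the caller knows the TM's encoding (for example "utf-16"
    or "cp1252"); it is not guessed here. newline="" keeps the original
    line endings exactly as they are. Returns the number of dropped lines.
    """
    with open(src_path, encoding=encoding, newline="") as src:
        kept, dropped = split_marked(src.readlines(), marker)
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.writelines(kept)
    return len(dropped)


# Made-up sample lines, for illustration only.
sample = [
    "%20190926~120000\t%header fields omitted\n",
    "20190926~120000\tRO\t0\tEN-US\tHello\tDE-DE\tHallo\n",
    "xx190926~110000\tRO\t0\tEN-US\tHello\tDE-DE\tHalo\n",
]
kept, dropped = split_marked(sample)
print(f"{len(kept)} kept, {len(dropped)} dropped")
```

Because only whole lines are removed and nothing else is rewritten, this avoids the Excel formula problem mentioned above.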


 
romine
romine
Local time: 16:00
English to German
+ ...
TOPIC STARTER
@Samuel Sep 27, 2019

Thanks again - we found 9,000 segments marked for deletion in the first TM today, wow. I can't believe we have been working with Wordfast for years and never knew about this.

 
kneyens
kneyens
Belgium
Local time: 16:00
French to Dutch
delete TU's marked with "xx" Oct 1, 2019

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. Am I right in assuming that Wordfast won't retrieve these segments for matches anyway and I can just delete them from the txt? Why are they even still in the txt? I always thought that, if I update a segment previously committed to the TM, only the updated version is kept in the TM?


The TUs marked with "xx" will disappear from your TM when you reorganise. I always ask for a reorganisation to be performed before cleaning up a large TM. For the moment I do the clean-up in Excel, but this has many flaws, as you already mentioned yourself. I have searched and asked for tips for a better way, but I haven't found anything so far. I think it is rather ridiculous of WF to provide a TM editor that actually doesn't do the job at all...

Kind regards,
Katleen


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 16:00
Member (2006)
English to Afrikaans
+ ...
The purpose of WF's TM editor Oct 1, 2019

kneyens wrote:
I think it is rather ridiculous of WF to provide a TM editor that actually doesn't do the job at all...


I think the main purpose of WF's TM editor is to be a TM previewer, i.e. to help users see generally what is contained in the TM.


 
Milan Condak
Milan Condak  Identity Verified
Local time: 16:00
English to Czech
Erase of not valid TUs Oct 1, 2019

kneyens wrote:

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. ... I think it is rather ridiculous of WF to provide a TM editor that actually doesn't do the job at all...

Kind regards,
Katleen


I read the manuals, and the feature works.

https://www.wordfast.net/zip/WFC_7_manual.html

Search the manual (Ctrl+F) for "Data Editor". There are 8 hits; the eighth (8/8) is in the Remarks:

The date does not necessarily have a tilde (~) separating date and time. Any printable character can be used there, except a number. WFC uses the tilde (~), the equal (=) sign, and the star sign(*). The equal sign means the TU was "marked" (flagged) by WFC's data editor. This has no consequence on the TU's status: it remains fully valid. Although WFC always records the date and time when writing a TU, the date and time are optional and could be empty (or even made of an invalid date) in which case WFC would simply assume the current computer's date and time, or previous TU incremented by one second, if in a sequential loop. Dates and times are "local", taken from the local computer's clock.
If any optional field is left empty, its trailing tabulator should be present. For a TU to be valid, there must be at least six tabulators, with the fifth field (the source segment, located between the fourth and the fifth tabulator) made of at least one printable character.
The date's first character (a number from 0 to 9, usually, a number 2 if the TU was created in the current millennium) can be "x". It means that this TU is not valid anymore - WFC marked it for future deletion.

xxx The first full reorganisation of the TM by WFC will erase this TU. xxx

Do not remove the "x", or replace it with a number, unless you know what you are doing.
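
The rules in these remarks can be turned into a small check. The following is only a sketch of how I read the manual text quoted above, not Wordfast's own logic: the function name `classify_tu` and the three labels are hypothetical, a source field that contains only whitespace is treated as empty (my own choice), and a header line, if the file has one, is not handled specially.

```python
def classify_tu(line):
    """Classify one TM line according to the manual remarks quoted above.

    Returns "invalid", "marked_for_deletion" or "valid" (hypothetical labels).
    """
    body = line.rstrip("\r\n")
    # Rule: a valid TU has at least six tabulators.
    if body.count("\t") < 6:
        return "invalid"
    fields = body.split("\t")
    # Rule: the fifth field (between the fourth and fifth tabulator) is the
    # source segment and needs at least one printable character.
    # ASSUMPTION: whitespace alone does not count as printable content.
    source = fields[4]
    if not any(ch.isprintable() and not ch.isspace() for ch in source):
        return "invalid"
    # Rule: a date whose first character is "x" marks the TU for deletion.
    if fields[0].startswith("x"):
        return "marked_for_deletion"
    return "valid"


# Made-up sample lines, for illustration only.
status_valid = classify_tu("20190926~120000\tRO\t0\tEN-US\tHello\tDE-DE\tHallo")
status_marked = classify_tu("x0190926~120000\tRO\t0\tEN-US\tHello\tDE-DE\tHalo")
status_empty = classify_tu("20190926~120000\tRO\t0\tEN-US\t\tDE-DE\tHallo")
```

Running such a check over a copy of the TM gives a count of each category before anything is deleted.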

--

Erasing TUs marked with "x" has worked in all previous versions of WFC.

--
https://www.wordfast.net/index.php?whichpage=downloadpage

Documentation:
Wordfast Classic manual, version 7.xx (English, rev. 1 Sep 2019)
Wordfast Classic manual, version 6.9 (English, rev. 10 Aug 2017)
Wordfast Classic manual, version 5.9x (English, rev. 06 Oct 2010)
Download reference manuals in other languages


I do not use this feature for text TMs. I convert WTM to TMX and manage my translation memories in TMX format in other tools.

Milan

[Edited at 2019-10-01 11:36 GMT]


 
kneyens
kneyens
Belgium
Local time: 16:00
French to Dutch
It doesn't work in Wordfast Pro 3 though ... Sep 29, 2020

Milan Condak wrote:

I read the manuals, and the feature works.

https://www.wordfast.net/zip/WFC_7_manual.html

Milan

[Edited at 2019-10-01 11:36 GMT]


Dear Milan,

Maybe it works in Wordfast Classic, but it most certainly doesn't work in Wordfast Pro 3.

You also say you use other tools to do the clean-up. Could you tell me which ones? That might be interesting, as I'm the one who has to do the clean-up of the shared TM of my team.

Kind regards,
Katleen


 
kneyens
kneyens
Belgium
Local time: 16:00
French to Dutch
TM Editor / Previewer Sep 29, 2020

Samuel Murray wrote:


I think the main purpose of WF's TM editor is to be a TM previewer, i.e. to help users see generally what is contained in the TM.


OK, but then they should call it "TM Previewer" and not editor ...
In any case, even as a previewer it really doesn't work all that well...


 
Milan Condak
Milan Condak  Identity Verified
Local time: 16:00
English to Czech
TM Editor works in Wordfast Classic Sep 29, 2020

kneyens wrote:

it most certainly doesn't work in Wordfast Pro 3.

You also say you use other tools to do the clean-up. Could you tell me which ones? That might be interesting, as I'm the one who has to do the clean-up of the shared TM of my team.

Kind regards,
Katleen


Dear Katleen,

In Wordfast, "clean-up" is the feature that updates the TM from a bilingual document (RTF, DOC, DOCX) and produces the target document from the bilingual one.

I am sorry, I use other tools for other tasks: conversion, merging, splitting and deleting duplicate TUs. That is not precisely the same as clean-up.

Conversions TM to TMX: Wf2TMX, WfConverter, Xbench,...

Editing of bitext: AlignEdit (a part of LF Aligner), Olifant

Create TMX: TMX Maker (a part of LF Aligner), Heartsome TMX Editor,

Deleting duplicate TUs: Xbench, TMLookUp,...

Merge, split TMX: Heartsome TMX Editor (last presentation: picture at the very bottom of the site http://www.condak.cz/nove/2020-09/28/cs/06.html)

Merge data: TMLookup,

Convert TMX to TXT: Heartsome TMX Editor, TMLookup, Goldpan Editor

Repair TMX: Heartsome TMX Editor, TMX Validator

Create TBX: Goldpan Editor

Some useful features are built into the CAT tools I have been using.

Kind regards,
Milan


 

