How to extract terminology from a Word doc
Topic starter: Paula Ribeiro
Paula Ribeiro  Identity Verified
Local time: 09:18
English => Portuguese
Dec 4, 2012

Hello everyone.

I am currently in the process of deciding which CAT tool to buy for a specific company. I've used Trados Studio and I am also trying Wordfast. The problem is that I want to take advantage of the large body of past translations produced by the previous translator, who wasn't using any CAT tool, and I really want to organize it. What I have is the original in PT and the translated file in ES, for example, but these are huge PPTs, with pictures and a lot of formatting, amounting to 5177 TUs when I try to align them in WinAlign. Because they're so big, WinAlign simply crashes after supposedly aligning the project. So I have all these past translations but no TM, and I want to create one for each language pair.

What I would like to know is: is it possible to open an original to translate, either in Wordfast or Trados, and then get the terminology out of the translated file, which can be a DOC or PPT? I would still have to translate it again, but I would have the Spanish terminology in a TM. Am I making myself understood? Or do I have to create a glossary anyway?

This is especially difficult because I am really a beginner with all these extra features...


 
Tony M
France
Local time: 10:18
ProZ.com member
French => English
SITE LOCALIZER
Align Dec 4, 2012

I too am a relative novice here, so the following suggestions are only vague ideas.

I think going to all the trouble of re-translating seems like an awful lot of work!

I would set about it by using Werecat to extract all the text from the PPTs into DOC files (hoping and praying that they stay at least reasonably well aligned, depending on how careful the previous translators were!), strip out the tags to leave just the wanted text, and then use the PlusTools 'Align' function. I have never had any trouble with that crashing on even pretty large docs, but if necessary, manually split the main doc into smaller chunks; it won't be very difficult to stick them back together again at TM time.
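For readers wondering what an alignment function does with two lists of paragraphs, here is a minimal Python sketch. It is a simplified, heuristic take on length-based alignment (the idea behind Gale and Church's classic algorithm), not how PlusTools or WinAlign actually work; the function names and the penalty values in `MOVES` are illustrative assumptions.

```python
import math

# (source segments consumed, target segments consumed, fixed penalty).
# The penalty values are arbitrary assumptions that favour 1-1 matches.
MOVES = [(1, 1, 0.0), (1, 0, 3.0), (0, 1, 3.0), (2, 1, 1.0), (1, 2, 1.0)]


def length_cost(src_len: int, tgt_len: int) -> float:
    """Penalty for pairing texts of these character lengths (0 = same length)."""
    if src_len + tgt_len == 0:
        return 0.0
    return abs(src_len - tgt_len) / math.sqrt(src_len + tgt_len)


def align(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    """Align two segment lists by dynamic programming over character lengths."""
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # In row-major order every cell that can reach (i, j) is processed
    # before (i, j), so cost[i][j] is final when we expand it.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in source[i:ni])
                t_len = sum(len(t) for t in target[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from the end; merged segments are joined with a space,
    # and a 1-0 or 0-1 step yields an empty string on one side.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(source[pi:i]), " ".join(target[pj:j])))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

The cost table has one cell per (source, target) position, so with roughly 5,000 segments on each side it runs to tens of millions of cells; splitting the documents into smaller chunks, as suggested above, keeps each run manageable.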

I hope that helps!


 
John Holland  Identity Verified
France
Local time: 10:18
French => English
Save as text Dec 4, 2012

Have you tried saving the PPT files as plain text and then aligning just the text files?

For example, see the last option, "Export all Text in PowerPoint Slide including Text in Text Box," on this page:
http://www.lytebyte.com/2009/08/07/how-to-export-powerpoint-text-contents-to-word/
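If the files are in the newer .pptx format, the text can also be pulled out without PowerPoint at all, since a .pptx is a ZIP archive of XML parts. The sketch below is a minimal Python example under a few assumptions: it handles only .pptx (not the older binary .ppt format), it reads only the slides themselves (no speaker notes), and it orders slides by the number in their part name (`ppt/slides/slideN.xml`), which usually, but not always, matches the presentation order. `pptx_paragraphs` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# DrawingML namespace used for <a:p> (paragraph) and <a:t> (text run) elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def pptx_paragraphs(path) -> list[str]:
    """Return the non-empty text paragraphs of a .pptx file, slide by slide."""
    with zipfile.ZipFile(path) as zf:
        slides = []
        for name in zf.namelist():
            match = SLIDE_PART.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        slides.sort()  # numeric sort, so slide10 comes after slide2

        paragraphs = []
        for _, name in slides:
            root = ET.fromstring(zf.read(name))
            # iter() is recursive, so text boxes, tables and groups are all covered.
            for para in root.iter(A_NS + "p"):
                # A paragraph may be split into several runs; glue them together.
                text = "".join(t.text or "" for t in para.iter(A_NS + "t")).strip()
                if text:
                    paragraphs.append(text)
    return paragraphs
```

Running it on the original and on the translated deck gives two parallel lists of paragraphs that can be saved as plain text and fed to an aligner.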


 
Tony M
France
Local time: 10:18
ProZ.com member
French => English
SITE LOCALIZER
Maybe simple is better? Dec 4, 2012

John Holland wrote:
For example, see the last option, "Export all Text in PowerPoint Slide including Text in Text Box,"


That seems like an awfully cumbersome way of doing it, John, and knowing the very variable results you can get recovering text from PDFs, I'd be somewhat mistrustful of that.

I think the Werecat solution is much simpler and less prone to problems; basic text formatting will be kept, but of course page layout will not.

NB: I have no idea whether Werecat still works in Office 2007/2010. I use it very successfully in Office XP, alongside Wordfast Classic, although it functions totally independently, of course.


 
John Holland  Identity Verified
France
Local time: 10:18
French => English
It's all in the tools you have and know... Dec 4, 2012

I've never used Werecat.

I'm a free software person, and I use tools that run on Linux. For this kind of situation, I've used a command line program called catppt to extract text, then LF Aligner to align the files and export as TMX, which I then use with OmegaT.

catppt: http://www.wagner.pp.ru/~vitus/software/catdoc/
LF Aligner: http://sourceforge.net/projects/aligner/
OmegaT: http://www.omegat.org/

For the files I've had, that was a good work flow.
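For the last step of that workflow, here is a minimal Python sketch of what a TMX export of aligned pairs looks like, using only the standard library. It follows the TMX 1.4 structure (`header`, `body`, `tu`, `tuv`, `seg`), but the `creationtool` and `o-tmf` values are placeholders, the language codes are only examples, and `pairs_to_tmx` is a hypothetical name. LF Aligner and the CAT tools produce their own TMX files; this only illustrates the format.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang: str, tgt_lang: str) -> bytes:
    """Build a minimal TMX 1.4 document from (source, target) text pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "pairs_to_tmx-sketch",  # placeholder value
        "creationtoolversion": "0.1",           # placeholder value
        "segtype": "sentence",
        "o-tmf": "none",                        # placeholder value
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        if not src or not tgt:
            continue  # a TM unit needs text on both sides
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="utf-8", xml_declaration=True)
```

The resulting bytes can be written to a file such as `pt-es.tmx` (a hypothetical name) and imported into a TM.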

I just mentioned the option of using MS Office to extract text because it uses a tool that Paula presumably has already. She might not have Werecat available, and she most likely does not have any of those Linux-y tools...

Is there a PPT text extraction tool in the SDL universe?


 
Tony M
France
Local time: 10:18
ProZ.com member
French => English
SITE LOCALIZER
Free software Dec 4, 2012

John Holland wrote:
I'm a free software person, ...


Me too!

Werecat is simply a plug-in that works under Word (and with PPT), and is free to download, even though no longer supported by its creator.

It's very basic, but a REALLY powerful little utility that takes a very short time to extract all text from the text boxes in either a DOC or a PPT — and will then neatly put them all back in again for you later if you want!


 
Guillaume Chareyron  Identity Verified
France
Local time: 10:18
German => French
MemoQ or AlignFactory Dec 4, 2012

Hi Paula,

Do you know memoQ? It has a feature called "LiveDocs", a kind of alignment function for past translations, which could be quite useful in your case.

But my favourite tool for this kind of work is AlignFactory (from Terminotix), which is fast and reliable. From pairs of documents, it produces bitexts (HTML with source and target side by side) or a translation memory (TMX).
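To make the "bitext" idea concrete, here is a minimal Python sketch that turns aligned pairs into a side-by-side HTML table. It only illustrates the concept; it is not AlignFactory's actual output format, and `pairs_to_bitext_html` is a hypothetical name.

```python
import html


def pairs_to_bitext_html(pairs, src_label: str = "Source", tgt_label: str = "Target") -> str:
    """Render (source, target) pairs as a two-column HTML table."""
    # Escape every cell so characters like < and & in the text stay literal.
    rows = "\n".join(
        f"<tr><td>{html.escape(src)}</td><td>{html.escape(tgt)}</td></tr>"
        for src, tgt in pairs
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Bitext</title></head><body>\n'
        '<table border="1">\n'
        f"<tr><th>{html.escape(src_label)}</th><th>{html.escape(tgt_label)}</th></tr>\n"
        f"{rows}\n"
        "</table>\n</body></html>\n"
    )
```

A table like this is handy for eyeballing an alignment before committing it to a TM.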

It works very well with ppt files too.

If you want, just send me two small PPT files and I'll send you the results back, so that you can see if it suits your needs.

You can probably ask for a demo version too.

Cheers
Guillaume


 
John Holland  Identity Verified
France
Local time: 10:18
French => English
Extract the text before aligning Dec 4, 2012

Tony M wrote:

John Holland wrote:
I'm a free software person, ...


Me too!

Werecat is simply a plug-in that works under Word (and with PPT), and is free to download, even though no longer supported by its creator.


I was imprecise. I meant this kind of free software: https://en.wikipedia.org/wiki/Free_software

In any case, Werecat does sound like a possible alternative, especially if the included text export features of MS Office are not adequate for Paula's PPTs.

The main idea so far is to extract the text from the PPTs in one way or another and then use WinAlign on the extracted text, if that hasn't already been tried.

Guillaume's suggestion of AlignFactory from Terminotix sounds like a good option for PPT files, too.


 
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:18
ProZ.com member since 2009
Dutch => English
I always recommend AlignFactory Light Dec 4, 2012

Hi Paula,

I have tried many aligners, but none of them is as good as AlignFactory Light. I would recommend you email Jean-François Richard of Terminotix ([email protected]) for a free demo and try it yourself. AlignFactory Light supports .ppt and .pptx.

info: http://www.terminotix.com/index.asp?name=AlignFactory_Light&content=item&brand=1&item=11&lang=en

memoQ's aligner (LiveDocs) is also very good, and it supports PowerPoint files directly, without having to extract anything first.

info: http://kilgray.com/memoq/60/help-en/index.html?translation_grid.html

Michael

[Edited at 2012-12-05 00:02 GMT]


 
Gabriel Catalan  Identity Verified
Spain
Local time: 10:18
English => Spanish
ppt size Dec 5, 2012

Paula, WinAlign is most probably crashing because of the size of the PPT files, and that size is caused by the embedded pictures.
Have you tried removing the pictures from the PPT files so that they get smaller?
You really only need the text to be aligned.
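Before deleting anything, the theory can be checked. A .pptx is a ZIP archive in which embedded pictures normally sit under `ppt/media/`, so comparing their size with the total shows how much of a deck is images. A minimal Python sketch, assuming .pptx files (the older binary .ppt is not a ZIP) and using a hypothetical helper name:

```python
import zipfile


def media_share(path) -> tuple[int, int]:
    """Return (uncompressed bytes under ppt/media/, total uncompressed bytes)."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
    # file_size is the uncompressed size recorded for each archive member.
    media = sum(info.file_size for info in infos if info.filename.startswith("ppt/media/"))
    total = sum(info.file_size for info in infos)
    return media, total
```

If `media` accounts for most of `total` in, say, `media_share("deck.pptx")`, the pictures are indeed where the bulk comes from; that still doesn't prove the size is what crashes WinAlign.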

Regards.


 
Paula Ribeiro  Identity Verified
Local time: 09:18
English => Portuguese
Topic starter
PPTs too big and with too many pictures Dec 6, 2012

Hello everyone,

Thank you so much for your input! I think I'll probably try both methods, Werecat and AlignFactory.

Guillaume, I'd love to be able to send you the files, but as they're confidential, I really cannot send them anywhere... Thank you, though!

And Gabriel, the PPTs are 120 slides long, sometimes with pictures layered over pictures... The point here is actually to get around that exact problem. I'm looking to save time, since I'm translating while trying to organize things whenever I have spare time, so removing the pictures really wouldn't help me...

Again, thank you, everyone. Let's see how I do... As I need approval to download any app to the company's computer, I'll probably get around to actually doing it next week :/ I'll try at home over the weekend!


 

