How to extract terminology from a Word doc
Topic starter: Paula Ribeiro
Paula Ribeiro | Local time: 09:18 | English => Portuguese + ...
Hello everyone. I am currently deciding which CAT tool to buy for a specific company. I've used Trados Studio and I am also trying Wordfast. The problem is that I want to take advantage of the huge resources of past translations the last translator did, but they weren't using any CAT tool, and I really want to organize them. What I have is the original in PT and the translated file in ES, for example, but these are huge PPTs, with pictures and a lot of formatting, with 5177 TUs when I try to align them in WinAlign. Because they're too big, WinAlign simply crashes after supposedly aligning the project. So I have all these past translations but no TM, and I want to create one for each language pair.

What I would like to know is: is it possible to open an original to translate, either in Wordfast or Trados, and then get the terminology out of the translated file, which can be a DOC or a PPT? I would still have to translate it again, but I would have the Spanish terminology in a TM. Am I making myself understood? Or do I have to create a glossary anyway? This is especially difficult for me, as I am really a beginner with all these extra functionalities...

Tony M | France | Local time: 10:18 | ProZ.com member | French => English + ... | SITE LOCALIZER
I too am a relative novice here, so the following suggestions are only vague ideas. Going to all the trouble of re-translating seems like an awful lot of work! I would set about it by using Werecat to extract all the text from the PPTs into DOC files (hoping and praying that they do stay at least reasonably well aligned, depending on how careful the previous translators were!), strip out the tags to leave just the wanted text, and then use the PlusTools 'Align' function. I have never had any trouble with that crashing on even pretty large docs, but if necessary, manually split the main doc into smaller chunks; it won't be very difficult to stick them back together again at TM time. I hope that helps!

John Holland | France | Local time: 10:18 | French => English

Tony M | France | Local time: 10:18 | ProZ.com member | French => English + ... | SITE LOCALIZER
Maybe simple is better? | Dec 4, 2012
John Holland wrote:
For example, see the last option, "Export all Text in PowerPoint Slide including Text in Text Box,"

That seems like an awfully cumbersome way of doing it, John, and knowing the very variable results you can get recovering text from PDFs, I'd be somewhat mistrustful of that. I think the Werecat solution is much simpler and less prone to problems; basic text formatting will be kept, but of course page layout will not. NB: I have no idea if Werecat still works in Office 2007 / 2010. I use it very successfully in Office XP, in conjunction with Wordfast Classic, although it functions totally independently, of course.
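If the decks are in the newer .pptx format (Office 2007 and later), the text that Werecat or PowerPoint's own export would pull out can also be read straight from the file: a .pptx is a ZIP archive whose slides are stored as `ppt/slides/slideN.xml`, with each paragraph in an `<a:p>` element and its text in one or more `<a:t>` runs. The Python sketch below uses only the standard library and is an illustration, not a replacement for a proper filter. The function name is hypothetical, pictures are simply skipped, speaker notes are not read, and slides are sorted by the number in their file name, which usually, but not always, matches the order shown in PowerPoint (the authoritative order is recorded in `ppt/presentation.xml`). It does not handle the older binary .ppt format.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for paragraphs (<a:p>) and text runs (<a:t>) in .pptx slides.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml")


def extract_slide_text(pptx_path: str) -> list[list[str]]:
    """Return the non-empty paragraphs of each slide, slides sorted by file number."""
    with zipfile.ZipFile(pptx_path) as zf:
        # Keep only the slide parts; media, layouts and _rels entries are ignored.
        slide_parts = []
        for name in zf.namelist():
            match = SLIDE_PART.fullmatch(name)
            if match:
                slide_parts.append((int(match.group(1)), name))

        slides = []
        for _, name in sorted(slide_parts):
            root = ET.fromstring(zf.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # One paragraph's text may be split over several <a:t> runs.
                text = "".join(run.text or "" for run in para.iter(A_NS + "t")).strip()
                if text:
                    paragraphs.append(text)
            slides.append(paragraphs)
    return slides
```

For example, `extract_slide_text("deck_PT.pptx")` (a hypothetical file name) would return one list of paragraphs per slide, which can then be written out as plain text for whichever aligner is used.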
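Tony's fallback of splitting the material into smaller chunks can be done mechanically once the text has been extracted. A minimal sketch, assuming each deck's text is already available as one list of paragraphs per slide and that the original and the translation have the same number of slides: cutting both decks at the same slide boundaries yields numbered part files (`pt_001.txt`, `es_001.txt`, ...) that correspond one-to-one, so each aligner run only has to handle a fraction of the 5177 segments. The function names, file prefixes and chunk size are illustrative, not part of any of the tools mentioned in this thread.

```python
def slide_chunks(n_slides: int, slides_per_chunk: int) -> list[range]:
    """Split slide indices 0..n_slides-1 into consecutive ranges of at most slides_per_chunk."""
    if slides_per_chunk < 1:
        raise ValueError("slides_per_chunk must be at least 1")
    return [range(start, min(start + slides_per_chunk, n_slides))
            for start in range(0, n_slides, slides_per_chunk)]


def write_chunks(slides: list[list[str]], slides_per_chunk: int, prefix: str) -> list[str]:
    """Write each chunk to PREFIX_001.txt, PREFIX_002.txt, ... with one paragraph per line."""
    paths = []
    for number, chunk in enumerate(slide_chunks(len(slides), slides_per_chunk), start=1):
        path = f"{prefix}_{number:03d}.txt"
        # Flatten the paragraphs of all slides in this chunk, keeping their order.
        lines = [para for index in chunk for para in slides[index]]
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

Calling `write_chunks(pt_slides, 20, "pt")` and `write_chunks(es_slides, 20, "es")` with the same chunk size (both variable names hypothetical) would split two 120-slide decks into six matching pairs of files, which can be aligned one pair at a time and merged again at TM time.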
John Holland | France | Local time: 10:18 | French => English
....
[Edited at 2012-12-04 19:04 GMT]

John Holland | France | Local time: 10:18 | French => English
It's all in the tools you have and know... | Dec 4, 2012
I've never used Werecat. I'm a free software person, and I use tools that run on Linux. For this kind of situation, I've used a command-line program called catppt to extract the text, then LF Aligner to align the files and export them as TMX, which I then use with OmegaT.

catppt: http://www.wagner.pp.ru/~vitus/software/catdoc/
LF Aligner: http://sourceforge.net/projects/aligner/
OmegaT: http://www.omegat.org/

For the files I've had, that was a good workflow. I just mentioned the option of using MS Office to extract text because it uses a tool that Paula presumably already has. She might not have Werecat available, and she most likely does not have any of those Linux-y tools... Is there a PPT text extraction tool in the SDL universe?

Tony M | France | Local time: 10:18 | ProZ.com member | French => English + ... | SITE LOCALIZER
Free software | Dec 4, 2012
John Holland wrote:
I'm a free software person, ...

Me too! Werecat is simply a plug-in that works under Word (and with PPT), and is free to download, even though it is no longer supported by its creator. It's very basic, but a REALLY powerful little utility that takes a very short time to extract all the text from the text boxes in either a DOC or a PPT, and will then neatly put it all back in again for you later if you want!

MemoQ or AlignFactory | Dec 4, 2012
Hi Paula,

Do you know memoQ? In this tool, you have what is called "LiveDocs", a kind of alignment function for past translations; it could be quite useful in your case.

But my favourite tool for this kind of work is AlignFactory (from Terminotix), which is fast and reliable. From the document pairs, it produces bitexts (HTML with source and target side by side) or a translation memory (TMX). It works very well with PPT files too.

If you want, just send me two small PPT files and I'll send you the results back, so that you can see if it suits your needs. You can probably ask for a demo version too.

Cheers
Guillaume
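Whatever tool does the aligning, the end product Paula is after is a TMX file that Trados Studio or Wordfast can import. For reference, a minimal sketch of writing already-aligned pairs as TMX 1.4 with Python's standard library. The function names are hypothetical, the `creationtool` value is a made-up placeholder, the language codes are examples, and real alignment tools typically write more metadata (creation dates, tool-specific properties) than this.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: list[tuple[str, str]], src_lang: str, tgt_lang: str) -> ET.ElementTree:
    """Build a TMX 1.4 tree with one <tu> per (source, target) pair."""
    tmx = ET.Element("tmx", version="1.4")
    # These are the header attributes TMX 1.4 requires.
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical placeholder name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # &, < and > are escaped on write
    return ET.ElementTree(tmx)


def write_tmx(pairs: list[tuple[str, str]], src_lang: str, tgt_lang: str, path: str) -> None:
    """Write the pairs to a UTF-8 TMX file (ET.indent needs Python 3.9+)."""
    tree = build_tmx(pairs, src_lang, tgt_lang)
    ET.indent(tree)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For instance, `write_tmx(pairs, "pt-PT", "es-ES", "pt-es.tmx")` would produce one translation unit per aligned pair, ready to be imported into a new TM for that language pair.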
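Because each translated deck mirrors its original slide for slide, one way to keep every alignment problem small is to pair the two decks slide by slide before any segment-level aligning happens. The sketch below is a deliberately naive position-based heuristic, not how WinAlign, AlignFactory or LF Aligner actually work: it assumes both decks have the same number of slides (given as one list of paragraphs per slide), pairs paragraphs one-to-one on slides where the paragraph counts match, and lists the remaining slides for manual alignment. All names are hypothetical.

```python
from typing import NamedTuple


class SlidePairing(NamedTuple):
    pairs: list[tuple[str, str]]  # paragraph pairs from slides whose counts matched
    review: list[int]             # 1-based numbers of slides left for manual alignment


def pair_by_slide(src_slides: list[list[str]], tgt_slides: list[list[str]]) -> SlidePairing:
    """Pair paragraphs by position on each slide; flag slides whose paragraph counts differ."""
    if len(src_slides) != len(tgt_slides):
        raise ValueError(f"slide counts differ: {len(src_slides)} vs {len(tgt_slides)}")
    pairs: list[tuple[str, str]] = []
    review: list[int] = []
    for number, (src, tgt) in enumerate(zip(src_slides, tgt_slides), start=1):
        if len(src) == len(tgt):
            pairs.extend(zip(src, tgt))
        else:
            review.append(number)
    return SlidePairing(pairs, review)
```

The pairs can go straight into a TM, while only the slides listed in `review` would need to go through an aligner or be checked by hand, which is a much smaller job than aligning a whole 120-slide deck in one run.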
John Holland | France | Local time: 10:18 | French => English
Extract the text before aligning | Dec 4, 2012
Tony M wrote:
John Holland wrote:
I'm a free software person, ...
Me too! Werecat is simply a plug-in that works under Word (and with PPT), and is free to download, even though no longer supported by its creator.

I was imprecise. I meant this kind of free software: https://en.wikipedia.org/wiki/Free_software

In any case, Werecat does sound like a possible alternative, especially if the built-in text export features of MS Office are not adequate for Paula's PPTs. The main idea so far is to extract the text from the PPTs one way or another and then run WinAlign on the extracted text, if that hasn't already been tried. Guillaume's suggestion of AlignFactory from Terminotix sounds like a good option for PPT files, too.

Michael Beijer | United Kingdom | Local time: 09:18 | ProZ.com member since 2009 | Dutch => English + ...
Paula, WinAlign is most probably crashing because of the size of the PPT files, and that size is caused by the embedded pictures. Have you tried removing the pictures from the PPT files so that their size decreases? You really only need the text to be aligned. Regards.

Paula Ribeiro | Local time: 09:18 | English => Portuguese + ... | Topic starter
PPTs too big and with too many pictures | Dec 6, 2012
Hello everyone, thank you so much for your input! I think I'll probably try both methods, Werecat and AlignFactory. Guillaume, I'd love to be able to send you the files, but as I can't disclose any confidential files, I really cannot send these files anywhere... thank you though! And Gabriel, the PPTs are 120 slides long, sometimes with pictures over pictures... The point here is actually to get around that exact problem. As I'm trying to save time, translating while organizing things whenever I get spare time, that really wouldn't help me... Again, thank you, people. Let's see how I do... As I need approval to download any app to the company's computer, I'll probably get around to actually doing it next week :/ I'll try at home during the weekend!