Google-based KWIC (keyword in context) tools?
Thread poster: Olaf (X)
Olaf (X)
Local time: 23:21
English to German
Feb 10, 2010

I'd like to use Google as a monolingual corpus to generate KWIC (keyword in context) lists. So far I have found only one tool that allows me to do that, WebCorp (http://www.webcorp.org.uk/), which has several limitations that I don't like. Does anybody know of any other tools or scripts that could be used for this purpose?

Olaf

[Subject edited by staff or moderator 2010-02-11 08:29 GMT]


 
Bilbo Baggins
Catalan to English
A couple of options Feb 10, 2010

Hi Olaf

Many years ago I used WebCorp and even wrote about it (for the ATA), but it was limited by its slowness, although it seems to have improved somewhat. Have you tried the advanced search features?

The only tools I know that might do something similar to what you need are:

Rollyo: you can create your own restricted search engine, limited to specific sites. www.rollyo.com/
PERC: a true corpus available online, but pre-built. http://www.corpora.jp/~perc04/

Another way to compile a quick corpus from keywords is Webbootcat, but it's not free, and you need a concordancer to search the texts (although it may have an online concordance feature). http://www.sketchengine.co.uk/

There's also CorpusEye, but it can't concordance the whole web, only limited, specific parts of it: http://corp.hum.sdu.dk/cqp.en.html

That's about the limit of my knowledge, but if you give more details of what you want to achieve, I may be able to be more precise.





[Edited at 2010-02-10 23:14 GMT]


 
Olaf (X)
Local time: 23:21
English to German
Thread poster
Thanks for the links Feb 11, 2010

Hi Bilbo,

Bilbo Baggins wrote:
Many years ago I used WebCorp and even wrote about it (for the ATA), but it was limited by its slowness, although it seems to have improved somewhat. Have you tried the advanced search features?

Yes, I did, but it didn't make a difference. Unfortunately, WebCorp doesn't seem to support queries for languages using non-Latin alphabets.
I'll check out the other links that you mentioned.

Thanks,
Olaf


[Edited at 2010-02-11 10:10 GMT]


 
Bilbo Baggins
Catalan to English
Re PERC and others Feb 11, 2010

PERC: I should have mentioned that it's an English-language corpus. The same goes for CorpusEye.

Seems to me that Rollyo might be the best option. And maybe Webbootcat.

With the first one, you can select URLs to roll your own search engine. You get a Google-like display (not a KWIC) with the search term highlighted.

With the second one, you enter keywords, then select from the URLs that result, and these are used to create a corpus in TXT format that you can concordance.
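
Once you have a corpus as plain text, the concordancing step itself can be quite small. The following is only a rough Python sketch of a KWIC display, not how Webbootcat or any of the tools above work internally. The function name `kwic` and the `width` parameter are made up for illustration, and it assumes the whole corpus fits in memory as one string.

```python
import re


def kwic(text, keyword, width=30):
    """Return one KWIC line per whole-word, case-insensitive hit of keyword."""
    # Collapse all whitespace so line breaks in the corpus do not split context.
    text = " ".join(text.split())
    # (?<!\w) and (?!\w) keep the match from starting or ending inside a word.
    pattern = re.compile(
        r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE
    )
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context and left-align the right context
        # so that the keyword forms a single column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    sample = (
        "The corpus is small. A corpus can be built\n"
        "from web pages. Корпус текстов: корпус."
    )
    for line in kwic(sample, "corpus", width=20):
        print(line)
```

To run it over a TXT corpus, read the file into a string first, for example `kwic(open("corpus.txt", encoding="utf-8").read(), "keyword")`, where `corpus.txt` is a placeholder file name. Python's `re` module matches Unicode text by default, so keywords in non-Latin alphabets should work as long as the file is decoded correctly.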