How to guess language in Cyrillic script? Thread poster: Jan Sundström
Jan Sundström Sweden Local time: 19:46 ProZ.com member since 1970 English to Swedish + ...
Hi all,
Is there a guide or concise overview of how to tell different languages apart when you have a paper document in Cyrillic script in your hand?
Sometimes I get documents where I can't say whether it's Russian, Azeri, Mongolian, Macedonian or [insert exotic language here]...?
It would be very useful to have a quick reference chart of what to look for in order to identify the language.
If we received the documents as files on the computer it would be easy to cut/paste a sentence and search on the internet, or use language guessing software.
But these are mostly diplomas or forms with handwritten entries, stamps, stickers etc, which makes it cumbersome to scan, OCR etc.
I found this extensive alphabet list:
http://en.wikipedia.org/wiki/List_of_Cyrillic_letters
But I'm looking for a set of hard and fast rules, that I can use on the spot. Like: "if you see the letter Y, you can be sure it's the language X".
Is there any website or guide for this, or am I wishing for the impossible?!
/Jan
mjbjosh Local time: 19:46 English to Latvian + ... Depends on the writer | Jan 24, 2008
I am not familiar with all the languages that you named (also, I think Azeri now uses a modified Latin alphabet), but I think it depends on the writer. For example, when I am writing in Russian, I tend to use a "t" that resembles the Greek "t" rather than the cursive Cyrillic one, which looks like a Latin "m". The same goes for "d": I write one that resembles the Greek "d" rather than the cursive Cyrillic one, which looks like a Latin "g".
[Edited at 2008-01-24 21:44]
esperantisto Local time: 21:46 ProZ.com member since 2006 English to Russian + ... SITE LOCALIZER I doubt that simple hard rules can be derived. | Jan 25, 2008
a) Even within a single language, there may be huge historical differences: the pre-revolutionary Russian script differs drastically from the modern one.
b) The Slavic languages that use Cyrillic natively, each having developed it in its own way (although, of course, under great influence of Russian), are one matter. The non-Slavic languages of the ex-USSR plus Mongolian are another: their scripts were derived from Russian, so they are more uniform on the one hand but more complex on the other.
Well, learn languages, not scripts! It's just like for Latin.
However, many languages have specific letters. Just a couple of tips:
1. If you see Ўў, this may be Belarusian, Uzbek or some language of the Far North of the Russian Federation. I know nothing about the latter, but for the first two:
a) if you also see Ии, that's Uzbek;
b) otherwise, it's Belarusian.
Note: If it's a text from the 1920s, Ў may also be Ossetian, but I doubt you'll encounter it.
2. If Ӕӕ, Ossetian.
3. If Її, Ukrainian (or Ruthenian, but it's a minor language with no official status, not recognized as a separate language in Ukraine).
4. If Ӂӂ, Moldovan (Romanian).
Radica Schenck Germany Local time: 19:46 English to Macedonian + ... F7 for texts in soft copy | Jan 26, 2008
If you received the text, say, in Word, select a word from the text, then press F7 and you will get a pop-up message like this: "There is no Thesaurus available for (e.g.) Macedonian"...
As for the table on Wikipedia, it's also a very good source: it tells you, for example, that the Macedonian alphabet is the only alphabet that has the letters Ќ and Ѓ...
The same conclusion for Ћ and Ђ for Serbian...
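For anyone who does have the text in electronic form, the letter tips given so far in this thread could be collected into a small lookup. What follows is only a rough Python sketch: the names `SIGNATURE_LETTERS`, `signature_hits` and `guess_from_short_u` are made up for illustration, and the table covers just the letters mentioned above, so an empty result tells you nothing.

```python
# Letters that the posts above name as pointing to one language.
# Keys are lowercase Cyrillic code points, written as escapes so they
# cannot be confused with look-alike Latin letters.
SIGNATURE_LETTERS = {
    "\u04d5": "Ossetian",                  # ӕ
    "\u0457": "Ukrainian (or Ruthenian)",  # ї
    "\u04c2": "Moldovan (Romanian)",       # ӂ
    "\u045c": "Macedonian",                # ќ
    "\u0453": "Macedonian",                # ѓ
    "\u045b": "Serbian",                   # ћ
    "\u0452": "Serbian",                   # ђ
}


def signature_hits(text):
    """Return a sorted list of languages whose signature letters occur in text."""
    letters = set(text.lower())
    return sorted({lang for ch, lang in SIGNATURE_LETTERS.items() if ch in letters})


def guess_from_short_u(text):
    """Apply the Ў tip: with И it suggests Uzbek, without it Belarusian.

    Returns None when the text contains no Ў at all. The languages of the
    Far North mentioned in the tip are not handled here.
    """
    letters = set(text.lower())
    if "\u045e" not in letters:  # ў
        return None
    return "Uzbek" if "\u0438" in letters else "Belarusian"  # и
```

A short text may simply not contain any of these letters, so such a lookup can confirm a guess but cannot rule a language out.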
Good luck!
Victor Quero Local time: 19:46 Serbo-Croatian to Spanish + ...
1. Among Slavic languages, only Ukrainian and Belarusian use the letter І і.
2. If you find the letter Є є, it's Ukrainian. 100% sure, since no other language uses it (unless it's a text in Old Church Slavonic, but that would be very odd, and you would recognize it by the medieval look and the letter Ѣ ѣ). Also, only Ukrainian uses Ї ї, and it uses neither Ъ ъ nor Ы ы.
Note: There's a small language called Rusyn, which some consider a dialect of Ukrainian. If you find Є є and Ї ї but also Ъ ъ and Ы ы, it must be Rusyn.
3. If you find a language with both І і and Ў ў, it's Belarusian. It uses neither Ъ ъ nor Щ щ.
4. Among Slavic languages, the letter Ј ј is used only by Serbian and Macedonian. (There's a small dialect of Sami which also uses it, but you would recognize it by some letters with a comma-like symbol attached: Ӊ ӊ, Ҋ ҋ, Ӆ ӆ.)
5. Besides Ј ј, only Serbian and Macedonian have the distinctive letters Љ љ and Њ њ.
6. If you find a text with Ћ ћ and Ђ ђ, you can be 100% sure it's Serbian.
7. If you find a text with Ј ј plus Ѓ ѓ and Ќ ќ, you can be 100% sure it's Macedonian.
8. I don't know much about non-Slavic languages which use Cyrillic, but they are often characterized by 'unusual' letters like Ә ә or Ӓ ӓ, and by modifications like Ғ ғ, Ұ ұ (the latter found in Kazakh).
9. If none of the distinctive letters mentioned above appears (І, Є, Ј, Ў, Љ, Ћ, Ќ, or Ә, Ғ, Ұ), then most likely it's Russian or Bulgarian.
10. To tell Russian from Bulgarian: Bulgarian uses the letter Ъ ъ very often, while in Russian it appears only in a few specific cases. Bulgarian does not use Ё ё but the combination ьо instead, which is very unusual in Russian (I would say it occurs only in certain foreign words). Unfortunately, Ё ё in Russian is most often written simply as Е е.
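For text that is available electronically, the ten rules above can be read as a decision procedure. The Python sketch below is one possible reading, not a tested classifier. The function names and the order of the checks are my own assumptions, `HARD_SIGN_SHARE` is an arbitrary cutoff for "uses Ъ very often", the fallback for Ў without І borrows the Belarusian/Uzbek tip from earlier in the thread, and a single Ѓ or Ќ is treated as enough for Macedonian.

```python
HARD_SIGN_SHARE = 0.005  # assumed cutoff for "Ъ is frequent"; not from the thread


def russian_or_bulgarian(low):
    """Rule 10, applied to already lowercased text."""
    if "\u0451" in low:            # ё is not used in Bulgarian
        return "Russian"
    if "\u044c\u043e" in low:      # ьо is rare in Russian (loanwords only)
        return "Bulgarian"
    letters = [c for c in low if "\u0430" <= c <= "\u044f"]  # а..я
    if letters and letters.count("\u044a") / len(letters) > HARD_SIGN_SHARE:
        return "Bulgarian"         # ъ is frequent
    return "Russian or Bulgarian"


def guess_cyrillic_language(text):
    """Apply the numbered rules, most specific letters first."""
    low = text.lower()
    seen = set(low)

    def has(*chars):
        return any(c in seen for c in chars)

    # Rule 8: ә, ғ, ұ. Checked first because non-Slavic alphabets
    # such as Kazakh also contain і.
    if has("\u04d9", "\u0493", "\u04b1"):
        return "non-Slavic language"
    # Rule 6: ћ or ђ.
    if has("\u045b", "\u0452"):
        return "Serbian"
    # Rule 7: ѓ or ќ.
    if has("\u0453", "\u045c"):
        return "Macedonian"
    # Rules 4 and 5: ј, љ or њ without the letters above.
    if has("\u0458", "\u0459", "\u045a"):
        return "Serbian or Macedonian"
    # Rule 2 and its note: є or ї; ъ or ы alongside them suggests Rusyn.
    if has("\u0454", "\u0457"):
        return "Rusyn" if has("\u044a", "\u044b") else "Ukrainian"
    # Rule 3: ў together with і.
    if has("\u045e"):
        return "Belarusian" if has("\u0456") else "Belarusian or Uzbek"
    # Rule 1: і on its own.
    if has("\u0456"):
        return "Ukrainian or Belarusian"
    # Rules 9 and 10: no distinctive letter found.
    return russian_or_bulgarian(low)
```

On a few words of text the distinctive letters may simply be absent, so the result is only as reliable as the sample is long; a Russian sentence containing a loanword such as "бульон" would be misread as Bulgarian by the ьо check.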
Hope it helps...
[Edited at 2008-01-31 12:05]