APPENDIX H - Supported Languages for Language Detection
Discovery Manager can identify up to three languages within a given file. During processing, it reads the entire extracted and/or OCR text available for the file and reports the top three languages found, along with the percentage and character count for each.
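As an illustration only, the top-three summary described above could be sketched as follows. This is not Discovery Manager's implementation: the `top_languages` function, the `(language, text)` segment format, and the choice to compute percentages against all detected characters are assumptions made for this example.

```python
from collections import Counter


def top_languages(segments, limit=3):
    """Summarize per-segment detections into the top `limit` languages.

    `segments` is a hypothetical list of (language_name, text) pairs, as if
    some detector had already labeled each piece of a file's text.
    Returns a list of (language_name, character_count, percentage) tuples,
    ordered from most to least characters.
    """
    counts = Counter()
    for language, text in segments:
        # Character count per language (assumption: every character counts).
        counts[language] += len(text)

    total = sum(counts.values())
    if total == 0:
        return []  # No text available, so nothing to report.

    return [
        (language, chars, round(100 * chars / total, 1))
        for language, chars in counts.most_common(limit)
    ]


if __name__ == "__main__":
    # Hypothetical file containing text in four languages.
    example = [
        ("English", "a" * 60),
        ("French", "b" * 25),
        ("German", "c" * 10),
        ("Spanish", "d" * 5),
    ]
    for language, chars, percent in top_languages(example):
        print(language, chars, percent)
```

In this sketch a fourth language is simply dropped from the summary, which mirrors the three-language limit stated above.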
Note
The languages listed below are supported for Language Detection. For the languages supported specifically for OCR, see Appendix I below.
| Supported Languages | | | | | |
|---|---|---|---|---|---|
| Abkhazian | Dutch | Inuktitut | Marathi | Sesotho | Venda |
| Afar | Dzongkha | Inupiak | Mauritian_Creole | Shona | Vietnamese |
| Afrikaans | English | Irish | Mongolian | Sindhi | Volapuk |
| Akan | Esperanto | Italian | Nauru | Sinhalese | Waray_Philippines |
| Albanian | Estonian | Japanese | Nepali | Siswant | Welsh |
| Amharic | Faroese | Javanese | Norwegian | Slovak | Wolof |
| Arabic | Fijian | Kannada | Norwegian_N | Slovenian | Xhosa |
| Armenian | Finnish | Kashmiri | Nyanja | Somali | Yiddish |
| Assamese | French | Kazakh | Occitan | Spanish | Yoruba |
| Aymara | Frisian | Khasi | Oriya | Sundanese | Zhuang |
| Azerbaijani | Galician | Khmer | Oromo | Swahili | Zulu |
| Bashkir | Ganda | Kinyarwanda | Pashto | Swedish | |
| Basque | Georgian | Klingon | Pedi | Syriac | |
| Belarusian | German | Korean | Persian | Tagalog | |
| Bengali | Greek | Kurdish | Pig_Latin | Tajik | |
| Bihari | Greenlandic | Kyrgyz | Polish | Tamil | |
| Bislama | Guarani | Laothian | Portuguese | Tatar | |
| Breton | Gujarati | Latin | Punjabi | Telugu | |
| Bulgarian | Haitian_Creole | Latvian | Quechua | Thai | |
| Burmese | Hausa | Limbu | Rhaeto_Romance | Tibetan | |
| Catalan | Hawaiian | Lingala | Romanian | Tigrinya | |
| Cebuano | Hebrew | Lithuanian | Rundi | Tonga | |
| Cherokee | Hindi | Luxembourgish | Russian | Tsonga | |
| Chinese | Hmong | Macedonian | Samoan | Tswana | |
| Chinese_T | Hungarian | Malagasy | Sango | Turkish | |
| Corsican | Icelandic | Malay | Sanskrit | Turkmen | |
| Croatian | Igbo | Malayalam | Scots | Uighur | |
| Czech | Indonesian | Maltese | Scots_Gaelic | Ukrainian | |
| Danish | Interlingua | Manx | Serbian | Urdu | |
| Dhivehi | Interlingue | Maori | Seselwa | Uzbek | |