Recently, I took part in Task 8 of SemEval-2013: Cross-lingual Textual Entailment for Content Synchronization. I plan to complete this work as part of my graduate studies.
One approach is to translate the non-English text into English first and then perform ordinary monolingual textual entailment. Google's translation API is closed, but the web page itself will never be closed, right? So I found a good tool written by someone before me: "Google Translate post to submit an online translation example with no length limit". It works by scraping the Google Translate web page directly. Long text is supported, so multiple sentences can be translated in one request.
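Since long text is supported, several sentences can be sent together and split apart again afterwards. Below is a minimal sketch of that batching step. The `translate` callable is hypothetical: it stands in for whatever tool actually talks to the translation page, and the sketch assumes (without guarantee) that the translator keeps one output line per input line.

```python
from typing import Callable, List

# Hypothetical signature: translate(text, source_code, target_code) -> translated text.
Translator = Callable[[str, str, str], str]


def translate_batch(
    sentences: List[str],
    translate: Translator,
    source: str,
    target: str = "en",
) -> List[str]:
    """Translate many sentences with a single call to `translate`."""
    if not sentences:
        return []
    # One sentence per line; inner line breaks would break the alignment.
    joined = "\n".join(s.replace("\n", " ").strip() for s in sentences)
    translated = translate(joined, source, target)
    parts = translated.split("\n")
    # Assumption: the translator preserves line breaks. Fail loudly if not.
    if len(parts) != len(sentences):
        raise ValueError(
            f"expected {len(sentences)} lines back, got {len(parts)}"
        )
    return [p.strip() for p in parts]
```

If the line counts do not match, falling back to translating one sentence at a time is probably the safest option.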
However, this tool has one flaw: the translated results contain HTML escape sequences such as "&amp;" and "&#34;". For a fix, see my other blog post: converting special characters in HTML into printable characters.
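For reference, here is a minimal sketch of one way to clean up such escape sequences, using `html.unescape` from the Python standard library. This is not necessarily the method from the other post, and the function name `unescape_translation` is made up for this example.

```python
import html


def unescape_translation(text: str) -> str:
    """Turn HTML escapes such as &amp; or &#34; back into plain characters."""
    return html.unescape(text)


print(unescape_translation("Tom &amp; Jerry said &#34;hi&#34;"))
# prints: Tom & Jerry said "hi"
```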
The following table lists the supported languages and their codes:
Language | Language code
Afrikaans | af
Albanian | sq
Arabic | ar
Belarusian | be
Bulgarian | bg
Catalan | ca
Chinese Simplified | zh-CN
Chinese Traditional | zh-TW
Croatian | hr
Czech | cs
Danish | da
Dutch | nl
English | en
Estonian | et
Filipino | tl
Finnish | fi
French | fr
Galician | gl
German | de
Greek | el
Haitian Creole | ht
Hebrew | iw
Hindi | hi
Hungarian | hu
Icelandic | is
Indonesian | id
Irish | ga
Italian | it
Japanese | ja
Latvian | lv
Lithuanian | lt
Macedonian | mk
Malay | ms
Maltese | mt
Norwegian | no
Persian | fa
Polish | pl
Portuguese | pt
Romanian | ro
Russian | ru
Serbian | sr
Slovak | sk
Slovenian | sl
Spanish | es
Swahili | sw
Swedish | sv
Thai | th
Turkish | tr
Ukrainian | uk
Vietnamese | vi
Welsh | cy
Yiddish | yi
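In a script, the codes above can be kept in a small lookup table so that a mistyped language name fails early instead of producing a bad request. This is only a sketch: it copies a handful of rows from the table, and `LANGUAGE_CODES` and `language_code` are names made up for this example.

```python
# A few rows copied from the table above; extend with the rest as needed.
LANGUAGE_CODES = {
    "Chinese Simplified": "zh-CN",
    "Chinese Traditional": "zh-TW",
    "English": "en",
    "French": "fr",
    "German": "de",
    "Hebrew": "iw",
    "Italian": "it",
    "Japanese": "ja",
    "Spanish": "es",
}

# Case-insensitive index built once at import time.
_BY_LOWER_NAME = {name.casefold(): code for name, code in LANGUAGE_CODES.items()}


def language_code(name: str) -> str:
    """Return the code for a language name, ignoring case and outer spaces."""
    try:
        return _BY_LOWER_NAME[name.strip().casefold()]
    except KeyError:
        raise KeyError(f"unknown language: {name!r}") from None


print(language_code("french"))
# prints: fr
```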
PS: The table is almost a waste of space. It has been too long since I got things done ......