PHP code for detecting the character encoding and language of text files.
Encodings detected: 7-bit (ISO 2022, HZ, VN_HTML, ASCII); CJK (GB18030, CP950, BIG5-HKSCS:2004, EUC-JP, CP932, CP949, JOHAB); Vietnamese (VPS, VNI, VIQR, VISCII); single-byte non-Latin (Cyrillic: CP1251, ISO-8859-5, MacCyrillic, KOI8-RU, CP855, CP866; Greek: CP1253, ISO-8859-7, MacGreek, CP737, CP869; Hebrew: CP1255, ISO-8859-8, MacHebrew, CP862; Arabic, Farsi, Urdu: CP1256, ISO-8859-6, MacArabic, CP864; Thai: MacThai, CP874); single-byte Latin (DOS OEM, Windows ANSI and Mac OS code pages, ISO-8859-*)
Latin-script languages detected: English, Breton, Catalan, Dutch, Galician, Icelandic, Portuguese, Brazilian Portuguese, Danish, Norwegian, Esperanto, Finnish, French, German, Italian, Luxembourgish, Occitan, Spanish, Swedish, Tagalog, Estonian, Latvian, Lithuanian, Malay, Indonesian, Albanian, Bosnian, Croatian, Serbian, Czech, Hungarian, Polish, Romanian, Roman Macedonian, Slovak, Slovenian, Turkish, Vietnamese
Non-Latin-script languages detected: Russian, Bulgarian, Macedonian, Ukrainian, Cyrillic Serbian, Mongolian, Arabic, Farsi, Urdu, Hebrew, Greek, Hindi, Tamil, Telugu, Malayalam, Thai, Traditional Chinese, Simplified Chinese, Korean, Japanese
See example/Test_encoding for an example PHP script
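
To give a feel for what telling the 7-bit encodings apart can involve, here is a minimal sketch. It is not this project's code or API: the function name `classify7bit` is hypothetical, and the heuristics are deliberately simplified. They rely only on the well-known escape conventions (ISO 2022 switches character sets with ESC sequences, HZ wraps GB2312 text in `~{` and `~}`) and ignore VN_HTML entirely.

```php
<?php
// Hypothetical helper for illustration only; not part of this library.
function classify7bit(string $bytes): string
{
    // Any byte >= 0x80 rules out every 7-bit encoding.
    if (preg_match('/[\x80-\xFF]/', $bytes) === 1) {
        return 'not 7-bit';
    }
    // ISO 2022 variants designate character sets with "ESC $" or "ESC (".
    if (preg_match('/\x1B[$(]/', $bytes) === 1) {
        return 'ISO 2022';
    }
    // HZ brackets GB2312 text between "~{" and "~}" (simplified check:
    // both markers present, order not verified).
    if (strpos($bytes, '~{') !== false && strpos($bytes, '~}') !== false) {
        return 'HZ';
    }
    // Nothing but plain 7-bit bytes and no shift markers.
    return 'ASCII';
}

echo classify7bit('Hello'), "\n";                  // ASCII
echo classify7bit("\x1B\$B0l\x1B(B"), "\n";        // ISO 2022
echo classify7bit('~{<:Ky2;S{#~}'), "\n";          // HZ
echo classify7bit("caf\xE9"), "\n";                // not 7-bit
```

A real detector needs much more than this, for example statistical scoring to separate the single-byte code pages listed above, which share the same byte range and cannot be told apart by marker sequences.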