Codes for the languages (and language groups) of the world are covered by the ISO-639 standard.
The parts of this standard provide letter codes for languages. E.g. ISO-639-3 provides a three-letter code for every individual language, including extinct, ancient and constructed ones, not only living languages.
There are too many such codes to be contained in a Java enum (e.g. https://github.com/TakahikoKawasaki/nv-i18n/blob/master/src/main/java/com/neovisionaries/i18n/LanguageAlpha3Code.java is simply not complete).
This package bundles the tab-separated files provided by https://iso639-3.sil.org/, together with Java classes that read them and provide all language codes as Java objects, with getters.
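To give an idea of what reading these files involves, here is a minimal, self-contained sketch. It assumes the column layout of the SIL `iso-639-3.tab` file (Id, Part2b, Part2t, Part1, Scope, Language_Type, Ref_Name, Comment), parses a few inlined sample rows, and indexes each row under all of its codes, so that "nl", "dut" and "nld" resolve to the very same object. `Iso639TableSketch` and `Row` are hypothetical names; this only illustrates the idea, and the library's own classes may work differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class Iso639TableSketch {

    // One row of the (assumed) iso-639-3.tab layout:
    // Id, Part2b, Part2t, Part1, Scope, Language_Type, Ref_Name, Comment
    record Row(String id, String part2b, String part2t, String part1,
               String scope, String languageType, String refName, String comment) {

        static Row parse(String line) {
            // limit -1 keeps trailing empty columns, such as an empty Comment
            String[] c = line.split("\t", -1);
            return new Row(c[0], emptyToNull(c[1]), emptyToNull(c[2]), emptyToNull(c[3]),
                c[4], c[5], c[6], c.length > 7 ? emptyToNull(c[7]) : null);
        }

        private static String emptyToNull(String s) {
            return s.isEmpty() ? null : s;
        }
    }

    final List<Row> rows = new ArrayList<>();
    // Every code of a row (Part 3, 2B, 2T and 1) maps to the same Row instance
    private final Map<String, Row> byAnyCode = new HashMap<>();

    Iso639TableSketch(String tabFile) {
        tabFile.lines()
            .skip(1)                              // skip the header line
            .filter(line -> !line.isBlank())
            .map(Row::parse)
            .forEach(row -> {
                rows.add(row);
                for (String code : new String[] {row.id(), row.part2b(), row.part2t(), row.part1()}) {
                    if (code != null) {
                        byAnyCode.putIfAbsent(code, row);
                    }
                }
            });
    }

    Optional<Row> get(String code) {
        return code == null ? Optional.empty() : Optional.ofNullable(byAnyCode.get(code));
    }

    static final String SAMPLE = String.join("\n",
        "Id\tPart2b\tPart2t\tPart1\tScope\tLanguage_Type\tRef_Name\tComment",
        "nld\tdut\tnld\tnl\tI\tL\tDutch\t",
        "dse\t\t\t\tI\tL\tDutch Sign Language\t");

    public static void main(String[] args) {
        Iso639TableSketch table = new Iso639TableSketch(SAMPLE);
        Row nl = table.get("nl").orElseThrow();
        System.out.println(nl.refName());                            // Dutch
        System.out.println(nl == table.get("dut").orElseThrow());    // true
        System.out.println(table.get("dse").orElseThrow().part1());  // null
    }
}
```

Indexing every part under one map keeps lookups "by any code" a single hash lookup, at the cost of assuming that codes from different parts never collide with each other.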
Major languages and language families are contained in enums, though, so they can be addressed as constants.
import java.util.Locale;
import org.meeuw.i18n.languages.*;
import static org.assertj.core.api.Assertions.assertThat;
...
// a language code can be obtained via a static method
LanguageCode nl = LanguageCode.languageCode("nl");
// for major languages having an ISO-639-1 code, enum values are available
assertThat(nl).isSameAs(ISO_639_1_Code.nl);
// codes from other parts of the standard work too
assertThat(nl).isSameAs(LanguageCode.languageCode("dut")); // ISO-639-2/B code
assertThat(nl).isSameAs(LanguageCode.languageCode("nld")); // ISO-639-3 / ISO-639-2/T code
// use it in some way
assertThat(nl.nameRecord(Locale.US).inverted()).isEqualTo("Dutch");
See also the test cases:
package org.meeuw.i18n.languages.test;

import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import org.meeuw.i18n.languages.*;

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

@SuppressWarnings("OptionalGetWithoutIsPresent")
class LanguageCodeTest {

    @Test
    public void example() {
        // get a language by its code
        Optional<LanguageCode> optional = ISO_639.getByPart3("nld");
        LanguageCode languageCode = LanguageCode.languageCode("nl");
        // show its 'inverted' name
        System.out.println(languageCode.nameRecord(Locale.US).inverted());
        assertThatThrownBy(() -> languageCode.nameRecord(Locale.CHINESE)).isInstanceOf(UnsupportedOperationException.class);
        // get a language family
        Optional<LanguageFamilyCode> family = ISO_639.getByPart5("ger");
        // get by any code
        Optional<ISO_639_Code> byCode = ISO_639.get("nl");
        // stream by names; a language may have several names (Dutch, Flemish) and then appears multiple times
        ISO_639.streamByNames().forEach(e -> {
            System.out.println(e.getKey() + " " + e.getValue());
        });
        assertThat(languageCode.toLocale()).isEqualTo(new Locale("nl"));
    }
    @Test
    public void stream() {
        LanguageCode.stream().forEach(lc -> {
            System.out.println(lc + " (" + lc.scope() + ")" + " " + lc.getDisplayName(Locale.ENGLISH) + " " + lc.getDisplayName(new Locale("nl")));
            assertThat(lc.code()).isNotNull();
            assertThat(lc.languageType()).isNotNull();
            assertThat(lc.scope()).isNotNull();
            assertThat(lc.nameRecords()).hasSizeGreaterThanOrEqualTo(1);
            assertThat(lc.nameRecord(Locale.US)).isNotNull();
            assertThat(lc.nameRecord()).isNotNull();
            assertThat(lc.refName()).isNotNull();
            if (lc instanceof ISO_639_1_Code) {
                assertThat(lc.getDisplayName(new Locale("nl"))).isEqualTo(
                    new Locale(lc.code()).getDisplayName(new Locale("nl")));
            }
            if (lc.part1() != null) {
                assertThat(lc).isInstanceOf(ISO_639_1_Code.class);
            }
            if (lc.comment() != null) {
                System.out.println("Comment: " + lc.comment());
            }
            if (lc.scope() == Scope.M) {
                assertThat(lc.individualLanguages()).isNotEmpty();
                System.out.println("Macro language with: " + lc.individualLanguages());
                for (LanguageCode individual : lc.individualLanguages()) {
                    if (individual instanceof RetiredLanguageCode) {
                        System.out.println("Retired: " + individual + " " + ((RetiredLanguageCode) individual).retReason());
                    } else {
                        assertThat(individual.macroLanguages())
                            .withFailMessage("macro language " + lc + " has " + individual + ", but that does not list it as macro language")
                            .contains(lc);
                    }
                }
            }
            if (!lc.macroLanguages().isEmpty()) {
                System.out.println("Macro language for " + lc + ": " + lc.macroLanguages());
                for (LanguageCode macro : lc.macroLanguages()) {
                    assertThat(macro.individualLanguages()).contains(lc);
                    assertThat(macro.scope()).isEqualTo(Scope.M);
                }
            }
        });
    }

    @Test
    public void streamByName() {
        AtomicLong count = new AtomicLong();
        LanguageCode.streamByNames().forEach(e -> {
            System.out.println(e.getKey() + " " + e.getValue());
            count.incrementAndGet();
        });
        assertThat(count.get()).isGreaterThan(LanguageCode.stream().count());
    }
    @Test
    public void sort() {
        LanguageCode.stream()
            .sorted(Comparator.comparing(LanguageCode::refName))
            .forEach(lc -> {
                System.out.println(
                    lc.code() + "\t" +
                    lc.refName() + " " +
                    lc.nameRecord(Locale.US));
            });
    }

    @Test
    public void getByCode() {
        assertThat(ISO_639.getByPart3("nld").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart3(null)).isEmpty();
    }

    @Test
    public void get() {
        assertThat(LanguageCode.get("nl").get().refName()).isEqualTo("Dutch");
        assertThat(LanguageCode.languageCode("nl").refName()).isEqualTo("Dutch");
        assertThat(LanguageCode.get("nld").get().refName()).isEqualTo("Dutch");
    }

    @Test
    public void getTokiPona() {
        assertThat(LanguageCode.get("tok").get().refName()).isEqualTo("Toki Pona");
    }

    @Test
    public void getByPart1() {
        assertThat(ISO_639.getByPart1("nl").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart1(null)).isEmpty();
    }

    @Test
    public void getByPart2T() {
        assertThat(ISO_639.getByPart2T("nld").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart2T(null)).isEmpty();
    }

    @Test
    public void getByPart2B() {
        assertThat(ISO_639.getByPart2B("dut").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart2B(null)).isEmpty();
    }

    @Test
    public void getUnknown() {
        assertThat(ISO_639.getByPart3("doesntexist")).isEmpty();
    }

    @Test
    public void getCode() {
        assertThat(ISO_639.getByPart3("nld").get().code()).isEqualTo("nl");
        assertThat(ISO_639.getByPart3("act").get().code()).isEqualTo("act");
    }

    @Test
    public void krm() {
        // the 'krim' dialect (Sierra Leone) officially merged into 'bmf' (Bom-Kim) in 2017
        assertThat(ISO_639.getByPart3("krm").get().code()).isEqualTo("bmf");
    }

    @Test
    public void ppr() {
        assertThat(ISO_639.getByPart3("ppr").get().code()).isEqualTo("lcq");
    }

    @Test
    public void lcq() {
        assertThat(ISO_639.getByPart3("lcq").get().code()).isEqualTo("lcq");
    }
    @Test
    public void XXFallBack() {
        try {
            assertThatThrownBy(() -> ISO_639.iso639("XX")).isInstanceOf(IllegalArgumentException.class);
            LanguageCode.registerFallback("XX", LanguageCode.languageCode("zxx"));
            assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
        } finally {
            LanguageCode.resetFallBacks();
        }
    }

    @Test
    public void XXYYFallBack() {
        try {
            assertThatThrownBy(() -> ISO_639.iso639("XX")).isInstanceOf(IllegalArgumentException.class);
            assertThatThrownBy(() -> ISO_639.iso639("YY")).isInstanceOf(IllegalArgumentException.class);
            LanguageCode.setFallbacks(Map.of(
                "XX", LanguageCode.languageCode("zxx"),
                "YY", LanguageCode.languageCode("nl"))
            );
            assertThat(LanguageCode.getFallBacks()).hasSize(2);
            assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
            assertThat(ISO_639.iso639("YY").code()).isEqualTo("nl");
        } finally {
            LanguageCode.resetFallBacks();
            assertThat(LanguageCode.getFallBacks()).isEmpty();
        }
    }

    @Test
    @Deprecated
    public void deprecated() {
        assertThat(LanguageCode.getByCode("nld")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart1("nl")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart2B("dut")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart2T("nld")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart3("nld")).contains((LanguageCode) ISO_639.get("nl").get());
    }

    @Test
    public void dutchSignLanguage() {
        LanguageCode l = LanguageCode.get("dse").orElseThrow();
        assertThat(l.refName()).isEqualTo("Dutch Sign Language");
        assertThat(l.nameRecord().print()).isEqualTo("dse (Dutch Sign Language)");
        assertThat(new Locale(l.code()).getDisplayName(Locale.US)).isEqualTo("dse");
        assertThat(l.getDisplayName(Locale.US)).isEqualTo("Dutch Sign Language");
        assertThat(l.getDisplayName(new Locale("nl"))).isEqualTo("Nederlandse Gebarentaal");
    }

    @ParameterizedTest
    @ValueSource(strings = {"dse", "vgt", "ase", "bfi", "csl", "gsg", "tsm", "dsl", "inl", "ise", "rsl"})
    public void signLanguage(String code) {
        LanguageCode l = LanguageCode.get(code).orElseThrow();
        System.out.println(l.code() + "\t" +
            l.refName() + "\t" +
            l.getDisplayName(Locale.US) + "\t" +
            l.getDisplayName(new Locale("nl")) + "\t" +
            l.getDisplayName(new Locale("eo"))
        );
    }

    // TODO
    @Disabled
    @Test
    public void hashCodeStable() {
        assertThat(ISO_639.iso639("NL").hashCode()).isEqualTo(320304382);
    }
}
Lookups (e.g. ISO_639#getByPart3, or the deprecated LanguageCode#getByCode) also support retired codes where possible. This means that the code of the returned object may differ from the requested one:
import org.meeuw.i18n.languages.*;
...
// the 'krim' dialect (Sierra Leone) officially merged into 'bmf' (Bom-Kim) in 2017
assertThat(ISO_639.getByPart3("krm").get().code()).isEqualTo("bmf");
Sometimes we have to deal with systems that use their own versions of the standards. In such cases it is possible to register 'fallbacks'.
E.g.
import org.meeuw.i18n.languages.*;
...
// Our partner uses the pseudo ISO-639-1 code 'XX' for 'no language';
// fall back to a proper Part 3 code.
try {
    LanguageCode.registerFallback("XX", LanguageCode.languageCode("zxx"));
    assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
} finally {
    LanguageCode.resetFallBacks();
}
LanguageCode is annotated with a JAXB annotation, so it serializes to and deserializes from its code. The dependency on the annotation is optional.
The relevant classes are also annotated with Jackson annotations, so they can be serialized to and from JSON.
LanguageCode is Serializable too. Only the code is non-transient, and deserialization ensures that the same object is returned for every language.
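Returning one canonical object per language on deserialization is what the standard `readResolve` hook of Java serialization is for. The following is a minimal sketch of that pattern, assuming instances are kept in a static registry keyed by code; `CanonicalCode` and its registry are hypothetical, and the library may implement this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class CanonicalCode implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical registry of canonical instances, keyed by code
    private static final Map<String, CanonicalCode> REGISTRY = new ConcurrentHashMap<>();

    private final String code;               // the only serialized state
    private final transient String refName;  // not serialized; null in a freshly deserialized copy

    private CanonicalCode(String code, String refName) {
        this.code = code;
        this.refName = refName;
    }

    static CanonicalCode register(String code, String refName) {
        return REGISTRY.computeIfAbsent(code, c -> new CanonicalCode(c, refName));
    }

    String code() { return code; }
    String refName() { return refName; }

    // Invoked by ObjectInputStream after reading the object: swap the fresh copy
    // for the registered instance, so identity (==) is preserved
    private Object readResolve() throws ObjectStreamException {
        CanonicalCode canonical = REGISTRY.get(code);
        if (canonical == null) {
            throw new InvalidObjectException("Unknown code: " + code);
        }
        return canonical;
    }

    static CanonicalCode roundTrip(CanonicalCode in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream input = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CanonicalCode) input.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CanonicalCode nl = register("nl", "Dutch");
        CanonicalCode copy = roundTrip(nl);
        System.out.println(copy == nl);       // true
        System.out.println(copy.refName());   // Dutch
    }
}
```

Because only the code travels over the wire, the serialized form stays small and stable, even when the bundled tables (and hence the transient data) change between versions.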
The default sort order of LanguageCode used to be by 'inverted name'. A language may have more than one (inverted) name, though (e.g. Dutch and Flemish), so since 3.0 LanguageCode is no longer Comparable. LanguageCode#stream() is sorted by ISO-639-3 code.
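Why one natural order no longer fits can be shown with a small self-contained sketch: ordering by code is unambiguous, whereas ordering by name has to list a language once per name. `Lang`, `stream()` and `streamByNames()` here are hypothetical stand-ins that only mirror the idea behind `LanguageCode#stream()` and `LanguageCode#streamByNames()`, not their implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class SortSketch {

    // Hypothetical language: exactly one code, but possibly several names
    record Lang(String code, List<String> names) {}

    static final List<Lang> LANGS = List.of(
        new Lang("nld", List.of("Dutch", "Flemish")),
        new Lang("eng", List.of("English")),
        new Lang("deu", List.of("German")));

    // Ordering by code is unambiguous: every language has exactly one ISO-639-3 code
    static Stream<Lang> stream() {
        return LANGS.stream().sorted(Comparator.comparing(Lang::code));
    }

    // Ordering by name yields one entry per name, so a language may appear more than once
    static Stream<Map.Entry<String, Lang>> streamByNames() {
        return LANGS.stream()
            .flatMap(lang -> lang.names().stream().map(name -> Map.entry(name, lang)))
            .sorted(Map.Entry.comparingByKey());
    }

    public static void main(String[] args) {
        stream().forEach(lang -> System.out.println(lang.code()));
        // deu, eng, nld
        streamByNames().forEach(e -> System.out.println(e.getKey() + " " + e.getValue().code()));
        // Dutch nld, English eng, Flemish nld, German deu
    }
}
```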
The sil.org files only provide English names for languages and language families. Java's java.util.Locale can provide the names of many languages (presumably those with an ISO-639-1 code) in many other languages.
E.g.
new Locale("en").getDisplayName(new Locale("nl"));
will give the name of the language 'English' in Dutch ('Engels').
To make this available generically, the base interface ISO_639 simply has
String getDisplayName(Locale locale)
For ISO-639-1 codes this first tries the java.util.Locale approach shown above. For other codes, and as a fallback for ISO-639-1 codes, it uses the resource bundle org.meeuw.i18n.languages.DisplayNames, in which the default and English names are taken from the sil.org files.
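The lookup order described above could be sketched as follows. This is an illustration under assumptions, not the library's code: a plain `Map` stands in for the `org.meeuw.i18n.languages.DisplayNames` resource bundle, and a two-letter code is taken as the sign of an ISO-639-1 code. It relies on `Locale#getDisplayLanguage` returning the code itself when the JDK has no name for a language.

```java
import java.util.Locale;
import java.util.Map;

public class DisplayNameSketch {

    // Hypothetical stand-in for the DisplayNames resource bundle:
    // target language -> (code -> name); "" holds the default (English) names
    static final Map<String, Map<String, String>> BUNDLE = Map.of(
        "", Map.of("dse", "Dutch Sign Language"),
        "nl", Map.of("dse", "Nederlandse Gebarentaal"));

    static String getDisplayName(String code, Locale locale) {
        if (code.length() == 2) {
            // ISO-639-1 code: try java.util.Locale first. For a language it
            // has no name for, getDisplayLanguage returns the code itself.
            String fromJdk = Locale.forLanguageTag(code).getDisplayLanguage(locale);
            if (!fromJdk.isEmpty() && !fromJdk.equals(code)) {
                return fromJdk;
            }
        }
        // Other codes, or no name from the JDK: the bundle for the requested
        // language, then the default names, then the bare code
        String name = BUNDLE.getOrDefault(locale.getLanguage(), Map.of()).get(code);
        if (name == null) {
            name = BUNDLE.get("").get(code);
        }
        return name != null ? name : code;
    }

    public static void main(String[] args) {
        System.out.println(getDisplayName("en", Locale.ENGLISH));                // English
        System.out.println(getDisplayName("dse", Locale.forLanguageTag("nl")));  // Nederlandse Gebarentaal
        System.out.println(getDisplayName("dse", Locale.FRENCH));                // Dutch Sign Language
    }
}
```

A real resource bundle would get the "requested language, then default" chain from `ResourceBundle.getBundle` itself; the explicit two-step map lookup just makes that chain visible.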
Notably, this is how sign languages, for example, get proper names in other languages:
assertThat(LanguageCode.languageCode("dse").getDisplayName(new Locale("nl"))).isEqualTo("Nederlandse Gebarentaal");

| Version | Description | Date | Remarks |
| <1 | developing/testing | 2023 | |
| 1.x | compatible with java 8, javax.xml, module-info java 11 | | |
| 1.0 | | 2023-11-30 | |
| 2.x | java 11, jakarta.xml | 2024-01-28 | jakarta mostly applies to the optional jaxb support (and to some, also optional, validation annotations) |
| 2.1 | support for retired codes | 2024-02-11 | |
| 2.2 | migrated support for language code validation from i18n-regions | 2024-? | |
| 3.0 | Refactoring | 2024-3 | Added enum for ISO-639-1 codes. Made syntax forward compatible with records. So, getters like … |
| 3.1 | Refactoring | 2024-3 | Support for ISO-639-5. Dropped the -3 from the artifact id. |
| 3.6 | Support for #getDisplayName | 2024-8 | |
| 3.8 | Better support for fallbacks. Updated tables | 2025-1 | |
| 4.0 | support for jackson3, java 17 | 2025-12 | |