
Provides Java classes to work with all available codes in ISO-639-1, ISO-639-3 (languages) and ISO-639-5 (language families)


mihxil/i18n-iso-639


JAVA ISO-639 support


Codes for the languages (and language groups) of the world are defined by the ISO-639 standards.

These standards provide letter codes for each language. E.g. ISO-639-3 provides a three-letter code for all living languages.

There are too many such codes to fit comfortably in a Java enum (e.g. https://github.com/TakahikoKawasaki/nv-i18n/blob/master/src/main/java/com/neovisionaries/i18n/LanguageAlpha3Code.java is simply not complete).

This package bundles the tab-separated files provided by https://iso639-3.sil.org/, together with Java classes that read them and expose every language code as a Java object with getters.
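
As a rough illustration of what reading such a file involves, the sketch below parses one line of a tab-separated table into a small value object. The column order (Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment) is an assumption based on the published ISO-639-3 code table, and the class `TabLine` is hypothetical; it is not part of this library.

```java
/** Hypothetical value object for one row of a tab-separated language table. */
final class TabLine {
    final String id;           // three-letter ISO-639-3 code
    final String part2B;       // ISO-639-2/B code, or null when absent
    final String part2T;       // ISO-639-2/T code, or null when absent
    final String part1;        // two-letter ISO-639-1 code, or null when absent
    final String scope;
    final String languageType;
    final String refName;

    private TabLine(String[] cols) {
        id = cols[0];
        part2B = emptyToNull(cols[1]);
        part2T = emptyToNull(cols[2]);
        part1 = emptyToNull(cols[3]);
        scope = cols[4];
        languageType = cols[5];
        refName = cols[6];
    }

    /** Parses one data line; assumes the column order named in the text above. */
    static TabLine parse(String line) {
        // A negative limit keeps trailing empty columns (such as an empty comment).
        String[] cols = line.split("\t", -1);
        if (cols.length < 7) {
            throw new IllegalArgumentException("Expected at least 7 columns: " + line);
        }
        return new TabLine(cols);
    }

    private static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }

    public static void main(String[] args) {
        TabLine dutch = TabLine.parse("nld\tdut\tnld\tnl\tI\tL\tDutch\t");
        System.out.println(dutch.id + " " + dutch.part1 + " " + dutch.refName);
    }
}
```

Rows for languages without a two-letter code simply leave those columns empty, which the sketch maps to `null`.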

Major languages and language families are contained in enums though, so they can be addressed as constants.
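
One way to picture this split is a common interface implemented both by an enum (for the small, fixed set of major languages) and by an ordinary class (for everything else). The sketch below uses hypothetical names (`Code`, `MajorCode`, `OtherCode`, `CodeLookup`) and a few hard-coded entries. It illustrates the pattern only and should not be read as this library's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical common view on any language code. */
interface Code {
    String code();
    String refName();
}

/** Major languages as enum constants, so they can be referenced directly. */
enum MajorCode implements Code {
    nl("Dutch"),
    en("English");

    private final String refName;

    MajorCode(String refName) {
        this.refName = refName;
    }

    @Override public String code() { return name(); }
    @Override public String refName() { return refName; }
}

/** All remaining codes are plain objects created from table data. */
final class OtherCode implements Code {
    private final String code;
    private final String refName;

    OtherCode(String code, String refName) {
        this.code = code;
        this.refName = refName;
    }

    @Override public String code() { return code; }
    @Override public String refName() { return refName; }
}

final class CodeLookup {
    private static final Map<String, Code> ALL = new HashMap<>();

    static {
        for (MajorCode major : MajorCode.values()) {
            ALL.put(major.code(), major);
        }
        Code dse = new OtherCode("dse", "Dutch Sign Language");
        ALL.put(dse.code(), dse);
    }

    /** Returns the enum constant when one exists, otherwise the plain object. */
    static Optional<Code> get(String code) {
        return Optional.ofNullable(ALL.get(code));
    }

    public static void main(String[] args) {
        System.out.println(get("nl").get() == MajorCode.nl);
        System.out.println(get("dse").get().refName());
    }
}
```

Because the lookup hands back the enum constant itself, callers can compare with `==` for the major languages while still treating every code uniformly through the interface.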

Usage

import org.meeuw.i18n.languages.*;

...
// a language code can be obtained via static method
LanguageCode nl = LanguageCode.languageCode("nl");
// for major languages that have an ISO-639-1 code, enum values are available
assertThat(nl).isSameAs(ISO_639_1_Code.nl);

// other parts of the standards work too
assertThat(nl).isSameAs(LanguageCode.languageCode("dut")); // ISO-639-2/B code
assertThat(nl).isSameAs(LanguageCode.languageCode("nld")); // ISO-639-3 / ISO-639-2/T code

// use it in some way
assertThat(nl.nameRecord(Locale.US).inverted()).isEqualTo("Dutch");

See also the test cases:
package org.meeuw.i18n.languages.test;

import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import org.meeuw.i18n.languages.*;

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

@SuppressWarnings("OptionalGetWithoutIsPresent")
class LanguageCodeTest {


    @Test
    public void example() {

        // get a language by its code;
        Optional<LanguageCode> optional = ISO_639.getByPart3("nld");
        LanguageCode languageCode = LanguageCode.languageCode("nl");

        // show its 'inverted' name
        System.out.println(languageCode.nameRecord(Locale.US).inverted());

        assertThatThrownBy(() -> languageCode.nameRecord(Locale.CHINESE)).isInstanceOf(UnsupportedOperationException.class);


        // get a language family
        Optional<LanguageFamilyCode> family = ISO_639.getByPart5("ger");

        // get by any code
        Optional<ISO_639_Code> byCode = ISO_639.get("nl");

        // stream by names, language may have several names (dutch, flemish), and appear multiple times
        ISO_639.streamByNames().forEach(e -> {
            System.out.println(e.getKey() + " " + e.getValue());
        });

        assertThat(languageCode.toLocale()).isEqualTo(new Locale("nl"));
    }

    @Test
    public void stream() {
        LanguageCode.stream().forEach(lc -> {
            System.out.println(lc + " (" + lc.scope() + ")" + " " + lc.getDisplayName(Locale.ENGLISH) + " " + lc.getDisplayName(new Locale("nl")));
            assertThat(lc.code()).isNotNull();
            assertThat(lc.languageType()).isNotNull();
            assertThat(lc.scope()).isNotNull();
            assertThat(lc.nameRecords()).hasSizeGreaterThanOrEqualTo(1);
            assertThat(lc.nameRecord(Locale.US)).isNotNull();
            assertThat(lc.nameRecord()).isNotNull();
            assertThat(lc.refName()).isNotNull();

            if (lc instanceof ISO_639_1_Code) {
                assertThat(lc.getDisplayName(new Locale("nl"))).isEqualTo(
                    new Locale(lc.code()).getDisplayName(new Locale("nl")));
            }

            if (lc.part1() != null) {
                assertThat(lc).isInstanceOf(ISO_639_1_Code.class);
            }
            if (lc.comment() != null) {
                System.out.println("Comment: " + lc.comment());
            }
            if (lc.scope() == Scope.M) {
                assertThat(lc.individualLanguages()).isNotEmpty();
                System.out.println("Macro language with: " + lc.individualLanguages());
                for (LanguageCode individual : lc.individualLanguages()) {
                    if (individual instanceof RetiredLanguageCode) {
                        System.out.println("Retired: " + individual + " " + ((RetiredLanguageCode) individual).retReason());
                    } else {
                        assertThat(individual.macroLanguages())
                            .withFailMessage("macro language " + lc + " has " + individual + " but this has not it as macro").contains(lc);
                    }
                }
            }
            if (! lc.macroLanguages().isEmpty()) {
                System.out.println("Macro language for " + lc + " :" + lc.macroLanguages());
                for (LanguageCode macro : lc.macroLanguages()) {
                    assertThat(macro.individualLanguages()).contains(lc);
                    assertThat(macro.scope()).isEqualTo(Scope.M);
                }
            }
        });
    }

    @Test
    public void streamByName() {
        AtomicLong count = new AtomicLong();
        LanguageCode.streamByNames().forEach(e -> {
            System.out.println(e.getKey() + " " + e.getValue());
            count.incrementAndGet();
        });
        assertThat(count.get()).isGreaterThan(LanguageCode.stream().count());
    }

    @Test
    public void sort() {
        LanguageCode.stream()
            .sorted(Comparator.comparing(LanguageCode::refName))
            .forEach(lc -> {
            System.out.println(
                lc.code() + "\t" +
                    lc.refName() + " " +
                    lc.nameRecord(Locale.US));
        });
    }

    @Test
    public void getByCode() {
        assertThat(ISO_639.getByPart3("nld").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart3(null)).isEmpty();
    }

    @Test
    public void get() {
        assertThat(LanguageCode.get("nl").get().refName()).isEqualTo("Dutch");
        assertThat(LanguageCode.languageCode("nl").refName()).isEqualTo("Dutch");
        assertThat(LanguageCode.get("nld").get().refName()).isEqualTo("Dutch");
    }

    @Test
    public void getTokiPona() {
        assertThat(LanguageCode.get("tok").get().refName()).isEqualTo("Toki Pona");
    }

    @Test
    public void getByPart1() {
        assertThat(ISO_639.getByPart1("nl").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart1(null)).isEmpty();
    }

    @Test
    public void getByPart2T() {
        assertThat(ISO_639.getByPart2T("nld").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart2T(null)).isEmpty();
    }

    @Test
    public void getByPart2B() {
        assertThat(ISO_639.getByPart2B("dut").get().refName()).isEqualTo("Dutch");
        assertThat(ISO_639.getByPart2B(null)).isEmpty();
    }

    @Test
    public void getUnknown() {
        assertThat(ISO_639.getByPart3("doesntexist")).isEmpty();
    }

    @Test
    public void getCode() {
        assertThat(ISO_639.getByPart3("nld").get().code()).isEqualTo("nl");
        assertThat(ISO_639.getByPart3("act").get().code()).isEqualTo("act");
    }

    @Test
    public void krm() {
        // the 'krim' dialect (Sierra Leone) officially merged into 'bmf' (Bom-Kim) in 2017
        assertThat(ISO_639.getByPart3("krm").get().code()).isEqualTo("bmf");
    }

    @Test
    public void ppr() {
        assertThat(ISO_639.getByPart3("ppr").get().code()).isEqualTo("lcq");
    }

    @Test
    public void lcq() {
        assertThat(ISO_639.getByPart3("lcq").get().code()).isEqualTo("lcq");
    }

    @Test
    public void XXFallBack() {
        try {
            assertThatThrownBy(() -> ISO_639.iso639("XX")).isInstanceOf(IllegalArgumentException.class);

            LanguageCode.registerFallback("XX", LanguageCode.languageCode("zxx"));

            assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
        } finally {
            LanguageCode.resetFallBacks();
        }
    }

    @Test
    public void XXYYFallBack() {
        try {
            assertThatThrownBy(() -> ISO_639.iso639("XX")).isInstanceOf(IllegalArgumentException.class);
            assertThatThrownBy(() -> ISO_639.iso639("YY")).isInstanceOf(IllegalArgumentException.class);

            LanguageCode.setFallbacks(Map.of(
                "XX", LanguageCode.languageCode("zxx"),
                "YY", LanguageCode.languageCode("nl"))
            );
            assertThat(LanguageCode.getFallBacks()).hasSize(2);

            assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
            assertThat(ISO_639.iso639("YY").code()).isEqualTo("nl");
        } finally {
            LanguageCode.resetFallBacks();
            assertThat(LanguageCode.getFallBacks()).isEmpty();
        }
    }


    @Test
    @Deprecated
    public void deprecated() {
        assertThat(LanguageCode.getByCode("nld")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart1("nl")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart2B("dut")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart2T("nld")).contains((LanguageCode) ISO_639.get("nl").get());
        assertThat(LanguageCode.getByPart3("nld")).contains((LanguageCode) ISO_639.get("nl").get());
    }

    @Test
    public void dutchSignLanguage() {

        LanguageCode l = LanguageCode.get("dse").orElseThrow();

        assertThat(l.refName()).isEqualTo("Dutch Sign Language");
        assertThat(l.nameRecord().print()).isEqualTo("dse (Dutch Sign Language)");
        assertThat(new Locale(l.code()).getDisplayName(Locale.US)).isEqualTo("dse");

        assertThat(l.getDisplayName(Locale.US)).isEqualTo("Dutch Sign Language");

        assertThat(l.getDisplayName(new Locale("nl"))).isEqualTo("Nederlandse Gebarentaal");
    }
    @ParameterizedTest
    @ValueSource(strings = {"dse", "vgt","ase","bfi",
        "csl",
        "gsg",
        "tsm",
        "dsl",
        "inl",
        "ise",
        "rsl"})
    public void signLanguage(String code) {
        LanguageCode l = LanguageCode.get(code).orElseThrow();


        System.out.println(l.code() + "\t" +
            l.refName() + "\t" +
            l.getDisplayName(Locale.US) + "\t" +
            l.getDisplayName(new Locale("nl")) + "\t" +
            l.getDisplayName(new Locale("eo"))
        );
    }


    // TODO
    @Disabled
    @Test
    public void hashCodeStable() {
        assertThat(ISO_639.iso639("NL").hashCode()).isEqualTo(320304382);
    }
}

Retired codes

LanguageCode#getByCode also resolves retired codes where possible. This means that the code of the returned object may differ from the code that was requested:

import org.meeuw.i18n.languages.*;

...

// the 'krim' dialect (Sierra Leone) officially merged into 'bmf' (Bom-Kim) in 2017

assertThat(ISO_639.getByPart3("krm").get().code()).isEqualTo("bmf");
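
Conceptually, resolving a retired code amounts to following a "changed to" mapping until a current code is reached. The sketch below hard-codes the two retirements exercised in the test cases (`krm` to `bmf`, `ppr` to `lcq`). The class `RetiredResolver` and its map are hypothetical stand-ins for a retirement table, not the library's own code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver that maps retired codes to their replacements. */
final class RetiredResolver {
    private final Map<String, String> changedTo = new HashMap<>();

    void retire(String retired, String replacement) {
        changedTo.put(retired, replacement);
    }

    /** Follows replacements until a code without a replacement is reached. */
    String resolve(String code) {
        Set<String> seen = new HashSet<>();
        String current = code;
        while (changedTo.containsKey(current)) {
            // Guard against a malformed table that maps codes in a circle.
            if (!seen.add(current)) {
                throw new IllegalStateException("Cycle in retirements at " + current);
            }
            current = changedTo.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        RetiredResolver resolver = new RetiredResolver();
        resolver.retire("krm", "bmf");
        resolver.retire("ppr", "lcq");
        System.out.println(resolver.resolve("krm"));
        System.out.println(resolver.resolve("nld"));
    }
}
```

Codes that were never retired pass through unchanged, so the same lookup path can serve both current and retired input.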

Fall backs

Sometimes we have to deal with systems which have their own versions of the standards. In these cases it is possible to register 'fall backs'.

E.g.

import org.meeuw.i18n.languages.*;

...
// Our partner uses the pseudo ISO-639-1 code 'XX' for 'no language'
//  fall back to a proper Part 3 code.
try {
    LanguageCode.registerFallback("XX", LanguageCode.languageCode("zxx"));
    assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
} finally {
    LanguageCode.resetFallBacks();
}
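
The idea behind such a registry can be sketched as a lookup that consults the regular table first and a mutable fallback map second. `FallbackRegistry` below is a hypothetical, simplified stand-in that works on plain strings instead of language objects. The lookup order and the validation of the target are assumptions of this sketch, not a description of the library's internals.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry: known codes plus caller-registered fallbacks. */
final class FallbackRegistry {
    private final Set<String> known;
    private final Map<String, String> fallbacks = new ConcurrentHashMap<>();

    FallbackRegistry(Set<String> known) {
        this.known = known;
    }

    /** Maps a non-standard code to a known one. */
    void registerFallback(String code, String target) {
        if (!known.contains(target)) {
            throw new IllegalArgumentException("Unknown target code " + target);
        }
        fallbacks.put(code, target);
    }

    void resetFallbacks() {
        fallbacks.clear();
    }

    /** Returns the code itself when known, else its fallback, else throws. */
    String lookup(String code) {
        if (known.contains(code)) {
            return code;
        }
        String fallback = fallbacks.get(code);
        if (fallback == null) {
            throw new IllegalArgumentException("Unknown language code " + code);
        }
        return fallback;
    }

    public static void main(String[] args) {
        FallbackRegistry registry = new FallbackRegistry(Set.of("nl", "zxx"));
        registry.registerFallback("XX", "zxx");
        System.out.println(registry.lookup("XX"));
        registry.resetFallbacks();
    }
}
```

Since the fallbacks in a setup like this are shared state, resetting them in a `finally` block, as in the example above, keeps one test or caller from leaking mappings into the next.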

Support

JAXB

The language code is annotated with a JAXB annotation. It will serialize and deserialize to and from the code. The dependency on the annotation is optional.

JSON

The relevant classes are also annotated with Jackson annotations, so they can be serialized to and from JSON.

Serializable

LanguageCode is serializable too, and ensures that deserialization returns the same object for every language. Only the code is non-transient.
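
A common way to get this behaviour in Java is to serialize only the code and swap the deserialized copy for the registered instance in `readResolve`. The sketch below shows that general technique with a hypothetical class `CanonicalCode`. Whether the library does it exactly this way is not stated here, so treat it as an illustration under that assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical code object with exactly one instance per code. */
final class CanonicalCode implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Map<String, CanonicalCode> REGISTRY = new ConcurrentHashMap<>();

    private final String code;           // the only state written to the stream
    private final transient String name; // never written; comes from the registry

    private CanonicalCode(String code, String name) {
        this.code = code;
        this.name = name;
    }

    static CanonicalCode of(String code, String name) {
        return REGISTRY.computeIfAbsent(code, c -> new CanonicalCode(c, name));
    }

    String code() { return code; }
    String name() { return name; }

    /** Replaces the freshly deserialized copy with the registered instance. */
    private Object readResolve() throws ObjectStreamException {
        CanonicalCode canonical = REGISTRY.get(code);
        if (canonical == null) {
            throw new InvalidObjectException("Unknown code " + code);
        }
        return canonical;
    }

    /** Serializes and deserializes in memory; checked exceptions are wrapped. */
    static CanonicalCode roundTrip(CanonicalCode in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CanonicalCode) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CanonicalCode nl = CanonicalCode.of("nl", "Dutch");
        System.out.println(roundTrip(nl) == nl);
    }
}
```

With this arrangement the transient fields never need to be restored by hand, because the object that leaves `readObject` is the fully initialized one from the registry.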

Sortable

The default sort order of a LanguageCode used to be by 'Inverted Name'. A language may have more than one (inverted) name though (e.g. Dutch and Flemish), so since 3.0 LanguageCode is no longer sortable by default. LanguageCode#stream() is sorted by ISO-639-3 code.

Internationalization of language names

The files from sil.org only provide names of languages and language families in English. Java's java.util.Locale can provide the names of many languages (presumably those with an ISO-639-1 code) in many other languages.

E.g.

 new Locale("en").getDisplayName(new Locale("nl"));

will give the name of the language 'English' in Dutch ('Engels').

To make this available in a generic way the base interface ISO_639 just has

String getDisplayName(Locale locale)

For ISO_639_1 this will first try the approach above with java.util.Locale. For other codes, and as a fallback for ISO_639_1, it will use the resource bundle org.meeuw.i18n.languages.DisplayNames, in which the default and English names are taken from the sil.org files.

Notably, this is how proper names become available for sign languages, for example:

assertThat(LanguageCode.languageCode("dse").getDisplayName(new Locale("nl"))).isEqualTo("Nederlandse Gebarentaal");
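
The lookup order described in this section can be sketched with JDK classes alone: ask java.util.Locale first, and fall back to a table of names when the JDK merely echoes the code back. In the sketch below the nested map is a hypothetical stand-in for a resource bundle, and `DisplayNameSketch` is not a class of this library.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical two-step display name lookup: JDK first, own table second. */
final class DisplayNameSketch {
    // Stand-in for a resource bundle: target language -> (code -> name).
    // The empty key holds the default (English) names.
    private static final Map<String, Map<String, String>> BUNDLE = new HashMap<>();

    static {
        BUNDLE.put("", Map.of("dse", "Dutch Sign Language"));
        BUNDLE.put("nl", Map.of("dse", "Nederlandse Gebarentaal"));
    }

    static String displayName(String code, Locale in) {
        String fromJdk = Locale.forLanguageTag(code).getDisplayLanguage(in);
        // The JDK returns the code itself when it has no name for the language.
        if (!fromJdk.isEmpty() && !fromJdk.equals(code)) {
            return fromJdk;
        }
        Map<String, String> localized = BUNDLE.getOrDefault(in.getLanguage(), Map.of());
        String name = localized.get(code);
        if (name == null) {
            name = BUNDLE.get("").get(code); // default names
        }
        return name != null ? name : code;
    }

    public static void main(String[] args) {
        Locale dutch = Locale.forLanguageTag("nl");
        System.out.println(displayName("en", dutch));
        System.out.println(displayName("dse", dutch));
        System.out.println(displayName("dse", Locale.US));
    }
}
```

The names the JDK returns depend on the locale data shipped with the runtime, which is one reason a bundled fallback table is useful.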

Versions

| Version | Description | Date | Remarks |
|---|---|---|---|
| <1 | developing/testing | 2023 | |
| 1.x | compatible with java 8, javax.xml, module-info java 11 | 2023-11-30 (1.0) | |
| 2.x | java 11, jakarta.xml | 2024-01-28 | jakarta mostly applies to the optional jaxb support (and to some - also optional - validation annotations) |
| 2.1 | support for retired codes | 2024-02-11 | |
| 2.2 | migrated support for language code validation from i18n-regions | 2024-? | |
| 3.0 | Refactoring | 2024-3 | Added enum for ISO-639-1 codes. Made syntax forward compatible with records, so getters like getPart1() are dropped in favor of part1(). LanguageCode itself is now an interface. This may be backported to 1.2 for javax compatibility. |
| 3.1 | Refactoring | 2024-3 | Support for ISO-639-5. Dropped the -3 from the artifact id. |
| 3.6 | Support for #getDisplayName | 2024-8 | |
| 3.8 | Better support for fallbacks. Updated tables | 2025-1 | |
| 4.0 | support for jackson3, java 17 | 2025-12 | |
