This repository contains a dataset about the world's languages and language-like objects, as well as a website framework to visualize the information.
There are multiple ways to visualize the data:
| Card List | Details | Hierarchy | Table | Reports |
|---|---|---|---|---|
This website was put together to give an overview of the world's languages and language-related concepts. It is similar to Ethnologue and Glottolog; the main differences are that it aims to:
- Be free and open to all consumers.
- Show all data, even contested language definitions, making sure to put contested data in context.
- Provide actionable insights: make sure the data is clear enough that consumers can come to this page to get answers.
Common questions people will come to this resource for are:
- What's the language code for XXXX?
- What languages are used in country YYYY?
- What are the top languages in the world given some criteria?
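As a rough illustration of how those three questions map onto simple queries, here is a minimal sketch. The `LanguageRecord` shape, the function names, and the sample rows are all hypothetical and are not taken from this repository; the population figures are placeholders, not real estimates.

```typescript
// Hypothetical record shape; the real columns in the data files may differ.
interface LanguageRecord {
  code: string; // e.g. an ISO 639-3 code
  name: string;
  countries: string[]; // territory codes where the language is used
  population: number; // placeholder values below, not real estimates
}

const sampleRecords: LanguageRecord[] = [
  { code: "eng", name: "English", countries: ["US", "GB"], population: 300 },
  { code: "fra", name: "French", countries: ["FR", "CA"], population: 200 },
  { code: "hin", name: "Hindi", countries: ["IN"], population: 250 },
];

// "What's the language code for XXXX?" (case-insensitive exact name match)
function findCodeByName(records: LanguageRecord[], name: string): string | undefined {
  const wanted = name.trim().toLowerCase();
  for (const record of records) {
    if (record.name.toLowerCase() === wanted) return record.code;
  }
  return undefined;
}

// "What languages are used in country YYYY?"
function languagesInCountry(records: LanguageRecord[], country: string): LanguageRecord[] {
  return records.filter((record) => record.countries.indexOf(country) !== -1);
}

// "What are the top languages given some criteria?" Here the criterion is a
// caller-supplied scoring function, defaulting to population.
function topLanguages(
  records: LanguageRecord[],
  limit: number,
  score: (record: LanguageRecord) => number = (record) => record.population,
): LanguageRecord[] {
  // Copy before sorting so the caller's array is left untouched.
  return records.slice().sort((a, b) => score(b) - score(a)).slice(0, limit);
}

console.log(findCodeByName(sampleRecords, "french"));
console.log(languagesInCountry(sampleRecords, "IN").map((r) => r.name));
console.log(topLanguages(sampleRecords, 2).map((r) => r.code));
```

The website itself answers these questions through its search, filter, and sort controls; the sketch only shows the underlying idea.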
- Frontend: The website is rendered in JavaScript, specifically React with TypeScript.
- Backend: The website framework is built on Node and Vite.
- Data: Data files are written in tab-separated values (TSV) format.
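To make the data format concrete, the following is a minimal sketch of reading tab-separated text into objects keyed by the header row. It is not the loader used by this website; `parseTsv`, `TsvRow`, and the sample columns are hypothetical, and the sketch assumes cells contain no embedded tabs or newlines.

```typescript
// One parsed row: column header -> cell text. All values stay as strings.
type TsvRow = Record<string, string>;

// Parse tab-separated text whose first non-empty line is the header row.
function parseTsv(text: string): TsvRow[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const headers = lines[0].split("\t");
  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row: TsvRow = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      row[header] = cells[i] !== undefined ? cells[i] : "";
    });
    return row;
  });
}

// Illustrative input only; the real files have their own columns.
const sampleTsv = "code\tname\tpopulation\neng\tEnglish\t100\nfra\tFrench\t50\n";
const parsedRows = parseTsv(sampleTsv);
console.log(parsedRows);
```

Because Vite serves files from the `public` directory at the site root, a browser could in principle obtain such text with `fetch` and pass the response text to a function like this one. Numeric columns would still need explicit conversion, since every cell is kept as a string.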
We've partnered with various organizations, both to get data from them and to provide data to them: UNESCO's World Atlas of Languages, Digital Language Vitality, and Unicode CLDR.
The data comes from multiple sources, primarily CLDR, Ethnologue, and Glottolog.
In order to generate the website on an internal server, follow these instructions.
- Install Node Package Manager (npm); see the official Node documentation for installation instructions
- Download the repository to your computer, then go to that folder when you are done
- Run `npm install` to install the relevant Node and Vite packages
- Run `npm run dev` to start the server with some dev options
  - or `npm run build` for the public version
- Depending on what port is used, the website can now be accessed using a local browser at a URL like http://localhost:5173/
In order to push the changes to the deployed website (GitHub Pages site), follow these instructions.
- Run `npm run deploy` to deploy the changes. This will
  - Build the app into the dist/ folder.
  - Push the dist/ contents to the gh-pages branch.
This is how we created the project originally. You should not need to run these; it's here for background.
- Initialize the project using Vite: `npm create vite@latest`
  - Choose `lang-nav` as the project name, then React + TypeScript
- Change into the `lang-nav` directory and run `npm install`
- Set up the linter
  - Initialize: `npx eslint --init`
  - Choose options: what: javascript, use: problems, modules: esm, framework: react, typescript: yes, runs on: browser
  - `npm install --save-dev prettier eslint-config-prettier eslint-plugin-prettier`
  - More magic to get it to run... I had to install ESLint on my IDE (VSCode)
  - Some plugins were added after this library was started, like `eslint-plugin-import`
- Import other libraries
  - `npm install react-router-dom`
- Start
  - `npm run dev`
There's a lot of data shown here, but there could always be more. The main way to add or update data is to edit the tab-separated files directly. They are all in the public/data directory.
If you want to add entries or update values, you can just edit the existing TSVs.
However, if you want to add a lot more data or add contested data, it may be better to make new TSVs and then update the website to use those instead.
TODO add a guide
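Until that guide exists, here is one possible way to think about layering a new TSV on top of an existing one, sketched under assumptions. This is not how the website currently combines its data: `mergeRows`, the `code` key column, and the sample rows are hypothetical, and the rows are assumed to be already parsed into header-keyed objects.

```typescript
// One parsed TSV row: column header -> cell text.
type Row = Record<string, string>;

// Merge `overrides` into `base`, matching rows on the `key` column.
// Non-empty override cells replace base cells; unmatched rows are appended.
function mergeRows(base: Row[], overrides: Row[], key: string): Row[] {
  // Shallow-copy each row so the inputs are not mutated.
  const merged: Row[] = base.map((row) => ({ ...row }));
  const indexByKey: Record<string, number> = {};
  merged.forEach((row, i) => {
    indexByKey[row[key]] = i;
  });
  for (const extra of overrides) {
    const id = extra[key];
    if (!Object.prototype.hasOwnProperty.call(indexByKey, id)) {
      // New entry: append it and remember where it lives.
      indexByKey[id] = merged.length;
      merged.push({ ...extra });
      continue;
    }
    const target = merged[indexByKey[id]];
    for (const column of Object.keys(extra)) {
      // Empty cells in the override file mean "no opinion", so keep the base value.
      if (extra[column] !== "") target[column] = extra[column];
    }
  }
  return merged;
}

// Illustrative rows only; population values are placeholders.
const baseRows: Row[] = [
  { code: "eng", name: "English", population: "100" },
  { code: "fra", name: "French", population: "" },
];
const extraRows: Row[] = [
  { code: "fra", name: "", population: "50" },
  { code: "hin", name: "Hindi", population: "70" },
];
const mergedRows = mergeRows(baseRows, extraRows, "code");
console.log(mergedRows);
```

Keeping additions in a separate file and merging at load time leaves the original TSV untouched, which may make contested or bulk-imported data easier to review and to switch off.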
Here's a list of planned functionality. Completed functions are checked off.
- Language-adjacent objects
  - Languages
    - Core attributes
    - ISO parent/child connections
    - Language families
    - Glottolog
    - Digital Support details
    - Vitality details
    - Keyboard availability details
  - Territories
    - Countries & Dependencies
    - Continents & other regions
  - Locales (languages + territories + potentially other specificity)
    - Basic data
    - Computed regional locales
    - Population estimate sources
  - Writing Systems
    - Basic data
    - Relationship w/ other writing systems (containment, lineage)
  - Language Variants / IANA tags
  - Censuses
    - Regular censuses
    - Include citation information
    - Continue importing new censuses
    - Convert other imported datasets into census-like objects
- Views
  - Cards
  - Details
  - Hierarchy
  - Table
  - Map
  - Reports
    - Language name overlap
    - Invalid languages
    - Locales that should be added
    - Metrics on the data we have
- Interactivity
  - Search
    - By Code
    - By Name, Endonym
    - Highlight search
      - For Hierarchy
    - Using typeahead
    - When few results are shown, suggest alternatives
  - Filter
    - By Scope
    - By Country
    - Integrate in search bar
  - Hovercard & Tooltips
    - Related objects
    - Field explanations
  - Sort By: Population, Name, Code
  - Limit
  - Pagination
  - Visual options
    - Change locale separator (_ or -)
  - Selection
  - Export
- Manage data sources
  - Show results based on different definitions of what a language is
    - ISO, Glottolog, CLDR, All
    - Highlight language codes in each
  - Add a better guide for different kinds of users
- About Page
  - Introduction, how to use
  - Acknowledgements
- Future ideas
  - Database-powered backend
  - Feedback mechanisms
  - Metrics
- The code in this repository is licensed under the MIT License.
- The language data, visualizations, and other content are licensed under Creative Commons Attribution-ShareAlike 4.0 International.