GPT-NL develops a large language model (LLM) as a responsible alternative to other language models. We aim to build, train and share a sovereign and transparent AI model that strengthens the digital position of the Netherlands and Europe. Throughout this process we focus on privacy, legal compliance, data availability and public values.
Sovereignty and digital autonomy: GPT-NL is developed in digital independence, ensuring full control over the model, the data and the choices we make during data acquisition, curation, model training, finetuning, testing and deployment.
Openness and transparency: We commit to providing insight into our design and development. We explicitly document the choices we make along the way, including our data sources, model training decisions and risk assessments. Bias and ethical issues are clearly identified. We publish most of our curation and training code in this GitHub organization and share our public data on Hugging Face. Model weights are made accessible under a licensing scheme for research and professional use.
Reliability and compliance: We protect the rights of partners and end users. We avoid data of questionable origin, which could contain copyright violations, personally identifiable information (PII) or unclear biases. To that end we train the model from scratch, using only data that meets our fundamental requirements: protection of intellectual property rights, removal or anonymization of PII prior to training, and exclusion of confidential and harmful content. We identify and remove duplicates in our data set, including when non-infringing synthetic data is used.
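The PII removal and deduplication steps described above can be illustrated with a minimal sketch. This is not the GPT-NL curation pipeline: the function names, the email-only pattern and the `[EMAIL]` placeholder are assumptions made for illustration, and a real pipeline would need to cover far more PII types and near-duplicate detection.

```python
import hashlib
import re

# Assumption: a deliberately simple pattern that only catches email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace every email address with a fixed placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return " ".join(text.lower().split())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, compared after normalization."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def curate(docs: list[str]) -> list[str]:
    """Redact first, then deduplicate, so copies differing only in PII collapse."""
    return deduplicate([redact_emails(doc) for doc in docs])
```

Redacting before deduplicating is a design choice in this sketch: two documents that differ only in the address they contain become identical after redaction and are stored once.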
Reciprocity and fairness: We partner with copyright holders in a legitimate and deliberate manner, maintaining an open dialogue and actively involving our data providers during development. A portion of the financial benefits flows back to these contributors, who are represented in the Content Board. This leads to a fairer innovation model in which we share the growth in value instead of draining it from others.
Energy efficiency: LLM development requires a lot of computing power, which is why we actively steer toward more responsible use of resources. We monitor and optimize our model size and training utilization, in particular energy and water use. We strive to estimate energy consumption before every activity and when choosing technical pathways for our curation and training.
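As a rough illustration of estimating consumption before a job starts, the sketch below multiplies accelerator count, average power draw and runtime, then applies a data-center overhead factor. It is a back-of-the-envelope model, not the method GPT-NL uses: the class name, the default PUE (power usage effectiveness) and WUE (water usage effectiveness, in litres per kWh of IT energy) values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEstimate:
    """Hypothetical pre-run estimate for a single compute job."""

    num_gpus: int
    avg_gpu_power_watts: float
    hours: float
    pue: float = 1.2  # assumed facility overhead factor, not a measured value
    wue_litres_per_kwh: float = 0.5  # assumed water use per kWh of IT energy

    def it_energy_kwh(self) -> float:
        """Energy drawn by the accelerators themselves."""
        return self.num_gpus * self.avg_gpu_power_watts * self.hours / 1000.0

    def facility_energy_kwh(self) -> float:
        """IT energy scaled by PUE to include cooling and other overhead."""
        return self.it_energy_kwh() * self.pue

    def water_litres(self) -> float:
        """Water use derived from IT energy via the assumed WUE."""
        return self.it_energy_kwh() * self.wue_litres_per_kwh
```

Comparing such estimates for two candidate approaches, for example a larger model trained briefly versus a smaller one trained longer, gives a first-order basis for choosing between them before any compute is spent.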
Our feasibility studies of GPT-NL deployment focus on experimenting with the AI model in primarily Dutch contexts. We partner with research institutes (both universities and public institutions) to apply the model in novel areas. Domain-specific finetuning for applications in safety, health, education and public services is under consideration. Government use in communication tools and document processing is among the initial use cases.
Our model is designed for summarization, text simplification, and RAG‑style question answering. These capabilities address our stakeholders’ immediate needs and help them adopt LLM technology more quickly.
For a complete overview of the system architecture of the GPT-NL data extraction and data curation phases, please have a look at our System Architecture Document.
- GPT-NL Homepage (mainly Dutch)
- GPT‑NL at TNO: a sovereign language model for the Netherlands
- Q&A GPT-NL at SURF: Dutch own open AI language model
- GPT-NL at Hugging Face: our public datasets
- GPT-NL at LinkedIn
The GPT-NL team regularly releases news, materials, announcements and new code sources piece by piece. To stay up to date, visit the GPT-NL News often!
We welcome issues and discussions for our project. We also collaborate with a wide range of partners in the public and private sectors, including research institutes, universities, public institutions and companies. Consider joining our ecosystem via our partnership page, where you can find out more about the different ways to work together.
For now, we do not seek external contributions via Pull Requests (PRs) to our open source repositories. We are currently focused on open sourcing more of our code repositories and data sets. We prefer collaboration in the form of partnerships, or with external parties on specific topics. Given the scope and vision of the project, this also reduces our workload for reviewing and accepting external work. We regularly review whether we can start welcoming PRs that align with the goals of the GPT-NL project. For details, see our contribution guidance.
GPT-NL is a collaboration of the Dutch non-profit organizations TNO, NFI and SURF, which combine forces to develop a large language model based on high-quality data sets of primarily Dutch origin that have been legally obtained, often through collaborations with partners. Part of the benefits flows back to authors and other copyright holders.
The project received funding from the Netherlands Enterprise Agency (RVO) and the Ministry of Economic Affairs (EZK).