feminist-ai/training-feminist-llms

Training Feminist LLMs

What if we actually trained on feminist text, concepts and notions? Could that create feminist LLMs?

This repository will explore if, when, and how a privacy-respecting, communal and feminist LLM could be developed. Stay tuned for our initial experiments and results!

Call for Contributions for PyConDE

At PyConDE in Darmstadt from April 23-25, 2025, we will try training a feminist base language model. We are looking for contributions of text data, documents and writing from artists, writers, creators and/or librarians who have feminist writing and work.

What the contributions will be used for

We are going to start our experiment with a pretrained base language model. Think of this as a model that takes text and produces more text from it. It is not a full chat model; it can only take text and predict which word(s) will come next.

Then we are going to collect texts communally and with attribution. These texts will go through some preprocessing to prepare them for training. The log of contributions will be kept and published with the release of the model. Please ONLY contribute your own work or work that is truly meant for open use with appropriate attribution.
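To make the idea of "preprocessing plus a contribution log" more concrete, here is a minimal sketch of what one logged contribution could look like. It is not our actual pipeline: the `Contribution` fields, the `clean_text` and `log_contribution` helpers, and the example license string are all hypothetical and only illustrate the kind of record that attribution needs.

```python
import hashlib
import json
import unicodedata
from dataclasses import asdict, dataclass


@dataclass
class Contribution:
    # All field names are illustrative assumptions, not a fixed schema.
    contributor: str   # name as the contributor wants to be credited
    title: str
    license_name: str  # e.g. "CC-BY-4.0"
    sha256: str        # fingerprint of the cleaned text
    num_words: int


def clean_text(raw: str) -> str:
    """Normalize Unicode to NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", raw)
    return " ".join(text.split())


def log_contribution(raw, contributor, title, license_name):
    """Return the cleaned text together with its log entry."""
    cleaned = clean_text(raw)
    digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
    entry = Contribution(contributor, title, license_name, digest, len(cleaned.split()))
    return cleaned, entry


cleaned, entry = log_contribution(
    "  An example   essay\n\nabout care work.  ",
    contributor="Example Author",
    title="Example Essay",
    license_name="CC-BY-4.0",
)
print(json.dumps(asdict(entry), indent=2))
```

Storing a hash rather than the text itself in the log means the list of contributors can be published without republishing the contributed material.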

This data will be used to perform what is called continued or extended pretraining. This means that the text will be fed into a machine learning training process, which will update the base model so that it produces text more similar to the text it was trained on. We hope this can result in better representation and in research into how feminism is learned or unlearned by LLMs.
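The effect of continued pretraining can be illustrated with a toy model. The sketch below is a deliberately tiny stand-in, not the training setup we will use: instead of a neural network it counts which word follows which (a bigram model), and the example sentences are made up. The point it shows is the same, though: the model is first trained on one body of text, then the same model is updated with new text, and its next-word predictions shift toward the new material.

```python
from collections import Counter, defaultdict


def train(model, text):
    """Update bigram counts in place: model[prev][next] += 1."""
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None


# "Pretraining": the base model only knows this generic text.
base_corpus = (
    "history is written by kings . history is written by kings . "
    "history is made by everyone"
)
model = train(defaultdict(Counter), base_corpus)
before = predict_next(model, "by")

# "Continued pretraining": the same model is updated with contributed text.
contributed = (
    "history is made by everyone . history is made by everyone . "
    "history is made by everyone"
)
train(model, contributed)
after = predict_next(model, "by")

print(before, "->", after)  # kings -> everyone
```

A real language model updates millions of numeric weights rather than a table of counts, but the direction of the change is comparable: text that appears more often in training becomes more likely in the output.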

Any contributor who has provided their name will get a copy of the trained model for their own use. This is specifically why we ask that you only contribute your own work.

Privacy concerns

Because machine learning models can and do memorize parts of their training data, we ask that you only contribute material that you would be okay with the model memorizing and repeating. Again, we will release a list of contributors to this initial model, but it is not possible with our setup to have the model directly cite your work.
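One simple way to spot verbatim memorization is to look for word sequences that appear both in a training text and in something the model generated. The sketch below assumes this word n-gram overlap approach; the function names, the window size of five words, and the example sentences are hypothetical, and this is not a check we have committed to running.

```python
def ngrams(text, n):
    """Return the set of all n-word sequences in `text`, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def memorized_spans(training_text, generated_text, n=5):
    """Return word n-grams that appear verbatim in both texts."""
    return ngrams(training_text, n) & ngrams(generated_text, n)


training_text = "we hold space for each other and we write our own stories"
generated_text = "the model said we hold space for each other in its reply"

overlap = memorized_spans(training_text, generated_text, n=5)
for span in sorted(overlap):
    print(" ".join(span))
```

A check like this only catches exact repetition. Paraphrased or lightly altered passages would pass through it unnoticed, which is one reason we ask you to contribute only material you are comfortable seeing repeated.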

If you are a librarian and have digitized open access content that you think might fit our goals, please contact us via email.

In the future, we will provide alternative approaches in this repository to allow for better privacy and copyright protections for any and all contributors.

Caveats

If we don't receive enough submissions, we may not be able to properly train a model. We will make sure to inform all contributors as the process gets closer and will delete any unused data that is not explicitly confirmed to be saved for future events.

Note on machine learning and our world: Feminism is too diverse and complex to ever be truly represented by a series of cool math equations. This work isn't meant to replace amazing research and work on proving that point! It is meant to increase visibility, hold space and spark conversation about the intersections of feminism and what we today call AI.

I'm interested, how do I contribute?

If you have a large number of files, please contact us via email to arrange file transfer. If you have only a few files or want to contribute a small amount of text, please use this Google Form.

About

Ideas, experiments and workflows for training communal and feminist LLMs
