A repository dedicated to using simple techniques to tease out what language models actually represent as learned knowledge and what they merely memorize from disparate training sources.
All included source code is written and tested in a Python 3.11 environment, though older versions of Python 3 may still function.
All packages imported by the various scripts are listed in requirements.txt and can be installed via `pip3 install -r requirements.txt`.
This repository is designed to work with locally run LMs via the Ollama Python library. Switching to the OpenAI API/Python library (which Ollama also implements, so it can still interface with your local models) is a consideration for the future.
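For reference, querying a locally served model through the Ollama Python library looks roughly like the sketch below. The model name and prompt are placeholders, not the repository's actual configuration.

```python
import ollama

# Ask a locally served model a simple arithmetic question.
# The model name is a placeholder -- substitute whatever model
# you have pulled with `ollama pull <model>`.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "What is 12345 + 67890? Reply with only the number."}],
)
print(response["message"]["content"])
```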
For tokenization we use GPT-2's tokenizer via the HuggingFace Transformers library (it is common to OpenAI models, LLaMA models, and several others; however, it may not be the tokenizer of your model, so update as needed).
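As a rough illustration (not the exact call sites in the scripts), loading GPT-2's tokenizer through Transformers and inspecting how a prompt is split might look like this:

```python
from transformers import AutoTokenizer

# Load GPT-2's tokenizer from the HuggingFace hub.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "12345 + 67890 ="
ids = tokenizer.encode(prompt)
print(ids)                                   # token ids
print(tokenizer.convert_ids_to_tokens(ids))  # how the prompt is split into tokens
```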
In the future, better tokenization control (including the option to disable tokenization entirely) will be implemented.
All scripts support command line argument parsing, so each script's specific execution details can be viewed via `python3 <script.py> --help`.
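The argument handling follows the usual argparse pattern; the sketch below is illustrative only, and the flag names shown are hypothetical. Consult each script's `--help` output for the real ones.

```python
import argparse

def parse_args():
    # Hypothetical flags for illustration; the real flags differ per script.
    parser = argparse.ArgumentParser(description="Probe a local LM on arithmetic prompts.")
    parser.add_argument("--model", default="llama3", help="name of the locally served Ollama model")
    parser.add_argument("--trials", type=int, default=100, help="number of prompts to send")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.model, args.trials)
```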
- `lm_math.py` performs the primary investigation using locally run language models.
- `plots.py` produces figures using the outputs generated by `lm_math.py`.
- `special_operation.py` defines an arbitrary function for `lm_math.py` to use as an un-memorizable input to the LM (an illustrative sketch follows below).
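To make the idea of an "un-memorizable" input concrete, such a function might look like the sketch below. This is an invented example, not the function actually defined in `special_operation.py`.

```python
def special_operation(a: int, b: int) -> int:
    """An arbitrary, made-up binary operation on positive integers.

    Because this operation does not appear in any training corpus, a model
    can only answer questions about it by following the stated rule, not by
    recalling a memorized fact.  Illustrative only -- the real definition
    lives in special_operation.py.
    """
    return (3 * a + b) % (a + b + 1)
```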