alok/OppositeDay

OppositeDay

A toy mechanistic-interpretability experiment: take a word's embedding, reflect it across all axes (i.e. negate the vector), then "back-project" by finding the nearest word in the embedding vocabulary.

This is intentionally dumb: it's a simple baseline for exploring what "opposite" means in distributional vector spaces.

Quickstart

uv sync
uv run oppositeday meek
uv run oppositeday meek --origin mean

Example: "meek"

With the default glove-wiki-gigaword-50 embeddings, the naive axis-reflection trick returns a nearest neighbor with no obvious semantic relation to the input (this is part of the point):

word:  meek
model: glove-wiki-gigaword-50
origin: zero
opposite (top-1): aiport  (sim=0.6693)

Mean-centered reflection (reflect about the global mean embedding vector) doesn't magically fix it, but is a useful variant to compare against:

word:  meek
model: glove-wiki-gigaword-50
origin: mean
opposite (top-1): sulabh  (sim=0.7326)
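
One way to read "reflect about the global mean" is as a point reflection through the mean vector mu, which maps v to 2*mu - v and reduces to plain negation when the origin is the zero vector. The sketch below assumes that reading; the repo's actual code is not shown here, and the helper names `mean_vector` and `reflect` as well as the toy numbers are hypothetical. Whether the nearest-neighbor search afterwards compares raw or mean-centered vectors is a separate choice this sketch does not address.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reflect(v, origin=None):
    """Point-reflect v through `origin`; None means the zero vector."""
    if origin is None:
        return [-x for x in v]
    # Reflection through a point o sends v to o - (v - o) = 2*o - v.
    return [2.0 * o - x for o, x in zip(origin, v)]


# Toy 2-d "vocabulary" with made-up numbers (not GloVe values).
toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}

mu = mean_vector(list(toy.values()))     # global mean of the toy vectors
zero_reflected = reflect(toy["a"])       # --origin zero (assumed behavior)
mean_reflected = reflect(toy["a"], mu)   # --origin mean (assumed behavior)
```

For the toy data the mean is [1.0, 1.0], so reflecting "a" through it gives [1.0, 2.0], while reflecting through zero simply flips the sign of each component.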

What does "reflect across all axes" mean?

If the embedding is a vector v ∈ R^d, then we define:

v_opposite = -v

Then we search the embedding vocabulary for the nearest neighbor to v_opposite (cosine similarity by default).
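
The two steps above (negate, then take the cosine nearest neighbor) can be sketched without any dependencies. This is a minimal illustration on a hypothetical three-word vocabulary, not the repo's implementation; the function names `cosine` and `opposite` and all numbers are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def opposite(word, vocab):
    """Return (nearest_word, similarity) for the negated embedding of `word`."""
    target = [-x for x in vocab[word]]  # v_opposite = -v
    scored = ((w, cosine(target, vec)) for w, vec in vocab.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 2-d embeddings (illustrative numbers only).
toy_vocab = {
    "hot": [1.0, 0.2],
    "warm": [0.8, 0.4],
    "cold": [-0.9, -0.1],
}

best_word, best_sim = opposite("hot", toy_vocab)
```

Because cos(-v, w) = -cos(v, w), the nearest neighbor of v_opposite under cosine similarity is exactly the vocabulary word least similar to v. In a real embedding space that word is often a rare or noisy token rather than an antonym, which is consistent with the outputs shown in the example above.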

Notes

  • By default this repo uses the glove-wiki-gigaword-50 embeddings via gensim.downloader.
  • The first run will download the model into your ~/gensim-data cache.
  • The experiment write-up (negative result) is in EXPERIMENT_LOG.md.
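
For readers who want to poke at the embeddings directly, the following is a rough sketch of the zero-origin variant using gensim's public API (`gensim.downloader.load` and `KeyedVectors.similar_by_vector`). It is an assumption about how one could do this by hand, not a copy of what the `oppositeday` CLI does internally; the function names `load_default_model` and `opposite_top1` are hypothetical.

```python
def load_default_model(name="glove-wiki-gigaword-50"):
    """Download (on first call) and load pretrained vectors via gensim."""
    import gensim.downloader as api  # imported lazily; requires gensim

    return api.load(name)


def opposite_top1(model, word):
    """Nearest vocabulary entry to the negated embedding of `word`."""
    negated = -model[word]  # KeyedVectors returns a NumPy array
    (neighbor, similarity), = model.similar_by_vector(negated, topn=1)
    return neighbor, similarity


# Usage (the first call downloads the vectors):
#   model = load_default_model()
#   print(opposite_top1(model, "meek"))
```

Keeping the download in its own function makes `opposite_top1` easy to try against any object that supports `model[word]` and `similar_by_vector`, including a small stub.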
