Toy mechinterp: take a word embedding, reflect it across all axes (i.e. negate the vector), then "back-project" by finding the nearest word in the embedding vocabulary.
This is intentionally dumb — it's a simple baseline for exploring what "opposite" means in distributional vector spaces.
```sh
uv sync
uv run oppositeday meek
uv run oppositeday meek --origin mean
```

With the default glove-wiki-gigaword-50 embeddings, the naive axis-reflection trick produces something very non-semantic (this is part of the point):
```
word: meek
model: glove-wiki-gigaword-50
origin: zero
opposite (top-1): aiport (sim=0.6693)
```
Mean-centered reflection (reflecting about the global mean embedding vector) doesn't magically fix it, but it is a useful variant to compare against:
```
word: meek
model: glove-wiki-gigaword-50
origin: mean
opposite (top-1): sulabh (sim=0.7326)
```
If the embedding is a vector v ∈ R^d, then we define:

    v_opposite = -v

Then we search the embedding vocabulary for the nearest neighbor to v_opposite (cosine similarity by default). With `--origin mean`, the reflection is instead taken about the global mean embedding μ, which gives v_opposite = 2μ - v (and reduces to -v when μ is the zero vector).
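The reflect-and-search step can be sketched in plain NumPy. This is a minimal illustration, not the repo's actual implementation: the `opposite` helper, the three-word vocabulary, and the 2-d vectors are all made up here to keep the example self-contained.

```python
import numpy as np

def opposite(word, vocab, vecs, origin=None):
    """Reflect `word`'s embedding through `origin` (default: the zero
    vector, i.e. plain negation) and return the nearest *other* word
    in the vocabulary by cosine similarity."""
    i = vocab.index(word)
    v = vecs[i]
    if origin is None:
        origin = np.zeros_like(v)
    v_opp = 2 * origin - v  # reduces to -v when origin is the zero vector
    sims = vecs @ v_opp / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(v_opp))
    sims[i] = -np.inf  # exclude the query word itself
    best = int(np.argmax(sims))
    return vocab[best], float(sims[best])

# tiny hand-made 2-d "vocabulary" for illustration (not real GloVe vectors)
vocab = ["hot", "cold", "warm"]
vecs = np.array([[1.0, 0.2],
                 [-1.0, -0.1],
                 [0.6, 0.1]])
print(opposite("hot", vocab, vecs))  # nearest word to -[1.0, 0.2] is "cold"
```

Passing a mean vector as `origin` gives the mean-centered variant without changing the search code.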
- By default this repo uses the `glove-wiki-gigaword-50` embeddings via `gensim.downloader`.
- The first run will download the model into your `~/gensim-data` cache.
- The experiment write-up (negative result) is in `EXPERIMENT_LOG.md`.