Discworld Hex

Hex clusters Discworld's stories.

A clustering and search tool applied to the plots of Discworld novels. Currently, given an input sentence, it finds the most similar passages of Discworld books based on their plot summaries from Wikipedia.

This is a tiny proof of concept of using FAISS with transformer language models; the same approach could easily be extended to cover much larger datasets.
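
The idea of embedding text and searching for nearest neighbours can be illustrated with a minimal sketch. This is not the project's code: it assumes a toy bag-of-words embedding in place of a transformer model and a brute-force scan in place of FAISS, mirroring the exact inner-product search that a flat FAISS index (`faiss.IndexFlatIP`) performs over normalised vectors. The passages and the names `PASSAGES`, `embed`, and `search` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical plot snippets standing in for Wikipedia plot summaries.
PASSAGES = [
    ("Guards! Guards!", "a secret brotherhood summons a dragon to overthrow the city"),
    ("Mort", "death takes a young apprentice who falls for a princess"),
    ("Going Postal", "a con man is forced to revive the city post office"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def embed(text, vocab):
    """Toy embedding: an L2-normalised word-count vector over `vocab`.

    A real pipeline would call a transformer sentence encoder here.
    """
    counts = Counter(tokenize(text))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query, k=2):
    """Return the k passages with the highest cosine similarity to `query`."""
    vocab = sorted({word for _, text in PASSAGES for word in tokenize(text)})
    index = [embed(text, vocab) for _, text in PASSAGES]  # the "index" is just a list
    q = embed(query, vocab)
    # Inner product of normalised vectors equals cosine similarity.
    scores = [sum(a * b for a, b in zip(q, vec)) for vec in index]
    ranked = sorted(range(len(PASSAGES)), key=lambda i: scores[i], reverse=True)
    return [(PASSAGES[i][0], scores[i]) for i in ranked[:k]]


if __name__ == "__main__":
    for title, score in search("a dragon attacks the city"):
        print(f"{score:.3f}  {title}")
```

Swapping the toy pieces for real ones would mean replacing `embed` with a sentence encoder and the list scan with a FAISS index; the ranking logic stays the same.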

Setup

Should work out of the box with bash and a couple of prerequisites:

conda
poetry

( cd conda && source bootstrap.sh )
conda activate discworld-hex
poetry install

Usage

TL;DR (when poetry is installed and the discworld-hex conda env is activated):

build
search

To only fetch data and build and export the index:

build
# is just a shortcut for:
poetry run build

To use the index to search:

search
# is just a shortcut for:
poetry run search

To run any python script in this project:

poetry run python src/discworld_hex/any_file.py

To run all checks:

poetry run pre-commit

TODO

Functionality

(What the user would notice.)

Allow custom Wikipedia queries on the input (and thus custom libraries)
Fine-tune (e.g., with standard (masked) language modelling) on the specific subdomains
Aggregate search results per-book
Allow merging libraries
Better CLI: allow changing k, passing in multiple sentences, etc.; either:
- clickify and richify the interface
- alternatively, just make it into an API
Support other (faster, less accurate) indexes
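
As an illustration of the per-book aggregation item above, here is a minimal sketch under stated assumptions: search hits arrive as `(book, score)` pairs, a higher score means a closer match (as with inner-product search; a distance-based index would need the opposite ordering), and a book is ranked by its best-scoring passage. The function name `aggregate_by_book` and the sample hits are hypothetical, not part of the project.

```python
def aggregate_by_book(hits, top_n=3):
    """Collapse passage-level hits into one entry per book.

    hits: iterable of (book_title, score) pairs, higher score = more similar.
    Returns up to top_n tuples (book_title, best_score, hit_count),
    ordered by best score, with ties broken by the number of hits.
    """
    per_book = {}
    for book, score in hits:
        best, count = per_book.get(book, (float("-inf"), 0))
        per_book[book] = (max(best, score), count + 1)
    ranked = sorted(per_book.items(), key=lambda item: item[1], reverse=True)
    return [(book, best, count) for book, (best, count) in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up scores purely for demonstration.
    sample_hits = [
        ("Mort", 0.81),
        ("Guards! Guards!", 0.77),
        ("Mort", 0.64),
        ("Going Postal", 0.52),
    ]
    for book, best, count in aggregate_by_book(sample_hits, top_n=2):
        print(f"{book}: best={best:.2f}, hits={count}")
```

Ranking by the best passage is only one option; summing or averaging the scores per book would favour books with many moderately similar passages instead.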

Maintenance

(What the user shouldn't notice.)

Less redundant library serialization
More tests
- Rebuilding Library and the FAISS index
