Development for this project has moved to sourcehut. This repository is now read-only.
Local semantic indexing and search for Emacs.
The package includes a dynamic module that needs cargo installed to build. The module is compiled automatically when you install the package with something like this:
(use-package sem
  :vc (:fetcher github :repo lepisma/sem.el)
  :demand t)

Note that this method only handles creating .so files, which means it won’t work
on non-Linux systems yet. On other systems, inspect the Makefile and compile the
module manually.
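
Before installing, it can help to confirm that the prerequisites mentioned above are in place. The following is a minimal sketch and not part of sem: the function name my/sem-build-problems is hypothetical, and it only checks for cargo, for dynamic module support in the running Emacs, and for a GNU/Linux system (since only .so files are produced).

```elisp
;; Hypothetical helper, not provided by sem.
(defun my/sem-build-problems ()
  "Return a list of strings describing likely obstacles to building sem.
An empty list means the basic prerequisites look satisfied."
  (delq nil
        (list
         ;; The dynamic module is built with cargo.
         (unless (executable-find "cargo")
           "cargo was not found on `exec-path'")
         ;; `module-file-suffix' is nil when Emacs lacks module support.
         (unless (bound-and-true-p module-file-suffix)
           "this Emacs was built without dynamic module support")
         ;; The automatic build only produces .so files.
         (unless (eq system-type 'gnu/linux)
           "automatic build targets .so files; compile the module manually"))))

;; Evaluate to see which problems, if any, apply to this Emacs.
(my/sem-build-problems)
```

If the returned list is empty, the use-package form above should at least have what it needs to attempt the build.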
Once built and installed, you can use it as follows. Note that the example
below needs tokenizers.el and onnx.el installed, as sem-embed requires them to
generate embeddings.
(require 'sem)
;; sem-embed is an optional package providing general purpose local embedding
;; functions
(require 'sem-embed)
(setq sem-data-dir (expand-file-name "~/.emacs.d/sem/"))
(setq sem-embed-model-path (expand-file-name "~/.emacs.d/sem/model_O2.onnx"))
(defun embed-batch-fn (items)
  "Embed ITEMS, a list of Lisp objects, into a 2D matrix of vectors.
The result has shape (n-items x dim)."
  (sem-embed-default (apply #'vector (mapcar #'prin1-to-string items))))

(defun embed-fn (item)
  "Embed a single ITEM by taking the first row of a one-item batch."
  (aref (embed-batch-fn (list item)) 0))
;; The output dimension of default embedder from sem-embed is needed here
(sem-db-new "ml-test" sem-embed-dim)
(setq sem-db (sem-db-load "ml-test"))
;; First we will add a few items to do similarity search
(let ((items (list "hello world"
                   "this is an introduction"
                   "movies are bad"
                   "food is good")))
  (sem-add-batch sem-db items #'embed-batch-fn))
;; #'identity is the read-fn, which is used to load Lisp objects back from
;; their string representation. By default, the `sem-add' function uses
;; `prin1-to-string' to build the string representation.
(sem-similar sem-db "worst" 2 #'embed-fn #'identity)
;; ((0.33184681863908483 . "movies are bad") (0.25182008665115796 . "hello world"))
;; Output is a list of (score . item) pairs

In case you have added a lot of data, you might want to consider indexing the vectors for fast (and approximate) searches. Here is how you can do this:
;; All of these functions are synchronous in the current version
;; This builds an index on the vectors
(sem-build-index sem-db)
;; After adding new data, run re-indexing (and other optimizations)
(sem-optimize sem-db)
;; Check item counts with
(sem-items-count sem-db)

When a new database is created, the package automatically creates a default
table (see sem-default-table-name), which is used in all of the examples shown
above. In case you want to store many tables in one database, here is how you
can do so:
;; You can check if a table is already present
(sem-table-present-p sem-db table-name)
;; Here is how to list all tables
(sem-table-list sem-db)
;; Create a new table for vectors of dimension dim
(sem-table-new sem-db table-name dim)
;; After creation, you can pass table-name as an optional parameter in all
;; operations where it makes sense. Here are a few examples; check the function
;; docstrings for more.
(sem-build-index sem-db table-name)
;; In case you want to delete a table
(sem-table-delete sem-db table-name)
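
The table functions above compose naturally. As a minimal sketch, the hypothetical helper below (my/sem-ensure-table is not part of sem) creates a table only when it is missing. It assumes that sem-table-present-p returns non-nil for an existing table, as the -p suffix suggests, and that sem has already been loaded as in the earlier example.

```elisp
;; Hypothetical helper, not provided by sem. Assumes (require 'sem) was
;; evaluated earlier and that `sem-table-present-p' returns non-nil when the
;; table exists.
(defun my/sem-ensure-table (db table-name dim)
  "Create TABLE-NAME in DB with vector dimension DIM unless it exists.
Return t if a new table was created, nil otherwise."
  (unless (sem-table-present-p db table-name)
    (sem-table-new db table-name dim)
    t))
```

For example, evaluating (my/sem-ensure-table sem-db table-name sem-embed-dim) before adding items to that table makes a setup snippet safe to run more than once.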