sem.el

Development of this project has moved to sourcehut. This repository is now read-only.

Local semantic indexing and search for Emacs.

Installation and Usage

The package includes a dynamic module that is built with cargo, so cargo must be installed. The module is compiled automatically when you install the package with something like this:

(use-package sem
  :vc (:fetcher github :repo lepisma/sem.el)
  :demand t)

Note that this method only builds .so files, so it does not yet work on non-Linux systems. On other systems, inspect the Makefile and compile the module manually.

Once the package is built and installed, you can use it as shown below. The example requires tokenizers.el and onnx.el to be installed, because sem-embed needs them to generate embeddings.

(require 'sem)
;; sem-embed is an optional package providing general purpose local embedding
;; functions
(require 'sem-embed)

(setq sem-data-dir (expand-file-name "~/.emacs.d/sem/"))
(setq sem-embed-model-path (expand-file-name "~/.emacs.d/sem/model_O2.onnx"))

(defun embed-batch-fn (items)
  "Embed ITEMS, a list of Lisp objects, into a 2D matrix of vectors.
The result has shape (n-items x dim)."
  (sem-embed-default (apply #'vector (mapcar #'prin1-to-string items))))

(defun embed-fn (item)
  "Embed a single ITEM and return its vector."
  (aref (embed-batch-fn (list item)) 0))

;; The output dimension of default embedder from sem-embed is needed here
(sem-db-new "ml-test" sem-embed-dim)
(setq sem-db (sem-db-load "ml-test"))

;; First we will add a few items to do similarity search
(let ((items (list "hello world"
                   "this is an introduction"
                   "movies are bad"
                   "food is good")))
  (sem-add-batch sem-db items #'embed-batch-fn))

;; #'identity is the read-fn, which is used to load the lisp object back from
;; its string representation. By default the `sem-add' function uses
;; `prin1-to-string' to build the string representation.
(sem-similar sem-db "worst" 2 #'embed-fn #'identity)
;; ((0.33184681863908483 . "movies are bad") (0.25182008665115796 . "hello world"))
;; The output is a list of (score . item) pairs
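
Because each result is a (score . item) cons cell, ordinary list functions are enough to post-process it. The following is a minimal sketch that assumes the result shape shown above. The helper name my/sem-results-above and the 0.3 threshold are hypothetical, chosen for illustration, and are not part of sem.el.

```elisp
(require 'seq)

(defun my/sem-results-above (results min-score)
  "Return the items in RESULTS whose score is at least MIN-SCORE.
RESULTS is a list of (SCORE . ITEM) pairs, in the shape returned by
`sem-similar'.  Hypothetical helper, not part of sem.el."
  (mapcar #'cdr
          (seq-filter (lambda (pair) (>= (car pair) min-score))
                      results)))

;; Applied to the literal output from the example above:
(my/sem-results-above
 '((0.33184681863908483 . "movies are bad")
   (0.25182008665115796 . "hello world"))
 0.3)
;; => ("movies are bad")
```

Keeping the filter separate from the search call means the same helper works whether the results come from an exact or an indexed (approximate) search.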

If you have added a lot of data, consider indexing the vectors for fast (but approximate) searches. Here is how to do this:

;; All of these functions are synchronous in the current version

;; This builds an index on the vectors
(sem-build-index sem-db)

;; After adding new data, run re-indexing (and other optimizations)
(sem-optimize sem-db)

;; Check item counts with
(sem-items-count sem-db)

Multiple tables

When a new database is created, the package automatically creates a default table (see sem-default-table-name), which all of the examples above use. To store several tables in one database, do the following:

;; You can check if a table is already present
(sem-table-present-p sem-db table-name)

;; Also here is how to list all tables
(sem-table-list sem-db)

(sem-table-new sem-db table-name dim)
;; After creation, you can pass table-name as an optional parameter in all
;; operations where it makes sense. Here is one example; check the function
;; docstrings for more.
(sem-build-index sem-db table-name)

;; In case you want to delete a table
(sem-table-delete sem-db table-name)
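
The presence check and table creation shown above combine naturally into a create-if-missing helper. This is only a sketch: it assumes sem-table-present-p and sem-table-new take the arguments shown above, and my/sem-ensure-table is a hypothetical name that is not part of sem.el.

```elisp
(defun my/sem-ensure-table (db table-name dim)
  "Create TABLE-NAME with vector dimension DIM in DB unless it exists.
Return TABLE-NAME.  Hypothetical helper, not part of sem.el; it assumes
`sem-table-present-p' and `sem-table-new' take the arguments used in
the examples above."
  (unless (sem-table-present-p db table-name)
    (sem-table-new db table-name dim))
  table-name)

;; Possible usage, with `table-name' bound as in the examples above:
;; (my/sem-ensure-table sem-db table-name sem-embed-dim)
;; (sem-build-index sem-db table-name)
```

Returning the table name lets the call be nested directly inside other operations that accept the optional table argument.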
