Skip to content
This repository was archived by the owner on Aug 23, 2025. It is now read-only.

lepisma/sem.el

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sem.el

Development for this project has moved to sourcehut here. This repository is now in a read-only state.

Local semantic indexing and search for Emacs.

Installation and Usage

There is a dynamic module that needs cargo installed for building. This module is automatically compiled when you install the package using something like this:

(use-package sem
  :vc (:fetcher github :repo lepisma/sem.el)
  :demand t)

Note that this method only handles creating .so files which means it won’t work on non-Linux systems yet. For other systems, manually inspect the Makefile and compile the module yourself.

Once built and installed, you can use it like the following. Note that the example below needs you to have tokenizers.el and onnx.el installed as they are needed by sem-embed to generate embeddings.

(require 'sem)
;; sem-embed is an optional package providing general purpose local embedding
;; functions
(require 'sem-embed)

(setq sem-data-dir (expand-file-name "~/.emacs.d/sem/"))
(setq sem-embed-model-path (expand-file-name "~/.emacs.d/sem/model_O2.onnx"))

(defun embed-batch-fn (items)
  "The embedding function that takes lisp objects and returns a 2D matrix
of vectors (n-items x dim)."
  (sem-embed-default (apply #'vector (mapcar #'prin1-to-string items))))

(defun embed-fn (item)
  (aref (embed-batch-fn (list item)) 0))

;; The output dimension of default embedder from sem-embed is needed here
(sem-db-new "ml-test" sem-embed-dim)
(setq sem-db (sem-db-load "ml-test"))

;; First we will add a few items to do similarity search
(let ((items (list "hello world"
                   "this is an introduction"
                   "movies are bad"
                   "food is good")))
  (sem-add-batch sem-db items #'embed-batch-fn))

;; #'identity is the read-fn which is used to load back the lisp object from
;; their string representation. By default the `sem-add' function uses
;; `prin1-to-string' for building the strings representation.
(sem-similar sem-db "worst" 2 #'embed-fn #'identity)
;; ((0.33184681863908483 . "movies are bad") (0.25182008665115796 . "hello world"))
;; Output is a list of score, item pairs

In case you have added a lot of data, you might want to consider indexing the vectors for fast (and approximate) searches. Here is how you can do this:

;; All are sync functions in the current version

;; This builds an index on the vectors
(sem-build-index sem-db)

;; Once you have added more new data, run re-indexing (and other optimizations)
(sem-optimize sem-db)

;; Check item counts with
(sem-items-count sem-db)

Multiple tables

On creation of a new database, the package automatically creates a default table (see sem-default-table-name) which is used for all of the examples shown above. In case you want to store many tables in one database, here is how you can do so:

;; You can check if a table is already present
(sem-table-present-p sem-db table-name)

;; Also here is how to list all tables
(sem-table-list sem-db)

(sem-table-new sem-db table-name dim)
;; After creation, you can pass table-name as optional parameter in all
;; operations where it makes sense. Here are a few examples, check function
;; docstrings for more.
(sem-build-index sem-db table-name)

;; In case you want to delete a table
(sem-table-delete sem-db table-name)

About

Local semantic indexing and search for Emacs

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published