jaju/lucene-clj

What is lucene-clj?

A simple Clojure wrapper for Apache Lucene, currently targeting Lucene 10.4.0.

Key usage scenarios

Search
The core use-case of Lucene.
Suggest
Prefix-queries for content in any field.
Query flexibility
Search supports fielded maps, OR via sets, AND via vectors/sequences, plain strings, and classic Lucene query syntax.
Analyzer composition
Convenience helpers exist for standard, keyword, simple, and per-field analyzers.

Both in-memory and on-disk indexes can be used, depending on the dataset size. Disk indexes can also be rebuilt in place with :re-create? true.

Note: UNSTABLE API. No releases yet.

Inspired by other example wrappers I’ve come across. Notably

Adding Dependency to a Project

[org.msync/lucene-clj "0.3.0-SNAPSHOT"]

Available via clojars.

Current development baseline:

  • Clojure 1.12.4
  • Apache Lucene 10.4.0
  • Java 21 or newer

Why Would You Want to Use lucene-clj?

The primary use-case is in-process text search over read-only data-sets on single-instance deployments. In multi-instance deployments, keeping data modifications in sync takes extra effort.

Use this library when you need light-weight text-search support without the hassle of setting up something like Solr. You may update the index if you wish, but you have to guard against race conditions yourself. Because the index is in-process, you also need to update every instance in a multi-instance deployment.

The objectives are loosely as follows.

  • Stick to core Lucene. No script- or language-specific dependencies are part of the core library; users can add them as needed.
  • Support prefix-based suggestions, a Lucene feature I found poorly documented and short on good examples.
  • Track the latest Lucene versions.

I am thankful to the above library authors for their liberal licensing. I’ve used their ideas/code in places.

Vision

lucene-clj is an opinionated embedded retrieval library for Clojure applications.

It is not trying to wrap all of Lucene. Instead, it focuses on a narrow slice of Lucene that is especially valuable for application search and agentic workflows:

  • map-first document indexing
  • analyzer composition
  • ergonomic lexical retrieval
  • suggestions and completion
  • explainable ranking
  • future vector and hybrid retrieval

The design principle is to keep the human-facing API compact and idiomatic. Where possible, Clojure data shapes carry query intent instead of multiplying API entry points.

For retrieval strategy, the intended order of importance is:

  • BM25-style lexical retrieval as the primary baseline
  • optional classic TF-IDF-style scoring when comparison or compatibility is useful
  • vector retrieval as a complement, not a replacement, for lexical retrieval
  • hybrid retrieval when both lexical precision and semantic recall matter

Deliberately Out Of Scope

At least for now, lucene-clj is not trying to be:

  • a complete wrapper over all Lucene modules
  • a distributed search system
  • a replacement for Solr, Elasticsearch, or OpenSearch
  • a schema-heavy document database
  • a host for every Lucene storage, codec, faceting, spatial, or analytics feature
  • a direct exposure of low-level Lucene internals when a smaller Clojure abstraction is sufficient

This focus is intentional. Lucene is vast; lucene-clj aims to be small, opinionated, and excellent at embedded retrieval rather than broad and thin.

Benchmarking

lucene-clj includes a small benchmark harness for the indexing hot path.

Run it with:

lein bench benchmarks/manual.edn manual $(git rev-parse --short HEAD)

This writes EDN captures under benchmarks/ so performance changes can be reviewed and committed.

Current measured improvements from the first indexing hot-path refactor:

  • map->document for one document: 33.2 us -> 6.15 us
  • compiled one-document encode: 7.53 us -> 4.10 us
  • compiled batch encode: 7.04 ms -> 5.18 ms
  • batch indexing: 31.6 ms -> 25.8 ms

The benchmark harness separates encoder cost from end-to-end indexing cost so hot-path changes can be evaluated more precisely.
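
As a reading aid, the before/after timings listed above can be turned into speedup factors. The snippet below is only arithmetic over the numbers quoted in this README; the names timings, speedup and speedups are illustrative and are not part of the benchmark harness.

```clojure
;; Before/after timings copied from the list above.
;; Units differ per row (us or ms) but cancel out in the ratio.
(def timings
  {:map->document    [33.2 6.15]
   :compiled-one-doc [7.53 4.10]
   :compiled-batch   [7.04 5.18]
   :batch-indexing   [31.6 25.8]})

(defn speedup
  "Ratio of the old timing to the new one; values above 1.0 mean faster."
  [[before after]]
  (/ before after))

;; Map of measurement name to speedup factor.
(def speedups
  (into {} (map (fn [[k v]] [k (speedup v)])) timings))
```

Read this way, the largest gain is in map->document (roughly 5.4x), while end-to-end batch indexing improves by roughly 1.2x.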

Recent benchmark captures use the same directory strategy as lucene-clj’s :memory indexes. Older captures that used a different Lucene directory implementation are useful historically, but not directly comparable to the current ones.

The later clean break to the canonical :fields schema stayed in the same performance range:

  • compiled batch encode: 5.18 ms -> 4.94 ms
  • batch indexing: 25.8 ms -> 27.1 ms

That trade-off is intentional: schema clarity moved up substantially while indexing throughput stayed close to the optimized baseline.

Usage - A Complete Scenario

There’s sample data in the repository that we use in our examples. A hand-created sample with fictional and non-fictional characters is here and one from Kaggle on music albums is here. These are also used in the tests.

A complete scenario from index creation to search actions is described below.

Sample Datasets Used

  1. Albums - Kaggle - [local]
  2. Hand-created, real + fictional characters here

Lucene’s Document Model

When dealing with Lucene and data it processes, key terms to note are

Document
A unit of related text. It has possibly many fields, and is a unit of consumption and also of each search result. A Document is a collection of Fields.
Field
Every field is a container of indexable content. They can range across many types, from simple text to latitude and longitude.
Analyzer
Analyzes the input documents and preprocesses terms appropriately. The choice of analyzer controls decisions such as tokenizing, stemming, stop-word removal, or treating input as-is.

This is a pretty hand-wavy description, but useful enough for our purpose.

Some Background - Data Preparation

Lucene consumes documents, each of which is made up of fields having values. As is natural in Clojure, we represent all such things as maps.

{
 :title-field "This is a title"
 :abstract-field "This is an abstract of what is to follow"
 :author-field "Lekhak Sampaadak"
 :body-field "And here's the crux of the article with all the gory details"
 }

To prepare our content for ingestion and indexing, we do some straightforward CSV parsing and conversion of each row into a map. Each column has a name and is used as the key for the field name in the document-map. All the preparation code is in the msync.lucene.tests-common test namespace, which we’ll refer to as the common namespace where required for clarity. We use two CSV data-sets as our sources of documents to create two indexes, to demonstrate some distinct use-cases. All data files are in the test-resources subdirectory.

A Glimpse of the Data

We use two simple datasets, stored as CSV. Loading is straightforward CSV parsing and converting to maps – the first rows in each file are the header rows, holding names of respective columns.
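
The header-row convention can be sketched in plain Clojure. This is an illustration only: rows->maps is a hypothetical helper, not the actual read-csv-resource-file from the common namespace, and it assumes the CSV text has already been parsed into vectors of strings.

```clojure
(defn rows->maps
  "Turns parsed CSV rows (header row first) into a sequence of maps
  keyed by the keywordized column names."
  [[header & rows]]
  (let [ks (map keyword header)]
    (map #(zipmap ks %) rows)))

;; A few columns from the sample data, already split into cells.
(def parsed
  [["first-name" "last-name"   "age"]
   ["Suppandi"   "Varadarajan" "16"]
   ["Shikari"    "Shambhu"     "32"]])

(rows->maps parsed)
```

Every value stays a string at this point; any type conversion would be a separate step.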

  • Sample, hand-coded documents. Plain, simple data.
;; In the common namespace
(take 5 (read-csv-resource-file sample-data-file))
first-name | last-name | age | real | gender | bio
Suppandi | Varadarajan | 16 | false | m | A wonderful, innocent soul. You’ll enjoy his antics.
Shikari | Shambhu | 32 | False | m | Carries a gun. But no bullets. Animals love him.
Chacha | Chaudhary | 64 | FalSe | m | The supercomputer. And then some more!
Sabu | Jupiterwala | 2 | false | m | Yes, of legal age. Just a different age-scale because of the planet he comes from. Strong, powerful, but kind. Because, not an earthling. Children love him.
  • Albums data. From Kaggle.
    • The columns Genre and Subgenre, are comma-separated values themselves
      • They are to be pre-processed before feeding to lucene-clj
      • These are multi-valued fields.
;; In the common namespace
(take 5 (read-csv-resource-file albums-file))
Number | Year | Album | Artist | Genre | Subgenre
1 | 1967 | Sgt. Pepper’s Lonely Hearts Club Band | The Beatles | Rock | Rock & Roll, Psychedelic Rock
2 | 1966 | Pet Sounds | The Beach Boys | Rock | Pop Rock, Psychedelic Rock
3 | 1966 | Revolver | The Beatles | Rock | Psychedelic Rock, Pop Rock
4 | 1965 | Highway 61 Revisited | Bob Dylan | Rock | Folk Rock, Blues Rock
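
The pre-processing of the multi-valued Genre and Subgenre columns might look like the following sketch. The helper names split-multi and prepare-album are hypothetical; the real preparation code lives in the common namespace and may differ in detail.

```clojure
(require '[clojure.string :as str])

(defn split-multi
  "Splits a comma-separated cell into a sequence of trimmed values."
  [s]
  (map str/trim (str/split s #",")))

(defn prepare-album
  "Replaces the :Genre and :Subgenre strings with sequences of values."
  [album]
  (-> album
      (update :Genre split-multi)
      (update :Subgenre split-multi)))

(prepare-album {:Number "3" :Year "1966" :Album "Revolver"
                :Artist "The Beatles"
                :Genre "Rock" :Subgenre "Psychedelic Rock, Pop Rock"})
```

After this step each of the two fields holds a sequence, even when there is only one value.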

Creating Analyzers

Analyzers process each field’s content in a manner that is apt - according to what the programmer/domain-expert decides.

Some fields need to be tokenized and stemmed, while others are to be treated verbatim. Natural-language text falls in the first group; proper nouns such as a company name or a music genre fall in the second.

In the albums dataset, the Year, Genre and Subgenre fields’ texts are not to be tokenized and stemmed, or filtered for stop-words. Hence, they are configured to be analyzed with the keyword analyzer. Other fields can be treated like normal text. So, in this case, we use a composed analyzer that can treat each field in its special way.

Note that the analyzers used while creating an index should also be used when querying it for search and suggest, to avoid surprises.

Here’s how we create analyzers.

;; In the common namespace
;; This is the default analyzer, an instance of the StandardAnalyzer
;; of Lucene
(defonce default-analyzer (analyzers/standard-analyzer))

;; This analyzer considers field values verbatim
;; Will not tokenize and stem
(defonce keyword-analyzer (analyzers/keyword-analyzer))

;; A per-field analyzer, which composes other kinds of analyzers
;; For album data, we have marked some fields as verbatim
;; Takes a default analyzer, and then a map of field to field-specific analyzer
(defonce album-data-analyzer
  (analyzers/per-field-analyzer default-analyzer
                                {:Year     keyword-analyzer
                                 :Genre    keyword-analyzer
                                 :Subgenre keyword-analyzer}))
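
To make the difference between verbatim and tokenized fields concrete, here is a toy approximation in plain Clojure. These functions are stand-ins invented for illustration; they are not Lucene's analyzers and only mimic the visible effect on a field value.

```clojure
(require '[clojure.string :as str])

;; Toy stand-ins, NOT Lucene's analyzers.
(defn toy-keyword-tokens
  "Keeps the whole value as a single token, unchanged."
  [s]
  [s])

(defn toy-standard-tokens
  "Lower-cases the value and splits it on runs of non-alphanumerics."
  [s]
  (->> (str/split (str/lower-case s) #"[^\p{L}\p{N}]+")
       (remove str/blank?)
       vec))

(toy-keyword-tokens "Psychedelic Rock")   ; one token, case preserved
(toy-standard-tokens "Psychedelic Rock")  ; two lower-cased tokens
```

Under this simplified model, a verbatim field only matches the complete original value, while a tokenized field can match on any of its individual words.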

Some simple REPL-runs

With the background setup done and explained, let us move ahead to demonstrating indexing and searching. You may want to try the following in a REPL by requiring the namespace the prior code is in and then playing along. I’ve used the dev namespace below, the code for which can be found here.

Preamble

(ns dev
  (:require [msync.lucene :as lucene]
            [msync.lucene
             [document :as ld]
             [tests-common :as common]]))

Create an index

In memory

(defonce album-index (lucene/create-index! :type :memory
                                           :analyzer common/album-data-analyzer))

Or, on disk

(defonce album-index (lucene/create-index! :type :disk
                                           :path "/path/to/index/directory"
                                           :analyzer common/album-data-analyzer))

If you want to rebuild an existing on-disk index from scratch, pass :re-create? true.

(defonce album-index (lucene/create-index! :type :disk
                                           :path "/path/to/index/directory"
                                           :analyzer common/album-data-analyzer
                                           :re-create? true))

A sample of the album data for reference. The Genre and Subgenre columns are pre-processed, as mentioned above, and split further.

(drop 2 (take 5 common/album-data))
({:Number "3",
  :Year "1966",
  :Album "Revolver",
  :Artist "The Beatles",
  :Genre ("Rock"),
  :Subgenre ("Psychedelic Rock" "Pop Rock")}
 {:Number "4",
  :Year "1965",
  :Album "Highway 61 Revisited",
  :Artist "Bob Dylan",
  :Genre ("Rock"),
  :Subgenre ("Folk Rock" "Blues Rock")}
 {:Number "5",
  :Year "1965",
  :Album "Rubber Soul",
  :Artist "The Beatles",
  :Genre ("Rock" "Pop"),
  :Subgenre ("Pop Rock")})

Index documents

Documents are Clojure maps. Each key-value in the map represents one logical field. The second argument to index! is now a canonical :fields schema where each field is defined in one place.

The core field options are:

  • :type - currently :text, :keyword, :long, :boolean, :double, or :instant
  • :stored? - whether the field value is stored and can be returned later
  • :indexed? - whether the field participates in search
  • :multi-valued? - whether the field accepts multiple values
  • :suggest - optional completion settings
    • :weight adjusts ranking of suggestions
    • :contexts-from uses one field, several fields, or a function to derive suggestion contexts

Use :keyword for exact string matching, :long and :double for exact numeric matching, :boolean for true-or-false fields, and :instant for exact time matching.

Typed fields work naturally with map queries:

{:fields {:rating       {:type :double}
          :published-at {:type :instant}
          :active       {:type :boolean}}}

(lucene/search idx {:rating 4.5})
(lucene/search idx {:published-at (java.time.Instant/parse "1977-02-04T00:00:00Z")})
(lucene/search idx {:active true})

In the following schema:

  • :Year, :Genre, and :Subgenre are exact-match keyword fields
  • :Genre and :Subgenre are marked multi-valued
  • :Album and :Artist are suggest-enabled
  • :Album suggestions are given more weight than :Artist
  • suggestion contexts come from the :Genre field
(def album-fields
  {:Number   {:type :text
              :stored? true
              :indexed? true}
   :Year     {:type :keyword
              :stored? true
              :indexed? true}
   :Album    {:type    :text
              :stored? true
              :indexed? true
              :suggest {:weight 5
                        :contexts-from :Genre}}
   :Artist   {:type    :text
              :stored? true
              :indexed? true
              :suggest {:contexts-from :Genre}}
   :Genre    {:type          :keyword
              :stored?       true
              :indexed?      true
              :multi-valued? true}
   :Subgenre {:type          :keyword
              :stored?       true
              :indexed?      true
              :multi-valued? true}})
(lucene/index! album-index common/album-data
               {:fields common/album-fields})

Now, we can search

A simple search example, in which we pass a map specifying the field, and the value we are looking for. The result includes the :hit, a :score for that :hit, and the :doc-id which is an identifier that Lucene manages. Notice that the result - :hit - is a Lucene Document object.

(lucene/search album-index {:Year "1977"}
               {:results-per-page 2})
[{:doc-id 25,
  :score 1.4994705,
  :hit
  #object[org.apache.lucene.document.Document 0x24750f97 "Document<stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Number:26> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Year:1977> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Album:Rumours> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Artist:Fleetwood Mac> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Genre:Rock> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Subgenre:Pop Rock>>"]}
 {:doc-id 40,
  :score 1.4994705,
  :hit
  #object[org.apache.lucene.document.Document 0x6d6a6fe4 "Document<stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Number:41> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Year:1977> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Album:Never Mind the Bollocks Here's the Sex Pistols> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Artist:Sex Pistols> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Genre:Rock> stored,indexed,tokenized,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS<Subgenre:Punk>>"]}]

For convenience, lucene-clj has a function that can be used to convert the Lucene Document into a Clojure map. It supports both :multi-fields and :fields-to-keep to shape the result. When you use typed stored fields such as :boolean or :instant, pass :field-specs so document->map can decode them back to typed values. Beyond basic use-cases, supply your own conversion function.

(lucene/search album-index {:Year "1977"}
               {:results-per-page 2
                :hit->doc ld/document->map})
[{:doc-id 25,
  :score 1.4994705,
  :hit
  {:Number "26",
   :Year "1977",
   :Album "Rumours",
   :Artist "Fleetwood Mac",
   :Genre "Rock",
   :Subgenre "Pop Rock"}}
 {:doc-id 40,
  :score 1.4994705,
  :hit
  {:Number "41",
   :Year "1977",
   :Album "Never Mind the Bollocks Here's the Sex Pistols",
   :Artist "Sex Pistols",
   :Genre "Rock",
   :Subgenre "Punk"}}]

Notice, though, that the :Genre and :Subgenre fields did not come back as collections. The document->map function isn’t smart enough to identify that, and needs a hint to make that happen. With the modified hit->doc argument, the two fields come back as vectors with possibly multiple values.

(lucene/search album-index
               {:Year "1977"}
               {:results-per-page 2
                :hit->doc #(ld/document->map % :multi-fields [:Genre :Subgenre])})
[{:doc-id 25,
  :score 1.4994705,
  :hit
  {:Number "26",
   :Year "1977",
   :Album "Rumours",
   :Artist "Fleetwood Mac",
   :Genre ["Rock"],
   :Subgenre ["Pop Rock"]}}
 {:doc-id 40,
  :score 1.4994705,
  :hit
  {:Number "41",
   :Year "1977",
   :Album "Never Mind the Bollocks Here's the Sex Pistols",
   :Artist "Sex Pistols",
   :Genre ["Rock"],
   :Subgenre ["Punk"]}}]
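
Rather than typing the :multi-fields hint by hand, it could be derived from the field schema. The sketch below is plain Clojure over a trimmed-down copy of the schema; multi-valued-fields is a hypothetical helper and not part of lucene-clj.

```clojure
;; A trimmed-down copy of the album schema shown earlier.
(def album-fields
  {:Album    {:type :text    :stored? true :indexed? true}
   :Genre    {:type :keyword :stored? true :indexed? true :multi-valued? true}
   :Subgenre {:type :keyword :stored? true :indexed? true :multi-valued? true}})

(defn multi-valued-fields
  "Returns the keys of all fields whose spec sets :multi-valued? to true."
  [fields]
  (vec (for [[k spec] fields
             :when (:multi-valued? spec)]
         k)))

(multi-valued-fields album-fields)
```

The resulting vector could then be passed as the :multi-fields argument in place of the literal [:Genre :Subgenre], so the schema stays the single source of truth.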

Paginated query results are supported via the :page option. Also, the following example projects a subset of the document fields by passing a modified function as the :hit->doc argument.

(lucene/search album-index
               {:Year "1968"} ;; Map of field-values to search with
               {:results-per-page 5 ;; Control the number of results returned
                :page 2             ;; Page number, starting 0 as default
                :hit->doc         #(-> %
                                       ld/document->map
                                       (select-keys [:Year :Album]))})
[{:doc-id 160,
  :score 1.4311604,
  :hit {:Year "1968", :Album "The Dock of the Bay"}}
 {:doc-id 170,
  :score 1.4311604,
  :hit {:Year "1968", :Album "The Notorious Byrd Brothers"}}
 {:doc-id 204,
  :score 1.4311604,
  :hit {:Year "1968", :Album "Wheels of Fire"}}
 {:doc-id 233,
  :score 1.4311604,
  :hit {:Year "1968", :Album "Bookends"}}
 {:doc-id 257,
  :score 1.4311604,
  :hit
  {:Year "1968",
   :Album "The Kinks Are The Village Green Preservation Society"}}]
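
The arithmetic behind :page and :results-per-page can be sketched in plain Clojure. This assumes zero-based page numbers, as the comment in the example states; page-window and take-page are hypothetical helpers that model the behavior over an already ranked collection, not the library's internals.

```clojure
(defn page-window
  "Returns the half-open range [start end) of result ranks for a page,
  assuming zero-based page numbers."
  [page results-per-page]
  (let [start (* page results-per-page)]
    [start (+ start results-per-page)]))

(defn take-page
  "Selects one page from an already ranked collection."
  [ranked page results-per-page]
  (->> ranked
       (drop (* page results-per-page))
       (take results-per-page)))

(page-window 2 5)            ; => [10 15]
(take-page (range 100) 2 5)  ; => (10 11 12 13 14)
```

Under that assumption, page 2 with five results per page covers the eleventh through fifteenth hits in rank order.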

The same projection can be expressed with ld/document->map directly by using :fields-to-keep.

(lucene/search album-index
               {:Year "1968"}
               {:results-per-page 5
                :page 2
                :hit->doc #(ld/document->map % :fields-to-keep #{:Year :Album})})

For one-off calls, the :page option is still convenient. For stable pagination across repeated calls, prefer a reusable search session plus :search-after. The continuation token can simply be the last result map from the previous page, since it already carries :doc-id and :score.

(with-open [session (lucene/open-session album-index)]
  (let [page-0 (lucene/search session
                              {:Year "1968"}
                              {:results-per-page 5
                               :hit->doc #(ld/document->map % :fields-to-keep #{:Year :Album})})
        page-1 (lucene/search session
                              {:Year "1968"}
                              {:results-per-page 5
                               :search-after (last page-0)
                               :hit->doc #(ld/document->map % :fields-to-keep #{:Year :Album})})]
    [page-0 page-1]))

The session pins a single Lucene reader snapshot. That means repeated search and suggest calls within the same with-open block see a stable view of the index. If the index changes and you open a new session, result ordering and pagination boundaries may change accordingly.
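
The continuation pattern generalizes to walking through every page. The sketch below is library-independent: all-pages is a hypothetical helper that takes any function from a continuation token to a page, and fake-fetch-page is a stand-in used here instead of a real search so that the example runs on its own.

```clojure
(defn all-pages
  "Lazily collects pages by calling (fetch-page token) until a page is empty.
  token is nil for the first page, then the last result of the previous page."
  [fetch-page]
  (->> (iterate (fn [page] (fetch-page (last page)))
                (fetch-page nil))
       (take-while seq)))

;; Stand-in for a real search: 12 fake hits, 5 per page.
(def fake-hits (mapv (fn [i] {:doc-id i :score 1.0}) (range 12)))

(defn fake-fetch-page
  "Returns up to 5 fake hits that come after the given token."
  [token]
  (let [start (if token (inc (:doc-id token)) 0)]
    (->> fake-hits (drop start) (take 5) vec)))

(map count (all-pages fake-fetch-page))  ; => (5 5 2)
```

With lucene-clj, fetch-page would presumably wrap lucene/search on an open session and add :search-after to the options only when a token is present, as in the two-page example above.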

Search variations

Simple search

Searching in a single field, for a single value

(lucene/search album-index {:Year "1967"} {:results-per-page 2 :hit->doc ld/document->map})

Fielded string search

When the query form is a plain string, pass :field-name.

  • A single word becomes a term query.
  • A string containing spaces becomes a phrase query.
(lucene/search album-index "the sun"
               {:field-name :Album
                :hit->doc ld/document->map})

Classic Lucene query syntax

For Lucene’s query parser syntax, parse the DSL explicitly and pass the resulting Query object to lucene/search.

(lucene/search album-index
               (msync.lucene.query/parse-dsl "Album:\"the sun\" AND Year:1976"
                                             common/album-data-analyzer)
               {:hit->doc ld/document->map})

OR Search

Searching in a single field, where any of the values in the set are allowed

(lucene/search album-index {:Year #{"1960" "1965"}}
               {:results-per-page 5
                :hit->doc #(-> % ld/document->map (select-keys [:Year :Album]))})
[{:doc-id 118,
  :score 2.2562923,
  :hit {:Year "1960", :Album "At Last!"}}
 {:doc-id 347,
  :score 2.2562923,
  :hit {:Year "1960", :Album "Muddy Waters at Newport 1960"}}
 {:doc-id 357,
  :score 2.2562923,
  :hit {:Year "1960", :Album "Sketches of Spain"}}
 {:doc-id 3,
  :score 1.6102078,
  :hit {:Year "1965", :Album "Highway 61 Revisited"}}
 {:doc-id 4,
  :score 1.6102078,
  :hit {:Year "1965", :Album "Rubber Soul"}}]

AND Search

When looking for multiple terms in a single field, pass a vector.

(lucene/search album-index {:Album ["complete" "unbelievable"]} {:hit->doc ld/document->map})
[{:doc-id 253,
  :score 3.0571077,
  :hit
  {:Number "254",
   :Year "1966",
   :Album
   "Complete & Unbelievable: The Otis Redding Dictionary of Soul",
   :Artist "Otis Redding",
   :Genre "Funk / Soul",
   :Subgenre "Soul"}}]

Be sure that your queries are semantically right for the data-set. For example, AND-ing over two different years leads to an empty result-set, since each album has exactly one year.

(lucene/search album-index {:Year ["1964" "1965"]})
[]
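
The difference between the set (OR) and vector (AND) forms can be modelled over plain maps. This is a toy model on exact field values, written only to show why AND over a single-valued field comes back empty; it is not how Lucene evaluates boolean queries, and match-any? and match-all? are hypothetical names.

```clojure
;; A few rows taken from the result listings above.
(def toy-docs
  [{:Album "At Last!"             :Year "1960"}
   {:Album "Sketches of Spain"    :Year "1960"}
   {:Album "Highway 61 Revisited" :Year "1965"}
   {:Album "Rubber Soul"          :Year "1965"}])

(defn match-any?
  "True when the field value equals at least one of the given values (OR)."
  [doc field values]
  (contains? (set values) (get doc field)))

(defn match-all?
  "True when the field value equals every one of the given values (AND)."
  [doc field values]
  (every? #(= % (get doc field)) values))

(count (filter #(match-any? % :Year #{"1960" "1965"}) toy-docs)) ; => 4
(count (filter #(match-all? % :Year ["1960" "1965"]) toy-docs))  ; => 0
```

A single value cannot equal two different years at once, so the AND form only makes sense for fields that can hold several terms, such as tokenized text or multi-valued fields.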

Phrase search

A query string containing spaces is treated as a phrase search.

(lucene/search album-index {:Album "the sun"} {:hit->doc ld/document->map})
[{:doc-id 10,
  :score 2.8861985,
  :hit
  {:Number "11",
   :Year "1976",
   :Album "The Sun Sessions",
   :Artist "Elvis Presley",
   :Genre "Rock",
   :Subgenre "Rock & Roll"}}
 {:doc-id 287,
  :score 2.544825,
  :hit
  {:Number "288",
   :Year "1968",
   :Album "Anthem of the Sun",
   :Artist "Grateful Dead",
   :Genre "Rock",
   :Subgenre "Psychedelic Rock"}}
 {:doc-id 310,
  :score 2.544825,
  :hit
  {:Number "311",
   :Year "1994",
   :Album "The Sun Records Collection",
   :Artist "Various",
   :Genre "& Country",
   :Subgenre "Rockabilly"}}]

Searching across fields

Conditions on different fields in the same map are combined with AND.

(lucene/search album-index {:Album "the sun" :Year "1976"} {:hit->doc ld/document->map})
[{:doc-id 10,
  :score 4.56387,
  :hit
  {:Number "11",
   :Year "1976",
   :Album "The Sun Sessions",
   :Artist "Elvis Presley",
   :Genre "Rock",
   :Subgenre "Rock & Roll"}}]

Suggestions

Notice that in the suggest function call, the field and suggestion-prefix are passed as separate arguments rather than as a map. Unlike search, suggest calls are only supported over a single field.

Suggestions for suggest-enabled fields

From above, the fields Album and Artist have a :suggest entry in their field specs. Suggestion weight and contexts are part of the field definition instead of being spread across separate indexing options.

(lucene/suggest album-index :Album "par"
                {:hit->doc #(ld/document->map % :multi-fields [:Genre :Subgenre])
                 :contexts ["Electronic"]})
[{:hit
  {:Number "140",
   :Year "1978",
   :Album "Parallel Lines",
   :Artist "Blondie",
   :Genre ["Electronic" "Rock"],
   :Subgenre ["New Wave" "Pop Rock" "Punk" "Disco"]},
  :score 1.0,
  :doc-id 139}]

Use :max-results to cap the number of suggestions returned, and :skip-duplicates? true when duplicate suggestions are not useful for the caller.

(lucene/suggest album-index :Album "s"
                {:max-results 2
                 :skip-duplicates? true
                 :hit->doc #(ld/document->map % :fields-to-keep #{:Album :Artist})})

We can ask for fuzzy matching when querying for suggestions.

(lucene/suggest album-index :Album "per"
                {:hit->doc #(ld/document->map % :multi-fields [:Genre :Subgenre])
                 :fuzzy? true
                 :contexts ["Electronic"]})
[{:hit
  {:Number "140",
   :Year "1978",
   :Album "Parallel Lines",
   :Artist "Blondie",
   :Genre ["Electronic" "Rock"],
   :Subgenre ["New Wave" "Pop Rock" "Punk" "Disco"]},
  :score 2.0,
  :doc-id 139}
 {:hit
  {:Number "76",
   :Year "1984",
   :Album "Purple Rain",
   :Artist "Prince and the Revolution",
   :Genre ["Electronic" "Rock" "Funk / Soul" "Stage & Screen"],
   :Subgenre ["Pop Rock" "Funk" "Soundtrack" "Synth-pop"]},
  :score 2.0,
  :doc-id 75}]

Or, do a fuzzy search

Notice how forever matches fever too.

(lucene/search album-index {:Album "forever"}
               {:hit->doc #(ld/document->map % :multi-fields [:Genre :Subgenre])
                :fuzzy? true})
[{:doc-id 39,
  :score 3.0850303,
  :hit
  {:Number "40",
   :Year "1967",
   :Album "Forever Changes",
   :Artist "Love",
   :Genre ["Rock"],
   :Subgenre ["Folk Rock" "Psychedelic Rock"]}}
 {:doc-id 131,
  :score 0.9592955,
  :hit
  {:Number "132",
   :Year "1977",
   :Album "Saturday Night Fever: The Original Movie Sound Track",
   :Artist "Various Artists",
   :Genre ["Electronic" "Stage & Screen"],
   :Subgenre ["Soundtrack" "Disco"]}}]
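
Fuzzy matching of this kind rests on edit distance between terms. The sketch below is a textbook Levenshtein distance in plain Clojure, added for intuition only. It is not Lucene's automaton-based implementation, and it makes no claim about the edit limits lucene-clj applies.

```clojure
(defn levenshtein
  "Minimum number of single-character insertions, deletions and
  substitutions needed to turn string a into string b."
  [a b]
  (let [a (vec a)
        b (vec b)]
    (peek
     (reduce
      (fn [prev [i ca]]
        ;; Build the next row of the distance table from the previous one.
        (reduce
         (fn [row [j cb]]
           (conj row (min (inc (peek row))          ; insertion
                          (inc (nth prev (inc j)))  ; deletion
                          (+ (nth prev j)           ; substitution or match
                             (if (= ca cb) 0 1)))))
         [(inc i)]
         (map-indexed vector b)))
      (vec (range (inc (count b))))
      (map-indexed vector a)))))

(levenshtein "forever" "fever")  ; => 2
(levenshtein "per" "par")        ; => 1
```

The term forever is two deletions away from fever, and the prefix per is one substitution away from both par and pur, which is consistent with the fuzzy results shown above.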

Query shape cheat sheet

lucene-clj leans on Clojure data shapes so that one search function can cover the common query kinds.

{:field "value"}
fielded search
{:field ["a" "b"]}
AND within one field
{:field #{"a" "b"}}
OR within one field
"single term" with :field-name
single-field term search
"multiple words" with :field-name
phrase search
Query from msync.lucene.query/parse-dsl
explicit Lucene query syntax

Examples:

(lucene/search album-index {:Album "rumours"})
(lucene/search album-index {:Album ["complete" "unbelievable"]})
(lucene/search album-index {:Year #{"1960" "1965"}})
(lucene/search album-index "the sun" {:field-name :Album})
(lucene/search album-index
               (msync.lucene.query/parse-dsl "Album:\"the sun\" AND Year:1976"
                                             common/album-data-analyzer))

This API is intentionally shape-driven: the kind of Clojure value you pass determines the query behavior. That keeps the public surface compact, but it also means that changing a query from a set to a vector, or from a single word to a spaced string, changes the semantics.
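
The shape-driven idea can be illustrated with a small classifier over field values. This is a sketch of the rules in the cheat sheet, not lucene-clj's actual dispatch code; query-kind and the keywords it returns are made up for the illustration.

```clojure
(defn query-kind
  "Names the query behavior the cheat sheet associates with a field value."
  [v]
  (cond
    (set? v)        :or
    (sequential? v) :and
    (string? v)     (if (re-find #"\s" v) :phrase :term)
    :else           :typed-value))

(query-kind #{"1960" "1965"})            ; => :or
(query-kind ["complete" "unbelievable"]) ; => :and
(query-kind "the sun")                   ; => :phrase
(query-kind "rumours")                   ; => :term
```

A check like this can be handy in tests or logging when you want to confirm which behavior a given query value will select before it reaches the index.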

Additional notes

  • Some minimal technical overview of Lucene internals for this project can be found here.

License

Copyright © 2018-2020 Ravindra R. Jaju

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
