Skip to content

JRuby wrapper and base classes for Lucene indexing

License

Notifications You must be signed in to change notification settings

diminish7/moonstone

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

= Moonstone

Moonstone is a JRuby wrapper around Java's Lucene search engine toolkit.

== Goals

* Engines: An engine abstraction that brings all common index and search operations under a single roof.
* Lucene Transparency: A transparent JRuby wrapper around Lucene which allows clear and direct access to the underlying lucene classes.
* Rubify Lucene: Higher level classes that simplify and Rubify the use of Lucene.

== Engines

Using moonstone is simple.

    engine = Moonstone::Engine.new
    # note: you can't use symbols for the keys
    engine.index([{'title' => 'first post', 'content' => "blah blah blah"}])
		# this sets the field that is searched by default
		engine.default_query_parser('title')
    documents = engine.search('first')
    documents.each {|d| puts d['title']}
		# to search on a different field
		engine.search("content:blah")

A typical engine will override two things; the doc_from method and the analyzer.

	class SimplestEngine < Moonstone::Engine
	  def doc_from(record)
	    doc = Lucene::Document::Doc.new
	    doc.add_field('name', record[:name])
	    doc.add_field('desc', record[:description])
	    doc
	  end

	  def analyzer
	    @analyzer ||= Lucene::Analysis::WhitespaceAnalyzer.new
	  end
	end 

	# We can now instantiate the engine
	engine = SimplestEngine.new

	# and create an index of some data
	records = [
	    {:name => "Moonstone", :description => "A simple JRuby wrapper around Lucene"},
	    {:name => "Foo", :description => "A search engine based on Moonstone"}
	  ]
	# By convention the index method accepts an Enumerable that yields hashes 
	engine.index(records)

	#Search for data
	q = Lucene::Search::TermQuery.new("name", "Moonstone")
	documents = engine.search(q)
	documents.each {|d| puts d['name']}

== Lucene Transparency

One of Moonstones core goals is to provide easy and transparent access to all of Lucene's classes and APIs. This is accomplished through wrapper classes which, among other things, can be used to mimic typical Lucene usage in Java.

	store = Lucene::Store::RAMDirectory.new
	analyzer = Lucene::Analysis::WhitespaceAnalyzer.new
	writer = Lucene::Index::IndexWriter.new(store, analyzer)
	document = Lucene::Document::Document.new
	STORE = Lucene::Document::Field::Store::YES
	ANALYZE = Lucene::Document::Field::Index::ANALYZED
	field = Lucene::Document::Field.new("name", "Pizza Hut", STORE, ANALYZE)
	document.add(field)
	writer.add_document(document)
	writer.close

== Rubify Lucene

Moonstone gently augments these wrapper classes with some common Ruby idioms and niceties, making enumerable things Enumerable, using the IO#open pattern with objects that need closing, and supporting the inclusion of namespaces.  Where possible, Moonstone adds helper methods that improve the signal-to-noise ratio.

	include Lucene::Store
	include Lucene::Index
	include Lucene::Analysis
	include Lucene::Document
	include Lucene::Search

	store = RAMDirectory.new
	analyzer = WhitespaceAnalyzer.new
	IndexWriter.open(store, analyzer) do |writer|
	  doc1 = Doc.new
	  #create a document and add fields manually
	  doc1.add_field("name", "Pizza Hut")
	  doc1.add_field("category", "pseudo-Italian food", :term_vector => :with_positions)
	  doc1.add_field("zip", "78660", :index => false)
	  #or create a doc and add fields by pattern in the constructor
	  doc2 = Doc.create [ ["name", "Olive Garden"], 
	                      ["category", "pseudo-Italian food", { :term_vector => :with_positions }],
	                      ["zip", "78660", { :index => false }] ]
	  writer.add_documents([doc1, doc2])
	end

	IndexSearcher.open(store) do |searcher|
	  query = TermQuery.new("name", "Pizza")
	  top_docs = searcher.search(query, 10)
	  top_docs.each(searcher) { |doc| puts doc["name"] }
	end

Moonstone also supplies base classes that make it easy to create such things as TokenFilters in Ruby:

	class StemFilter < Moonstone::Filter
	  def process(text)
	    # ... Code to stem tokens would go here ...
	  end
	end

Moonstone also makes the common stuff easy. Many filters transform a single token into multiple tokens which requires the filter to maintain a queue of the tokens for subsequent calls to next(). Moonstone provides a base class which handles this for you. Simply inherit from Moonstone::QueuedFilter and return an array of strings (or an individual string if no expansion is needed) and the tokenizing and queueing is handled by the base class.

	class MakePluralFilter < Moonstone::QueuedFilter
	  def process(text)
	    # we can return an array of strings or an individual string
	    [text, text + 's']
	  end
	end 

About

JRuby wrapper and base classes for Lucene indexing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published