Guidelines for search index creation to support auto complete / suggestions #2217
Pull Request Overview
The PR introduces comprehensive documentation for implementing search autocomplete functionality using edge n-gram tokenization in the Bleve search engine. It provides a detailed analysis of different tokenization methods and demonstrates why edge n-grams are optimal for autocomplete features.
Key changes:
- Adds detailed documentation on edge n-gram autocomplete implementation
- Compares various tokenization methods (single token, whitespace, regex, n-gram, edge n-gram)
- Provides practical code examples and configuration samples
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/search_autocomplete.md | New comprehensive guide covering edge n-gram autocomplete theory, implementation examples, and best practices |
| docs/create_and_search_your_first_index.md | New tutorial on basic Bleve index creation and search operations with incomplete ending section |
| 3. **Better caching**: Exact term queries cache better than prefix queries | ||
| 4. **Consistent performance**: Query time doesn't increase with index size | ||
| ## 5. On low level implementaion sample: |
Typo in 'implementaion' should be 'implementation'.
Suggested change: `## 5. On low level implementaion sample:` → `## 5. On low level implementation sample:`
| for _, token := range input { | ||
| runeCount := utf8.RuneCount(token.Term) | ||
| runes := bytes.Runes(token.Term) | ||
| // ..builds tokens based form either end, specified in the input |
Typo in 'form' should be 'from'.
Suggested change: `// ..builds tokens based form either end, specified in the input` → `// ..builds tokens based from either end, specified in the input`
| n gram token filter | ||
The ending section appears to be incomplete notes rather than proper documentation. This section should either be completed with proper documentation or removed.
| ``` | ||
| ### search autocomplete | ||
| return the response for |
@maneuvertomars you can add a link here to your search_autocomplete.md file, so it's convenient for readers.
@maneuvertomars It's our practice to be as descriptive as possible in the commit message section about what the commit covers. This'll benefit us in the future when we look at the commit log.
| "autocomplete": { | ||
| "analyzer": "edge_ngram_analyzer", | ||
| "min_gram": 3, | ||
| "max_gram": 20 |
| "autocomplete": { | ||
| "analyzer": "edge_ngram_analyzer", | ||
| "min_gram": 2, | ||
| "max_gram": 12 |
The syntax is off; I think the config keys are just `min` and `max`.
| ```json | ||
| { | ||
| "min_gram": 2, | ||
| "max_gram": 15 |
The syntax is off; I think the config keys are just `min` and `max`.
| ```json | ||
| { | ||
| "min_gram": 1, | ||
| "max_gram": 20 |
…plete including stepwise flow
| @@ -0,0 +1,221 @@ | |||
| # Create and Search Index | |||
| Demonstration of creating an index on Documents and making it searchable. | |||
A simple how-to example using Bleve in Go to create an index, add documents, and run search queries with results.
| import ( | ||
| "fmt" | ||
| "log" | ||
| "os" |
"os" package is not used, must be removed
| // Search the created index | ||
| query := bleve.NewQueryStringQuery("bleve") | ||
| searchRequest := bleve.NewSearchRequest(query) | ||
| searchRequest.Size = 10 | ||
| searchResult, err := index.Search(searchRequest) | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| for i, hit := range searchResult.Hits { | ||
| fmt.Printf("%d. Document: %s (Score: %.2f)\n", i+1, hit.ID, hit.Score) | ||
| if len(hit.Fragments) > 0 { | ||
| for field, fragments := range hit.Fragments { | ||
| fmt.Printf(" %s: %s\n", field, fragments[0]) | ||
| } | ||
| } | ||
| fmt.Println() | ||
| } | ||
| } |
simplify query
// Search the created index
query := bleve.NewMatchQuery("bleve")
searchRequest := bleve.NewSearchRequest(query)
searchRequest.Explain = true
searchRequest.Fields = []string{"title", "content"}
searchResult, err := index.Search(searchRequest)
if err != nil {
log.Fatal(err)
}
fmt.Println(searchResult)
| } | ||
| } | ||
| ``` | ||
| ### Output: |
Add a blank line between the code block and the Output heading, and make the heading `## Output:`.
| **Field configuration explained:** | ||
| - `"analyzer": "search_autocomplete_feature"` - Use our custom analyzer | ||
| - `"store": true` - Keep original text for display | ||
| - `"index": true` - Make it searchable | ||
| - `"include_in_all": true` - Include in default search field |
Not required, as the Bleve API will allow users to modify these options
| **User types "sc":** | ||
| 1. Query: `name:sc` | ||
| 2. Bleve looks up exact term "sc" in the index | ||
| 3. Finds document with "Schaumbergfest" |
and "Script" right?
Finds documents with "Schaumbergfest" and "Script"
| ID string `json:"id"` | ||
| Title string `json:"title"` | ||
| Content string `json:"content"` | ||
| Author string `json:"author"` |
| 3. Finds document with "Schaumbergfest" | ||
| 4. Returns suggestion instantly | ||
Replace Couchbase UI with code
type Document struct {
ID string `json:"id"`
Title string `json:"title"`
}
// 4. Index Documents
documents := []Document{
{
ID: "doc1",
Title: "Schaumbergfest",
},
{
ID: "doc2",
Title: "Script",
},
}
batch := index.NewBatch()
for _, doc := range documents {
if err := batch.Index(doc.ID, doc); err != nil {
log.Fatal(err)
}
}
if err := index.Batch(batch); err != nil {
log.Fatal(err)
}
// 5. Search the created index
query := bleve.NewMatchQuery("sc")
query.SetField("title")
searchRequest := bleve.NewSearchRequest(query)
searchRequest.Explain = true
searchRequest.Fields = []string{"title"}
searchResult, err := index.Search(searchRequest)
if err != nil {
log.Fatal(err)
}
fmt.Println(searchResult)
Output
$ go run main.go
2 matches, showing 1 through 2, took 311.125µs
1. doc2 (0.343255)
title
Script
2. doc1 (0.343255)
title
Schaumbergfest
All previous review comments have been addressed, and the changes were re-reviewed by @CascadingRadium.
Did a write-up for search autocomplete.
Comprehensive documentation for implementing search autocomplete functionality using edge n-gram tokenization in the Bleve search engine. It provides a detailed analysis of different tokenization methods and demonstrates why edge n-grams are optimal for autocomplete features.
Key changes:
docs/search_autocomplete.md -> Discussion of edge n-gram autocomplete theory, implementation examples, and JSON mappings
docs/create_and_search_your_first_index.md -> Basic Bleve index creation and search operations with incomplete ending section