Yamete-Go comes from the Japanese word "yamete" (やめて), which means "stop".
In this context, Yamete-Go is a high-performance text-censorship library that uses a trie-based pattern-matching algorithm to detect and censor unwanted words in text.
Input:
- badword
- banana
Trie Visualization:
```
Root
└── b
    └── a
        ├── d
        │   └── w
        │       └── o
        │           └── r
        │               └── d (end)
        └── n
            └── a
                └── n
                    └── a (end)
```
If you insert words like banana and badword into the trie, the censorship system will replace them with asterisks, as shown below:
Input:
"This is a badword and banana!"
Output:
"this is a ******* and ******!"
(Note: The input is automatically converted to lowercase.)

Yamete-Go processes only alphabetic characters (a-z). If the input text contains numbers, those characters are ignored during processing.
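For a single word, the lowercase conversion and digit-skipping could look roughly like this (`normalize` is a hypothetical helper for illustration, not part of the yamete-go API):

```go
package main

import "fmt"

// normalize keeps only a-z/A-Z, lowercasing as it goes, and drops every
// other character -- mirroring the preprocessing described above
// (illustrative sketch only).
func normalize(s string) string {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c >= 'a' && c <= 'z':
			out = append(out, c)
		case c >= 'A' && c <= 'Z':
			out = append(out, c+('a'-'A')) // shift to lowercase
		}
	}
	return string(out)
}

func main() {
	fmt.Println(normalize("4ppl3s")) // ppls
}
```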
Example:
- Input:
  4ppl3s
- Trie Visualization:
  ```
  Root
  └── p
      └── p
          └── l
              └── s (end)
  ```
- Output:
  4ppl3s -> **true** (the numeric characters are skipped, so only the letters ppls are matched against the trie.)

yamete-go is a library designed to help analyze and censor text based on predefined toxic word lists. Below are the steps to use it effectively.
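The trie matching and asterisk replacement described above can be sketched in Go as follows. This is an illustrative implementation, not yamete-go's actual source; for simplicity it lowercases the input but does not skip numeric characters:

```go
package main

import (
	"fmt"
	"strings"
)

// node is a single trie node: children keyed by byte, end marks a full word.
type node struct {
	children map[byte]*node
	end      bool
}

func newNode() *node { return &node{children: map[byte]*node{}} }

// insert adds a lowercase word to the trie, creating nodes as needed.
func insert(root *node, word string) {
	cur := root
	for i := 0; i < len(word); i++ {
		next, ok := cur.children[word[i]]
		if !ok {
			next = newNode()
			cur.children[word[i]] = next
		}
		cur = next
	}
	cur.end = true
}

// censor lowercases text and overwrites every longest trie match with '*'.
func censor(root *node, text string) string {
	lower := strings.ToLower(text)
	out := []byte(lower)
	for i := 0; i < len(lower); i++ {
		cur, end := root, -1
		for j := i; j < len(lower); j++ {
			next, ok := cur.children[lower[j]]
			if !ok {
				break
			}
			cur = next
			if cur.end {
				end = j // remember the longest match so far
			}
		}
		if end >= 0 {
			for k := i; k <= end; k++ {
				out[k] = '*'
			}
			i = end // resume scanning after the censored span
		}
	}
	return string(out)
}

func main() {
	root := newNode()
	insert(root, "badword")
	insert(root, "banana")
	fmt.Println(censor(root, "This is a badword and banana!"))
	// this is a ******* and ******!
}
```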
To start using yamete-go, you need to create a configuration object (YameteConfig) that specifies the source of the toxic word list. You can load the word list either from a URL or a local file.
```go
yameteCfg := yamete.YameteConfig{
	URL:  "", // URL of the word list to load (e.g., a raw GitHub link)
	File: "", // local file path of the word list
}
```

Note:
- If you load the word list from a URL, ensure that the raw text is UTF-8 encoded.
- Example of a valid URL: https://raw.githubusercontent.com/fanchann/toxic-word-list/master/id_toxic_371.txt
Once the configuration is set, initialize the yamete-go instance by passing the configuration object to the NewYamete function.
```go
yameteInit, err := yamete.NewYamete(&yameteCfg)
if err != nil {
	panic(err) // handle errors appropriately in your application
}
```

After initializing yamete-go, you can analyze any text using the AnalyzeText method. This method returns detailed information about the analyzed text, including the original text, the censored text, the detected toxic words, and the count of censored words.
```go
response := yameteInit.AnalyzeText("dasar lu bot!")

// Print the response details
fmt.Printf("Original Text: %v\n", response.OriginalText)   // Output: dasar lu bot!
fmt.Printf("Censored Text: %v\n", response.CensoredText)   // Output: dasar lu ***!
fmt.Printf("Censored Words: %v\n", response.CensoredWords) // Output: [bot]
fmt.Printf("Censored Count: %v\n", response.CensoredCount) // Output: 1
```

Here's a breakdown of the fields returned by the AnalyzeText method:
- OriginalText: The original input text provided for analysis.
- CensoredText: The text after censoring, with toxic words replaced by `***`.
- CensoredWords: A list of the toxic words detected in the text.
- CensoredCount: The total number of toxic words detected.
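Taken together, the response value presumably has a shape along these lines. This is a reconstruction inferred from the documented fields; the authoritative definition lives in the yamete-go source:

```go
package main

import "fmt"

// YameteResponse is a hypothetical reconstruction of the AnalyzeText
// result type, based on the fields shown in the examples above.
type YameteResponse struct {
	OriginalText  string   // the input text as given
	CensoredText  string   // input with toxic words replaced by ***
	CensoredWords []string // toxic words detected in the text
	CensoredCount int      // total number of toxic words detected
}

func main() {
	r := YameteResponse{
		OriginalText:  "dasar lu bot!",
		CensoredText:  "dasar lu ***!",
		CensoredWords: []string{"bot"},
		CensoredCount: 1,
	}
	fmt.Printf("%+v\n", r)
}
```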
```go
package main

import (
	"fmt"

	yamete "github.com/fanchann/yamete-go"
)

func main() {
	yameteCfg := yamete.YameteConfig{
		URL: "https://raw.githubusercontent.com/fanchann/toxic-word-list/refs/heads/master/id_toxic_371.txt",
		// File: "files/id_words_toxic.txt",
	}

	yameteInit, err := yamete.NewYamete(&yameteCfg)
	if err != nil {
		panic(err)
	}

	response := yameteInit.AnalyzeText("dasar lu bot!")
	fmt.Printf("response.OriginalText: %v\n", response.OriginalText)   // dasar lu bot!
	fmt.Printf("response.CensoredText: %v\n", response.CensoredText)   // dasar lu ***!
	fmt.Printf("response.CensoredWords: %v\n", response.CensoredWords) // [bot]
	fmt.Printf("response.CensoredCount: %v\n", response.CensoredCount) // 1
}
```

To install the library:

```shell
go get github.com/fanchann/yamete-go
```