A lightweight Go utility for identifying and extracting prominent keywords from given text inputs.
- Reads text from file or standard input.
- Implements a basic frequency-based keyword extraction strategy.
- Supports custom stop-word lists.
- Outputs a ranked list of keywords.
To install extract-text-keywords globally on your system, you can use go install:
go install github.com/yourusername/extract-text-keywords@latest
Alternatively, you can clone the repository and build it manually:
git clone https://github.com/yourusername/extract-text-keywords.git
cd extract-text-keywords
go build -o extract-keywords .
This will create an executable named extract-keywords in your current directory.
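The extract-text-keywords utility can process text from a file or directly from standard input. The examples here use the extract-keywords binary produced by the manual build; a binary installed with go install is named extract-text-keywords, after the last element of the module path, so substitute that name if you installed it that way.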
To extract keywords from a text file:
extract-keywords -file path/to/your/textfile.txt
To pipe text directly into the utility:
cat path/to/your/textfile.txt | extract-keywords
# Or type directly and press Ctrl+D when done:
# extract-keywords
# Your text goes here...
# ...
You can provide a custom list of stop words (one word per line) using the -stopwords flag. If not provided, a default English stop-word list is used.
First, create a my_stopwords.txt file:
apple
orange
fruit
Then run the utility:
extract-keywords -file path/to/your/textfile.txt -stopwords my_stopwords.txt
By default, the utility might output a significant number of keywords. You can limit the output to the top N keywords using the -limit flag:
extract-keywords -file path/to/your/textfile.txt -limit 10
Currently, extract-text-keywords employs a basic frequency-based approach:
- Text Cleaning: Removes punctuation and converts text to lowercase.
- Tokenization: Splits the text into individual words.
- Stop Word Filtering: Removes common words (stop words) that usually carry little semantic value.
- Frequency Counting: Counts the occurrences of each remaining word.
- Ranking: Ranks words by their frequency in descending order.
Future versions may include more advanced strategies like TF-IDF or Part-of-Speech tagging.
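The five steps above can be sketched in a few lines of Go. This is a minimal illustration, not the utility's actual source: the Keyword type, the ExtractKeywords function, and the alphabetical tie-break for equal counts are all assumptions made for the example. It also treats every non-letter, non-digit character as a separator, so a word like "don't" would split into "don" and "t".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Keyword pairs a word with the number of times it occurred.
// (Hypothetical type, introduced only for this sketch.)
type Keyword struct {
	Word  string
	Count int
}

// ExtractKeywords is a hypothetical helper that mirrors the documented steps.
func ExtractKeywords(text string, stopWords map[string]bool) []Keyword {
	// Text cleaning and tokenization: lowercase the input, then split on
	// any rune that is neither a letter nor a digit (this drops punctuation).
	tokens := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})

	// Stop word filtering and frequency counting.
	counts := make(map[string]int)
	for _, tok := range tokens {
		if stopWords[tok] {
			continue
		}
		counts[tok]++
	}

	// Ranking: descending by count; ties broken alphabetically (an assumption)
	// so that the output order is deterministic.
	keywords := make([]Keyword, 0, len(counts))
	for word, count := range counts {
		keywords = append(keywords, Keyword{Word: word, Count: count})
	}
	sort.Slice(keywords, func(i, j int) bool {
		if keywords[i].Count != keywords[j].Count {
			return keywords[i].Count > keywords[j].Count
		}
		return keywords[i].Word < keywords[j].Word
	})
	return keywords
}

func main() {
	stop := map[string]bool{"the": true, "a": true, "and": true}
	for _, kw := range ExtractKeywords("The cat and the hat. A cat sat!", stop) {
		fmt.Printf("%s: %d\n", kw.Word, kw.Count)
	}
}
```

Representing the stop-word list as a map keeps the filtering step a constant-time lookup per token, which matters little for short inputs but scales well to large files.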
The utility outputs a ranked list of keywords, each on a new line, along with its frequency count (if applicable to the chosen strategy). For example:
keyword1: 5
keyword2: 3
keyword3: 2
Contributions are welcome! Please feel free to open issues or submit pull requests.
This project is licensed under the MIT License. The LICENSE file is not yet present in the repository and will be added later.