m0nirul/extract-text-keywords

extract-text-keywords

A lightweight Go utility for identifying and extracting prominent keywords from given text inputs.

Features

  • Reads text from file or standard input.
  • Implements a basic frequency-based keyword extraction strategy.
  • Supports custom stop-word lists.
  • Outputs a ranked list of keywords.

Installation

To install extract-text-keywords into your Go bin directory ($GOBIN, or $GOPATH/bin by default), use go install, replacing yourusername with the repository owner's account name:

go install github.com/yourusername/extract-text-keywords@latest

Alternatively, you can clone the repository and build it manually:

git clone https://github.com/yourusername/extract-text-keywords.git
cd extract-text-keywords
go build -o extract-keywords .

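This creates an executable named extract-keywords in your current directory. Run it as ./extract-keywords, or move it to a directory on your PATH to invoke it as shown in the examples below. Note that go install names the binary after the last element of the package path, so that method installs it as extract-text-keywords instead.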

Usage

The extract-text-keywords utility can process text from a file or directly from standard input.

Basic Usage (from file)

To extract keywords from a text file:

extract-keywords -file path/to/your/textfile.txt

From Standard Input

To pipe text directly into the utility:

cat path/to/your/textfile.txt | extract-keywords
# Or type directly and press Ctrl+D when done:
# extract-keywords
# Your text goes here...
# ...
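
The choice between a file and standard input could be handled along these lines. This is a minimal sketch, not the repository's code: the readInput helper is hypothetical, and only the -file flag name comes from the usage above.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// readInput returns the contents of the file at path, or everything read
// from fallback when path is empty. Hypothetical helper for illustration.
func readInput(path string, fallback io.Reader) (string, error) {
	if path == "" {
		data, err := io.ReadAll(fallback)
		return string(data), err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	// Assumption: an empty -file value means "read standard input".
	file := flag.String("file", "", "path to the input text file")
	flag.Parse()

	text, err := readInput(*file, os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes\n", len(text))
}
```

Passing the fallback reader as a parameter keeps the helper independent of os.Stdin, which makes it easier to exercise in tests.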

With Custom Stop Words

You can provide a custom list of stop words (one word per line) using the -stopwords flag. If not provided, a default English stop-word list is used.

First, create a my_stopwords.txt file:

apple
orange
fruit

Then run the utility:

extract-keywords -file path/to/your/textfile.txt -stopwords my_stopwords.txt
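
A stop-word file in this one-word-per-line format might be parsed into a set roughly as follows. The parseStopWords helper is hypothetical, and lowercasing entries and skipping blank lines are assumptions made so the set lines up with the lowercased text described under Keyword Extraction Strategy.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStopWords turns file content with one stop word per line into a set.
// Assumptions: blank lines are ignored and entries are lowercased.
func parseStopWords(content string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, line := range strings.Split(content, "\n") {
		// TrimSpace also removes a trailing "\r" from Windows line endings.
		word := strings.ToLower(strings.TrimSpace(line))
		if word != "" {
			set[word] = struct{}{}
		}
	}
	return set
}

func main() {
	// In a real tool the content would come from the -stopwords file.
	set := parseStopWords("apple\norange\nfruit\n")
	_, isStop := set["orange"]
	fmt.Println(len(set), isStop) // prints "3 true"
}
```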

Limiting Keywords

Without a limit, the output can be long. Use the -limit flag to keep only the top N keywords:

extract-keywords -file path/to/your/textfile.txt -limit 10
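
Applying a limit to an already ranked list amounts to a bounds-checked slice. The topN helper below is hypothetical, and treating a limit of zero or less as "no limit" is an assumption rather than documented behavior.

```go
package main

import "fmt"

// topN returns at most n leading entries of ranked.
// Assumption: n <= 0 means "no limit".
func topN(ranked []string, n int) []string {
	if n <= 0 || n >= len(ranked) {
		return ranked
	}
	return ranked[:n]
}

func main() {
	ranked := []string{"go", "text", "keyword"}
	fmt.Println(topN(ranked, 2)) // prints "[go text]"
}
```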

Keyword Extraction Strategy

Currently, extract-text-keywords employs a basic frequency-based approach:

  1. Text Cleaning: Removes punctuation and converts text to lowercase.
  2. Tokenization: Splits the text into individual words.
  3. Stop Word Filtering: Removes common words (stop words) that usually carry little semantic value.
  4. Frequency Counting: Counts the occurrences of each remaining word.
  5. Ranking: Ranks words by their frequency in descending order.
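
The five steps above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the repository's implementation: the Keyword type and extractKeywords function are hypothetical names, punctuation removal and tokenization are combined by splitting on any character that is not a letter or digit, and ties in frequency are broken alphabetically so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Keyword pairs a word with its occurrence count.
type Keyword struct {
	Word  string
	Count int
}

// extractKeywords applies the frequency-based pipeline to text.
func extractKeywords(text string, stop map[string]bool) []Keyword {
	// Steps 1 and 2: lowercase, then split on every rune that is not a
	// letter or digit. This drops punctuation and tokenizes in one pass.
	tokens := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})

	// Steps 3 and 4: skip stop words and count the remaining tokens.
	counts := make(map[string]int)
	for _, t := range tokens {
		if !stop[t] {
			counts[t]++
		}
	}

	// Step 5: rank by descending count, then alphabetically (assumption).
	ranked := make([]Keyword, 0, len(counts))
	for w, c := range counts {
		ranked = append(ranked, Keyword{Word: w, Count: c})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Count != ranked[j].Count {
			return ranked[i].Count > ranked[j].Count
		}
		return ranked[i].Word < ranked[j].Word
	})
	return ranked
}

func main() {
	stop := map[string]bool{"the": true, "and": true, "a": true}
	for _, kw := range extractKeywords("The cat and the hat. A cat sat!", stop) {
		fmt.Printf("%s: %d\n", kw.Word, kw.Count)
	}
	// Prints:
	// cat: 2
	// hat: 1
	// sat: 1
}
```

One consequence of splitting on non-alphanumeric characters is that contractions such as "don't" break into "don" and "t"; a real tool may want to treat apostrophes differently.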

Future versions may include more advanced strategies like TF-IDF or Part-of-Speech tagging.

Output

The utility prints a ranked list of keywords, one per line, each followed by its frequency count. For example:

keyword1: 5
keyword2: 3
keyword3: 2

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License; see the LICENSE file for details. Note: the LICENSE file is not yet present in the repository and will be added later.
