extract-text-keywords

A lightweight Go utility that identifies and extracts prominent keywords from text input.

Features

Reads text from a file or standard input.
Implements a basic frequency-based keyword extraction strategy.
Supports custom stop-word lists.
Outputs a ranked list of keywords.

Installation

To install extract-text-keywords with go install, which places the binary in $GOBIN (or $GOPATH/bin when GOBIN is unset):

go install github.com/yourusername/extract-text-keywords@latest

Alternatively, you can clone the repository and build it manually:

git clone https://github.com/yourusername/extract-text-keywords.git
cd extract-text-keywords
go build -o extract-keywords .

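This creates an executable named extract-keywords in the current directory, which is the name the examples below use. If you installed with go install instead, the binary is named extract-text-keywords, after the last element of the module path.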

Usage

The extract-text-keywords utility can process text from a file or directly from standard input.

Basic Usage (from file)

To extract keywords from a text file:

extract-keywords -file path/to/your/textfile.txt

From Standard Input

To pipe text directly into the utility:

cat path/to/your/textfile.txt | extract-keywords
# Or type directly and press Ctrl+D when done:
# extract-keywords
# Your text goes here...
# ...
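
For readers curious how the choice between a file and standard input could be wired up in Go, here is a minimal sketch. Only the -file flag comes from this README; the helper name readInput is hypothetical, and the project's actual main.go may be organized differently.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// readInput (hypothetical helper) returns the contents of the file at path,
// or everything readable from fallback when path is empty.
func readInput(path string, fallback io.Reader) (string, error) {
	if path == "" {
		data, err := io.ReadAll(fallback)
		return string(data), err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	// -file mirrors the flag shown above; leaving it empty means "use stdin".
	file := flag.String("file", "", "path to the input text file")
	flag.Parse()

	text, err := readInput(*file, os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes\n", len(text))
}
```

Passing the fallback reader as a parameter, rather than reading os.Stdin inside the helper, keeps the function easy to exercise in tests.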

With Custom Stop Words

You can provide a custom list of stop words (one word per line) using the -stopwords flag. If not provided, a default English stop-word list is used.

First, create a my_stopwords.txt file:

apple
orange
fruit

Then run the utility:

extract-keywords -file path/to/your/textfile.txt -stopwords my_stopwords.txt
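
As an illustration of the one-word-per-line format, the sketch below turns the contents of such a file into a lookup set. The function name parseStopWords is hypothetical, and lowercasing each entry and skipping blank lines are assumptions made here to match the lowercase text cleaning described later, not confirmed behavior of the tool.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStopWords (hypothetical helper) builds a set from text that holds
// one stop word per line. Entries are trimmed and lowercased, and blank
// lines are ignored; both choices are assumptions of this sketch.
func parseStopWords(contents string) map[string]bool {
	stop := make(map[string]bool)
	for _, line := range strings.Split(contents, "\n") {
		w := strings.ToLower(strings.TrimSpace(line))
		if w != "" {
			stop[w] = true
		}
	}
	return stop
}

func main() {
	// Same contents as the my_stopwords.txt example above.
	stop := parseStopWords("apple\norange\nfruit\n")
	fmt.Println(len(stop), stop["apple"], stop["banana"]) // prints: 3 true false
}
```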

Limiting Keywords

Without a limit, the output can contain a large number of keywords. Use the -limit flag to keep only the top N:

extract-keywords -file path/to/your/textfile.txt -limit 10
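
Conceptually, limiting amounts to truncating the ranked list. The sketch below shows one way to do that; the name topN is hypothetical, and treating a limit of zero or less as "no limit" is an assumption of this sketch rather than documented behavior.

```go
package main

import "fmt"

// topN (hypothetical helper) returns the first n entries of an already
// ranked slice. A non-positive n, or an n larger than the slice, returns
// the slice unchanged; that convention is an assumption of this sketch.
func topN(ranked []string, n int) []string {
	if n <= 0 || n >= len(ranked) {
		return ranked
	}
	return ranked[:n]
}

func main() {
	ranked := []string{"go", "text", "keyword"}
	fmt.Println(topN(ranked, 2)) // prints: [go text]
}
```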

Keyword Extraction Strategy

Currently, extract-text-keywords employs a basic frequency-based approach:

Text Cleaning: Removes punctuation and converts text to lowercase.
Tokenization: Splits the text into individual words.
Stop Word Filtering: Removes common words (stop words) that usually carry little semantic value.
Frequency Counting: Counts the occurrences of each remaining word.
Ranking: Ranks words by their frequency in descending order.
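
The steps above can be sketched in a few lines of Go. This is an illustrative sketch, not the project's actual implementation: the Keyword type and the extractKeywords function are hypothetical names, cleaning and tokenization are merged by splitting on any character that is not a letter or digit (so a word like "don't" becomes two tokens), and ties are broken alphabetically, which the README does not specify.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Keyword pairs a word with the number of times it occurred.
type Keyword struct {
	Word  string
	Count int
}

// extractKeywords (hypothetical helper) runs the frequency-based pipeline.
func extractKeywords(text string, stop map[string]bool) []Keyword {
	// Text cleaning and tokenization: lowercase, then split on anything
	// that is not a letter or digit, which also discards punctuation.
	tokens := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})

	// Stop word filtering and frequency counting.
	counts := make(map[string]int)
	for _, tok := range tokens {
		if !stop[tok] {
			counts[tok]++
		}
	}

	// Ranking: descending count, with an alphabetical tie-break (an
	// assumption) so the order is deterministic.
	ranked := make([]Keyword, 0, len(counts))
	for w, c := range counts {
		ranked = append(ranked, Keyword{Word: w, Count: c})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Count != ranked[j].Count {
			return ranked[i].Count > ranked[j].Count
		}
		return ranked[i].Word < ranked[j].Word
	})
	return ranked
}

func main() {
	stop := map[string]bool{"the": true, "and": true, "a": true}
	text := "The cat and the dog. A cat, a dog, and a CAT!"
	for _, kw := range extractKeywords(text, stop) {
		fmt.Printf("%s: %d\n", kw.Word, kw.Count)
	}
	// Prints:
	// cat: 3
	// dog: 2
}
```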

Future versions may include more advanced strategies like TF-IDF or Part-of-Speech tagging.

Output

The utility prints a ranked list with one keyword per line. With the current frequency-based strategy, each keyword is followed by its frequency count. For example:

keyword1: 5
keyword2: 3
keyword3: 2
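
If you consume this output from another Go program, each line can be split back into a word and a count. The sketch below assumes the "word: count" shape shown in the example above; the function name parseLine is hypothetical and is not part of this project.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine (hypothetical helper) splits a "word: count" line into its
// parts. It splits on the last ": " so that only the trailing number is
// treated as the count.
func parseLine(line string) (word string, count int, err error) {
	i := strings.LastIndex(line, ": ")
	if i < 0 {
		return "", 0, fmt.Errorf("missing \": \" separator in %q", line)
	}
	count, err = strconv.Atoi(strings.TrimSpace(line[i+2:]))
	if err != nil {
		return "", 0, fmt.Errorf("bad count in %q: %w", line, err)
	}
	return line[:i], count, nil
}

func main() {
	for _, line := range []string{"keyword1: 5", "keyword2: 3", "keyword3: 2"} {
		word, count, err := parseLine(line)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println(word, count)
	}
}
```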

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License. A LICENSE file is not yet present in the repository and will be added later.

m0nirul/extract-text-keywords
