Infolio (beta)

Infolio uses AI to build reading lists on specific topics from a large collection of documents. It tags each article by comparing the article's content against a natural language description of every tag, then assembles each reading list from the articles that match a specified set of tags.

Features

Reading Lists: Automatically generate reading lists based on tags and their natural language descriptions.
Automatic Summarization: Generate AI-powered summaries of articles using the OpenRouter API.
Smart Tagging: Automatically tag articles based on their content to improve organization and searchability.
Browser Integration: Import and process articles directly from your browser bookmarks and Downloads folder.
Powerful Search: Search articles by content, tags, or custom Boolean queries.
Article Management: Store and organize articles in various formats (PDF, HTML, MHTML, EPUB, etc.).

Python Dependencies

Python 3.13+
uv package manager

Command-Line Tool Dependencies

Pandoc: Used for converting markdown to EPUB format
pdftotext (from poppler-utils): Used for extracting text from PDF files
html2text: Used for converting HTML to plain text
epub2txt: Used for extracting text from EPUB files
ebook-convert (from Calibre): Used for converting various e-book formats
pdftitle: Used for extracting titles from PDF files
xclip: Used for clipboard operations (Linux only)

API Dependencies

Mineru API: Used for PDF processing and conversion
OpenRouter API: Used for article summarization and tagging

Setup

Clone the repository:

git clone https://github.com/yourusername/infolio.git
cd infolio

Install dependencies:

This project uses a pyproject.toml for dependency management. Use the following commands to install and manage dependencies with the uv tool:
```
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies from pyproject.toml
uv sync
```
This will read the pyproject.toml and set up the virtual environment and dependencies accordingly.
Environment Configuration:
- Create a .env file in the project root to store sensitive information like API keys. Example:
```
OPENROUTER_API_KEY=your_openrouter_api_key
MINERU_API=your_mineru_api_key
```
- Update config.json with non-sensitive configuration parameters including file paths, bookmarks location, and processing preferences.
Note: Following best practices, sensitive information like API keys is stored in the .env file, while regular configuration parameters are kept in config.json.
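
The split between .env and config.json can be illustrated with a minimal sketch. It is not the project's actual loader: the function names `load_env_file` and `load_settings` are hypothetical, the .env parser handles only plain KEY=VALUE lines, and the rule that real environment variables override the file is an assumption.

```python
import json
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_settings(project_root: Path) -> dict:
    """Combine secrets from .env with parameters from config.json."""
    secrets = load_env_file(project_root / ".env")
    # Assumption: a variable already set in the environment wins over the file.
    secrets = {key: os.environ.get(key, value) for key, value in secrets.items()}
    config_text = (project_root / "config.json").read_text(encoding="utf-8")
    return {"secrets": secrets, "config": json.loads(config_text)}
```

Keeping the two sources in separate keys means the secrets never need to be written back to config.json, which is the file more likely to be committed.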

Usage

Main Workflow

Run the main module to process bookmarks, download new articles, extract text for summarization, tag articles, and update your article lists:

uv run -m src.main

The main workflow performs these steps in order:

Clean database and remove nonexistent files
Calculate and download new articles from bookmarks
Retitle PDFs for better file names
Move documents to target folder
Add new files to database
Generate AI summaries for articles
Tag articles using AI based on content
Update per-tag URL files
Handle article deletion/hiding requests
Remove duplicate files
Update reading lists
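
The ordered steps above can be pictured as a simple sequential pipeline. The sketch below is illustrative only: `run_pipeline` and the three stand-in step functions are hypothetical names, not the functions in src/main.py, and stopping at the first failing step is an assumed design choice.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order and return the names of those that completed.

    An exception in any step propagates, so later steps never run on
    incomplete results from an earlier one.
    """
    completed: list[str] = []
    for name, step in steps:
        print(f"Running: {name}")
        step()
        completed.append(name)
    return completed


# Hypothetical stand-ins for the real step functions.
def clean_database() -> None: ...
def download_new_articles() -> None: ...
def update_reading_lists() -> None: ...


if __name__ == "__main__":
    run_pipeline(
        [
            ("Clean database", clean_database),
            ("Download new articles", download_new_articles),
            ("Update reading lists", update_reading_lists),
        ]
    )
```

The order matters because later steps consume the output of earlier ones: summaries require extracted text, and reading lists require tags.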

The automatically generated reading lists allow you to easily import article paths into @Voice or other reader apps.

Tag-Based Management

Utilize tag-based filtering to search and organize articles. Example usage in Python:

from src.db import searchArticlesByTags

# Search for articles with specific criteria
articles = searchArticlesByTags(
    all_tags=["tag1", "tag2"],   # Must have ALL these tags
    any_tags=["tag3", "tag4"],   # Can have ANY of these tags
    not_any_tags=["tag5"],       # Must NOT have these tags
    readState="unread",          # Filter by read state
    formats=["pdf", "epub"]      # Filter by file formats
)
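
To make the three tag criteria concrete, here is a small sketch of how they might combine for a single article. `matches_tag_criteria` is a hypothetical helper, not part of src/db.py, and it assumes that an empty criterion imposes no constraint and that a non-empty `any_tags` requires at least one match.

```python
from collections.abc import Iterable


def matches_tag_criteria(
    article_tags: Iterable[str],
    all_tags: Iterable[str] = (),
    any_tags: Iterable[str] = (),
    not_any_tags: Iterable[str] = (),
) -> bool:
    """Return True if the article's tags satisfy all three criteria."""
    tags = set(article_tags)
    required, optional, excluded = set(all_tags), set(any_tags), set(not_any_tags)
    # Every required tag must be present.
    if not required <= tags:
        return False
    # If optional tags are given, at least one must be present.
    if optional and tags.isdisjoint(optional):
        return False
    # No excluded tag may be present.
    return tags.isdisjoint(excluded)
```

For example, an article tagged tag1, tag2, and tag3 satisfies the call shown above, while the same article with tag5 added does not.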

Search Tool

Use the search script to find articles with Boolean queries:

uv run scripts/search.py "your search query" [subject] [options]

Available options:

-p: Return article paths
-b: Return blog URLs
-g: Show article URLs in Gedit
-c: Copy article URLs to clipboard
-a: Send URL file to @Voice
-o: Overwrite articles in @Voice list
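
A command line with this shape could be declared with argparse roughly as follows. This is a sketch, not the contents of scripts/search.py: the flags and positional arguments come from the list above, while `build_parser` and the attribute names (`paths`, `blogs`, and so on) are assumptions chosen for readability.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the query, the optional subject, and the six option flags."""
    parser = argparse.ArgumentParser(
        prog="search.py", description="Search articles with a Boolean query."
    )
    parser.add_argument("query", help="Boolean search query")
    parser.add_argument("subject", nargs="?", default=None, help="optional subject")
    parser.add_argument("-p", dest="paths", action="store_true", help="return article paths")
    parser.add_argument("-b", dest="blogs", action="store_true", help="return blog URLs")
    parser.add_argument("-g", dest="gedit", action="store_true", help="show article URLs in Gedit")
    parser.add_argument("-c", dest="copy", action="store_true", help="copy article URLs to clipboard")
    parser.add_argument("-a", dest="at_voice", action="store_true", help="send URL file to @Voice")
    parser.add_argument("-o", dest="overwrite", action="store_true", help="overwrite articles in @Voice list")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```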

Reading Lists

A core function of this project is the automatic generation of reading lists. Each list contains the paths of articles that match a specified set of tags. The lists can be imported into reading tools such as @Voice on Android, so you can organize your reading by topic in the application you prefer.

Each tag is more than a label: it comes with a natural language description, configured in config.json under the article_tags section. The language model (LLM) reads this description to decide whether a given article should receive the tag, so tag assignment depends on the article's content and on the wording of the description.

Example Tag Configuration

Here's an example of how tags are configured in config.json:

"article_tags": {
    "infofi": {
        "description": "about any of the following:\nprediction markets\ninformation markets\nidea markets\nfutarchy\nintelligence marketplaces\ninformation elicitation incentive mechanisms to improve human cognition, improve decision making or improve information quality\napplications of ai agents to crypto\nnovel crypto capital allocation mechanisms\ngovernance incentives",
        "use_summary": true
    }
}
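
One way to picture how a tag description reaches the language model is a prompt built from the tag entry and the article. The sketch below is an assumption throughout: `build_tagging_prompt` is a hypothetical function, the prompt wording is invented, and `use_summary` is interpreted as "send the summary instead of the full text", which the configuration above does not spell out. No API call is made.

```python
def build_tagging_prompt(
    tag_name: str,
    tag_config: dict,
    article_text: str,
    article_summary: str,
) -> str:
    """Build a yes/no prompt asking whether the article deserves the tag."""
    # Assumption: use_summary selects the summary rather than the full text.
    content = article_summary if tag_config.get("use_summary") else article_text
    return (
        f'Decide whether the article below should receive the tag "{tag_name}".\n'
        f"The tag applies to articles that are {tag_config['description']}\n\n"
        f"Article:\n{content}\n\n"
        "Answer with exactly one word: yes or no."
    )


if __name__ == "__main__":
    example_tag = {"description": "about prediction markets", "use_summary": True}
    print(build_tagging_prompt("infofi", example_tag, "full text ...", "short summary ..."))
```

Because the description is inserted verbatim, phrasing it as the continuation of "articles that are ..." (as the "about any of the following" example does) keeps the prompt grammatical.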

Reading List Generation

The system generates reading lists based on tag configurations defined in config.json. This functionality is implemented in src/generateLists.py through two functions:

appendToLists() - Creates lists of articles matching specific tag criteria and converts PDF paths to EPUB paths where applicable
modifyListFiles() - (Currently disabled in main workflow) Processes the articles in each list, converting PDFs to EPUBs when possible and prefixing HTML/MHTML files with summaries if configured
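
The PDF-to-EPUB path substitution mentioned above could look like the following sketch. It is not the code in src/generateLists.py: `prefer_epub` is a hypothetical helper, and it assumes the converted EPUB sits next to the PDF with the same file name stem.

```python
from pathlib import Path


def prefer_epub(path: Path) -> Path:
    """Return the sibling .epub for a PDF if it exists, else the original path."""
    if path.suffix.lower() == ".pdf":
        epub_path = path.with_suffix(".epub")
        if epub_path.exists():
            return epub_path
    return path


def to_list_paths(paths: list[Path]) -> list[Path]:
    """Apply the substitution to every path in a reading list."""
    return [prefer_epub(path) for path in paths]
```

Falling back to the original path keeps an article on the list even when no EPUB conversion is available.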

Workflow Examples

Adding New Articles from Bookmarks

Save articles as bookmarks in your browser's bookmark folder (configured in config.json)
Run the main workflow:
```
uv run -m src.main
```
The system will:
- Download articles from bookmarks
- Extract text and generate summaries
- Tag articles based on content
- Update reading lists

Project Structure

src/: Source code directory
- main.py: Main entry point integrating all workflow steps including database cleanup, downloading, summarization, tagging, and list generation.
- db.py: Database operations for storing and querying article metadata, summaries, and tags.
- articleSummary.py: Functions for text extraction and AI-powered summarization of articles.
- articleTagging.py: Implements automatic tagging based on article content using the OpenRouter API.
- textExtraction.py: Handles text extraction from various file formats.
- generateLists.py: Generates lists of articles based on tags and other criteria, with PDF to EPUB conversion capabilities.
- downloadNewArticles.py: Handles downloading new articles from bookmarks.
- manageDocs.py: Document management including PDF retitling, file moving, and duplicate removal.
- manageLists.py: Utilities for managing article lists and @Voice integration.
- utils.py: Utility functions for file operations, URL formatting, and configuration management.
scripts/: Utility scripts
- search.py: Implements a CLI tool for searching articles using Boolean queries.
- convertGitbooks.py: Script for converting GitBook content.
- deleteArticlesBasedOnUrl.py: Utility for deleting articles by URL.
- getAllBlogs.py: Script for extracting blog information.
storage/: SQLite database and other persistent storage files.
output/: Generated output files such as search results.
logs/: Application logs including summaries and error logs.

Configuration Details

Sensitive Information: Stored in the .env file (e.g., API keys).
Non-Sensitive Configuration: Stored in config.json (e.g., file paths and procedural settings).
article_tags: Defines the available tags along with their natural language descriptions. These descriptions inform the LLM during article tagging.
listToTagMappings: Specifies how articles should be grouped into reading lists based on tag criteria. This determines which articles appear on which reading lists.
Other settings cover the directories where articles are stored, bookmark file paths, backup locations, the document formats to process, and exclusion rules.

Advanced Configuration

Custom Tag Rules

You can create complex tag rules using the listToTagMappings configuration:

"listToTagMappings": {
    "infofi": {
        "all_tags": [],
        "formats": ["epub", "mobi", "html", "mhtml"],
        "any_tags": ["infofi"]
    }
}

This creates a reading list called "infofi" containing every article that has the "infofi" tag and is stored in one of the specified formats.

Multiple Tag Criteria

You can use multiple tag criteria to create more specific reading lists:

"advanced_topic": {
    "all_tags": ["technical", "research"],
    "any_tags": ["ai", "blockchain"],
    "not_any_tags": ["beginner"],
    "formats": ["epub", "mobi", "html", "mhtml"]
}

This creates a list of technical research articles about AI or blockchain that are not tagged as beginner-level.
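
The keys of each listToTagMappings entry (all_tags, any_tags, not_any_tags, formats) have the same names as the keyword arguments shown for searchArticlesByTags earlier in this README. Assuming that correspondence is intentional, one reading list per entry could be built by unpacking each rule into a search call. `build_reading_lists` is a hypothetical function, and the search function is passed in as a parameter so the sketch runs without the project's database.

```python
from typing import Callable


def build_reading_lists(
    mappings: dict[str, dict],
    search: Callable[..., list[str]],
) -> dict[str, list[str]]:
    """Run one search per list; each rule's keys become keyword arguments."""
    return {list_name: search(**rule) for list_name, rule in mappings.items()}


if __name__ == "__main__":
    # Stand-in for a real search function, used only for demonstration.
    def fake_search(**criteria) -> list[str]:
        return [f"/articles/example-{'-'.join(criteria.get('any_tags', []))}.epub"]

    demo_mappings = {
        "advanced_topic": {
            "all_tags": ["technical", "research"],
            "any_tags": ["ai", "blockchain"],
            "not_any_tags": ["beginner"],
            "formats": ["epub", "mobi", "html", "mhtml"],
        }
    }
    print(build_reading_lists(demo_mappings, fake_search))
```

A rule that omits a key, such as not_any_tags in the "infofi" example, simply leaves that argument at the search function's default.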

Development

This project uses uv to run code and manage dependencies.
Use uv run to execute the application or individual modules, and uv add to add a dependency.
Keep contributions efficient, modular, and secure, following the conventions of the existing project structure.

Contributing

Contributions are welcome! Please adhere to the guidelines outlined in CODE_OF_CONDUCT.md.

License

See the LICENSE file for details.

Repository: distbit0/infolio