Tags: hemanth/piragi
feat: add incremental progress reporting for embedding generation

Fixes #9

- AsyncRagi add() now reports per-batch embedding progress
- embed_chunks() accepts on_progress callback and batch_size parameter
- Progress messages report: "Embedded 32/64 chunks", "Embedded 64/64 chunks"
- Batched processing improves memory efficiency
- Updated docs and tests
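
The batching and callback pattern described in this entry can be sketched roughly as follows. This is not piragi's code: the `embed_chunks` below is a standalone illustration that only borrows the name and the `on_progress` / `batch_size` parameters from the commit message, and `fake_embed` is a hypothetical placeholder for a real embedding model.

```python
from typing import Callable, List, Optional, Sequence


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    # Hypothetical stand-in for a real embedding model:
    # returns one 1-dimensional vector per chunk.
    return [[float(len(chunk))] for chunk in batch]


def embed_chunks(
    chunks: Sequence[str],
    batch_size: int = 32,
    on_progress: Optional[Callable[[str], None]] = None,
) -> List[List[float]]:
    # Illustrative only: the real signature in piragi may differ.
    embeddings: List[List[float]] = []
    total = len(chunks)
    for start in range(0, total, batch_size):
        # Only one batch is handed to the model at a time.
        batch = chunks[start:start + batch_size]
        embeddings.extend(fake_embed(batch))
        if on_progress is not None:
            on_progress(f"Embedded {len(embeddings)}/{total} chunks")
    return embeddings


messages: List[str] = []
vectors = embed_chunks(
    [f"chunk {i}" for i in range(64)],
    batch_size=32,
    on_progress=messages.append,
)
print(messages)
```

With 64 chunks and a batch size of 32, the callback fires twice, once per batch.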
fix: use pysbd for accurate sentence boundary detection

Fixes #10

- Text chunking no longer mangles bulleted numbers and acronyms
- Replaced naive period-based sentence breaking with pysbd library
- Correctly handles numbered lists (1. 2. 3.)
- Correctly handles abbreviations (Dr., Mr., Prof.)
- Correctly handles acronyms (U.S., Ph.D., B.A.)
- Correctly handles initials (J.K. Rowling, C.S. Lewis)
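
To illustrate why period-based splitting causes trouble, here is a small sketch. `naive_sentences` is a hypothetical stand-in for the old behaviour, not the code that was replaced. The pysbd call is guarded so the snippet still runs when the library is not installed.

```python
import re
from typing import List


def naive_sentences(text: str) -> List[str]:
    # Hypothetical naive splitter: break at whitespace that follows a period.
    return [part for part in re.split(r"(?<=\.)\s+", text) if part]


text = "Dr. Smith moved to the U.S. in 1999. He studied there."

# The abbreviation and the acronym are both treated as sentence ends.
naive = naive_sentences(text)
print(naive)

try:
    import pysbd  # optional dependency; skipped if unavailable

    segmenter = pysbd.Segmenter(language="en", clean=False)
    print(segmenter.segment(text))
except ImportError:
    pass
```

The naive splitter cuts the text into four fragments, breaking after "Dr." and "U.S.", which is the kind of mangling this fix addresses.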
feat: add knowledge graph support with graph=True flag

- Add KnowledgeGraph class for entity/relationship extraction
- LLM-based extraction during document ingestion
- Graph-augmented retrieval for relationship questions
- Direct graph access via kb.graph property
- New optional extra: piragi[graph] (networkx)
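
The general idea of graph-augmented retrieval can be sketched with plain Python. Everything here is an assumption for illustration: `TinyGraph` and `augment` are hypothetical names, the triples are hard-coded where piragi would extract them with an LLM, and a list stands in for networkx. It does not reflect the actual `KnowledgeGraph` class.

```python
from typing import List, Sequence, Tuple


class TinyGraph:
    """Hypothetical toy store of (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.triples: List[Tuple[str, str, str]] = []

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.append((subject, relation, obj))

    def facts_about(self, entity: str) -> List[str]:
        # Every triple in which the entity appears as subject or object.
        return [f"{s} {r} {o}" for s, r, o in self.triples if entity in (s, o)]


def augment(question: str, chunks: Sequence[str], graph: TinyGraph,
            entities: Sequence[str]) -> List[str]:
    # Append graph facts for each known entity mentioned in the question.
    facts: List[str] = []
    for entity in entities:
        if entity.lower() in question.lower():
            facts.extend(graph.facts_about(entity))
    return list(chunks) + facts


graph = TinyGraph()
graph.add("Alice", "manages", "Bob")      # stand-in for LLM-extracted triples
graph.add("Bob", "works on", "Search")

context = augment(
    "Who manages Bob?",
    ["Bob joined the team in 2021."],
    graph,
    entities=["Alice", "Bob", "Search"],
)
print(context)
```

The retrieved chunk alone does not answer the relationship question; the appended triples do.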
feat: add recursive web crawling with /** syntax

- Add crawl4ai integration for async crawling with JS rendering
- Support /** suffix for recursive URL crawling (e.g., https://docs.example.com/**)
- Crawls same-domain links, max depth 3, max 100 pages by default
- New optional extra: pip install piragi[crawler]
- Bump version to 0.5.0
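
The crawl limits listed in this entry (same domain, depth 3, 100 pages) amount to a bounded breadth-first traversal. The sketch below assumes that reading. It does not use crawl4ai; `parse_source`, `crawl`, and the in-memory `SITE` link map are hypothetical, and a real crawler would fetch and render pages instead of looking them up.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlparse


def parse_source(source: str) -> Tuple[str, bool]:
    # A trailing "/**" marks the URL for recursive crawling.
    if source.endswith("/**"):
        return source[:-3], True
    return source, False


def crawl(start_url: str, get_links: Callable[[str], List[str]],
          max_depth: int = 3, max_pages: int = 100) -> List[str]:
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            # Stay on the starting domain and skip already-queued URLs.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical link map standing in for real HTTP fetches.
SITE: Dict[str, List[str]] = {
    "https://docs.example.com": [
        "https://docs.example.com/a",
        "https://other.example.org/x",
    ],
    "https://docs.example.com/a": [
        "https://docs.example.com/b",
        "https://docs.example.com",
    ],
    "https://docs.example.com/b": [],
}

base, recursive = parse_source("https://docs.example.com/**")
pages = crawl(base, lambda url: SITE.get(url, [])) if recursive else [base]
print(pages)
```

The off-domain link and the back-link to the start page are both skipped, so three pages are visited.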