Tags: ArkNill/markgrab

v0.2.0

feat: add extract_batch() for thread-safe concurrent URL extraction

The Playwright browser fallback deadlocks when called from a ThreadPoolExecutor.
extract_batch() replaces threading with asyncio.gather + Semaphore,
eliminating the deadlock entirely.
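The core pattern named above (asyncio.gather bounded by a Semaphore) can be sketched with the standard library alone. This is a minimal illustration, not markgrab's actual code: `fetch_one` and `run_batch` are hypothetical names, and `fetch_one` only sleeps instead of driving Playwright.

```python
import asyncio


async def fetch_one(url: str) -> str:
    """Hypothetical stand-in for a single-URL extraction coroutine."""
    await asyncio.sleep(0.01)  # simulate I/O without blocking the event loop
    return f"content of {url}"


async def run_batch(urls: list[str], max_concurrency: int = 5) -> list[str]:
    # A single Semaphore caps how many extractions are in flight at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> str:
        async with sem:
            return await fetch_one(url)

    # gather() schedules every coroutine on the current event loop (no worker
    # threads) and returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

Call it with `asyncio.run(run_batch(urls, max_concurrency=3))`. Because every task is a coroutine on one event loop, no thread pool is involved at any point.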

- extract_batch(): async batch extraction with per-domain rate limiting,
  per-URL timeout, and configurable concurrency (no threads)
- MCP extract_multiple(): now runs in parallel via extract_batch() instead of sequentially
- publish.yml: add GitHub Release creation on publish
- 124 tests (10 new batch tests), ruff clean
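The commit does not spell out how per-domain rate limiting and per-URL timeouts are implemented. The sketch below shows one plausible approach, assuming a per-hostname Semaphore (a concurrency cap, not a requests-per-second limiter) and `asyncio.wait_for` for the timeout. `extract_batch_sketch`, `fake_extract`, and the result-dict shape are all hypothetical and are not markgrab's API.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse


async def fake_extract(url: str) -> str:
    """Hypothetical extractor; URLs containing 'slow' never finish in time."""
    await asyncio.sleep(10 if "slow" in url else 0.01)
    return f"ok:{url}"


async def extract_batch_sketch(urls, max_concurrency=5, per_domain=2, timeout=0.5):
    global_sem = asyncio.Semaphore(max_concurrency)
    # One Semaphore per hostname, created lazily the first time it is used.
    domain_sems = defaultdict(lambda: asyncio.Semaphore(per_domain))

    async def one(url: str) -> dict:
        domain = urlparse(url).netloc
        # Take the domain slot first so a task waiting on a busy domain
        # does not occupy one of the global slots while it waits.
        async with domain_sems[domain], global_sem:
            try:
                # wait_for cancels the extraction once the timeout elapses.
                text = await asyncio.wait_for(fake_extract(url), timeout)
                return {"url": url, "ok": True, "text": text}
            except asyncio.TimeoutError:
                return {"url": url, "ok": False, "error": "timeout"}

    return await asyncio.gather(*(one(u) for u in urls))
```

Turning a timeout into an error entry, rather than letting the exception propagate, keeps one slow page from failing the whole batch, and results still come back in input order.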

v0.1.3

Bump version to 0.1.3

v0.1.2

Bump to 0.1.2: add MCP registry metadata and server.json

v0.1.1

fix: preserve text after mixed <br> and <br /> tags

Workaround for markdownify #244 / #58: Python's html.parser treats
mixed <br> and <br /> as an opening tag, swallowing subsequent text
as children. The upstream convert_br() discards child text entirely.

Override MarkdownConverter.convert_br() to append the text parameter
after the newline, preventing silent content loss in HTML-to-markdown
conversion.

- Add _BrFixedConverter with convert_br override
- Replace markdownify() call with custom converter
- Bump version to 0.1.1
- Add 2 regression tests (English + Korean)
- 114 tests passing