Tags · ArkNill/markgrab

v0.2.0

feat: add extract_batch() for thread-safe concurrent URL extraction

Playwright browser fallback deadlocks when called from ThreadPoolExecutor.
extract_batch() replaces threading with asyncio.gather + Semaphore,
eliminating the deadlock entirely.

- extract_batch(): async batch extraction with per-domain rate limiting,
  per-URL timeout, and configurable concurrency (no threads)
- MCP extract_multiple(): sequential → parallel via extract_batch()
- publish.yml: add GitHub Release creation on publish
- 124 tests (10 new batch tests), ruff clean
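
The pattern described above (asyncio.gather plus a Semaphore instead of a thread pool) can be sketched roughly as follows. This is an illustration only, not markgrab's actual extract_batch() implementation: the names batch_sketch and fake_extract and the parameters max_concurrency, timeout, and domain_delay are assumptions made for this example.

```python
import asyncio
import time
from urllib.parse import urlparse


async def fake_extract(url: str) -> str:
    """Hypothetical stand-in for a real single-URL extraction coroutine."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def batch_sketch(urls, extract=fake_extract, max_concurrency=5,
                       timeout=5.0, domain_delay=0.0):
    """Run extract() over urls concurrently on one event loop, with no threads."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight extractions
    domain_locks = {}   # one lock per domain serializes the rate-limit check
    next_allowed = {}   # domain -> earliest monotonic time for the next request

    async def one(url):
        domain = urlparse(url).netloc
        lock = domain_locks.setdefault(domain, asyncio.Lock())
        try:
            # Per-domain rate limiting: space out request start times.
            async with lock:
                wait = next_allowed.get(domain, 0.0) - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)
                next_allowed[domain] = time.monotonic() + domain_delay
            # Global concurrency cap plus a per-URL timeout on the extraction.
            async with semaphore:
                return await asyncio.wait_for(extract(url), timeout)
        except Exception as exc:
            # Return the failure in place so one bad URL does not sink the batch.
            return exc

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))


if __name__ == "__main__":
    demo_urls = ["https://a.example/1", "https://a.example/2", "https://b.example/3"]
    for url, result in zip(demo_urls, asyncio.run(batch_sketch(demo_urls))):
        print(url, "->", result)
```

Because every task runs on a single event loop, no browser object is shared across threads, which is the situation the commit message identifies as the source of the deadlock.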

Apr 24, 2026
bfea515

v0.1.3

Bump version to 0.1.3

Mar 25, 2026
44dc966

v0.1.2

bump to 0.1.2: add MCP registry metadata and server.json

Mar 17, 2026
e34cbf3

v0.1.1

fix: preserve text after mixed <br> and <br /> tags

Workaround for markdownify #244 / #58: when a document mixes <br> and
<br />, Python's html.parser treats a break tag as an opening tag and
swallows the subsequent text as its children. The upstream convert_br()
discards that child text entirely.

Override MarkdownConverter.convert_br() to append the text parameter
after the newline, preventing silent content loss in HTML-to-markdown
conversion.

- Add _BrFixedConverter with convert_br override
- Replace markdownify() call with custom converter
- Bump version to 0.1.1
- Add 2 regression tests (English + Korean)
- 114 tests passing

Mar 16, 2026
e3c1f94
