11 Sep 25
A map/reduce workflow for LLMs, with what looks like local caching. To me, build systems and data processing pipelines like this one overlap a lot.
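To make the build-system parallel concrete, here is a minimal sketch of a cached map/reduce step. It is not the linked tool's code: `cached_call`, `map_reduce` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for a real model call. Each map result is stored on disk under a hash of its input, much like a build system skips targets whose inputs have not changed.

```python
import hashlib
import json
from pathlib import Path


def cached_call(fn, prompt, cache_dir):
    """Run fn(prompt) once; later calls with the same prompt read the result from disk."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = fn(prompt)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


def map_reduce(chunks, map_fn, reduce_fn, cache_dir):
    """Map step: one cached call per chunk. Reduce step: combine the partial results."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    partials = [cached_call(map_fn, chunk, cache_dir) for chunk in chunks]
    return reduce_fn(partials)


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM: 'summarises' by keeping the first sentence."""
    return prompt.split(".")[0].strip()
```

Running `map_reduce` twice over the same chunks and cache directory only calls the map function on the first run; the second run is served entirely from the cache files.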
03 Aug 25
A tool from Google that uses LLMs to extract structured data. I suppose it’s quite reliable!
12 Mar 25
A scalable data profiler (manzt/quak).
06 Jul 23
Why I’m using the unstructured Python library to handle complex text data while building a private AI assistant: its benefits, and why it’s the perfect fit for my use case.
15 Dec 22
string_grouper is a library that makes finding groups of similar strings easy and fast, whether within a single list or across multiple lists. It uses tf-idf to calculate cosine similarities within a single list or between two lists of strings. The full process is described in the blog post Super Fast String Matching in Python.
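As a rough illustration of the tf-idf plus cosine similarity idea, here is a small pure-Python sketch. It does not use string_grouper and makes no claim about how the library is implemented; the character trigram tokenisation, the smoothed idf formula, the 0.5 threshold and all function names are my own assumptions.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Split a string into overlapping, lowercased character n-grams."""
    text = text.lower()
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Return one L2-normalised sparse tf-idf vector (a dict) per input string."""
    docs = [Counter(ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings does each n-gram appear?
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        # Smoothed idf (an assumption), so n-grams found in every string keep some weight.
        vec = {g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1)
               for g, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def similar_pairs(strings, threshold=0.5, n=3):
    """Return (left, right, score) for every pair scoring at or above the threshold."""
    vectors = tfidf_vectors(strings, n)
    pairs = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((strings[i], strings[j], score))
    return pairs
```

For example, `similar_pairs(["Acme Corp", "Acme Corp.", "Globex Inc"])` pairs the two Acme spellings and leaves Globex unmatched. The nested loop compares every pair, so this is only meant for small lists.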