11 Sep 25

A map/reduce workflow for LLMs, with what looks like local caching. To me, build systems and data processing pipelines like this one overlap a lot.

by sebastien 3 months ago

03 Aug 25

A tool from Google that uses LLMs to extract structured data. I suppose it’s quite reliable!

by sebastien 4 months ago

12 Mar 25

quak (manzt/quak on GitHub): a scalable data profiler.

by piranha 9 months ago

06 Jul 23

Why I’m using the unstructured Python library to handle complex text data while building a private AI assistant, what its benefits are, and why it’s the perfect fit for my use case.

by racewar 2 years ago

15 Dec 22

string_grouper is a library that makes finding groups of similar strings within a single list, or across multiple lists, easy and fast. string_grouper uses tf-idf to calculate cosine similarities within a single list or between two lists of strings. The full process is described in the blog post Super Fast String Matching in Python.
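
To get a feel for the idea, here is a minimal standard-library sketch of tf-idf plus cosine similarity over character n-grams. It is not string_grouper's API: the function names (`char_ngrams`, `tfidf_vectors`, `similar_pairs`), the 3-gram size, the smoothed idf formula, and the 0.5 threshold are all assumptions made for illustration, and the all-pairs loop is a naive stand-in for whatever the library does to be fast.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased string into overlapping character n-grams."""
    text = text.lower()
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Return one L2-normalised tf-idf vector (as a dict) per input string."""
    docs = [Counter(char_ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings does each n-gram appear?
    df = Counter(gram for doc in docs for gram in doc)
    total = len(strings)
    vectors = []
    for doc in docs:
        # Smoothed idf (an assumption): log((1 + N) / (1 + df)) + 1
        vec = {
            gram: tf * (math.log((1 + total) / (1 + df[gram])) + 1)
            for gram, tf in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


def similar_pairs(strings, threshold=0.5):
    """Naive all-pairs comparison; fine for small lists only."""
    vecs = tfidf_vectors(strings)
    pairs = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((strings[i], strings[j], score))
    return pairs


if __name__ == "__main__":
    names = ["Acme Corp", "ACME Corp.", "Globex Inc"]
    for left, right, score in similar_pairs(names):
        print(f"{left!r} ~ {right!r}: {score:.2f}")
```

With the sample list, only the two Acme spellings share any 3-grams, so they are the only pair that clears the threshold.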

by wheresalice 3 years ago