24 Oct 25

Conversion of PDF documents to structured Markdown, optimized for Retrieval-Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing. The pdf-to-markdown GitHub repository hosts a tool that converts PDF files into Markdown for easier text extraction and reformatting, with the process running locally on the user’s machine.

by tmfnk 3 months ago

This blog post discusses the Red Book, a historical reference used for designing and calculating urban and infrastructural projects in the context of the Cape Verde (Cabo Verde) islands.

by tmfnk 3 months ago

Ultimate modern intro to web scraping using Python. How to scrape data using HTTP or headless browsers, parse it using AI and scale and deploy. This comprehensive guide covers everything a user needs to know about web scraping using Python, detailing essential tools, techniques, best practices, and ethical considerations.

by tmfnk 3 months ago

WhatTheDuck is an open-source web application built on DuckDB, with its source available on GitHub. It allows users to upload CSV and Parquet files, store them in tables, and run SQL queries on the data.

by tmfnk 3 months ago

Reactive Data Board. Rabbit is a hyper-composable data platform that feels more like a creative game engine than a basic “dashboard” utility. Built on Clojure and SQL—but open to everyone—Rabbit gives you a direct-manipulation canvas where drag-and-drop meets livecoding, so you can shape, filter, pivot, visualize, and automate data exactly how you want.

by tmfnk 3 months ago

See My Repo is a simple online tool that generates an aesthetically pleasing and easy-to-read preview page for any GitHub repository by analyzing its content and metadata.

by tmfnk 3 months ago

Elizabeth Kolbert reviews “Dr. Calhoun’s Mousery,” by Lee Alan Dugatkin, and “Rat City,” by Edmund Ramsden and Jon Adams. This New Yorker review discusses two books about John B. Calhoun’s controversial “Mouse Universe” experiments, which explored the devastating psychological and social effects of overpopulation and resource saturation on rodent communities.

by tmfnk 3 months ago

SenseMap is a visual analytics tool designed to help users effectively explore, analyze, and gain insight from complex sensor data by mapping it visually.

by tmfnk 3 months ago

Let’s build a massive crosswalk connecting map data with just Wikidata, DuckDB, some Ruby, and a hard-won bash one-liner. This article argues that the primary practical value of Wikidata lies in its function as a massive, centralized “crosswalk file” that effectively links countless identifiers and data points across the semantic web.

by tmfnk 3 months ago saved 2 times

The illustrator Chris Ware surveys the work of Richard Scarry. This article from The Yale Review explores the thematic and artistic contrast between the works of the adult graphic novelist Chris Ware and the iconic children’s illustrator Richard Scarry, analyzing their respective depictions of society and anxiety.

by tmfnk 3 months ago

This article examines the phenomenon of “character amnesia” in China, where the increasing reliance on digital devices and pinyin input systems is causing many people to forget how to write Chinese characters by hand.

by tmfnk 3 months ago

List of open source low-code tools in GitHub. The oss-lowcode-tools GitHub repository is an awesome list that curates and categorizes various open-source low-code and no-code tools for building applications and workflows.

by tmfnk 3 months ago

A privacy-preserving home security camera that uses end-to-end encryption. (Secluso was previously named Privastead.)

by tmfnk 3 months ago

The impact of AI on the workforce will be influenced by what different institutions throughout society—including businesses, nonprofit institutions, worker organizations, colleges and universities, and government—choose to do to guide its development and use.

by tmfnk 3 months ago

This blog post shares the key professional and technical lessons the author learned over four years working in the field of Data Engineering, covering topics like infrastructure, pipelines, and career growth.

by tmfnk 3 months ago


The map above shows what houses cost in each US state in 1950, expressed in inflation-adjusted 2024 US dollars, compared with the actual average cost in 2024.

by tmfnk 3 months ago

Psst, kid, want some cheap and small LLMs? This blog post provides a comprehensive guide on how to set up and use llama.cpp, a C++ library, to efficiently run large language models (LLMs) locally on consumer hardware.

by tmfnk 3 months ago

Live a calm, stress-free financial life with PaperMoney. Paper Money is a simple, self-hosted budgeting and expense tracking application designed to help users manage their personal finances with a focus on ease of use and privacy.

by tmfnk 3 months ago

The article introduces and explains the “Copper Sushi” project, which involves simulating and visualizing the complex, real-time power flow dynamics across the interconnected European electricity grid.

by tmfnk 3 months ago