24 Oct 25

Conversion of PDF documents to structured Markdown, optimized for Retrieval-Augmented Generation (RAG) and other NLP tasks. Extracts text, tables, and images with preserved formatting for better information retrieval and processing. The pdf-to-markdown GitHub repository hosts a tool that converts PDF files into Markdown for easier text extraction and reformatting; the conversion runs locally on the user’s machine.

by tmfnk 2 months ago

08 Jun 25

JupyterHub is a way to serve Jupyter notebooks to multiple users. It manages a separate Jupyter environment for each user and can be used for a class of students, a corporate data science group, or a scientific research group. It is a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.

by agnieszka 6 months ago

LibrePythonista is an extension for LibreOffice Calc. It allows interactive Python code to be run directly in a spreadsheet, including code that uses Pandas and Matplotlib, so that dataframes, series, and graphs can be created directly in the sheet.

by agnieszka 6 months ago

OTB (Orfeo ToolBox) is a community-driven, extensible, and heavily documented open-source project for remote sensing. It can process high-resolution optical, multispectral, and radar images at the terabyte scale. All of its algorithms are accessible from QGIS, Python, the command line, or C++. It’s available on Linux, Windows, and macOS (via Docker for ARM Mac computers). Features include:

  • Feature extraction
  • Change detection
  • Image manipulation
  • Image segmentation
  • Hyperspectral processing
  • Data pre-processing
  • SAR processing

Supported formats:

  • Image formats supported by GDAL (tiff, hdr, img, data providers specifics …)
  • Vector formats supported by GDAL/OGR (shapefile, sqlite, postgis …)
  • Metadata formats supported by OSSIM and OSSIM plugins (Pléiades and other ADS sensors, DigitalGlobe sensors, NITF)
  • Digital Elevation Models supported by OSSIM (SRTM, DTED and generic geotiff DEM)
  • KMZ image export

by agnieszka 6 months ago

24 Apr 25

23 Apr 25

Official Python command-line tool of the Overture Maps Foundation. Overture Maps provides free and open geospatial map data, drawn from many different sources and normalized to a common schema. This tool helps download Overture data within a region of interest and convert it to a few different file formats.

by agnieszka 8 months ago

20 Apr 25

An open-source tool for reading OpenStreetMap PBF files using DuckDB.

  • Scalable reader for OpenStreetMap ProtoBuffer (pbf) files.
  • Built on top of DuckDB with its Spatial extension.
  • Saves files in the GeoParquet file format for easier integration with modern cloud stacks.
  • Uses multithreading, unlike GDAL, which works in a single thread only.
  • Can filter data by geometry without the need for ogr2ogr clipping beforehand.
  • Can filter data by OSM tags.
  • Uses caching to avoid repeated computations.
  • Can be used as a Python module as well as a beautiful CLI based on Typer.

by agnieszka 8 months ago

A lightweight Python utility that provides colored visualization of object attributes, making it easier to inspect objects during development and debugging.

Color-coded attribute display:

  • 🔵 Blue: Dunder methods
  • 🟡 Yellow: Protected attributes (starting with _)
  • 🟢 Green: Public attributes and methods
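
The color scheme above can be approximated with a few lines of standard-library Python. This is only a rough sketch of the idea, not the bookmarked utility's code: the names `classify_attribute`, `colored_dir`, and `Sample` are hypothetical, and the ANSI escape codes are an assumption about how the coloring might be done in a terminal.

```python
# Hypothetical sketch: classify attribute names and color them with ANSI codes.
# None of these names come from the bookmarked utility.

ANSI = {
    "dunder": "\033[34m",     # blue
    "protected": "\033[33m",  # yellow
    "public": "\033[32m",     # green
}
RESET = "\033[0m"


def classify_attribute(name: str) -> str:
    """Return 'dunder', 'protected', or 'public' for an attribute name."""
    if name.startswith("__") and name.endswith("__"):
        return "dunder"
    if name.startswith("_"):
        return "protected"
    return "public"


def colored_dir(obj) -> list[str]:
    """Return the names from dir(obj), each wrapped in its category color."""
    return [f"{ANSI[classify_attribute(n)]}{n}{RESET}" for n in dir(obj)]


class Sample:
    """Small example object with one public and one protected attribute."""

    def __init__(self):
        self.value = 1
        self._cache = {}


if __name__ == "__main__":
    for line in colored_dir(Sample()):
        print(line)
```

Running the script in a terminal that understands ANSI escapes prints every attribute of `Sample` in its category color.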

by agnieszka 8 months ago

A single Python script that prepares reports on environments and reports the differences between two environments.
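
To illustrate what comparing two environments might involve, here is a minimal sketch that assumes an environment is summarized as a mapping of package name to version. It is not the bookmarked script's implementation: `snapshot` and `diff_envs` are hypothetical names, and a real report may well cover more than installed packages.

```python
# Hypothetical sketch: snapshot installed packages and diff two snapshots.
from importlib.metadata import distributions


def snapshot() -> dict[str, str]:
    """Map each installed distribution name to its version (current interpreter)."""
    packages = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return packages


def diff_envs(old: dict[str, str], new: dict[str, str]) -> dict[str, dict]:
    """Report packages added, removed, or changed between two snapshots."""
    added = {n: new[n] for n in new.keys() - old.keys()}
    removed = {n: old[n] for n in old.keys() - new.keys()}
    changed = {
        n: (old[n], new[n])
        for n in old.keys() & new.keys()
        if old[n] != new[n]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    current = snapshot()
    print(f"{len(current)} distributions found")
    print(diff_envs(current, current))
```

Two snapshots taken from different interpreters (for example, saved as JSON and loaded later) could be passed to `diff_envs` to get the differences.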

by agnieszka 8 months ago