WORK IN PROGRESS - Dataiku DSS plugin to extract text data from documents
Updated
Jan 11, 2021 - Makefile
Helps parse bank statements (PDF)
Flask application for OCR and extraction of text from documents with support for repository applications
Smart multimodal information retrieval project
Extracts GPS coordinates from PDF files and Points/Polygons from KMZ files to create a master KML file. 🌎
File Extension Fix Tool - Find and rename files with wrong extensions.
A Java application that uses Lucene and Tika to search documents and display the part of each document in which a match is found, along with precision and recall values.
This project implements a multimedia content sharing system in Java 8, allowing users to upload and stream videos to their subscribers. Inspired by platforms like TikTok, it manages user channels, subscriptions, and real-time video streaming, developing the event delivery system for efficient content promotion.
POC: azure-functions (kotlin, gradle, tika)
Apache Solr/Tika index/search plus SHA256 content-based addressing for files stored into AWS S3 buckets
This project builds an information retrieval system that can handle documents of different formats from an information repository. The application uses tools such as Lucene and Tika to index documents and extract information from them.
PDFTextSearch is a Spring Boot backend service that extracts text from uploaded PDF documents using Apache Tika and indexes the extracted content into Elasticsearch for full-text search capabilities. Users can upload PDFs, search through their content, and retrieve matching documents.
Apache Tika adapter in Go