Stars
🦘 The Grouparoo Monorepo - open source customer data sync framework
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Builds Lucene/Solr indexes out of NutchWAX segments and revisit records via Hadoop.
Wrong project! You should head over to http://github.com/sshuttle/sshuttle
Free and Open Source, Distributed, RESTful Search Engine
(T)he (N)ew (H)otness. Improved full-text search of archival web data.