Showing 20 open source projects for "xml"

  • 1
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 13 This Week
  • 2
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 1 This Week
  • 3
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the browser...
    Downloads: 1 This Week
  • 4
    PHPScraper

    A universal web-util for PHP

    PHPScraper is a universal web-scraping utility for PHP, built with simplicity in mind. The goal is to make XPath selectors optional and avoid commonly needed boilerplate code. Just create an instance of PHPScraper, go to a website, and start collecting data. All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways. Many common use cases are covered already. You can find prepared extractors for various HTML...
    Downloads: 0 This Week
  • 5
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 30 This Week
  • 6
    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 1 This Week
  • 7
    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl Web Crawler Project from 2006. It includes code for crawling web pages, distributing the results to a server, and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider Crawling for Article Writing Software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https...
    Downloads: 0 This Week
  • 8
    XRichText

    An Android rich text class library that supports graphic & text mixing

    An Android rich text class library that supports mixing graphics and text, editing and previewing, and inserting and deleting pictures. Use ScrollView as the outermost layout containing a LinearLayout filled with TextView and ImageView elements. When deleting, the TextView and ImageView are removed according to the position of the cursor, and the text is merged automatically. The generated data is a list collection, and the data format can be customized. Version V1.4 opens the image...
    Downloads: 0 This Week
  • 9
    SEO MACROSCOPE

    SEO Macroscope is a website scanning tool, to check your website

    ... pages. Export reports to Excel and CSV formats. Generate and export text and XML sitemaps from the crawled pages. Analyze redirect chains. Use custom filters to verify the presence/absence of tracking tags. Use CSS Selectors, XPath Queries, and Regular Expressions to scrape website data.
    Downloads: 2 This Week
  • 10
    phoneutria
    A Java web crawler: multi-threaded, scalable, high-performance, extensible, and polite. It can be used to crawl and index any web or enterprise domain and is configurable through an XML configuration file.
    Downloads: 0 This Week
  • 11
    ... to master. The standard webStraktor output format is XML based, either in ASCII, UTF-8 or ISO-8859-1 (Latin1) code pages. webStraktor relies on the Apache HttpClient for retrieving content via the HTTP protocol. It adheres to the Robots Exclusion Protocol and it can be configured to operate in an anonymous way by connecting to the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders or bots by integrating scraping and crawling capabilities.
    Downloads: 0 This Week
  • 12
    datalus
    PHP web API designed to simplify object handling (loading, saving, querying, displaying, and editing), abstract the data from its display structure and layout, and allow the target data to be delivered in any supported format without special logic.
    Downloads: 0 This Week
  • 13
    An XML-scriptable web scraper in PHP and Java.
    Downloads: 0 This Week
  • 14
    BTV Rename
    The goal of this project is 100% TVDB recognition and SxxExx renaming of all files generated by BeyondTV while maintaining the BeyondTV database. Currently the project relies on TVrage and scans the entire folder which is selected by the user in the conf
    Downloads: 0 This Week
  • 15
    Methanol is a scriptable multi-purpose web crawling system with an extensible configuration system and speed-optimized architectural design. Methabot is the web crawler of Methanol.
    Downloads: 0 This Week
  • 16
    HtmlClient provides an SGML/HTML/XHTML parser and connection client making web-spidering as easy for developers as actually surfing the web with a premade browser. Based on Apache's HttpClient.
    Downloads: 0 This Week
  • 17
    A basic Perl web spider with grandiose aspirations. Supports XML log file output and resumable spidering sessions.
    Downloads: 0 This Week
  • 18
    Wadsworth is a Java-based web scripting engine. It uses user-defined XML scripts to define its actions. It can be used as a web testing tool, as a web scraper, or to automate any web actions you wish. It can also be invoked and controlled by another
    Downloads: 0 This Week
  • 19
    dataflowkit

    Golang framework for scraping data from web pages

    Golang web scraper library for extracting data from web pages. Save results as CSV, JSON, or XML.
    Downloads: 0 This Week
  • 20
    Blackfire Player

    Web Crawling, Web Testing, and Web Scraping application

    Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses. Some Blackfire Player use cases: Crawl a website/API and check expectations -- aka Acceptance Tests; Scrape a website/API and extract values; Monitor a website; Test code with unit test integration (PHPUnit, Behat, Codeception, ...); Test code behavior from the outside thanks to the native...
    Downloads: 0 This Week