Browse free open source ETL tools and projects below. Use the toggles on the left to filter open source ETL tools by OS, license, language, programming language, and project status.

  • 1
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An open-source Python initiative from AWS Professional Services that extends the pandas library to AWS, connecting DataFrames with AWS data-related services. It integrates with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel). Built on top of other open-source projects such as pandas, Apache Arrow, and Boto3, it offers abstracted functions for common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. Example functions: convert column names to be compatible with Amazon Athena and the AWS Glue Catalog; run a query against CloudWatch Logs Insights and convert the results to a pandas DataFrame; get a QuickSight dashboard ID from its name, failing if more than one ID is associated with that name; list IAM policy assignments in the current Amazon QuickSight account.
    Downloads: 0 This Week
  • 2
    A Java-based Extract, Transform, Load (ETL) tool with the following features: 1. It can take data from any source to any destination, for example from a web crawler to a database or filesystem. 2. It's multithreaded and
    Downloads: 0 This Week
  • 3
    Concatenate successive lines within a text file, with an option to skip a number of subsequent line(s), and an option to insert a character or string between lines. Useful for turning multi-line log files into single line files (think CSV!)
    Downloads: 0 This Week
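The line-joining behaviour described for this tool can be illustrated with a minimal Python sketch. This is not the project's code: the function name `join_lines` and the parameters `group`, `skip`, and `sep` are hypothetical, and the reading of "skip" (drop a number of lines after each joined group) is an assumption about what the description means.

```python
from typing import Iterable, Iterator, List


def join_lines(lines: Iterable[str], group: int = 2, skip: int = 0,
               sep: str = ",") -> Iterator[str]:
    """Join every `group` successive lines with `sep`.

    After each joined group, the next `skip` input lines are dropped.
    All names and the exact skip semantics are illustrative assumptions.
    """
    if group < 1 or skip < 0:
        raise ValueError("group must be >= 1 and skip must be >= 0")
    buffer: List[str] = []
    to_skip = 0
    for raw in lines:
        if to_skip:
            # Still inside the skipped region that follows a joined group.
            to_skip -= 1
            continue
        buffer.append(raw.rstrip("\r\n"))
        if len(buffer) == group:
            yield sep.join(buffer)
            buffer = []
            to_skip = skip
    if buffer:
        # Emit a trailing partial group rather than silently dropping it.
        yield sep.join(buffer)
```

Called on a multi-line log where each record spans two lines followed by a blank separator line, `join_lines(f, group=2, skip=1)` would produce one comma-separated line per record.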
  • 4
    A command-line utility that reads a text file containing lines of data, cleans up any CR/LF anomalies, and writes the lines to standard output with clean CR/LF terminators. The binary is a 32-bit Windows console app.
    Downloads: 0 This Week
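A rough Python sketch of this kind of terminator clean-up follows. It is not the utility's implementation: the name `normalize_crlf` is hypothetical, and it assumes that an "anomaly" means a lone CR or a lone LF used where a CR/LF pair was expected.

```python
import re

# CRLF is listed first so that a well-formed pair counts as one terminator;
# a lone CR or a lone LF is then treated as a terminator of its own.
_EOL = re.compile(r"\r\n|\r|\n")


def normalize_crlf(text: str) -> str:
    """Return `text` with every line ending in exactly one CR/LF pair."""
    lines = _EOL.split(text)
    if lines and lines[-1] == "":
        # The input already ended with a terminator (or was empty),
        # so there is no extra final line to emit.
        lines.pop()
    return "".join(line + "\r\n" for line in lines)
```

When reading real files, opening them with `newline=""` (or in binary mode) keeps Python from translating line endings before the function sees them.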
  • 5
    TopGun Twitter Analytics is an open source data warehouse for collecting and analyzing Twitter topics. A topic is made up of one or more keywords.
    Downloads: 0 This Week
  • 6
    doXfolder

    Document Management System

    Document Management System created using JEE6
    Downloads: 0 This Week
  • 7
    Pypes is a framework that lets users break complex data processing logic down into a series of smaller, less complex tasks. These tasks, referred to as components, can then be connected so that the output of one becomes the input to another.
    Downloads: 0 This Week
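The component idea described here (small tasks chained so that one's output feeds the next) can be sketched with plain Python generators. This does not use or reproduce the Pypes API; `pipeline` and the three example components are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def strip_blank(records: Iterable[str]) -> Iterator[str]:
    """Component 1: drop empty or whitespace-only records."""
    for record in records:
        if record.strip():
            yield record


def split_fields(records: Iterable[str]) -> Iterator[List[str]]:
    """Component 2: split each record on commas and trim the fields."""
    for record in records:
        yield [field.strip() for field in record.split(",")]


def to_item(records: Iterable[List[str]]) -> Iterator[Dict[str, object]]:
    """Component 3: turn [name, qty] pairs into dictionaries."""
    for name, qty in records:
        yield {"name": name, "qty": int(qty)}


def pipeline(*components: Callable) -> Callable:
    """Connect components so each one consumes the previous one's output."""
    def run(source: Iterable) -> Iterator:
        stream = source
        for component in components:
            stream = component(stream)
        return stream
    return run


etl = pipeline(strip_blank, split_fields, to_item)
result = list(etl(["apple, 3", "   ", "pear,5"]))
```

Because each component is a generator, records flow through the chain one at a time, and a component can be swapped or reordered without touching the others.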
  • 8
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content over HTTP and extract relevant information. webStraktor features a scripting language that facilitates the collection, extraction, and storage of information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax, has a small instruction set, and is easy to master. The standard webStraktor output format is XML-based, encoded as ASCII, UTF-8, or ISO-8859-1 (Latin-1). webStraktor relies on the Apache HttpClient to retrieve content over HTTP. It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders, or bots by integrating scraping and crawling capabilities.
    Downloads: 0 This Week