EdgeLake/EdgeLake

EdgeLake

Transform edge nodes into a permissioned, decentralized network that enables real-time data management, monitoring, and querying without moving data off-device.

  • Deploy EdgeLake instances on nodes at the edge.
  • Enable data management services on each node.
  • Stream data from PLCs, sensors, and applications to the edge nodes.
  • Query the distributed data from a single point (as if the data were hosted in a centralized database).
  • Manage your edge resources from a single point (the network of nodes reflects a Single System Image).

Join the community

  • To track code changes, star the project.
  • For questions or support, join the community Slack channel.
  • Join our bi-weekly TSC meetings via our mailing list (only updates and event invites, no spam).

How it Works

  • By deploying EdgeLake on a node, the node joins a decentralized, P2P network of nodes.
  • Using a network protocol and a shared metadata layer, the nodes operate as a single machine that is optimized to capture, host, manage and query data at the edge.
  • The nodes share a metadata layer. The metadata includes policies that describe the schemas of the data, the data distribution, the participating nodes, security, data ownership, and more. The shared metadata is hosted in one of the following:
    • A member node is designated as a Master Node.
    • A blockchain (making the network fully decentralized).
  • Each node in the network is configured to provide data services. Examples of services:
    • Capture data via REST, MQTT, gRPC, and JSON files.
    • Host data in a local database (such as SQLite, PostgreSQL, or MongoDB).
    • Satisfy queries.

When an application issues a query, the query is delivered to one of the nodes in the network. This node orchestrates the query as follows: using the shared metadata, it determines which target nodes host the relevant data, transfers the query to those nodes, dynamically aggregates their replies, and returns a unified reply to the application. The process is similar to MapReduce, except that the target nodes are determined dynamically by the query and the shared metadata. Resource monitoring operates in a similar way.
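The orchestration flow can be illustrated with a small in-process simulation. This is a minimal sketch, not EdgeLake's implementation: the policy dictionaries, node names (`op-1`, `op-2`, `op-3`), and the functions `target_nodes`, `local_avg_partial`, and `query_avg` are all hypothetical. Each target node returns a partial `(sum, count)` pair rather than its own average. The orchestrator can then combine the pairs into an exact global average, which would not be possible by averaging the per-node averages.

```python
"""Sketch of a scatter-gather query over distributed nodes (hypothetical names)."""
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared-metadata policies: which operator node hosts which table.
POLICIES = [
    {"operator": {"name": "op-1", "tables": ["temperature"]}},
    {"operator": {"name": "op-2", "tables": ["temperature", "humidity"]}},
    {"operator": {"name": "op-3", "tables": ["humidity"]}},
]

# Hypothetical data hosted locally on each operator node.
LOCAL_DATA = {
    "op-1": {"temperature": [20.0, 22.0]},
    "op-2": {"temperature": [24.0], "humidity": [40.0, 45.0]},
    "op-3": {"humidity": [50.0]},
}


def target_nodes(table, policies):
    """Use the shared metadata to find the nodes that host `table`."""
    return [p["operator"]["name"] for p in policies
            if "operator" in p and table in p["operator"]["tables"]]


def local_avg_partial(node, table):
    """'Map' step, run on a target node: return a partial (sum, count)."""
    values = LOCAL_DATA[node].get(table, [])
    return sum(values), len(values)


def query_avg(table, policies=POLICIES):
    """Orchestrate AVG over `table` across all target nodes in parallel."""
    nodes = target_nodes(table, policies)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda n: local_avg_partial(n, table), nodes))
    # 'Reduce' step: combine the partial sums and counts into one reply.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    print(query_avg("temperature"))  # (20 + 22 + 24) / 3 = 22.0
```

In a real network, the thread pool would be replaced by messages sent over the TCP protocol to remote nodes. The shape of the computation (metadata lookup, parallel map, and a single reduce) stays the same.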

Deploying an EdgeLake node and making the node a member of a network is done as follows:

  • Download and install the EdgeLake software on the edge node.
  • Enable the services that determine the functionalities provided by the node.

Services are enabled by one or a combination of the following:

  • Issuing configuration commands using the node's Command Line Interface (CLI).
  • Listing configuration commands in script files and associating the node with the files.
  • Listing configuration commands in policies that are hosted in the shared metadata and associating the node with the policies.

The configured services determine the role of a node, which can be one or more of the following:

  • Operator Node - a node that captures data and hosts the data on a local DBMS. Data sources like devices, PLCs and applications deliver data to Operator Nodes for storage.
  • Query Node - a node that orchestrates a query process. Applications deliver their queries to Query Nodes, which interact with Operator Nodes (that host the data) to return a unified and complete reply for each query.
  • Master Node - a node that replaces a blockchain platform for storing metadata policies. The network metadata is organized in policies, and users can choose either a blockchain or a Master Node for metadata storage.
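Because a node can hold several roles at once, the roles behave like combinable flags rather than exclusive types. The following sketch models this with Python's `enum.Flag`. The service names in `SERVICE_TO_ROLE` and the `roles_for` helper are hypothetical illustrations, not EdgeLake configuration keys.

```python
from enum import Flag, auto


class Role(Flag):
    """Roles a node can play; a single node may combine several."""
    NONE = 0
    OPERATOR = auto()  # captures data and hosts it in a local DBMS
    QUERY = auto()     # orchestrates queries across operator nodes
    MASTER = auto()    # stores metadata policies instead of a blockchain


# Hypothetical service names mapped to the role each one implies.
SERVICE_TO_ROLE = {
    "local_dbms": Role.OPERATOR,
    "data_ingest": Role.OPERATOR,
    "query_orchestration": Role.QUERY,
    "metadata_store": Role.MASTER,
}


def roles_for(services):
    """Derive a node's combined role from the services enabled on it."""
    role = Role.NONE
    for service in services:
        role |= SERVICE_TO_ROLE.get(service, Role.NONE)  # unknown services add nothing
    return role


if __name__ == "__main__":
    print(roles_for(["local_dbms", "query_orchestration"]))
```

For example, a node that enables both local storage and query orchestration acts as an Operator Node and a Query Node at the same time.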

In a deployed network, devices, sensors, PLCs, and applications send their data to Operator Nodes. Data management on each Operator Node is automated.
Queries are satisfied by Query Nodes as if all the distributed data is managed in a centralized database.
The same setup monitors edge resources: for example, users and applications can monitor the CPU, network, and disk space of the distributed edge resources from a single point.
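Monitoring follows the same pattern as querying: each node reports on its own resources, and a single point merges the reports. The sketch below shows only the two ends of that flow, using the Python standard library. The report fields, the `node_status` and `low_disk_nodes` helpers, and the sample remote reports are hypothetical. In a real network, the reports would arrive from remote nodes rather than being built locally.

```python
import os
import shutil


def node_status(name, path="."):
    """Build a status report of the kind an edge node might return (hypothetical fields)."""
    usage = shutil.disk_usage(path)
    return {
        "node": name,
        "cpu_count": os.cpu_count(),
        "disk_free_pct": round(100 * usage.free / usage.total, 1),
    }


def low_disk_nodes(reports, threshold_pct=10.0):
    """Single-point view: names of nodes whose free disk space is below the threshold."""
    return sorted(r["node"] for r in reports if r["disk_free_pct"] < threshold_pct)


if __name__ == "__main__":
    reports = [
        node_status("local"),
        # Hypothetical reports, as if returned by two remote nodes.
        {"node": "op-1", "cpu_count": 4, "disk_free_pct": 7.5},
        {"node": "op-2", "cpu_count": 2, "disk_free_pct": 62.0},
    ]
    print(low_disk_nodes(reports))
```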

Download and Install

Detailed directions for installing EdgeLake can be found in the docker-compose repository.

Prepare Node(s):

  1. Install requirements:
    • Docker
    • docker-compose
    • make
sudo snap install docker
sudo apt-get -y install docker-compose
sudo apt-get -y install make

# Grant the non-root user permissions to use docker
# (-f makes groupadd succeed even if the docker group already exists)
sudo groupadd -f docker
sudo usermod -aG docker "$(whoami)"
newgrp docker
  2. Clone the EdgeLake docker-compose repository
git clone https://github.com/EdgeLake/docker-compose
cd docker-compose
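Before deploying, it can help to confirm that the prerequisites are available on the PATH. The following is a small sketch that is not part of EdgeLake. It also checks `git`, which the clone step needs. Note that newer Docker installations may provide Compose as the `docker compose` plugin instead of a separate `docker-compose` binary, in which case this check would report it as missing.

```python
import shutil

# Tools used by the install and deploy steps (git is needed for the clone).
REQUIRED_TOOLS = ("docker", "docker-compose", "make", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print(("Missing prerequisites: " + ", ".join(missing)) if missing
          else "All prerequisites found.")
```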

Deploy EdgeLake:

  1. Update the .env configurations for the node(s) being deployed, specifically LEDGER_CONN (the Master Node's TCP address) for Query and Operator Nodes
#--- General ---
# Which EdgeLake node configuration to enable. By default, even if everything is disabled, EdgeLake starts the TCP and REST connection services.
NODE_TYPE=master
# Name of the EdgeLake instance
NODE_NAME=anylog-master
# Owner of the EdgeLake instance
COMPANY_NAME=New Company

#--- Networking ---
# Port used by EdgeLake's TCP protocol to communicate with other nodes in the network
ANYLOG_SERVER_PORT=32048
# Port used by EdgeLake's REST protocol
ANYLOG_REST_PORT=32049
# Whether to bind the TCP service to a specific IP and port (false binds to all IPs)
TCP_BIND=false
# Whether to bind the REST service to a specific IP and port (false binds to all IPs)
REST_BIND=false

#--- Blockchain ---
# TCP connection information for Master Node
LEDGER_CONN=127.0.0.1:32048

#--- Advanced Settings ---
# Whether to automatically run a local (or personalized) script at the end of the process
DEPLOY_LOCAL_SCRIPT=false
  2. Start the node using make
make up [NODE_TYPE]

# examples
make up master
make up operator
make up query
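A typo in the .env file usually surfaces only after the container starts. The sketch below parses the KEY=VALUE format shown above and checks the port and LEDGER_CONN settings. It then builds a REST status request for the node. Everything here is illustrative: `parse_env`, `validate`, and `status_request` are hypothetical helpers. Sending the command `get status` in a `command` header, with an `AnyLog/1.23` User-Agent, is an assumption based on EdgeLake/AnyLog REST examples, so verify it against the documentation for your version.

```python
import urllib.request


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def validate(env):
    """Return a list of problems in the networking settings (empty if none)."""
    problems = []
    ports = {}
    for key in ("ANYLOG_SERVER_PORT", "ANYLOG_REST_PORT"):
        value = env.get(key, "")
        if not value.isdigit() or not 1 <= int(value) <= 65535:
            problems.append(f"{key} is not a valid port: {value!r}")
        else:
            ports[key] = int(value)
    if len(set(ports.values())) < len(ports):
        problems.append("TCP and REST ports must differ")
    host, sep, port = env.get("LEDGER_CONN", "").rpartition(":")
    if not sep or not host or not port.isdigit():
        problems.append("LEDGER_CONN must be in IP:PORT form")
    return problems


def status_request(env, host="127.0.0.1"):
    """Build a REST request asking the node for its status.

    Assumption: the node reads the command from a 'command' header and
    accepts the User-Agent used in EdgeLake/AnyLog REST examples.
    """
    url = f"http://{host}:{env['ANYLOG_REST_PORT']}"
    return urllib.request.Request(
        url, headers={"command": "get status", "User-Agent": "AnyLog/1.23"})
```

Against a running node, the request could be sent with `urllib.request.urlopen(status_request(env), timeout=5).read()`.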

Prerequisites and Setup Considerations

Feature requirements:

  • Operating System: Linux (Ubuntu, RedHat, Alpine, SUSE), Windows, OSX.
  • Memory footprint:
    • 100 MB available for EdgeLake deployed without Docker.
    • 300 MB available for EdgeLake deployed with Docker.
  • Databases:
    • PostgreSQL installed (optional).
    • SQLite (default, no need to install).
    • MongoDB installed (only if blob storage is needed).
  • CPU: Intel, ARM, and AMD are supported. EdgeLake can be deployed on anything from a single-CPU machine to the largest servers, including gateways, Raspberry Pi, and large multi-core machines.
  • Storage: EdgeLake supports horizontal scaling. Nodes (and storage) are added dynamically as needed, which simplifies scaling considerations. Requirements depend on the expected volume and retention period of the data on each node. EdgeLake supports automated archival and transfer to larger nodes (if needed).
  • Network:
    • Required: a TCP-based network (local TCP-based networks, the internet, and combinations of both are supported).
    • An overlay network is recommended. Most overlay networks can be used transparently. Nebula is used as the default overlay network.
    • A static IP and 3 ports open and accessible on each node (either via an overlay network or without one).
  • Cloud integration: Built-in integration using REST, Pub-Sub, and Kafka.
  • Deployment options: Executable (can be deployed as a background process), Docker, or Kubernetes.

Comments:

  • Databases:

    • SQLite - recommended for smaller nodes and in-memory data.
    • PostgreSQL - recommended for larger nodes.
    • MongoDB - used for blob storage.
    • Multiple databases can be deployed and used on the same node.
  • Network: An overlay network is recommended for the following reasons:

    • It isolates the network for security purposes.
    • It manages IP and port availability. Without an overlay network, users need to configure and manage the availability of the IPs and ports used.
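Whether or not an overlay network is used, each node's ports must be reachable from its peers. The following sketch, which is not part of EdgeLake, attempts plain TCP connections with `socket.create_connection`. The default ports come from the example .env above (32048 for TCP, 32049 for REST). Pass the third port your deployment uses explicitly, since it is not listed in that example.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_node(host, ports=(32048, 32049)):
    """Map each port to whether it is reachable on `host`."""
    return {port: port_open(host, port) for port in ports}
```

For example, `check_node("10.0.0.5", ports=(32048, 32049, <third port>))` run from a peer machine shows which ports a firewall or missing overlay route is blocking.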
