GraphYML with Dash

A graph-based data management system for YAML files with embedding and querying capabilities, now with a Dash web interface.

Features

Store and manage graph data in YAML files
Index data for fast querying
Generate embeddings for semantic search
Query data using a simple query language
Automatically link related nodes
Find similar nodes using embeddings
Comprehensive logging system for debugging
Web interface for managing nodes and relationships
User authentication and permission management
Backup and restore functionality

Modules

1. Indexing Module

The indexing module provides classes for indexing and searching graph data:

BaseIndex: Base class for all indexes
HashIndex: Hash-based index for exact matches
BTreeIndex: B-tree index for range queries
FullTextIndex: Full-text index for text search
VectorIndex: Vector index for embedding similarity search
IndexManager: Manager for multiple indexes
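
To illustrate why exact-match and range lookups get separate index types, here is a minimal sketch. It is not GraphYML's implementation: the class names `TinyHashIndex` and `TinyRangeIndex`, the sample graph, and the use of a sorted list in place of a real B-tree are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TinyHashIndex:
    """Hypothetical exact-match index: field value -> set of node ids."""

    def __init__(self, field):
        self.field = field
        self.buckets = defaultdict(set)

    def build(self, graph):
        for node_id, node in graph.items():
            if self.field in node:
                self.buckets[node[self.field]].add(node_id)

    def search(self, value):
        return sorted(self.buckets.get(value, set()))


class TinyRangeIndex:
    """Hypothetical range index: a sorted list standing in for a B-tree."""

    def __init__(self, field):
        self.field = field
        self.entries = []  # sorted (value, node_id) pairs

    def build(self, graph):
        self.entries = sorted(
            (node[self.field], node_id)
            for node_id, node in graph.items()
            if self.field in node
        )

    def search(self, low, high):
        # Binary-search the slice of entries whose value lies in [low, high].
        values = [value for value, _ in self.entries]
        start = bisect_left(values, low)
        end = bisect_right(values, high)
        return [node_id for _, node_id in self.entries[start:end]]


# Sample data invented for the example.
graph = {
    "a": {"genre": "sci-fi", "year": 1979},
    "b": {"genre": "drama", "year": 1994},
    "c": {"genre": "sci-fi", "year": 2014},
}

by_genre = TinyHashIndex("genre")
by_genre.build(graph)
by_year = TinyRangeIndex("year")
by_year.build(graph)
```

The dictionary answers equality lookups in constant time but cannot answer "between" questions, which is what the ordered structure is for.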

2. Embeddings Module

The embeddings module provides classes and functions for generating and working with embeddings:

EmbeddingGenerator: Class for generating embeddings
embedding_similarity: Function for calculating cosine similarity between embeddings
batch_generate_embeddings: Function for generating embeddings for all nodes in a graph
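
Cosine similarity, which `embedding_similarity` is described as computing, can be sketched in a few lines of standard-library Python. The function name `cosine_similarity` and the handling of zero vectors are assumptions for this example, not the project's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

The result ranges from -1 (opposite direction) through 0 (orthogonal) to 1 (same direction), regardless of vector length.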

3. Graph Operations Module

The graph operations module provides functions for working with graph data:

auto_link_nodes: Function for automatically linking related nodes
tag_similarity: Function for calculating similarity between tag lists
a_star: Function for finding the shortest path between nodes
reconstruct_path: Function for reconstructing a path from a search
find_similar_nodes: Function for finding nodes similar to a given node
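
The pairing of `a_star` with `reconstruct_path` follows the usual shape of A* search: the search records each node's predecessor, and a second step walks those links backwards. The sketch below shows that shape under stated assumptions. The names `a_star_sketch` and `reconstruct`, the adjacency-dictionary layout, and the edge costs are hypothetical and may not match GraphYML's signatures.

```python
import heapq


def reconstruct(came_from, goal):
    """Walk predecessor links back from the goal to the start."""
    path = [goal]
    while path[-1] in came_from:
        path.append(came_from[path[-1]])
    return path[::-1]


def a_star_sketch(edges, start, goal, heuristic=lambda node: 0.0):
    """Return the cheapest path from start to goal, or None if unreachable.

    `edges` maps a node id to {neighbor_id: cost}; this layout is an
    assumption made for the example. Costs must be non-negative. With the
    default zero heuristic the search behaves like Dijkstra's algorithm.
    """
    best_cost = {start: 0.0}
    came_from = {}
    frontier = [(heuristic(start), start)]  # (estimated total cost, node id)
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            return reconstruct(came_from, goal)
        for neighbor, cost in edges.get(current, {}).items():
            candidate = best_cost[current] + cost
            if candidate < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = candidate
                came_from[neighbor] = current
                heapq.heappush(
                    frontier, (candidate + heuristic(neighbor), neighbor)
                )
    return None


# Sample graph invented for the example.
edges = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"c": 1.0, "d": 5.0},
    "c": {"d": 1.0},
}
path = a_star_sketch(edges, "a", "d")
```

Here the direct edges `a -> c` and `b -> d` are more expensive than the detour through every node, so the search prefers the longer but cheaper route.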

4. Query Engine Module

The query engine module provides classes and functions for querying graph data:

Condition: Class for representing a query condition
Query: Class for representing a query
QueryParser: Class for parsing query strings
query_graph: Function for querying a graph using a query string
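
This README does not show the query language's syntax, so the sketch below does not try to parse it. It only illustrates the general idea behind a condition object: a field, an operator, and a value that can be evaluated against each node. `SimpleCondition`, `filter_nodes`, the operator table, and the sample nodes are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical operator table; GraphYML's real query language may differ.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "contains": lambda field_value, wanted: wanted in field_value,
}


@dataclass
class SimpleCondition:
    field: str
    op: str
    value: Any

    def matches(self, node):
        # A node that lacks the field never matches.
        if self.field not in node:
            return False
        return OPERATORS[self.op](node[self.field], self.value)


def filter_nodes(graph, conditions):
    """Return ids of nodes that satisfy every condition (logical AND)."""
    return sorted(
        node_id
        for node_id, node in graph.items()
        if all(cond.matches(node) for cond in conditions)
    )


# Sample data invented for the example.
graph = {
    "n1": {"title": "Alien", "year": 1979, "tags": ["sci-fi", "horror"]},
    "n2": {"title": "Arrival", "year": 2016, "tags": ["sci-fi"]},
    "n3": {"title": "Heat", "year": 1995, "tags": ["crime"]},
}
matches = filter_nodes(
    graph,
    [
        SimpleCondition("tags", "contains", "sci-fi"),
        SimpleCondition("year", ">", 2000),
    ],
)
```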

5. Data Handler Module

The data handler module provides functions for loading and saving graph data:

validate_node_schema: Function for validating a node against a schema
load_graph_from_folder: Function for loading graph data from a folder of YAML files
save_node_to_yaml: Function for saving a node to a YAML file
create_zip: Function for creating a ZIP file from a folder
flatten_node: Function for flattening a node by combining text fields
query_by_tag: Function for querying a graph by tag
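
As a rough illustration of what flattening a node and querying by tag involve, here is a sketch using plain dictionaries. The helper names `flatten_text` and `nodes_with_tag` are hypothetical, the `tags` key is an assumption about the node schema, and the default text fields are borrowed from the `batch_generate_embeddings` example in this README.

```python
def flatten_text(node, text_fields=("title", "overview", "description")):
    """Join the node's non-empty text fields into one string."""
    parts = []
    for field in text_fields:
        value = node.get(field)
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return "\n".join(parts)


def nodes_with_tag(graph, tag):
    """Return ids of nodes whose 'tags' list contains the tag."""
    return sorted(
        node_id for node_id, node in graph.items() if tag in node.get("tags", [])
    )


# Sample data invented for the example.
graph = {
    "n1": {"title": " Alien ", "overview": "", "description": "A crew in peril.",
           "tags": ["sci-fi", "horror"]},
    "n2": {"title": "Heat", "tags": ["crime"]},
}
flattened = flatten_text(graph["n1"])
tagged = nodes_with_tag(graph, "sci-fi")
```

Flattened text of this kind is a natural input for an embedding generator, since most embedding models take a single string.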

Installation

Prerequisites

Python 3.9 or higher
Docker and Docker Compose (optional, for containerized deployment)

Option 1: Local Installation

Clone the repository:

```
git clone https://github.com/BDR-Pro/GraphYML.git
cd GraphYML
```

Install dependencies:
```
pip install -r requirements_dash.txt
```
Run the application:
```
python run_dash_app.py
```
Open your browser and navigate to http://localhost:8050

Option 2: Docker Deployment

Clone the repository:

```
git clone https://github.com/BDR-Pro/GraphYML.git
cd GraphYML
```

Build and run with Docker Compose:
```
docker-compose up -d
```
Open your browser and navigate to http://localhost:8050

Usage

Authentication

Default admin credentials: username admin, password admin
Create new users through the User Management interface

Managing Nodes

Navigate to the Node Editor to edit existing nodes
Use the Create Node interface to add new nodes
Link nodes by adding references in the node content

Querying

Use the Query Interface to search for nodes
Perform text search, criteria-based search, or similarity search

Visualization

Navigate to the Visualization tab
Choose between clustering or interactive network visualization

Backup and Restore

Navigate to the Management tab
Use the Backup & Restore interface to create or restore backups

Embedding LLMs

Overview

The embedding module supports multiple embedding providers:

Ollama: Local embedding generation using Ollama API
OpenAI: Cloud-based embedding generation using OpenAI API
Sentence Transformers: Local embedding generation using Sentence Transformers library
Fallback: Random embedding generation as a last resort
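
One way to picture the provider order described above is a simple fallback chain: try each provider in turn and only generate a random vector when all of them fail. The sketch below is an illustration under assumptions, not the project's code. The function `embed_with_fallback`, the `(name, callable)` provider convention, and the choice to return the collected errors alongside a fallback vector are all hypothetical.

```python
import random


def embed_with_fallback(text, providers, dimension=384, allow_fallback=True):
    """Try each provider in order and return an (embedding, error) pair.

    `providers` is a list of (name, callable) pairs; each callable takes the
    text and returns a list of floats or raises. This calling convention is
    an assumption made for the example.
    """
    errors = []
    for name, provider in providers:
        try:
            return provider(text), None
        except Exception as exc:  # real code would catch narrower errors
            errors.append(f"{name}: {exc}")
    message = "; ".join(errors)
    if allow_fallback:
        # Random vectors keep a pipeline running but carry no meaning.
        rng = random.Random(text)  # seeded on the text so runs are repeatable
        return [rng.uniform(-1.0, 1.0) for _ in range(dimension)], message
    return None, message


def unavailable(text):
    """Stand-in for a provider whose service cannot be reached."""
    raise ConnectionError("service not reachable")


def tiny_local_model(text):
    """Stand-in for a working local model."""
    return [float(len(text)), 1.0]


embedding, error = embed_with_fallback(
    "hello", [("ollama", unavailable), ("local", tiny_local_model)]
)
```

Because a random embedding has no semantic content, similarity scores computed from it are noise; the fallback is best treated as a way to keep development environments working rather than as a search-quality option.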

Configuration

You can configure the embedding generator using environment variables or a configuration dictionary:

```python
import os

# EmbeddingGenerator is provided by the project's embeddings module.

# Option 1: environment variables
os.environ["OLLAMA_URL"] = "http://localhost:11434"
os.environ["OLLAMA_MODEL"] = "all-minilm-l6-v2"
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_EMBEDDING_MODEL"] = "text-embedding-3-small"
os.environ["ST_MODEL"] = "all-MiniLM-L6-v2"

# Option 2: configuration dictionary
config = {
    "ollama_url": "http://localhost:11434",
    "ollama_model": "all-minilm-l6-v2",
    "openai_api_key": "your-api-key",
    "openai_embedding_model": "text-embedding-3-small",
    "st_model": "all-MiniLM-L6-v2",
    "embedding_dimension": 384,  # matches all-MiniLM-L6-v2
    "allow_fallback": True,  # permit random embeddings as a last resort
}

embedding_generator = EmbeddingGenerator(config)
```
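
The README does not say what happens when both sources set the same option. As a sketch of one plausible policy, the helper below lets an explicit dictionary override the environment. The function `resolve_config`, the `ENV_NAMES` mapping, and the precedence rule are assumptions for illustration; check the project's configuration code for the actual behavior.

```python
import os

# Hypothetical mapping from config keys to the environment variables
# named in this section.
ENV_NAMES = {
    "ollama_url": "OLLAMA_URL",
    "ollama_model": "OLLAMA_MODEL",
    "openai_api_key": "OPENAI_API_KEY",
    "openai_embedding_model": "OPENAI_EMBEDDING_MODEL",
    "st_model": "ST_MODEL",
}


def resolve_config(overrides=None, environ=None):
    """Merge environment variables with an explicit dictionary.

    Assumed precedence for this sketch: explicit dictionary > environment.
    """
    environ = os.environ if environ is None else environ
    resolved = {
        key: environ[name] for key, name in ENV_NAMES.items() if name in environ
    }
    resolved.update(overrides or {})
    return resolved


config = resolve_config(
    overrides={"st_model": "all-mpnet-base-v2", "embedding_dimension": 768},
    environ={"OLLAMA_URL": "http://localhost:11434", "ST_MODEL": "all-MiniLM-L6-v2"},
)
```

Passing `environ` explicitly, as in the example, keeps the helper easy to test without touching the real process environment.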

Generating Embeddings

```python
# Generate embedding for a single text
text = "This is a test text for embedding generation."
embedding, error = embedding_generator.generate_embedding(text)

# Generate embeddings for all nodes in a graph
updated_graph, errors = batch_generate_embeddings(
    graph,
    embedding_generator,
    text_fields=["title", "overview", "description"],
    force_update=False
)
```

Embedding Models

Ollama Models

all-minilm-l6-v2: Fast and efficient embedding model
nomic-embed-text: High-quality text embeddings
mxbai-embed-large: Large, high-accuracy English embedding model

OpenAI Models

text-embedding-3-small: Fast and cost-effective embeddings (1536 dimensions)
text-embedding-3-large: High-quality embeddings (3072 dimensions)
text-embedding-ada-002: Legacy model (1536 dimensions)

Sentence Transformers Models

all-MiniLM-L6-v2: Fast and efficient embedding model (384 dimensions)
all-mpnet-base-v2: High-quality embeddings (768 dimensions)
paraphrase-multilingual-MiniLM-L12-v2: Multilingual embedding model (384 dimensions)

Embedding Similarity Search

```python
# Create a vector index over each node's "embedding" field
index = VectorIndex("embedding_index", "embedding")

# Build the index
index.build(graph)

# Search for similar embeddings
# (query_embedding is an embedding you have already generated)
results = index.search(query_embedding, threshold=0.7, limit=10)

# Find nodes similar to the node with the given ID
similar_nodes = find_similar_nodes(
    graph,
    node_id,
    similarity_threshold=0.7,
    max_results=10
)
```
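
To make the `threshold` and `limit` parameters concrete, here is a brute-force sketch of a similarity search: score every node, drop anything below the threshold, sort by score, and keep the top results. It is an illustration only. The function `brute_force_search`, its `(node_id, score)` return format, and the sample graph are assumptions, and a real vector index may use a faster data structure.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def brute_force_search(graph, query_embedding, threshold=0.7, limit=10,
                       field="embedding"):
    """Score every node that has an embedding and keep the best matches."""
    scored = []
    for node_id, node in graph.items():
        embedding = node.get(field)
        if not embedding or len(embedding) != len(query_embedding):
            continue  # skip nodes without a comparable embedding
        score = cosine(query_embedding, embedding)
        if score >= threshold:
            scored.append((node_id, score))
    # Highest similarity first, then truncate to the requested size.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Sample data invented for the example.
graph = {
    "a": {"embedding": [1.0, 0.0]},
    "b": {"embedding": [0.9, 0.1]},
    "c": {"embedding": [0.0, 1.0]},
    "d": {"title": "no embedding yet"},
}
results = brute_force_search(graph, [1.0, 0.0], threshold=0.7, limit=10)
```

In this example node `c` is orthogonal to the query and falls below the threshold, while node `d` is skipped because it has no embedding.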

Development

Project Structure

src/dash_app.py: Main Dash application
src/models/: Core data models
src/visualization/: Graph visualization utilities
src/config/: Configuration management
src/utils/: Utility functions

Running Tests

```
pytest
```

License

This project is licensed under the MIT License - see the LICENSE file for details.
