# GraphYML

A graph-based data management system for YAML files with embedding and querying capabilities, now with a Dash web interface.

## Features

- Store and manage graph data in YAML files
- Index data for fast querying
- Generate embeddings for semantic search
- Query data using a simple query language
- Automatically link related nodes
- Find similar nodes using embeddings
- Comprehensive logging system for debugging
- Web interface for managing nodes and relationships
- User authentication and permission management
- Backup and restore functionality
## Modules

### Indexing

The indexing module provides classes for indexing and searching graph data:

- `BaseIndex`: Base class for all indexes
- `HashIndex`: Hash-based index for exact matches
- `BTreeIndex`: B-tree index for range queries
- `FullTextIndex`: Full-text index for text search
- `VectorIndex`: Vector index for embedding similarity search
- `IndexManager`: Manager for multiple indexes
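As an illustration of how an exact-match index works, here is a minimal standalone sketch; the class and method names below are illustrative, not the project's actual API:

```python
from collections import defaultdict

class SimpleHashIndex:
    """Dict-backed exact-match index over one node field (illustrative only)."""

    def __init__(self, field):
        self.field = field
        self._buckets = defaultdict(set)  # field value -> set of node ids

    def build(self, graph):
        # graph: dict mapping node_id -> node dict
        for node_id, node in graph.items():
            if self.field in node:
                self._buckets[node[self.field]].add(node_id)

    def search(self, value):
        # Exact-match lookup in O(1) average time
        return self._buckets.get(value, set())

graph = {
    "n1": {"genre": "sci-fi", "title": "Dune"},
    "n2": {"genre": "sci-fi", "title": "Solaris"},
    "n3": {"genre": "drama", "title": "Amour"},
}
index = SimpleHashIndex("genre")
index.build(graph)
print(sorted(index.search("sci-fi")))  # ['n1', 'n2']
```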
### Embeddings

The embeddings module provides classes and functions for generating and working with embeddings:

- `EmbeddingGenerator`: Class for generating embeddings
- `embedding_similarity`: Function for calculating cosine similarity between embeddings
- `batch_generate_embeddings`: Function for generating embeddings for all nodes in a graph
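The cosine similarity that `embedding_similarity` computes can be written in a few lines; this standalone version is for illustration and is not the project's implementation:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))  # 1.0
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```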
### Graph operations

The graph operations module provides functions for working with graph data:

- `auto_link_nodes`: Function for automatically linking related nodes
- `tag_similarity`: Function for calculating similarity between tag lists
- `a_star`: Function for finding the shortest path between nodes
- `reconstruct_path`: Function for reconstructing a path from a search
- `find_similar_nodes`: Function for finding nodes similar to a given node
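A common choice for tag-list similarity is the Jaccard index over the two tag sets. Whether `tag_similarity` uses exactly this metric is an assumption, but it illustrates the idea:

```python
def jaccard_tag_similarity(tags_a, tags_b):
    # Jaccard similarity: |intersection| / |union|, in [0, 1].
    # This is a sketch; the project's actual metric may differ.
    set_a, set_b = set(tags_a), set(tags_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

print(jaccard_tag_similarity(["sci-fi", "space"], ["sci-fi", "drama"]))  # ≈ 0.333
```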
### Query engine

The query engine module provides classes and functions for querying graph data:

- `Condition`: Class for representing a query condition
- `Query`: Class for representing a query
- `QueryParser`: Class for parsing query strings
- `query_graph`: Function for querying a graph using a query string
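To show what condition-based querying looks like, here is a minimal standalone evaluator; the tuple-based condition format, operator set, and implicit AND are assumptions, not the project's actual query syntax:

```python
import operator

# Supported comparison operators (illustrative subset)
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}

def evaluate_condition(node, field, op, value):
    # A condition holds only if the field exists and the comparison is true
    return field in node and OPS[op](node[field], value)

def match_all(graph, conditions):
    # Return ids of nodes satisfying every condition (implicit AND)
    return [
        node_id for node_id, node in graph.items()
        if all(evaluate_condition(node, f, op, v) for f, op, v in conditions)
    ]

graph = {
    "n1": {"genre": "sci-fi", "year": 1979},
    "n2": {"genre": "sci-fi", "year": 2014},
    "n3": {"genre": "drama", "year": 2012},
}
print(match_all(graph, [("genre", "==", "sci-fi"), ("year", ">", 2000)]))  # ['n2']
```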
### Data handler

The data handler module provides functions for loading and saving graph data:

- `validate_node_schema`: Function for validating a node against a schema
- `load_graph_from_folder`: Function for loading graph data from a folder of YAML files
- `save_node_to_yaml`: Function for saving a node to a YAML file
- `create_zip`: Function for creating a ZIP file from a folder
- `flatten_node`: Function for flattening a node by combining text fields
- `query_by_tag`: Function for querying a graph by tag
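For example, `flatten_node` presumably concatenates a node's text fields into a single string for full-text indexing or embedding. A minimal sketch under that assumption (the default field names are taken from the embedding example later in this README):

```python
def flatten_node(node, text_fields=("title", "overview", "description")):
    # Combine the node's non-empty text fields into one string.
    # Field names and join behavior are assumptions for illustration.
    parts = [str(node[f]) for f in text_fields if node.get(f)]
    return " ".join(parts)

node = {"title": "Dune", "overview": "A desert planet.", "year": 1965}
print(flatten_node(node))  # Dune A desert planet.
```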
## Requirements

- Python 3.9 or higher
- Docker and Docker Compose (optional, for containerized deployment)
## Installation

### Local installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/GraphYML.git
   cd GraphYML
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements_dash.txt
   ```

3. Run the application:

   ```bash
   python run_dash_app.py
   ```

4. Open your browser and navigate to `http://localhost:8050`.
### Docker deployment

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/GraphYML.git
   cd GraphYML
   ```

2. Build and run with Docker Compose:

   ```bash
   docker-compose up -d
   ```

3. Open your browser and navigate to `http://localhost:8050`.
## Usage

- Default admin credentials: username `admin`, password `admin`
- Create new users through the User Management interface
- Navigate to the Node Editor to edit existing nodes
- Use the Create Node interface to add new nodes
- Link nodes by adding references in the node content
- Use the Query Interface to search for nodes
- Perform text search, criteria-based search, or similarity search
- Navigate to the Visualization tab
- Choose between clustering or interactive network visualization
- Navigate to the Management tab
- Use the Backup & Restore interface to create or restore backups
## Embedding providers

The embeddings module supports multiple embedding providers:
- Ollama: Local embedding generation using Ollama API
- OpenAI: Cloud-based embedding generation using OpenAI API
- Sentence Transformers: Local embedding generation using Sentence Transformers library
- Fallback: Random embedding generation as a last resort
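The provider ordering and fallback behavior described above can be sketched as follows; the function name and the callable-based provider interface are hypothetical, and the real `EmbeddingGenerator` may differ:

```python
import random

def generate_embedding_with_fallback(text, providers, dimension=384, allow_fallback=True):
    # providers: ordered list of callables (text -> list[float]) that raise
    # on failure, e.g. Ollama first, then OpenAI, then Sentence Transformers.
    last_error = None
    for provider in providers:
        try:
            return provider(text), None
        except Exception as exc:
            last_error = str(exc)  # remember why this provider failed, try the next
    if allow_fallback:
        # Last resort: deterministic pseudo-random vector seeded by the text
        rng = random.Random(text)
        return [rng.uniform(-1.0, 1.0) for _ in range(dimension)], None
    return None, last_error

def unavailable_provider(text):
    raise RuntimeError("provider not reachable")

embedding, error = generate_embedding_with_fallback("hello", [unavailable_provider])
print(len(embedding), error)  # 384 None
```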
You can configure the embedding generator using environment variables or a configuration dictionary:

```python
import os

# Using environment variables
os.environ["OLLAMA_URL"] = "http://localhost:11434"
os.environ["OLLAMA_MODEL"] = "all-minilm-l6-v2"
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_EMBEDDING_MODEL"] = "text-embedding-3-small"
os.environ["ST_MODEL"] = "all-MiniLM-L6-v2"

# Using a configuration dictionary
config = {
    "ollama_url": "http://localhost:11434",
    "ollama_model": "all-minilm-l6-v2",
    "openai_api_key": "your-api-key",
    "openai_embedding_model": "text-embedding-3-small",
    "st_model": "all-MiniLM-L6-v2",
    "embedding_dimension": 384,
    "allow_fallback": True,
}
embedding_generator = EmbeddingGenerator(config)

# Generate embedding for a single text
text = "This is a test text for embedding generation."
embedding, error = embedding_generator.generate_embedding(text)

# Generate embeddings for all nodes in a graph
updated_graph, errors = batch_generate_embeddings(
    graph,
    embedding_generator,
    text_fields=["title", "overview", "description"],
    force_update=False,
)
```

### Supported models

**Ollama**

- `all-minilm-l6-v2`: Fast and efficient embedding model
- `nomic-embed-text`: High-quality text embeddings
- `mxbai-embed-large`: Multilingual embedding model

**OpenAI**

- `text-embedding-3-small`: Fast and cost-effective embeddings (1536 dimensions)
- `text-embedding-3-large`: High-quality embeddings (3072 dimensions)
- `text-embedding-ada-002`: Legacy model (1536 dimensions)

**Sentence Transformers**

- `all-MiniLM-L6-v2`: Fast and efficient embedding model (384 dimensions)
- `all-mpnet-base-v2`: High-quality embeddings (768 dimensions)
- `paraphrase-multilingual-MiniLM-L12-v2`: Multilingual embedding model (384 dimensions)
Embeddings can then be indexed and searched:

```python
# Create a vector index
index = VectorIndex("embedding_index", "embedding")

# Build the index
index.build(graph)

# Search for similar embeddings
results = index.search(query_embedding, threshold=0.7, limit=10)

# Find similar nodes
similar_nodes = find_similar_nodes(
    graph,
    node_id,
    similarity_threshold=0.7,
    max_results=10,
)
```

## Project structure

- `src/dash_app.py`: Main Dash application
- `src/models/`: Core data models
- `src/visualization/`: Graph visualization utilities
- `src/config/`: Configuration management
- `src/utils/`: Utility functions
## Testing

Run the test suite with:

```bash
pytest
```

## License

This project is licensed under the MIT License; see the LICENSE file for details.