A power user focused interface for LLM base models, inspired by the designs of loom, loomsidian, exoloom, logitloom, wool, and mikupad.
- Some documents may cause the text editor to render token boundaries incorrectly
  - This is due to a bug in egui regarding textedit underline rendering
- Tab bars are not read by screen readers
  - This is due to a bug in egui_tiles
- UI state for closed weaves persists in memory after the weave editor UI is closed, creating a slow memory leak
  - This is due to a bug in egui
- CPU usage is high when the window is not visible and not minimized
  - This is due to a bug in egui
- Editor subview tooltips take longer to close after scrolling
  - This is due to a limitation of egui
- Root nodes containing long text may overlap in the canvas view
If you are experiencing an issue not listed here or in this repository's active issues, please file an issue so that it can be fixed.
Important
This application is a work in progress; please make backups and report any bugs that you find.
Compiled binaries can be found on the releases page.
Before using the app, you will need to run the following CLI command in the extracted folder:
```
xattr -d com.apple.quarantine tapestry*
```

Building from source requires the Rust Programming Language and a working C compiler to be installed.
```
git clone --recurse-submodules https://github.com/transkatgirl/Tapestry-Loom.git
cd Tapestry-Loom
cargo build --release
```

The compiled binary can be found in the `./target/release/` folder.
Run the following commands in the repository folder:
```
git pull
git submodule update --init --recursive
cargo build --release
```

See Getting Started for more information on how to use the application.
The rest of this README covers the usage of external tools which Tapestry Loom can interface with.
See migration-assistant for more information on how to migrate weaves from other Loom implementations to Tapestry Loom.
llama.cpp's llama-server is recommended, as it has been confirmed to work properly with all of the features within Tapestry Loom (except returning prompt logprobs).
vLLM requires additional request arguments to work properly with Tapestry Loom:
- `/v1/completions`
  - `return_token_ids=true` - Optional; allows (partial) reuse of output token IDs when using Tapestry Tokenize. However, (unlike llama.cpp) token IDs are only returned for the selected token, not for all `top_logprobs`.
    - Must be removed when using `echo=true`
- `/v1/chat/completions`
  - `return_token_ids=true` - Optional; allows (partial) reuse of output token IDs when using Tapestry Tokenize. However, (unlike llama.cpp) token IDs are only returned for the selected token, not for all `top_logprobs`.
  - `continue_final_message=true`
  - `add_generation_prompt=false`
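As a rough illustration of where these arguments fit, the request sketches below assume a local vLLM server on its default port; the model name, prompt, and token counts are placeholders. Tapestry Loom builds these requests itself, so this only shows where the extra fields go.

```
# Completion request with logprobs plus vLLM's return_token_ids argument.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-base-model",
    "prompt": "Once upon a time",
    "max_tokens": 32,
    "logprobs": 5,
    "return_token_ids": true
  }'
# Drop "return_token_ids" from the body if the request also sets "echo": true.

# Chat-style request; continue_final_message/add_generation_prompt make vLLM
# continue the final assistant message instead of appending a new turn.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-base-model",
    "messages": [
      {"role": "user", "content": "Write a short story."},
      {"role": "assistant", "content": "Once upon a time"}
    ],
    "max_tokens": 32,
    "return_token_ids": true,
    "continue_final_message": true,
    "add_generation_prompt": false
  }'
```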
Ollama should not be used, due to poor default sampling settings that cannot be overridden in API requests and a lack of available base models.
KoboldCpp is not recommended due to a lack of request queuing and a poor implementation of logprobs (the number of requested logprobs is entirely ignored).
LM Studio is not recommended due to a lack of support for logprobs.
The recommended CLI arguments for llama-server are listed below:
```
llama-server --embeddings --models-dir $MODEL_DIRECTORY --models-max 1 --sleep-idle-seconds 1200 --jinja --chat-template "message.content" --ctx-size 4096 --temp 1 --top-k 0 --top-p 1 --min-p 0
```

Where `$MODEL_DIRECTORY` is set to the directory where model gguf files are stored.
(Regarding quantization: Benchmarks of how chat models are affected by quantization likely do not generalize to how base models are used. Quantize models as little as reasonably possible, but q8_0 is likely good enough for most use cases.)
Explanation of arguments:
- Only one model is loaded into VRAM at a time; old models are automatically unloaded to make room for new ones
  - If you plan on using an embedding model, you should start a second server instance to avoid swapping out your text generation model when generating embeddings (see the example below)
- Models are automatically unloaded after 20 minutes of inactivity
- The specified chat template passes user input directly to the model without further changes.
- Reducing the maximum context length helps reduce VRAM usage without sacrificing quality.
- The default sampling parameters (those specified by the CLI arguments) should leave the model's output distribution unchanged. Sampling parameter defaults for chat models do not generalize to how base models are used.
  - The sampling parameters specified in the CLI arguments will be overridden by any sampling parameters that are specified in a request.
Additional useful arguments (depending on your use case):
- `--no-cont-batching` - Disabling continuous batching significantly improves response determinism at the expense of performance. Should be used if you plan on analyzing logprobs or using greedy sampling.
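Regarding the embedding-model note above: a second llama-server instance dedicated to embeddings only needs the embedding model and its own port. A minimal sketch (the port and gguf file name are placeholders):

```
# Second llama-server instance for embeddings, on its own port, so embedding
# requests never swap out the text generation model.
llama-server --embeddings --port 8081 -m $MODEL_DIRECTORY/embeddinggemma-300m-q8_0.gguf
```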
If you are running llama-server on the same device as Tapestry Loom (and you are using the default port), you do not need to explicitly specify an endpoint URL when filling out the "OpenAI-style Completions" and "OpenAI-style ChatCompletions" templates.
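If llama-server runs on another machine or a non-default port, the endpoint URL does need to be filled in. A hedged sketch using llama-server's OpenAI-compatible paths (the host and port are placeholders; the templates may only expect the base URL, so check their prefilled format):

```
# llama-server on another machine, listening on a non-default port
# (9090 is an arbitrary example; add the recommended arguments from above as well).
llama-server --port 9090 --models-dir $MODEL_DIRECTORY --models-max 1

# Corresponding endpoint URLs to enter when filling out the templates:
#   OpenAI-style Completions:      http://<server-address>:9090/v1/completions
#   OpenAI-style ChatCompletions:  http://<server-address>:9090/v1/chat/completions
```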
If you are new to working with LLM base models, Trinity-Mini-Base-Pre-Anneal (or Trinity-Nano-Base-Pre-Anneal if you have <32GB of VRAM) is a good first model to try.
If you plan on using seriation, embeddinggemma-300m is a good small embedding model.
Most inference providers support OpenAI-compatible clients and should work with minimal configuration.
However, every inference provider implements OpenAI compatibility in their own way, which may cause unexpected issues. Known issues with popular inference providers are listed below:
- OpenRouter
  - Logprobs are not supported, even if the underlying provider supports them
  - In addition, some providers on OpenRouter will return errors if `logprobs` is included as a request argument
- Featherless
  - Untested; logprobs are not supported according to documentation
See tapestry-tokenize for more information on how to configure and use the (optional) tokenization server.
Once a tokenization endpoint is configured for a model, enabling the setting "(Opportunistically) reuse output token IDs" can slightly improve output quality by giving you more control over tokenization. In this mode, model output token IDs are reused whenever possible and tokenization is performed per-node.
However, the benefit of reusing the model's output tokenization is greatest when generating single-token nodes using non-ASCII characters and a single model (output token IDs cannot be reused across models).
This setting requires the inference backend to support returning token IDs (to check if this is working, hover over generated tokens in the text editor to see if they contain a token identifier). This is a non-standard addition to the OpenAI Completions API which is currently supported by very few inference backends (llama.cpp has been confirmed to work properly with this feature).
If your inference backend returns token IDs in OpenAI-style Completions responses but they do not appear in your weaves, please file an issue.
Please consider donating to help fund further development.
- Take inspiration from multiverse
- Take inspiration from mikupad
- Do more experimentation with loom's UI in order to find UX improvements which may be worth including in Tapestry Loom
Goal: Completion before April 1st, 2026
- Add in-app dialogs to display information about breaking changes
- Implement v1 format
- Store generation seed in node
- Add support for custom randomness sources
- Enable counterfactual logprobs by default
- Implement highlighting of active counterfactual token
- Implement a special "link" node to allow splitting giant weaves into multiple documents
- Refactor migration-assistant to natively use v1 format
- Improve handling of hovered + omitted/collapsed nodes
- Release version 0.14.0
- Implement DAG-based Weaves, similar to this unreleased loom implementation
- FIM completions
  - Selected text is used to determine FIM location
- Diff-based editor content application
- Add mode to reuse model output nodes when updating tree whenever possible
- Implement node "editing" UI (not actually editing node content, but editing the tree by adding nodes / splitting nodes / merging nodes), similar to inkstream
- Allow the user to create connections between arbitrary nodes
- Implement functionality for bookmarking specific paths within a weave
- Add documentation to the `tapestry-weave` library
- Improve Weave saving & loading
- Initially load weaves using zero-copy deserialization, performing full deserialization in the background
- Perform weave saving in the background without visual glitches
- Support read-only weave editors using zero-copy deserialization and file memory mapping
- Show hovered child of active node in editor, similar to exoloom
- Add "autoloom" setting where clicking a node generates children, similar to inkstream
- Implement additional sorting methods
- Average token probability
- Cumulative node probability
- Alphabetical (content)
- Length (content)
- UI improvements
- Automatically calculate when to display "show more" based on available screen space
- Add content copying to node context menu
- Add setting to swap shift-click and normal click behavior
- Add sorting submenu to node context menu
- Add right click handling to node list background
- Better handle valid UTF-8 characters split across multiple nodes
- Add keyboard shortcuts for selecting specific child nodes
- Request post-processing arguments (using prefix of `TL#`)
  - Single-token node pruning:
    - `TL#keep_top_p`
    - `TL#keep_top_k`
    - `TL#prune_empty`
  - Node pruning:
    - `TL#node_min_conf`
    - `TL#node_max_conf`
    - `TL#node_min_avg_p`
    - `TL#node_max_avg_p`
    - `TL#prune_empty`
  - Basic adaptive looming:
    - `TL#min_tokens`
    - `TL#p_threshold`
    - `TL#conf_threshold`
  - Force single token node creation using `TL#force_single_token`
  - Context window wrapping using `TL#ctx_length`
  - Add setting to toggle request post-processing
- Implement BERT FIM server using a nonstandard `fim_tokens` parameter
- Update Getting Started document
- Add node finding
- Support arbitrary color gradients for logprob highlighting
- Add OKLCH-based color picker
- Add blind comparison modes
- (Hide) Models & token probabilities / boundaries
- (Hide) Generated node text (only showing metadata & probabilities)
- Add weave statistical analysis tools
- Add customizable node color coding
- Probability
- Confidence
- Allow changing default editor subview layout
- Perform UX testing with all built-in color schemes
- Review and refactor application modules
- settings
- editor
- Optimize performance whenever reasonably possible
- Support opening weaves using CLI arguments to Tapestry Loom
- Review and refactor main module
Goal: Completion before June 1st, 2026
- Allow temporarily overriding color in inference menu
- Add bookmark functionality to files view
- Add support for more weave migrations
- Add model configuration sharing functionality
- Automatically redact sensitive information (such as API keys)
- Allow the user to manually redact sensitive information
- Add ability to manually control refreshing of model tokenization identifier
- Add support for response streaming
- Improve API response building
- Add support for OpenAI Responses
- Add support for Anthropic Complete
- Add support for Anthropic Messages
- Add support for Gemini generateText
- Add support for Gemini generateContent
- Add support for Gemini embedContent
- Review and refactor settings/inference module
- Improve clarity of error messages
- Perform API client testing with commonly used inference backends
- llama-cpp
- ollama
- vllm
- sglang
- tensorrt-llm
- text-generation-inference
- text-embeddings-inference
- koboldcpp
- lm-studio
- litellm
- Perform API client testing with less commonly used inference backends
- lemonade
- infinity
- swama
- exllamav2
- lmdeploy
- mlc-llm
- shimmy
- Package Tapestry Loom with an icon and application metadata
- Create video-based documentation
- Create a website for Tapestry Loom's downloads and documentation
- Perform heavy unit testing and/or formal verification of `universal-weave` to prevent bugs that could result in data loss
- Release `universal-weave` version 1.0.0
- Write unit tests for response parser
- Release Tapestry Loom version 1.0.0-rc.1
The below items may be implemented in a 1.x release, or they may be delayed to be implemented in a future 2.x release.
- Token healing
- Instruct templating, similar to mikupad
- Do testing with llamafile for easier onboarding?
- Prefix-based deduplication
- Improve graph/canvas layout algorithm
- Improve file manager
- Support keyboard shortcuts for all aspects of the UI, not just the weave editor
- Aim to support navigating the entirety of the UI without a mouse
- Improve built-in color schemes
- Node bulk selection
- Node custom ordering via drag and drop in all views
- Keyboard shortcut presets
  - Built-in presets
    - loomsidian-like
    - exoloom-like
    - Tapestry Loom
  - Saving & loading custom presets
  - Importing & exporting custom presets
- Support touchscreen-only devices
- Add ability to add custom labels to bookmarks/nodes
- Add ability to add custom attributes to nodes, rather than just bookmarks
- Collaborative weave editing
- Interfaces for AI agents to use Tapestry Loom
- Efficiently store full edit history in weave for lossless unbounded undo/redo
- Human as token-predictor mode
- "The LLM looms you"
- WASM version of Tapestry Loom
- Support multimodal weaves
- Support weaves of arbitrarily large size using a database-based format
- Self-contained packaging: All documentation and tools in one app, rather than being spread out over multiple
- Server-client, multi-user WebUI
- Alternate input devices
- Talon Voice
- Controllers / Gamepads
- USB DDR Pads
See also: the original rewrite plans