Duplikeit

### **duplikeit: Vision Statement**

**duplikeit** aims to keep a body of knowledge **coherent, minimal, and future-friendly, while respecting its past**.  
The guiding principle is **DRY + SSOT**: Don’t Repeat Yourself, plus a *single source of truth* that is elegant and non-redundant but still provides a **lens into history** for context.  

---

#### Core Functions
- **Vector representations**  
  Each document (or part of a document) is mapped into a compact vector space so similarity can be measured consistently.  

- **Similarity and distance**  
  - Items that are **current and active** should sit *closer* to one another in that space, so the system can flag strongly when they overlap too much.  
  - Items that are **historical or context-only** should sit *further away* from active items, so they remain accessible but don’t compete with the single source of truth.  

- **Thresholds for action**  
  When two or more documents fall within a set distance threshold of each other, the system can signal to the **author of new content** that it may be time to refactor by merging, consolidating, or replacing content, in keeping with the “Don’t Repeat Yourself” (DRY) ideal.  

- **Support for both author and reviewer roles**  
  The system not only warns authors about duplication as they write, but also acts as a **curation assistant** for reviewers and maintainers, guiding them in periodically tidying, refactoring, and simplifying the collection.  

---

#### Extended Goal: Dynamic Table of Contents (TOC)  
- As documents are created or edited, *duplikeit* should suggest **where they belong** in a Table of Contents (TOC).  
- The TOC could manifest as:  
  - A **hierarchical directory structure**,  
  - A **kanban board**, or  
  - Another organizational schema with a **linear or ordinal order** (e.g., priority, chronology, conceptual flow).  
- This way, an author who starts with “I have something to say or report” gets immediate guidance about **where it should live** in the larger knowledge structure, reducing drift and improving discoverability.  
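
One possible way to produce such a suggestion is to compare a new draft against the combined content of each existing TOC section and rank the sections by similarity. The sketch below assumes a flat mapping from section path to document texts and reuses a toy bag-of-words vector; `suggest_placement` and the section names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_placement(new_text, toc):
    """Rank TOC sections by similarity between the draft and each section."""
    new_vec = embed(new_text)
    scored = []
    for section, texts in toc.items():
        # Represent a section by the summed vectors of its documents.
        section_vec = Counter()
        for text in texts:
            section_vec.update(embed(text))
        scored.append((cosine_similarity(new_vec, section_vec), section))
    scored.sort(reverse=True)
    return [(section, round(score, 3)) for score, section in scored]


toc = {
    "guides/accounts": ["how to reset a password", "creating a new user account"],
    "guides/billing": ["how to update a credit card", "understanding your invoice"],
}
ranking = suggest_placement("password reset fails for a new account", toc)
print(ranking[0][0])  # best-matching section for the draft
```

The same ranking could feed a directory tree, a kanban column, or any other ordered schema; only the keys of `toc` would change.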

---

#### Example Use Cases
- **Trouble tickets**: Group or merge overlapping reports while still preserving historical issues for context.  
- **Medical records**: Tune similarity so that “duplicate” means a repeated entry for the same patient, not just overlapping text, while still surfacing related but distinct cases.  
- **Knowledge bases / policy libraries**: Suggest when new articles should replace older ones or link to them, helping maintain a concise SSOT.  
- **Source code repositories**: Treat each **file in a directory structure** as part of the collection.  
  - *duplikeit* can highlight near-duplicate modules or functions across files.  
  - It can suggest **refactoring opportunities** (e.g., consolidating repeated logic into shared libraries).  
  - It can even help guide **directory and module structure**, ensuring new code finds its proper place in the project’s “TOC” (the directory tree).  
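
For the source-code case, a rough sketch of near-duplicate detection might compare the parsed bodies of functions across files. The example below uses Python's standard `ast` and `difflib` modules; the file paths, the `near_duplicate_functions` helper, and the 0.9 threshold are illustrative assumptions. It is a crude structural comparison that is sensitive to naming, not an embedding-based method.

```python
import ast
import difflib
from itertools import combinations


def function_fingerprints(sources):
    """Map 'path::function' to a structural dump of the function body."""
    fingerprints = {}
    for path, code in sources.items():
        tree = ast.parse(code)
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                # Dump only the body so the function's own name is ignored.
                body = "\n".join(ast.dump(stmt) for stmt in node.body)
                fingerprints[f"{path}::{node.name}"] = body
    return fingerprints


def near_duplicate_functions(sources, threshold=0.9):
    """Return pairs of functions whose bodies are at least `threshold` alike."""
    fingerprints = function_fingerprints(sources)
    hits = []
    for (a, fa), (b, fb) in combinations(sorted(fingerprints.items()), 2):
        ratio = difflib.SequenceMatcher(None, fa, fb).ratio()
        if ratio >= threshold:
            hits.append((a, b, round(ratio, 2)))
    return hits


TOTAL_BODY = (
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item.price\n"
    "    return result\n"
)
sources = {
    "billing/invoice.py": "def invoice_total(items):\n" + TOTAL_BODY,
    "orders/cart.py": "def cart_total(items):\n" + TOTAL_BODY,
    "utils/text.py": "def greet(name):\n    return 'hello ' + name\n",
}
for a, b, ratio in near_duplicate_functions(sources):
    print(f"{a} and {b} look alike (ratio {ratio}); consider a shared helper")
```

A pair flagged this way is only a prompt for a human to consider consolidating the logic into a shared module.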

---

#### What *duplikeit* is **not**
- **Not a general-purpose vector database**: It may use embeddings and vector search internally, but its purpose is *curation, refactoring, and guidance*, not raw similarity search as a service.  
- **Not just full-text search**: It’s not meant to retrieve documents like a search engine; it’s meant to **organize, compress, and guide authorship**.  
- **Not a replacement for domain expertise**: It won’t decide *what* should be merged or refactored without human oversight; it only surfaces opportunities and recommendations.  
- **Not static**: Unlike a one-time deduplication tool, *duplikeit* operates **continuously**, shaping the evolution of the collection as new content arrives. 
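
To illustrate the difference between continuous operation and a one-time batch pass, here is a small sketch of an index that checks every document at the moment it arrives. The `IncrementalIndex` class and its threshold are hypothetical, and Jaccard overlap on word sets is used only as a simple stand-in for vector similarity.

```python
import re


class IncrementalIndex:
    """Hypothetical index that checks each document as it arrives."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self._docs = {}  # name -> set of words

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, name, text):
        """Store the document and return names of existing docs it overlaps."""
        words = self._words(text)
        overlaps = []
        for other, other_words in self._docs.items():
            union = words | other_words
            jaccard = len(words & other_words) / len(union) if union else 0.0
            if jaccard >= self.threshold:
                overlaps.append(other)
        self._docs[name] = words
        return overlaps


index = IncrementalIndex()
index.add("vpn-setup", "how to configure the office vpn client")
warnings = index.add("vpn-howto", "how to configure the office vpn client on macos")
print(warnings)  # existing documents the new one overlaps with
```

Each call to `add` both updates the collection and reports overlaps, so guidance reaches the author while the content is still being written.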

Duplikeit #8
