SDialog is a modular Python toolkit for synthetic dialog generation, evaluation, and analysis. It standardizes a Dialog schema and offers persona-driven multi-agent simulation with LLMs, composable orchestration, built-in metrics, and mechanistic interpretability, so you can generate reliable, controllable dialog data at scale.
Quick links: Docs • API • Demo (Colab) • Tutorials • Datasets (HF) • Issues
- Standard dialog schema with JSON import/export (aiming to standardize dialog dataset formats with your help)
- Persona-driven multi-agent simulation with contexts, tools, and thoughts
- Composable orchestration for precise control over behavior and flow
- Built-in evaluation (metrics + LLM-as-judge) for comparison and iteration
- Native mechanistic interpretability (inspect and steer activations)
- Easy creation of user-defined components by inheriting from base classes (personas, metrics, orchestrators, etc.)
- Interoperability across OpenAI, HuggingFace, Ollama, AWS, and more
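As a rough illustration of the standardized-schema idea, a serialized dialog boils down to JSON-friendly speaker/text turns plus metadata. The sketch below uses plain Python and illustrative field names, not SDialog's exact schema:

```python
import json

# Illustrative only: a dialog as JSON-serializable turns plus metadata.
# Field names here are assumptions, not SDialog's exact schema.
dialog = {
    "id": "dialog_0",
    "context": {"location": "Downtown cafe"},
    "turns": [
        {"speaker": "Bob", "text": "Hi!"},
        {"speaker": "Alice", "text": "Welcome in! What can I get you?"},
    ],
}

serialized = json.dumps(dialog)    # export to JSON
restored = json.loads(serialized)  # import it back, lossless round-trip
assert restored == dialog
```

A shared shape like this is what makes cross-dataset loading, filtering, and evaluation composable.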
If you are building controlled multi-agent conversational systems, benchmarking dialog models, producing synthetic training corpora, simulating diverse users to test or probe conversational systems, or analyzing internal model behavior, SDialog provides an end-to-end workflow.
pip install sdialog
Here's a short, hands-on example showing personas, agents, a simple rule (orchestrator), and a tool.
import sdialog
from sdialog import Context
from sdialog.agents import Agent
from sdialog.personas import Persona
from sdialog.orchestrators import SimpleReflexOrchestrator
# First, let's set our preferred backend/model and parameters
sdialog.config.llm("openai:gpt-4.1", temperature=0.9)
# Let's define our personas
alice = Persona(name="Alice", role="barista", personality="cheerful")
bob = Persona(name="Bob", role="customer", personality="curious")
# (Optional) Let's add a concrete conversational context
ctx = Context(
    location="Downtown cafe",
    environment="noisy, aromatic cafe with occasional grinder sounds",
    circumstances="Morning rush hour",
    objects=["espresso machine", "menu board", "tip jar"]
)
# (Optional) Let's add a simple tool (just a plain Python function)
# We'll use a tiny mock function our agent can call as a tool
def lookup_menu(item: str) -> dict:
    return {"item": item, "specials": ["vanilla latte", "cold brew"]}
# (Optional) Let's include a small rule-based orchestrator
react = SimpleReflexOrchestrator(
    condition=lambda utt: "decaf" in utt.lower(),
    instruction="Explain decaf options and suggest one."
)
# Now we create the agents
barista = Agent(persona=alice, tools=[lookup_menu])
customer = Agent(persona=bob, first_utterance="Hi!")
# (Optional) We can attach orchestrators to an agent using pipe-like composition
barista = barista | react
# Let's generate three dialogs!
for ix in range(3):
    dialog = customer.dialog_with(barista, context=ctx)
    dialog.print(orchestration=True)
    dialog.to_file(f"dialog_{ix}.json")
Note
- See the orchestration tutorial and the tutorial on agents with tools and thoughts.
- Dialogs are rich objects with helper methods (filter, slice, transform, etc.) that can be easily exported and loaded.
- Next: see Loading and saving dialogs and Auto-generating personas and contexts for persistence and controlled diversity.
Dialogs are JSON-serializable and can be created from multiple formats. After generating one you can persist it, then reload it later for evaluation, transformation, or mixing with real data.
from sdialog import Dialog
# Load from JSON (generated by SDialog using `to_file()`)
dialog = Dialog.from_file("dialog_0.json")
# Load from HuggingFace Hub datasets
dialogs = Dialog.from_huggingface("sdialog/Primock-57")
# Create from plain text files or strings - perfect for converting existing datasets!
dialog_from_txt = Dialog.from_str("""
Alice: Hello there! How are you today?
Bob: I'm doing great, thanks for asking.
Alice: That's wonderful to hear!
""")
# Or, equivalently if the content is in a txt file
dialog_from_txt = Dialog.from_file("conversation.txt")
# Load from CSV files with custom templates
dialog_from_csv = Dialog.from_file("conversation.csv",
                                   csv_speaker_col="speaker",
                                   csv_text_col="value")
# All Dialog objects have rich manipulation methods
dialog.filter("Alice").rename_speaker("Alice", "Customer").upper().to_file("processed.json")
avg_words_turn = sum(len(turn) for turn in dialog) / len(dialog)
Use generators to fill in (or selectively control) persona/context attributes using LLMs or other data sources (functions, CSV files, inline prompts). The `.set()` method lets you override how individual attributes are produced.
from sdialog.personas import Doctor, Patient
from sdialog.generators import PersonaGenerator, ContextGenerator
from sdialog import Context
# By default, unspecified attributes are LLM generated
doc = PersonaGenerator(Doctor(specialty="Cardiology")).generate()
pat = PersonaGenerator(Patient(symptoms="chest pain")).generate()
# Optionally specify generation sources per attribute
ctx_gen = ContextGenerator(Context(location="emergency room"))
ctx_gen.set(
    objects=get_random_object,                                    # user-defined function
    circumstances="{csv:circumstance:./data/circumstances.csv}",  # CSV file values
    goals="{llm:Suggest a realistic goal for the context}"        # targeted LLM instruction
)
ctx = ctx_gen.generate()
Tip
Try the demo notebook to experiment with generators.
SDialog can also easily act as a controllable test harness for any (OpenAI-compatible) conversational backend. Create realistic or adversarial user personas to role-play against your deployed system:
- Black-box functional checks (Does the system follow instructions? Handle edge cases?)
- Persona / use-case coverage (Different goals, emotions, domains)
- Regression testing (Run the same persona batch each release; diff dialogs)
- Safety / robustness probing (Angry, confused, or noisy users)
- Automated evaluation (Pipe generated dialogs directly into evaluators below)
Core idea: your remote system is wrapped as an Agent; simulated users are Agents with personas producing diverse conversation trajectories, all recorded as Dialog objects you can save, diff, and score.
Below is a minimal example where an "angry customer" interacts once with a mock remote endpoint:
from sdialog.agents import Agent
from sdialog.personas import Customer

# Our remote system (your conversational backend exposing an OpenAI-compatible API)
system = Agent(
    model="my/super-llm",                              # Model name exposed by your server
    openai_api_base="http://my-endpoint.com:8000/v1",  # Base URL of the service
    openai_api_key="EMPTY",                            # Or a real key if required
    name="System"
)
# Let's manually define one (minimal) synthetic customer persona
angry_customer = Customer(
    name="Riley",
    issue="Billing error on last invoice",
    issue_description="Charged twice for the same month",
    anger_level="high",
    times_called=3,
)
simulated_customer = Agent(persona=angry_customer, name="Customer")
# Let's make the system talk to our simulated customer once
dialog = system.dialog_with(simulated_customer)
dialog.to_file("dialog_0.json")
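For the regression-testing use case above, one lightweight approach (a plain-Python sketch, not an SDialog API) is to diff the serialized dialogs produced by two releases of the system under test. The dialog contents here are mocked:

```python
import json
import difflib

# Hypothetical saved dialogs from two releases of the system under test.
old_run = {"turns": [{"speaker": "Customer", "text": "I was charged twice."},
                     {"speaker": "System", "text": "I can refund that now."}]}
new_run = {"turns": [{"speaker": "Customer", "text": "I was charged twice."},
                     {"speaker": "System", "text": "Please call billing support."}]}

old_lines = json.dumps(old_run, indent=2).splitlines()
new_lines = json.dumps(new_run, indent=2).splitlines()

# Keep only the added/removed lines (skip the +++/--- file headers)
diff = difflib.unified_diff(old_lines, new_lines, lineterm="")
changed = [l for l in diff
           if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
print("\n".join(changed))
```

A behavioral change between releases shows up as a small, reviewable diff, which pairs naturally with the evaluators described next.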
Next, evaluate these dialogs or orchestrate agents with more complex flows using rule/LLM hybrid orchestrators (see tutorials 3 & 7).
Use built-in metrics (readability, flow, linguistic features, LLM judges) or easily create new ones, then aggregate and compare datasets via DatasetComparator.
from sdialog.evaluation import LLMJudgeRealDialog, LinguisticFeatureScore
from sdialog.evaluation import FrequencyEvaluator, MeanEvaluator
from sdialog.evaluation import DatasetComparator
reference = [...] # list[Dialog]
candidate = [...] # list[Dialog]
judge = LLMJudgeRealDialog()
flesch = LinguisticFeatureScore(feature="flesch-reading-ease")
comparator = DatasetComparator([
    FrequencyEvaluator(judge, name="Realistic dialog rate"),
    MeanEvaluator(flesch, name="Mean Flesch Reading Ease"),
])
results = comparator({"reference": reference, "candidate": candidate})
# Plot results for each evaluator
comparator.plot()
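Conceptually, the frequency and mean evaluators are simple reductions over per-dialog scores. A plain-Python sketch with mocked scores (not SDialog's implementation):

```python
# Mocked per-dialog scores; in SDialog these would come from judge/metric components.
judge_votes = [True, True, False, True]   # e.g., "is this dialog realistic?"
flesch_scores = [62.1, 55.4, 70.3, 48.9]  # e.g., per-dialog readability

realistic_rate = sum(judge_votes) / len(judge_votes)   # frequency evaluator
mean_flesch = sum(flesch_scores) / len(flesch_scores)  # mean evaluator

print(f"Realistic dialog rate: {realistic_rate:.2f}")
print(f"Mean Flesch Reading Ease: {mean_flesch:.2f}")
```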
Tip
See evaluation tutorial.
Attach Inspectors to capture per-token activations and optionally steer (add/ablate directions) to analyze or intervene in model behavior.
import sdialog
from sdialog.interpretability import Inspector
from sdialog.agents import Agent
sdialog.config.llm("huggingface:meta-llama/Llama-3.2-3B-Instruct")
agent = Agent(name="Bob")
inspector = Inspector(target="model.layers.16.post_attention_layernorm")
agent = agent | inspector
agent("How are you?")
agent("Cool!")
# Let's get the last response's first token activation vector!
act = inspector[-1][0].act # [response index][token index]
Steering intervention (subtracting a direction):
import torch

anger_direction = torch.load("anger_direction.pt")  # A direction vector (e.g., PCA / difference-in-means vector)
agent_steered = agent | inspector - anger_direction # Ablate the anger direction from the target activations
agent_steered("You are an extremely upset assistant") # Agent "can't get angry anymore" :)
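Ablating a direction amounts to removing its component from each activation vector: x' = x - (x·d_hat) d_hat, where d_hat is the unit direction. A minimal pure-Python sketch on toy vectors (not real model activations):

```python
import math

def ablate(x, d):
    """Remove the component of activation x along direction d."""
    norm = math.sqrt(sum(v * v for v in d))
    d_hat = [v / norm for v in d]                    # unit direction
    proj = sum(xi * di for xi, di in zip(x, d_hat))  # scalar projection x . d_hat
    return [xi - proj * di for xi, di in zip(x, d_hat)]

x = [1.0, 2.0, 3.0]
d = [0.0, 0.0, 1.0]  # toy "anger" direction
print(ablate(x, d))  # component along d removed -> [1.0, 2.0, 0.0]
```

After ablation the steered activation is orthogonal to the direction, so the model can no longer express whatever that direction encodes at the hooked layer.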
Tip
See the tutorial on using SDialog to remove the refusal capability from LLaMA 3.2.
Many backends are supported; just use the "BACKEND:MODEL" string format to either set a global default LLM for all components or pass one to a specific component:
import sdialog
# Change the default global LLM
sdialog.config.llm("ollama:qwen3:14b")
# Any argument supported by the chosen backend/model can also be given, for example
sdialog.config.llm("ollama:qwen3:14b",
                   temperature=0.7,
                   base_url="https://my-ollama-endpoint.com:123")  # Remote Ollama server
Any LLM-powered component can also take a specific model and its parameters as arguments, overriding the default:
from sdialog.agents import Agent
my_agent = Agent(model="amazon:anthropic.claude-3-5-sonnet-20240620-v1:0",
                 region_name="us-east-1")
- Demo notebook
- Tutorials
- API reference
- Documentation
- LLM-friendly docs for AI coding assistants (GitHub Copilot, etc.) following the llm.txt specification; in your chat, use:
#fetch https://sdialog.readthedocs.io/en/latest/llm.txt Your prompt to use sdialog here...
To accelerate open, rigorous, and reproducible conversational AI research, SDialog invites the community to collaborate. Contributions of any size are welcome and help shape the future of open dialogue generation:
- Dataset Standardization: Help convert existing dialogue datasets to SDialog format. Currently, each dataset stores dialogues in a different format, making cross-dataset analysis and model evaluation challenging. Converted datasets are made available as Hugging Face datasets in the SDialog organization for easy access and integration.
- Component Development: Create new personas, orchestrators, evaluators, generators, or backend integrations
- Evaluation & Benchmarks: Design new metrics, evaluation frameworks, or comparative studies
- Interpretability Research: Develop new analysis tools, steering methods, or mechanistic insights
- Documentation & Tutorials: Improve guides, add examples, or create educational content
- Issues & Discussions: Report bugs, request features, or share research ideas and use cases
Note
Example: Check out Primock-57, a sample dataset already available in SDialog format on Hugging Face.
If you have a dialogue dataset you'd like to convert to SDialog format, need help with the conversion process, or want to contribute in any other way, please open an issue or reach out to us. We're happy to help and collaborate!
See CONTRIBUTING.md. We welcome issues, feature requests, and pull requests. If you want to contribute to the project, please open an issue or submit a PR, and help us make SDialog better!
This project follows the all-contributors specification. Contributors:
Sergio Burdisso |
Labrak Yanis |
Séverin |
Ricard Marxer |
Thomas Schaaf |
David Liu |
ahassoo1 |
Pawel Cyrta |
ABCDEFGHIJKL
This work was supported by the European Union Horizon 2020 project ELOQUENCE (grant number 101070558).
The initial development of this project began in preparation for the 2025 Jelinek Memorial Summer Workshop on Speech and Language Technologies (JSALT 2025) as part of the "Play your Part" research group.
MIT License
Copyright (c) 2025 Idiap Research Institute