Private monorepo for the JSPS KAKENHI-side EQ-Fuzzy research line on uncertainty-aware evaluation of LLM emotional intelligence.
This repository now covers three scientifically separate workstreams:
- ICECCME 2026: human-grounded multilingual pilot (conference site)
- SCIS 2026: persona x temperature factorial deconfounding (conference site)
- ICICIC 2026: benchmark positioning / matched comparison (conference site)
The repository is shared because the implementation substrate is shared. The papers remain separate because the scientific claims are separate.
Share code, not claims.
Shared infrastructure is encouraged when it supports reproducibility across the EQ-Fuzzy line:
- text and model registries
- prompt and response-schema infrastructure
- execution, retry, and provenance tooling
- parsing and validation
- alignment metrics, variance, fuzzy membership, and fuzzy entropy
- artifact regeneration for figures and tables
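The metrics listed above are named but not defined in this README. A minimal sketch, assuming (1) a linear ramp that maps a 0 to 100 score onto a membership degree and (2) the normalized De Luca-Termini fuzzy entropy. Both choices are assumptions, and `linear_membership` and `fuzzy_entropy` are hypothetical names, not the repository's API.

```python
import math
from typing import Sequence


def linear_membership(score: float, low: float = 0.0, high: float = 100.0) -> float:
    """Map a raw score to a membership degree in [0, 1] with a linear ramp.

    Assumption for illustration: scores at or below `low` map to 0,
    scores at or above `high` map to 1.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (score - low) / (high - low)))


def _shannon_term(mu: float) -> float:
    """S(mu) in bits, with S(0) = S(1) = 0 by convention."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -(mu * math.log2(mu) + (1.0 - mu) * math.log2(1.0 - mu))


def fuzzy_entropy(memberships: Sequence[float]) -> float:
    """Normalized De Luca-Termini fuzzy entropy in [0, 1].

    0 means every membership is crisp (0 or 1); 1 means every membership is 0.5.
    """
    if not memberships:
        raise ValueError("need at least one membership value")
    return sum(_shannon_term(m) for m in memberships) / len(memberships)
```

With scores 0, 50, and 100 the memberships are 0, 0.5, and 1, so the entropy is 1/3: only the midpoint contributes uncertainty.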
Shared novelty claims are not allowed. Each paper keeps its own main question, figure set, manuscript framing, discussion language, and submission-ready claims.
ICECCME, SCIS, and ICICIC all depend on the same benchmark substrate:
- literary text and translation provenance
- model panels and provider routing
- prompt templates and structured response schemas
- run manifests and normalized model-score tables
- metrics for human alignment, drift, validity, variance, and fuzzy behavior
- reproducible paper artifacts generated from the same audit trail
Keeping these pieces in one private monorepo reduces duplicated implementation work and makes it easier to trace results from raw outputs to paper figures without forcing the papers into one scientific story.
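One way to make the raw-output-to-figure trace concrete is to store content hashes of every input next to each generated artifact. The sketch below is hypothetical: `provenance_record`, `write_provenance`, and the JSON layout are illustrative names, not an existing module in this repository.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(artifact: Path, inputs: list[Path]) -> dict:
    """Build a JSON-serializable record linking an artifact to the files it was made from."""
    return {
        "artifact": str(artifact),
        "artifact_sha256": sha256_of(artifact),
        "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
    }


def write_provenance(record: dict, out_path: Path) -> None:
    """Write the record as stable, diff-friendly JSON (sorted keys, trailing newline)."""
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")
```

If a figure is regenerated from a changed CSV, the recorded input hash changes too, which makes silent drift between a paper figure and its audit trail visible in review.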
SPReAD1000 stays separate.
SPReAD1000 is an adjacent application / workflow PoC, not a workstream inside this benchmark monorepo. It is expected to have different assets and risks:
- annotation workflows
- review queues
- demo or UI layers
- expert-operation logs
- separate data-governance and deliverable requirements
SPReAD1000 may later reuse a small frozen utility package, vendored module, or extracted eqf_core component, but eq-fuzzy must not depend on SPReAD-specific workflow code, UI code, or annotation-ops logic.
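The dependency rule above can be enforced mechanically. A sketch, assuming SPReAD-specific code would live in a top-level package named `spread1000` (a hypothetical name; substitute the real one): parse each module with `ast` and flag absolute imports of that package. `forbidden_imports` and `scan_package` are illustrative helpers.

```python
from __future__ import annotations

import ast
from pathlib import Path

# Hypothetical top-level package names for SPReAD-specific code; replace with the real ones.
FORBIDDEN_TOP_LEVEL = frozenset({"spread1000"})


def forbidden_imports(source: str, forbidden: frozenset[str] = FORBIDDEN_TOP_LEVEL) -> list[str]:
    """Return absolute imports in `source` whose top-level package is forbidden."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        hits.extend(name for name in names if name.split(".")[0] in forbidden)
    return hits


def scan_package(root: Path, forbidden: frozenset[str] = FORBIDDEN_TOP_LEVEL) -> dict[str, list[str]]:
    """Map each .py file under `root` to its forbidden imports; clean files are omitted."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"), forbidden)
        if hits:
            report[str(path)] = hits
    return report
```

Asserting that `scan_package(Path("src"))` returns an empty dict in a test keeps the boundary from eroding. This only catches static imports; `importlib.import_module` calls with computed names would slip through.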
| Workstream | Role | Main question | Not the claim |
|---|---|---|---|
| ICECCME 2026 | human-grounded multilingual pilot | Which current LLMs align best with Japanese human VAS references, and how robust is that alignment across EN/ZH? | full persona x temperature deconfounding |
| SCIS 2026 | factorial deconfounding | How much score variation is attributable to persona, temperature, and their interaction? | multilingual human-alignment ranking |
| ICICIC 2026 | benchmark positioning / matched comparison | What does EQ-Fuzzy capture beyond existing emotion benchmarks? | a rerun of ICECCME or SCIS |
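The SCIS question in the table is a variance-attribution question. Its real design is not fixed yet, so the following is only a sketch of one standard approach, assumed rather than taken from the SCIS plan: a balanced two-way ANOVA decomposition that reports each term's share of the total sum of squares (eta-squared). `two_way_variance_shares` is a hypothetical helper.

```python
from __future__ import annotations

from itertools import product
from statistics import fmean


def two_way_variance_shares(cells: dict[tuple[str, float], list[float]]) -> dict[str, float]:
    """Split total sum of squares into persona, temperature, interaction, and residual shares.

    Assumes a complete, balanced persona x temperature design: every
    (persona, temperature) cell exists and holds the same number of scores.
    """
    personas = sorted({p for p, _ in cells})
    temps = sorted({t for _, t in cells})
    replicate_counts = {len(scores) for scores in cells.values()}
    if set(cells) != set(product(personas, temps)) or len(replicate_counts) != 1:
        raise ValueError("this sketch needs a complete, balanced design")
    r = replicate_counts.pop()
    if r == 0:
        raise ValueError("cells must contain at least one score")
    n_p, n_t = len(personas), len(temps)

    all_scores = [y for scores in cells.values() for y in scores]
    grand = fmean(all_scores)
    cell_mean = {key: fmean(scores) for key, scores in cells.items()}
    persona_mean = {p: fmean([cell_mean[(p, t)] for t in temps]) for p in personas}
    temp_mean = {t: fmean([cell_mean[(p, t)] for p in personas]) for t in temps}

    ss_total = sum((y - grand) ** 2 for y in all_scores)
    if ss_total == 0:
        raise ValueError("all scores are identical; shares are undefined")
    ss_persona = n_t * r * sum((m - grand) ** 2 for m in persona_mean.values())
    ss_temp = n_p * r * sum((m - grand) ** 2 for m in temp_mean.values())
    ss_inter = r * sum(
        (cell_mean[(p, t)] - persona_mean[p] - temp_mean[t] + grand) ** 2
        for p, t in product(personas, temps)
    )
    ss_resid = sum((y - cell_mean[key]) ** 2 for key, scores in cells.items() for y in scores)
    return {
        "persona": ss_persona / ss_total,
        "temperature": ss_temp / ss_total,
        "persona_x_temperature": ss_inter / ss_total,
        "residual": ss_resid / ss_total,
    }
```

In a balanced design the four shares sum to 1. Unbalanced cells would need a different decomposition (for example Type II or Type III sums of squares from a statistics package), which is why the sketch refuses them.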
The current working pipeline is still the ICECCME 2026 pipeline. ICECCME-specific implementations live under src/iceccme2026/, and root-level compatibility wrappers have been removed:
- `src/iceccme2026/` remains the working package.
- `uv run python -m src.iceccme2026.cli ...` is the canonical ICECCME CLI.
- `uv run python -m src.iceccme2026.openrouter_runner ...` is the canonical OpenRouter runner.
- `uv run python -m src.iceccme2026.verify` is the canonical verification command.
- `paper/iceccme2026/` remains the working manuscript path.
- `scripts/iceccme2026/` is the canonical home for ICECCME script implementations.
- `configs/iceccme/`, `configs/shared/`, `prompts/iceccme/`, and `prompts/shared/` are the canonical config and prompt locations.
- `results/iceccme2026/` is the canonical home for ICECCME result CSV/JSON/table/figure outputs.
- `data/iceccme2026/` is the canonical home for ICECCME data.
- SCIS and ICICIC directories are placeholders only until their real configs, prompts, and analysis code are designed.
Before SCIS or ICICIC work starts, path ownership is fixed in docs/PATH_OWNERSHIP.md. New generated outputs should use runs/<workstream>/... or artifacts/<workstream>/...; future SCIS and ICICIC code must not overwrite the existing ICECCME results/iceccme2026/* outputs.
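A hypothetical guard that future SCIS or ICICIC writers could call before saving anything. `check_output_path` and the constants are illustrative names, and `docs/PATH_OWNERSHIP.md` remains the authority on the actual rules; the sketch only encodes the two rules stated in the paragraph above.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Workstream directory names as they appear under runs/ and artifacts/.
WORKSTREAMS = frozenset({"iceccme2026", "scis2026", "icicic2026"})
OUTPUT_ROOTS = frozenset({"runs", "artifacts"})
PROTECTED_ICECCME = PurePosixPath("results/iceccme2026")


def check_output_path(workstream: str, path: str) -> PurePosixPath:
    """Validate a repository-relative path for a newly generated output file."""
    if workstream not in WORKSTREAMS:
        raise ValueError(f"unknown workstream: {workstream!r}")
    target = PurePosixPath(path)
    if target.is_absolute() or ".." in target.parts:
        raise ValueError("use a repository-relative path without '..'")
    # Other workstreams must never touch the existing ICECCME results.
    if workstream != "iceccme2026" and (
        target == PROTECTED_ICECCME or PROTECTED_ICECCME in target.parents
    ):
        raise PermissionError(f"{workstream} must not write into {PROTECTED_ICECCME}/")
    parts = target.parts
    if len(parts) < 3 or parts[0] not in OUTPUT_ROOTS or parts[1] != workstream:
        raise ValueError(f"new outputs belong under runs/{workstream}/ or artifacts/{workstream}/")
    return target
```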
The selected OpenRouter core-6 panel is:
- `openai/gpt-5.4`
- `anthropic/claude-sonnet-4.5`
- `google/gemini-2.5-pro`
- `x-ai/grok-4.20`
- `deepseek/deepseek-v3.2`
- `qwen/qwen3.6-plus`
See docs/iceccme2026/model_selection_openrouter_2026-04-17.md for the rationale and reserve models.
- `configs/iceccme/experiment.yaml` - default primary neutral run
- `configs/iceccme/experiment_secondary_persona.yaml` - secondary persona sensitivity run
- `configs/shared/models_default.yaml` - selected OpenRouter core-6 panel
- `configs/shared/models_budget4.yaml` - smaller budget fallback panel
- `configs/shared/texts_from_definitions.yaml` - source-of-truth text mapping from `definitions.py`
- `configs/shared/personas_from_definitions.yaml` - original p1-p4 mapping from `definitions.py`
- `configs/iceccme/personas_primary_neutral.yaml` - new p0 neutral persona for the main paper endpoint
The canonical config locations are configs/iceccme/ for ICECCME-specific experiment and paper settings, and configs/shared/ for model/text/persona registries that can be reused by later workstreams.
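The `build-manifest` command combines an experiment config with a model registry. Its actual column set is not documented here; as a hypothetical sketch, a manifest can be viewed as the fully crossed product of models, stories, personas, languages, and replicates, enumerated in a fixed order so that row position is stable. `ManifestRow`, `build_manifest`, and all field names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Sequence


@dataclass(frozen=True)
class ManifestRow:
    """One planned model call. Field names are hypothetical."""

    run_index: int
    model: str
    story_id: str
    persona_id: str
    language: str
    replicate: int


def build_manifest(
    models: Sequence[str],
    story_ids: Sequence[str],
    persona_ids: Sequence[str],
    languages: Sequence[str],
    replicates: int = 1,
) -> list[ManifestRow]:
    """Expand factor levels into a deterministic, fully crossed run list."""
    if replicates < 1:
        raise ValueError("replicates must be >= 1")
    combos = product(models, story_ids, persona_ids, languages, range(1, replicates + 1))
    return [
        ManifestRow(i, model, story, persona, lang, rep)
        for i, (model, story, persona, lang, rep) in enumerate(combos)
    ]
```

A stable, deterministic row order is what makes a positional join such as `--join-on-order` safe later in the pipeline.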
```bash
uv sync
uv run python -m src.iceccme2026.cli prepare-human --input /absolute/path/to/文学短編作品.xlsx --output-dir data/iceccme2026/derived_public
uv run python -m src.iceccme2026.cli build-manifest --config configs/iceccme/experiment.yaml --models configs/shared/models_default.yaml --output data/iceccme2026/manifests/iceccme2026_primary_neutral_manifest.csv
uv run python -m src.iceccme2026.cli build-manifest --config configs/iceccme/experiment_secondary_persona.yaml --models configs/shared/models_default.yaml --output data/iceccme2026/manifests/iceccme2026_secondary_persona_manifest.csv
uv run python -m src.iceccme2026.verify

# optional: normalize raw run outputs into the long-format file expected by score-alignment
uv run python -m src.iceccme2026.cli normalize-model-scores --input path/to/raw_outputs.jsonl --manifest data/iceccme2026/manifests/iceccme2026_primary_neutral_manifest.csv --join-on-order --output data/iceccme2026/interim/model_scores.csv
```

Equivalent Make targets use explicit ICECCME names:

```bash
make iceccme-prepare-human
make iceccme-manifest
make iceccme-verify
make iceccme-paper
```

ICECCME prompt text lives in `prompts/iceccme/`, and the shared response schema lives in `prompts/shared/`.
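The optional `normalize-model-scores ... --join-on-order` step presumably pairs raw outputs with manifest rows by position. A minimal sketch of such a positional join, assuming each JSONL line carries a `scores` object mapping emotion dimensions to numbers (a hypothetical field name) and that the manifest has already been read into dictionaries. `normalize_on_order` is an illustrative name, not the CLI's internals.

```python
from __future__ import annotations

import json


def normalize_on_order(raw_jsonl: str, manifest_rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Pair the i-th raw output with the i-th manifest row, then emit one row per dimension."""
    records = [json.loads(line) for line in raw_jsonl.splitlines() if line.strip()]
    if len(records) != len(manifest_rows):
        raise ValueError(
            f"{len(records)} raw outputs vs {len(manifest_rows)} manifest rows; "
            "a positional join would misattribute scores"
        )
    long_rows: list[dict[str, object]] = []
    for meta, record in zip(manifest_rows, records):
        # "scores" is a hypothetical field name for the parsed structured response.
        for dimension, score in sorted(record["scores"].items()):
            long_rows.append({**meta, "dimension": dimension, "score": float(score)})
    return long_rows
```

The length check is the important design choice: if one call failed and was dropped from the JSONL, every later row would otherwise be silently attributed to the wrong model or story.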
Use the preview script before large runs:
```bash
uv run python scripts/iceccme2026/render_prompt_preview.py --story-id T1 --persona-id p0 --language ja --text-file data/catalogs/texts_private/ja/T1.txt --output T1_p0_ja_prompt.txt
```

After `results/iceccme2026/csv/ja_primary_ranking.csv` and `results/iceccme2026/csv/model_language_drift_vs_ja.csv` exist, regenerate Figure 2, Figure 3, Figure 4, and Table 2 with:

```bash
uv run python scripts/iceccme2026/plot_figure2_ja_ranking.py
uv run python scripts/iceccme2026/plot_figure3_cross_language_drift.py
uv run python scripts/iceccme2026/plot_figure4_alignment_vs_avg_drift.py
uv run python scripts/iceccme2026/export_table2_primary.py
```

The following directories are intentionally empty except for `.gitkeep` or a small README until the corresponding workstreams are ready:
- `configs/shared/`, `configs/iceccme/`, `configs/scis/`, `configs/icicic/`
- `prompts/shared/`, `prompts/iceccme/`, `prompts/scis/`, `prompts/icicic/`
- `src/core/`
- `scripts/iceccme2026/`, `scripts/scis2026/`, `scripts/icicic2026/`
- `data/iceccme2026/`
- `results/iceccme2026/`
- `paper/scis2026/`, `paper/icicic2026/`
- `runs/iceccme2026/`, `runs/scis2026/`, `runs/icicic2026/`
- `artifacts/iceccme2026/`, `artifacts/scis2026/`, `artifacts/icicic2026/`
- `artifacts/scratch/`, `artifacts/scratch/figures/`, `artifacts/scratch/tables/`, `artifacts/scratch/manuscripts/`
- `snapshots/iceccme2026/`, `snapshots/scis2026/`, `snapshots/icicic2026/`
Do not add fake SCIS or ICICIC configs just to fill these directories.
No large refactor is part of this bootstrap. The next conservative extraction candidates are:
- `src/iceccme2026/manifest.py` for shared manifest utilities
- `src/iceccme2026/metrics.py` for shared alignment and statistics utilities
- `src/iceccme2026/model_scores.py` for normalized score loading and validation
- generic pieces of `src/iceccme2026/reporting.py` and `src/iceccme2026/paper_exports.py`
Only extract code after a second workstream actually needs it and the behavior can be covered by tests.
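As an illustration of the kind of small, test-covered utility that could move from `metrics.py` into a shared core, the sketch below assumes that alignment is Spearman's rank correlation against human VAS references and that cross-language drift is the mean absolute score difference from JA. Both definitions are assumptions; the repository's `metrics.py` may compute them differently, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from statistics import fmean
from typing import Mapping, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def mean_abs_drift(scores: Mapping[str, float], scores_ja: Mapping[str, float]) -> float:
    """Mean |score - JA score| over items keyed by a shared id (for example a story id)."""
    if not scores or set(scores) != set(scores_ja):
        raise ValueError("both languages must cover the same non-empty item set")
    return fmean(abs(scores[k] - scores_ja[k]) for k in scores)
```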
- `AGENTS.md` - first-read instructions for coding agents working in this repository
- `docs/README.md` - documentation ownership and navigation
- `docs/WORKSTREAMS.md` - scientific separation of ICECCME, SCIS, and ICICIC
- `docs/MONOREPO_POLICY.md` - repository rules and SPReAD boundary
- `docs/DEVELOPMENT_POLICY.md` - environment, core extraction, test, and scratch-artifact policy
- `docs/MIGRATION_PLAN.md` - non-destructive migration sequence and shared-core targets
- `docs/PATH_OWNERSHIP.md` - ownership map for shared, ICECCME, SCIS, and ICICIC paths
- `docs/context/` - canonical context prompts for shared and per-workstream planning
- `docs/iceccme2026/` - ICECCME-specific run guides, output inventory, reproducibility notes, and paper planning notes
- `docs/scis2026/` - SCIS-specific planning notes; placeholder until the real experiment design is fixed
- `docs/icicic2026/` - ICICIC-specific planning notes; placeholder until the real experiment design is fixed
The re-sent `jaciii_iihmsp2025.zip` still appears to contain only directory entries. The concrete reusable source in this update is therefore `external/jaciii_iihmsp2025/definitions.py`, which is also mirrored into `src/iceccme2026/source_of_truth.py` for easier downstream use.
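Whether an archive holds only directory entries can be checked with the standard-library `zipfile` module. `file_entries` is a hypothetical helper name used for illustration.

```python
from __future__ import annotations

import zipfile
from pathlib import Path
from typing import BinaryIO


def file_entries(archive: str | Path | BinaryIO) -> list[str]:
    """Return the names of regular-file members, skipping directory entries."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]
```

An empty list from `file_entries("jaciii_iihmsp2025.zip")` would confirm that the archive contains directory entries only.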