Tags: rlouf/sigil
Keep truncated tool calls from poisoning the session

A long write hit the 1200-token completion cap, leaving its arguments cut mid-JSON; the timeline replayed the malformed call on every later prompt and the chat template failed each request with an opaque 500.

Raise the default completion cap to 8192 and reject responses that hit max_tokens mid tool call. Repair invalid tool call arguments when projecting the timeline into chat messages, so already-poisoned sessions render again. Include the HTTP error body in model request failures. Split the stream limits: a 600s first-output timeout covers connect plus prefill, and the 120s idle timeout only bounds silence between chunks once output flows.
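The rejection and repair steps described above could look roughly like the following. This is a minimal sketch, not the project's code: the function names, the `hit_max_tokens` flag, and the `_truncated_arguments` wrapper key are all hypothetical, and it assumes tool call arguments are expected to be a JSON object serialized as a string.

```python
import json


def _is_json_object(raw: str) -> bool:
    """True when raw parses as a JSON object (a dict)."""
    try:
        return isinstance(json.loads(raw), dict)
    except json.JSONDecodeError:
        return False


def ends_mid_tool_call(hit_max_tokens: bool, tool_arguments: list[str]) -> bool:
    """Hypothetical check: the response hit the cap and some arguments are cut off."""
    return hit_max_tokens and any(not _is_json_object(a) for a in tool_arguments)


def repair_tool_arguments(raw: str) -> str:
    """Hypothetical repair: keep valid arguments, wrap anything else in a valid object.

    Wrapping (rather than guessing the missing JSON) keeps the fragment visible
    while guaranteeing the projected message always carries parseable arguments.
    """
    if _is_json_object(raw):
        return raw
    return json.dumps({"_truncated_arguments": raw})
```

The first function would run on new responses so a truncated call never enters the timeline; the second would run at projection time so calls recorded before the fix no longer break rendering.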
Fix Linux-only test failures in the release pipeline

Use POSIX octal escapes in the invalid-UTF-8 bash test (dash's printf does not support hex escapes) and close directly created SqliteStore instances so the unclosed-database ResourceWarning cannot escalate under filterwarnings=error.
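The close-on-exit pattern for directly created stores can be sketched as below. The `SqliteStore` class here is a hypothetical stand-in with an assumed `close()` method, not the project's implementation; only the use of `contextlib.closing` around it is the point.

```python
import sqlite3
from contextlib import closing


class SqliteStore:
    """Hypothetical stand-in for the project's store; only close() matters here."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS events (body TEXT)")

    def add(self, body: str) -> None:
        self._conn.execute("INSERT INTO events (body) VALUES (?)", (body,))
        self._conn.commit()

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def close(self) -> None:
        self._conn.close()


def count_after_insert() -> int:
    # closing() calls store.close() on exit, even if the body raises,
    # so no connection is left for the garbage collector to warn about.
    with closing(SqliteStore()) as store:
        store.add("hello")
        return store.count()
```

A test that builds a store this way releases the connection deterministically instead of relying on finalization order.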
Prepend only shell activity newer than the last model response

Every fresh ask prepended the last 10 shell turns regardless of whether the model had already seen them. With timeline continuity that compounds: each snapshot is persisted in the recorded user message and re-fed on every later turn, so three asks in a row carry the same commands three times.

prepend_recent_turns now anchors on last_event_time() (one ref read and one object read off the event head, no chain walk), and both recent_turns_context and active_failure_context filter to strictly newer than that. A session's first ask (no timeline) keeps the full bounded window as cold-start context; a follow-up with no intervening shell work sends the bare question; a failure the model already answered about is not repeated.

Anchoring on the last event rather than the last assistant message also keeps a handoff-executed command from being re-prepended when its handoff result is already the newest timeline entry.
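The filtering rule can be illustrated with a small sketch. The `ShellTurn` type, its `finished_at` field, and `turns_to_prepend` are hypothetical names for illustration; the sketch also assumes the filter runs before the 10-turn bound is applied, which the commit message does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShellTurn:
    command: str
    finished_at: float  # assumed timestamp, seconds since epoch


def turns_to_prepend(
    turns: list[ShellTurn],
    last_event_time: Optional[float],
    window: int = 10,
) -> list[ShellTurn]:
    """Pick the shell turns worth prepending to a new ask.

    last_event_time is None when the session has no timeline yet (cold start).
    """
    if last_event_time is None:
        # First ask: keep the full bounded window as context.
        return turns[-window:]
    # Follow-up: strictly newer than the last timeline event, then bounded.
    fresh = [t for t in turns if t.finished_at > last_event_time]
    return fresh[-window:]
```

With a strict comparison, a turn whose timestamp equals the anchor is treated as already seen, so a follow-up with no new shell work yields an empty list and the bare question is sent.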