Producer-dependent poor throughput and tiny read fragments for large output on ConPTY

## Summary

When a ConPTY-backed Windows PTY created by `pywinpty 3.0.3` is used for large plain-text output, throughput appears to be highly producer-dependent on my machine.

`cmd.exe /d /c type <large-text-file>` and `pwsh -NoProfile -Command "Get-Content -Path <large-text-file>"` are much slower and more fragmented than a real `cat.exe <large-text-file>` producer on the same machine.

Minimal reproducer:

```powershell
cmd.exe /d /c type <large-text-file>
```

This approximates viewing large logs or other bulk text output in a Windows terminal.

A direct non-PTY pipe capture is still a useful baseline, but I no longer think this is necessarily a `pywinpty`-only issue: the same `type` producer is also very slow in a normal Windows terminal, while piping it to a non-terminal sink is fast. Even so, from a `pywinpty` consumer's perspective the PTY path can be dramatically slower than direct pipe capture, and the fragmentation pattern varies sharply by producer.

I stripped the app-side relay logic down aggressively, and the slowdown still reproduces when talking to `pywinpty` directly, so it does not appear to be caused by my Python-side output handling.

Possibly related:

- https://github.com/andfoy/pywinpty/issues/545
- https://github.com/andfoy/pywinpty/issues/463

This issue is specifically about throughput / fragmentation rather than interactive VT-query lag. I am filing the post-exit EOF / drain behavior as a [separate issue](https://github.com/andfoy/pywinpty/issues/565).

## Environment

- Windows build: `10.0.26100`
- Python: `3.14`
- `pywinpty`: `3.0.3`

## Reproduction

I attached a self-contained script `measure_pywinpty.py` that generates a deterministic large text fixture, measures a direct non-PTY pipe-capture baseline, and then measures the same producer through a ConPTY-backed PTY created by `pywinpty`.

Run it with:

```powershell
uv run --script measure_pywinpty.py --producer type
```

<details>
<summary>Benchmark reproducer script: <code>measure_pywinpty.py</code></summary>

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "pywinpty==3.0.3",
# ]
# ///
from __future__ import annotations

import argparse
import json
import platform
import shutil
import statistics
import subprocess
import sys
import time
from pathlib import Path

import winpty

CONPTY_BACKEND_NAME = "conpty"
DEFAULT_LINE_COUNT = 200_000
LEFT_BLOCK = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" * 2
RIGHT_BLOCK = LEFT_BLOCK[::-1]
POST_EXIT_DRAIN_SECONDS = 2.0
POST_EXIT_DRAIN_POLL_SECONDS = 0.01


def create_fixture(path: Path, line_count: int) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)

    with path.open("w", encoding="utf-8", newline="\n") as stream:
        stream.write("PyWinPTY large output throughput fixture\n")
        stream.write(f"lines={line_count}\n")
        for i in range(1, line_count + 1):
            stream.write(f"{i:06d} | {LEFT_BLOCK} | {RIGHT_BLOCK}\n")


def build_command(producer: str, fixture_path: Path) -> list[str]:
    fixture_literal = "'" + str(fixture_path).replace("'", "''") + "'"

    if producer == "type":
        return ["cmd.exe", "/d", "/c", "type", str(fixture_path)]

    if producer == "bat":
        return ["bat", "-P", str(fixture_path)]

    if producer == "cat":
        return ["cat", str(fixture_path)]

    if producer == "get_content":
        return [
            "pwsh.exe",
            "-NoProfile",
            "-Command",
            f"Get-Content -Path {fixture_literal}",
        ]

    if producer == "python":
        payload = """\
import shutil
import sys

with open(sys.argv[1], "rb") as f:
    shutil.copyfileobj(f, sys.stdout.buffer)
"""
        return [sys.executable, "-u", "-c", payload, str(fixture_path)]

    message = f"Unsupported producer: {producer}"
    raise ValueError(message)


def resolve_executable(executable: str) -> str:
    return shutil.which(executable) or executable


def build_cmdline(args: list[str]) -> str | None:
    if not args:
        return None

    return subprocess.list2cmdline(args)


def measure_direct(command: list[str]) -> dict[str, object]:
    started_at = time.perf_counter()
    process = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=False)
    elapsed_seconds = time.perf_counter() - started_at

    output = process.stdout
    total_chars = len(output)
    total_bytes = len(output.encode("utf-8", errors="replace"))

    return {
        "kind": "direct",
        "returncode": process.returncode,
        "elapsed_seconds": round(elapsed_seconds, 3),
        "total_chars": total_chars,
        "total_bytes": total_bytes,
        "chars_per_second": round(total_chars / elapsed_seconds, 1),
        "mb_per_second": round(total_bytes / elapsed_seconds / 1024 / 1024, 2),
    }


def create_pty(cols: int, rows: int) -> winpty.PTY:
    return winpty.PTY(cols, rows, backend=winpty.Backend.ConPTY)


def accumulate_output(
    output: str, read_sizes: list[int], total_chars: list[int], total_bytes: list[int]
) -> None:
    read_sizes.append(len(output))
    total_chars[0] += len(output)
    total_bytes[0] += len(output.encode("utf-8", errors="replace"))


def get_exitstatus(pty: winpty.PTY) -> int | None:
    return pty.get_exitstatus()


def drain_after_exit(
    pty: winpty.PTY,
    read_sizes: list[int],
    total_chars: list[int],
    total_bytes: list[int],
    *,
    grace_seconds: float,
    poll_seconds: float,
) -> tuple[bool, bool]:
    deadline = time.monotonic() + grace_seconds

    while True:
        if pty.iseof():
            return True, False

        try:
            output = pty.read(blocking=False)
        except winpty.WinptyError:
            if pty.iseof():
                return True, False
            if time.monotonic() >= deadline:
                return False, True

            time.sleep(poll_seconds)
            continue

        if output:
            accumulate_output(output, read_sizes, total_chars, total_bytes)
            deadline = time.monotonic() + grace_seconds
            continue

        if time.monotonic() >= deadline:
            return False, True

        time.sleep(poll_seconds)


def measure_pty(command: list[str], *, cols: int, rows: int) -> dict[str, object]:
    pty = create_pty(cols, rows)
    pty.spawn(
        resolve_executable(command[0]),
        cmdline=build_cmdline(command[1:]),
    )

    read_sizes: list[int] = []
    total_chars = [0]
    total_bytes = [0]
    started_at = time.perf_counter()
    reached_eof = False
    drain_timed_out = False
    exitstatus: int | None = None

    while True:
        try:
            output = pty.read(blocking=True)
        except winpty.WinptyError as error:
            exitstatus = get_exitstatus(pty)
            if pty.iseof() or exitstatus is not None:
                if pty.iseof():
                    reached_eof = True
                else:
                    reached_eof, drain_timed_out = drain_after_exit(
                        pty,
                        read_sizes,
                        total_chars,
                        total_bytes,
                        grace_seconds=POST_EXIT_DRAIN_SECONDS,
                        poll_seconds=POST_EXIT_DRAIN_POLL_SECONDS,
                    )
                break

            message = f"PTY read failed unexpectedly: {error}"
            raise RuntimeError(message) from error

        if not output:
            if pty.iseof():
                reached_eof = True
                break
            exitstatus = get_exitstatus(pty)
            if exitstatus is not None:
                reached_eof, drain_timed_out = drain_after_exit(
                    pty,
                    read_sizes,
                    total_chars,
                    total_bytes,
                    grace_seconds=POST_EXIT_DRAIN_SECONDS,
                    poll_seconds=POST_EXIT_DRAIN_POLL_SECONDS,
                )
                break
            continue

        accumulate_output(output, read_sizes, total_chars, total_bytes)

    elapsed_seconds = time.perf_counter() - started_at

    return {
        "kind": "pty",
        "backend": CONPTY_BACKEND_NAME,
        "exitstatus": get_exitstatus(pty),
        "elapsed_seconds": round(elapsed_seconds, 3),
        "total_chars": total_chars[0],
        "total_bytes": total_bytes[0],
        "reads": len(read_sizes),
        "chars_per_second": round(total_chars[0] / elapsed_seconds, 1),
        "mb_per_second": round(total_bytes[0] / elapsed_seconds / 1024 / 1024, 2),
        "mean_chars_per_read": round(statistics.mean(read_sizes), 1) if read_sizes else 0.0,
        "median_chars_per_read": round(statistics.median(read_sizes), 1) if read_sizes else 0.0,
        "max_chars_per_read": max(read_sizes) if read_sizes else 0,
        "eof_reached": reached_eof,
        "post_exit_drain_timed_out": drain_timed_out,
    }


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Measure pywinpty ConPTY behavior for a large plain-text producer on Windows."
    )
    parser.add_argument(
        "--fixture",
        type=Path,
        default=Path(__file__).with_name("large_output_fixture.txt"),
        help="Path to the generated text fixture.",
    )
    parser.add_argument(
        "--lines",
        type=int,
        default=DEFAULT_LINE_COUNT,
        help="Number of generated data lines in the fixture.",
    )
    parser.add_argument(
        "--overwrite-fixture",
        action="store_true",
        help="Regenerate the fixture even if it already exists.",
    )
    parser.add_argument(
        "--producer",
        choices=["type", "bat", "cat", "get_content", "python"],
        default="type",
        help=(
            "Producer command to benchmark. 'type' uses cmd.exe built-in type, "
            "'bat' runs bat -P, 'cat' uses the cat executable on PATH, "
            "'get_content' uses pwsh Get-Content, and 'python' runs a small "
            "Python binary-copy script."
        ),
    )
    parser.add_argument("--cols", type=int, default=120, help="PTY column count.")
    parser.add_argument("--rows", type=int, default=40, help="PTY row count.")
    parser.add_argument(
        "--skip-direct",
        action="store_true",
        help="Skip the direct non-PTY subprocess baseline.",
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()

    if args.overwrite_fixture or not args.fixture.exists():
        create_fixture(args.fixture, args.lines)

    command = build_command(args.producer, args.fixture)

    results: dict[str, object] = {
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "windows_version": platform.version(),
            "pywinpty": winpty.__version__,
        },
        "fixture": {
            "path": str(args.fixture.resolve()),
            "bytes": args.fixture.stat().st_size,
            "lines_argument": args.lines,
        },
        "command": command,
    }

    if not args.skip_direct:
        results["direct"] = measure_direct(command)

    results["pty"] = measure_pty(command, cols=args.cols, rows=args.rows)

    print(json.dumps(results, indent=2))


if __name__ == "__main__":
    main()
```

</details>

The script currently supports these relevant producers:

- `type`: `cmd.exe /d /c type <file>`
- `get_content`: `pwsh -NoProfile -Command "Get-Content -Path <file>"`
- `cat`: `cat.exe <file>` from `PATH`
- `bat`: `bat -P <file>`
- `python`: a simple Python binary-copy producer used in the separate EOF / drain issue

Example alternate producer:

```powershell
uv run --script measure_pywinpty.py --producer get_content
```
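As a sanity check on the fixture, its size can be predicted from the line format used by `create_fixture` in the script. The sketch below is illustrative only: `expected_fixture_bytes` is a hypothetical helper that is not part of the attached script, and it assumes ASCII-only content (so one character is one UTF-8 byte) and fewer than 1,000,000 lines.

```python
# Constants copied from measure_pywinpty.py so the sketch is self-contained.
LEFT_BLOCK = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" * 2
RIGHT_BLOCK = LEFT_BLOCK[::-1]
HEADER = "PyWinPTY large output throughput fixture\n"


def expected_fixture_bytes(line_count: int) -> int:
    """Predict the fixture size without writing it to disk.

    Hypothetical helper mirroring the format written by create_fixture().
    Assumes line_count < 1_000_000 so the {i:06d} field stays 6 chars wide.
    """
    header_bytes = len(HEADER) + len(f"lines={line_count}\n")
    # Every data line has the same width, so measuring one is enough.
    line_bytes = len(f"{1:06d} | {LEFT_BLOCK} | {RIGHT_BLOCK}\n")
    return header_bytes + line_count * line_bytes


if __name__ == "__main__":
    size = expected_fixture_bytes(200_000)
    print(f"{size} bytes, about {size / 1_000_000:.1f} MB (decimal)")
```

With the default `--lines` value this comes out at roughly 31.4 MB in decimal megabytes. Note that the script's own `mb_per_second` field divides by `1024 * 1024`, so it reports mebibytes per second rather than decimal megabytes per second.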

## Expected behavior

I do not expect PTY throughput to exactly match every non-PTY code path, but I would expect simple large-text producers to land in roughly the same range, rather than some degrading into tiny read fragments while others stay much healthier.

## Actual behavior

Representative results on the same machine and same `31.4 MB` generated fixture:

| Producer | Direct pipe capture (elapsed) | PTY (elapsed) | Mean chars/read | Notes |
| --- | ---: | ---: | ---: | --- |
| `cmd.exe /d /c type <file>` | `1.2s` | `37.8s` | `243` | Also very slow in a normal Windows terminal |
| `pwsh.exe -NoProfile -Command "Get-Content -Path <file>"` | `12.9s` | `32.0s` | `281.3` | Still fragmented, but better than `type` |
| `cat.exe <file>` | `0.248s` | `6.063s` | `7083.6` | Much larger chunks and much better throughput |
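
To make the rows easier to compare, the slowdown factor and effective PTY throughput can be derived from the figures in the table. This is a rough sketch, not part of the attached script: the timings and mean read sizes are copied from the table, `FIXTURE_BYTES` assumes "31.4 MB" means decimal megabytes, and the read-count estimate assumes the PTY delivered roughly one character per fixture byte, which ignores any escape sequences or line-ending changes ConPTY may add.

```python
FIXTURE_BYTES = 31_400_000  # assumption: "31.4 MB" read as decimal megabytes

# (producer, direct seconds, PTY seconds, mean chars per read), from the table
RESULTS = [
    ("type", 1.2, 37.8, 243.0),
    ("get_content", 12.9, 32.0, 281.3),
    ("cat", 0.248, 6.063, 7083.6),
]


def derive(direct_s: float, pty_s: float, mean_chars: float) -> tuple[float, float, float]:
    """Return (slowdown factor, PTY MB/s, approximate number of reads)."""
    slowdown = pty_s / direct_s
    pty_mb_per_s = FIXTURE_BYTES / pty_s / 1_000_000
    approx_reads = FIXTURE_BYTES / mean_chars  # rough estimate, see assumptions
    return slowdown, pty_mb_per_s, approx_reads


if __name__ == "__main__":
    for name, direct_s, pty_s, mean_chars in RESULTS:
        slowdown, mb_s, reads = derive(direct_s, pty_s, mean_chars)
        print(f"{name:12} {slowdown:5.1f}x slower, {mb_s:5.2f} MB/s via PTY, ~{reads:,.0f} reads")
```

On these numbers `type` is about 31x slower through the PTY and `cat` about 24x, while `Get-Content` is only about 2.5x slower because its direct baseline is already slow.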
