xlsx: fix ignoring rich text <r> after initial plain <t> #637

Merged
jmcnamara merged 2 commits into tafia:master from aquasync:xlsx-richtext-after-plain
May 10, 2026

@aquasync
Contributor

Currently, if read_string_with_bufs hits a plain text node first, it ignores all subsequent events until the closing tag. As a result, it drops the text that follows in examples like this:

  <si>
    <t>tval</t>
    <r>
      <t>rval1</t>
    </r>
    <r>
      <t>rval2</t>
    </r>
  </si>

I guess this was for speed reasons? In the typical case the next event would be the closing tag anyway, so performance shouldn't be much different, and I don't think this introduces any additional copies. It does make rich_buffer a bit of a misnomer, though, so maybe it should be renamed (although text_buf already means something different in this function).
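
To illustrate the intended result, here is a minimal std-only sketch that joins the text of every <t> element in one <si> fragment, whether it sits directly under <si> or inside an <r> run. It is not the calamine implementation: `shared_string_text` is a hypothetical helper, it scans the string by hand instead of using an XML reader, and it does not decode entities or skip phonetic (<rPh>) runs.

```rust
/// Hypothetical helper (not part of calamine): concatenate the text of every
/// `<t>` element in a single `<si>` fragment, in document order.
/// Assumptions: well-formed input, no entity decoding, no `<rPh>` filtering.
fn shared_string_text(si_xml: &str) -> String {
    let mut out = String::new();
    let mut rest = si_xml;
    while let Some(start) = rest.find("<t") {
        let after = &rest[start + 2..];
        // Stop if the tag is never closed.
        let Some(gt) = after.find('>') else { break };
        // Accept only `<t>` or `<t attr="...">`, not other tags starting with `t`.
        let is_t = after.starts_with('>') || after.starts_with(' ');
        // Self-closing tags such as `<t/>` carry no text.
        let self_closing = after[..gt].ends_with('/');
        if !is_t || self_closing {
            rest = &after[gt + 1..];
            continue;
        }
        let body = &after[gt + 1..];
        let Some(end) = body.find("</t>") else { break };
        // Keep appending after the first plain `<t>` instead of stopping there.
        out.push_str(&body[..end]);
        rest = &body[end + "</t>".len()..];
    }
    out
}
```

For the example above, this sketch yields "tvalrval1rval2", whereas stopping after the first plain <t> would yield only "tval".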

@jmcnamara jmcnamara self-assigned this May 10, 2026
@jmcnamara jmcnamara added the ready for merge Suitable for merge in the next release cycle. label May 10, 2026
@jmcnamara jmcnamara merged commit 78a3021 into tafia:master May 10, 2026
2 of 6 checks passed
@aquasync aquasync deleted the xlsx-richtext-after-plain branch May 12, 2026 11:01
ddimaria added a commit to ddimaria/calamine that referenced this pull request May 12, 2026
…eet layout

Add style support for Calamine xlsx files.

Public API:
- `Xlsx::worksheet_style(sheet)` returns a row x col grid of cell styles
  using run-length encoding for memory efficiency on large workbooks.
- `Xlsx::worksheet_layout(sheet)` returns column widths and row heights.

Style types (in `src/style.rs`):
- `Style` with optional Font / Fill / Borders / Alignment / NumberFormat
  / Protection.
- `Color` with theme + tint resolution and indexed-color fallback.
- `RichText` / `TextRun` for cells with mixed inline formatting.
- `StyleRange` with RLE storage and a `cells()` iterator.

Parser in `src/xlsx/style_parser.rs` handles fonts (bold / italic /
underline / strikethrough / sz / color), fills, borders (with color and
style per side), number formats (built-in + custom format codes),
alignment (horizontal / vertical / wrap / indent / shrink / text
rotation incl. stacked), protection (locked default per OOXML), theme
colors with tint, and sysClr lastClr fallback.

Shared-string reader now decodes rich text runs and preserves their
formatting, while also handling plain text that precedes rich runs
(consistent with upstream PR tafia#637).

Includes benchmarks in `benches/style.rs` and test fixtures
(styles.xlsx, borders.xlsx, EMSI_JobChange_UK.xlsx,
problematic_formats.xlsx, styles_1M.xlsx) covering the various code
paths.

Co-authored-by: Cursor <cursoragent@cursor.com>