Markdown by agerada · Pull Request #48 · rchaput/acronyms

agerada · 2025-09-14T23:01:29Z

Implementation of markdown parsing for acronyms.

This required quite extensive code changes, which may have implications for future that will need to be considered. The implementation uses pandoc lua filters to parse markdown in the short or long names. This seems to capture most markdown characters (at least the common ones that I have tested such as bold, italic, subscript, etc). Existing functionality with case modifiers has been kept, although this is not exhaustive (see text 46).

One of the challenges was the style functions that generate links. These are now more complicated, which is an important limitation. However, compatibility with the extension approach described in docs/advanced/extending.qmd) has been maintained.

Markdown parsing is kept as an 'opt in', by default, markdown is not parsed. In the yaml, users can declare markdown:longname (or shortname, or both), which will enable parsing. Markdown strings should be wrapped in quotes to ensure that there are no issues in parsing the yaml. Markdown can also be triggered (or suppressed) for an individual acronym, see test 42.

To avoid ambiguous codes, users must provide an explicit acronym key when parsing markdown in the shortname. Similiarly, implicit plural is not supported with markdown (plural text must be provided).

GIven the extensive changes proposed in this PR, I will leave as a draft for now until I complete more testing on my documents. Documentation has not been updated (will do once design agreed).

rchaput · 2025-09-16T17:12:35Z

Woah, you're on a roll!

That was a very complex feature, in my initial tests I found it to conflict with the "custom List of Acronyms" (because it was easier to print acronyms as strings at this point).

It'll take me some time to review it, due to the sheer amount of commits and lines of code; but a few remarks just by reading your description:

the PR includes the commits for the capitalize option, can you update it to begin with commit d236e85?
you mention that markdown parsing is "opt-in", was it required to avoid bugs? I feel like it would be easier if Markdown parsing was on by default. For acronyms that require disabling the Markdown, (for example, *shortname* exactly as-is, not interpreted as italic), maybe there is a Pandoc syntax, such as "raw strings" that could do the job?
I'm not sure it will be useful to allow individual triggering of Markdown, I can't see a case in which one would want to print the acronym without Markdown?
I agree that users should specify the key when using Markdown in the shortname. It could be difficult to identify the correct acronym otherwise.
As I said, I remember having problems with the "custom LoA" (meaning using Markdown to specify how the LoA should be rendered), did you try mixing the two? If it does not work, we'll have to mark it as a limitation.

Thanks again for this long awaited feature! I'll try to review as quickly as possible.

Refactored case conversion to support pandoc inline. Test 44 now fixed.

Allow per acronym markdown setting. Now use markdown: longname|shortname|false Raise errors when implicit pleural + md attempted. General refactoring to reduce repeated code and improve readability

Restored old create_element function to allow extensions as described in docs/advanced/extending.qmd

agerada · 2025-09-18T09:49:34Z

Hi @rchaput, thanks for the review! I'm happy to contribute as I have a personal need for this feature. My PhD thesis is otherwise complete and I've commited to use acronyms in it. Because I work in a biological domain, the use of italics for bacterial names is not optional for me, so without markdown (or some equivalent), I would probably not be able to use acronyms.

* the PR includes the commits for the capitalize option, can you update it to begin with commit [d236e85](https://github.com/rchaput/acronyms/pull/48/commits/d236e8536020f1559d9f59a44638e2573e7b2f41)?

Apologies, I had messed something up with git here (maybe did not sync the branches properly before starting work). Should be fixed now.

* you mention that markdown parsing is "opt-in", was it required to avoid bugs? I feel like it would be easier if Markdown parsing was on by default. For acronyms that require disabling the Markdown, (for example, `*shortname*` exactly as-is, not interpreted as italic), maybe there is a Pandoc syntax, such as "raw strings" that could do the job?
* I'm not sure it will be useful to allow individual triggering of Markdown, I can't see a case in which one would want to print the acronym without Markdown?

I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that:

The issue with shortname being used as the key. If we always parsed markdown in the shortname then to be safe, users would have to always provide an explicit key.
There may be some unexpected effects of markdown parsing which I have not appreciated yet. By having this as an opt-in feature, the impact of future issues would be minimised.

Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature.

Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown.

* As I said, I remember having problems with the "custom LoA" (meaning using Markdown to specify how the LoA should be rendered), did you try mixing the two? If it does not work, we'll have to mark it as a limitation.

Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example E. coli applied to **{longname}** becomes E. coli), which I think would be the desired behaviour. Unfortunately the shortname currently does not work. I think the problem is the addition of the link target as raw markdown. Hopefully I can fix this by applying the link target using pandoc filters instead.

Thanks again for this long awaited feature! I'll try to review as quickly as possible.

No problem. I have kept the PR as draft for now, as I plan to do some refactoring for readability to help with the review and future maintenance.

rchaput · 2025-09-18T10:19:33Z

Just a quick idea (I need to read your message in more details after work): you mention that you need italics because of your domain. A quick but working option would be to implement a new style, let's say "italics_short" which always puts the shortname in italics. This would be far easier because in Lua we can use pandoc.Italic (or something similar). Thanks to your previous PR, we can even specify the style for each acronym. I don't mean that it is useless to allow Markdown in acronyms; just that there is a faster way if you (or someone else) require it for your thesis. I'll try to make some time to review your PR later today. Sep 18, 2025 11:49:56 Alessandro Gerada ***@***.***>:

…

[Image]*agerada* left a comment (rchaput/acronyms#48)[#48 (comment)] Hi @rchaput[https://github.com/rchaput], thanks for the review! I'm happy to contribute as I have a personal need for this feature. My PhD thesis is otherwise complete and I've commited to use *acronyms* in it. Because I work in a biological domain, the use of italics for bacterial names is not optional for me, so without markdown (or some equivalent), I would probably not be able to use *acronyms*. ** the PR includes the commits for the capitalize option, can you update it to begin with commit [d236e85](d236e85)? * Apologies, I had messed something up with git here (maybe did not sync the branches properly before starting work). Should be fixed now. ** you mention that markdown parsing is "opt-in", was it required to avoid bugs? I feel like it would be easier if Markdown parsing was on by default. For acronyms that require disabling the Markdown, (for example, `*shortname*` exactly as-is, not interpreted as italic), maybe there is a Pandoc syntax, such as "raw strings" that could do the job? * I'm not sure it will be useful to allow individual triggering of Markdown, I can't see a case in which one would want to print the acronym without Markdown? * I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that: 1. The issue with shortname being used as the key. If we always parsed markdown in the shortname then to be safe, users would have to always provide an explicit key. 2. There may be some unexpected effects of markdown parsing which I have not appreciated yet. By having this as an opt-in feature, the impact of future issues would be minimised. Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature. Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown. ** As I said, I remember having problems with the "custom LoA" (meaning using Markdown to specify how the LoA should be rendered), did you try mixing the two? If it does not work, we'll have to mark it as a limitation. * Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example /E. coli/ applied to ***{longname}*** becomes /*E. coli*/), which I think would be the desired behaviour. Unfortunately the shortname currently does not work. I think the problem is the addition of the link target as raw markdown. Hopefully I can fix this by applying the link target using pandoc filters instead. Thanks again for this long awaited feature! I'll try to review as quickly as possible. No problem. I have kept the PR as draft for now, as I plan to do some refactoring for readability to help with the review and future maintenance. — Reply to this email directly, view it on GitHub[#48 (comment)], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AGZR5IIU4KQJ2PHRCYRJM2D3TJ54JAVCNFSM6AAAAACGPVS2A2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMBWGU2TCMBWGQ]. You are receiving this because you were mentioned. [Tracking image][https://github.com/notifications/beacon/AGZR5INDXEQ55PPJCYXIEK33TJ54JA5CNFSM6AAAAACGPVS2A2WGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTWFCX3RQ.gif]

agerada · 2025-09-18T13:58:39Z

That's an interesting thought that I hadn't considered. Unfortunately, many of the acronyms are mixtures of different cases, for example methicillin-resistant Staphylococcus aureus (MRSA), or Clostridium difficile infection (CDI). Sadly I don't think it will work (unless using very complex styles, and then it might as well be markdown). Fortunately the thesis is rendering great with acronyms installed from this branch, including italics where required.

rchaput · 2025-09-18T18:56:08Z

I've had time to read your comments in details

I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that:
1. The issue with shortname being used as the key. If we always parsed markdown in the shortname then to be safe, users would have to always provide an explicit key.

I'm wondering whether we can use Pandoc to detect markdown in the shortname. If we could do that, we could: 1) detect if there is markdown; 2) if there is, check if the key is defined, and raise an error otherwise. I don't remember the notions of raw_strings and others like that enough...

2. There may be some unexpected effects of markdown parsing which I have not appreciated yet. By having this as an opt-in feature, the impact of future issues would be minimised.

I agree that it can be risky. Then again, if we have some kind of feature to disable Markdown parsing (I'm thinking something like shortname: r"This is *not* Markdown"), we can mitigate the impact as well, while still simplifying use. I'm not sure many acronyms will include Markdown-specific characters.

Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature.

Do you think we could enable Markdown shortname if and only if the key is set? That would avoid including an additional markdown: shortname option, but I fear that this would confuse people. From a technical point of view, it makes sense, since we require an explicit key to avoid ambiguous shortname->key conversion; but this may not be clear...

Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown.

I've seen the example, it's good that we can escape some characters, but I was thinking of escaping the whole string. As I mentioned earlier, I remember some specific syntax like r"raw string", but maybe I'm confusing with Python, I'm not sure...

Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example E. coli applied to **{longname}** becomes E. coli), which I think would be the desired behaviour. Unfortunately the shortname currently does not work. I think the problem is the addition of the link target as raw markdown. Hopefully I can fix this by applying the link target using pandoc filters instead.

It's great that the longname already works. If we can make the shortname work as well, it's best; but I'm OK with stating it as a limitation if it doesn't. I think that allowing Markdown is too much important.

As we can now assume input is inlines, legacy create_element is not required, as create_rich_element replaces it. To keep compatibility with extending.qmd, kept local reference to create_element.

General refactor and clean up to styles functions.

This commit removes the internal features that allowed markdown to be optional. Now markdown will always be parsed. Characters can still be escaped if necessary (see test 42).

agerada · 2025-09-20T22:21:24Z

After giving this some thought. I fully agree that markdown should be parsed by default. I have enabled this.

I have also removed the internal workings that allowed markdown to be turned on or off (a405c2a). This has allowed me to clean up the code and remove a lot of checking to see whether markdown is being parsed. There is now a contains_markdown function that should detect markdown and ask the user to provide a key (if shortname has markdown).

The issue of escaping markdown-like characters (which, to be fair, should be rarely needed), I think is sufficiently covered by escaping using \\. I imagine it would be possible to have a raw string (or equivalent option), but it would make the internal workings more complicated (needing to check if raw string or inline), for something that already has a workaround. The only annoyance is that two \\ are needed, so that would have to be documented.

Hopefully the code is cleaner now, so I have removed the draft status of this PR.

rchaput

I've read most of your changes; lots of great ideas!

I think it needs a bit of refactoring to simplify the code, but everything should work well (just as you tested)

rchaput · 2025-09-21T11:33:54Z

+    -- Enforce explicit key when markdown parsing for shortname is enabled.
+    if Helpers.contains_markdown(object.shortname) then
+        if object.key == nil then
+            quarto.log.error("[acronyms] Each acronym must provide an explicit `key` when using markdown in shortname.")


It would be useful to print the acronym shortname here so it's clear for the user which one to fix.

Added 82930b4

rchaput · 2025-09-21T11:52:20Z

+        -- Always parse markdown for names
+        local shortname = Helpers.extract_meta_field(v.shortname, true)
+        local longname  = Helpers.extract_meta_field(v.longname,  true)
+    -- raw longname parsed


I don't understand this comment?

old comment, sorry aab2c99

rchaput · 2025-09-21T11:58:10Z

+    if value == nil then return false end
+
+    -- If we have a Pandoc type, check AST nodes directly
+    if pandoc and pandoc.utils and pandoc.utils.type then


Why checking for pandoc.utils.type? Is there a Pandoc version in which we do not have access to this function? We can use the Helpers.is_at_least(version) for this kind of problem.

Agreed, see 62a1c1e

rchaput · 2025-09-21T11:59:22Z

+                if v.t ~= "Str" and v.t ~= "Space" and v.t ~= "SoftBreak" and v.t ~= "LineBreak" then
+                    return true
+                end


This condition is the same as in the previous loop (case). Can we merge the two loops? (in order to simplify the code, not making it harder to read)

Agreed, have changed here: 943a3af

rchaput · 2025-09-21T12:14:24Z

 function AcronymsPandoc.generateDefinitionList(sorted_acronyms)
    local definition_list = {}
    for _, acronym in ipairs(sorted_acronyms) do
-        -- The definition's name. A Span with an ID so we can create a link.


Why delete this comment? Is it no longer up-to-date with the code?

I think I removed this while modifying this function, but have returned here: 44b1149

rchaput · 2025-09-21T12:21:32Z

+-- Kept as a local reference here to keep compatibility with previous versions.
+create_element = Helpers.create_rich_element


I'm not sure we need to keep this; we can simply replace the reference in all styles, and update the documentation.

rchaput · 2025-09-21T12:24:25Z

+        -- Conditional strictness: when markdown is present in a part, require an explicit plural for that part.
+        local need_long_strict = Helpers.contains_markdown(acronym.longname) and not acronym._explicit_plural_longname
+        local need_short_strict = Helpers.contains_markdown(acronym.shortname) and not acronym._explicit_plural_shortname
+        if need_long_strict then
+            quarto.log.error("[acronyms] Plural form requested for '" .. tostring(acronym.key) .. "' but 'plural.longname' was not explicitly provided while markdown parsing is enabled for its longname. Define it under plural: { longname: ... } to use \\acrs{" .. tostring(acronym.key) .. "} .")
+            assert(false)
+        end
+        if need_short_strict then
+            quarto.log.error("[acronyms] Plural form requested for '" .. tostring(acronym.key) .. "' but 'plural.shortname' was not explicitly provided while markdown parsing is enabled for its shortname. Define it under plural: { shortname: ... } to use \\acrs{" .. tostring(acronym.key) .. "} .")
+            assert(false)
+        end


I would prefer raising the error before, typically in acronyms_pandoc (if I remember correctly, the function should be something like replaceExistingAcronym). This is simply checking that an option is correctly set.
The style function should perform very minimal error-checking (such as checking the acronym is not nil, otherwise the function cannot work!).

Agree, much better to raise earlier (and still has the advantage of only raising if plural was attempted, so no unecessary errors). See 66a976a. I have tested this informally as the current testing suite does not (as far as I can see) support expected exceptions.

Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>

…styles

Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>

agerada

Thanks for the review; I appreciate there was a lot of code to go through here! Hopefully all comments addressed but happy to look at anything else that comes up.

The default branch was named `master`, but Quarto recommends using `main`. In addition, the docs was not consistent, and sometimes used an unworking command. This commit changes the few documents that refer to the `master` branch (including the CI/CD and direct links to files in the repository).

psomhorst · 2025-11-20T16:01:37Z

Does this feature also support math (e.g., a double subscript $P_{E_1}$ => $P_{E_1}$) in shortnames or longnames?

agerada · 2025-11-22T17:53:10Z

Hi @psomhorst. The current implementation almost certainly does not support math, although I haven't specifically tested this. I'm not sure how feasible math support would be, it mainly depends on whether there is a Pandoc lua filter to parse it. I can look into this.

agerada · 2025-11-24T22:52:29Z

Good news, it seems like I was wrong! Math is being parsed in this feature. I didn't need to change any of the code, but added a couple of tests (f1c3cff) to demonstrate it. Although I haven't exhaustively tested edge cases, I didn't run into any issues while informally testing it out. Perhaps good for this to be covered in the documentation as well, as it might be important for some scientific domains.

psomhorst · 2025-11-25T10:33:23Z

Fantastic! If you want, I can supply some cases or documentation from a scientific writing standpoint.

agerada · 2025-12-01T13:03:49Z

If you do have a relevant example or two from your domain that demonstrate the use of mathematics in acronyms then I think it would be quite helpful.

psomhorst · 2025-12-08T08:04:59Z

If you do have a relevant example or two from your domain that demonstrate the use of mathematics in acronyms then I think it would be quite helpful.

See this PR.

agerada closed this Sep 16, 2025

agerada force-pushed the markdown branch from 2a847e8 to e6a27e3 Compare September 16, 2025 20:18

agerada added 22 commits September 16, 2025 21:30

parse_markdown_in_longname option

4c23fd1

markdown parsers

9dbd0a7

apply markdown styles

2274335

tests markdown

c78ee4c

merge links

4806a51

explicit plural markdown support

73eb8d7

markdown tests show limitations

e0d0a79

removed redundant parse_simple_italics_markdown

5834d5c

shortname markdown

cf202e5

shortname markdown tests

9eeb105

refactor case conversion

a6eaebc

Refactored case conversion to support pandoc inline. Test 44 now fixed.

test renames

9379eb7

case test text fixed

dcc7e49

markdown sort patch

f87df5b

refactor

14bd29a

Allow per acronym markdown setting. Now use markdown: longname|shortname|false Raise errors when implicit pleural + md attempted. General refactoring to reduce repeated code and improve readability

Tests updated with new options

2efdcc1

Backward compatibility with extensions

56a4db1

Restored old create_element function to allow extensions as described in docs/advanced/extending.qmd

Removed redundant argument from create_rich_element

f32fd27

refactor out helper functions from styles for clarity

0b6a3bb

Test 46 shows case changes limited to common md

52ff2ca

minor refactor

e8c095a

add make_parenthesized

ac0fb0e

agerada reopened this Sep 16, 2025

agerada added 3 commits September 20, 2025 22:06

Remove unnecessary element creation function

4bb5756

As we can now assume input is inlines, legacy create_element is not required, as create_rich_element replaces it. To keep compatibility with extending.qmd, kept local reference to create_element.

Refactor styles

3c9277a

General refactor and clean up to styles functions.

Remove optional markdown code

a405c2a

This commit removes the internal features that allowed markdown to be optional. Now markdown will always be parsed. Characters can still be escaped if necessary (see test 42).

agerada marked this pull request as ready for review September 20, 2025 22:21

rchaput reviewed Sep 21, 2025

View reviewed changes

agerada and others added 13 commits September 22, 2025 21:31

make_parenthesized to helpers

ae43456

line width

b5dbe0b

md docs

3844626

print acronyms with missing key

82930b4

Apply suggestion from @rchaput

dba835c

Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>

clean up old comments from acronyms.lua

aab2c99

check pandoc version when using type

62a1c1e

simplify contains_markdown

943a3af

move transform_case create_element and make_parenthesized to acronym_…

5c5ed3f

…styles

Apply suggestion from @rchaput

ea495e1

Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>

Reinsert definition list comments

44b1149

move plural error raising fro styles to acronyms_pandoc

66a976a

Apply suggestion from @rchaput

5cd113c

Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>

agerada commented Sep 24, 2025

View reviewed changes

rchaput added 2 commits September 30, 2025 18:56

Add instructions to install specific version

6960de2

agerada added 2 commits November 24, 2025 22:21

Merge branch 'rchaput:main' into markdown

38a5ac7

Test maths in names

f1c3cff

		-- Kept as a local reference here to keep compatibility with previous versions.
		create_element = Helpers.create_rich_element

Conversation

agerada commented Sep 14, 2025

Uh oh!

rchaput commented Sep 16, 2025

Uh oh!

agerada commented Sep 18, 2025

Uh oh!

rchaput commented Sep 18, 2025 via email

Uh oh!

agerada commented Sep 18, 2025

Uh oh!

rchaput commented Sep 18, 2025

Uh oh!

agerada commented Sep 20, 2025

Uh oh!

rchaput left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agerada Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

agerada left a comment

Choose a reason for hiding this comment

Uh oh!

psomhorst commented Nov 20, 2025

Uh oh!

agerada commented Nov 22, 2025

Uh oh!

agerada commented Nov 24, 2025

Uh oh!

psomhorst commented Nov 25, 2025

Uh oh!

agerada commented Dec 1, 2025

Uh oh!

psomhorst commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

agerada Sep 24, 2025 •

edited

Loading

psomhorst commented Dec 8, 2025 •

edited

Loading