Skip to content

Markdown#48

Open
agerada wants to merge 44 commits into
rchaput:masterfrom
agerada:markdown
Open

Markdown#48
agerada wants to merge 44 commits into
rchaput:masterfrom
agerada:markdown

Conversation

@agerada
Copy link
Copy Markdown
Contributor

@agerada agerada commented Sep 14, 2025

Implementation of markdown parsing for acronyms.

This required quite extensive code changes, which may have implications for future that will need to be considered. The implementation uses pandoc lua filters to parse markdown in the short or long names. This seems to capture most markdown characters (at least the common ones that I have tested such as bold, italic, subscript, etc). Existing functionality with case modifiers has been kept, although this is not exhaustive (see text 46).

One of the challenges was the style functions that generate links. These are now more complicated, which is an important limitation. However, compatibility with the extension approach described in docs/advanced/extending.qmd) has been maintained.

Markdown parsing is kept as an 'opt in', by default, markdown is not parsed. In the yaml, users can declare markdown:longname (or shortname, or both), which will enable parsing. Markdown strings should be wrapped in quotes to ensure that there are no issues in parsing the yaml. Markdown can also be triggered (or suppressed) for an individual acronym, see test 42.

To avoid ambiguous codes, users must provide an explicit acronym key when parsing markdown in the shortname. Similiarly, implicit plural is not supported with markdown (plural text must be provided).

GIven the extensive changes proposed in this PR, I will leave as a draft for now until I complete more testing on my documents. Documentation has not been updated (will do once design agreed).

@rchaput
Copy link
Copy Markdown
Owner

rchaput commented Sep 16, 2025

Woah, you're on a roll!

That was a very complex feature, in my initial tests I found it to conflict with the "custom List of Acronyms" (because it was easier to print acronyms as strings at this point).

It'll take me some time to review it, due to the sheer amount of commits and lines of code; but a few remarks just by reading your description:

  • the PR includes the commits for the capitalize option, can you update it to begin with commit d236e85?
  • you mention that markdown parsing is "opt-in", was it required to avoid bugs? I feel like it would be easier if Markdown parsing was on by default. For acronyms that require disabling the Markdown, (for example, *shortname* exactly as-is, not interpreted as italic), maybe there is a Pandoc syntax, such as "raw strings" that could do the job?
  • I'm not sure it will be useful to allow individual triggering of Markdown, I can't see a case in which one would want to print the acronym without Markdown?
  • I agree that users should specify the key when using Markdown in the shortname. It could be difficult to identify the correct acronym otherwise.
  • As I said, I remember having problems with the "custom LoA" (meaning using Markdown to specify how the LoA should be rendered), did you try mixing the two? If it does not work, we'll have to mark it as a limitation.

Thanks again for this long awaited feature! I'll try to review as quickly as possible.

@agerada agerada reopened this Sep 16, 2025
@agerada
Copy link
Copy Markdown
Contributor Author

agerada commented Sep 18, 2025

Hi @rchaput, thanks for the review! I'm happy to contribute as I have a personal need for this feature. My PhD thesis is otherwise complete and I've commited to use acronyms in it. Because I work in a biological domain, the use of italics for bacterial names is not optional for me, so without markdown (or some equivalent), I would probably not be able to use acronyms.

* the PR includes the commits for the capitalize option, can you update it to begin with commit [d236e85](https://github.com/rchaput/acronyms/pull/48/commits/d236e8536020f1559d9f59a44638e2573e7b2f41)?

Apologies, I had messed something up with git here (maybe did not sync the branches properly before starting work). Should be fixed now.

* you mention that markdown parsing is "opt-in", was it required to avoid bugs? I feel like it would be easier if Markdown parsing was on by default. For acronyms that require disabling the Markdown, (for example, `*shortname*` exactly as-is, not interpreted as italic), maybe there is a Pandoc syntax, such as "raw strings" that could do the job?
* I'm not sure it will be useful to allow individual triggering of Markdown, I can't see a case in which one would want to print the acronym without Markdown?

I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that:

  1. The issue with shortname being used as the key. If we always parsed markdown in the shortname then to be safe, users would have to always provide an explicit key.
  2. There may be some unexpected effects of markdown parsing which I have not appreciated yet. By having this as an opt-in feature, the impact of future issues would be minimised.

Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature.

Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown.

* As I said, I remember having problems with the "custom LoA" (meaning using Markdown to specify how the LoA should be rendered), did you try mixing the two? If it does not work, we'll have to mark it as a limitation.

Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example E. coli applied to **{longname}** becomes E. coli), which I think would be the desired behaviour. Unfortunately the shortname currently does not work. I think the problem is the addition of the link target as raw markdown. Hopefully I can fix this by applying the link target using pandoc filters instead.

Thanks again for this long awaited feature! I'll try to review as quickly as possible.

No problem. I have kept the PR as draft for now, as I plan to do some refactoring for readability to help with the review and future maintenance.

@rchaput
Copy link
Copy Markdown
Owner

rchaput commented Sep 18, 2025 via email

@agerada
Copy link
Copy Markdown
Contributor Author

agerada commented Sep 18, 2025

That's an interesting thought that I hadn't considered. Unfortunately, many of the acronyms are mixtures of different cases, for example methicillin-resistant Staphylococcus aureus (MRSA), or Clostridium difficile infection (CDI). Sadly I don't think it will work (unless using very complex styles, and then it might as well be markdown). Fortunately the thesis is rendering great with acronyms installed from this branch, including italics where required.

@rchaput
Copy link
Copy Markdown
Owner

rchaput commented Sep 18, 2025

I've had time to read your comments in details

I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that:

1. The issue with shortname being used as the key. If we always parsed markdown in the shortname then to be safe, users would have to always provide an explicit key.

I'm wondering whether we can use Pandoc to detect markdown in the shortname. If we could do that, we could: 1) detect if there is markdown; 2) if there is, check if the key is defined, and raise an error otherwise. I don't remember the notions of raw_strings and others like that enough...

2. There may be some unexpected effects of markdown parsing which I have not appreciated yet. By having this as an opt-in feature, the impact of future issues would be minimised.

I agree that it can be risky. Then again, if we have some kind of feature to disable Markdown parsing (I'm thinking something like shortname: r"This is *not* Markdown"), we can mitigate the impact as well, while still simplifying use. I'm not sure many acronyms will include Markdown-specific characters.

Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature.

Do you think we could enable Markdown shortname if and only if the key is set? That would avoid including an additional markdown: shortname option, but I fear that this would confuse people. From a technical point of view, it makes sense, since we require an explicit key to avoid ambiguous shortname->key conversion; but this may not be clear...

Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown.

I've seen the example, it's good that we can escape some characters, but I was thinking of escaping the whole string. As I mentioned earlier, I remember some specific syntax like r"raw string", but maybe I'm confusing with Python, I'm not sure...

Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example E. coli applied to **{longname}** becomes E. coli), which I think would be the desired behaviour. Unfortunately the shortname currently does not work. I think the problem is the addition of the link target as raw markdown. Hopefully I can fix this by applying the link target using pandoc filters instead.

It's great that the longname already works. If we can make the shortname work as well, it's best; but I'm OK with stating it as a limitation if it doesn't. I think that allowing Markdown is too much important.

As we can now assume input is inlines, legacy create_element is not required, as create_rich_element replaces it. To keep compatibility with extending.qmd, kept local reference to create_element.
General refactor and clean up to styles functions.
This commit removes the internal features that allowed markdown to be optional. Now markdown will always be parsed. Characters can still be escaped if necessary (see test 42).
@agerada
Copy link
Copy Markdown
Contributor Author

agerada commented Sep 20, 2025

After giving this some thought. I fully agree that markdown should be parsed by default. I have enabled this.

I have also removed the internal workings that allowed markdown to be turned on or off (a405c2a). This has allowed me to clean up the code and remove a lot of checking to see whether markdown is being parsed. There is now a contains_markdown function that should detect markdown and ask the user to provide a key (if shortname has markdown).

The issue of escaping markdown-like characters (which, to be fair, should be rarely needed), I think is sufficiently covered by escaping using \\. I imagine it would be possible to have a raw string (or equivalent option), but it would make the internal workings more complicated (needing to check if raw string or inline), for something that already has a workaround. The only annoyance is that two \\ are needed, so that would have to be documented.

Hopefully the code is cleaner now, so I have removed the draft status of this PR.

@agerada agerada marked this pull request as ready for review September 20, 2025 22:21
Copy link
Copy Markdown
Owner

@rchaput rchaput left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read most of your changes; lots of great ideas!

I think it needs a bit of refactoring to simplify the code, but everything should work well (just as you tested)

Comment thread _extensions/acronyms/acronyms.lua Outdated
-- Enforce explicit key when markdown parsing for shortname is enabled.
if Helpers.contains_markdown(object.shortname) then
if object.key == nil then
quarto.log.error("[acronyms] Each acronym must provide an explicit `key` when using markdown in shortname.")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to print the acronym shortname here so it's clear for the user which one to fix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 82930b4

Comment thread _extensions/acronyms/acronyms.lua Outdated
Comment thread _extensions/acronyms/acronyms.lua Outdated
-- Always parse markdown for names
local shortname = Helpers.extract_meta_field(v.shortname, true)
local longname = Helpers.extract_meta_field(v.longname, true)
-- raw longname parsed
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old comment, sorry aab2c99

if value == nil then return false end

-- If we have a Pandoc type, check AST nodes directly
if pandoc and pandoc.utils and pandoc.utils.type then
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why checking for pandoc.utils.type? Is there a Pandoc version in which we do not have access to this function? We can use the Helpers.is_at_least(version) for this kind of problem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, see 62a1c1e

Comment on lines +144 to +146
if v.t ~= "Str" and v.t ~= "Space" and v.t ~= "SoftBreak" and v.t ~= "LineBreak" then
return true
end
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is the same as in the previous loop (case). Can we merge the two loops? (in order to simplify the code, not making it harder to read)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, have changed here: 943a3af

function AcronymsPandoc.generateDefinitionList(sorted_acronyms)
local definition_list = {}
for _, acronym in ipairs(sorted_acronyms) do
-- The definition's name. A Span with an ID so we can create a link.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this comment? Is it no longer up-to-date with the code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I removed this while modifying this function, but have returned here: 44b1149

Comment thread _extensions/acronyms/acronyms_pandoc.lua Outdated
Comment on lines +44 to +45
-- Kept as a local reference here to keep compatibility with previous versions.
create_element = Helpers.create_rich_element
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to keep this; we can simply replace the reference in all styles, and update the documentation.

Comment on lines +141 to +151
-- Conditional strictness: when markdown is present in a part, require an explicit plural for that part.
local need_long_strict = Helpers.contains_markdown(acronym.longname) and not acronym._explicit_plural_longname
local need_short_strict = Helpers.contains_markdown(acronym.shortname) and not acronym._explicit_plural_shortname
if need_long_strict then
quarto.log.error("[acronyms] Plural form requested for '" .. tostring(acronym.key) .. "' but 'plural.longname' was not explicitly provided while markdown parsing is enabled for its longname. Define it under plural: { longname: ... } to use \\acrs{" .. tostring(acronym.key) .. "} .")
assert(false)
end
if need_short_strict then
quarto.log.error("[acronyms] Plural form requested for '" .. tostring(acronym.key) .. "' but 'plural.shortname' was not explicitly provided while markdown parsing is enabled for its shortname. Define it under plural: { shortname: ... } to use \\acrs{" .. tostring(acronym.key) .. "} .")
assert(false)
end
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer raising the error before, typically in acronyms_pandoc (if I remember correctly, the function should be something like replaceExistingAcronym). This is simply checking that an option is correctly set.
The style function should perform very minimal error-checking (such as checking the acronym is not nil, otherwise the function cannot work!).

Copy link
Copy Markdown
Contributor Author

@agerada agerada Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, much better to raise earlier (and still has the advantage of only raising if plural was attempted, so no unecessary errors). See 66a976a. I have tested this informally as the current testing suite does not (as far as I can see) support expected exceptions.

Comment thread _extensions/acronyms/acronyms_styles.lua Outdated
Copy link
Copy Markdown
Contributor Author

@agerada agerada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review; I appreciate there was a lot of code to go through here! Hopefully all comments addressed but happy to look at anything else that comes up.

rchaput added 2 commits September 30, 2025 18:56
The default branch was named `master`, but Quarto recommends using
`main`. In addition, the docs was not consistent, and sometimes used an
unworking command.
This commit changes the few documents that refer to the `master` branch
(including the CI/CD and direct links to files in the repository).
@psomhorst
Copy link
Copy Markdown

Does this feature also support math (e.g., a double subscript $P_{E_1}$ => $P_{E_1}$) in shortnames or longnames?

@agerada
Copy link
Copy Markdown
Contributor Author

agerada commented Nov 22, 2025

Hi @psomhorst. The current implementation almost certainly does not support math, although I haven't specifically tested this. I'm not sure how feasible math support would be, it mainly depends on whether there is a Pandoc lua filter to parse it. I can look into this.

@agerada
Copy link
Copy Markdown
Contributor Author

agerada commented Nov 24, 2025

Good news, it seems like I was wrong! Math is being parsed in this feature. I didn't need to change any of the code, but added a couple of tests (f1c3cff) to demonstrate it. Although I haven't exhaustively tested edge cases, I didn't run into any issues while informally testing it out. Perhaps good for this to be covered in the documentation as well, as it might be important for some scientific domains.

@psomhorst
Copy link
Copy Markdown

Fantastic! If you want, I can supply some cases or documentation from a scientific writing standpoint.

@agerada
Copy link
Copy Markdown
Contributor Author

agerada commented Dec 1, 2025

If you do have a relevant example or two from your domain that demonstrate the use of mathematics in acronyms then I think it would be quite helpful.

@psomhorst
Copy link
Copy Markdown

psomhorst commented Dec 8, 2025

If you do have a relevant example or two from your domain that demonstrate the use of mathematics in acronyms then I think it would be quite helpful.

See this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants