Markdown#48
Conversation
|
Woah, you're on a roll! That was a very complex feature, in my initial tests I found it to conflict with the "custom List of Acronyms" (because it was easier to print acronyms as strings at this point). It'll take me some time to review it, due to the sheer amount of commits and lines of code; but a few remarks just by reading your description:
Thanks again for this long awaited feature! I'll try to review as quickly as possible. |
Refactored case conversion to support pandoc inline. Test 44 now fixed.
Restored old create_element function to allow extensions as described in docs/advanced/extending.qmd
|
Hi @rchaput, thanks for the review! I'm happy to contribute as I have a personal need for this feature. My PhD thesis is otherwise complete and I've commited to use
Apologies, I had messed something up with git here (maybe did not sync the branches properly before starting work). Should be fixed now.
I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that:
Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature. Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown.
Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example E. coli applied to
No problem. I have kept the PR as draft for now, as I plan to do some refactoring for readability to help with the review and future maintenance. |
|
Just a quick idea (I need to read your message in more details after work): you mention that you need italics because of your domain. A quick but working option would be to implement a new style, let's say "italics_short" which always puts the shortname in italics. This would be far easier because in Lua we can use pandoc.Italic (or something similar). Thanks to your previous PR, we can even specify the style for each acronym.
I don't mean that it is useless to allow Markdown in acronyms; just that there is a faster way if you (or someone else) require it for your thesis.
I'll try to make some time to review your PR later today.
Sep 18, 2025 11:49:56 Alessandro Gerada ***@***.***>:
…
[Image]*agerada* left a comment (rchaput/acronyms#48)[#48 (comment)]
Hi @rchaput[https://github.com/rchaput], thanks for the review! I'm happy to contribute as I have a personal need for this feature. My PhD thesis is otherwise complete and I've commited to use *acronyms* in it. Because I work in a biological domain, the use of italics for bacterial names is not optional for me, so without markdown (or some equivalent), I would probably not be able to use *acronyms*.
** the PR includes the commits for the capitalize option, can you update it to begin with commit [d236e85](d236e85)?
*
Apologies, I had messed something up with git here (maybe did not sync the branches properly before starting work). Should be fixed now.
** you mention that markdown parsing is "opt-in", was it required to avoid bugs? I feel like it would be easier if Markdown parsing was on by default. For acronyms that require disabling the Markdown, (for example, `*shortname*` exactly as-is, not interpreted as italic), maybe there is a Pandoc syntax, such as "raw strings" that could do the job?
* I'm not sure it will be useful to allow individual triggering of Markdown, I can't see a case in which one would want to print the acronym without Markdown?
*
I agree that it would be easier if markdown was parsed by default. The two main reasons I have not done this, at least for now, is that:
1. The issue with shortname being used as the key. If we always parsed markdown in the shortname then to be safe, users would have to always provide an explicit key.
2. There may be some unexpected effects of markdown parsing which I have not appreciated yet. By having this as an opt-in feature, the impact of future issues would be minimised.
Perhaps a reasonable approach would be to parse the longname by default, with the shortname being an opt-in feature.
Regarding the use of raw/unparsed strings, this is currently possible by escaping the markdown characters (see test 42 for an example). This allows users to mix markdown and non-markdown.
** As I said, I remember having problems with the "custom LoA" (meaning using Markdown to specify how the LoA should be rendered), did you try mixing the two? If it does not work, we'll have to mark it as a limitation.
*
Ah, I didn't appreciate that this was the main issue – I can see why this is tricky. I have tested this, and the good news is that the longname seems to work well. Any markdown added in the custom LoA is simply added to the longname (so for example /E. coli/ applied to ***{longname}*** becomes /*E. coli*/), which I think would be the desired behaviour. Unfortunately the shortname currently does not work. I think the problem is the addition of the link target as raw markdown. Hopefully I can fix this by applying the link target using pandoc filters instead.
Thanks again for this long awaited feature! I'll try to review as quickly as possible.
No problem. I have kept the PR as draft for now, as I plan to do some refactoring for readability to help with the review and future maintenance.
—
Reply to this email directly, view it on GitHub[#48 (comment)], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AGZR5IIU4KQJ2PHRCYRJM2D3TJ54JAVCNFSM6AAAAACGPVS2A2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMBWGU2TCMBWGQ].
You are receiving this because you were mentioned.
[Tracking image][https://github.com/notifications/beacon/AGZR5INDXEQ55PPJCYXIEK33TJ54JA5CNFSM6AAAAACGPVS2A2WGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTWFCX3RQ.gif]
|
|
That's an interesting thought that I hadn't considered. Unfortunately, many of the acronyms are mixtures of different cases, for example methicillin-resistant Staphylococcus aureus (MRSA), or Clostridium difficile infection (CDI). Sadly I don't think it will work (unless using very complex styles, and then it might as well be markdown). Fortunately the thesis is rendering great with |
|
I've had time to read your comments in details
I'm wondering whether we can use Pandoc to detect markdown in the shortname. If we could do that, we could: 1) detect if there is markdown; 2) if there is, check if the key is defined, and raise an error otherwise. I don't remember the notions of
I agree that it can be risky. Then again, if we have some kind of feature to disable Markdown parsing (I'm thinking something like
Do you think we could enable Markdown shortname if and only if the key is set? That would avoid including an additional
I've seen the example, it's good that we can escape some characters, but I was thinking of escaping the whole string. As I mentioned earlier, I remember some specific syntax like
It's great that the longname already works. If we can make the shortname work as well, it's best; but I'm OK with stating it as a limitation if it doesn't. I think that allowing Markdown is too much important. |
As we can now assume input is inlines, legacy create_element is not required, as create_rich_element replaces it. To keep compatibility with extending.qmd, kept local reference to create_element.
General refactor and clean up to styles functions.
This commit removes the internal features that allowed markdown to be optional. Now markdown will always be parsed. Characters can still be escaped if necessary (see test 42).
|
After giving this some thought. I fully agree that markdown should be parsed by default. I have enabled this. I have also removed the internal workings that allowed markdown to be turned on or off (a405c2a). This has allowed me to clean up the code and remove a lot of checking to see whether markdown is being parsed. There is now a The issue of escaping markdown-like characters (which, to be fair, should be rarely needed), I think is sufficiently covered by escaping using Hopefully the code is cleaner now, so I have removed the draft status of this PR. |
rchaput
left a comment
There was a problem hiding this comment.
I've read most of your changes; lots of great ideas!
I think it needs a bit of refactoring to simplify the code, but everything should work well (just as you tested)
| -- Enforce explicit key when markdown parsing for shortname is enabled. | ||
| if Helpers.contains_markdown(object.shortname) then | ||
| if object.key == nil then | ||
| quarto.log.error("[acronyms] Each acronym must provide an explicit `key` when using markdown in shortname.") |
There was a problem hiding this comment.
It would be useful to print the acronym shortname here so it's clear for the user which one to fix.
| -- Always parse markdown for names | ||
| local shortname = Helpers.extract_meta_field(v.shortname, true) | ||
| local longname = Helpers.extract_meta_field(v.longname, true) | ||
| -- raw longname parsed |
There was a problem hiding this comment.
I don't understand this comment?
| if value == nil then return false end | ||
|
|
||
| -- If we have a Pandoc type, check AST nodes directly | ||
| if pandoc and pandoc.utils and pandoc.utils.type then |
There was a problem hiding this comment.
Why checking for pandoc.utils.type? Is there a Pandoc version in which we do not have access to this function? We can use the Helpers.is_at_least(version) for this kind of problem.
| if v.t ~= "Str" and v.t ~= "Space" and v.t ~= "SoftBreak" and v.t ~= "LineBreak" then | ||
| return true | ||
| end |
There was a problem hiding this comment.
This condition is the same as in the previous loop (case). Can we merge the two loops? (in order to simplify the code, not making it harder to read)
| function AcronymsPandoc.generateDefinitionList(sorted_acronyms) | ||
| local definition_list = {} | ||
| for _, acronym in ipairs(sorted_acronyms) do | ||
| -- The definition's name. A Span with an ID so we can create a link. |
There was a problem hiding this comment.
Why delete this comment? Is it no longer up-to-date with the code?
There was a problem hiding this comment.
I think I removed this while modifying this function, but have returned here: 44b1149
| -- Kept as a local reference here to keep compatibility with previous versions. | ||
| create_element = Helpers.create_rich_element |
There was a problem hiding this comment.
I'm not sure we need to keep this; we can simply replace the reference in all styles, and update the documentation.
| -- Conditional strictness: when markdown is present in a part, require an explicit plural for that part. | ||
| local need_long_strict = Helpers.contains_markdown(acronym.longname) and not acronym._explicit_plural_longname | ||
| local need_short_strict = Helpers.contains_markdown(acronym.shortname) and not acronym._explicit_plural_shortname | ||
| if need_long_strict then | ||
| quarto.log.error("[acronyms] Plural form requested for '" .. tostring(acronym.key) .. "' but 'plural.longname' was not explicitly provided while markdown parsing is enabled for its longname. Define it under plural: { longname: ... } to use \\acrs{" .. tostring(acronym.key) .. "} .") | ||
| assert(false) | ||
| end | ||
| if need_short_strict then | ||
| quarto.log.error("[acronyms] Plural form requested for '" .. tostring(acronym.key) .. "' but 'plural.shortname' was not explicitly provided while markdown parsing is enabled for its shortname. Define it under plural: { shortname: ... } to use \\acrs{" .. tostring(acronym.key) .. "} .") | ||
| assert(false) | ||
| end |
There was a problem hiding this comment.
I would prefer raising the error before, typically in acronyms_pandoc (if I remember correctly, the function should be something like replaceExistingAcronym). This is simply checking that an option is correctly set.
The style function should perform very minimal error-checking (such as checking the acronym is not nil, otherwise the function cannot work!).
There was a problem hiding this comment.
Agree, much better to raise earlier (and still has the advantage of only raising if plural was attempted, so no unecessary errors). See 66a976a. I have tested this informally as the current testing suite does not (as far as I can see) support expected exceptions.
Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>
Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>
Co-authored-by: Remy Chaput <rchaput.pro@gmail.com>
agerada
left a comment
There was a problem hiding this comment.
Thanks for the review; I appreciate there was a lot of code to go through here! Hopefully all comments addressed but happy to look at anything else that comes up.
The default branch was named `master`, but Quarto recommends using `main`. In addition, the docs was not consistent, and sometimes used an unworking command. This commit changes the few documents that refer to the `master` branch (including the CI/CD and direct links to files in the repository).
|
Does this feature also support math (e.g., a double subscript |
|
Hi @psomhorst. The current implementation almost certainly does not support math, although I haven't specifically tested this. I'm not sure how feasible math support would be, it mainly depends on whether there is a Pandoc lua filter to parse it. I can look into this. |
|
Good news, it seems like I was wrong! Math is being parsed in this feature. I didn't need to change any of the code, but added a couple of tests (f1c3cff) to demonstrate it. Although I haven't exhaustively tested edge cases, I didn't run into any issues while informally testing it out. Perhaps good for this to be covered in the documentation as well, as it might be important for some scientific domains. |
|
Fantastic! If you want, I can supply some cases or documentation from a scientific writing standpoint. |
|
If you do have a relevant example or two from your domain that demonstrate the use of mathematics in acronyms then I think it would be quite helpful. |
See this PR. |
Implementation of markdown parsing for
acronyms.This required quite extensive code changes, which may have implications for future that will need to be considered. The implementation uses pandoc lua filters to parse markdown in the short or long names. This seems to capture most markdown characters (at least the common ones that I have tested such as bold, italic, subscript, etc). Existing functionality with case modifiers has been kept, although this is not exhaustive (see text 46).
One of the challenges was the style functions that generate links. These are now more complicated, which is an important limitation. However, compatibility with the extension approach described in
docs/advanced/extending.qmd) has been maintained.Markdown parsing is kept as an 'opt in', by default, markdown is not parsed. In the
yaml, users can declaremarkdown:longname(orshortname, orboth), which will enable parsing. Markdown strings should be wrapped in quotes to ensure that there are no issues in parsing theyaml. Markdown can also be triggered (or suppressed) for an individual acronym, see test 42.To avoid ambiguous codes, users must provide an explicit acronym key when parsing markdown in the shortname. Similiarly, implicit plural is not supported with markdown (plural text must be provided).
GIven the extensive changes proposed in this PR, I will leave as a draft for now until I complete more testing on my documents. Documentation has not been updated (will do once design agreed).