Skip to content

Txt2tags writer#6551

Closed
farvardin wants to merge 19 commits into
jgm:mainfrom
farvardin:master
Closed

Txt2tags writer#6551
farvardin wants to merge 19 commits into
jgm:mainfrom
farvardin:master

Conversation

@farvardin

Copy link
Copy Markdown

hello,

I've made a txt2tags writer, adapted from Dokuwiki writer. It works rather ok but will be improved. Could you integrate this in pandoc please?

thank you

Comment thread src/Text/Pandoc/Writers.hs Outdated
, writeTEI
, writeTexinfo
, writeTextile
, writeTxt2Tags

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should line up with the other entries

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's strange because I got some compilation warnings into other places, which I fixed, and I believed it was clean when I've submitted the PR. I'll check this soon.

@jgm

jgm commented Jul 21, 2020

Copy link
Copy Markdown
Owner

Thank you! This seems a good start, but it still needs some work before it's merge-ready.

At the very least, we need automated tests. The recommended (and easiest) approach would be to add to test/Tests/Old.hs (this requires creating a golden file test/writer.t2t).

Another thing you might try is doing round-trip tests with the t2t reader. This will help identify bugs and missing things.

@tarleb

tarleb commented Dec 28, 2020

Copy link
Copy Markdown
Collaborator

Thanks @farvardin, I think it would be great if we could get this moving again. I see you already added an "old-style" golden file test, but no code to use it. Are you still interested in advancing this? Please let us know how we can help.

I had quick read through the writer source, I think it looks good. Would you be interested in an in-depth review, or do you rather just want to get it merged soon?

@farvardin

Copy link
Copy Markdown
Author

@tarleb if you could help me how to improve it, it would be great. I don't really understand the test-thing.

@tarleb

tarleb commented Jan 2, 2021

Copy link
Copy Markdown
Collaborator

Gladly.

You can enable the tests by adding the appropriate commands to test/Tests/Old.hs, lines 162-164. Check the other formats to see what's required.

It would also be great if you could take a look at the doclayout package. It is used in plain text writers to avoid overlong lines. If I understand correctly, then txt2tags supports breaking long paragraphs into shorter lines by inserting newline characters. This is what doclayout is designed to do, see also pandoc's --columns option.

LMK if you run into problems.

@farvardin

Copy link
Copy Markdown
Author

@tarleb Thank you, I understand better the test process now. I've added the txt2tags writer to the file and it seems to work now. About the Lint error the missing file is in my pull request (I've never used lint or Haskell before and I'm not really a developper) and therefore I don't know why it's complaining.

I've had a look at doclayout but even with the example I can't figure out what I could do with this, sorry. I've seen all the writers are importing it, but I don't even find where they really use it in the code.

@tarleb

tarleb commented Jan 6, 2021

Copy link
Copy Markdown
Collaborator

The linting issue is caused by a missing entry in pandoc.cabal. All test files must be listed under extra-source-files; otherwise they won't be included in the source distribution, causing test failure for people building from source. I'm guilty of doing that (more than once 😔), so John added this nice check. See here.

The Doc Text type from doclayout behaves mostlyl just like Text, so most of the work should be updating the function type signatures. If you'd like to discuss this with less latency, feel free to reach out in the fp-chat slack chat.

@farvardin

Copy link
Copy Markdown
Author

@tarleb thanks again for the explanation. I should have figured it out myself by looking for the reference to "writer.rtf" for example, since it can only be found in pandoc.cabal.

lint is no longer complaining, so I suppose it could be merged officially!

I've just converted an epub file into txt2tags, using the command "pandoc -f epub -t txt2tags doc.epub > doc.t2t" and it works fine!

@tarleb tarleb left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are good news, thanks! I went through the code and it looks good; I made inline comments whenever I was unsure about something.

I'm not sure if @jgm considers it essential, but it would be best if we could switch the result type of blockToTxt2Tags from Text to Doc Text so pandoc's --columns option would work better with this writer. Do you have time to give it a try?

Comment thread src/Text/Pandoc/Writers/Txt2Tags.hs Outdated
"javascript", "latex", "lisp", "lua", "matlab", "mirc", "mpasm", "mysql", "nsis", "objc",
"ocaml", "ocaml-brief", "oobas", "oracle8", "pascal", "perl", "php", "php-brief", "plsql",
"python", "qbasic", "rails", "reg", "robots", "ruby", "sas", "scheme", "sdlbasic",
"smalltalk", "smarty", "sql", "tcl", "", "thinbasic", "tsql", "vb", "vbnet", "vhdl",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty string in this list strikes me as weird. Is it intentional?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm pretty sure I've just copied this part from the DokuWiki.hs without modification. I'm not sure how this part should work. The new code in DokuWiki is quite different now:

image

I'll remove the empty string, though I'm sure most of this part could be removed. But if I change it myself, I might break it!

Comment thread src/Text/Pandoc/Writers/Txt2Tags.hs Outdated

inlineToTxt2Tags _ il@(RawInline f str)
| f == Format "Txt2Tags" = return str
| f == Format "html" = return $ "<html>" <> str <> "</html>"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most other <html> tags are uppercase, but here they are lower case. Is there a difference?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose there is no difference. Again, this part is straight from the DokuWiki.hs file:

image

Those are good news, thanks! I went through the code and it looks good; I made inline comments whenever I was unsure about something.

I'm not sure if @jgm considers it essential, but it would be best if we could switch the result type of blockToTxt2Tags from Text to Doc Text so pandoc's --columns option would work better with this writer. Do you have time to give it a try?

I don't know how to do this, sorry. I've just hacked the existing dokuwiki code so it's outputting valid txt2tags markup (most of the difference is with the headings)

Comment thread src/Text/Pandoc/Writers/Txt2Tags.hs Outdated
case txt of
[Str s] | "mailto:" `T.isPrefixOf` src -> return $ "<" <> s <> ">"
| escapeURI s == src -> return src
_ -> if isURI src

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if statement necessary? I'm not aware of a string with a leading / that would be recognized as valid URI.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just copied from Dokuwiki... most txt2tags syntax is similar to dokuwiki, so I suppose it's legit

image

probably the code should be refined in the future

, stUseTags = False
, stBackSlashLB = False }

type Txt2Tags m = ReaderT WriterEnvironment (StateT WriterState m)

@tarleb tarleb Jan 18, 2021

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't need the WriterState, so we could shorten this to ReaderT WriterEnvironment m. The function runTxt2Tags would then become runTxt2Tags = flip runReaderT def.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about haskell, so I don't understand the difference, sorry :(

image

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy-pasted together without you knowing Haskell?

I'm not even mad. That's amazing!

@farvardin farvardin Jan 19, 2021

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main stumbling block for us will be maintenance. We have to ensure that this will be maintained after it's merged. Not sure how to proceed there, I'm not too keen to up to take up an additional responsibility. @jgm, what do you think?

the txt2tags language is not evolving, for 2 reasons: it's stable for many years, and also it's possible to add custom markup with some preproc and postprocessors, so the core doesn't need changing. So the only maintenance would be to adapt the writer code in the case pandoc change something in its internal code. But it would require the same maintenance as for the dokuwiki writer.

@tarleb

tarleb commented Jan 18, 2021

Copy link
Copy Markdown
Collaborator

Oh, I forgot: I think we should also add a test/tables.t2t file, as otherwise table output would remain untested.

@farvardin

Copy link
Copy Markdown
Author

Oh, I forgot: I think we should also add a test/tables.t2t file, as otherwise table output would remain untested.

done

@tarleb

tarleb commented Jan 19, 2021

Copy link
Copy Markdown
Collaborator

I think the main stumbling block for us will be maintenance. We have to ensure that this will be maintained after it's merged. Not sure how to proceed there, I'm not too keen to up to take up an additional responsibility. @jgm, what do you think?

@jgm

jgm commented Jan 19, 2021

Copy link
Copy Markdown
Owner

Have you considered putting this through its paces with round trip tests?
pandoc -f native -t txt2tags -s | pandoc -f txt2tags -t native -s should ideally give you what you started from. (You could try it with testsuite.native and txt2tags.native in the test directory.)

Pandoc doesn't generally offer a round-tripping guarantee, so we usually don't put things like this in the test suite, but doing this is a good way to identify bugs.

I'd also echo @tarleb's comment that doclayout should be used. It's not used in some of the older writers, and in formats where newlines cause paragraph breaks, but I'd rather have it supported in new writers. This is a matter of

  • importing Text.DocLayout
  • changing the writer function types to produce Doc rather than Text
  • using doclayout's functions to create the Doc values. e.g. where t is a Text, literal t is a Doc. For breakable spaces, use space. There are also functions for nesting, hanging indents, etc. -- see doclayout documentation on hackage.

Anyway, it's good to know that you don't need to know Haskell to contribute to pandoc.

@farvardin

Copy link
Copy Markdown
Author

I've done it, and it's rather faithful (I've written in the header of the writer which parts are not correctly working, but it's not very important, like definition lists)

image

while testing your command, I've noticed the writer is using the "txt2tags" tag, while the reader is using "t2t". I don't know which one is better.

pandoc -f html sample.html -t txt2tags -s | pandoc -f t2t -t html -s > test.html

about doclayout it's totally beyond my skills, I'm sorry.

@jgm

jgm commented Jan 21, 2021

Copy link
Copy Markdown
Owner

What are you using for sample.html?

@jgm

jgm commented Jan 21, 2021

Copy link
Copy Markdown
Owner

Can you try with test/testsuite.native? That exercises most of the standard pandoc features, and you wouldn't get irrelevant differences due to HTML formatting.

@jgm

jgm commented Jan 21, 2021

Copy link
Copy Markdown
Owner

As for the name, we'd better stick with t2t just for consistency's sake.

@farvardin

Copy link
Copy Markdown
Author

What are you using for sample.html?

it was a sample document from the txt2tags website (https://txt2tags.org/sample.html)

Can you try with test/testsuite.native? That exercises most of the standard pandoc features, and you wouldn't get irrelevant differences due to HTML formatting.

I still get several differences because pandoc creates id and adds some < p > markup in the original export (or maybe I've missed an option to avoid this) but yet it's still acceptable as result.

image

As for the name, we'd better stick with t2t just for consistency's sake.

I suppose it would be more logical to rename the reader part to txt2tags, because markdown and dokuwiki are written respectively "markdown" and "dokuwiki", but I understand if you prefer not breaking the previous behavior. So I've just changed all referencies of txt2tags to "t2t" in the writer part option, which will be a new addition, so it wouldn't confuse people.

Comment thread src/Text/Pandoc/Writers/Txt2Tags.hs Outdated
@@ -20,8 +20,7 @@ Txt2Tags: <https://www.Txt2Tags.org/>
- Definition lists are broken
- Tables could be improved
- Some formats make better results than others (html is ok, md is somehow broken)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this comment?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, it was 6 months ago. I suppose when writing from markdown, some elements were not correctly rendered

Comment thread src/Text/Pandoc/Writers/Txt2Tags.hs Outdated
- Definition lists are broken
- Tables could be improved
- Some formats make better results than others (html is ok, md is somehow broken)
- code related to dokuwiki only could be removed

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still dokuwiki-specific code? IF so, let's remove it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose there are still some code, yes

@jgm

jgm commented Jan 21, 2021

Copy link
Copy Markdown
Owner

Yes, I agree, txt2tags would make more sense, but I think it's more important not to break existing behavior, and to remain consistent.

Comment on lines +1 to +6
$pagetitle$
$author-meta$
$date-meta$



Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why you're getting all that blank space at the beginning. Do you want the 3 blank lines if there's a title? If there's no title block, would it be better to omit them? You can use conditionals like

$if(pagetitle)$
$pagetitle$
$author-meta$
$date-meta$


$endif$

@jgm

jgm commented Jan 21, 2021

Copy link
Copy Markdown
Owner

I ran writer.t2t through txt2tags, and it appears that some of the syntax is invalid. (e.g. the <HTML> bits).
Have you tried this experiment? (Also with tables.t2t.)
I think this writer still needs some work before it's ready.

@farvardin

Copy link
Copy Markdown
Author

I ran writer.t2t through txt2tags, and it appears that some of the syntax is invalid. (e.g. the <HTML> bits).
Have you tried this experiment? (Also with tables.t2t.)
I think this writer still needs some work before it's ready.

I've removed the html part, and now the test no longer works. There are some parts I'll never be able to fix (def lists), so I suppose I'll give up then, I'll use the generated binary for myself. There are fewer and fewer people who are still using txt2tags today, so it's not a big deal :(

@jgm

jgm commented Jan 22, 2021

Copy link
Copy Markdown
Owner

You can update the tests, no?
Or by "no longer works" do you mean that the new output is invalid t2t?

@jgm

jgm commented Jan 22, 2021

Copy link
Copy Markdown
Owner

Note: dokuwiki falls back to HTML for tables and lists whose contents are too complex to represent using the native syntax. I don't know if t2t offers any fallback possibility like that, so there's a question what to do if some of the things pandoc can represent can't be renedred in t2t. We could simply skip them and issue a warning, I suppose.

@farvardin

Copy link
Copy Markdown
Author

You can update the tests, no?
Or by "no longer works" do you mean that the new output is invalid t2t?

I don't know, I've updated the code, and thought I've also updated the test, but I still get:

      ------------------------------------------------------------------------
      --- writer.t2t
      +++ /temp/github/pandoc/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.0.1.0/build/pandoc/pandoc --data-dir=../data --quiet testsuite.native -r native -w t2t --columns=78 --variable pandoc-version= -s
      +   1 
      +   2 
      +   3 
      +   4 
      +   5 
      +   6 
      ------------------------------------------------------------------------

so I don't know what is wrong.

@farvardin

Copy link
Copy Markdown
Author

Note: dokuwiki falls back to HTML for tables and lists whose contents are too complex to represent using the native syntax. I don't know if t2t offers any fallback possibility like that, so there's a question what to do if some of the things pandoc can represent can't be renedred in t2t. We could simply skip them and issue a warning, I suppose.

it's still possible to start a line with % which means it's a comment. It won't be interpreted, but user could see it and check and correct it.

@jgm

jgm commented Jan 23, 2021

Copy link
Copy Markdown
Owner

make TESTARGS=--accept will regenerate all the tests with the current output of the writer.
(Then it's very important to inspect all the changes before committing.)

@farvardin farvardin reopened this Feb 15, 2026
farvardin and others added 13 commits April 10, 2026 00:08
- Fix inline code to use `` `` `` `` instead of `''%%...%%''`
- Fix raw block/inline format name to lowercase `txt2tags`
- Fix HTML raw passthrough syntax (`"""\n...\n"""` for blocks, `""..."""` for inline)
- Remove unused `stBlockQuoteLevel` field; blockquotes render as plain content
- Fix `escapeString` to also escape `--` and ` `` `
- Fix definition list: use `: **term**` with indented content instead of DokuWiki `* **term**`
- Remove leftover DokuWiki-specific `languageNames` map and `isSimpleList` helpers
- Add `Tests.Writers.Txt2Tags` with unit tests covering inline formatting,
  escaping, headers, code blocks, lists, definition lists, blockquotes,
  horizontal rules, tables, links, images, and raw blocks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Txt2Tags writer: fix syntax, add tests
- Fix inline code to use `` `` `` `` instead of `''%%...%%''`
- Fix raw block/inline format name to lowercase `txt2tags`
- Fix HTML raw passthrough syntax: `"""\n...\n"""` for blocks,
  `""..."""` for inline
- Remove unused `stBlockQuoteLevel` field; blockquotes render as
  plain content
- Fix `escapeString` to also escape `--` and ` `` `
- Fix definition list: use `: **term**` with indented content
  instead of DokuWiki `* **term**`
- Remove leftover DokuWiki-specific `languageNames` map
  and `isSimpleList` helpers
- Add `Tests.Writers.Txt2Tags` with unit tests covering inline
  formatting, escaping, headers, code blocks, lists, definition
  lists, blockquotes, horizontal rules, tables, links, images,
  and raw blocks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `writeTxt2Tags` to the export list so it is accessible via
the `Text.Pandoc` module, fixing a compilation error in tests.
Txt2Tags writer: export writeTxt2Tags from Text.Pandoc.Writers
@farvardin

Copy link
Copy Markdown
Author

@jgm is the test ok now? it looks like there is no longer the previous issue with this writer.

@luginf luginf mentioned this pull request Jun 11, 2026
@farvardin farvardin closed this Jun 11, 2026
@farvardin

Copy link
Copy Markdown
Author

Closing in favor of #11699 - rebased onto current main as a single clean commit to fix the check-commit-msg-length CI failure (the history here had gotten tangled across multiple forks).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants