Convert pasted HTML to Markdown in Markdown tabs #673

carli2 · 2025-10-01T08:39:16Z

With this commit, I can paste formatted text now which automatically gets converted to Markdown.

This way, copy&paste from ChatGPT won't destroy document structure like ordered/unordered lists and headings

This reverts commit cd612cb.

mitya57

Thank you for this pull request!

I like the idea, but left some comments.

mitya57 · 2025-10-02T20:54:06Z

ReText/editor.py

    QTextCursor,
    QTextFormat,
    QTextOption,
+    QTextDocument,


Please keep this list sorted. QTextDocument should be between QTextCursor and QTextFormat.

mitya57 · 2025-10-02T20:56:18Z

ReText/editor.py

+        document = QTextDocument()
+        document.setHtml(html)
+        to_markdown = getattr(document, 'toMarkdown', None)
+        if callable(to_markdown):


Why is this check needed?

toMarkdown seems to be available since Qt 5.14, and we are requiring Qt 6.

mitya57 · 2025-10-02T20:57:33Z

ReText/editor.py

                imageText = self.getImageMarkup(url)
                self.textCursor().insertText(imageText)
                return
+        if source.hasHtml() and getattr(self, 'tab', None):


self.tab should be always there.

mitya57 · 2025-10-02T20:58:24Z

ReText/editor.py

            super().inputMethodEvent(event)

+    def _convertHtmlToMarkdown(self, html):
+        if not html:


Can this really happen? I see you are passing source.html() to this function but only if source.hasHtml() returns True, so it should be never None?

mitya57 · 2025-10-02T21:04:42Z

ReText/mdx_tablength.py

+from markdown.preprocessors import Preprocessor
+
+
+class _ListIndentPreprocessor(Preprocessor):


This is a potentially breaking change, since it changes how existing documents are interpreted.

Also, it makes ReText deviate from the Markdown specification, which mandates that “Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab”.

If the problem is Qt, which uses different indentation, can we normalize it on insert instead? It also will be better performance-wise, since we normalize indentation only once, not on each render.

…ng, not to the rendering logic

carli2 · 2025-10-03T10:33:49Z

I tried to tackle all the things you mentioned. Please review again

mitya57 · 2025-10-03T18:47:42Z

ReText/converterprocess.py

                    current_markup = markup_class(job['filename'])
-                    current_markup.requested_extensions = job['requested_extensions']
+
+                current_markup.requested_extensions = job['requested_extensions']


Now that we don't have a new extension, changes in this file are probably not needed anymore?

mitya57 · 2025-10-03T18:48:30Z

ReText/tab.py


 PreviewDisabled, PreviewLive, PreviewNormal = range(3)

+def get_markdown_requested_extensions(sync_scroll_enabled):


And changes in this file seem also not needed anymore?

mitya57 · 2025-10-03T18:49:15Z

tests/test_editor.py

+    def test_pasteHtmlConvertedToMarkdown(self):
+        mimeData = QMimeData()
+        mimeData.setText('plain text fallback')
+        mimeData.setHtml('<p>Hello <strong>world</strong></p>')


Can you please add a test that will check indentation normalization?

mitya57 · 2025-10-03T18:55:24Z

ReText/editor.py

+        markdown = self._normalize_markdown_list_indentation_text(markdown)
+        return markdown.rstrip('\n')
+
+    def _normalize_markdown_list_indentation_text(self, text):


Can you please use camelCase? I know it's not PEP 8 compliant, but I use it in ReText because PyQt uses it.

Also I think the word text in this function name is not needed?

Suggested change

def _normalize_markdown_list_indentation_text(self, text):

def _normalizeMarkdownListIndentation(self, text):

carli2 · 2025-10-03T21:41:48Z

This is the squashed diff. I hope, it fits this time.

diff --git a/ReText/converterprocess.py b/ReText/converterprocess.py
index c91ec34..389e7e1 100644
--- a/ReText/converterprocess.py
+++ b/ReText/converterprocess.py
@@ -191,4 +191,3 @@ class ConverterProcess(QObject):
     def stop(self):
         sendObject(self.conn, {'command': 'quit'})
         self.conn.close()
-
diff --git a/ReText/editor.py b/ReText/editor.py
index 8b10bac..3e6f023 100644
--- a/ReText/editor.py
+++ b/ReText/editor.py
@@ -46,6 +46,7 @@ from PyQt6.QtGui import (
     QPainter,
     QPalette,
     QTextCursor,
+    QTextDocument,
     QTextFormat,
     QTextOption,
     QWheelEvent,
@@ -404,11 +405,31 @@ class ReTextEdit(QTextEdit):
                     matchedText = matchedPrefix + str(nextNumber) + ". "
         else:
             matchedText = ''
+        # For Markdown, normalize nested list indentation to multiples of 4 spaces
+        # to comply with the Markdown specification for list indentation.
+        markupClass = self.tab.getActiveMarkupClass()
+        if matchedText and markupClass == MarkdownMarkup:
+            matchedText = self._normalizeListIndent(matchedText)
         # Reset the cursor
         cursor = self.textCursor()
         cursor.insertText('\n' + matchedText)
         self.ensureCursorVisible()
 
+    def _normalizeListIndent(self, prefix):
+        # Split prefix into indentation and the rest (e.g., "    - ", "  1. ")
+        i = 0
+        while i < len(prefix) and prefix[i] in (' ', '\t'):
+            i += 1
+        indent = prefix[:i]
+        rest = prefix[i:]
+        # Only normalize when the rest looks like a list marker
+        if re.match(r'(?:[-+*]|\d+\.)\s', rest):
+            indent_len = len(indent.expandtabs(4))
+            if indent_len and (indent_len < 4 or indent_len % 4 != 0):
+                new_len = max(4, ((indent_len + 3) // 4) * 4)
+                return (' ' * new_len) + rest
+        return prefix
+
     def moveLineUp(self):
         self.moveLine(QTextCursor.MoveOperation.PreviousBlock)
 
@@ -594,6 +615,38 @@ class ReTextEdit(QTextEdit):
         if event.preeditString() or event.commitString() or event.attributes():
             super().inputMethodEvent(event)
 
+    def _convertHtmlToMarkdown(self, html):
+        document = QTextDocument()
+        document.setHtml(html)
+        markdown = document.toMarkdown()
+        markdown = self._normalizeMarkdownListIndentation(markdown)
+        return markdown.rstrip('\n')
+
+    def _normalizeMarkdownListIndentation(self, text):
+        # Normalize nested list indentation in a multi-line Markdown string.
+        # Ensures sublists are indented by multiples of 4 spaces.
+        fence_re = re.compile(r'^\s*`{3,}.*$')
+        item_re = re.compile(r'^(?P<indent>[ \t]+)(?P<marker>(?:[-+*]|\d+\.))\s+')
+        in_fence = False
+        lines = text.splitlines(True)
+        out = []
+        for line in lines:
+            if fence_re.match(line):
+                in_fence = not in_fence
+                out.append(line)
+                continue
+            if not in_fence:
+                m = item_re.match(line)
+                if m:
+                    indent = m.group('indent')
+                    indent_len = len(indent.expandtabs(4))
+                    if indent_len and (indent_len < 4 or indent_len % 4 != 0):
+                        new_len = max(4, ((indent_len + 3) // 4) * 4)
+                        new_indent = ' ' * new_len
+                        line = new_indent + line[len(indent):]
+            out.append(line)
+        return ''.join(out)
+
     def insertFromMimeData(self, source):
         if source.hasUrls():
             url = source.urls()[0]
@@ -604,6 +657,13 @@ class ReTextEdit(QTextEdit):
                 imageText = self.getImageMarkup(url)
                 self.textCursor().insertText(imageText)
                 return
+        if source.hasHtml():
+            markupClass = self.tab.getActiveMarkupClass()
+            if markupClass == MarkdownMarkup:
+                markdown = self._convertHtmlToMarkdown(source.html())
+                if markdown:
+                    self.textCursor().insertText(markdown)
+                    return
         return super().insertFromMimeData(source)
 
 
diff --git a/tests/test_editor.py b/tests/test_editor.py
index cbdcecf..37e8772 100644
--- a/tests/test_editor.py
+++ b/tests/test_editor.py
@@ -112,6 +112,17 @@ class TestClipboardHandling(unittest.TestCase):
         self.editor.insertFromMimeData(mimeData)
         self.assertTrue('pasted text' in self.editor.toPlainText())
 
+    def test_pasteHtmlConvertedToMarkdown(self):
+        mimeData = QMimeData()
+        mimeData.setText('plain text fallback')
+        mimeData.setHtml('<p>Hello <strong>world</strong></p>')
+        self.dummytab.markupClass = MarkdownMarkup
+
+        self.editor.insertFromMimeData(mimeData)
+
+        pasted = self.editor.toPlainText()
+        self.assertIn('Hello **world**', pasted)
+
     @patch.object(ReTextEdit, 'getImageFilename', return_value='/tmp/myimage.jpg')
     @patch.object(QImage, 'save')
     def test_pasteImage_Markdown(self, _mock_image, _mock_editor):
@@ -291,5 +302,15 @@ class TestOrderedListMode(unittest.TestCase):
         QTest.keyClicks(editor, 'World')
         self.assertEqual(editor.document().toPlainText(), '1. Hello\n1. World')
 
+    def test_markdown_list_indentation_normalization(self):
+        editor = ReTextEdit(self)
+        editor.tab = self.DummyReTextTab()
+        editor.tab.markupClass = MarkdownMarkup
+        # Type a list item with two-space indent, then press Return.
+        QTest.keyClicks(editor, '  - Foo')
+        QTest.keyClick(editor, Qt.Key.Key_Return)
+        QTest.keyClicks(editor, 'Bar')
+        self.assertEqual(editor.document().toPlainText(), '  - Foo\n    - Bar')
+
 if __name__ == '__main__':
     unittest.main()

mitya57 · 2025-10-04T22:33:00Z

tests/test_editor.py

        QTest.keyClicks(editor, 'World')
        self.assertEqual(editor.document().toPlainText(), '1. Hello\n1. World')

+    def test_markdown_list_indentation_normalization(self):


Thank you very much for the updates!

This test is good, but does it really test your _normalizeMarkdownListIndentation function? It looks like no HTML to markdown conversion is happening here, and this test is testing the editor functionality of preserving indentation instead?

If I'm right, you may keep this test and add another one. Or, maybe just add more complex content (that requires normalization) to your test_pasteHtmlConvertedToMarkdown function above?

Also, please rename this test to camelCase if you keep it.

carli2 · 2025-10-05T14:26:25Z

Hi, can you merge and throw out the test if it'd be fine?

mitya57 · 2025-10-05T15:45:27Z

Done, thank you again!

I will look at the new PR a bit later.

carli2 · 2025-10-05T15:46:44Z

Oh, I saw you threw out the list normalization algo. This is needed since the html-to-markdown helper function does not produce correct markdown with nested lists. you can try to paste nested lists

mitya57 · 2025-10-05T16:10:10Z

I kept the _normalizeMarkdownListIndentation function and added a test that inserting lists with sublist produces the expected result (see test_pasteHtmlConvertedToMarkdown_lists).

Isn't it enough? What HTML do I need to insert to make it broken?

carli2 · 2025-10-05T16:22:59Z

This is the example I used:

<meta http-equiv="content-type" content="text/html; charset=utf-8"><h2 data-start="1203" data-end="1250">🔑 Warum der Wechsel zu den Offenen Brüdern?</h2>
<ol data-start="1251" data-end="2463">
<li data-start="1251" data-end="1660">
<p data-start="1254" data-end="1282"><strong data-start="1254" data-end="1280">Strenge der Exklusiven</strong></p>
<ul data-start="1286" data-end="1660">
<li data-start="1286" data-end="1551">
<p data-start="1288" data-end="1384">In Neugersdorf (wie in vielen Orten) erlebten die Leute die <strong data-start="1348" data-end="1381">exklusiven Regeln als zu hart</strong>:</p>
<ul data-start="1390" data-end="1551">
<li data-start="1390" data-end="1426">
<p data-start="1392" data-end="1426">Absonderung von „Nicht-Brüdern“.</p>
</li>
<li data-start="1432" data-end="1480">
<p data-start="1434" data-end="1480">Ausschluss wegen kleinster Lehrabweichungen.</p>
</li>
<li data-start="1486" data-end="1551">
<p data-start="1488" data-end="1551">kaum Kontakt mit Landeskirchlern, Baptisten oder Methodisten.</p>
</li>
</ul>
</li>
<li data-start="1555" data-end="1660">
<p data-start="1557" data-end="1660">Das passte schlecht in eine kleine Stadt, wo man auf Nachbarn, Kollegen und Verwandte angewiesen war.</p>
</li>
</ul>
</li>
<li data-start="1662" data-end="1955">
<p data-start="1665" data-end="1710"><strong data-start="1665" data-end="1708">Einfluss missionarisch geprägter Brüder</strong></p>
<ul data-start="1714" data-end="1955">
<li data-start="1714" data-end="1842">
<p data-start="1716" data-end="1842">Durch Kontakte zu Offenen Brüdern in Sachsen und Thüringen sah man, dass <strong data-start="1789" data-end="1831">Gemeinden auch ohne Absonderung blühen</strong> konnten.</p>
</li>
<li data-start="1846" data-end="1955">
<p data-start="1848" data-end="1955">Offene Brüder waren evangelistisch aktiv → passte besser zur pietistisch-erweckten Mentalität der Region.</p>
</li>
</ul>
</li>
<li data-start="1957" data-end="2192">
<p data-start="1960" data-end="1998"><strong data-start="1960" data-end="1996">Familien- und Gemeindespaltungen</strong></p>
<ul data-start="2002" data-end="2192">
<li data-start="2002" data-end="2083">
<p data-start="2004" data-end="2083">Die Exklusiven hatten oft <strong data-start="2030" data-end="2053">innere Zerwürfnisse</strong> (wer ist rein, wer nicht?).</p>
</li>
<li data-start="2087" data-end="2192">
<p data-start="2089" data-end="2192">In Neugersdorf führte das zu Spannungen – die <strong data-start="2135" data-end="2189">Offenen Brüder wirkten friedlicher und praktischer</strong>.</p>
</li>
</ul>
</li>
<li data-start="2194" data-end="2463">
<p data-start="2197" data-end="2223"><strong data-start="2197" data-end="2221">DDR-Zeit (nach 1945)</strong></p>
<ul data-start="2227" data-end="2463">
<li data-start="2227" data-end="2306">
<p data-start="2229" data-end="2306">Unter staatlicher Beobachtung war strikte Absonderung schwer durchzuhalten.</p>
</li>
<li data-start="2310" data-end="2463">
<p data-start="2312" data-end="2463">Gemeinden, die <strong data-start="2327" data-end="2338">offener</strong> waren, konnten leichter bestehen, weil sie besser vernetzt waren und nicht in endlosen internen Streitigkeiten erstarrten.</p>
</li>
</ul>
</li>
</ol>

mitya57 · 2025-10-05T16:33:27Z

This HTML seems to produce a working Markdown with the current code:

carli2 · 2025-10-05T16:35:01Z

ok nice :) for some reasons my test during implementation didn't work out that well but nice when it works

carli2 added 4 commits October 1, 2025 10:36

Convert pasted HTML to Markdown in Markdown tabs

e05d194

Add optional system markdown command for conversions

cd612cb

Revert "Add optional system markdown command for conversions"

c846408

This reverts commit cd612cb.

Normalize Markdown list indentation

cc12ad0

mitya57 requested changes Oct 2, 2025

View reviewed changes

removed some unnecessary checks; move indentation fixes to HTML pasti…

1bf6aa8

…ng, not to the rendering logic

mitya57 requested changes Oct 3, 2025

View reviewed changes

Clean up for PR merge

27debc9

mitya57 reviewed Oct 4, 2025

View reviewed changes

mitya57 added 2 commits October 5, 2025 18:43

Revert changes unrelated to pasting HTML

e5aff21

Add a test for _normalizeMarkdownListIndentation function

55f91de

mitya57 merged commit baf76e5 into retext-project:master Oct 5, 2025

		from markdown.preprocessors import Preprocessor


		class _ListIndentPreprocessor(Preprocessor):


		PreviewDisabled, PreviewLive, PreviewNormal = range(3)

		def get_markdown_requested_extensions(sync_scroll_enabled):

	def _normalize_markdown_list_indentation_text(self, text):
	def _normalizeMarkdownListIndentation(self, text):

Convert pasted HTML to Markdown in Markdown tabs #673

Convert pasted HTML to Markdown in Markdown tabs #673

Uh oh!

Conversation

carli2 commented Oct 1, 2025

Uh oh!

mitya57 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carli2 commented Oct 3, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carli2 commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carli2 commented Oct 5, 2025

Uh oh!

mitya57 commented Oct 5, 2025

Uh oh!

carli2 commented Oct 5, 2025

Uh oh!

mitya57 commented Oct 5, 2025

Uh oh!

carli2 commented Oct 5, 2025

Uh oh!

mitya57 commented Oct 5, 2025

Uh oh!

carli2 commented Oct 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

carli2 commented Oct 3, 2025 •

edited

Loading