Skip to content

Conversation

@carli2
Copy link
Contributor

@carli2 carli2 commented Oct 1, 2025

With this commit, I can paste formatted text now which automatically gets converted to Markdown.

This way, copy&paste from ChatGPT won't destroy document structure like ordered/unordered lists and headings

Copy link
Member

@mitya57 mitya57 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this pull request!

I like the idea, but left some comments.

ReText/editor.py Outdated
QTextCursor,
QTextFormat,
QTextOption,
QTextDocument,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this list sorted. QTextDocument should be between QTextCursor and QTextFormat.

ReText/editor.py Outdated
document = QTextDocument()
document.setHtml(html)
to_markdown = getattr(document, 'toMarkdown', None)
if callable(to_markdown):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check needed?

toMarkdown seems to be available since Qt 5.14, and we are requiring Qt 6.

ReText/editor.py Outdated
imageText = self.getImageMarkup(url)
self.textCursor().insertText(imageText)
return
if source.hasHtml() and getattr(self, 'tab', None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.tab should be always there.

ReText/editor.py Outdated
super().inputMethodEvent(event)

def _convertHtmlToMarkdown(self, html):
if not html:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this really happen? I see you are passing source.html() to this function but only if source.hasHtml() returns True, so it should be never None?

from markdown.preprocessors import Preprocessor


class _ListIndentPreprocessor(Preprocessor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a potentially breaking change, since it changes how existing documents are interpreted.

Also, it makes ReText deviate from the Markdown specification, which mandates that “Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab”.

If the problem is Qt, which uses different indentation, can we normalize it on insert instead? It also will be better performance-wise, since we normalize indentation only once, not on each render.

@carli2
Copy link
Contributor Author

carli2 commented Oct 3, 2025

I tried to tackle all the things you mentioned. Please review again

current_markup = markup_class(job['filename'])
current_markup.requested_extensions = job['requested_extensions']

current_markup.requested_extensions = job['requested_extensions']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we don't have a new extension, changes in this file are probably not needed anymore?

ReText/tab.py Outdated

PreviewDisabled, PreviewLive, PreviewNormal = range(3)

def get_markdown_requested_extensions(sync_scroll_enabled):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And changes in this file seem also not needed anymore?

def test_pasteHtmlConvertedToMarkdown(self):
mimeData = QMimeData()
mimeData.setText('plain text fallback')
mimeData.setHtml('<p>Hello <strong>world</strong></p>')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test that will check indentation normalization?

ReText/editor.py Outdated
markdown = self._normalize_markdown_list_indentation_text(markdown)
return markdown.rstrip('\n')

def _normalize_markdown_list_indentation_text(self, text):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use camelCase? I know it's not PEP 8 compliant, but I use it in ReText because PyQt uses it.

Also I think the word text in this function name is not needed?

Suggested change
def _normalize_markdown_list_indentation_text(self, text):
def _normalizeMarkdownListIndentation(self, text):

@carli2
Copy link
Contributor Author

carli2 commented Oct 3, 2025

This is the squashed diff. I hope, it fits this time.

diff --git a/ReText/converterprocess.py b/ReText/converterprocess.py
index c91ec34..389e7e1 100644
--- a/ReText/converterprocess.py
+++ b/ReText/converterprocess.py
@@ -191,4 +191,3 @@ class ConverterProcess(QObject):
     def stop(self):
         sendObject(self.conn, {'command': 'quit'})
         self.conn.close()
-
diff --git a/ReText/editor.py b/ReText/editor.py
index 8b10bac..3e6f023 100644
--- a/ReText/editor.py
+++ b/ReText/editor.py
@@ -46,6 +46,7 @@ from PyQt6.QtGui import (
     QPainter,
     QPalette,
     QTextCursor,
+    QTextDocument,
     QTextFormat,
     QTextOption,
     QWheelEvent,
@@ -404,11 +405,31 @@ class ReTextEdit(QTextEdit):
                     matchedText = matchedPrefix + str(nextNumber) + ". "
         else:
             matchedText = ''
+        # For Markdown, normalize nested list indentation to multiples of 4 spaces
+        # to comply with the Markdown specification for list indentation.
+        markupClass = self.tab.getActiveMarkupClass()
+        if matchedText and markupClass == MarkdownMarkup:
+            matchedText = self._normalizeListIndent(matchedText)
         # Reset the cursor
         cursor = self.textCursor()
         cursor.insertText('\n' + matchedText)
         self.ensureCursorVisible()
 
+    def _normalizeListIndent(self, prefix):
+        # Split prefix into indentation and the rest (e.g., "    - ", "  1. ")
+        i = 0
+        while i < len(prefix) and prefix[i] in (' ', '\t'):
+            i += 1
+        indent = prefix[:i]
+        rest = prefix[i:]
+        # Only normalize when the rest looks like a list marker
+        if re.match(r'(?:[-+*]|\d+\.)\s', rest):
+            indent_len = len(indent.expandtabs(4))
+            if indent_len and (indent_len < 4 or indent_len % 4 != 0):
+                new_len = max(4, ((indent_len + 3) // 4) * 4)
+                return (' ' * new_len) + rest
+        return prefix
+
     def moveLineUp(self):
         self.moveLine(QTextCursor.MoveOperation.PreviousBlock)
 
@@ -594,6 +615,38 @@ class ReTextEdit(QTextEdit):
         if event.preeditString() or event.commitString() or event.attributes():
             super().inputMethodEvent(event)
 
+    def _convertHtmlToMarkdown(self, html):
+        document = QTextDocument()
+        document.setHtml(html)
+        markdown = document.toMarkdown()
+        markdown = self._normalizeMarkdownListIndentation(markdown)
+        return markdown.rstrip('\n')
+
+    def _normalizeMarkdownListIndentation(self, text):
+        # Normalize nested list indentation in a multi-line Markdown string.
+        # Ensures sublists are indented by multiples of 4 spaces.
+        fence_re = re.compile(r'^\s*`{3,}.*$')
+        item_re = re.compile(r'^(?P<indent>[ \t]+)(?P<marker>(?:[-+*]|\d+\.))\s+')
+        in_fence = False
+        lines = text.splitlines(True)
+        out = []
+        for line in lines:
+            if fence_re.match(line):
+                in_fence = not in_fence
+                out.append(line)
+                continue
+            if not in_fence:
+                m = item_re.match(line)
+                if m:
+                    indent = m.group('indent')
+                    indent_len = len(indent.expandtabs(4))
+                    if indent_len and (indent_len < 4 or indent_len % 4 != 0):
+                        new_len = max(4, ((indent_len + 3) // 4) * 4)
+                        new_indent = ' ' * new_len
+                        line = new_indent + line[len(indent):]
+            out.append(line)
+        return ''.join(out)
+
     def insertFromMimeData(self, source):
         if source.hasUrls():
             url = source.urls()[0]
@@ -604,6 +657,13 @@ class ReTextEdit(QTextEdit):
                 imageText = self.getImageMarkup(url)
                 self.textCursor().insertText(imageText)
                 return
+        if source.hasHtml():
+            markupClass = self.tab.getActiveMarkupClass()
+            if markupClass == MarkdownMarkup:
+                markdown = self._convertHtmlToMarkdown(source.html())
+                if markdown:
+                    self.textCursor().insertText(markdown)
+                    return
         return super().insertFromMimeData(source)
 
 
diff --git a/tests/test_editor.py b/tests/test_editor.py
index cbdcecf..37e8772 100644
--- a/tests/test_editor.py
+++ b/tests/test_editor.py
@@ -112,6 +112,17 @@ class TestClipboardHandling(unittest.TestCase):
         self.editor.insertFromMimeData(mimeData)
         self.assertTrue('pasted text' in self.editor.toPlainText())
 
+    def test_pasteHtmlConvertedToMarkdown(self):
+        mimeData = QMimeData()
+        mimeData.setText('plain text fallback')
+        mimeData.setHtml('<p>Hello <strong>world</strong></p>')
+        self.dummytab.markupClass = MarkdownMarkup
+
+        self.editor.insertFromMimeData(mimeData)
+
+        pasted = self.editor.toPlainText()
+        self.assertIn('Hello **world**', pasted)
+
     @patch.object(ReTextEdit, 'getImageFilename', return_value='/tmp/myimage.jpg')
     @patch.object(QImage, 'save')
     def test_pasteImage_Markdown(self, _mock_image, _mock_editor):
@@ -291,5 +302,15 @@ class TestOrderedListMode(unittest.TestCase):
         QTest.keyClicks(editor, 'World')
         self.assertEqual(editor.document().toPlainText(), '1. Hello\n1. World')
 
+    def test_markdown_list_indentation_normalization(self):
+        editor = ReTextEdit(self)
+        editor.tab = self.DummyReTextTab()
+        editor.tab.markupClass = MarkdownMarkup
+        # Type a list item with two-space indent, then press Return.
+        QTest.keyClicks(editor, '  - Foo')
+        QTest.keyClick(editor, Qt.Key.Key_Return)
+        QTest.keyClicks(editor, 'Bar')
+        self.assertEqual(editor.document().toPlainText(), '  - Foo\n    - Bar')
+
 if __name__ == '__main__':
     unittest.main()

QTest.keyClicks(editor, 'World')
self.assertEqual(editor.document().toPlainText(), '1. Hello\n1. World')

def test_markdown_list_indentation_normalization(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the updates!

This test is good, but does it really test your _normalizeMarkdownListIndentation function? It looks like no HTML to markdown conversion is happening here, and this test is testing the editor functionality of preserving indentation instead?

If I'm right, you may keep this test and add another one. Or, maybe just add more complex content (that requires normalization) to your test_pasteHtmlConvertedToMarkdown function above?

Also, please rename this test to camelCase if you keep it.

@carli2
Copy link
Contributor Author

carli2 commented Oct 5, 2025

Hi, can you merge and throw out the test if it'd be fine?

@mitya57 mitya57 merged commit baf76e5 into retext-project:master Oct 5, 2025
@mitya57
Copy link
Member

mitya57 commented Oct 5, 2025

Done, thank you again!

I will look at the new PR a bit later.

@carli2
Copy link
Contributor Author

carli2 commented Oct 5, 2025

Oh, I saw you threw out the list normalization algo. This is needed since the html-to-markdown helper function does not produce correct markdown with nested lists. you can try to paste nested lists

@mitya57
Copy link
Member

mitya57 commented Oct 5, 2025

I kept the _normalizeMarkdownListIndentation function and added a test that inserting lists with sublist produces the expected result (see test_pasteHtmlConvertedToMarkdown_lists).

Isn't it enough? What HTML do I need to insert to make it broken?

@carli2
Copy link
Contributor Author

carli2 commented Oct 5, 2025

This is the example I used:

<meta http-equiv="content-type" content="text/html; charset=utf-8"><h2 data-start="1203" data-end="1250">🔑 Warum der Wechsel zu den Offenen Brüdern?</h2>
<ol data-start="1251" data-end="2463">
<li data-start="1251" data-end="1660">
<p data-start="1254" data-end="1282"><strong data-start="1254" data-end="1280">Strenge der Exklusiven</strong></p>
<ul data-start="1286" data-end="1660">
<li data-start="1286" data-end="1551">
<p data-start="1288" data-end="1384">In Neugersdorf (wie in vielen Orten) erlebten die Leute die <strong data-start="1348" data-end="1381">exklusiven Regeln als zu hart</strong>:</p>
<ul data-start="1390" data-end="1551">
<li data-start="1390" data-end="1426">
<p data-start="1392" data-end="1426">Absonderung von „Nicht-Brüdern“.</p>
</li>
<li data-start="1432" data-end="1480">
<p data-start="1434" data-end="1480">Ausschluss wegen kleinster Lehrabweichungen.</p>
</li>
<li data-start="1486" data-end="1551">
<p data-start="1488" data-end="1551">kaum Kontakt mit Landeskirchlern, Baptisten oder Methodisten.</p>
</li>
</ul>
</li>
<li data-start="1555" data-end="1660">
<p data-start="1557" data-end="1660">Das passte schlecht in eine kleine Stadt, wo man auf Nachbarn, Kollegen und Verwandte angewiesen war.</p>
</li>
</ul>
</li>
<li data-start="1662" data-end="1955">
<p data-start="1665" data-end="1710"><strong data-start="1665" data-end="1708">Einfluss missionarisch geprägter Brüder</strong></p>
<ul data-start="1714" data-end="1955">
<li data-start="1714" data-end="1842">
<p data-start="1716" data-end="1842">Durch Kontakte zu Offenen Brüdern in Sachsen und Thüringen sah man, dass <strong data-start="1789" data-end="1831">Gemeinden auch ohne Absonderung blühen</strong> konnten.</p>
</li>
<li data-start="1846" data-end="1955">
<p data-start="1848" data-end="1955">Offene Brüder waren evangelistisch aktiv → passte besser zur pietistisch-erweckten Mentalität der Region.</p>
</li>
</ul>
</li>
<li data-start="1957" data-end="2192">
<p data-start="1960" data-end="1998"><strong data-start="1960" data-end="1996">Familien- und Gemeindespaltungen</strong></p>
<ul data-start="2002" data-end="2192">
<li data-start="2002" data-end="2083">
<p data-start="2004" data-end="2083">Die Exklusiven hatten oft <strong data-start="2030" data-end="2053">innere Zerwürfnisse</strong> (wer ist rein, wer nicht?).</p>
</li>
<li data-start="2087" data-end="2192">
<p data-start="2089" data-end="2192">In Neugersdorf führte das zu Spannungen – die <strong data-start="2135" data-end="2189">Offenen Brüder wirkten friedlicher und praktischer</strong>.</p>
</li>
</ul>
</li>
<li data-start="2194" data-end="2463">
<p data-start="2197" data-end="2223"><strong data-start="2197" data-end="2221">DDR-Zeit (nach 1945)</strong></p>
<ul data-start="2227" data-end="2463">
<li data-start="2227" data-end="2306">
<p data-start="2229" data-end="2306">Unter staatlicher Beobachtung war strikte Absonderung schwer durchzuhalten.</p>
</li>
<li data-start="2310" data-end="2463">
<p data-start="2312" data-end="2463">Gemeinden, die <strong data-start="2327" data-end="2338">offener</strong> waren, konnten leichter bestehen, weil sie besser vernetzt waren und nicht in endlosen internen Streitigkeiten erstarrten.</p>
</li>
</ul>
</li>
</ol>

@mitya57
Copy link
Member

mitya57 commented Oct 5, 2025

This HTML seems to produce a working Markdown with the current code:

Screenshot

@carli2
Copy link
Contributor Author

carli2 commented Oct 5, 2025

ok nice :) for some reasons my test during implementation didn't work out that well but nice when it works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants