Fix List Formatting #16
- 🐛 Fix orphaned <li> tags by wrapping them in <ul> or <ol> if no parent is found
- 🎨 Update list styles to support nested lists with specific styles for depth
- ✏️ Modify paragraph handling to ensure proper formatting within list items
- 📄 Add new test cases for nested list items in HTML input
- 🧹 Remove unused imports for base64, urllib, and BytesIO
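The depth-specific list styles mentioned in the commits above could be selected roughly as follows. This is a minimal sketch, not the project's code: `list_style` is a hypothetical helper, the clamping to three levels is an assumption, and only the style names themselves come from this PR's description.

```python
def list_style(tag, depth, max_depth=3):
    """Return a Word list style name for an HTML list tag at a nesting depth.

    depth is 1-based. Depths beyond max_depth reuse the deepest style
    (an assumption made for this sketch).
    """
    # Unknown or missing wrapping tags fall back to the bullet style.
    base = {"ul": "List Bullet", "ol": "List Number"}.get(tag, "List Bullet")
    level = max(1, min(depth, max_depth))
    return base if level == 1 else f"{base} {level}"


print(list_style("ol", 1))  # List Number
print(list_style("ul", 3))  # List Bullet 3
```

Keeping the mapping in one small function makes it easy to test each depth without generating a document.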
|
Hello @TaylorN15 ! I'll take a look and review your PR soon, thanks for your support! |
I just tested with your example HTML, and the numbering looks different. I have a fix for restarting the list numbering. It's a bit hacky, as you need to inject a numId into the XML:

from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def restart_numbering(paragraph, num_id=1000):
    """Forces Word to treat this paragraph as the start of a new numbered list."""
    p = paragraph._p
    pPr = p.get_or_add_pPr()
    numPr = OxmlElement('w:numPr')
    numId = OxmlElement('w:numId')
    numId.set(qn('w:val'), str(num_id))
    numPr.append(numId)
    ilvl = OxmlElement('w:ilvl')
    ilvl.set(qn('w:val'), '0')  # top-level list
    numPr.append(ilvl)
    pPr.append(numPr)

This solves the numbering issue. There is still an issue with the nested lists, because one of them is wrapped inside:

<ol>
  <li>first list, first level, item 1</li>
  <li>first list, first level, item 2</li>
  <li>first list, first level, item 3</li>
</ol>
<ol>
  <li>second list, first level, item 1</li>
  <li>
    second list, first level, item 2
    <ol>
      <li>second list, second level, item 1</li>
      <li>second list, second level, item 2</li>
      <li>
        <ol>
          <li>second list, third level, item 1</li>
          <li>second list, third level, item 2</li>
        </ol>
      </li>
      <li>
        <ul>
          <li>third level, unsorted list, item 1</li>
          <li>third level, unsorted list, item 2</li>
        </ul>
      </li>
    </ol>
  </li>
</ol>

If I change to the below, it works as expected:

<ol>
  <li>first list, first level, item 1</li>
  <li>first list, first level, item 2</li>
  <li>first list, first level, item 3</li>
</ol>
<ol>
  <li>second list, first level, item 1</li>
  <li>
    second list, first level, item 2
    <ol>
      <li>second list, second level, item 1</li>
      <li>second list, second level, item 2</li>
      <ol>
        <li>second list, third level, item 1</li>
        <li>second list, third level, item 2</li>
      </ol>
      <ul>
        <li>third level, unsorted list, item 1</li>
        <li>third level, unsorted list, item 2</li>
      </ul>
    </ol>
  </li>
</ol>

What do you think? |
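For readers unfamiliar with the numbering markup, the fragment that a numId injection produces can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: `build_num_pr` and `w` are hypothetical helpers, and the sketch emits `w:ilvl` before `w:numId` because that is the order the WordprocessingML schema lists them in. Word is generally expected to honor a `w:numId` only if it refers to a numbering instance defined in the document's numbering part.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(name):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{name}"


def build_num_pr(num_id, level=0):
    """Build a w:pPr element holding a w:numPr numbering reference."""
    p_pr = ET.Element(w("pPr"))
    num_pr = ET.SubElement(p_pr, w("numPr"))
    # Schema order: the list level first, then the numbering instance id.
    ET.SubElement(num_pr, w("ilvl"), {w("val"): str(level)})
    ET.SubElement(num_pr, w("numId"), {w("val"): str(num_id)})
    return p_pr


# Print the serialized paragraph properties for numbering instance 1000.
print(ET.tostring(build_num_pr(1000), encoding="unicode"))
```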
- 🐛 Fix list numbering for nested ordered lists
- ✨ Add support for multiple levels of ordered and unordered lists
- 🔧 Refactor list handling logic for improved clarity and functionality
- 📜 Update test HTML to validate new list features
Ok, so I ran some tests here to confirm the issues I have found. First, I still have problems with the numbering order, but that may be because I'm using LibreOffice for my tests; I don't have Word here to validate this. I also tested on Google Docs, and it was even worse than before. I believe there is something we could fix by understanding the counting order and trying to "hack fix" this context. What do you think? Try to reproduce on other editors like Google Docs to see what happens. |
|
That’s odd. However, python-docx was designed to work specifically with Microsoft Word documents, so making it work with all other applications will be difficult. The numbered list order seems to be working fine in my Word documents. Since the fix uses OxmlElement, I don’t know whether LibreOffice and Google Docs work 100% with OpenXML-based documents. I can look into this a bit further and see what I can do. But ultimately, as I said, it was designed to work with MS Word only. |
Yes, I agree that python-docx was designed specifically for Microsoft Word, but currently, there's good harmony between the applications, with most features working quite similarly. So, I’d like to keep it that way if possible, especially since I'm using LibreOffice for my own tests. If it’s not possible, then we can proceed with a Word-focused solution. On that note, I ran a few new tests using your branch as inspiration and managed to get it working on LibreOffice as well, aside from one small issue (which you might be able to help me with). As you can see, in my version it's working fine in most cases. However, when there's a restart on a nested list, I'm still struggling to get it to reset properly. The restart counter works correctly only at the base level. I tried fixing it, but so far without success. You might have some new ideas that could help me find a solution. I feel like we’re close to solving it completely. I created a new branch called I would appreciate it if you could test my approach on Microsoft Word to confirm that it’s still working well, and let me know if you have any ideas to fix the remaining issue. |
|
Just to confirm, you're expecting 3. second list, third level, item 1 and 2 to restart? So instead of 1, 2, *, *, 3, 4, it should be 1, 2, *, *, 1, 2? I would imagine we just need to restart the numbering again if we detect a new |
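The expected labels in that question (1, 2, *, *, 1, 2) can be modelled with a small parser that starts a fresh counter whenever a new list opens. This is only a sketch of the expected numbering, not of how Word or LibreOffice decide to restart a list; `ListNumberer` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ListNumberer(HTMLParser):
    """Assign a label to every <li>, restarting the count for each new list."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one [tag, counter] entry per currently open list
        self.labels = []  # (depth, label) for every <li> seen so far

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            # A new list always starts counting from zero, at any depth.
            self.stack.append([tag, 0])
        elif tag == "li" and self.stack:
            current = self.stack[-1]
            current[1] += 1
            label = str(current[1]) if current[0] == "ol" else "*"
            self.labels.append((len(self.stack), label))

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.stack:
            self.stack.pop()


html = (
    "<ol><li>a</li><li>b</li></ol>"
    "<ul><li>c</li><li>d</li></ul>"
    "<ol><li>e</li><li>f</li></ol>"
)
parser = ListNumberer()
parser.feed(html)
print([label for _, label in parser.labels])  # ['1', '2', '*', '*', '1', '2']
```

Because the counter lives on the stack entry of the list rather than on the depth, a nested list that is closed and reopened also restarts at 1.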
|
OK, I have tried so many things. From what I can tell, it's some issue with the |
Okay, I agree! I can at least include this as a known issue; I'll update the docs with it before releasing the new version. |
- 🧪 Implement test for ordered list to verify item presence
- 🧪 Implement test for unordered list to verify item presence
- 🐛 Ensure 'href' is checked for None before processing links
- 🔗 Update logic to handle cases where 'href' is not provided
|
I've added a couple of tests, and also a small fix for anchor/hrefs that I noticed was causing issues for me. |
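The kind of guard described in that fix can be illustrated with the standard-library parser. This is a sketch of the general pattern, not the project's actual code; `LinkCollector` is a hypothetical class.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets, skipping <a> tags that have no usable href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # href is None both when the attribute is absent and when it is
        # written without a value (<a href>), so check before using it.
        if href is None:
            return
        self.links.append(href)


collector = LinkCollector()
collector.feed('<a name="top">anchor</a> <a href="https://example.com">link</a>')
print(collector.links)  # ['https://example.com']
```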
@TaylorN15 I may have misunderstood. Did you test my branch and confirm that it works fine on Word, but couldn't fix the issue I described? If so, do you mind updating your branch with my branch? Also, thanks for finding and fixing the anchor/link issue, I appreciate it 🙏 |
|
I pulled your branch and added the unit tests but didn't merge back into mine. My bad. |
|
@TaylorN15 today I reviewed your PR and suggested just a few adjustments. Let me know when you update it! |
- 🐛 Remove orphaned <li> tag handling to streamline list processing
- ✅ Updated tests for ordered and unordered list scenarios to ensure proper conversion
- ✨ Enhance test cases for better coverage of list items and styles
All done :) |
|
Nice @TaylorN15 ! I'll merge it now and release the official version by the end of this week, because I still need to update the docs. Thanks for your help and support! If you find something else to fix or update, let me know :) |
|
Just realised I forgot to run flake8 afterward and there are some unused imports! Normally I use isort & black, but if I ran those over this codebase they would change too much :/ |
|
No worries, I'll update it and make sure everything is working fine before launching the new version. Thanks for letting me know :)
Description
This PR enhances the handling of HTML lists during DOCX generation. Key improvements include:
✅ List Style Improvements
- ul, ul2, ul3 → List Bullet, List Bullet 2, List Bullet 3
- ol, ol2, ol3 → List Number, List Number 2, List Number 3
- Falls back to List Bullet when no wrapping tag is present.
🛠 Bug Fixes & Edge Case Handling
- Handles orphaned <li> tags (those not wrapped in <ul> or <ol>) by wrapping them in <ul> before parsing.
- Handles paragraphs (<p>) that appear inside <li>.
🧹 Code Cleanup
- Removed unused imports: base64, urllib, BytesIO, Inches.
- Refactored handle_li() for improved clarity and maintainability.
- Uses an in_li flag to track context inside list items.
Checklist Before Requesting a Review