DocumentAssembler, OpenXmlRegex, UnicodeMapper bugfixes #46

Open · wants to merge 12 commits into base: vNext


lowellstewart

This pull request aims to fix the erroneous handling of:

  • w:lastRenderedPageBreak elements
  • xml:space="preserve" attributes on w:t elements

Word writes w:lastRenderedPageBreak elements into DOCX files when it saves, I guess as a sort of placeholder recording where Word's layout engine last broke content across pages. They do not represent any part of the document content itself. However, the OpenXmlRegex code was unable to match a regex whenever the matched text happened to span one of these elements. I traced the problem back to the implementation of UnicodeMapper.RunToString, added code there to ignore lastRenderedPageBreak elements, and added a unit test to check that they are handled properly.
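To illustrate why a stray lastRenderedPageBreak defeats regex matching, here is a minimal Python sketch of flattening runs to a string. It is not UnicodeMapper's code (the project is C#); the names run_to_string and paragraph_text are hypothetical, and the placeholder character used for unrecognized run content is an assumption chosen for illustration, not necessarily what UnicodeMapper emits.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}local form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical placeholder for run content that is not text (drawings, fields, ...).
# Chosen for illustration; not necessarily the character UnicodeMapper uses.
PLACEHOLDER = "\u0001"


def run_to_string(run, ignore_last_rendered_page_break=True):
    """Flatten one w:r element into a string (simplified, hypothetical mapper)."""
    parts = []
    for child in run:
        if child.tag == W + "t":
            parts.append(child.text or "")
        elif child.tag == W + "tab":
            parts.append("\t")
        elif child.tag == W + "br":
            parts.append("\n")
        elif child.tag == W + "rPr":
            continue  # run properties are formatting, not content
        elif child.tag == W + "lastRenderedPageBreak" and ignore_last_rendered_page_break:
            continue  # layout hint written by Word on save; not document content
        else:
            parts.append(PLACEHOLDER)
    return "".join(parts)


def paragraph_text(p, **kwargs):
    """Concatenate the flattened text of every run in a w:p element."""
    return "".join(run_to_string(r, **kwargs) for r in p.iter(W + "r"))


# Hand-written sample: the second run starts with a lastRenderedPageBreak.
SAMPLE = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    "<w:r><w:lastRenderedPageBreak/><w:t>world</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    p = ET.fromstring(SAMPLE)
    # Ignoring the element: the regex finds the phrase across the break.
    print(bool(re.search(r"Hello world", paragraph_text(p))))  # True
    # Treating it as content: the placeholder splits the phrase, so no match.
    flattened = paragraph_text(p, ignore_last_rendered_page_break=False)
    print(bool(re.search(r"Hello world", flattened)))  # False
```

Skipping the element entirely, rather than mapping it to some character, keeps the flattened string identical to the text a reader sees, which is the property a regex search over that string depends on.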

Also, Word has specific rules for how the xml:space attribute on w:t elements affects whitespace, but most of the OpenXmlPowerTools code seems to ignore that attribute and its meaning. I added a helper method to UnicodeMapper that aims to emulate Word's behavior when converting XML elements to Unicode text strings, plus a unit test to assert/explain the behavior. I also added code (and a unit test) to DocumentAssembler so that when it inserts content that begins or ends with intentional whitespace, the whitespace actually shows up in the DOCX file instead of being ignored as it was before.
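A minimal Python sketch of both directions of this xml:space handling, assuming (as a simplification) that Word drops leading and trailing whitespace from a w:t element unless it carries xml:space="preserve"; interior whitespace is left untouched here, so this does not claim to reproduce every detail of Word's rules. The names t_element_text and make_t are hypothetical; they are not the helper this PR adds to UnicodeMapper or the code it adds to DocumentAssembler.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
# ElementTree reports the reserved xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# The four XML whitespace characters; deliberately excludes U+00A0 (no-break space).
XML_WS = " \t\r\n"


def t_element_text(t):
    """Text of a w:t element as displayed, under a simplifying assumption:
    without xml:space="preserve", leading/trailing whitespace is dropped."""
    text = t.text or ""
    if t.get(XML_SPACE) == "preserve":
        return text
    return text.strip(XML_WS)


def make_t(text):
    """Build a w:t element, adding xml:space="preserve" only when the text
    begins or ends with whitespace that would otherwise be lost."""
    t = ET.Element(W + "t")
    t.text = text
    if text != text.strip(XML_WS):
        t.set(XML_SPACE, "preserve")
    return t


if __name__ == "__main__":
    inserted = make_t(" Smith ")
    print(inserted.get(XML_SPACE))         # preserve
    print(repr(t_element_text(inserted)))  # ' Smith '
    bare = ET.fromstring(f'<w:t xmlns:w="{W_NS}">  Smith </w:t>')
    print(repr(t_element_text(bare)))      # 'Smith'
```

Only text with leading or trailing whitespace receives the attribute, so ordinary inserted text is written exactly as before.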

Lowell Stewart and others added 12 commits July 3, 2019 21:28
(when converting DOCX content to Unicode strings)
(now that UnicodeMapper.RunToString supports it
well enough to tell whether it's working or not!)
for test cases borrowed from OXPT fork sergey-tihon/Clippit.
The bugs driving these cases were fixed independently
(using quite different approaches in each fork) but I still
wanted to capture the test cases.
* Merge Himanshu's fix into the repository: https://github.com/EricWhiteDev/Open-Xml-PowerTools/pull/15/files
* Dispose opened stream
* Fix similar bugs
* Closed further leaks
---------
Co-authored-by: Andrei Atanasiu
Co-authored-by: Markus Rudolph
Old unit tests passed on .NET Framework but failed on .NET Core
Improvement based in part on Codeuctivity/OpenXmlPowerTools