TODO

Option to dump the annotations as markdown definition list. This is useful with PDF viewers that cannot show the annotations as popups.
Is a Windows installer possible?
Test popups with Microsoft Edge Browser
Test with pyPDF2
Test with python3
hunspell issues:
- python-HunspellPure should be a separate module. Split it.
- we artificially limit to [A-Z_-]+ for words. This is bad for german umlauts.
- extend hunspell to allow a progress indicator callback. (counting newlines seen in response)
testsuite
- maybe prepare a test script that allows numbers to be off by some percentage, but wants everything else precise. This helps with pdf source checking.
improve --log logfile generator. produce a json/xml/csv/txt file describing the diffs, -s word locations and --spellcheck results.
one letter changes always become word changes. Either run in single character mode. Or try to trim the replaced text for common suffix or common prefix.
Normalize nonbreaking spaces to spaces. This is important when e.g. markdown source has a 0x20 space, but rendered PDF may have instead.

DONE:

write compressed streams.
catch file open errors, before ET complains about 0 elements.
perform only same-length-replace. All other replace-ops should be replace+insert or replace+delete.
place delete marker at last text end position, rather than next text start position. This is a tricky, implementation in markword().
testsuite
- a 1:1 comparison is not possible, as e.g. poppler-0.18 and poppler-0.20 produce differences in the exact coordinates used.
- make a fuzzy comparison against templates with python-cv, pHash, etc... http://stackoverflow.com/questions/1819124/image-comparison-algorithm suggests Scipy. imgcmp.py does this.
- generate several output.pdf, convert via ImageMagick to png,
- run pdfcompare --version.
nicer +++---~~~== git style diagnostics per page, rather than saying '87 hits'.
if pagebreaks are within deleted text, point this out in the baloon popup.
Navigation from changebar to changebar, if there are many unchanged pages to jump over.
- calculation, graphics done. Hack with relocated navigation done.
popups are all in one line in okular. Need to provide linebreaks manually, sigh.
introduce an ignore-margin for text changes. Any words there will not go into the compare wordlists, and will not match with --search. This is meant to skip over pagenumbers and other bottom or top matter, that is not considered part of the document contents stream. --feature margin shall draw the margin area as shaded gray, so that we know where we are.
feature: pipe the wordlist through hunspell, if hunspell is available. use search-highlights to mark all words for which hunspell has spelling suggestions.
feature: added a trivial --log implementation
second level diff for moved blocks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.md

TODO.md

Files

TODO.md

Latest commit

History

TODO.md

File metadata and controls