Skip to content

Latest commit

 

History

History
84 lines (58 loc) · 2.9 KB

TODO.md

File metadata and controls

84 lines (58 loc) · 2.9 KB

TODO

  • Option to dump the annotations as markdown definition list. This is useful with PDF viewers that cannot show the annotations as popups.

  • Is a Windows installer possible?

  • Test popups with Microsoft Edge Browser

  • Test with pyPDF2

  • Test with python3

  • hunspell issues:

    • python-HunspellPure should be a separate module. Split it.
    • we artificially limit to [A-Z_-]+ for words. This is bad for german umlauts.
    • extend hunspell to allow a progress indicator callback. (counting newlines seen in response)
  • testsuite

    • maybe prepare a test script that allows numbers to be off by some percentage, but wants everything else precise. This helps with pdf source checking.
  • improve --log logfile generator. produce a json/xml/csv/txt file describing the diffs, -s word locations and --spellcheck results.

  • one letter changes always become word changes. Either run in single character mode. Or try to trim the replaced text for common suffix or common prefix.

  • Normalize nonbreaking spaces to spaces. This is important when e.g. markdown source has a 0x20 space, but rendered PDF may have   instead.

DONE:

  • write compressed streams.

  • catch file open errors, before ET complains about 0 elements.

  • perform only same-length-replace. All other replace-ops should be replace+insert or replace+delete.

  • place delete marker at last text end position, rather than next text start position. This is a tricky, implementation in markword().

  • testsuite

    • a 1:1 comparison is not possible, as e.g. poppler-0.18 and poppler-0.20 produce differences in the exact coordinates used.
    • make a fuzzy comparison against templates with python-cv, pHash, etc... http://stackoverflow.com/questions/1819124/image-comparison-algorithm suggests Scipy. imgcmp.py does this.
    • generate several output.pdf, convert via ImageMagick to png,
    • run pdfcompare --version.
  • nicer +++---~~~== git style diagnostics per page, rather than saying '87 hits'.

  • if pagebreaks are within deleted text, point this out in the baloon popup.

  • Navigation from changebar to changebar, if there are many unchanged pages to jump over.

    • calculation, graphics done. Hack with relocated navigation done.
  • popups are all in one line in okular. Need to provide linebreaks manually, sigh.

  • introduce an ignore-margin for text changes. Any words there will not go into the compare wordlists, and will not match with --search. This is meant to skip over pagenumbers and other bottom or top matter, that is not considered part of the document contents stream. --feature margin shall draw the margin area as shaded gray, so that we know where we are.

  • feature: pipe the wordlist through hunspell, if hunspell is available. use search-highlights to mark all words for which hunspell has spelling suggestions.

  • feature: added a trivial --log implementation

  • second level diff for moved blocks.