anwaralameddin/pdf2

PDF2

A package for inspecting PDF files.

It is at an early stage of development.

Goal

The current aim of this package is to implement the following features:

  • Parse PDF files
  • Validate PDF files
  • Extract metadata
  • Extract text, images, tables, links, annotations, and other content
  • Check for potential security vulnerabilities

References

Comments and documentation include references to the International Standard ISO 32000-2:2020, Portable document format – Part 2: PDF 2.0. Each reference gives the section number, section name, and page number(s) in square brackets, e.g. [7.3.10 Indirect objects, p33-34]. References to other sources use nested square brackets, with the inner brackets holding the source, e.g. [[https://www.w3.org/TR/png/#4Concepts.EncodingScanlineAbs] 4.6.2 Scanline serialization].
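
To illustrate the bracket convention, here is a minimal Python sketch that pulls ISO 32000-2 references out of comment text. It is not part of this package. The names `SpecReference` and `find_spec_references` are hypothetical, and the pattern assumes that the section name contains no commas or square brackets. Nested references to other sources are deliberately not matched.

```python
import re
from typing import List, NamedTuple


class SpecReference(NamedTuple):
    """One reference in the form [<section> <name>, p<first>-<last>]."""

    section: str
    name: str
    first_page: int
    last_page: int


# Assumption: section names contain no commas or square brackets.
_REFERENCE = re.compile(
    r"\[(?P<section>\d+(?:\.\d+)*) "
    r"(?P<name>[^\[\],]+), "
    r"p(?P<first>\d+)(?:-(?P<last>\d+))?\]"
)


def find_spec_references(text: str) -> List[SpecReference]:
    """Return every ISO-style reference found in `text`, in order."""
    references = []
    for match in _REFERENCE.finditer(text):
        first = int(match["first"])
        # A single page such as "p33" is treated as the range 33-33.
        last = int(match["last"]) if match["last"] else first
        references.append(
            SpecReference(match["section"], match["name"], first, last)
        )
    return references


if __name__ == "__main__":
    comment = "// See [7.3.10 Indirect objects, p33-34] for the syntax."
    for reference in find_spec_references(comment):
        print(reference)
```

A helper like this could be used to list which sections of the standard a source file cites. Handling the nested form would need a second pattern for the inner source.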

Help Needed

If you are interested in contributing, please check the TODO list. Test contributions that include extracts of PDF files that do not open correctly are especially welcome, provided that including them does not require a change to the LICENSE.
