pdftools_toolbox.pdf.structure.node

Classes

Node(tag, document, page)

This class represents a structure element node in the structure element tree of a tagged PDF.

class pdftools_toolbox.pdf.structure.node.Node(tag: str, document: Document, page: Page | None)

Bases: _NativeObject

This class represents a structure element node in the structure element tree of a tagged PDF. Nodes may either have a collection of other nodes as children, or be associated with marked content. These two roles cannot be mixed.

__init__(tag: str, document: Document, page: Page | None)
Parameters:
  • tag (str) – Tags should conform to the Standard Structure Types described within the PDF standard or refer to entries in the RoleMap. Allowed values from the PDF standard are: Document, Part, Sect, Art, Div, H1, H2, H3, H4, H5, H6, P, L, LI, Lbl, LBody, Table, TR, TH, TD, THead, TBody, TFoot, Span, Quote, Note, Reference, Figure, Caption, Artifact, Form, Field, Link, Code, Annot, Ruby, Warichu, TOC, TOCI, Index and BibEntry.

  • document (pdftools_toolbox.pdf.document.Document) – The document containing the structure element tree.

  • page (Optional[pdftools_toolbox.pdf.page.Page]) – The page containing the marked content associated with the structure element node. The page may be None, and is best left as None for nodes that are not associated with marked content.

Raises:

StateError – if the object or the owning document has already been closed
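
The tag rule above can be checked before a node is constructed. The following is a minimal sketch, not part of pdftools_toolbox: STANDARD_TAGS and is_allowed_tag are hypothetical names, the set is copied from the list of allowed values for the tag parameter, and the role_map_keys argument stands in for whatever RoleMap entries the caller already knows about.

```python
from collections.abc import Collection

# Allowed values copied from the `tag` parameter description above.
STANDARD_TAGS = frozenset(
    """Document Part Sect Art Div H1 H2 H3 H4 H5 H6 P L LI Lbl LBody
    Table TR TH TD THead TBody TFoot Span Quote Note Reference Figure
    Caption Artifact Form Field Link Code Annot Ruby Warichu TOC TOCI
    Index BibEntry""".split()
)


def is_allowed_tag(tag: str, role_map_keys: Collection[str] = ()) -> bool:
    """Return True if `tag` is a listed standard type or a known RoleMap key.

    `role_map_keys` is a hypothetical argument: the caller supplies the
    custom tag names that the document's RoleMap is known to define.
    """
    return tag in STANDARD_TAGS or tag in role_map_keys
```

The comparison is case-sensitive, so is_allowed_tag("P") succeeds while is_allowed_tag("p") does not unless "p" is supplied as a RoleMap key.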

get_string_attribute(key: str) → str | None

Query a string attribute.

Parameters:

key (str) – The attribute key

Returns:

the attribute value

Return type:

Optional[str]

Raises:

StateError – if the object or the owning document has already been closed

set_string_attribute(key: str, value: str) → None

Set a string attribute.

Parameters:
  • key (str) – The attribute key

  • value (str) – The attribute value

Raises:

StateError – if the object or the owning document has already been closed
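
The two attribute methods can be combined into a "set only if missing" helper. This is a sketch under one assumption that the reference does not state explicitly: that get_string_attribute returns None when the key has no value. The helper set_default_attribute, the StubNode class and the key "ExampleKey" are hypothetical and exist only so the example runs without a PDF document.

```python
from typing import Optional


def set_default_attribute(node, key: str, value: str) -> str:
    """Set `key` to `value` only if the node has no value for it yet.

    Returns the value that is in effect afterwards. Assumes that
    get_string_attribute returns None for a missing key.
    """
    current: Optional[str] = node.get_string_attribute(key)
    if current is None:
        node.set_string_attribute(key, value)
        return value
    return current


class StubNode:
    """Dictionary-backed stand-in for Node, for demonstration only."""

    def __init__(self) -> None:
        self._attrs: dict[str, str] = {}

    def get_string_attribute(self, key: str) -> Optional[str]:
        return self._attrs.get(key)

    def set_string_attribute(self, key: str, value: str) -> None:
        self._attrs[key] = value


stub = StubNode()
first = set_default_attribute(stub, "ExampleKey", "a")   # key missing: stored
second = set_default_attribute(stub, "ExampleKey", "b")  # key present: kept
```

The helper relies only on the two documented method signatures, so it should accept a real Node in place of the stub.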

property parent: Node

The parent node in the structure element tree.

Returns:

pdftools_toolbox.pdf.structure.node.Node

Raises:
  • StateError – if the object or the owning document has already been closed

  • OperationError – if the parent is the structure element tree root node
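
Because parent raises an error instead of returning the tree root, an upward walk ends by catching that error. The sketch below illustrates the pattern with stand-in classes. tag_path, StubNode and StubRootError are hypothetical; with the real library the caller would pass the library's OperationError type as root_error.

```python
def tag_path(node, root_error: type) -> list:
    """Return the tags from the outermost reachable ancestor down to `node`.

    `root_error` is the exception type raised by `parent` when the parent
    is the structure element tree root.
    """
    tags = [node.tag]
    current = node
    while True:
        try:
            current = current.parent
        except root_error:
            break  # reached the node directly below the tree root
        tags.append(current.tag)
    tags.reverse()
    return tags


class StubRootError(Exception):
    """Stand-in for the error raised when the parent is the tree root."""


class StubNode:
    """Stand-in for Node exposing only `tag` and `parent`."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self._parent = parent

    @property
    def parent(self):
        if self._parent is None:
            raise StubRootError("parent is the structure element tree root")
        return self._parent


sect = StubNode("Sect")
paragraph = StubNode("P", parent=sect)
path = tag_path(paragraph, StubRootError)
```

Passing the exception type as an argument keeps the helper independent of any particular import path.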

property children: NodeList

The list of child nodes under this node in the structure element tree. Once child nodes have been added to a node, it can no longer be associated with marked content.

Returns:

pdftools_toolbox.pdf.structure.node_list.NodeList

Raises:

StateError – if the object or the owning document has already been closed

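
A common use of children together with tag is a recursive walk that prints the structure as an indented outline. The sketch assumes that the NodeList returned by children can be iterated like a Python list, which this page does not confirm. The outline function and the SimpleNamespace stand-ins are hypothetical.

```python
from types import SimpleNamespace


def outline(node, depth: int = 0) -> list:
    """Return one indented line per node, in pre-order.

    Assumes `node.children` is iterable and `node.tag` is a string.
    """
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Stand-in tree: Sect containing a heading and a paragraph with a span.
span = SimpleNamespace(tag="Span", children=[])
tree = SimpleNamespace(
    tag="Sect",
    children=[
        SimpleNamespace(tag="H1", children=[]),
        SimpleNamespace(tag="P", children=[span]),
    ],
)
lines = outline(tree)
```

Joining the result with newlines gives a quick textual view of a structure subtree.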
property tag: str

Tags should conform to the Standard Structure Types described within the PDF standard.

Returns:

str

Raises:

StateError – if the object or the owning document has already been closed

property page: Page | None

The page containing the marked content associated with the structure element node. The value may be None, which is the best choice for nodes that are not associated with marked content.

Returns:

Optional[pdftools_toolbox.pdf.page.Page]

Raises:

StateError – if the object or the owning document has already been closed

property alternate_text: str | None

Alternate text to be used where the content denoted by the structure element and its children cannot be rendered because of accessibility or other concerns.

Returns:

Optional[str]

Raises:

StateError – if the object or the owning document has already been closed
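
Alternate text matters most for figures, so a simple accessibility check is to list the Figure nodes that lack it. The following is a sketch only: figures_missing_alt is a hypothetical helper, it assumes that children is iterable, and it treats an empty string the same as None. The SimpleNamespace objects stand in for real nodes, and their name field exists only to identify them in the demonstration.

```python
from types import SimpleNamespace


def figures_missing_alt(root) -> list:
    """Return Figure nodes below (and including) `root` without alternate text.

    Nodes are visited in pre-order using an explicit stack, so deep trees
    do not hit Python's recursion limit.
    """
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.tag == "Figure" and not node.alternate_text:
            missing.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(list(node.children)))
    return missing


def stub(tag, name, alt=None, children=()):
    """Build a stand-in node for the demonstration."""
    return SimpleNamespace(
        tag=tag, name=name, alternate_text=alt, children=list(children)
    )


doc = stub("Document", "doc", children=[
    stub("Figure", "fig1"),                       # no alternate text
    stub("P", "p", children=[stub("Figure", "fig3", alt="")]),  # empty text
    stub("Figure", "fig2", alt="Company logo"),   # has alternate text
])
missing_names = [node.name for node in figures_missing_alt(doc)]
```

Whether an empty string should count as missing is a policy choice. The sketch takes the stricter reading.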

property actual_text: str | None

Actual text is a textual replacement for the content.

Returns:

Optional[str]

Raises:

StateError – if the object or the owning document has already been closed

property language: str | None

A language identifier specifying the natural language for all text in the node.

Returns:

Optional[str]

Raises:

StateError – if the object or the owning document has already been closed

property abbreviation: str | None

The expanded form of an abbreviation or an acronym.

Returns:

Optional[str]

Raises:

StateError – if the object or the owning document has already been closed

property bounding_box: Rectangle | None

The bounding box for the node's contents. It should only be set for Figure, Formula and Table nodes.

Returns:

Optional[pdftools_toolbox.geometry.real.rectangle.Rectangle]

Raises:

StateError – if the object has already been closed