Package 'Rpoppler'

Title: PDF Tools Based on Poppler
Description: PDF tools based on the Poppler PDF rendering library. See <http://poppler.freedesktop.org/> for more information on Poppler.
Authors: Kurt Hornik [aut, cre]
Maintainer: Kurt Hornik <[email protected]>
License: GPL-2
Version: 0.1-3
Built: 2024-09-19 06:53:03 UTC
Source: CRAN

PDF document reference

Description

Create a reference to a Portable Document Format (PDF) file for use in subsequent information extraction from the file.

Usage

PDF_doc(file)

Arguments

file

A character string giving the path to a PDF file.

Value

A reference to a PDF file (external pointer object).

Examples

file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
doc <- PDF_doc(file)
## Can now use the reference for information extraction, avoiding
## the creation of new PopplerDocument objects when doing so.
PDF_info(doc)
PDF_fonts(doc)
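
A minimal sketch of a defensive wrapper around PDF_doc(), in case the path may be wrong or missing. The helper name open_pdf is hypothetical and not part of Rpoppler; it only checks that the file exists before creating the reference. The usage part assumes that Rpoppler is installed and that Sweave.pdf ships with the local R installation.

```r
# Hypothetical helper (not part of Rpoppler): validate the path first,
# then create the document reference with PDF_doc().
open_pdf <- function(path) {
  stopifnot(is.character(path), length(path) == 1L)
  if (!file.exists(path))
    stop("PDF file not found: ", path)
  Rpoppler::PDF_doc(path)
}

# Usage: skipped silently if the example PDF or the package is unavailable.
file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
if (nzchar(file) && requireNamespace("Rpoppler", quietly = TRUE)) {
  doc <- open_pdf(file)
  print(Rpoppler::PDF_info(doc))
}
```

Failing early on a bad path gives a clearer error message than passing the path straight through to the underlying library.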

PDF font information

Description

Obtain the fonts used in a Portable Document Format (PDF) file and further information about these fonts.

Usage

PDF_fonts(file)

Arguments

file

A character string giving the path to a PDF file, or an object of class "PDF_doc" giving a reference to a PDF file.

Value

A data frame inheriting from class "PDF_fonts" (which has a useful print method), with the following variables:

name

the full name of the font (character)

type

the font type (Type 1, Type 3, etc.; character)

file

the file name of the font (character; empty if the font is embedded)

emb

whether the font is embedded in the PDF file or not (logical)

sub

whether the font is a subset of another font (logical)

Examples

file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
PDF_fonts(file)
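
Because the result is a data frame with the variables listed above, it can be filtered with ordinary subsetting. The following sketch lists the fonts that are not embedded, which is a common check before distributing a PDF file. The helper name fonts_not_embedded is hypothetical and not part of Rpoppler; the usage part assumes that the package and the example PDF are available.

```r
# Hypothetical helper (not part of Rpoppler): keep only the fonts that
# are not embedded, using the 'emb' variable of the PDF_fonts() result.
fonts_not_embedded <- function(fonts) {
  fonts <- as.data.frame(fonts)   # drop the extra class, keep the columns
  fonts[!fonts$emb, c("name", "type", "file"), drop = FALSE]
}

# Usage: skipped silently if the example PDF or the package is unavailable.
file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
if (nzchar(file) && requireNamespace("Rpoppler", quietly = TRUE)) {
  print(fonts_not_embedded(Rpoppler::PDF_fonts(file)))
}
```

A result with zero rows means that every font used in the file is embedded.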

PDF document information

Description

Extract document information from a Portable Document Format (PDF) file.

Usage

PDF_info(file)

Arguments

file

A character string giving the path to a PDF file, or an object of class "PDF_doc" giving a reference to a PDF file.

Value

An object of class "PDF_info" (which has useful format and print methods). It contains the information in the PDF Info dictionary (title, subject, keywords, author, creator, producer, creation date, modification date), the number of pages and the page sizes, whether the document is optimized (linearized), and the PDF version it uses.

Examples

file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
PDF_info(file)
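
The printed summary can also be captured as text, for example to write it to a log file. The sketch below relies only on the print method mentioned above and on capture.output() from the utils package; the helper name info_lines is hypothetical and not part of Rpoppler, and the internal structure of the returned object is deliberately not assumed.

```r
# Hypothetical helper (not part of Rpoppler): capture what print() shows
# for an object as a character vector, one element per printed line.
info_lines <- function(info) {
  utils::capture.output(print(info))
}

# Usage: skipped silently if the example PDF or the package is unavailable.
file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
if (nzchar(file) && requireNamespace("Rpoppler", quietly = TRUE)) {
  lines <- info_lines(Rpoppler::PDF_info(file))
  writeLines(lines)   # or: writeLines(lines, "info.txt")
}
```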

PDF text extraction

Description

Extract text from a Portable Document Format (PDF) file.

Usage

PDF_text(file)

Arguments

file

A character string giving the path to a PDF file, or an object of class "PDF_doc" giving a reference to a PDF file.

Value

A character vector with one element per page, each giving the text extracted from that page.

Examples

file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
PDF_text(file)
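
Since the value is a character vector of per-page texts, the position of an element corresponds to its page, and standard string functions apply directly. The following sketch finds the pages that contain a given string. The helper name pages_matching is hypothetical and not part of Rpoppler, and the search term "Sweave" is only an illustration.

```r
# Hypothetical helper (not part of Rpoppler): return the indices of the
# pages whose extracted text contains a fixed (non-regex) string.
pages_matching <- function(pages, pattern) {
  which(grepl(pattern, pages, fixed = TRUE))
}

# Usage: skipped silently if the example PDF or the package is unavailable.
file <- system.file(file.path("doc", "Sweave.pdf"), package = "utils")
if (nzchar(file) && requireNamespace("Rpoppler", quietly = TRUE)) {
  pages <- Rpoppler::PDF_text(file)
  length(pages)                     # number of per-page texts
  pages_matching(pages, "Sweave")   # page indices mentioning "Sweave"
}
```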