tzkoshi/pdftotext

Extract text from a PDF with pdftotext

This package provides a class to extract text from a PDF. It is more or less a PHP 5.6-compatible copy of spatie/pdf-to-text.

  \Ottosmops\Pdftotext\Extract::getText('/path/to/file.pdf'); // returns the text from the PDF

Requirements

The package uses the pdftotext command-line tool. Make sure it is installed: which pdftotext

For installation, see poppler-utils (the package that ships pdftotext).

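If the installed binary is not found (the error reads: The command "which pdftotext" failed.), you can pass the full path to the constructor (see below), or extend PATH with the directory where pdftotext lives before you use the Extract class, for example putenv('PATH=' . getenv('PATH') . ':/usr/local/bin:/usr/bin'). PHP does not expand $PATH inside a string, so the current value has to be read with getenv().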
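The PATH workaround above can be sketched as a small helper. This is only an illustration: appendToPath() is a hypothetical function and not part of the package, and the directories are examples that should be replaced with wherever pdftotext lives on your system.

```php
<?php
// Hypothetical helper (not provided by the package): append directories to
// PATH for the current PHP process, skipping entries that are already listed.
function appendToPath(array $dirs)
{
    $current = getenv('PATH');
    $parts = ($current === false || $current === '')
        ? array()
        : explode(PATH_SEPARATOR, $current);

    foreach ($dirs as $dir) {
        $dir = rtrim($dir, '/'); // normalise "/usr/local/bin/" to "/usr/local/bin"
        if ($dir !== '' && !in_array($dir, $parts, true)) {
            $parts[] = $dir;
        }
    }

    $path = implode(PATH_SEPARATOR, $parts);
    putenv('PATH=' . $path); // only affects this process and its children

    return $path;
}

// Example directories; adjust to your installation.
appendToPath(array('/usr/local/bin', '/usr/bin'));
```

Call the helper once before creating the Extract object, so that the lookup of pdftotext sees the extended PATH.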

Installation

composer require ottosmops/pdftotext

Usage

Extracting text from a PDF:

use Ottosmops\Pdftotext\Extract;

$text = (new Extract())
    ->pdf('file.pdf')
    ->text();

You can set the path to the binary and pass options to pdftotext:

$text = (new Extract('/path/to/pdftotext'))
    ->pdf('path/to/file.pdf')
    ->options('-layout')
    ->text();

Default options are: -eol unix -enc UTF-8 -raw
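To make the default options concrete, the following sketch builds the kind of command line they correspond to. It is an assumption about what a wrapper like this might run, not the package's actual implementation; buildPdftotextCommand() is a hypothetical helper introduced only for illustration.

```php
<?php
// Illustration only: buildPdftotextCommand() is a hypothetical helper,
// not a function provided by the package.
function buildPdftotextCommand($pdf, $options = '-eol unix -enc UTF-8 -raw', $binary = 'pdftotext')
{
    // Quote the binary and the file path so spaces and shell metacharacters
    // are passed through safely. $options is inserted as-is, so it must come
    // from trusted code, never from user input.
    // The trailing "-" tells pdftotext to write the text to stdout.
    return escapeshellarg($binary) . ' ' . $options . ' ' . escapeshellarg($pdf) . ' -';
}

echo buildPdftotextCommand('path/to/file.pdf'), PHP_EOL;
```

Here -eol unix selects Unix line endings, -enc UTF-8 sets the output encoding, and -raw keeps the text in content stream order, while -layout (used in the example above) preserves the physical page layout instead.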

License

The MIT License (MIT). Please see License File for more information.
