Text Lib
Recent News

Mar. 28, 2013
In response to numerous requests, an MS Outlook parser has been released.

Oct. 16, 2009
New parsers for the OpenDocument family of documents are now available.

Aug. 31, 2007
New parsers for MS Office 2007 documents have been added to the collection.

Nov. 21, 2005
The Docs2text 2.0 component has been released. Supported document formats are MS Word, MS Excel, MS PowerPoint, RTF, and Adobe Acrobat PDF.

Our partner

TEXTOLUTION
Full Text Indexing and Retrieval library with Approximate Search.

See also odf2text, doc2text, xls2text, ppt2text and pst2text.

pdf2text

pdf2text is a component/library for extracting text from Adobe Acrobat PDF documents.

 

Below is a short list of the key features of pdf2text:

  • doesn't require Adobe Acrobat to process documents;
  • isn't based on xPDF sources;
  • fastest processing speed: up to 100 times faster than its competitors, as our customers' independent benchmarks show (see the chart below);
  • support for multilingual documents, including Asian text (CJK);
  • support for rotated pages;
  • support for PDF forms;
  • reads password-protected PDFs;
  • reads encrypted documents, including AES-encrypted ones;
  • advanced text preprocessing, including removal of shadow or duplicated text, restoration of the original document layout, etc.

 

The test case included 129 randomly collected PDF documents (65 MB in total). Only applications that successfully converted all documents appear on the chart. TextLib pdf2text completed the test in 19 seconds, which is 12 times faster than its fastest competitor.
Applications listed on the chart: ConvertDoc, Easy PDF To Text, VeryPDF, Gemini, IntraPDF, Midas Extractor, PDF Manager PDF2TXT, Glenn Alcott PDF Converter, PDF Ripper.
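
To put the benchmark figures in perspective, the short calculation below derives throughput from the numbers quoted above (129 documents, 65 MB, 19 seconds, a 12x lead). It is only an illustrative sketch: the variable names are ours, and it assumes that "12 times" refers to elapsed time, so the fastest competitor's time is an estimate rather than a published measurement.

```python
# Figures quoted in the benchmark description.
DOCUMENTS = 129            # number of PDF documents in the test set
TOTAL_SIZE_MB = 65         # overall size of the test set, in MB
PDF2TEXT_SECONDS = 19      # time pdf2text needed to convert the whole set
SPEEDUP = 12               # quoted lead over the fastest competitor

# Throughput achieved by pdf2text on this test set.
mb_per_second = TOTAL_SIZE_MB / PDF2TEXT_SECONDS
docs_per_second = DOCUMENTS / PDF2TEXT_SECONDS

# Assumption: "12 times" means 12 times the elapsed time,
# so this is an estimate of the fastest competitor's run time.
competitor_seconds = PDF2TEXT_SECONDS * SPEEDUP

print(f"pdf2text throughput: {mb_per_second:.2f} MB/s, {docs_per_second:.2f} documents/s")
print(f"estimated time for the fastest competitor: {competitor_seconds} s")
```

Under these assumptions, pdf2text processed roughly 3.4 MB (about 6.8 documents) per second, while the fastest competitor would have needed about 228 seconds for the same set.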

Still have questions? Contact us and we'll be happy to help.


Proceed to the Download page to download the pdf2text demo, which is included in the docs2text demo.