Aug. 31, 2007
New MS Office 2007 document parsers have been added to the collection.
Nov. 21, 2005
Docs2text 2.0 component released. Supported document formats are MS Word, MS Excel, MS PowerPoint, RTF, and Adobe Acrobat PDF.
TEXTOLUTION
Full Text Indexing and Retrieval library with Approximate Search.
See also doc2text, xls2text and ppt2text.
pdf2text
pdf2text is a component/library for extracting text from Adobe Acrobat PDF documents.
Below is a short list of the key features of pdf2text:
- doesn't require Adobe Acrobat to process documents;
- isn't based on xPDF sources;
- fast processing: up to 100 times faster than its competitors, according to our customers' independent research (see the chart below);
- support for multilanguage documents, including Asian text (CJK);
- support for rotated pages;
- reading of password-protected PDFs;
- advanced text preprocessing, including removal of shadow or duplicated text, restoration of the original document layout, etc.
The test case included 129 randomly collected PDF documents (65 MB in total); only applications that were able to convert all of the documents appear on the chart. TextLib pdf2text completed the test in 19 seconds, which is 12 times faster than its fastest competitor.
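As a rough reading of the figures quoted above, the short sketch below derives the throughput they imply. The variable names are illustrative only, and the competitor time is merely what the stated "12 times" ratio implies, not a separately measured value.

```python
# Figures taken from the test description; nothing here is newly measured.
DOCUMENTS = 129                      # PDF documents in the test set
TOTAL_SIZE_MB = 65                   # overall size of the test set
PDF2TEXT_SECONDS = 19                # reported pdf2text completion time
SPEEDUP_VS_FASTEST_COMPETITOR = 12   # reported ratio to the fastest competitor

# Throughput implied by the reported time.
mb_per_second = TOTAL_SIZE_MB / PDF2TEXT_SECONDS
docs_per_second = DOCUMENTS / PDF2TEXT_SECONDS

# Completion time the ratio implies for the fastest competitor (an inference).
implied_competitor_seconds = PDF2TEXT_SECONDS * SPEEDUP_VS_FASTEST_COMPETITOR

print(f"pdf2text throughput: {mb_per_second:.2f} MB/s, {docs_per_second:.2f} documents/s")
print(f"implied fastest-competitor time: {implied_competitor_seconds} s")
```

These are averages over the whole test set; per-document speed will vary with document size and content.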
Still have questions? Contact us and we'll be happy to help.
Proceed to the Download page to download the pdf2text demo, which is part of the docs2text demo.