TextLib
Recent News

Oct. 16, 2009
New parsers for the OpenDocument family of documents are now available.

Aug. 31, 2007
New parsers for MS Office 2007 documents have been added to the collection.

Nov. 21, 2005
The Docs2text 2.0 component has been released. Supported document formats are MS Word, MS Excel, MS PowerPoint, RTF, and Adobe Acrobat PDF.

Our partner

TEXTOLUTION
Full Text Indexing and Retrieval library with Approximate Search.

MS Office® family


The MS Word document format is a proprietary binary format used by MS Word®. As the de facto standard for office document management, it became very popular; however, its undocumented structure makes it almost impossible for third-party applications to read it correctly.
The docs2text component/library can read MS Word 97 - 2003 documents without MS Office/Word installed, delivering high accuracy and fast processing.


The MS PowerPoint® format is a popular presentation format used for creating slide shows and presentations.
docs2text can extract text objects from MS PowerPoint presentations without MS PowerPoint installed.


The MS Excel document format is a popular spreadsheet storage format. It can contain text, formulas, charts, images, and complex calculations.
Like all MS Office binary formats, the MS Excel format is undocumented. Nevertheless, docs2text can read MS Excel spreadsheets without any additional applications or components installed, providing high accuracy, high performance, and flexibility.

New: MS Word 2007 documents (docx) are now supported as well.

Adobe Acrobat® PDF


PDF (Portable Document Format) was developed by Adobe Systems Inc. for displaying and printing documents on different systems and devices while keeping their layout unchanged. It can contain text, images, movies, sounds, forms, etc.
Although the PDF format is documented, developing a reliable parser for PDF documents is not a trivial task. The vast majority of current solutions on the market are based on the open source project Xpdf, with all its pros and cons. According to our customer survey, pdf2text is up to 100 times faster than other PDF text extraction solutions available on the market.

OpenDocument Format family

The OpenDocument Format (ODF) is an open cross-platform file format for office documents (text documents, spreadsheets, drawings, presentations and more), developed at OASIS, an independent, international standards group. Open means that any developer can learn its details and create an application that can read and write this format. ODF is a native file format for OpenOffice.org 2.0+, StarOffice 8+, IBM Workplace, AbiWord, KOffice 1.5+ and many other applications (MS Office can also read and write it).


In addition to being an OASIS standard, it is published as an ISO/IEC international standard, ISO/IEC 26300:2006 Open Document Format for Office Applications (OpenDocument) v1.0.


ODF is being adopted by many governments worldwide as a required file format for publishing and accepting documents.


Our OpenDocument parser is designed to convert OpenDocument documents to text or extract any other necessary data, and can handle the following document extensions:

  • .odt — for word processing (text) documents;
  • .ods — for spreadsheets;
  • .odp — for presentations;
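
To illustrate why an open format is easy to process, here is a minimal sketch of plain text extraction from an OpenDocument file. It is not the API of our parser: it uses only the Python standard library, and the names `odf_kind` and `extract_odf_text` are hypothetical, chosen for this example. It relies on the fact that an ODF file is a ZIP package whose main content is stored in `content.xml`, with paragraphs and headings marked up as `text:p` and `text:h` elements.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the ODF "text" vocabulary (text:p, text:h, text:span, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The three extensions listed above, mapped to a human-readable kind.
ODF_KINDS = {
    ".odt": "text document",
    ".ods": "spreadsheet",
    ".odp": "presentation",
}


def odf_kind(filename):
    """Return the document kind for a file name, or None if unsupported."""
    return ODF_KINDS.get(Path(filename).suffix.lower())


def extract_odf_text(source):
    """Return the paragraph and heading text of an ODF file.

    `source` is a path or a binary file-like object. Illustrative only:
    styles, metadata, and formulas are ignored, and a paragraph nested
    inside another paragraph (e.g. in a text box) is reported twice.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    wanted = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")
    lines = []
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() also collects text inside child elements such as spans.
            text = "".join(elem.itertext()).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)
```

The same function works for `.odt`, `.ods`, and `.odp` files, because spreadsheet cells and presentation text boxes also store their text in `text:p` elements. A production-quality parser has to handle much more (encrypted packages, embedded objects, tab and space elements, document metadata).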