Text Lib
Recent News

Mar. 28, 2013
In response to numerous requests, an MS Outlook parser has been released.

Oct. 16, 2009
New parsers for the OpenDocument family of documents are now available.

Aug. 31, 2007
New parsers for MS Office 2007 documents have been added to the collection.

Nov. 21, 2005
The docs2text 2.0 component has been released. Supported document formats are MS Word, MS Excel, MS PowerPoint, rtf, and Adobe Acrobat PDF.

Our partner

TEXTOLUTION
Full Text Indexing and Retrieval library with Approximate Search.

The docs2text component extracts text from a variety of document formats. It currently supports MS Word, MS Excel, MS PowerPoint, Adobe PDF, MS Word 2007, MS Excel 2007, MS PowerPoint 2007, MS Outlook, and rtf documents. To fit your needs, the libraries can be delivered as a bundle or as standalone components.

The demo version, available on the Download page, contains all of the libraries available at this time. It is provided as an ActiveX component with samples for the most popular development languages and environments.

The library (ActiveX) can be used with any .NET or x64 application.

Use the links below to learn more about the libraries that make up docs2text.

  • pdf2text - extracts plain text from PDF documents;
  • odf2text - extracts text from OpenDocument format documents (.odt, .ods, .odp);
  • doc2text - extracts text from MS Word documents (this component can also process rtf documents);
  • xls2text - converts MS Excel worksheets to plain text;
  • ppt2text - extracts plain text from MS PowerPoint slides;
  • pst2text - extracts messages and other content from MS Outlook storage files.
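
The component list above amounts to a mapping from file type to library. As a rough illustration only, the Python sketch below picks a component name from a file extension. It does not call docs2text itself. The function name `component_for` and the table are hypothetical, and only the .odt, .ods and .odp extensions are stated on this page; the remaining extensions are assumptions inferred from the component names.

```python
from pathlib import Path

# Extension-to-component table built from the list above.
# Only .odt, .ods and .odp are stated on the page; every other
# extension here is an assumption inferred from the component names.
COMPONENT_BY_EXTENSION = {
    ".pdf": "pdf2text",
    ".odt": "odf2text",
    ".ods": "odf2text",
    ".odp": "odf2text",
    ".doc": "doc2text",
    ".rtf": "doc2text",
    ".xls": "xls2text",
    ".ppt": "ppt2text",
    ".pst": "pst2text",
}


def component_for(filename):
    """Return the name of the component expected to handle filename.

    The lookup is case-insensitive on the extension. None is returned
    when the extension is not in the table.
    """
    return COMPONENT_BY_EXTENSION.get(Path(filename).suffix.lower())
```

MS Office 2007 extensions are deliberately left out of the table, because the page does not say which component handles them.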