Text Lib
Recent News

Aug. 31, 2007
New MS Office 2007 document parsers have been added to the collection.

Nov. 21, 2005
The docs2text 2.0 component has been released. Supported document formats are MS Word, MS Excel, MS PowerPoint, RTF, and Adobe Acrobat PDF.

Our partner

TEXTOLUTION
A full-text indexing and retrieval library with approximate search.

The docs2text component extracts text from a range of document formats: currently MS Word, MS Excel, MS PowerPoint, Adobe PDF, MS Word 2007, MS Excel 2007, and RTF. To fit your needs, its libraries can be delivered either bundled together or as standalone components.

The demo version, available on the Download page, contains all of the libraries currently available. It is provided as an ActiveX component with samples for the most popular development languages and environments.


Use the links below to find more information about the libraries that make up docs2text.

  • pdf2text - extracts plain text from PDF documents;
  • doc2text - extracts text from MS Word documents (this component can also process RTF documents);
  • xls2text - converts MS Excel worksheets to plain text;
  • ppt2text - extracts plain text from MS PowerPoint slides.