CambridgeDocs Announces Version 2.01 of the PDF-XML Converter

Integration of PDF-XML Converter into xDoc Converter platform allows unlocking PDF content in batch and server modes.
 

BOSTON, MA – March 15, 2006 – CambridgeDocs (www.cambridgedocs.com) today announced the release of Version 2.01 of its xDoc PDF-XML Converter and its integration into the xDoc Converter Desktop and Server products, significantly enhancing an already powerful platform for extracting document content to meaningful XML.

The PDF file format is widely used because it combines content security with high-fidelity document rendering. Its drawback is that the very same mechanisms that protect PDF source content also make it exceptionally difficult to update, index or share with other document systems as anything other than closed, uninterpretable PDF files.

CambridgeDocs’ PDF-XML Converter overcomes these issues by enabling PDF content to be converted to XML. As XML, the previous PDF content can be meaningfully used for indexing by search engines, XML repositories and content management systems -- for example allowing it to be stored as chapters, sections, tables or cells within any repository for fast, easy and accurate re-use.
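To illustrate what storing content as chapters, sections, tables or cells might look like in practice, here is a minimal Python sketch. It assumes a hypothetical XML layout (`document`, `chapter`, `section`, `para`, `table`, `row`, `cell`) invented for this example; it is not the actual output schema of the xDoc PDF-XML Converter, and the function name `index_sections` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical converter output. The element and attribute names are
# assumptions made for this sketch, not the real xDoc output schema.
SAMPLE_XML = """<document>
  <chapter title="Results">
    <section title="Revenue">
      <para>Revenue grew in the first quarter.</para>
      <table>
        <row><cell>Q1</cell><cell>120</cell></row>
        <row><cell>Q2</cell><cell>135</cell></row>
      </table>
    </section>
    <section title="Outlook">
      <para>Growth is expected to continue.</para>
    </section>
  </chapter>
</document>"""


def index_sections(xml_text):
    """Return one record per section, suitable for a search index or repository."""
    root = ET.fromstring(xml_text)
    records = []
    for chapter in root.iter("chapter"):
        for section in chapter.iter("section"):
            records.append({
                "chapter": chapter.get("title"),
                "section": section.get("title"),
                # Join the text of every paragraph in the section.
                "text": " ".join(
                    p.text.strip() for p in section.iter("para") if p.text
                ),
                # Each table becomes a list of rows, each row a list of cell strings.
                "tables": [
                    [[cell.text for cell in row.iter("cell")]
                     for row in table.iter("row")]
                    for table in section.iter("table")
                ],
            })
    return records


if __name__ == "__main__":
    for record in index_sections(SAMPLE_XML):
        print(record)
```

Each record carries its chapter and section titles, so a repository or search engine could store and retrieve content at the section or table level rather than as one opaque file.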

“Integration of the PDF-XML Converter into the Java-based xDoc Server gives us a previously unreachable degree of flexibility in managing our clients’ PDF content,” says Spencer Ewald, President of NXTBook Media. “As we extend our clients’ reach into their customer base by presenting their content with the look and feel of actual magazine and catalog pages, we are now able to provide highlighted search results within the actual images of the original content, combining meaningful access to PDF data as well as the layout the designers originally had in mind.”

The xDoc PDF-XML Converter extracts PDF content to XML and provides best-of-breed functionality for enabling conversion that yields:

• Stylistic XML, including format, layout and content information
• Extraction of financial data
• Organization of related XML “chunks”, such as financial tables
• Compatibility with existing target XML schemas or DTDs, such as DocBook or DITA
• Conversion to HTML/XHTML, with visual information that surpasses even Google’s “view as HTML” functionality
• Conversion to simple text

Version 2.01 adds the PDF-XML Converter as a special module in the xDoc Converter Desktop 2.01 platform and includes sample conversions of PDF documents into a variety of XML formats, such as DocBook and DITA. The release also adds a new and improved user interface, called the TableDef interface, for extracting financial data using positioning and textual clues.

The integration of the PDF-XML Converter into the xDoc Converter enables easy access to its functionality by consolidating download, installation and licensing processes. It also provides access to xDoc’s rich Visual Mapping tool and works with xDoc’s Adobe® Acrobat® plug-in. The PDF-XML Conversion functionality is available for download now at www.cambridgedocs.com/downloads.htm.
 

About CambridgeDocs

CambridgeDocs is a leader in the emerging market for XML-based content integration. This market deals with the integration of legacy content with new XML-based systems (e.g. Content Management, Enterprise Information Portals, EAI, and Web Services) and standards (e.g. DocBook, DITA, HR-XML, RIXML, FpML, NewsML, or any custom XML schema or DTD).

Towards this end, CambridgeDocs provides a technology platform and services for taking existing unstructured and semi-structured internal and external content (e.g. MS Word, HTML, PDF, Quark), and transforming it into "meaningful XML". Once transformed, the content can be made available for delivery through XML-based Web Services, classified and indexed within Enterprise Information Portals, and aggregated, assembled and published in multiple formats, including support for wireless and mobile devices.


# # #

Terri Slater
Slater Public Relations
561-487-7037
tslater@slaterpr.com

Riz Virk
CambridgeDocs
760-602-1400
riz@cambridgedocs.com