BOSTON, MA –
March 15, 2006 – CambridgeDocs (www.cambridgedocs.com)
today announced the release of Version 2.01 of
its xDoc PDF-XML Converter and integration of it
into its xDoc Converter Desktop and Server
products, significantly enhancing an already
powerful platform for extracting document
content to meaningful XML.
The PDF file format is widely used because it
combines content security with high-fidelity
document rendering. Its drawback is that the
very same mechanisms that protect PDF source
content also make it exceptionally
difficult to update, index or share with other
document systems in any form other than closed,
uninterpretable PDF files.
CambridgeDocs’ PDF-XML Converter overcomes these
issues by enabling PDF content to be converted
to XML. As XML, the former PDF content can be
meaningfully indexed by search engines, XML
repositories and content management systems,
and can, for example, be stored as chapters,
sections, tables or cells within any repository
for fast, easy and accurate re-use.
“Integration of the PDF-XML Converter into the
Java-based xDoc Server gives us a previously
unreachable degree of flexibility in managing
our clients’ PDF content,” says Spencer Ewald,
President of NXTBook Media. “As we extend our
clients’ reach into their customer base by
presenting their content with the look and feel
of actual magazine and catalog pages, we are now
able to provide highlighted search results
within the actual images of the original
content, combining meaningful access to PDF data
as well as the layout the designers originally
had in mind.”
The xDoc PDF-XML Converter extracts PDF content
to XML and provides best-of-breed functionality
for enabling conversion that yields:
• Stylistic XML,
including format, layout and content information
• Extraction of financial data
• Organization of related XML “chunks”, such as
financial tables
• Compatibility with existing target XML schemas
or DTDs, such as DocBook or DITA
• Conversion to HTML/XHTML, with visual
information that surpasses even Google’s “view
as HTML” functionality
• Conversion to simple text
Version 2.01 adds the PDF-XML Converter as a
special module in the xDoc Converter Desktop
2.01 platform and includes sample conversions of
PDF documents into a variety of XML formats,
such as DocBook and DITA. The release also adds
a new and improved user interface, called the
TableDef interface, for extracting financial data
using positional and textual clues.
The integration of the PDF-XML Converter into
the xDoc Converter enables easy access to its
functionality by consolidating download,
installation and licensing processes. It also
provides access to xDoc’s rich Visual Mapping
tool and works with xDoc’s Adobe® Acrobat®
plug-in. The PDF-XML Conversion functionality is
available for download now at
www.cambridgedocs.com/downloads.htm.
About CambridgeDocs
CambridgeDocs is a leader in the
emerging market for XML-based content
integration. This market deals with
the integration of legacy content with
new XML-based systems (e.g. Content
Management, Enterprise Information
Portals, EAI, and Web Services) and
standards (e.g. DocBook, DITA, HRXML,
RIXML, FPML, NewsML, or any custom XML
schema or DTD).
Towards this end, CambridgeDocs provides
a technology platform and services for
taking existing unstructured and
semi-structured internal and external
content (e.g. MS Word, HTML, PDF, Quark,
etc.), and transforming it into
"meaningful XML". Once transformed, the
content can be made available for
delivery through XML-based Web Services,
classified and indexed within Enterprise
Information Portals, and aggregated,
assembled and published in multiple
formats, including support for
wireless and mobile devices.
# # #