CambridgeDocs xDoc HTML to XML Conversions

xDoc uses the Java HTML Driver to read in all of your HTML content, including content auto-generated from database applications, HTML content created from Microsoft Word, content styled with CSS files, content pulled down from internal and external websites, etc.  The Java HTML Driver is the most sophisticated and complete means of processing HTML files available, and is used by xDoc as part of its integrated multi-step process for transforming content.

The Java HTML Driver reads in the HTML content, even malformed HTML, and outputs stylistic XML, which you can then transform into another XML schema or DTD such as DocBook or DITA. This stylistic XML gives you much easier programmatic access to the HTML content, including its formatting characteristics and document structure. Instead of parsing unwieldy HTML directly, you can use standard Java DOM capabilities to get to the HTML content.
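As a rough illustration of what DOM access to stylistic XML can look like, here is a minimal sketch that uses only the JDK's built-in DOM parser. It does not call the Java HTML Driver itself. The element and attribute names (`document`, `paragraph`, `class`, `font-size`) are hypothetical stand-ins, because the actual xDoc output vocabulary is not described here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StylisticXmlReader {

    // Returns "class|font-size|text" for each <paragraph> element.
    // NOTE: <paragraph>, class and font-size are hypothetical names, not the documented xDoc schema.
    public static List<String> describeParagraphs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            NodeList paragraphs = doc.getElementsByTagName("paragraph");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                // getAttribute returns "" when the attribute is absent.
                result.add(p.getAttribute("class") + "|"
                        + p.getAttribute("font-size") + "|"
                        + p.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse stylistic XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<document>"
                + "<paragraph class=\"Heading1\" font-size=\"16pt\">Overview</paragraph>"
                + "<paragraph class=\"Body\" font-size=\"10pt\">xDoc reads HTML.</paragraph>"
                + "</document>";
        for (String line : describeParagraphs(sample)) {
            System.out.println(line);
        }
    }
}
```

Because the formatting is already exposed as attributes, code like this can branch on style information (for example, treating every `Heading1` paragraph as a section title) without re-parsing HTML tags or CSS.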

Java HTML Driver Benefits:
Parse and process HTML files on Windows, Solaris, and Linux machines
Convert HTML files into standard and custom XML schemas and DTDs, such as DITA and DocBook
Pull down HTML content from external and internal websites
Index HTML content, both from intranet sites and from external sites such as those of government regulatory bodies
Republish HTML content in PDF and RTF formats
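One common way to convert stylistic XML into a target schema such as DocBook is an XSLT stylesheet. The sketch below runs one through the JDK's standard `javax.xml.transform` API. Both the input vocabulary (`document`, `paragraph`, `class="Heading1"`) and the mapping rules are hypothetical assumptions chosen for illustration. It is not xDoc's own conversion step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylisticToDocBook {

    // Hypothetical mapping: <document> becomes a DocBook <article>,
    // Heading1 paragraphs become <title>, and all other paragraphs become <para>.
    private static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/document\">"
          + "<article><xsl:apply-templates select=\"paragraph\"/></article>"
          + "</xsl:template>"
          // The predicate gives this template a higher default priority than plain "paragraph".
          + "<xsl:template match=\"paragraph[@class='Heading1']\">"
          + "<title><xsl:value-of select=\"normalize-space(.)\"/></title>"
          + "</xsl:template>"
          + "<xsl:template match=\"paragraph\">"
          + "<para><xsl:value-of select=\"normalize-space(.)\"/></para>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static String toDocBook(String stylisticXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(stylisticXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toDocBook("<document>"
                + "<paragraph class=\"Heading1\">Overview</paragraph>"
                + "<paragraph class=\"Body\">xDoc reads HTML.</paragraph>"
                + "</document>"));
    }
}
```

A DITA target would follow the same pattern with different output templates. Keeping the mapping in XSLT means the style-to-structure rules can be adjusted without recompiling the Java code.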

The list below shows a sample of the items that the xDoc Java HTML Driver lets you identify, parse, and process in your HTML content:

Java HTML Driver Features:
Paragraph Data and its stylistic characteristics, including the original CSS class, style, font, font-size, and weight
Tables of Data, including rows and cells, along with cell characteristics such as alignment, background-color, and border
Images, which are either copied from the file location and given a relative-path href attribute, or are left "as-is" and given an absolute-path href attribute
Ordered and Unordered Lists, including list item labels and indentation
Links
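The two image-handling options in the list above differ only in where the file ends up and what the href points to. The sketch below shows one way to implement that choice with `java.nio.file`. The helper name `resolveHref`, the `images` subfolder, and the use of a `file:` URI for the absolute case are all hypothetical assumptions, not documented xDoc behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ImageHrefResolver {

    // Hypothetical helper: returns the href for an image referenced by the output XML file.
    // copyImage == true : copy the image next to the output (into an "images" folder, an assumption)
    //                     and return a path relative to the XML file's directory.
    // copyImage == false: leave the image as-is and return an absolute file: URI.
    public static String resolveHref(Path outputXml, Path imageFile, boolean copyImage) {
        if (!copyImage) {
            return imageFile.toAbsolutePath().normalize().toUri().toString();
        }
        try {
            Path outputDir = outputXml.toAbsolutePath().getParent();
            Path imagesDir = outputDir.resolve("images");
            Files.createDirectories(imagesDir);
            Path copied = imagesDir.resolve(imageFile.getFileName());
            Files.copy(imageFile, copied, StandardCopyOption.REPLACE_EXISTING);
            // Use '/' separators so the href is valid on Windows as well as Unix.
            return outputDir.relativize(copied).toString().replace('\\', '/');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Relative hrefs keep the output folder portable as a unit, while absolute hrefs avoid duplicating large image files at the cost of tying the XML to the original file locations.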