CambridgeDocs xDoc HTML to XML Conversions |
xDoc uses the Java HTML Driver to read in all of
your HTML content, including content auto-generated from database
applications, HTML content created from Microsoft Word, content styled
with CSS files, and content pulled down from internal and external
websites. The Java HTML Driver is the most sophisticated
and complete means of processing HTML files available, and
xDoc uses it as one step of its integrated
multi-step process for transforming content.
The Java HTML Driver reads in the HTML content -- even
malformed HTML -- and outputs stylistic XML, which you can then use
to transform the content into another XML schema or DTD, such as
DocBook or DITA, as shown below. This stylistic
XML gives you vastly easier programmatic access to the HTML content,
including its formatting characteristics and document structure.
No longer do you have to parse unwieldy HTML directly -- instead, you can use
much more sophisticated Java DOM capabilities to get to the HTML
content.
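
To make the idea of DOM access concrete, here is a minimal sketch using only the standard Java XML APIs (javax.xml.parsers and org.w3c.dom). It does not call xDoc itself. The sample XML, the element name para, the attribute names font-size and font-weight, and the class and method names are all hypothetical stand-ins chosen for illustration; the stylistic XML that the Java HTML Driver actually emits may look different.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StylisticXmlDemo {

    // Hypothetical stand-in for stylistic XML; the real element and
    // attribute names produced by the driver are an assumption here.
    public static final String SAMPLE =
        "<document>"
      + "<para font-size=\"18\" font-weight=\"bold\">Installation</para>"
      + "<para font-size=\"10\">Unpack the archive.</para>"
      + "<para font-size=\"18\" font-weight=\"bold\">Configuration</para>"
      + "</document>";

    // Returns the text of every <para> that is bold and at least minSize
    // points, a simple way to guess headings from formatting alone.
    public static List<String> findHeadings(String xml, int minSize) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList paras = doc.getElementsByTagName("para");
        List<String> headings = new ArrayList<>();
        for (int i = 0; i < paras.getLength(); i++) {
            Element para = (Element) paras.item(i);
            // getAttribute returns an empty string when the attribute is absent.
            String sizeAttr = para.getAttribute("font-size");
            int size = sizeAttr.isEmpty() ? 0 : Integer.parseInt(sizeAttr);
            boolean bold = "bold".equals(para.getAttribute("font-weight"));
            if (bold && size >= minSize) {
                headings.add(para.getTextContent());
            }
        }
        return headings;
    }

    public static void main(String[] args) throws Exception {
        // Print each paragraph that looks like a heading.
        for (String heading : findHeadings(SAMPLE, 14)) {
            System.out.println(heading);
        }
    }
}
```

Because formatting is carried as ordinary attributes in this sketch, a structural decision such as "large bold paragraphs are section titles" becomes a plain DOM query, which is the kind of step that would precede a mapping to DocBook or DITA.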
|
Java HTML Driver Benefits: |
 |
 |
 |
 |
 |

The list below gives a sample of the items that the
xDoc Java HTML Driver can identify, parse, and process
in your HTML content:
|
Java HTML Driver Features: |
 |
 |
 |
 |
 |
|