CambridgeDocs xDoc Transformation Engine
The xDoc Transformation Engine, or XTE for short, is the core
technology underpinning the xDoc Product Family.
As highlighted in the xDoc architecture, the XTE is a J2SE 1.4.2
application that is called by either:
- a .NET graphical user interface, when you use the xDoc Converter
Desktop
- a Servlet, or a set of Java / .NET APIs, when you use the xDoc
Server
As shown in the image to the right,
the XTE converts and publishes documents in
multiple steps, which gives you fine-grained control over
how you transform your content.
Because each step produces its own output,
you can examine that output either visually or
programmatically, and then adjust how the next step operates
accordingly.
Here is a brief discussion of each
of the steps the XTE takes in transforming a document:
- The XTE uses the appropriate Java Preprocessing Driver to
open the binary input file: the Java PDF Driver opens
Adobe PDF files, the Java Word Driver opens
Microsoft Word files, etc. The Java Preprocessing Driver
then converts the binary content into preprocess XML, or
ppXML for short. ppXML is a stylistic XML rendering of the
document -- that is, ppXML tags reflect stylistic
characteristics, like so:
<PARAGRAPH Style="Normal"
align="left" emphasis-bold="false" emphasis-italic="false"
emphasis-underline="false" font="Times New Roman"
font-size="12.0" number="196" widow-control="true">The
CambridgeDocs products can be used to convert a single document
at a time on the desktop, or to convert a large number of
documents on the server. The CambridgeDocs products are
increasingly geared towards server usage, which are called from
the </PARAGRAPH>
- Once the content is in ppXML format, the xDoc Rules Engine
reads it and applies a set of rules to it.
These rules can manipulate XML tags, attributes, and text
content, and let you identify, for example,
that a certain paragraph is the title of the
document, another paragraph is a section heading, a third
paragraph is an address, etc. The output of the xDoc Rules
Engine is intermediate XML.
- This intermediate XML can then be loaded into a Java
Post-Processing Driver. For example, the XSLT Post-Processing
Driver can convert the data into another XML or HTML format; the
PublishRTF Post-Processing Driver can convert the content into
Rich Text Format (RTF) -- which can be opened in Microsoft Word;
and the PublishPDF Post-Processing Driver can convert the
intermediate XML into Adobe PDF content.
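
The last two steps above can be approximated with the standard Java XML
APIs (JAXP). The sketch below is not the xDoc API: it only imitates the
flow from ppXML to intermediate XML to published output. The DOCUMENT
wrapper element, the "Title" and "Heading" style names, the
title / section-heading / para tags, and the stylesheet are all
assumptions made for illustration, and the binary preprocessing step is
skipped by starting from a ppXML-like string. It also uses DOM Level 3
calls, so it needs a newer Java runtime than J2SE 1.4.2.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XtePipelineSketch {

    // Hypothetical stylesheet standing in for an XSLT post-processing step.
    static final String TO_HTML =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='document'><body><xsl:apply-templates/></body></xsl:template>"
        + "<xsl:template match='title'><h1><xsl:value-of select='.'/></h1></xsl:template>"
        + "<xsl:template match='section-heading'><h2><xsl:value-of select='.'/></h2></xsl:template>"
        + "<xsl:template match='para'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Reads a ppXML-like string into a DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Rules stand-in: map each stylistic PARAGRAPH to a semantic tag
    // chosen from its Style attribute (assumed style names).
    static Document applyRules(Document ppXml) throws Exception {
        Document out = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = out.createElement("document");
        out.appendChild(root);
        NodeList paragraphs = ppXml.getElementsByTagName("PARAGRAPH");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            String style = p.getAttribute("Style");
            String tag = "Title".equals(style) ? "title"
                       : style.startsWith("Heading") ? "section-heading"
                       : "para";
            Element e = out.createElement(tag);
            e.setTextContent(p.getTextContent().trim());
            root.appendChild(e);
        }
        return out;
    }

    // Post-processing stand-in: apply the stylesheet to the intermediate XML.
    static String publish(Document intermediate) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TO_HTML)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(intermediate), new StreamResult(out));
        return out.toString();
    }

    // Chains the stages; checked exceptions are wrapped for brevity.
    static String run(String ppXml) {
        try {
            return publish(applyRules(parse(ppXml)));
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String ppXml =
              "<DOCUMENT>"
            + "<PARAGRAPH Style='Title'>xDoc Overview</PARAGRAPH>"
            + "<PARAGRAPH Style='Heading 1'>Server usage</PARAGRAPH>"
            + "<PARAGRAPH Style='Normal' font='Times New Roman'>Body text.</PARAGRAPH>"
            + "</DOCUMENT>";
        System.out.println(run(ppXml));
    }
}
```

Keeping each stage as its own method mirrors the multi-step design
described above: the result of applyRules can be inspected or replaced
before publish is called.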
The XTE provides the output from
each step, as illustrated in the example screenshot below, so that
you can readily understand, debug, and modify the
transformation to suit your needs.
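
As an illustration of examining one step's output programmatically, the
sketch below reads a ppXML-like string and reports the largest font-size
value it finds, which a later rules step might use to spot title
candidates. It uses only the standard Java DOM parser, not any xDoc API.
The PpXmlInspector class, the DOCUMENT wrapper element, and the idea of
using font size as a title hint are assumptions for illustration; only
the PARAGRAPH tag and its font-size attribute come from the ppXML sample
above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PpXmlInspector {

    // Returns the largest font-size found on any PARAGRAPH, or 0.0 if none.
    static double largestFontSize(String ppXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ppXml)));
            NodeList paragraphs = doc.getElementsByTagName("PARAGRAPH");
            double largest = 0.0;
            for (int i = 0; i < paragraphs.getLength(); i++) {
                // getAttribute returns "" when the attribute is absent.
                String size = ((Element) paragraphs.item(i)).getAttribute("font-size");
                if (!size.isEmpty()) {
                    largest = Math.max(largest, Double.parseDouble(size));
                }
            }
            return largest;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read ppXML", e);
        }
    }

    public static void main(String[] args) {
        String ppXml =
              "<DOCUMENT>"
            + "<PARAGRAPH Style='Normal' font-size='18.0'>xDoc Overview</PARAGRAPH>"
            + "<PARAGRAPH Style='Normal' font-size='12.0'>Body text.</PARAGRAPH>"
            + "</DOCUMENT>";
        double threshold = largestFontSize(ppXml);
        // A later rules step could treat paragraphs at this size as title candidates.
        System.out.println("Title candidates use font-size >= " + threshold);
    }
}
```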