CambridgeDocs xDoc Transformation Engine

The xDoc Transformation Engine, or XTE for short, is the core technology underpinning the xDoc Product Family. 

As highlighted in the xDoc architecture, the XTE is a J2SE 1.4.2 application that is called by either:

  1. a .NET graphical user interface, when you use the xDoc Converter Desktop
  2. a Servlet, or a set of Java / .NET APIs, when you use the xDoc Server

As shown in the image to the right, the XTE converts and publishes documents in multiple steps, which gives you both granularity and flexibility in how you transform your content.

Because of this granularity, you can examine the output of any step either visually or programmatically. And because of this flexibility, you can then adjust how the next step operates based on what you find.


Here is a brief discussion of each of the steps the XTE takes in transforming a document:
  1. The XTE uses the appropriate Java Preprocessing Driver to open the binary input file: the Java PDF Driver opens Adobe PDF files, the Java Word Driver opens Microsoft Word files, and so on.  The Java Preprocessing Driver then converts the binary content into preprocess XML, or ppXML for short.  ppXML is a stylistic XML rendering of the document -- that is, ppXML tags reflect stylistic characteristics, like so:

<PARAGRAPH Style="Normal" align="left" emphasis-bold="false" emphasis-italic="false" emphasis-underline="false" font="Times New Roman" font-size="12.0" number="196" widow-control="true">The CambridgeDocs products can be used to convert a single document at a time on the desktop, or to convert a large number of documents on the server. The CambridgeDocs products are increasingly geared towards server usage, which are called from the </PARAGRAPH>

  2. Once the content is in ppXML format, the xDoc Rules Engine reads it and applies a set of rules to it.  These rules can manipulate XML tags, attributes, and text content, and they let you identify that a certain paragraph is the title of the document, another paragraph is a section heading, a third paragraph is an address, etc.  The output of the xDoc Rules Engine is intermediate XML.

  3. This intermediate XML can then be loaded into a Java Post-Processing Driver.  The XSLT Post-Processing Driver can convert the data into another XML or HTML format, the PublishRTF Post-Processing Driver can convert the content into Rich Text Format (RTF) -- which can be opened in Microsoft Word -- and the PublishPDF Post-Processing Driver can convert the intermediate XML into Adobe PDF content.
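
To make the rules step more concrete, here is a minimal Java sketch of the kind of mapping a rule performs: it reads stylistic ppXML and emits semantic intermediate XML. This is not the xDoc Rules Engine or its API. The class name `PpXmlRuleSketch`, the `DOCUMENT` wrapper element, the `Style="Title"` value, and the output element names `document`, `title`, and `para` are all assumptions made for illustration. It uses only the standard JAXP and DOM classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of the xDoc product.
final class PpXmlRuleSketch {

    // Assumed rule: a PARAGRAPH whose Style attribute is "Title" becomes <title>,
    // every other PARAGRAPH becomes <para>. Stylistic attributes are dropped.
    static String applyRules(String ppXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document input = builder.parse(new InputSource(new StringReader(ppXml)));

            // Build the intermediate XML as a fresh document.
            Document output = builder.newDocument();
            Element root = output.createElement("document");
            output.appendChild(root);

            NodeList paragraphs = input.getElementsByTagName("PARAGRAPH");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                String name =
                        "Title".equals(paragraph.getAttribute("Style")) ? "title" : "para";
                Element mapped = output.createElement(name);
                mapped.setTextContent(paragraph.getTextContent());
                root.appendChild(mapped);
            }

            // Serialize the result with an identity transformer.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            serializer.transform(new DOMSource(output), new StreamResult(text));
            return text.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not apply rules to ppXML", ex);
        }
    }

    public static void main(String[] args) {
        String ppXml =
                "<DOCUMENT>"
              + "<PARAGRAPH Style=\"Title\" font-size=\"18.0\">xDoc Overview</PARAGRAPH>"
              + "<PARAGRAPH Style=\"Normal\" font-size=\"12.0\">Body text.</PARAGRAPH>"
              + "</DOCUMENT>";
        System.out.println(applyRules(ppXml));
    }
}
```

The point of the sketch is the shift in vocabulary: the input describes how a paragraph looks, while the output describes what the paragraph is, which is what the post-processing drivers then consume.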

The XTE provides the output from each step, as illustrated in the example screenshot below, so that you can readily understand, debug, and modify the transformation to suit your needs.
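
As a rough illustration of examining one step's output programmatically and adjusting the next step accordingly, the following Java sketch checks intermediate XML for a title and, if none was identified, passes a fallback heading to an XSLT stylesheet that produces HTML. It uses the standard `javax.xml.xpath` and `javax.xml.transform` APIs rather than the XSLT Post-Processing Driver itself. The class name `StepInspectionSketch`, the element names `document`, `title`, and `para`, the stylesheet, and the `fallbackTitle` parameter are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of the xDoc product.
final class StepInspectionSketch {

    // Assumed stylesheet: maps the assumed intermediate XML to simple HTML.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:param name=\"fallbackTitle\" select=\"''\"/>"
          + "<xsl:template match=\"/document\"><html><body>"
          + "<xsl:if test=\"$fallbackTitle != ''\">"
          + "<h1><xsl:value-of select=\"$fallbackTitle\"/></h1></xsl:if>"
          + "<xsl:apply-templates/></body></html></xsl:template>"
          + "<xsl:template match=\"title\"><h1><xsl:value-of select=\".\"/></h1></xsl:template>"
          + "<xsl:template match=\"para\"><p><xsl:value-of select=\".\"/></p></xsl:template>"
          + "</xsl:stylesheet>";

    // Inspect the previous step's output: did the rules identify a title?
    static boolean hasTitle(String intermediateXml) {
        try {
            String result = XPathFactory.newInstance().newXPath().evaluate(
                    "boolean(/document/title)",
                    new InputSource(new StringReader(intermediateXml)));
            return "true".equals(result);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not inspect intermediate XML", ex);
        }
    }

    // Run the next step, adjusting it based on what the inspection found.
    static String toHtml(String intermediateXml, String fallbackTitle) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            if (!hasTitle(intermediateXml)) {
                transformer.setParameter("fallbackTitle", fallbackTitle);
            }
            StringWriter html = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(intermediateXml)),
                    new StreamResult(html));
            return html.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not transform intermediate XML", ex);
        }
    }

    public static void main(String[] args) {
        String noTitle = "<document><para>Body text.</para></document>";
        System.out.println(toHtml(noTitle, "Untitled"));
    }
}
```

Keeping each step's output as plain XML is what makes this kind of check cheap: any standard XML tool can read it before the next step runs.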