ExportDocumentToXml

This method exports the structure and the field data of the current workdoc to an XML file or MSXML object in a predefined format.

This method is called by the Runtime Server option Export to XML with cxml extension. For more information, see "Export to XML" in the Brainware Intelligent Capture Runtime Server Help.

Use the named properties XML_ExportCandidates, XML_ExportWords, and XML_ExportWordChars to optionally include field candidates, OCR word data, and the associated character data in the export.

By default, the method exports all fields and table field columns. Use the XmlExportEnabled and ColumnExportEnable properties to exclude specific fields or table field columns from the XML export.

Whenever possible, the XML element and attribute names correspond to the SCBCdrWorkdoc property names.

An ErrorDescription attribute is added to an XML element only if the corresponding Valid attribute is set to false.
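A downstream consumer of the exported file can rely on this rule to collect validation errors. The following Python sketch is not part of the product and runs outside the script environment. It assumes the element and attribute names shown in the sample XML in this topic; the function name invalid_fields and the trimmed inline sample (including the InvoiceDate field) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down export used only for illustration.
SAMPLE = """<Workdoc XML_version="2.0" FileName="sample.wdc">
  <Fields FieldCount="2">
    <Field ID="0" Name="InvoiceNumber" Valid="false" ErrorDescription="Invalid invoice number">
      <Text>7A6F2</Text>
    </Field>
    <Field ID="1" Name="InvoiceDate" Valid="true">
      <Text>01/02/2020</Text>
    </Field>
  </Fields>
</Workdoc>"""


def invalid_fields(root):
    """Return (field name, error description) pairs for fields with Valid="false"."""
    result = []
    for field in root.iter("Field"):
        if field.get("Valid") == "false":
            # ErrorDescription is expected only on elements that are not valid,
            # so fall back to an empty string if it is missing.
            result.append((field.get("Name"), field.get("ErrorDescription", "")))
    return result


root = ET.fromstring(SAMPLE)
for name, error in invalid_fields(root):
    print(f"{name}: {error}")
```

Reading ErrorDescription only after checking Valid mirrors the export rule and avoids treating a missing attribute as an error.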

Note: The Components\Tools directory contains the XML schema file Workdoc.xsd, which you can use to validate the exported XML file.

Syntax

ExportDocumentToXml(ByVal vTarget As Variant)
Parameter   Description
vTarget     Possible values:

  • A string that specifies the file name, including the path. Any existing file is overwritten.

    Note: You can only specify existing directories; the method does not create them.
  • An MSXML 3.0 or MSXML 6.0 object. This is equivalent to saving the XML file and then reparsing it with this object.

Sample Code

The following sample code saves the OCR data, candidates, fields and workdoc structure to an XML file.

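' Include OCR words, their characters, and the field candidates in the export
pWorkdoc.NamedProperty("XML_ExportWords") = True
pWorkdoc.NamedProperty("XML_ExportWordChars") = True
pWorkdoc.NamedProperty("XML_ExportCandidates") = True
' The target folder must already exist; an existing file is overwritten
pWorkdoc.ExportDocumentToXml("C:\ExistingFolder\" & pWorkdoc.Filename & ".xml")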

Sample Code

The following sample code saves the XML data to an MSXML2.DOMDocument60 object instead of a file.

' Note: Add a reference to Microsoft XML, version 6.0, on the script page
Dim xmlDoc60 As MSXML2.DOMDocument60
Set xmlDoc60 = New MSXML2.DOMDocument60
' Fill the DOM object with the workdoc XML instead of writing a file
pWorkdoc.ExportDocumentToXml(xmlDoc60)
' Modify xmlDoc60 here, for example by appending a node
xmlDoc60.documentElement.appendChild(xmlDoc60.createElement("NewNode"))
' ...
xmlDoc60.Save("xmlDoc60.xml")
Set xmlDoc60 = Nothing

XML element definitions

Section      Description
DocClass     Contains document class information, such as class name, parent class, and classification results.
DocFiles     Contains the document file structure, such as name and type (CI or image document).
DocPages     Contains the document page information, such as size and applied rotation.
Words        Contains information about the individual words in the document, such as word text, page number, and position of the word in pixels.
Characters   For a word, contains information about the individual characters that compose it, such as character code and position in pixels.

             Note: For CI documents, the reported position and confidence values are those for the word.

Fields       Contains the workdoc field information, such as name, extracted text, text position, and validity.
Candidates   For a field, contains all candidate information, such as text, weight, and position.
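To illustrate how the Fields and Candidates sections nest, the following Python sketch ranks the candidates of each field by weight. It is an illustration only and runs outside the script environment. It assumes the Field, Candidates, Candidate, and Text element names and the Weight attribute as they appear in the sample XML in this topic; the function name candidates_by_weight and the trimmed inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down export with one field and two candidates.
SAMPLE = """<Workdoc XML_version="2.0" FileName="sample.wdc">
  <Fields FieldCount="1">
    <Field ID="2" Name="InvoiceNumber" Valid="false" Page="0" ErrorDescription="Invalid invoice number">
      <Text>7A6F2</Text>
      <Candidates CandidateCount="2">
        <Candidate ID="1" Weight="0.53850805759" Page="0" Left="2337" Top="219" Width="139" Height="30">
          <Text>23013</Text>
        </Candidate>
        <Candidate ID="0" Weight="1.3563312626" Page="0" Left="1649" Top="219" Width="139" Height="31">
          <Text>7A6F2</Text>
        </Candidate>
      </Candidates>
    </Field>
  </Fields>
</Workdoc>"""


def candidates_by_weight(field):
    """Return (text, weight) pairs for a Field element, highest weight first."""
    pairs = [
        (cand.findtext("Text", default=""), float(cand.get("Weight", "0")))
        for cand in field.findall("Candidates/Candidate")
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


root = ET.fromstring(SAMPLE)
ranking = {
    field.get("Name"): candidates_by_weight(field)
    for field in root.findall("Fields/Field")
}
for name, pairs in ranking.items():
    print(name, pairs)
```

The Candidates section is present only when XML_ExportCandidates is enabled, so a field without it yields an empty list here.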

Sample XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<Workdoc XML_version="2.0" FileName="01English_US01_STP.wdc"> 
    <DocClass DocClassName="Invoices"> 
        <ParentDocClass DocClassName="Invoices"/> 
        <ClsDocClass ID="1" ClsDocClassName="Invoices" Res="4" Confidence="0"/> 
        <ClsDocClass ID="2" ClsDocClassName="Generic" Res="1" Confidence="1"/> 
        ... 
    </DocClass> 
    <DocFiles DocFileCount="1"> 
        <DocFile ID="0" DocFileName="C:\...\01English_US01_STP.tif" DocFileType="CDRDocFileTypeCroImage"/> 
    </DocFiles> 
    <DocPages DocPageCount="3">
        <DocPage PageNr="0" DocIndex="0" DocPageIndex="0" Width="2464" Height="3508" XRes="300" YRes="300" Rotation="0" ImportedFileName="00000478.tif" ImportedFilePageIndex="0"/>
        <DocPage PageNr="1" DocIndex="0" DocPageIndex="1" Width="2464" Height="3508" XRes="300" YRes="300" Rotation="0" ImportedFileName="00000478.tif" ImportedFilePageIndex="1"/>
        <DocPage PageNr="2" DocIndex="0" DocPageIndex="2" Width="2464" Height="3508" XRes="300" YRes="300" Rotation="0" ImportedFileName="00000562.tif" ImportedFilePageIndex="0"/>
    </DocPages>
    <Lines LineCount="54"/> 
    <Words WordCount="359"> 
    ... 
        <Word ID="3" Page="0" Line="2" Left="1496" Top="149" Width="67" Height="22"> 
            <Text>PAGE</Text> 
            <Characters CharCount="4"> 
                <Char ID="0" Code="P" Confidence="100" Left="1496" Top="150" Width="14" Height="19"/> 
                <Char ID="1" Code="A" Confidence="100" Left="1511" Top="150" Width="16" Height="19"/> 
                ... 
            </Characters> 
        </Word> 
    ... 
    </Words> 
    <Fields FieldCount="88"> 
        ... 
        <Field ID="2" Name="InvoiceNumber" Valid="false" Page="0" Left="1649" Top="219" Width="139" Height="31" ErrorDescription="Invalid invoice number"> 
            <Text>7A6F2</Text> 
            <Candidates CandidateCount="61"> 
                <Candidate ID="0" Weight="1.3563312626" Page="0" Left="1649" Top="219" Width="139" Height="31"> 
                    <Text>7A6F2</Text> 
                </Candidate> 
                <Candidate ID="1" Weight="0.53850805759" Page="0" Left="2337" Top="219" Width="139" Height="30"> 
                    <Text>23013</Text> 
                </Candidate> 
                ... 
            </Candidates> 
        </Field> 
        ... 
        <Field ID="15" Name="LineItems" Valid="true" Page="-1" Left="0" Top="0" Width="0" Height="0"> 
            <Table Valid="true"> 
                <Columns ColumnCount="14"> 
                    ... 
                    <Column ID="4" Name="Description"/> 
                    <Column ID="5" Name="Quantity"/> 
                    ... 
                </Columns> 
                <Rows RowCount="5"> 
                    <Row ID="0" Page="0" Valid="true"> 
                        ... 
                        <Cell Column="4" Left="482" Top="1420" Right="1146" Bottom="1552" Valid="true"> 
                            <Text>CDROM EDITION</Text> 
                        </Cell> 
                        <Cell Column="5" Left="1221" Top="1420" Right="1813" Bottom="1552" Valid="true"> 
                            <Text>1</Text> 
                        </Cell> 
                        ... 
                    </Row> 
                    ... 
                </Rows> 
            </Table> 
        </Field> 
    </Fields> 
</Workdoc>
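As an example of consuming the table structure shown above, the following Python sketch rebuilds each table row as a mapping from column name to cell text. It is an illustration only and runs outside the script environment. It assumes that the Column attribute of a Cell refers to the ID of a Column element, which is inferred from the sample and not confirmed by the schema; the function name table_rows and the trimmed inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down export containing one table field with one row.
SAMPLE = """<Workdoc XML_version="2.0" FileName="sample.wdc">
  <Fields FieldCount="1">
    <Field ID="15" Name="LineItems" Valid="true" Page="-1" Left="0" Top="0" Width="0" Height="0">
      <Table Valid="true">
        <Columns ColumnCount="2">
          <Column ID="4" Name="Description"/>
          <Column ID="5" Name="Quantity"/>
        </Columns>
        <Rows RowCount="1">
          <Row ID="0" Page="0" Valid="true">
            <Cell Column="4" Left="482" Top="1420" Right="1146" Bottom="1552" Valid="true">
              <Text>CDROM EDITION</Text>
            </Cell>
            <Cell Column="5" Left="1221" Top="1420" Right="1813" Bottom="1552" Valid="true">
              <Text>1</Text>
            </Cell>
          </Row>
        </Rows>
      </Table>
    </Field>
  </Fields>
</Workdoc>"""


def table_rows(field):
    """Return one dict per Row, keyed by column name, for a table Field element."""
    table = field.find("Table")
    if table is None:
        return []  # Not a table field
    # Assumption: Cell/@Column matches Column/@ID.
    names = {col.get("ID"): col.get("Name") for col in table.findall("Columns/Column")}
    rows = []
    for row in table.findall("Rows/Row"):
        cells = {}
        for cell in row.findall("Cell"):
            column_name = names.get(cell.get("Column"), cell.get("Column"))
            cells[column_name] = cell.findtext("Text", default="")
        rows.append(cells)
    return rows


root = ET.fromstring(SAMPLE)
line_items = root.find("Fields/Field[@Name='LineItems']")
rows = table_rows(line_items)
print(rows)
```

Columns excluded from the export do not appear in the Columns section, so a cell that refers to an unknown column falls back to its raw Column value as the key.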