Document to XML Converter

 

Home ANSI X12 EDIFACT SAP IDOC Download XML Links

Welcome to Suli Ding's Web site!

 

Electronic commerce is now used by many companies. At its heart is the interchange of business documents between computers. For computers to recognize and process these documents correctly, the documents follow standards such as ANSI X12, UN/EDIFACT, or various proprietary formats.

A fast way to move these e-commerce documents to XML is to build a converter utility that converts existing standard documents into XML. A converter is not a long-term solution, but it saves a company time and money in moving to XML while the company gains knowledge and experience with XML.

I developed a small program, "doc2xml", on my own. With its help, EDI (ANSI X12, EDIFACT), SAP, NACHA, and similar documents can be converted to valid XML documents.

"doc2xml" is a command-line utility available on Windows and Unix platforms. It takes an input template file and an input data file and produces an XML document. The template file defines the layout of the input data records/segments, how and where to extract the information for each field from the input data file, and which XML tag name to use for the extracted information. The program recognizes two template file formats: one is written as XML tags and the other uses a keyword format. The output is written to a file or, optionally, to a socket connection. By default, the output file name is the input file name with ".XML" appended; the user can specify a different output file name.

 

The syntax to execute program "doc2xml" is:

doc2xml -ttemplate|-xXML_template [-e] [-oOutput file name] input_file

"-t" specifies the record template file name to use
"-x" specifies the XML template file name to use
        (either "-t" or "-x" must be used to specify a template file)
"-e" is optional. If it is specified, empty elements are not produced
"-o" is optional. If it is not specified, the output file name is constructed by appending ".XML" to the input data file name

 

In the template file, a header defines a group of segments/records. The header itself defines the record/segment identifier (used to locate the record/segment), the record separator (segment terminator), the field/element separator, and the sub-field/sub-element separator. Other attributes for the header tag may also be defined within the header.

A set of records/segments may be defined using multiple RID entries. Each RID identifies a specific record/segment, and the fields/elements of that record/segment are defined with it. Fields/elements within a record/segment can also be grouped.

HEADER=ISA, SEP=p4, SUB=':', TER=p106, Date=f9,Time=f10,TEST=f15,ACK=f14, Version=f12
RID=ISA, Sender=[ISA05=f5,ISA06=f6], Receiver=[ISA07=f7,ISA08=f8], ISA04=f4, ISA13=f13
TRAILER=IEA

The above is an example record format template. The header specifies that the ANSI X12 segment identifier is "ISA", that the element separator is the character in position 4 and the segment terminator is the character in position 106, and that the attributes Date, Time, TEST, ACK, and Version are extracted from fields 9, 10, 15, 14, and 12, respectively.

The RID points to the same segment (identifier "ISA"). Groups are used for "Sender" and "Receiver".
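To make the keyword format more concrete, here is a minimal Python sketch that reads one template line into (tag, spec) pairs, with a bracketed group becoming a nested list. It is not the parser doc2xml uses; the function names are invented for this example, and quoted values that contain commas are not handled.

```python
def split_top_level(text):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            if current.strip():
                parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_items(text):
    """Return a list of (tag, spec); a group's spec is a nested list."""
    items = []
    for part in split_top_level(text):
        tag, spec = part.split("=", 1)
        spec = spec.strip()
        if spec.startswith("[") and spec.endswith("]"):
            spec = parse_items(spec[1:-1])   # fields grouped under this tag
        items.append((tag.strip(), spec))
    return items


RID_LINE = ("RID=ISA, Sender=[ISA05=f5,ISA06=f6], "
            "Receiver=[ISA07=f7,ISA08=f8], ISA04=f4, ISA13=f13")

for tag, spec in parse_items(RID_LINE):
    print(tag, spec)
```

The first pair carries the record identifier (RID, ISA); the remaining pairs describe the elements to generate.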

The following document, processed with the above template file,

ISA*00*850 *01*PASSWORD *12*TESTUSER001 *08*TESTUSER002 *YYMMDD*HHMM*U*00304*SSNNRRFF0*0*P*:
...
IEA*1*SSNNRRFF0

will generate XML output like

<ISA Date="YYMMDD" Time="HHMM" TEST="P" ACK="0" Version="00304">
    <Sender>
        <ISA05>12</ISA05>
        <ISA06>TESTUSER001</ISA06>
    </Sender>
    <Receiver>
        <ISA07>08</ISA07>
        <ISA08>TESTUSER002</ISA08>
    </Receiver>
    <ISA04>PASSWORD</ISA04>
    <ISA13>SSNNRRFF0</ISA13>
</ISA>
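The mapping from the ISA segment to this output can be sketched in a few lines of Python. This is only an illustration of the idea, not doc2xml's implementation. It assumes that positions such as p4 are counted from 1, that the segment identifier is field 0, and that padding spaces are trimmed from each field (inferred from the output above). The layout is written out by hand rather than read from a template file.

```python
import xml.etree.ElementTree as ET

# The ISA segment from the example, with the padding as shown on this page.
SEGMENT = ("ISA*00*850 *01*PASSWORD *12*TESTUSER001 *08*TESTUSER002 "
           "*YYMMDD*HHMM*U*00304*SSNNRRFF0*0*P*:")

# Attribute name -> field number, as in the HEADER line.
HEADER_ATTRS = {"Date": 9, "Time": 10, "TEST": 15, "ACK": 14, "Version": 12}

# Tag -> field number, or tag -> list of (tag, field number) for a group.
LAYOUT = [
    ("Sender", [("ISA05", 5), ("ISA06", 6)]),
    ("Receiver", [("ISA07", 7), ("ISA08", 8)]),
    ("ISA04", 4),
    ("ISA13", 13),
]


def convert(segment):
    sep = segment[4 - 1]                       # SEP=p4: character in position 4
    fields = [f.strip() for f in segment.split(sep)]
    root = ET.Element(fields[0],
                      {name: fields[n] for name, n in HEADER_ATTRS.items()})
    for tag, spec in LAYOUT:
        node = ET.SubElement(root, tag)
        if isinstance(spec, list):             # a group of fields
            for sub_tag, n in spec:
                ET.SubElement(node, sub_tag).text = fields[n]
        else:
            node.text = fields[spec]
    return root


print(ET.tostring(convert(SEGMENT), encoding="unicode"))
```

The segment terminator (TER=p106) is not exercised here because the sample line on this page does not show the full fixed-width padding.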

The following XML template file will produce the same XML output as above

<TEMPLATE>
<HEADER MATCH="ISA" TLR="IEA" SEP="p4" SUB="':'" TER="p106" Date="f9" Time="f10" TEST="f15" ACK="f14" Version="f12">
    <RID match="ISA">
        <Sender group="1">
            <ISA05>f5</ISA05>
            <ISA06>f6</ISA06>
        </Sender>
        <Receiver group="1">
            <ISA07>f7</ISA07>
            <ISA08>f8</ISA08>
        </Receiver>
        <ISA04>f4</ISA04>
        <ISA13>f13</ISA13>
    </RID>
</HEADER>
</TEMPLATE>
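Because the XML template is itself an XML document, a converter can walk it directly. The Python sketch below shows one way this might work, under the same assumptions as before (1-based positions, identifier as field 0, trimmed fields). It treats MATCH, TLR, SEP, SUB, and TER as control attributes and every other HEADER attribute as an output attribute; that split is my reading of the example, not a documented rule.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<TEMPLATE>
<HEADER MATCH="ISA" TLR="IEA" SEP="p4" SUB="':'" TER="p106" Date="f9" Time="f10" TEST="f15" ACK="f14" Version="f12">
    <RID match="ISA">
        <Sender group="1">
            <ISA05>f5</ISA05>
            <ISA06>f6</ISA06>
        </Sender>
        <Receiver group="1">
            <ISA07>f7</ISA07>
            <ISA08>f8</ISA08>
        </Receiver>
        <ISA04>f4</ISA04>
        <ISA13>f13</ISA13>
    </RID>
</HEADER>
</TEMPLATE>"""

SEGMENT = ("ISA*00*850 *01*PASSWORD *12*TESTUSER001 *08*TESTUSER002 "
           "*YYMMDD*HHMM*U*00304*SSNNRRFF0*0*P*:")

CONTROL = {"MATCH", "TLR", "SEP", "SUB", "TER"}   # assumed control attributes


def fill(template_node, out_node, fields):
    """Copy the template's children, replacing each fN leaf with field N."""
    for child in template_node:
        node = ET.SubElement(out_node, child.tag)
        if child.get("group") == "1":
            fill(child, node, fields)
        else:
            node.text = fields[int(child.text[1:])]


def convert(template_xml, segment):
    header = ET.fromstring(template_xml).find("HEADER")
    sep = segment[int(header.get("SEP")[1:]) - 1]   # "p4" -> position 4
    fields = [f.strip() for f in segment.split(sep)]
    root = ET.Element(header.get("MATCH"))
    for name, spec in header.attrib.items():
        if name not in CONTROL:
            root.set(name, fields[int(spec[1:])])
    fill(header.find("RID"), root, fields)
    return root


print(ET.tostring(convert(TEMPLATE, SEGMENT), encoding="unicode"))
```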

 

The program can convert more than just EDI documents to XML. You can see another example by clicking the SAP IDOC tab on the left side or at the bottom of this page. Here I would like to show one more example, using "doc2xml" to convert a semicolon-delimited file to XML. (The delimiter is not limited to a semicolon; it can be any character you specify in the template file for the HEADER record.) The data in this example is the data used by DataChannel for their XML Generator.

Data to be converted to XML

1;Juergen;Modre;Reisdorf;6;A-9371;Brueckl;Kaernten;Austria;jmodre@edu.uni-klu.ac.at
2;Norbert;Mikula;108th NE Suite 400;155;98004;Bellevue;Washington;U.S.A.;norbert@datachannel.com

The template file contains the following records

<HDTXT=<?xml version="1.0"?>,
  <MyEmployeeDB>,
  <Title>Employees in the employeeDB as of 10 Feb. 1998</Title>,
  <Hint>This is an easy XML data exchange example</Hint>
TLTXT=</MyEmployeeDB>
#
Header=<Employee>, Name=f2, Country=f8, sep=';', sub=':', term=13
RID=, SSN=f0, Name=[First=f1, Last=f2], E-mail=f9, ,
   Address=[Mail_stop="MNB3C Building #3",Street=f3,Streetnr=f4,,
   City=f6,State=f7,Zip=f5, Country=f8]
trailer=

A comma at the end of a line is used as a continuation mark.
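One way to read the continuation rule is sketched below in Python: a line that ends with a comma has that one comma removed and the next line appended. This also accounts for the lines in the template that end with two commas (one separates fields, one marks continuation). The function name is invented, and joining the pieces without inserting extra whitespace is an assumption.

```python
def join_continuations(lines):
    """Merge physical lines into logical template records."""
    records, pending = [], ""
    for raw in lines:
        line = raw.strip()
        if line.endswith(","):
            pending += line[:-1]      # drop the continuation comma only
        else:
            records.append(pending + line)
            pending = ""
    if pending:                       # file ended on a continued line
        records.append(pending)
    return records


LINES = [
    'RID=, SSN=f0, Name=[First=f1, Last=f2], E-mail=f9, ,',
    '   Address=[Mail_stop="MNB3C Building #3",Street=f3,Streetnr=f4,,',
    '   City=f6,State=f7,Zip=f5, Country=f8]',
    'trailer=',
]

for record in join_continuations(LINES):
    print(record)
```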

The last name and the country also appear in the attribute list for Employee. The first name and the last name are grouped together under the tag <Name>. The element Mail_stop is added to the address group for every record processed, to show that constant elements can be added.

The output XML document will look like

<?xml version="1.0"?>
<MyEmployeeDB>
<Title>Employees in the employeeDB as of 10 Feb. 1998</Title>
<Hint>This is an easy XML data exchange example</Hint>
<Employee Name="Modre" Country="Austria">
  <SSN>1</SSN>
  <Name>
    <First>Juergen</First>
    <Last>Modre</Last>
  </Name>
  <E-mail>jmodre@edu.uni-klu.ac.at</E-mail>
  <Address>
    <Mail_stop>MNB3C Building #3</Mail_stop>
    <Street>Reisdorf</Street>
    <Streetnr>6</Streetnr>
    <City>Brueckl</City>
    <State>Kaernten</State>
    <Zip>A-9371</Zip>
    <Country>Austria</Country>
  </Address>
</Employee>
<Employee Name="Mikula" Country="U.S.A.">
  <SSN>2</SSN>
  <Name>
    <First>Norbert</First>
    <Last>Mikula</Last>
  </Name>
  <E-mail>norbert@datachannel.com</E-mail>
  <Address>
    <Mail_stop>MNB3C Building #3</Mail_stop>
    <Street>108th NE Suite 400</Street>
    <Streetnr>155</Streetnr>
    <City>Bellevue</City>
    <State>Washington</State>
    <Zip>98004</Zip>
    <Country>U.S.A.</Country>
  </Address>
</Employee>
</MyEmployeeDB>
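The semicolon-delimited example can be sketched in Python along the same lines. Again this is only an illustration under assumptions: fields are numbered from 0, a string in the layout stands for a constant value, and records are taken one per line. The header and trailer text are built directly as elements instead of being copied from HDTXT and TLTXT.

```python
import xml.etree.ElementTree as ET

DATA = """1;Juergen;Modre;Reisdorf;6;A-9371;Brueckl;Kaernten;Austria;jmodre@edu.uni-klu.ac.at
2;Norbert;Mikula;108th NE Suite 400;155;98004;Bellevue;Washington;U.S.A.;norbert@datachannel.com"""

# An int is a field number; a str is a constant; a list is a group.
LAYOUT = [
    ("SSN", 0),
    ("Name", [("First", 1), ("Last", 2)]),
    ("E-mail", 9),
    ("Address", [("Mail_stop", "MNB3C Building #3"), ("Street", 3),
                 ("Streetnr", 4), ("City", 6), ("State", 7),
                 ("Zip", 5), ("Country", 8)]),
]


def add(parent, layout, fields):
    for tag, spec in layout:
        node = ET.SubElement(parent, tag)
        if isinstance(spec, list):
            add(node, spec, fields)
        else:
            node.text = fields[spec] if isinstance(spec, int) else spec


def convert(data, sep=";"):
    root = ET.Element("MyEmployeeDB")
    ET.SubElement(root, "Title").text = "Employees in the employeeDB as of 10 Feb. 1998"
    ET.SubElement(root, "Hint").text = "This is an easy XML data exchange example"
    for line in data.splitlines():
        fields = line.split(sep)
        # Header attributes: Name=f2, Country=f8
        employee = ET.SubElement(root, "Employee",
                                 {"Name": fields[2], "Country": fields[8]})
        add(employee, LAYOUT, fields)
    return root


print(ET.tostring(convert(DATA), encoding="unicode"))
```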

An equivalent XML template file looks like

<?xml version="1.0"?>
<TEMPLATE>
<HDTXT>
<!--<?xml version="1.0"?>
  <MyEmployeeDB>
  <Title>Employees in the employeeDB as of 10 Feb. 1998</Title>
  <Hint>This is an easy XML data exchange example</Hint>
-->
</HDTXT>
<TLTXT>
<!--</MyEmployeeDB>
-->
</TLTXT>
<!--
  #
  # semi-colon delimited documents
  #
-->
<HEADER MATCH="" ALTTAG="Employee" SEP="';'" SUB="':'" TER="13" Name="f2" Country="f8">
    <RID match="">
        <SSN>f0</SSN>
        <Name group="1">
            <First>f1</First>
            <Last>f2</Last>
        </Name>
        <E-mail>f9</E-mail>
        <Address group="1">
            <Mail_stop>MNB3C Building #3</Mail_stop>
            <Street>f3</Street>
            <Streetnr>f4</Streetnr>
            <City>f6</City>
            <State>f7</State>
            <Zip>f5</Zip>
            <Country>f8</Country>
        </Address>
    </RID>
</HEADER>
</TEMPLATE>

 

There are four ways to specify where and how to get the separators and terminator within the header

  1. hard coded within the header
  2. extracted from the input record/segment by position
  3. extracted from the input record/segment by keyword and position
  4. extracted from the input record/segment by value
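The examples on this page use three notations for separators and terminators: a quoted character such as ';', a position such as p4, and a bare number such as 13. A small Python sketch of resolving them follows. Reading the bare number as a decimal character code (13 being carriage return) is my assumption, as is counting positions from 1. The keyword-and-position and by-value forms are not shown on this page, so they are left out.

```python
def resolve_separator(spec, record):
    """Resolve a SEP/SUB/TER value against the identified record.

    "';'"  quoted, hard-coded character
    "p4"   character at 1-based position 4 of the record
    "13"   decimal character code (assumed)
    """
    if len(spec) >= 3 and spec[0] == "'" and spec[-1] == "'":
        return spec[1:-1]
    if spec[:1] in ("p", "P") and spec[1:].isdigit():
        return record[int(spec[1:]) - 1]
    if spec.isdigit():
        return chr(int(spec))
    raise ValueError("unsupported separator spec: " + repr(spec))


isa = "ISA*00*850 *01*PASSWORD *12*TESTUSER001"
print(repr(resolve_separator("p4", isa)))
print(repr(resolve_separator("';'", isa)))
print(repr(resolve_separator("13", isa)))
```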

There are five ways to specify where and how to get the data from the identified record/segment

  1. hard-code the value
  2. extract from input record/segment by field (element)
  3. extract from input record/segment by sub-field (sub-element)
  4. extract from input record/segment by keyword
  5. extract from input record/segment by position for length of characters
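Some of these extraction methods can be written as plain Python functions, shown below as a sketch. Only the fN notation and constant values appear in the templates on this page, so the sub-field and position/length versions are expressed as ordinary function arguments rather than template syntax. Extraction by keyword is omitted because its meaning is not described here. The sample record is made up for the illustration.

```python
def by_field(record, n, sep):
    """Field n of the record (field 0 is the identifier or first field)."""
    return record.split(sep)[n]


def by_sub_field(record, n, m, sep, sub):
    """Sub-field m of field n; counting m from 0 is an assumption."""
    return record.split(sep)[n].split(sub)[m]


def by_position(record, start, length):
    """length characters starting at 1-based position start."""
    return record[start - 1:start - 1 + length]


# Hypothetical record: '*' separates fields, ':' separates sub-fields.
record = "PO1*1*10:EA*9.95"

print(by_field(record, 2, "*"))
print(by_sub_field(record, 2, 1, "*", ":"))
print(by_position(record, 1, 3))
```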

 

 

This page was last updated on 1/5/1999.

ANSI X12 EDIFACT SAP IDOC Download XML Links