Thursday, August 30, 2012

Converting Edifact to XML Using Java - Sample Implementation

Here is a sample Java implementation for converting from Edifact format to XML and vice versa. Conversion from XML to Edifact is straightforward, while the opposite direction is not. The nice thing is that this code works irrespective of how many messages there are at any level.
You can download (for educational purposes only) this implementation here.
For comments, please email me at ddn.job --at-the-rate g-mail.com.
Here is the description of how the attachment is composed and code created:
-------------------------------------------------------------------
Directory: DB
This contains formatted files holding the information necessary for converting from Edifact to XML
and vice versa:
CODELT: Coded data elements
COMPT: Composite data elements
ELEMT: Simple elements
SEGMT: Segment descriptions
These have been generated by the Perl formatting programs in PerlDBCreate. The contents of these files come from
the UN 2001B directory and from the 2001B service directory and code lists. These need to be downloaded
separately from the unece.org and gefeg.com websites as inputs for PerlDBCreate.
---------------------------------------------------------------
Directory: PerlDBCreate
This contains Perl formatting programs adapted specifically for the Y2001B directories.
For a different directory, create new programs in a new directory and let the main Perl program
BuildEdifactDB.pl call them.
When BuildEdifactDB.pl is run, it creates the files in the DB directory, which act as input
for the programs in the Parser directory.
---------------------------------------------------------------------
Directory: Parser
This contains all the source to:
1) Load Edifact information into data structures (src\BuildEdifactDirectory)
2) Format Edifact to XML (src)
A) Parse the input Edifact document
B) Convert into XML
3) Format XML to Edifact (src\XMLtoEdifactConversion)
For formatting Edifact to XML (2), the starting class is Main.java.
It first defines the Edifact message structure in terms of trees/nodes and child/parent links.
Please be very careful here, because debugging will be difficult if any link is invalid or broken.
Then it calls BuildEdifactDirectory to load the Edifact data structures.
Next it reads the input file and parses it with the parser. If unexpected input is seen,
it emits an error.
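As a rough illustration of the parsing step, here is a minimal sketch that splits one Edifact segment into data elements and components. It assumes the default service characters (+ between data elements, : between components, ? as release character) and a segment whose terminator has already been removed. The class and method names (SegmentTokenizer, split, unescape, parseSegment) are hypothetical and are not taken from the attached source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the attached source. */
final class SegmentTokenizer {
    // Assumed default Edifact service characters.
    static final char ELEMENT_SEPARATOR = '+';
    static final char COMPONENT_SEPARATOR = ':';
    static final char RELEASE_CHARACTER = '?';

    /** Splits on the separator; released (escaped) characters are kept as-is for later passes. */
    static List<String> split(String text, char separator) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == RELEASE_CHARACTER && i + 1 < text.length()) {
                // Keep the release character and the character it protects together.
                current.append(c).append(text.charAt(++i));
            } else if (c == separator) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    /** Drops release characters once no further splitting is needed. */
    static String unescape(String component) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < component.length(); i++) {
            char c = component.charAt(i);
            if (c == RELEASE_CHARACTER && i + 1 < component.length()) {
                c = component.charAt(++i);
            }
            out.append(c);
        }
        return out.toString();
    }

    /** First entry holds the segment tag; each later entry holds the components of one data element. */
    static List<List<String>> parseSegment(String segment) {
        List<List<String>> result = new ArrayList<>();
        for (String element : split(segment, ELEMENT_SEPARATOR)) {
            List<String> components = new ArrayList<>();
            for (String component : split(element, COMPONENT_SEPARATOR)) {
                components.add(unescape(component));
            }
            result.add(components);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[UNB], [UNOA, 3], [5400141000009, 14]]
        System.out.println(parseSegment("UNB+UNOA:3+5400141000009:14"));
    }
}
```

A real parser would then match each tag against the expected message tree and report an error on a mismatch; that part is left out here.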
It should be noted that the output XML document looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<segment name='interchange.header'>
<composite name='syntax.identifier'>
<element name='syntax.identifier'>UNOA</element>
<element name='syntax.version.number'>3</element>
</composite>
<composite name='interchange.sender'>
<element name='interchange.sender.identification'>5400141000009</element>
<element name='identification.code.qualifier' uncl="Not Found">14</element>
</composite>
........
If the element is a UNCL/UNSL coded element, the uncl attribute will contain the code list description.
If no description is found, it will contain "Not Found".
Although all odd elements are coded, I'm not sure whether the code list description of
all such odd elements will be found. I tried to ask the UN this question, but they could not
clarify it.
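The lookup behind the uncl attribute can be pictured roughly as follows. This is only a sketch under assumptions: the class name CodeListLookup, its methods, and the in-memory map are hypothetical, and the attached code may load and store the code lists quite differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a code list lookup; not the attached implementation. */
final class CodeListLookup {
    static final String NOT_FOUND = "Not Found";

    // Key is "<element name>/<code value>", value is the code list description.
    private final Map<String, String> descriptions = new HashMap<>();

    void add(String elementName, String code, String description) {
        descriptions.put(elementName + "/" + code, description);
    }

    /** Returns the description, or "Not Found" when the code list has no entry. */
    String describe(String elementName, String code) {
        return descriptions.getOrDefault(elementName + "/" + code, NOT_FOUND);
    }

    /** Builds one coded element; assumes name, description and code are already XML-escaped. */
    String toXml(String elementName, String code) {
        return "<element name='" + elementName + "' uncl=\""
                + describe(elementName, code) + "\">" + code + "</element>";
    }

    public static void main(String[] args) {
        CodeListLookup lookup = new CodeListLookup();
        // No entry registered for this code, so the attribute falls back to "Not Found".
        System.out.println(lookup.toXml("identification.code.qualifier", "14"));
    }
}
```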
Groups always start with a <group> tag and end with a </group> tag.
I've also taken care to convert all the XML reserved characters to their entity equivalents (such as &amp;) so that
no problem will arise.
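The escaping of reserved characters can be sketched as below. The five replacements are the predefined XML entities; the class name XmlEscaper and the method escape are hypothetical and only show the idea, not the attached code.

```java
/** Hypothetical sketch of XML escaping; not the attached implementation. */
final class XmlEscaper {
    /** Replaces the five XML reserved characters with their predefined entities. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints A&amp;B &lt;1&gt;
        System.out.println(escape("A&B <1>"));
    }
}
```

The reverse mapping is needed when going from XML back to Edifact, unless an XML parser is used to read the file, in which case the parser resolves the entities itself.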
-------------------------------------------------------------------------------
Directory: Parser/XML
Contains the XML DTD and XSD of the output XML file generated by the parser, for validation.
-------------------------------------------------------------------------------
Directory: "Edifact e-mail conversations"
These mail exchanges contain some of my Edifact queries and the answers, if any. At the time of leaving I'm clear about
90% of Edifact.
-------------------------------------------------------------------------------
Directory: EDI_Sample
This I received from Abhijit, and I was running my code on this input only. I've created the tree
structure for this input only. However, the input and its document are incorrect and only partial.
The Word doc describing the *.edi input file does not correlate with the input, so a correct file needs to be provided.
----------------------------------------------------------------------------------
Directory: SampleOutput
Contains the output generated from the input *.edi file present in the EDI_Sample directory mentioned above.
Just like the input Edifact file, this file is unformatted. This software will not generate formatted XML, because
formatting may add or remove leading/trailing blank spaces, while Edifact allows only trailing blank spaces
to be removed. This would create problems when converting a formatted XML document back to Edifact.
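To make the whitespace concern concrete, here is a small sketch that removes only trailing blanks and leaves leading blanks alone, in contrast to String.trim(), which strips both ends. The class and method names are hypothetical.

```java
/** Hypothetical sketch; shows trailing-only blank removal versus String.trim(). */
final class BlankHandling {
    /** Removes trailing space characters only; leading blanks are preserved. */
    static String stripTrailingBlanks(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        String value = "  AB  ";
        // prints [  AB]
        System.out.println("[" + stripTrailingBlanks(value) + "]");
        // prints [AB] because trim() also drops the leading blanks
        System.out.println("[" + value.trim() + "]");
    }
}
```

An XML pretty-printer that re-indents text content behaves more like the second case, which is why formatted XML cannot be safely converted back.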
-----------------------------------------------------------------------------------
File: Parser\src\BuildEdifactDirectory\EdifactConstants.java
No constant has been hardcoded anywhere except in EdifactConstants.java. For example, if a non-default segment separator is
used in the Edifact input, it must be changed in this file. All hard-coding must be done only in this file.
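For reference, an Edifact interchange can announce non-default separators itself through the optional UNA service string advice at the very start of the file. The sketch below reads it and falls back to the conventional defaults otherwise. It is an assumption-laden illustration: the class ServiceCharacters and its fields are hypothetical, and nothing here says the attached code reads UNA; as described above, it relies on EdifactConstants.java.

```java
/** Hypothetical sketch of reading the UNA service string advice; not the attached code. */
final class ServiceCharacters {
    final char componentSeparator;
    final char elementSeparator;
    final char decimalMark;
    final char releaseCharacter;
    final char segmentTerminator;

    private ServiceCharacters(char component, char element, char decimal,
                              char release, char terminator) {
        this.componentSeparator = component;
        this.elementSeparator = element;
        this.decimalMark = decimal;
        this.releaseCharacter = release;
        this.segmentTerminator = terminator;
    }

    /** Conventional defaults, as in the service string UNA:+.? ' */
    static ServiceCharacters defaults() {
        return new ServiceCharacters(':', '+', '.', '?', '\'');
    }

    /** Uses the six characters after "UNA" when present, otherwise the defaults. */
    static ServiceCharacters fromInterchange(String interchange) {
        if (interchange.startsWith("UNA") && interchange.length() >= 9) {
            // Position 7 is a reserved character (normally a space) and is skipped.
            return new ServiceCharacters(
                    interchange.charAt(3), interchange.charAt(4), interchange.charAt(5),
                    interchange.charAt(6), interchange.charAt(8));
        }
        return defaults();
    }

    public static void main(String[] args) {
        ServiceCharacters sc = fromInterchange("UNA:+,? |UNB+UNOA:3|");
        // prints |
        System.out.println(sc.segmentTerminator);
    }
}
```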

---------------------------------------------------------------------------------------
Process to Run (only for Edifact to XML)
1) Download the required Edifact directories
2) Create a new directory (say 2002C) in PerlDBCreate and copy all the programs from 2001B into it
3) Change all the programs so that all the elements are correctly extracted into all the files, as
discussed for the DB directory above
4) Create the files as in the DB directory and set their parent directory in the EdifactConstants.DATABASE_PATH
constant
5) Create the correct parse tree describing the Edifact format in Main.java. Note that any sort of tree
can be described, so the Edifact file can contain multiple messages at any level. However, groups
in the Edifact file must be implicit (default) ones, as in the sample input file given. Explicit group
handling still needs to be added.
6) Review the EdifactConstants.java file for Edifact constants
7) Change the input Edifact file path in the EdifactFile class
8) Execute Main.java
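The parse tree that has to be described in Main.java can be pictured with a sketch like the one below. The class StructureNode and its methods are hypothetical and do not mirror the attached classes; the point is only to show parent/child links and a simple consistency check, since a broken link is hard to debug later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a message structure node; not the attached implementation. */
final class StructureNode {
    final String name;
    StructureNode parent;
    final List<StructureNode> children = new ArrayList<>();

    StructureNode(String name) {
        this.name = name;
    }

    /** Adds a child and sets its parent link in one place, so the two cannot drift apart. */
    StructureNode addChild(StructureNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    /** Verifies recursively that every child points back to the node that lists it. */
    boolean linksAreConsistent() {
        for (StructureNode child : children) {
            if (child.parent != this || !child.linksAreConsistent()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // An interchange holding one message with one implicit group (illustrative layout).
        StructureNode group = new StructureNode("group").addChild(new StructureNode("NAD"));
        StructureNode message = new StructureNode("message")
                .addChild(new StructureNode("UNH"))
                .addChild(group)
                .addChild(new StructureNode("UNT"));
        StructureNode interchange = new StructureNode("interchange")
                .addChild(new StructureNode("UNB"))
                .addChild(message)
                .addChild(new StructureNode("UNZ"));
        // prints true
        System.out.println(interchange.linksAreConsistent());
    }
}
```

Running such a check once after building the tree is cheap and catches a wrongly wired node before any input is parsed.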
---------------------------------------------------------------------------------------
Other Issues
1) This software is not optimized for production use. For example, there are some places where I've created a new String where
a StringBuffer should have been used for faster manipulation.
2) The errors MessageParser reports on non-matching input need to be made into useful, meaningful messages.
3) This software does not do any sort of Edifact format verification.
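For the first issue, the usual fix is to append into one buffer instead of creating a new String on every concatenation. The sketch below uses StringBuilder, the unsynchronized counterpart of StringBuffer, which is normally sufficient when only one thread builds the text. The class SegmentWriter and its join method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of buffer-based string building; not the attached code. */
final class SegmentWriter {
    /** Joins data elements with the separator and appends the segment terminator. */
    static String join(List<String> elements, char separator, char terminator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) {
                out.append(separator);
            }
            out.append(elements.get(i));
        }
        return out.append(terminator).toString();
    }

    public static void main(String[] args) {
        // prints UNB+UNOA:3+5400141000009:14'
        System.out.println(join(Arrays.asList("UNB", "UNOA:3", "5400141000009:14"), '+', '\''));
    }
}
```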
---------------------------------------------------------------------------------------

SUGGESTIONS
1) Use XMLSpy/XMLPad etc. for XML editing and manipulation
2) For generating PDF documents, use only the standard XSL-FO and don't use the non-standard iText library.
3) For an XSL-FO formatter, download the free Apache executable