Splitting Large Import Files

If a failure occurs while importing a syndication file, the whole file must be re-imported to ensure that no data is lost. Exported syndication files can be very large, in some cases large enough that importing them takes many hours. It is therefore a good idea to split large syndication files into smaller units before importing them. A failure during import is then much less costly: only the unit(s) that failed need to be re-imported.

A tool for splitting large syndication files, called XMLSplit, is included with the Content Engine. To split a file, enter:

$ java -cp classpath com.escenic.syndication.xml.util.XMLSplit filename number-of-elements

where:

  • classpath is engine-root/lib/xom-1.1.jar:engine-root/lib/engine-syndication-5.1-1.jar.

  • filename is the name of the syndication file you want to split.

  • number-of-elements is the number of second-level elements you want each output file to contain.

For example:

$ java -cp engine-root/lib/xom-1.1.jar:engine-root/lib/engine-syndication-5.1-1.jar \
> com.escenic.syndication.xml.util.XMLSplit import.xml 100

If import.xml contains 950 second-level elements, then XMLSplit will output 10 files: 9 containing 100 second-level elements each, and a 10th containing the remaining 50.
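
The file count in this example is the element count divided by number-of-elements, rounded up. The following is a minimal shell sketch of that arithmetic, useful for estimating how many files a given setting will produce. The function names expected_files and last_file_size are hypothetical and not part of XMLSplit, and the sketch assumes that every output file except possibly the last is filled completely, as in the example above.

```shell
# Hypothetical helpers, not part of XMLSplit: estimate the split result
# from the total number of second-level elements and the chunk size.
# Both assume a total greater than zero.

# Number of output files: total / per_file, rounded up.
expected_files() {
  total=$1
  per_file=$2
  echo $(( (total + per_file - 1) / per_file ))
}

# Number of elements in the last file (a full file if the division is exact).
last_file_size() {
  total=$1
  per_file=$2
  remainder=$(( total % per_file ))
  if [ "$remainder" -eq 0 ]; then
    echo "$per_file"
  else
    echo "$remainder"
  fi
}

expected_files 950 100   # prints 10
last_file_size 950 100   # prints 50
```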

XMLSplit is in fact a general-purpose tool that can be used to split any large XML file, not just Escenic syndication files.
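
Because the classpath is the same on every invocation, it may be convenient to wrap the command in a small shell function. The following is a minimal sketch and not something shipped with the Content Engine: the function names xmlsplit_classpath and xmlsplit are hypothetical, and the engine root folder is passed as the first argument.

```shell
# Hypothetical helpers, not shipped with the Content Engine.

# Print the XMLSplit classpath for the engine root folder given as $1.
xmlsplit_classpath() {
  printf '%s/lib/xom-1.1.jar:%s/lib/engine-syndication-5.1-1.jar\n' "$1" "$1"
}

# Usage: xmlsplit engine-root filename number-of-elements
xmlsplit() {
  engine_root=$1
  shift
  java -cp "$(xmlsplit_classpath "$engine_root")" \
    com.escenic.syndication.xml.util.XMLSplit "$@"
}
```

With these definitions, the example above becomes:

$ xmlsplit engine-root import.xml 100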