Splitting Large Import Files
If a failure occurs while importing a syndication file, the whole file must be re-imported in order to ensure that no data is lost. Exported syndication files may in some cases be very large (large enough that importing them may take many hours). It is therefore a good idea to split such large syndication files into smaller units before importing them. A failure during import will then be much less costly: only the failed import(s) will need to be repeated.
A tool for splitting large syndication files, called XMLSplit, is included with the Content Engine. To split a file, enter:
$ java -cp classpath com.escenic.syndication.xml.util.XMLSplit filename number-of-elements
where:
- classpath is engine-root/lib/xom-1.1.jar:engine-root/lib/engine-syndication-5.1-1.jar.
- filename is the name of the syndication file you want to split.
- number-of-elements is the number of second-level elements you want each output file to contain.
For example:
$ java -cp engine-root/lib/xom-1.1.jar:engine-root/lib/engine-syndication-5.1-1.jar \
> com.escenic.syndication.xml.util.XMLSplit import.xml 100
If import.xml contains 950 second-level elements, then XMLSplit will output 10 files: 9 containing 100 second-level elements each, and a 10th containing 50.
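The arithmetic behind this example can be sketched as a small shell function, which may help when choosing a value for number-of-elements. This is only an illustration: split_plan is a hypothetical helper name, not part of the Content Engine, and it assumes the input contains at least one second-level element and that XMLSplit fills each output file completely before starting the next, as in the example above.

```shell
# Sketch: predict how many files a split produces and how many
# second-level elements end up in the last one.
# split_plan is a hypothetical helper, not an Escenic tool.
split_plan() {
  total=$1      # second-level elements in the input file (assumed >= 1)
  per_file=$2   # the number-of-elements argument passed to XMLSplit
  files=$(( (total + per_file - 1) / per_file ))   # integer division, rounded up
  last=$(( total - (files - 1) * per_file ))       # elements left for the final file
  echo "$files $last"
}

split_plan 950 100   # prints "10 50"
```

The first number is the count of output files and the second is the size of the final file, so a larger number-of-elements yields fewer, longer-running imports.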
XMLSplit is in fact a general-purpose tool that can be used to split any large XML file, not just Escenic syndication files.