Sometimes a source system produces thousands of source files. If you open and process each of them individually, it takes a lot of time. It is often much faster to merge the files together and process fewer, larger files.
Here is an example. A source system produces tons of XML files about comics. A single XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<comics>
  <comic>
    <name>Moomin</name>
    <authors>
      <author>Tove Jansson</author>
      <author>Lars Jansson</author>
    </authors>
    <started>19470101</started>
    <ended>19750101</ended>
    <publisher>Associated Newspapers</publisher>
  </comic>
</comics>
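The XSL stylesheet that merges the files looks like this. It needs an XSLT 2.0 processor such as Saxon, because xsl:for-each-group, xsl:result-document and collection() are not available in XSLT 1.0 (the ?select= file mask in the collection URI is a Saxon feature).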
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output name="XML-format" method="xml" indent="yes" encoding="utf-8"/>
  <!-- Batch (patch) size. We don't want to put all files together; we want chunks of controllable size. In this example, 5 is fine. -->
  <xsl:param name="patchSize" select="5"/>
  <!-- Source path with file mask: every comic*.xml file in the input folder. -->
  <xsl:variable name="sourceFiles" select="collection('file:///C:/comic/input/?select=comic*.xml')"/>
  <xsl:template match="/">
    <!-- Group by batch: (position() - 1) idiv $patchSize is 0 for files 1-5, 1 for files 6-10, and so on. -->
    <xsl:for-each-group select="$sourceFiles/comics" group-by="(position() - 1) idiv $patchSize">
      <!-- Position of the current group: 1, 2, 3, ... -->
      <xsl:variable name="patchID" select="position()"/>
      <!-- Output file name -->
      <xsl:result-document format="XML-format" href="Comics_{$patchID}.xml">
        <comics>
          <!-- Copy the <comic> children of every file in this batch. -->
          <xsl:for-each select="current-group()">
            <xsl:sequence select="*"/>
          </xsl:for-each>
        </comics>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
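To see the batching logic outside XSLT, here is a minimal sketch of the same merge in Python using only the standard library (ElementTree). It is not the method this post uses, just an illustration under stated assumptions: the function name merge_comics is hypothetical, the files are taken in sorted filename order (the stylesheet takes whatever order collection() returns), and every input file is assumed to have a <comics> root as in the example. With a batch size of 5, files 1 to 5 end up in Comics_1.xml, files 6 to 10 in Comics_2.xml, and so on, which is the grouping that (position() - 1) idiv $patchSize produces.

```python
import glob
import os
import xml.etree.ElementTree as ET


def merge_comics(input_dir: str, output_dir: str, patch_size: int = 5) -> list[str]:
    """Merge the <comic> elements of comic*.xml files into Comics_N.xml batches.

    Hypothetical helper mirroring the stylesheet; returns the written paths.
    """
    # Assumption: sorted filename order (the stylesheet uses collection() order).
    sources = sorted(glob.glob(os.path.join(input_dir, "comic*.xml")))
    os.makedirs(output_dir, exist_ok=True)
    written = []
    # Each slice holds the files sharing the key (position - 1) // patch_size,
    # i.e. the same grouping as (position() - 1) idiv $patchSize.
    for start in range(0, len(sources), patch_size):
        patch_id = start // patch_size + 1  # 1, 2, 3, ... like $patchID
        merged = ET.Element("comics")
        for path in sources[start:start + patch_size]:
            # Copy the children of each <comics> root, like <xsl:sequence select="*"/>
            merged.extend(list(ET.parse(path).getroot()))
        ET.indent(merged)  # roughly indent="yes"; needs Python 3.9+
        out_path = os.path.join(output_dir, f"Comics_{patch_id}.xml")
        ET.ElementTree(merged).write(out_path, encoding="utf-8", xml_declaration=True)
        written.append(out_path)
    return written
```

For example, merge_comics("C:/comic/input", "C:/comic/output") would write Comics_1.xml, Comics_2.xml, ... into the output folder and return their paths. Slicing the sorted list is simply the list-based way of expressing the same integer-division key.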
Save the XSL file as: C:/comic/merge_comic_files.xsl
The input folder is: C:/comic/input
The output folder is: c:/comic/output/
Open a command prompt and run a command like this:
> java -jar "c:/saxonb9-1-0-8j/saxon9.jar" -s:"c:/comic/input/" -o:"c:/comic/output/" -xsl:"c:/comic/merge_comic_files.xsl"
The command runs the Java JRE with the -jar option; the argument after -jar is the location of the Saxon-B processor JAR.
The -s option gives the source, -o the output location, and -xsl the path of the XSL file.
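If the merge has to run regularly, the same command can be scripted. The sketch below builds the argument list in Python and runs it with subprocess; the helper names saxon_command and run_saxon are hypothetical, and it assumes java is on the PATH. Passing a list instead of a single string means the paths need no shell quoting.

```python
import subprocess


def saxon_command(jar: str, source: str, output: str, xsl: str) -> list[str]:
    """Build the command shown above as an argument list (no shell quoting needed)."""
    return [
        "java", "-jar", jar,  # run Saxon-B from its JAR file
        f"-s:{source}",       # source
        f"-o:{output}",       # output
        f"-xsl:{xsl}",        # stylesheet
    ]


def run_saxon(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError if Saxon exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

Usage: run_saxon(saxon_command("c:/saxonb9-1-0-8j/saxon9.jar", "c:/comic/input/", "c:/comic/output/", "c:/comic/merge_comic_files.xsl")).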