Showing posts with label Saxon. Show all posts

20210430

Merging XML files together with XSL using Saxon-B

Sometimes a source system produces thousands of source files. Opening and processing each of them individually takes plenty of time. It is sometimes much faster to merge the files together and process the larger files.

Here is an example. A source system produces tons of XML files about comics. A single XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>

<comics>
    <comic>
        <name>Moomin</name>
        <authors>
           <author>Tove Jansson</author>
           <author>Lars Jansson</author>
        </authors>
        <started>19470101</started>
        <ended>19750101</ended>
        <publisher>Associated Newspapers</publisher>
    </comic>
</comics>

The XSL that merges the files looks like this:


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output name="XML-format" method="xml" indent="yes" encoding="utf-8"/>

    <!-- Patch size. We don't want to put all the files together. We want chunks of a controllable size. In this example a size of 5 is okay. -->
    <xsl:param name="patchSize" select="5"/>

    <!-- Source path variable with filemask. -->
    <xsl:variable name="sourceFiles" select="collection('file:///C:/comic/input/?select=comic*.xml')"/>

    <xsl:template match="/">

        <!-- group-by with patch size -->
        <xsl:for-each-group select="$sourceFiles/comics" group-by="(position() - 1) idiv $patchSize">
           <xsl:variable name="patchID" select="position()"/>
              <xsl:result-document format="XML-format" href="Comics_{$patchID}.xml"> <!-- Output file name-->
                <comics>
                   <xsl:for-each select="current-group()">
                      <xsl:sequence select="*"/>
                   </xsl:for-each>
                </comics>
              </xsl:result-document>
        </xsl:for-each-group>

    </xsl:template>
</xsl:stylesheet>
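
To see how the group-by expression splits items into chunks, here is a minimal sketch that applies the same expression to the numbers 1 to 12 instead of real files. The template name main and the sequence 1 to 12 are assumptions made only for this illustration; they are not part of the merge stylesheet above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <!-- Same chunk size as in the merge stylesheet. -->
    <xsl:param name="patchSize" select="5"/>

    <!-- Hypothetical named entry template, so no source document is needed. -->
    <xsl:template name="main">
        <!-- Items 1-5 get key 0, items 6-10 get key 1, items 11-12 get key 2. -->
        <xsl:for-each-group select="1 to 12" group-by="(position() - 1) idiv $patchSize">
            <!-- Here position() is the number of the group, like patchID above. -->
            <xsl:value-of select="concat('Comics_', position(), '.xml gets items ')"/>
            <xsl:value-of select="current-group()" separator=" "/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each-group>
    </xsl:template>

</xsl:stylesheet>
```

If this is saved under a name of your choice and run with Saxon's -it:main option (initial template) in place of -s, it should print three lines: Comics_1.xml gets items 1 2 3 4 5, Comics_2.xml gets items 6 7 8 9 10 and Comics_3.xml gets items 11 12. In the same way, the merge stylesheet writes five source files into each output file, and the last output file gets whatever is left over.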


We will save the XSL file as: C:/comic/merge_comic_files.xsl

The input folder will be: C:/comic/input

The output folder will be: c:/comic/output/


Let's open a command prompt and run a command like this:

> java -jar "c:/saxonb9-1-0-8j/saxon9.jar" -s:"c:/comic/input/" -o:"c:/comic/output/" -xsl:"c:/comic/merge_comic_files.xsl"

The command runs Java with the -jar parameter, followed by the location of the Saxon-B processor JAR.
The -s parameter is the source, -o is the output, and -xsl is the path of the XSL file.

You can download example files here.

When scheduled BAT files cascade

The server was acting slowly. What had happened? When I opened Task Manager, there were thousands of cmd programs open. I noticed there were so...