Greg's Blog

helping me remember what I figure out

Changing the XML Parser


A while back I stumbled across some information over at Brandon Purcell’s site on how to change the XML parser that CFMX uses. You may wonder why I would want to do that. Well, by default CFMX uses a DOM-based parser, which is fine for small XML files but gets rather memory-intensive with larger ones, because a DOM parser builds the entire document tree in memory. In our case we had to work with a 250 MB XML file, and DOM parsing would soon have brought our server to its knees. SAX parsers, which stream through a document and report elements as they encounter them, are far better suited to large files, so that was reason number 1. Reason number 2 was that we needed to validate the XML file, and I don’t think that is possible with the default DOM-based parser. Xerces met both of our requirements: it provides a SAX parser and can validate XML files on the fly.
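To make the DOM-versus-SAX difference a bit more concrete, here is a minimal Java sketch (plain JAXP, not CFMX code) that streams through a file and counts its elements. A SAX handler only sees one event at a time and keeps nothing but a counter, so memory use stays roughly flat regardless of file size. The class name ElementCounter is hypothetical, and setValidating(true) turns on DTD validation, which assumes your file declares a DTD; validity errors come back through the handler’s error() method, which here simply rethrows them so the parse fails.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: stream a (possibly very large) XML file with SAX,
// counting elements instead of building an in-memory DOM tree.
public class ElementCounter extends DefaultHandler {
    private long elementCount = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementCount++; // called once per opening tag; no element data is retained
    }

    // With validation on, validity errors arrive here; rethrowing
    // makes the parse stop on the first invalid construct.
    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e;
    }

    public long getElementCount() {
        return elementCount;
    }

    public static long countElements(File xmlFile, boolean validate)
            throws ParserConfigurationException, SAXException, IOException {
        // newInstance() returns whichever implementation JAXP is configured to use
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate); // DTD validation when true
        SAXParser parser = factory.newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(xmlFile, handler);
        return handler.getElementCount();
    }

    public static void main(String[] args) throws Exception {
        File xmlFile = new File(args[0]);
        System.out.println("Elements: " + countElements(xmlFile, true));
    }
}
```

Run it as java ElementCounter big-file.xml; the file is never held in memory as a tree, which is the whole point of going the SAX route.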

Below I have reproduced Brandon’s instructions:

  1. Get the Xerces kit at http://xml.apache.org/dist/xerces-j/Xerces-J-bin.1.4.4.zip
  2. extract xerces.jar and place it in the classpath ahead of jrun.jar (for example, in runtime/servers/lib)
  3. add these switches to the JVM arguments in CF Administrator (or in jvm.config for JRun). Note that this is one long line, not three lines:
    • -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
  4. restart CFMX (JRun)
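Once the server is back up, a quick way to check that the switches took effect is to ask JAXP which factory classes it actually instantiates. The Java sketch below (ParserCheck is a hypothetical name; this is a standalone diagnostic, not part of CFMX) prints the three system properties from step 3 along with the concrete factory classes. JAXP’s newInstance() methods consult the javax.xml.parsers.* system properties before anything else, so with the switches in place and xerces.jar on the classpath you should see the org.apache.xerces.jaxp class names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic: report which XML parser implementations JAXP picks up.
public class ParserCheck {
    // The three properties set by the -D switches in step 3.
    static final String[] PROPERTIES = {
        "org.xml.sax.driver",
        "javax.xml.parsers.DocumentBuilderFactory",
        "javax.xml.parsers.SAXParserFactory"
    };

    public static String report() {
        StringBuilder sb = new StringBuilder();
        for (String key : PROPERTIES) {
            // Shows "(not set)" when the switch was not passed to the JVM.
            sb.append(key).append(" = ")
              .append(System.getProperty(key, "(not set)")).append('\n');
        }
        // newInstance() checks the javax.xml.parsers.* system properties first.
        sb.append("DocumentBuilderFactory in use: ")
          .append(DocumentBuilderFactory.newInstance().getClass().getName()).append('\n');
        sb.append("SAXParserFactory in use: ")
          .append(SAXParserFactory.newInstance().getClass().getName()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```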

It all looks pretty straightforward, right? Well, I still ran into a few problems, and Brandon was kind enough to answer my questions. For starters, I got a little confused about the classpath statement, but after a little digging I figured out that on my development installation of CFMX, which runs as an instance on JRun, jrun.jar is actually located at <drive letter>:\JRun\lib, so that’s where I extracted xerces.jar to.

Then, to specify the actual classpath, I needed to locate jvm.config, which lives in <drive letter>:\JRun\bin. Before editing it, please make a backup of jvm.config. Why? When I first started messing around with this I didn’t, and I broke my CFMX install badly enough that I had to reinstall the lot (there’s nothing quite like learning the hard way :)). The JVM classpath entry I needed to edit is at the end of the file; the entry I added is {application.home}/lib/xerces.jar, placed just ahead of {application.home}/lib/jrun.jar:

# JVM classpath
java.class.path={application.home}/servers/lib,{application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/cfusion.jar,{application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib,{application.home}/lib/xerces.jar,{application.home}/lib/jrun.jar,{application.home}/lib
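Because the whole point of this edit is ordering, here is a small Java sketch that checks a java.class.path value like the one above and confirms xerces.jar comes before jrun.jar. It assumes, as with an ordinary Java classpath, that entries are searched in the order listed, so whichever jar appears first wins when both contain a class of the same name. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sanity check for the java.class.path line in jvm.config.
// Assumes entries are comma-separated and searched in list order, so
// xerces.jar must come before jrun.jar to take precedence.
public class ClasspathOrderCheck {

    // Returns the index of the first entry naming the given jar, or -1.
    static int indexOfJar(List<String> entries, String jarName) {
        for (int i = 0; i < entries.size(); i++) {
            // Normalise Windows separators so C:\JRun\lib\xerces.jar also matches.
            String entry = entries.get(i).trim().replace('\\', '/');
            if (entry.endsWith("/" + jarName) || entry.equals(jarName)) {
                return i;
            }
        }
        return -1;
    }

    public static boolean xercesBeforeJrun(String classPathValue) {
        List<String> entries = Arrays.asList(classPathValue.split(","));
        int xerces = indexOfJar(entries, "xerces.jar");
        int jrun = indexOfJar(entries, "jrun.jar");
        return xerces >= 0 && jrun >= 0 && xerces < jrun;
    }

    public static void main(String[] args) {
        // The value from the jvm.config excerpt above.
        String value = "{application.home}/servers/lib,"
            + "{application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/cfusion.jar,"
            + "{application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib,"
            + "{application.home}/lib/xerces.jar,"
            + "{application.home}/lib/jrun.jar,"
            + "{application.home}/lib";
        System.out.println("xerces.jar ahead of jrun.jar: " + xercesBeforeJrun(value));
    }
}
```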

Next we needed to add the parser switches to the JVM’s arguments. If you look at your jvm.config file again, you’ll see a line like this (it is a single line, even if it wraps on screen):

# Arguments to VM
java.args=-server -Xmx512m -Dsun.io.useCanonCaches=false -Xbootclasspath/a:"{application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/webchartsJava2D.jar" -XX:MaxPermSize=128m -XX:+UseParallelGC -DJINTEGRA_NATIVE_MODE -DJINTEGRA_PREFETCH_ENUMS

Now you will need to add the switches Brandon mentioned as follows:

# Arguments to VM
java.args=-server -Xmx512m -Dsun.io.useCanonCaches=false -Xbootclasspath/a:"{application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/webchartsJava2D.jar" -XX:MaxPermSize=128m -XX:+UseParallelGC -DJINTEGRA_NATIVE_MODE -DJINTEGRA_PREFETCH_ENUMS -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
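Since a broken jvm.config cost me a reinstall, it may be worth checking the file before restarting. This hypothetical Java sketch (JvmArgsCheck and its methods are my own names) reads the java.args= line from a jvm.config file and reports any of the three parser switches that are missing or have the wrong value. Because it only looks at the java.args line, switches accidentally split onto separate lines show up as missing, which is exactly the “one long line” mistake Brandon warns about.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check: confirm the java.args line in jvm.config carries all
// three parser switches with the expected values.
public class JvmArgsCheck {
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
        REQUIRED.put("javax.xml.parsers.DocumentBuilderFactory",
                     "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
        REQUIRED.put("javax.xml.parsers.SAXParserFactory",
                     "org.apache.xerces.jaxp.SAXParserFactoryImpl");
    }

    // Returns the switches that are missing or have the wrong value.
    public static List<String> missingSwitches(String javaArgsLine) {
        List<String> missing = new ArrayList<>();
        // Naive whitespace split: fine for the line above, but a quoted
        // argument containing spaces would be broken into several tokens.
        List<String> tokens = Arrays.asList(javaArgsLine.trim().split("\\s+"));
        for (Map.Entry<String, String> e : REQUIRED.entrySet()) {
            String expected = "-D" + e.getKey() + "=" + e.getValue();
            if (!tokens.contains(expected)) {
                missing.add(expected);
            }
        }
        return missing;
    }

    // Returns the value of the first java.args= line in a jvm.config file.
    public static String readJavaArgs(Path jvmConfig) throws IOException {
        for (String line : Files.readAllLines(jvmConfig)) {
            if (line.startsWith("java.args=")) {
                return line.substring("java.args=".length());
            }
        }
        throw new IOException("no java.args= line in " + jvmConfig);
    }

    public static void main(String[] args) throws IOException {
        Path config = Paths.get(args[0]); // e.g. <drive letter>:\JRun\bin\jvm.config
        List<String> missing = missingSwitches(readJavaArgs(config));
        System.out.println(missing.isEmpty() ? "All parser switches present" : "Missing: " + missing);
    }
}
```

Point it at your backup copy and your edited copy in turn; the edited one should report all three switches present.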

Right, all of the configuration is done; all that’s left is to restart your JRun instance. I started off by restarting just the cfusion instance, as that was the one that mattered to me, and this is where I ran into some strange behaviour, although as far as I can tell everything works. Stopping the server was no problem, but on restart the console told me that it had failed to start. That was true; however, when I tried to start it again it came up with no problems and had loaded the Xerces XML parser. I tried this a few more times, and after every stop the server appeared to need a little break before it could be started up again. Go figure, but at least it works.