CFMX and jTidy
While writing the comment code I was hoping to make use of jTidy to parse the comment passed in and tidy it up, so that any HTML provided would be valid. It was also a test for integration into the CMS for this site, and it passed with flying colours on CFMX. Sadly, the FREE version of BD (which this site runs on) doesn't let you deploy additional jar files [UPDATE: have a read of this entry and you'll find out how to load java files on the fly]. To use the code below you'll need to deploy Tidy.jar to your {installPath}/WEB-INF/lib/ (CFMX) or {installPath}/BlueDragon_Server_61/lib/ (BD) folder and restart your server.
The approach I adopted is far from ideal, and maybe somebody out there with more experience of jTidy and Java can offer a few hints, but here's principally how it works. I created a method, makexHTMLValid(), that expects three arguments: strToParse, thisUrl and tmpPath. The first is the string to be cleaned; the second is the URL of the directory from which the temporary file holding that string will be read; and the third is the physical path of that same directory, where the temporary file is written and held for the duration of the parsing.
It does seem very laborious, and it is. To clarify further: the string is written to a file, which jTidy then reads back in over an HTTP connection. jTidy itself then writes out a second file with the cleaned string, and the function finishes by reading that file in and deleting both temp files before returning the cleaned string. The only implementation examples I could find dealt with reading in StringBuffers using the above outline. I'd be delighted to hear of examples of converting a String variable into a StringBuffer and then back again.
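On the String question: the function calls jTidy's parse with an InputStream and an OutputStream, not a StringBuffer, so one option is to skip the temp files altogether by wrapping the string in a ByteArrayInputStream and collecting the output in a ByteArrayOutputStream. This is a minimal sketch, not tested against jTidy itself: StreamParser and COPY are hypothetical stand-ins so it runs without Tidy.jar, and UTF-8 is an assumed encoding that jTidy would have to be configured to match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class InMemoryTidy {
    // Hypothetical stand-in for a stream-to-stream parser such as jTidy's
    // parse(InputStream, OutputStream), so the sketch runs without Tidy.jar.
    interface StreamParser {
        void parse(InputStream in, OutputStream out) throws IOException;
    }

    // Copies input to output unchanged; used here in place of the real tidy step.
    static final StreamParser COPY = (in, out) -> {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    };

    // String -> bytes -> InputStream; the parser writes into an in-memory buffer;
    // buffer -> bytes -> String. No temp files and no HTTP request involved.
    static String parseString(String html, StreamParser parser) throws IOException {
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        parser.parse(in, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseString("<p>Caf\u00e9 &amp; <b>bold</b></p>", COPY));
    }
}
```

From CFML the same classes should be reachable through createObject("java", "java.io.ByteArrayInputStream") and createObject("java", "java.io.ByteArrayOutputStream"), reading the result back with the output stream's toString("UTF-8"); that route is an untested assumption on both CFMX and BD.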
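pathToTempFile = "/relativePath/toYourTempFolder/"; // web-relative directory; the trailing slash matters because the temp file name is appended to it
cleanedString = makexHTMLValid(yourStringToParse, "http://" & cgi.SERVER_NAME & pathToTempFile, ExpandPath(pathToTempFile));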
The function is as follows:
<cffunction name="makexHTMLValid" displayname="Tidy parser" hint="Takes a string, a URL and a temp path as arguments and returns parsed, valid XHTML" output="false">
<cfargument name="strToParse" required="true" type="string" />
<cfargument name="thisUrl" required="true" type="string" />
<cfargument name="tmpPath" required="true" type="string" />
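<cfscript>
/**
* This function reads in a string, then checks and corrects any invalid HTML. It creates
* two temporary files because, as far as I can tell, jTidy relies on files for parsing.
* By Greg Stewart
*
* @param strToParse The string to parse (will be written to a temp file).
* @param thisUrl The URL of the directory holding the temp files; must end in "/"
* @param tmpPath The physical path of that same directory, where the temp files will be
* written; must end in a path separator and be reachable over HTTP via thisUrl
* @return returnPart
* @author Greg Stewart (gregs(at)tcias.co.uk)
* @version 1, August 22, 2004
*/
var fileReadIn = ""; // xHTML output
var returnPart = ""; // return variable
var pageIn = "tmpIn." & CreateUUID() & ".html";
var pageOut = arguments.tmpPath & "tmpOut." & CreateUUID() & ".html";
var filename = arguments.tmpPath & pageIn;
var writeData = "";
// local Java objects and working variables, var-scoped so they don't leak out of the function
var jFile = "";
var jStream = "";
var jWriter = "";
var jTidy = "";
var theUrl = "";
var u = "";
var inP = "";
var outx = "";
var fileReader = "";
var lineReader = "";
var startPos = 0;
var endPos = 0;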
// create a File object for the temp input file and create the (empty) file on disk
jFile = createObject("java", "java.io.File").init(filename);
jFile.createNewFile();
writeData = toString(trim(arguments.strToParse));
// open a stream on the file and wrap it in a writer (uses the platform default encoding)
jStream = createObject("java", "java.io.FileOutputStream").init(jFile);
jWriter = createObject("java", "java.io.OutputStreamWriter").init(jStream);
jWriter.write(writeData);
// flush the output, clean up and close
jWriter.flush();
jWriter.close();
jStream.close();
// jTidy part
jTidy = createObject("java", "org.w3c.tidy.Tidy");
jTidy.setQuiet(false);
jTidy.setIndentContent(true);
jTidy.setSmartIndent(true);
jTidy.setIndentAttributes(true);
jTidy.setWraplen(1024);
jTidy.setXHTML(true);
// build the URL of the temp input file
theUrl = arguments.thisUrl & pageIn;
// create the in and out streams for jTidy
u = createObject("java", "java.net.URL").init(theUrl);
inP = createObject("java", "java.io.BufferedInputStream").init(u.openStream());
outx = createObject("java", "java.io.FileOutputStream").init(pageOut);
// do the parsing
jTidy.parse(inP, outx);
// close both streams
inP.close();
outx.close();
// read the tidied file back in, one line at a time
if (fileExists(pageOut)) {
	fileReader = createObject("java", "java.io.FileReader").init(pageOut);
	lineReader = createObject("java", "java.io.LineNumberReader").init(fileReader);
	// readLine() returns null at end of file, which leaves "line" undefined and ends the loop
	line = lineReader.readLine();
	while (isDefined("line")) {
		// keep the line break, otherwise words that jTidy wrapped onto a new line run together
		fileReadIn = fileReadIn & line & chr(10);
		line = lineReader.readLine();
	}
	// closing the LineNumberReader also closes the FileReader it wraps
	lineReader.close();
}
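// ok, now strip all the header/body stuff and keep only what's inside <body>
startPos = FindNoCase("<body>", fileReadIn);
endPos = FindNoCase("</body>", fileReadIn);
if (startPos GT 0 AND endPos GT startPos) {
	startPos = startPos + Len("<body>");
	returnPart = Trim(Mid(fileReadIn, startPos, endPos - startPos));
}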
// delete the temp files
createObject("java", "java.io.File").init(filename).delete();
createObject("java", "java.io.File").init(pageOut).delete();
</cfscript>
<cfreturn returnPart />
</cffunction>
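Two things in the function above are easy to trip over: the temp files are only deleted if every step succeeds, and jTidy fetches its input over HTTP even though the file sits on the same machine. The Java sketch below (StreamParser and COPY are hypothetical stand-ins for jTidy, and UTF-8 is an assumed encoding) reads the temp file straight from disk, keeps a line break between lines when reading the result back, pulls out the body case-insensitively, and removes both temp files in a finally block.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TidyFileRoundTrip {
    // Hypothetical stand-in for jTidy's parse(InputStream, OutputStream),
    // so the sketch compiles and runs without Tidy.jar on the classpath.
    interface StreamParser {
        void parse(InputStream in, OutputStream out) throws IOException;
    }

    // Copies input to output unchanged; used here in place of the real tidy step.
    static final StreamParser COPY = (in, out) -> {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    };

    // Matches a plain <body> ... </body> pair, ignoring case and spanning newlines.
    private static final Pattern BODY =
            Pattern.compile("<body>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Writes the string to a temp file, parses it directly from disk (no HTTP request),
    // reads the result back and always deletes both temp files, even if parsing fails.
    static String parseViaTempFiles(String html, StreamParser parser) throws IOException {
        Path inFile = Files.createTempFile("tmpIn.", ".html");
        Path outFile = Files.createTempFile("tmpOut.", ".html");
        try {
            Files.write(inFile, html.getBytes(StandardCharsets.UTF_8));
            try (InputStream in = Files.newInputStream(inFile);
                 OutputStream out = Files.newOutputStream(outFile)) {
                parser.parse(in, out);
            }
            // Re-join with "\n" so words wrapped across lines stay separated.
            List<String> lines = Files.readAllLines(outFile, StandardCharsets.UTF_8);
            return extractBody(String.join("\n", lines));
        } finally {
            Files.deleteIfExists(inFile);
            Files.deleteIfExists(outFile);
        }
    }

    // Returns the trimmed markup between <body> and </body>, or "" if there is no body.
    static String extractBody(String page) {
        Matcher m = BODY.matcher(page);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) throws IOException {
        String page = "<html>\n<body>\n<p>wrapped\nline</p>\n</body>\n</html>";
        System.out.println(parseViaTempFiles(page, COPY));
    }
}
```

Reading from a FileInputStream-style stream removes the need for the thisUrl argument entirely, which would also sidestep the trailing-slash and host-name details of building that URL.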