Scratching my programming itch: January 2010

Welcome to the final posting of this series, in which I have been porting one of my Ant tasks from Java to Groovy. If you haven't read part 1 or part 2, you might want to take a quick pass through those before reading this posting. In this posting, I will show the following:

FileValidator class ported to Groovy
Using MarkupBuilder to generate the HTML report
Full listings of the scripts & classes created

FileValidator ported to Groovy
One of the last tasks left is to port the FileValidator class from Java to Groovy. The FileValidator class is created in the process() method of the Validator class and called for every file to be validated. It iterates through a file, line by line, running each regular expression against the line to check for a potential error condition. This is another example of where the Groovy code is a lot shorter and more concise than the Java code. Here is the original Java class:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;

/**
 * File validator runs a list of regular expressions against the file
 * attempting to validate the file.  Errors are kept in a list and
 * then the caller can get the list of errors and log them - right
 * now the ant task will log them. 
 *
 */
public class FileValidator {

    private List regExDescriptors;
    private File file;
    private List errors;
    
    /**
     * Default constructor
     * @param list List of RegExDescriptors used to validate a file
     * @param file File to be validated
     */
    public FileValidator(List list, File file) {
        regExDescriptors = new ArrayList(list);
        this.file = file;
        errors = new ArrayList();
    }
    
    public List getErrors() {
        return errors;
    }
    
    public void validate() {
      // see if we should exclude this file  
      processExcludes();
      
      // Run the regEx(s) against the file to check for problems
      BufferedReader in = null;
      try {
          in = new BufferedReader(new FileReader(file));
          String str;
          int lineNumber = 1;
          while ((str = in.readLine()) != null) {
              for (Iterator iter = regExDescriptors.iterator(); iter.hasNext();) {
                  RegExDescriptor validator = (RegExDescriptor) iter.next();
                  Matcher matcher = validator.getMatcher(str);
                  if (matcher.find()) {
                   errors.add( validator.getDescription() + " on line " + lineNumber);
                  }
              }
              lineNumber++;
          }
      } catch (IOException e) {
          e.printStackTrace();
      } finally {
          if (in != null)
              try {
                  in.close();
              } catch (IOException e) {
              }
      }  
    }

    private void processExcludes()
    {
        String fullName;
        try
        {
            fullName = file.getCanonicalPath();
        }
        catch (IOException e)
        {
            fullName = file.getName();
            e.printStackTrace();
        }
        for (Iterator iter = regExDescriptors.iterator(); iter.hasNext();)
        {
            RegExDescriptor descriptor = (RegExDescriptor) iter.next();
            if (descriptor.excludeFile(fullName))
            {
                iter.remove();
            }
        }
    }    
}

Groovy code for file validation

//**************************************
// process each file, line by line
//**************************************
files.each {fileName ->
  def errors = []
  def lines = new File(fileName).readLines()
  def lineNumber = 1
  lines.each {line ->
    // for each line - iterate thru descriptors looking for matches; a descriptor is skipped only if it has an exclude pattern that matches the file name
    descriptorList.each {
      if (!(it.exclude && (fileName =~ /$it.exclude/))) {
        if ((line =~ /$it.regex/).find()) {
          errors.add(it.description + " on line " + lineNumber)
          errCnt++
        }
      }
    }
    lineNumber++
  }
  if (errors)
    results.put((fileName), errors)
}
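
For readers comparing this against the Java version: Groovy's =~ operator creates a java.util.regex.Matcher, so (line =~ /$it.regex/).find() is roughly what the Java sketch below does. This is only an illustration, not part of the port. The class and method names (FindOperatorSketch, found, excluded) and the sample lines and paths are made up; the patterns come from the SQL properties listed at the end of this posting.

```java
import java.util.regex.Pattern;

final class FindOperatorSketch {

    // Rough Java equivalent of Groovy's (line =~ /regex/).find():
    // true when the pattern occurs anywhere in the line.
    static boolean found(String line, String regex) {
        return Pattern.compile(regex).matcher(line).find();
    }

    // Same idea for the exclude test against the file name.
    // Treating a missing (null) exclude as "never excluded" is an assumption of this sketch.
    static boolean excluded(String fileName, String exclude) {
        return exclude != null && found(fileName, exclude);
    }

    public static void main(String[] args) {
        // Patterns taken from the SQL properties; sample lines and paths are hypothetical.
        System.out.println(found("COMMIT;   ", ";\\s+$"));   // true: spaces after the semicolon
        System.out.println(found("COMMIT;", ";\\s+$"));      // false: nothing after the semicolon
        System.out.println(excluded("scripts/Oracle/create.sql", "SQLServer|Oracle")); // true
        System.out.println(excluded("scripts/MySQL/create.sql", null));                // false
    }
}
```

Because find() looks for the pattern anywhere in the string, an exclude value such as SQLServer|Oracle skips any file whose path contains either word.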

Generating the results report
The last part of the port was to create an HTML output report similar to a JUnit report. The Java code for this could have been better; it was simply a large set of write statements. For the Groovy version, I decided to use MarkupBuilder (groovy.xml.MarkupBuilder) to generate the output report.

//*******************************************************
// generate the report
//*******************************************************
def writer = new FileWriter("$opt.r")
def html = new groovy.xml.MarkupBuilder(writer)
def rptDate = new Date()

html.html {
  head{
    title 'Validator Report'
  }
  body {
    h1 "File Validation - as of $rptDate"
    h2 "Validation Descriptors for file extension $opt.e"
    table ('border':0, 'cellpadding':5, 'cellspacing':2, 'width':'80%'){
      tr ('bgcolor':'#a6caf0') {
        th ("Description")
        th ("Validating Regular Expression")
      }
      //iterate thru descriptors - listing the regular expressions used in validation
      descriptorList.each { descriptor ->
        tr ('bgcolor':'#eeeee0') {
          td (descriptor.description)
          td (descriptor.regex)
        }
      }
    }
    // Summary table - just a file & error count
    h2 'Validation Summary'
    table ('border':0, 'cellpadding':5, 'cellspacing':2, 'width':'80%'){
      tr ('bgcolor':'#a6caf0') {
        th ("Files Processed")
        th ("Error Count")
      }
      tr ('bgcolor':'#eeeee0')  {
        td ("$files.size")
        td ("$errCnt")
      }
    }
    if (errCnt)
    {
      // Error details table
      h2 "Validation Details for extension $opt.e "
      table ('border':0, 'cellpadding':5, 'cellspacing':2, 'width':'100%'){
        tr ('bgcolor':'#a6caf0') {
          th ("File")
          th ("Errors")
          th ("Details")
        }
        //iterate thru results map: key is filename and value is list of error text
        results.each { k, v ->
            tr ('bgcolor':'#eeeee0')  {
               td (k)
               td (v.size())
               td {
                 ul {
                   //iterate thru the list of errors
                   v.each {
                     li (it)
                   }
                 }
               }
            }
        }
      }
    }
  }
}
writer.close()  // flush and close the report file

Porting Results
Initial Java Ant task: 3 classes and 326 lines of code
Groovy port: 1 script of 102 lines of code and 1 class with 9 lines of code. About 40 lines are processing code; the rest come from the MarkupBuilder code for the report.

Shorter and more concise? You decide!

Groovy script and class source
Here's the full Validator.groovy script

/**
 *  Validator script ported from Java Ant task
 */

//********************************************
// handle command line parms - all required
//********************************************
def cli = new CliBuilder(usage: 'groovy Validator -e extension -p propertyFile -d directory -r reportFile')

cli.h(longOpt: 'help', 'usage information')
cli.e(longOpt: 'extension', args: 1, required: true, 'file extension to be validated')
cli.p(longOpt: 'prop', args: 1, required: true, 'property file containing regular expressions')
cli.d(longOpt: 'directory', args: 1, required: true, 'base directory to start file search')
cli.r(longOpt: 'reportFile', args: 1, required: true, 'output file for validation report')

def opt = cli.parse(args)
if (!opt) return
if (opt.h) cli.usage()

println "Processing $opt.e files \n\tusing $opt.p \n\tfrom directory $opt.d \n\tand generate report output to $opt.r"
println 'extracting files...'

//*******************************************
// get all the files for provided extension
//*******************************************
def files = new FileNameFinder().getFileNames(opt.d, "**/*.$opt.e")
println "processing $files.size files..."

//**************************************************************
// load properties into list of ValidatorDescriptor objects
//**************************************************************
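Properties properties = new Properties()
try {
  // withInputStream closes the stream once the properties are loaded
  new File(opt.p).withInputStream { properties.load(it) }
} catch (IOException e) {
  println "unable to read property file $opt.p: $e.message"
}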

def config = new ConfigSlurper().parse(properties)
def descriptorList = []
config."$opt.e".each {
  descriptorList << new ValidatorDescriptor(it.value)
}

def errCnt = 0
def results = [:]

//**************************************
// process each file, line by line
//**************************************
files.each {fileName ->
  def errors = []
  def lines = new File(fileName).readLines()
  def lineNumber = 1
  lines.each {line ->
    // for each line - iterate thru descriptors looking for matches; a descriptor is skipped only if it has an exclude pattern that matches the file name
    descriptorList.each {
      if (!(it.exclude && (fileName =~ /$it.exclude/))) {
        if ((line =~ /$it.regex/).find()) {
          errors.add(it.description + " on line " + lineNumber)
          errCnt++
        }
      }
    }
    lineNumber++
  }
  if (errors)
    results.put((fileName), errors)
}

//*******************************************************
// generate the report
//*******************************************************
def writer = new FileWriter("$opt.r")
def html = new groovy.xml.MarkupBuilder(writer)
def rptDate = new Date()

html.html {
  head{
    title 'Validator Report'
  }
  body {
    h1 "File Validation - as of $rptDate"
    h2 "Validation Descriptors for file extension $opt.e"
    table ('border':0, 'cellpadding':5, 'cellspacing':2, 'width':'80%'){
      tr ('bgcolor':'#a6caf0') {
        th ("Description")
        th ("Validating Regular Expression")
      }
      //iterate thru descriptors - listing the regular expressions used in validation
      descriptorList.each { descriptor ->
        tr ('bgcolor':'#eeeee0') {
          td (descriptor.description)
          td (descriptor.regex)
        }
      }
    }
    // Summary table - just a file & error count
    h2 'Validation Summary'
    table ('border':0, 'cellpadding':5, 'cellspacing':2, 'width':'80%'){
      tr ('bgcolor':'#a6caf0') {
        th ("Files Processed")
        th ("Error Count")
      }
      tr ('bgcolor':'#eeeee0')  {
        td ("$files.size")
        td ("$errCnt")
      }
    }
    if (errCnt)
    {
      // Error details table
      h2 "Validation Details for extension $opt.e "
      table ('border':0, 'cellpadding':5, 'cellspacing':2, 'width':'100%'){
        tr ('bgcolor':'#a6caf0') {
          th ("File")
          th ("Errors")
          th ("Details")
        }
        //iterate thru results map: key is filename and value is list of error text
        results.each { k, v ->
            tr ('bgcolor':'#eeeee0')  {
               td (k)
               td (v.size())
               td {
                 ul {
                   //iterate thru the list of errors
                   v.each {
                     li (it)
                   }
                 }
               }
            }
        }
      }
    }
  }
}
writer.close()  // flush and close the report file

println "validation complete - possible error count = $errCnt"

//********************************
// for testing launch browser
//********************************
//"c:/program files/Internet Explorer/iexplore.exe $opt.r".execute()

Here's the one short class - ValidatorDescriptor.groovy

class ValidatorDescriptor {
    String description
    String exclude
    String regex

    String toString() {
       "regEx="+regex + " description=" + description + (exclude ? " exclude("+exclude+")" : "")
    }
}

And finally, my set of properties used for validation:

#################################################################
#
# SQL file regular expressions
#
#################################################################

#
# Find instances of trailing spaces after the semicolon.
# This causes MySQL problems.
#
#
sql.1.regex=;\\s+$
sql.1.description=Trailing spaces after semicolon
sql.1.exclude=SQLServer|Oracle

#
# Instances of lines starting with GO
#
sql.2.regex=^GO
sql.2.description=GO statement


#
# Find spaces before ending semicolon (but not DELIMITER ; which is used for MySQL Stored Procedures)
#
sql.4.regex=^(?!DELIMITER)\\s+;$
sql.4.description=Spaces before ending semicolon
sql.4.exclude=SQLServer|Oracle


#
# Oracle scripts using nvarchar
#
#
sql.5.regex=nvarchar
sql.5.description=Oracle scripts using nvarchar
sql.5.exclude=SQLServer|MySQL|Store Update

#
# MySQL scripts using ntext
#
#
sql.6.regex=\\sntext
sql.6.description=MySQL scripts using ntext data type
sql.6.exclude=SQLServer|Oracle
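
A note on the doubled backslashes in the properties above: java.util.Properties treats a backslash as an escape character when it loads a file, so \\s in the file arrives in the script as \s, which is what the regular expression needs. The small Java sketch below illustrates this. It is not part of the port, and the class name, method name and sample text (PropertyEscapeSketch, loadValue, "notes ntext,") are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

final class PropertyEscapeSketch {

    // Parses a single "key=value" line with Properties.load, the same
    // parsing the script relies on, and returns the unescaped value.
    static String loadValue(String propertyLine, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertyLine));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Java source needs its own escaping: this literal is the file text
        //   sql.6.regex=\\sntext
        String regex = loadValue("sql.6.regex=\\\\sntext", "sql.6.regex");
        System.out.println(regex);  // prints \sntext (one backslash)
        System.out.println(Pattern.compile(regex).matcher("notes ntext,").find());  // true
    }
}
```

With a single backslash in the file, the backslash would be dropped during loading and the pattern would search for the literal text sntext.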

Summary
There you go - another itch successfully scratched! Maybe this task/script will be of help to others. The design goal was a generic file validation utility, although SQL validation is the only use we have found for it so far. Enjoy!

Scratching my programming itch

Sunday, January 3, 2010

Ant task ported to Groovy - Part 3