Wednesday, January 28, 2009

Introducing the SINFERS Project

Hi All,
After a nearly three-year hiatus and at the behest of James Owen, I have decided to reactivate my blog. To motivate myself, I'll start a thread about my ongoing work with the University of Sydney, Australia. Since May of 2006, I've been working with a super chap by the name of Grant Tranter, a PhD candidate in the Department of Soil Science, Faculty of Agriculture, Food and Natural Resources at UniS. Together, we've been creating an expert system called SINFERS that uses Jess to infer unknown soil properties from a set of known input data.

The underlying theory of SINFERS rests on the concept of a pedotransfer function (PTF): a predictive function that estimates certain soil properties from other more available, easily, routinely, or cheaply measured properties [1]. SINFERS ultimately will have 300-500 such PTFs at its disposal for computing soil properties. Listing 1 is a typical example.

34.0023360256908 - 0.171438746911598 * P10_NR_Z - 0.333101650254256 * P10_NR_S

Listing 1. A PTF for computing 15 BAR Moisture g/g by gravimetric pressure plate

The above PTF has to do with soil water content, which can be measured to determine other characteristics under certain conditions (i.e., wilting point and field capacity). Such tests yield the moisture released and are conducted at specific suction pressures (15 BAR and 0.3 BAR).
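To make Listing 1 concrete, here is a minimal Python sketch that evaluates the same expression. The function name and the sample input values are invented for illustration only; this is not SINFERS code.

```python
def ptf_15_bar_moisture(p10_nr_z, p10_nr_s):
    """Evaluate the regression expression from Listing 1."""
    return (34.0023360256908
            - 0.171438746911598 * p10_nr_z
            - 0.333101650254256 * p10_nr_s)


# Hypothetical input values, chosen only to show the call.
estimate = ptf_15_bar_moisture(p10_nr_z=20.0, p10_nr_s=50.0)
print(round(estimate, 4))
```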

A user inputs a set of soil property data that we, as a convention, refer to as a batch. SINFERS has a knowledgebase that relates the arguments of each serialized PTF in its PTF database to the entered soil properties. If SINFERS recognizes a subset of those soil properties as matching the arguments to a particular PTF, that PTF is de-serialized, instantiated as a Java object, and made available to SINFERS. The instantiated PTF object has methods that allow it to compute a value for its dependent variable as well as an estimate of the error.
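The matching step can be pictured as a subset test. The sketch below is plain Python, not the Jess knowledgebase itself, and it assumes a batch is a dictionary keyed by lab code. The name LISTING_1_PTF and the batch values are hypothetical.

```python
def applicable_ptfs(batch, ptf_args):
    """Return the names of PTFs whose every argument is present in the batch.

    batch    -- dict mapping lab code -> measured value (assumed layout)
    ptf_args -- dict mapping PTF name -> list of argument lab codes
    """
    known = set(batch)
    return sorted(name for name, args in ptf_args.items()
                  if set(args) <= known)


# Argument lists taken from Listings 1 and 2; the batch values are made up.
ptf_args = {
    "P3A1_PTF_1": ["depth", "_6A1", "P10_NR_S"],
    "LISTING_1_PTF": ["P10_NR_Z", "P10_NR_S"],
}
batch = {"depth": 0.1, "_6A1": 1.5, "P10_NR_S": 50.0}
matches = applicable_ptfs(batch, ptf_args)
print(matches)
```

Only P3A1_PTF_1 matches here, because the batch has no P10_NR_Z value for the other PTF.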

When a PTF calculates a new soil property, that property is placed in working memory along with the initial set. The addition of such new properties generates new subsets that match the argument lists of other PTFs and the process repeats until the knowledgebase is exhausted and all possible soil properties have been inferred from the initial set. A typical PTF propose rule is given in Listing 2.

(defrule propose::ptf_P3A1_1
  "Computes the soil bulk density - (g/cm^3)"
  (soil-property (batchId ?id)(labCode depth)(value ?depth)(error ?edepth)
                 (status normal|initial)(ptf ?ptf1))
  (soil-property (batchId ?id)(labCode _6A1)(value ?_6A1)(error ?e_6A1)
                 (status normal|initial)(ptf ?ptf2))
  (soil-property (batchId ?id)(labCode P10_NR_S)(value ?P10_NR_S)(error ?eP10_NR_S)
                 (status normal|initial)(ptf ?ptf3))
  =>
  ; Propose a candidate value and error for this PTF.
  (ptf-propose-candidate P3A1_PTF_1 P3A1 "1.36892296397839 - 0.138035067345618 * depth
    - 0.321710483557874 * log10(_6A1) + 0.00198950541325674 * P10_NR_S +
    (P10_NR_S - 48.8224031007752) * (P10_NR_S - 48.8224031007752) * -0.0000884352051558744"
    (list depth _6A1 P10_NR_S)
    (list ?depth ?_6A1 ?P10_NR_S)
    (list ?edepth ?e_6A1 ?eP10_NR_S)))

Listing 2. A PTF rule for computing soil bulk density using the given regression expression.
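The chaining behaviour described before Listing 2 (a newly computed property enabling further PTFs until nothing more can be inferred) can be sketched in a few lines of plain Python. This is a simplification under stated assumptions: it ignores errors and candidate selection, and the lab codes A to D and the toy functions are hypothetical.

```python
def infer_all(initial, ptfs):
    """Apply PTFs repeatedly until no new property can be derived.

    initial -- dict of lab code -> value supplied by the user
    ptfs    -- list of (target lab code, argument lab codes, function)
    """
    known = dict(initial)
    changed = True
    while changed:
        changed = False
        for target, args, func in ptfs:
            # Fire only when every argument is known and the target is new.
            if target not in known and all(a in known for a in args):
                known[target] = func(*(known[a] for a in args))
                changed = True
    return known


# Toy functions with hypothetical lab codes A, B, C and D.
ptfs = [
    ("D", ["C"], lambda c: c * 2.0),        # only usable once C is known
    ("C", ["A", "B"], lambda a, b: a + b),  # usable from the initial batch
]
result = infer_all({"A": 1.0, "B": 2.0}, ptfs)
print(result)
```

In Jess the rule engine does this bookkeeping for us: asserting a new soil-property fact is what activates the next rule.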

Lest it appear that the process of selecting PTFs is straightforward, let me point out that not only do the rules have to select which PTFs to apply, but they also have to allow the computed value and uncertainty of any soil property to be updated with a less uncertain one provided that:
  • The update does not cause a circular reference.
  • The update does not overwrite an initial input property.

SINFERS uses a modified propose-and-revise problem-solving method, implemented by three Jess modules. The propose module contains rules that allow each activated PTF rule to go ahead and compute its value and uncertainty, asserting these as candidates for consideration by the select module. The third module, the revise module, handles special cases. For example, if a PTF rule is activated, and the very property it is set to replace was used to compute one of its arguments, a circular reference forms and we abort the computation.
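One way to picture the two provisos and the circular-reference case is to track, for every property, the set of properties it was derived from. The Python sketch below is only an illustration under that assumption; the data layout, function names and lab codes A, X and Y are hypothetical and are not how the Jess modules are written.

```python
def causes_cycle(target, arg_codes, ancestors):
    """Return True if recomputing `target` from `arg_codes` would be circular.

    ancestors -- dict mapping lab code -> set of lab codes it was derived
                 from (initial inputs have an empty set)
    """
    return any(target in ancestors.get(code, set()) for code in arg_codes)


def may_update(target, arg_codes, new_error, current, ancestors):
    """Check the two provisos plus the 'less uncertain' test.

    current -- dict mapping lab code -> (error, status) for known properties
    """
    if target in current:
        error, status = current[target]
        if status == "initial" or new_error >= error:
            return False
    return not causes_cycle(target, arg_codes, ancestors)


# A is an initial input, X was derived from A, and Y was derived from X.
ancestors = {"A": set(), "X": {"A"}, "Y": {"X", "A"}}
current = {"A": (0.1, "initial"), "X": (0.5, "normal"), "Y": (0.6, "normal")}

circular = may_update("X", ["Y"], 0.2, current, ancestors)       # Y depends on X
allowed = may_update("X", ["A"], 0.2, current, ancestors)        # better X from A
overwrite = may_update("A", ["X"], 0.01, current, ancestors)     # A is initial
print(circular, allowed, overwrite)
```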

An example of a select rule is given in Listing 3.

(defrule select::choose-best-soil-property-P3A1
  "Selects best candidate providing most certain P3A1 and removes next best loser from WM"
  (declare (auto-focus TRUE))
  ?sp1 <- (soil-property (batchId ?id)(id ?id1)(error ?e1)(status candidate)
                         (labCode P3A1)(value ?v1)(ptf ?ptf1))
  ?sp2 <- (soil-property (batchId ?id)(id ?id2&~?id1)(error ?e2&:(< ?e2 ?e1))
                         (status candidate)(labCode P3A1)(value ?v2)(ptf ?ptf2))
  =>
  ; Remove loser
  (retract ?sp1)
  (printout t "ptf-" ?ptf2 " has replaced ptf-" ?ptf1
            " for providing the most certain P3A1 value = " ?v2 ", error = " ?e2 crlf))

Listing 3. A selection rule for choosing the best bulk density computation.
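The effect of Listing 3 firing repeatedly is that, for a given lab code, only the candidate with the smallest error survives. A rough Python equivalent of that pairwise elimination is shown below. The candidate records are made-up values, and the dictionary layout is an assumption for illustration.

```python
def select_best(candidates):
    """Retract losers pairwise until no candidate has a strictly better rival.

    candidates -- list of dicts with 'ptf', 'value' and 'error' keys
    """
    survivors = list(candidates)
    removed = True
    while removed:
        removed = False
        for sp1 in survivors:
            # Mirrors the (< ?e2 ?e1) test: is there a rival with less error?
            if any(sp2["error"] < sp1["error"] for sp2 in survivors):
                survivors.remove(sp1)  # the "retract" step
                removed = True
                break
    return survivors


# Hypothetical candidates for the same lab code.
candidates = [
    {"ptf": 1, "value": 1.41, "error": 0.30},
    {"ptf": 2, "value": 1.38, "error": 0.12},
    {"ptf": 3, "value": 1.45, "error": 0.20},
]
best = select_best(candidates)
print(best)
```

Because the comparison is strict, two candidates with exactly equal errors would both survive, just as neither would match as ?sp1 against the other in the Jess rule.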

I will save a detailed explanation of the inferencing logic for a later post.

Getting Automated
I'll start this thread by talking about how we automated some of the build processes concerning the SINFERS knowledgebase. One of the early decisions was to integrate Jython as an auxiliary scripting language for SINFERS. Jython provides a handy command line interpreter class that can be plugged into any Java application. With Jython, there is no need to use a tool like ANTLR to write your own scripting language. In fact, we could have used Jess as the internal scripting language for SINFERS (and may still for some parts of it) since Jess (the language) provides many of the same features. The cool thing is that we can drive Jess from Jython as well, so we get the best of both worlds.

When complex rules can be developed by applying a set of parameters to a template, it makes sense to automate the creation of that rule set. Rather than add potentially brittle classes to the SINFERS API, we decided to write simple Jython scripts to generate our rules.
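As a compact illustration of the template idea, the following plain-Python sketch builds the text of one propose rule in the shape of Listing 2 from a handful of parameters. The function name and the short sample expression are hypothetical; it is a stripped-down illustration, not the generator we actually use.

```python
def make_propose_rule(ptf_name, lab_code, ptf_id, expression, arg_codes):
    """Build the text of one propose rule in the shape of Listing 2."""
    lines = ['(defrule propose::ptf_%s_%s' % (lab_code, ptf_id)]
    # One conditional element per PTF argument.
    for n, code in enumerate(arg_codes, 1):
        lines.append('  (soil-property (batchId ?id)(labCode %s)(value ?%s)'
                     '(error ?e%s)(status normal|initial)(ptf ?ptf%d))'
                     % (code, code, code, n))
    lines.append('  =>')
    lines.append('  (ptf-propose-candidate %s %s "%s"'
                 % (ptf_name, lab_code, expression))
    lines.append('    (list %s)' % ' '.join(arg_codes))
    lines.append('    (list %s)' % ' '.join('?' + c for c in arg_codes))
    lines.append('    (list %s)))' % ' '.join('?e' + c for c in arg_codes))
    return '\n'.join(lines)


# Hypothetical parameters; the expression is shortened for readability.
rule = make_propose_rule("P3A1_PTF_1", "P3A1", "1",
                         "1.0 + 2.0 * depth", ["depth", "_6A1"])
print(rule)
```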

The Jython function that generates the proposal rules is given in Listing 4 below.

from java.lang import System, StringBuffer
from java.util import ArrayList


def makeRules(databasePath):
    # getFileFromPath, loadXMLDocument and ruleOutputFile are assumed to be
    # defined elsewhere in the build script.
    sys = System
    rules = StringBuffer()
    rules.append('(defmodule propose)\n\n')

    xmlFileObj = getFileFromPath(databasePath)
    dom = loadXMLDocument(xmlFileObj)
    dependentVars = ArrayList()

    root = dom.getRootElement()

    ptfsCol = root.getChild('ptfs')
    ptfs = ptfsCol.getChildren('ptf')

    for ptf in ptfs:
        # Set var count
        n = 1

        # Extract all the body elements
        ptfIdElement = ptf.getChild('id')
        ptfLabCodeElement = ptf.getChild('labCode')
        ptfNameElement = ptf.getChild('ptfName')
        ptfFunctionTypeElement = ptf.getChild('functionType')
        ptfLinsCorrelationCoefficientElement = ptf.getChild('linsCorrelationCoefficient')
        ptfRmsErrorElement = ptf.getChild('rmsError')
        ptfDependentTransformElement = ptf.getChild('dependentTransformation')
        ptfCovarianceMatrixElement = ptf.getChild('covarianceMatrix')
        ptfClusterCountElement = ptf.getChild('clusterCount')
        ptfTrainingSampleCountElement = ptf.getChild('trainingSampleCount')
        ptfAlphaElement = ptf.getChild('alpha')
        ptfFuzzyExponentElement = ptf.getChild('fuzzyExponent')
        ptfExpressionElement = ptf.getChild('expression')
        ptfCreationDateElement = ptf.getChild('creationDate')
        ptfCountryCodeElement = ptf.getChild('countryCode')
        ptfIndependentVariablesCol = ptf.getChild('independentVariables')
        # Assumption: the child element is named 'independentVariable',
        # following the ptfs/ptf and trainingClusters/trainingCluster pattern.
        ptfIndependentVariables = ptfIndependentVariablesCol.getChildren('independentVariable')
        ptfTrainingClustersCol = ptf.getChild('trainingClusters')
        ptfTrainingClusters = ptfTrainingClustersCol.getChildren('trainingCluster')

        # Extract the field values
        ptfId = ptfIdElement.getText()
        ptfLabCode = ptfLabCodeElement.getText()
        ptfName = ptfNameElement.getText()
        ptfFunctionType = ptfFunctionTypeElement.getText()
        ptfLinsCorrelationCoefficient = ptfLinsCorrelationCoefficientElement.getText()
        ptfRmsError = ptfRmsErrorElement.getText()
        ptfDependentTransform = ptfDependentTransformElement.getText()
        ptfCovarianceMatrix = ptfCovarianceMatrixElement.getText()
        ptfClusterCount = ptfClusterCountElement.getText()
        ptfTrainingSampleCount = ptfTrainingSampleCountElement.getText()
        ptfAlpha = ptfAlphaElement.getText()
        ptfFuzzyExponent = ptfFuzzyExponentElement.getText()
        ptfExpression = ptfExpressionElement.getText()
        ptfCreationDate = ptfCreationDateElement.getText()
        ptfCountryCode = ptfCountryCodeElement.getText()

        # Store the dependent variable lab codes for later
        if not dependentVars.contains(ptfLabCode):
            dependentVars.add(ptfLabCode)

        rules.append('(defrule propose::ptf_' + ptfLabCode + '_' + ptfId + '\n')
        rules.append('\"Insert ptf purpose here\"\n')

        # Store the PTF arguments for later
        args = ArrayList()
        for variable in ptfIndependentVariables:
            varLabCodeElement = variable.getChild('labCode')
            varLabCode = varLabCodeElement.getText()
            args.add(varLabCode)

        # Create a conditional element for each argument in the PTF
        for variable in ptfIndependentVariables:
            varLabCodeElement = variable.getChild('labCode')
            varLabCode = varLabCodeElement.getText()
            rules.append('(soil-property (batchId ?id)(labCode ' +
                         varLabCode + ')(value ?' +
                         varLabCode + ')(error ?e' +
                         varLabCode + ') (status normal|initial)(ptf ?ptf' +
                         str(n) + '))\n')
            n = n + 1

        # Separate the rule's conditions from its actions
        rules.append('=>\n')
        rules.append('; ?ptfName ?labCode ?expression ?arg-syms ?arg-vals ?arg-errs\n')
        rules.append('(ptf-propose-candidate ' + ptfName + ' ' +
                     ptfLabCode + ' \"' + ptfExpression + '\" ')
        # Emit the three argument lists shown in Listing 2
        rules.append('(list')
        for arg in args:
            rules.append(' ' + arg)
        rules.append(') ')
        rules.append('(list')
        for arg in args:
            rules.append(' ?' + arg)
        rules.append(') ')
        rules.append('(list')
        for arg in args:
            rules.append(' ?e' + arg)
        # Close the last list, the function call and the defrule
        rules.append(')))\n\n')

    # Build the selection rules

    rules.append('(defmodule select)\n\n')

    for var in dependentVars:
        rules.append('(defrule select::choose-best-soil-property-' + var + '\n')
        rules.append('\"Selects the best candidate for soil property ' +
                     var + ' and removes the next best loser from WM\"\n')
        rules.append('(declare (auto-focus TRUE))\n')
        rules.append('?sp1 <-(soil-property (batchId ?id)(id ?id1)(error ?e1)' +
                     '(status candidate) (labCode ' + var + ') (value ?v1) (ptf ?ptf1))\n')
        rules.append('?sp2 <-(soil-property (batchId ?id)(id ?id2&~?id1)' +
                     '(error ?e2&:(< ?e2 ?e1)) (status candidate) (labCode ' + var + ')' +
                     '(value ?v2) (ptf ?ptf2))\n')
        rules.append('=>\n')
        rules.append('; Remove loser\n')
        rules.append('(retract ?sp1)\n')
        rules.append('(printout t "ptf-" ?ptf2 " has defeated ptf-" ?ptf1 " ' +
                     ' for soil property ' + var + ' value = " ?v2 ", error = " ?e2 crlf))\n\n')

    sys.out.println('Rule file written OK. \nOUTPUT: ' + ruleOutputFile)
    return rules.toString()

Listing 4. A Jython function for generating SINFERS PTF rules.

In the next installment, I'll talk more about the inferencing considerations and how we arrived at the logic.


[1] See