Saturday 7 July 2007

SBML standard and problems with libsbml

I have spent several days fighting a problem with my Java program, which uses libsbml indirectly through the Java Native Interface. Every once in a while the whole Java virtual machine would crash while reading an SBML model. So I performed a detailed investigation of what is wrong with libsbml and the standard itself.

Let's begin with the SBML schema. My program was using the SBML Level 2 Version 2 schema, and the most recent one is SBML Level 2 Version 3. I specifically checked whether the problem persists with the newer version, and it does, so my comments are still valid. Now, the key point: the schema defines a specific XML namespace for SBML, and for SBML Level 2 Version 3 it is:
targetNamespace="http://www.sbml.org/sbml/level2/version3" 
However, libsbml, the de facto standard library for SBML support, crashes badly when processing a model compliant with the standard. I begin my model with the following:
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level2/version3" level="2" version="3">
This is exactly what the schema prescribes. Next, I try to read this model with the readSBML example program supplied with libsbml. Guess what happens?
$ ./readSBML model1.xml
Segmentation fault
Too bad, as this means that the problem is not with my code but with libsbml itself. Running readSBML under valgrind shows obvious memory management errors in libsbml's XML parser. The problem persists with both the expat and xerces-c implementations.

But why do the majority of software tools work fine with this XML parser implementation? I went on and took a look at what Copasi produces when exporting a model to SBML. Surprisingly, the model produced by Copasi does not comply with the SBML standard: the root element of the produced model uses the wrong namespace:
http://www.sbml.org/sbml/level2
I fed that (non-compliant) model to readSBML, and everything went fine. I even ran readSBML under valgrind with the new model, and look:
==15583== ERROR SUMMARY: 0 errors from 0 contexts
==15583== All heap blocks were freed -- no leaks are possible.
Unbelievable! But is it a problem with Version 3 (and Version 2) schema support or with the whole SBML standard? Just type the namespace http://www.sbml.org/sbml/level2 into your browser, and it will show you the proper schema. The schema, however, clearly specifies:
targetNamespace="http://www.sbml.org/sbml/level2/version3"
So, the standard itself is inconsistent, and support for the standard is badly broken.
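
To check which namespace a given model actually declares on its root element, something like the following Java sketch can be used. It relies only on the standard StAX API (javax.xml.stream); the class and method names are made up for illustration and are not part of libsbml or of my program.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reports the namespace URI of the root element of an XML document.
final class SbmlRootNamespace {

    // Returns the namespace URI of the first (root) element, or null if it has none.
    static String rootNamespace(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The first START_ELEMENT event is the root element.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getNamespaceURI();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrap the checked exception so callers do not have to declare it.
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }
}
```

For the model shown at the top of this post it reports the versioned namespace, and for the Copasi export it reports the plain level2 one.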

What can we do at the moment? Well, the only workaround is to break the standard and continue using the wrong namespace. Does this require attention from the SBML consortium? I think so. What can I do to write an SBML-compliant software tool? Write my own SBML support library. Will a properly SBML-compliant tool read the vast majority of published models? I doubt it.

What have I done to fix my problem? I wrote a little routine which converts proper models to the broken form and then reads them with the libsbml parser.
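
The idea behind such a routine can be sketched in a few lines of Java. This is only an illustration under some assumptions, not the exact code from my program: the class and method names are invented, it works on the model text held in a String, and it only rewrites a default namespace declaration (xmlns="..."), not a prefixed one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the versioned SBML Level 2 namespace
// to the unversioned one before the model text is handed to the parser.
final class SbmlNamespaceDowngrade {

    // Matches xmlns="http://www.sbml.org/sbml/level2/versionN" with either quote style.
    private static final Pattern VERSIONED_NS = Pattern.compile(
            "(xmlns\\s*=\\s*)([\"'])http://www\\.sbml\\.org/sbml/level2/version\\d+\\2");

    // Returns the model text with the first versioned namespace declaration replaced.
    // Text without such a declaration is returned unchanged.
    static String downgrade(String xml) {
        Matcher m = VERSIONED_NS.matcher(xml);
        return m.replaceFirst("$1$2http://www.sbml.org/sbml/level2$2");
    }
}
```

The level and version attributes are left untouched, so only the namespace string changes. The converted text can then be passed on to the libsbml reader.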
