Transforming DocBook to HTML in .NET
Julián Hidalgo May 17th, 2008
One of the things I’m currently involved in is writing a book about Gallio. We chose DocBook as the format, even though XML sucks, because it’s a mature technology with lots of documentation and strong tool support, and because we need to render the book in multiple output forms (HTML, PDF and so on).
This week I started to research how to convert what we have so far to HTML, so we can upload it to the web site. Since DocBook is XML-based it’s not surprising that the simplest way to do it is by using XSL stylesheets. A free set of them is available at the project’s web site and there is even a full book devoted to them.
To make things as smooth as possible I decided I’d do the processing in .NET, but it was not as easy as I thought it would be, so I decided to blog about it in case anyone else needs a quick start.
Let me start by showing you the sample book we’ll be transforming:
<?xml version="1.0" encoding="UTF-8"?>
<book version="5.0"
      xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>My Book</title>
  <xi:include href="Chapter1.xml" />
  <xi:include href="Chapter2.xml" />
</book>
It’s pretty basic: we only have a title and two chapters. Notice that I’m using XInclude instead of entities to include the chapters. In case you are interested, the book “DocBook: The Definitive Guide” includes a list of the tradeoffs involved in choosing either approach. To be honest I don’t find any compelling reason to prefer XInclude (it was Jeff who persuaded me!), but I’ll leave that discussion for another post. Here’s the XML markup for Chapter 1:
<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xmlns="http://docbook.org/ns/docbook">
  <title>Chapter 1’s Title</title>
  <section>
    <title>A section</title>
    <para>
      Lorem ipsum dolor sit amet consectetuer adipiscing.
      [More random text]
      Suspendisse potenti. Etiam hendrerit cursus eros.
    </para>
  </section>
</chapter>
Chapter 2 is the same thing. In the real Gallio book we also have things like code listings and images. I still don’t know how to format the code listings (that will be the subject of a future post), but images just work without doing anything special.
Before I show any code you’ll want to download the DocBook XSL stylesheets from the SourceForge project. Make sure you pick the DocBook XSL-NS stylesheets (the docbook-xsl-ns-xxx package) or you’ll get a warning when using them (“WARNING: cannot add @xml:base to node set root element. Relative paths may not work”). They are the recommended stylesheets for DocBook V5.0 anyway. When you decompress the package you’ll see a bunch of folders and files - don’t panic, we only need to mess with one of them.
Now we are ready to start. We’ll use the XslCompiledTransform class (which is .NET’s XSLT processor) to load and transform our DocBook file. There are two steps involved: loading the stylesheet and applying it to the XML file.
1. Loading the Style Sheet
We want the book in HTML format, so we have to use the ‘docbook.xsl’ stylesheet (located in the html folder), which generates a single HTML output file. There’s also another potentially useful one called ‘chunk.xsl’ that generates multiple linked files, which is best suited for a book, but when I tried it I got an error that seemed to be the stylesheet’s fault (something like “an ‘xsl:apply-imports’ element can only appear within a ‘xsl:template’ element”), so we’ll stick to the single output file for now :)
This is the code to load the stylesheet:
1 private static void LoadXsl(XslCompiledTransform transform)
2 {
3     XmlReaderSettings settings = new XmlReaderSettings();
4     settings.ProhibitDtd = false;
5     using (XmlReader reader = XmlReader.Create(
6         @"docbook-xsl-ns-1.73.2\html\docbook.xsl",
7         settings))
8     {
9         XsltSettings xsltSettings = new XsltSettings();
10         xsltSettings.EnableDocumentFunction = true;
11         transform.Load(reader, xsltSettings, new XmlUrlResolver());
12     }
13 }
We are basically creating an XmlReader object pointed at the stylesheet we want to use and passing it to the Load method of an XslCompiledTransform instance. This is pretty straightforward, but we need to set a couple of options to make it work:
- By default DTD processing is prohibited, so we’ll get an XmlException when loading the stylesheet. To fix this we set the ProhibitDtd property to false in line 4.
- The document() function is disabled by default. This is used by the stylesheets, so we need to enable it or we’ll get yet another exception. This is done in lines 9-10 where we create a new XsltSettings object and set its EnableDocumentFunction property to true.
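If you are on .NET 4 or later, note that ProhibitDtd has been marked obsolete there in favour of the DtdProcessing property. The following is a minimal, self-contained sketch of the same idea, assuming a newer framework than the one used in this post; the DtdDemo class and the tiny inline document are made up for illustration and are not part of the sample code.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DtdDemo
{
    // Hypothetical document with an internal DTD subset, standing in for
    // any XML file (stylesheet or book) that carries a DOCTYPE.
    const string Xml =
        "<!DOCTYPE doc [<!ENTITY who \"world\">]><doc>hello &who;</doc>";

    // Reads the document with the given DTD policy and returns its text.
    public static string Read(DtdProcessing mode)
    {
        // DtdProcessing.Parse plays the role of ProhibitDtd = false.
        XmlReaderSettings settings = new XmlReaderSettings { DtdProcessing = mode };
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml), settings))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc.DocumentElement.InnerText;
        }
    }

    public static void Main()
    {
        try
        {
            // Prohibit is the default, so this mirrors the failure described above.
            Read(DtdProcessing.Prohibit);
        }
        catch (XmlException e)
        {
            Console.WriteLine("Prohibit: " + e.Message);
        }

        Console.WriteLine("Parse: " + Read(DtdProcessing.Parse));
    }
}
```

The same settings object can be handed to XmlReader.Create when opening the stylesheet, exactly as in LoadXsl.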
2. Applying the Transform
This is the code required to apply the stylesheet:
1 private static void Transform(XslCompiledTransform transform)
2 {
3     XmlWriterSettings writerSettings = new XmlWriterSettings();
4     writerSettings.ConformanceLevel = ConformanceLevel.Auto;
5
6     using (XmlWriter writer = XmlWriter.Create(@"book.html", writerSettings))
7     {
8         XsltArgumentList arguments = new XsltArgumentList();
9         arguments.AddParam("html.stylesheet", String.Empty, "styles.css");
10         transform.Transform(CreateReader(), arguments, writer);
11     }
12 }
Again, a couple of points to notice:
- In line 4 we set the conformance level to ConformanceLevel.Auto through an XmlWriterSettings instance. For some reason, if you don’t do this an InvalidOperationException is thrown complaining about the validity of the output file, even though the HTML file looks fine.
- We can give parameters to the stylesheet. In this case, in line 9 I’m defining the name of the CSS stylesheet to use (this will generate a link tag in the output file).
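To see the parameter mechanism in isolation, here is a minimal, self-contained sketch. The miniature stylesheet and the ParamDemo class are hypothetical stand-ins for docbook.xsl and the sample code; only the parameter name html.stylesheet comes from the code above. It also creates the writer from transform.OutputSettings, which is an alternative to setting the conformance level by hand. I haven't checked how that behaves with the real DocBook stylesheets, so treat it as something to try rather than a confirmed fix.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ParamDemo
{
    // Hypothetical miniature stylesheet: it declares one parameter and
    // uses it to emit a <link> element, much like docbook.xsl does.
    const string Xsl = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html'/>
  <xsl:param name='html.stylesheet'/>
  <xsl:template match='/'>
    <html>
      <head><link rel='stylesheet' href='{$html.stylesheet}'/></head>
      <body><h1><xsl:value-of select='book/title'/></h1></body>
    </html>
  </xsl:template>
</xsl:stylesheet>";

    const string Book = "<book><title>My Book</title></book>";

    // Transforms the inline book and returns the generated HTML.
    public static string Render(string cssFile)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            transform.Load(xslReader);
        }

        // The parameter lives in no namespace, hence String.Empty.
        XsltArgumentList arguments = new XsltArgumentList();
        arguments.AddParam("html.stylesheet", String.Empty, cssFile);

        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(Book)))
        // OutputSettings reflects the stylesheet's xsl:output element.
        using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, arguments, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Render("styles.css"));
    }
}
```

The value passed to AddParam ends up in the href attribute of the generated link element, which is the same effect line 9 has on the DocBook output.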
In case you are wondering, the CreateReader method looks like this:
private static XmlReader CreateReader()
{
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.ProhibitDtd = false;
    return XmlReader.Create(bookPath, readerSettings);
}
We are just creating an XmlReader pointed at the book file, with ProhibitDtd set to false as we did for the stylesheet.
We run the code and see the following output (after manually opening the resulting file of course):
As you can see, the title is OK, but the chapters are not included. The following warning gives us a clue:
Element include in namespace ‘http://www.w3.org/2001/XInclude’ encountered in book, but no template matches.
It turns out XslCompiledTransform doesn’t support XInclude. Luckily there’s a project called Mvp.Xml that provides this functionality through a class called XIncludingReader. If you read the documentation in their wiki it looks as if it’s only a matter of returning an XIncludingReader instance from the CreateReader method to make everything work, but it’s not. For some reason, an XmlException is thrown complaining about the DTD processing prohibition again (don’t you start to hate that little feature?). The solution is to subclass XIncludingReader and override its Settings property, making sure the returned object has the ProhibitDtd property set to false. None of the constructors in the XIncludingReader class accepts an XmlReaderSettings object, so it’s the only way.
This is the new class:
class XIncludeEnabledReader : XIncludingReader
{
    XmlReaderSettings settings = new XmlReaderSettings();

    public XIncludeEnabledReader(string path)
        : base(path)
    {
        settings.ProhibitDtd = false;
    }

    public override XmlReaderSettings Settings
    {
        get { return settings; }
    }
}
With the new code everything works as expected:
It took me a while to figure the subclassing thing out, because internally the XIncludingReader class uses a normal XmlReader where the DTD prohibition is disabled. I don’t know how the XslCompiledTransform class interacts with the reader it receives, but it (or some other class) must be looking at this property at some point because otherwise the exception wouldn’t be thrown. I was about to start debugging into the Framework’s source to see what was going on when I thought of subclassing. The clue was discovering that XIncludingReader.Settings was always null. Seeing now all the effort it took, I wish I’d used entities from the beginning, but anyway.
So, that’s all. I’ve uploaded a zip file with the sample code. Please make sure you fix the path in line 48 of Program.cs before running it.
Hope it helps :)