reStructuredText extensions

I’m writing myself a small website, with basic contact info and perhaps a blog. For the fun of it, I’m using Django (for those not in the know, Django is a Python web application framework).

Edit: My website is up and running although there is no code content yet to show off the stuff discussed below.

Now, I don’t like most WYSIWYG editors at all. I think they only help you strip the semantic structure from your documents and make you focus on visual layout instead. I find it best to either design the visuals up front and then not worry about them, or write the document first and then see what visual layout fits its semantic structure.

For this reason, I like markup languages, such as Markdown, which I’m using right now to write this entry. However, for my website/blog I wanted something a little more powerful and extensible. Since I’m a Python programmer by trade, what better language to use than reStructuredText, the language often used for inline documentation of Python code?

I don’t want to waste your time by detailing my preferences; what I do want to write about is how easy it is to extend reStructuredText with custom code. In my case, I wanted to be able to write inline LaTeX formulas on my site and have them automatically converted to PNGs for display in the browser. There is a WordPress plugin for this, and a custom version seems to be in effect on the public WordPress.com blog host as well. The following is my account of quickly implementing this in Python and plugging it into my website as an rST (reStructuredText) extension.

LaTeX rendering

The first step is to throw together a script that renders LaTeX formulas into PNGs. For this I shamelessly stole the idea from LatexRender, the WordPress plugin mentioned above. First off, we need some imports and a function to wrap a formula in a proper document. The preamble shown here corresponds to the one from LatexRender, but of course you can use your own.

import tempfile
import os
import hashlib
import shutil
 
def wrap_formula(formula, font_size, latex_class):
    return r"""\documentclass[%(font_size)spt]{%(latex_class)s}
               \usepackage[latin1]{inputenc}
               \usepackage{amsmath}
               \usepackage{amsfonts}
               \usepackage{amssymb}
               \pagestyle{empty}
               \newsavebox{\formulabox}
               \newlength{\formulawidth}
               \newlength{\formulaheight}
               \newlength{\formuladepth}
               \setlength{\topskip}{0pt}
               \setlength{\parindent}{0pt}
               \setlength{\abovedisplayskip}{0pt}
               \setlength{\belowdisplayskip}{0pt}
               \begin{lrbox}{\formulabox}
               $%(formula)s$
               \end{lrbox}
               \settowidth {\formulawidth}  {\usebox{\formulabox}}
               \settoheight{\formulaheight} {\usebox{\formulabox}}
               \settodepth {\formuladepth}  {\usebox{\formulabox}}
               \newwrite\foo
               \immediate\openout\foo=\jobname.depth
                   \addtolength{\formuladepth} {1pt}
                   \immediate\write\foo{\the\formuladepth}
               \closeout\foo
               \begin{document}
               \usebox{\formulabox}
               \end{document}""" % locals()
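
As a small illustration of the substitution wrap_formula relies on, here is a sketch with a shortened, hypothetical template (TEMPLATE and fill are names made up for this example and are not part of latexrender.py). It passes an explicit dictionary instead of locals(), which should behave the same way.

```python
# Shortened, hypothetical template: only the %(name)s placeholders matter here.
TEMPLATE = r"""\documentclass[%(font_size)spt]{%(latex_class)s}
\begin{document}
$%(formula)s$
\end{document}"""


def fill(formula, font_size=11, latex_class="article"):
    # An explicit dict does the same job as locals() does in wrap_formula.
    values = {
        "formula": formula,
        "font_size": font_size,
        "latex_class": latex_class,
    }
    return TEMPLATE % values


print(fill(r"e^{i\pi} + 1 = 0"))
```

Note that a literal % inside the template itself would have to be written as %%, while a % inside the formula is left alone, because substituted values are not scanned again.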

Now we need a function that, given a formula and a destination directory, invokes the necessary commands to make a PNG and saves it in that directory. For efficiency, we use the MD5 sum of the formula as the name of the PNG, so we can simply skip rendering if that file already exists (taking our chances on hash collisions).

def render_formula(formula, folder, font_size=11, latex_class='article'):
    # hashlib needs bytes on Python 3, so encode the formula text first
    hash = hashlib.md5(formula.encode('utf-8')).hexdigest()
    if os.path.exists(os.path.join(folder, hash + ".png")):
        return hash + ".png"
 
    tempdir = tempfile.mkdtemp()
    curpath = os.getcwd()
    os.chdir(tempdir)
 
    # open() replaces the old file() builtin; the with block closes the file
    with open('formula.tex', 'w') as f:
        f.write(wrap_formula(formula, font_size, latex_class))
 
    status = os.system("latex --interaction=nonstopmode formula.tex")
    assert 0==status, tempdir
 
    status = os.system("dvips -E formula.dvi -o formula.ps")
    assert 0==status, tempdir
 
    status = os.system("convert -density 120 -trim -transparent \"#FFFFFF\" formula.ps formula.png")
    assert 0==status, tempdir
 
    os.chdir(curpath)
 
    shutil.copyfile(os.path.join(tempdir, "formula.png"), os.path.join(folder, hash + ".png"))
    shutil.rmtree(tempdir)
 
    return hash+".png"

Now, this is all very quick ’n’ dirty: there is no error checking or recovery, as the point is to show off how easy it is to plug this into rST. I saved the code so far in latexrender.py.
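
For the curious, here is a rough sketch of what slightly more careful plumbing might look like. It is not the code running on my site: cache_name and run_in_tempdir are hypothetical helpers, and the sketch assumes Python 3, where hashlib wants bytes and subprocess.run is available. It uses cwd= instead of os.chdir and always removes the temporary directory, even when a command fails.

```python
import hashlib
import os
import shutil
import subprocess
import sys
import tempfile


def cache_name(formula, suffix=".png"):
    # Same idea as in render_formula: name the image after the MD5 of the formula.
    return hashlib.md5(formula.encode("utf-8")).hexdigest() + suffix


def run_in_tempdir(commands, wanted):
    """Run each command (a list of arguments) inside a fresh temporary
    directory and return the bytes of the file named `wanted`."""
    tempdir = tempfile.mkdtemp()
    try:
        for argv in commands:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, cwd=tempdir, check=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        with open(os.path.join(tempdir, wanted), "rb") as f:
            return f.read()
    finally:
        # Clean up whether or not the commands succeeded.
        shutil.rmtree(tempdir, ignore_errors=True)


# Demo with a harmless stand-in command; the real list would hold the
# latex, dvips and convert invocations from render_formula.
demo = [[sys.executable, "-c", "open('out.txt', 'w').write('ok')"]]
print(run_in_tempdir(demo, "out.txt"))
```

One trade-off to be aware of: render_formula leaves the temporary directory behind when an assertion fails, which is handy for debugging, whereas this sketch throws it away.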

Defining rST directives and roles

In Django, I’m using the template_utils application for quick access to some markup languages, following the lead of James Bennett and his blog application: my Django model simply contains a text field that is run through a markup processor on each save, and the resulting HTML is stored in another field. Quite simply, the model looks like this:

from django.db import models
from template_utils.markup import formatter
 
class Page(models.Model):    
    key = models.CharField(max_length=30, primary_key=True)
    body = models.TextField()
    body_html = models.TextField(editable=False, blank=True)
 
    class Admin:
        fields = (
            ('Meta', { 'fields': ('key', 'title', 'page_title', 'page_subtitle' )}),
            ('Body', { 'fields': ('body',), 'classes': 'monospace' }),
        )
 
    def save(self):
        self.body_html = formatter(self.body)  # <-- that's basically all it takes...
        super(Page, self).save()

Now, I may be abusing the Django settings file a little, because I import a module rstextensions from it (we’ll see its contents shortly). The relevant part of my settings.py looks like this:

import rstextensions
MARKUP_FILTER = ('restructuredtext', {})
LATEX_IMG_DIRECTORY = '/Users/arnarb/Documents/Projects/django/arnar/static/latex'
LATEX_IMG_URL = '/static/latex/'

And now for the whole point: how to define rST extensions. rST has two main extension mechanisms: directives, for block-level content, and roles, for inline content. We’ll use both. All code appearing from now on forms the contents of rstextensions.py. First, some imports and a simple function that takes a formula string, passes it to latexrender and returns the corresponding URL, using the LATEX_IMG_* settings from above.

from docutils import nodes, utils
from docutils.parsers.rst import directives, roles
from django.conf import settings
import latexrender
 
def get_latex_img(formula):
    fname = latexrender.render_formula(formula, settings.LATEX_IMG_DIRECTORY)
    return settings.LATEX_IMG_URL + fname

To define an rST directive, we define a handler function and register it.

def latex_directive(name, arguments, options, content, lineno,
                    content_offset, block_text, state, state_machine):
    url = get_latex_img('\n'.join(content))
    return [nodes.raw('', '<img class="formula" src="%s" />' % url, format='html')]
 
latex_directive.content = 1
directives.register_directive('latex', latex_directive)

What is happening here is that we define a function with a standard signature (refer to the relevant documentation) and tell docutils that we want it to be the handler for the latex directive. On seeing such a directive, the rST processor will call our function with the block text in content, which will be a list of the lines in the block. We simply generate the PNG and the corresponding URL, plug the URL into an <img> tag and return that as a raw document node. (Note: the proper thing here would have been to return a nodes.image object, but it didn’t work right away and this served my purposes well.)

Now I can write rST code like this

Check this out:
 
.. latex::
 
   e^{i\pi} + 1 = 0
 
Regular text continues here.

and the rST processor correctly generates the PNG and replaces the block with the appropriate <img> tag. Excellent!

Now, since I also want to use inline math expressions, let’s define an rST role as well. Roles are text strings of the form :role:`some content` within a paragraph. Defining one is just as simple as defining a directive; it just takes a different signature and a different registration function.

def latex_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
    src = rawtext.split('`')[1].replace('\\\\','\\')  # Restore escaped backslashes
    url = get_latex_img(src)
    return [nodes.raw(rawtext, '<img class="inlineformula" src="%s" />' % url, format='html')], []
 
roles.register_canonical_role('latex', latex_role)

Since the rST parser hands role handlers an interpreted version of the content in the text parameter, we must use the rawtext parameter instead. That one contains the whole unparsed role text, including the role name and the backticks. We fish out the part between the backticks and, since rST has been kind enough to double up all backslashes, we need to reverse that before passing it to LaTeX. Unlike the directive handler, this function must return two lists: one with the nodes to be injected (like the directive one did) and one with system messages that will be injected right after the block that contains the inline element (see the relevant docs for details). Notice how the <img> tag has a different class name, so we can style inline formulas differently in our CSS.
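
To make the string handling concrete, here is the extraction step on its own, as a sketch. formula_from_rawtext is a hypothetical helper that is not part of rstextensions.py, and it assumes rawtext arrives with doubled backslashes as described above. It looks for the first and last backtick rather than splitting, so a backtick inside the formula would not cut it short.

```python
def formula_from_rawtext(rawtext):
    # Hypothetical helper, shown only to illustrate the extraction step.
    start = rawtext.index("`") + 1   # just past the opening backtick
    end = rawtext.rindex("`")        # position of the closing backtick
    # Collapse doubled backslashes, as latex_role does.
    return rawtext[start:end].replace("\\\\", "\\")


# The Python literal below is the text  :latex:`\\alpha + \\beta`
raw = ":latex:`\\\\alpha + \\\\beta`"
print(formula_from_rawtext(raw))  # prints: \alpha + \beta
```

The same function also copes with the bare backtick form, where rawtext carries no role name.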

Now I’m a happy camper as I can write rST code like this

The transition :latex:`\left< X,s \right> \Rightarrow \left< X^\prime, s \right>` is contrived!

and all is well :o)

A handy feature of rST is the default role. The default role is a global role that is applied to all strings enclosed in backticks if there is no preceding role name. So, if I have a text with lots of formulas, I can set the default role with a command like this one

.. default-role:: latex

and from then on, I only have to wrap inline LaTeX expressions in a pair of backticks.

5 Comments

  1. einar:

    Nice article Arnar. I have actually been playing around a bit with some basic Django tutorials over the last couple of days, since I’m both new to the framework as well as the Python language.

    My only comment on your code is in the get_latex_img(formula) function, where the local url variable seems to be redundant (it is never used).

    Do you have any plans on packaging this solution so it can be easily plugged into other Django projects?

  2. Arnar:

    Thanks Einar, you were right, that was completely redundant. Serves me right for editing code in a web form.

    I didn’t plan on packaging this up specifically, as it is so simple, actually. I’d have to add some sane error handling too.

    As for Django, I recommend the Apress book. It should be noted that I’m no Django expert though :)

  3. einar:

    Yes, I was planning to take a look at that book, or to be more precise the free online version
