Parsing JSON with Haskell
Note: This post is literate Haskell, you can copy the whole text to a .lhs file and compile it with GHC.
A friend of mine was showing me a very simple and elegant parser for JSON, written in Scala. As an excercise, I decided to write one in Haskell, using the Parsec parser combinator library. Here’s a quick walkthrough.
First we need some imports,
> import qualified Data.Map as Map > import Control.Monad (liftM) > import Text.ParserCombinators.Parsec > import qualified Text.ParserCombinators.Parsec.Token as T > import Text.ParserCombinators.Parsec.Language
Now, to represent a JSON value, we use an algebraic data type. For simplicity, we’ll only worry about integers and not floating point numbers. The definition of the type is simple, to be concise we use single letter constructor names (not a good idea if you want to publish a proper library).
> data JSON = S String > | I Integer > | B Bool > | L [JSON] > | O (Map.Map String JSON) > deriving Show
Now, to make our life somewhat easier, we’ll make use of the facilities in the ParsecToken module. This module
provides us with ways to create functions for generic lexers, which deal correctly with whitespace (and comments)
in a language. The first step is to create a language definition, in our case it only specifies how JSON comments
are delimited. From the definition we create a token parser which we call lexer.
> ldef = emptyDef { commentStart = "/*" > , commentEnd = "*/" > , commentLine = "//" > } > lexer = T.makeTokenParser ldef
We can now pass lexer to the convenience functions of ParsecToken.
As is common with such parsers, we rebind their names in our module to spare us referring to lexer
every time we use them.
> symbol = T.symbol lexer > stringLiteral = T.stringLiteral lexer > integer = T.integer lexer > squares = T.squares lexer > braces = T.braces lexer > commaSep = T.commaSep lexer
And now the parser itself. The basic element is a JSON value. From now on I’ll include the type signatures to make things easier to follow, but they are of course not necessary.
> value :: Parser JSON > value = liftM S stringLiteral > <|> liftM I integer > <|> liftM B boolean > <|> liftM L list > <|> liftM O object
A value is simply a choice between each of the basic types, a string, an integer, a boolean, a list of values or an object (which is simply a mapping from strings to values). We defer each to its own parser, and simply lift the value constructors to the parser monad. We get parsing of strings and integers completely for free from the ParsecToken module, but the other three we need to handle ourselves. The boolean case is simple,
> boolean :: Parser Bool > boolean = do symbol "true" > return True > <|> do symbol "false" > return False
Surprisingly, the list is even simpler, due to the helpers from ParsecToken.
> list :: Parser [JSON] > list = squares $ commaSep value
Parsing objects is also fairly simple, basically just like list except that instead of parsing a
comma seperated list of values, we parse a comma seperated list of key/value pairs. Such a pair is handled
by the nested keyValue parser.
> object :: Parser (Map.Map String JSON) > object = liftM Map.fromList $ braces $ commaSep keyValue > where > keyValue = do key <- stringLiteral > symbol ":" > val <- value > return (key, val)
And that’s it! To test this easily, we finally add a main action that simply parses a JSON value from standard input.
> main = getContents >>= parseTest value
Testing the parser
A simple test, including a comment in the middle:
arnarb-macbook:tmp arnarb$ runhaskell json.hs
[1,"two",false,{"one":1, /* a comment here */ "two":2, "list":[]}]
< here I pressed ctrl-d to insert an EOF, rest is output >
L [I 1,S "two",B False,O (fromList [("list",L []),("one",I 1),("two",I 2)])]
As an more complex test, we run it on the first example from json.org (newlines and indent added to the output for clarity).
arnarb-macbook:tmp arnarb$ runhaskell json.hs
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
< here I pressed ctrl-d to insert an EOF, rest is output >
O (fromList [
("glossary",O (fromList [
("GlossDiv",O (fromList [
("GlossList",O (fromList [
("GlossEntry",O (fromList [
("Abbrev",S "ISO 8879:1986"),
("Acronym",S "SGML"),
("GlossDef",O (fromList [
("GlossSeeAlso",L [S "GML",S "XML"]),
("para",S "A meta-markup language, used to create markup languages such as DocBook.")
])),
("GlossSee",S "markup"),
("GlossTerm",S "Standard Generalized Markup Language"),
("ID",S "SGML"),
("SortAs",S "SGML")
]))
])),
("title",S "S")
])),
("title",S "example glossary")
]))
])
Finally, we should note that many people have done this before :)
Edits:
- Added link to Scala version.
- Small revisisons, thanks to
kpreidon#haskellfor pointing outbracesandsquares. - Stop lugging
lexer around, thanks to Paczesiowa on reddit for useful suggestions.
oops sorry I missed the note about skipping floats.
That’s a nice sample parser, Arnar.
But does JSON really support comments? I haven’t found any reference to comments in rfc4627. Also, and this is probably nitpicking, are you sure stringLiteral parses the exact syntax of JSON strings?
Hey Stephan,
The “official” syntax diagrams on json.org don’t show support for comments, although I believe comments are supported by JavaScript’s
eval()- at least they are in Firefox.As for the string parsing, no - I’m not at all sure if it parses them according to the JavaScript/JSON syntax. I know it uses Haskell syntax, one might be a superset of the other and although it is easy to check, I didn’t. I thought about this when writing the blog post but decided to gloss over it.
The reason is that the post’s main purpose is to show an example of how easy it is to construct parsers using Parsec. JSON is a well understood language that serves as a good example, but since the JSON part came second after showing parser usage - I allwed myself to take some liberties with it :)
In any case, if you need a proper Haskell JSON library, take a look at the links in the last paragraph.