Parsing JSON with Scala

In the previous post, Arnar wrote about how to write a JSON parser in Haskell using Parsec. He also mentioned that I had shown him a similar parser written in Scala, but I must admit that I stole it from the upcoming book Programming in Scala. The book is very good and I recommend it to anyone who is interested in learning Scala.

Before I get on with the post, I’m gonna spend a few words on explaining Scala. It is a statically typed language with a very powerful type inference system. Scala runs on the Java Virtual Machine, and a port for the .NET CLR is also available. It is a mix of functional and object-oriented programming and is almost fully interoperable with Java. Scala replaces some of the constructs and mechanisms that you normally have in Java with more general ones: rather than anonymous classes you have closures, rather than interfaces you have traits, and so on. Scala also adds implicit conversions, operator overloading, mutable and immutable types and some more goodies.
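To make a few of those features concrete, here is a minimal sketch. The names Greeter, Person, Vec, intToVec, shifted and scaler are made up for illustration; none of them come from the book or from the parser later in this post.

```scala
import scala.language.implicitConversions

object FeaturesDemo {
  // A trait can carry implementation, unlike a classic Java interface.
  trait Greeter {
    def name: String
    def greet: String = "Hello, " + name
  }
  class Person(val name: String) extends Greeter

  // Operator overloading: + is just an ordinary method name.
  case class Vec(x: Int, y: Int) {
    def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
  }

  // Implicit conversion: lets an Int stand in where a Vec is expected.
  implicit def intToVec(n: Int): Vec = Vec(n, n)

  // The 3 is converted to Vec(3, 3) before + is applied.
  val shifted: Vec = Vec(1, 2) + 3

  // A closure: the returned function captures `factor`.
  def scaler(factor: Int): Int => Int = n => n * factor
}
```

Here FeaturesDemo.scaler(3) returns a function that triples its argument, and FeaturesDemo.shifted ends up as Vec(4, 5).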

Scala has access to all of the class libraries that come with Java and to everything you throw onto its classpath. It also comes with its own class library, which has collections, tuples, parsers, XML types, etc. Relevant for this post: maps are written as Map(key -> value), tuples as (first, second, third) and lists as List(1, 2, 3).
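As a small sketch of those three notations (the object name CollectionsDemo and the sample data are invented for illustration):

```scala
object CollectionsDemo {
  // A map from names to ages, written with key -> value pairs.
  val ages: Map[String, Int] = Map("Ada" -> 36, "Alan" -> 41)

  // A tuple can mix element types.
  val triple: (String, Int, Boolean) = ("x", 1, true)

  // An immutable list.
  val numbers: List[Int] = List(1, 2, 3)

  // key -> value is itself just a two-element tuple.
  val pair: (String, Int) = "Ada" -> 36
}
```

Tuple elements are read with _1, _2 and so on, so CollectionsDemo.triple._2 is 1, and CollectionsDemo.ages("Ada") looks up a key.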

The first thing that is going to look weird is that the type of a thing appears after its name, and that every definition starts with a keyword:

val i = 3
val j: Int = i
 
def hello: String = "Hello world"

The val keyword in front of a name tells the compiler that an immutable value is coming up; var is the mutable counterpart. Methods are defined with the def keyword. The first two lines show the type inference system at work: since the literal 3 is an Int, i is inferred to be an Int. If the compiler cannot work out a type, it reports an error and compilation fails. The same goes for the return type of the “hello” method: the type String is specified here but could be omitted. Since the method is a one-liner, = "Hello world" is shorthand for { return "Hello world" }.
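The same points can be seen side by side in a slightly longer sketch. The names ValVarDemo, counter, helloInferred and helloBlock are hypothetical and only there for illustration:

```scala
object ValVarDemo {
  val i = 3          // type inferred as Int
  val j: Int = i     // type written out explicitly

  var counter = 0    // mutable, so it can be reassigned

  // Return type omitted: the compiler infers String.
  def helloInferred = "Hello world"

  // The long form of the same method, with an explicit block and return.
  def helloBlock: String = { return "Hello world" }

  def bump(): Int = {
    counter += 1     // fine: counter is a var
    // i = 4         // would not compile: i is a val
    counter
  }
}
```

Uncommenting the assignment to i makes the compiler reject the program, which is the whole point of val.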

Back to the JSON parser. The parser combinator package has to be imported:

import scala.util.parsing.combinator._

In Scala, _ is a general-purpose wildcard; in an import it plays the same role as * does in Java imports. The parser itself is a token-based parser and extends a trait called JavaTokenParsers, which provides ready-made parsers for identifiers, numbers and string literals.

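class JSON extends JavaTokenParsers {
  // An object: members between braces, collected into a Map
  def obj: Parser[Map[String, Any]] =
    "{" ~> repsep(member, ",") <~ "}" ^^ (Map() ++ _)

  // An array: values between brackets, already a List
  def arr: Parser[List[Any]] =
    "[" ~> repsep(value, ",") <~ "]"

  // A member: "name": value, returned as a (name, value) tuple
  def member: Parser[(String, Any)] =
    stringLiteral ~ ":" ~ value ^^
      { case name ~ ":" ~ value => (name, value) }

  def value: Parser[Any] = (
    obj
    | arr
    | stringLiteral
    | floatingPointNumber ^^ (_.toDouble) // toInt would throw on input such as 1.5
    | "null" ^^ (x => null)
    | "true" ^^ (x => true)
    | "false" ^^ (x => false)
  )
}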

The root of the parse tree is defined in the method value, which returns a Parser[Any]. Because a Parser is expected there, the plain string literals are implicitly converted to parsers, and the Parser class in turn provides the ( ... | ... ) notation for parser alternatives. The weird ^^ thing is result conversion: the function on its right is applied to whatever the parser on its left matched, so in the number alternative the matched text is converted to a number using the wildcard _. repsep is shorthand for interleaved repetition: repsep(value, ",") matches zero or more values separated by commas. Lastly, the wavy dudes ~, <~ and ~> denote sequential composition; ~ keeps both results, while the arrow variants keep only the result the arrow points at, so ~> keeps the right side and <~ keeps the left.
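The combinators described above can be tried out on a much smaller grammar. The following is only a sketch: IntListParser, int, ints and pair are hypothetical names, and it assumes the parser combinator library is on the classpath (in newer Scala versions it ships as the separate scala-parser-combinators module rather than with the standard library).

```scala
import scala.util.parsing.combinator._

object IntListParser extends JavaTokenParsers {
  // wholeNumber matches text such as "42"; ^^ turns the matched String into an Int.
  def int: Parser[Int] = wholeNumber ^^ (_.toInt)

  // ~> drops the "[" on its left, <~ drops the "]" on its right,
  // so only the List[Int] produced by repsep is kept.
  def ints: Parser[List[Int]] = "[" ~> repsep(int, ",") <~ "]"

  // Plain ~ keeps both sides; the pattern match takes the pieces apart again.
  def pair: Parser[(String, Int)] =
    ident ~ "=" ~ int ^^ { case name ~ "=" ~ n => (name, n) }
}
```

With this in place, IntListParser.parseAll(IntListParser.ints, "[1, 2, 3]") succeeds with List(1, 2, 3), and parsing "x = 7" with pair yields the tuple ("x", 7). Whitespace between tokens is skipped by default.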

To run the parser and do something meaningful with it, we read in a file, pass it on to the parser and print the output:

import java.io.FileReader

object JSONTest extends JSON {
  def main(args: Array[String]): Unit = {
    val reader = new FileReader(args(0))
    println(parseAll(value, reader))
  }
}

Unlike Java, Scala does not have static members per se, but it lets you do static-like things by using an object instead of a class. Also notice how FileReader is imported directly from the Java class libraries.
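A rough sketch of what that looks like in practice. Temperature and Account are invented names for illustration:

```scala
// A standalone object: a single instance whose members are used
// much like static members in Java.
object Temperature {
  val FreezingCelsius = 0.0
  def toFahrenheit(celsius: Double): Double = celsius * 9 / 5 + 32
}

// A class and an object with the same name in the same file are companions;
// the object holds what Java would put in static fields and factory methods.
class Account(val id: Int)

object Account {
  private var nextId = 0
  def create(): Account = {
    nextId += 1
    new Account(nextId)
  }
}
```

Members are reached through the object name, as in Temperature.toFahrenheit(100.0) or Account.create(), with no instance needed.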

After we’ve compiled the code and run it on the same JSON file that Arnar provided in his example, we are presented with:

[22.2] parsed:
    Map("glossary" ->
        Map("title" -> "example glossary", "GlossDiv" ->
            Map("title" -> "S",
                "GlossList" ->
                Map("GlossEntry" ->
                    Map("Acronym" -> "SGML",
                        "Abbrev" -> "ISO 8879:1986",
                        "GlossSee" -> "markup",
                        "GlossTerm" -> "Standard Generalized Markup Language",
                        "ID" -> "SGML",
                        "GlossDef" ->
                            Map("para" ->
                                "A meta-markup language, used to create markup
                                languages such as DocBook.",
                                "GlossSeeAlso" -> List("GML", "XML")),
                        "SortAs" -> "SGML")))))

I am too lazy to format this output nicely, but you get the picture. :) I noticed also that this JSON parser does not handle comments, which I leave as an exercise to my good readers.

I have been meaning to write something on Scala for quite some time, and now I was forced to, since Arnar mentioned my crazy ranting. In the coming weeks I am hopefully going to show you some more niceness of Scala, such as implicit conversions, XML types, its functional nature, etc.

6 Comments

  1. Name:

    I noticed also that this JSON parser does not handle comments, which I leave as an exercise to my good readers.

    JSON itself doesn’t support comments. If your parser allows /* or //, then it’s not a working JSON parser.

  2. Guðmundur Bjarni:

    Great! The readers just got a lucky break then! :) I just looked at the parser Arnar wrote and saw that he supported comments. He’s a smart guy so I just took his word on it.

    Thanks for pointing out that the Scala parser is smarter than the Haskell parser :)

  3. Arnar:

    Mr. Name,

    If your parser allows /* or //, then it’s not a working JSON parser.

    By the same argument, GCC is not a working C parser then, since the // comment syntax is not supported by the C standard, but GCC handles them anyways. :)

    A “working parser” imo is a parser that correctly recognizes and parses all strings of a language, even if it can also parse some strings outside of the language, as long as it makes sense.

    In any case, Javascript’s eval() function, the original motivation for JSON, supports comments.

  4. The Superunknown:

    @Arnar Take a good look at the grammar here (http://www.json.org/); it has no mention of comments whatsoever. Even the official parser by Crockford doesn’t deal with comments, unless you are talking about something other than JSON… ;-)

  5. Arnar:

    @The Superunknown: Yes, I fully acknowledge that comments are not part of the JSON grammar. Consider it an extension if you will. In any case, my parser doesn’t parse all JSON strings (in particular ones including floating point numbers) - so I don’t claim it is a full JSON parser anyways.

    We might agree to disagree, but I believe that unless you specifically require it, a parser for language X might very well accept more strings than those of X, hence “language extensions”, which are very common in parser/compilers for languages defined by a standard.

  6. forestman:

    yo…

    usefull…
