Parsing JSON with Scala
In the previous post, Arnar wrote about how to do a JSON parser in Haskell using parsec. He also mentioned that I had shown him a similar parser written in Scala, but I must admit that I stole it from the upcoming book Programming in Scala. The book is very good and I recommend it to anyone that is interested in learning Scala.
Before I get on with the post, I’m gonna spend a few words on explaining Scala. It is a strictly typed language with a very powerful type inference system. Scala runs on the Java Virtual Machine, a port or the .NET CLR is available. It is a mix between functional and object oriented languages and is almost fully interoperable with Java. Scala replaces some of the constructs and mechanisms that you normally have in Java with more general things. Rather than using anonymous classes you have closures, there are no interfaces, but you have traits and so on. Scala also adds implicit conversions, operator overloading, mutable and immutable types and some more goodies.
Scala access to all of the class libraries that come with Java and everything you throw into its classpath. It also comes with its own class library which has collections, tuples, parsers, xml types, etc etc. Relevant for this post, maps are written as Map(key -> value), tuples as (first, second, third) and lists as List(1, 2, 3).
The first thing that is going to look weird is that the type of things appears after the name or definition and there is a keyword in front.
val i = 3 val j: Int = i def hello: String = "Hello world"
The val keyword placed in front of the variables is to tell the compiler that an immutable value is coming up and var is the mutable counterpart. Methods are defined with the def keyword. The first two lines show the type inference system at work, since you have the number 3 it can only be an integer. If there is any doubt about the type during compilation, the compiler will complain and exit. The same goes for the return type of the “hello” method, the type String is specified but can be omitted. Since the method is a oneliner the = "Hello world" is a short hand for { return "Hello world" }.
Back to the JSON parser, the parsing combinator package has to be imported
import scala.util.parsing.combinator._
In Scala _ is a sequential any reference, or like in this case as * in Java imports. The parser itself is a token based parser, and extends from a trait called JavaTokenParsers which includes parsing for identifiers, numbers and strings.
class JSON extends JavaTokenParsers { def obj: Parser[Map[String, Any]] = "{"~> repsep(member, ",") <~"}" ^^ (Map() ++ _) def arr: Parser[List[Any]] = "["~> repsep(value, ",") <~"]" def member: Parser[(String, Any)] = stringLiteral~":"~value ^^ { case name~":"~value => (name, value) } def value: Parser[Any] = ( obj | arr | stringLiteral | floatingPointNumber ^^ (_.toInt) | "null" ^^ (x => null) | "true" ^^ (x => true) | "false" ^^ (x => false) ) }
The root of the parse tree is defined in the method value which returns a Parser[Any]. Since the short hand notation is being used and it expects are Parser[Any], it uses implicit conversions for the Parser class and from there you get the ( ... | ... ) notation which provides you with parser alternatives. The weird ^^ thing is result conversions, so floatingPointNumber ^^ (_.toInt) means, take that floating number and return it as an integer using the wildcard _. repsep is a shorthand for interleaved repitition. Lastly the wavy dudes with ~, <~ and ~> denotes sequential composition and with the less and more than symbols it keeps to the left or right.
To run the parser and do something meaningful with it we read in a file pass it on to the parser and print the output
import java.io.FileReader object JSONTest extends JSON { def main(args: Array[String]) { val reader = new FileReader(args(0)) println(parseAll(value, reader)) } }
Different from Java, Scala does not have static types per se but allows you to do static like things by using a object instead of a class. Also notice how the FileReader is being imported directly from the Java class libraries.
After we’ve compiled the code and run it on the same file JSON file as Arnar provided in his example, we are presented with
[22.2] parsed:
Map(”glossary” ->
Map(”title” -> “example glossary”, “GlossDiv” ->
Map(”title” -> “S”,
“GlossList” ->
Map(”GlossEntry” ->
Map(”Acronym” -> “SGML”,
“Abbrev” -> “ISO 8879:1986″,
“GlossSee” -> “markup”,
“GlossTerm” -> “Standard Generalized Markup Language”,
“ID” -> “SGML”,
“GlossDef” ->
Map(”para” ->
“A meta-markup language, used to create markup
languages such as DocBook.”,
“GlossSeeAlso” -> List(”GML”, “XML”)),
“SortAs” -> “SGML”)))))
I am to lazy to format this output nicely, but you get the picture. :) I noticed also that this JSON parser does not handle comments, which I leave as an exercise to my good readers.
I have been meaning to write something on Scala for quite some time now and now I was forced to since Arnar mentioned my crazy ranting. In the coming weeks I am hopefully going to show you some more niceness of Scala, such as implicit conversions, XML types, its functional nature, etc.