Archive

Archive for the ‘Haskell’ Category

The Non-Determinism Monad

February 8th, 2009

Yesterday I saw a beautiful blog entry on Monads by Gregg Reynolds. It explains Moggi’s motivation for using them as an abstraction for computations with certain properties (e.g. side-effects and non-determinism). It beautifully describes the general notion without obstructing the view with concrete examples, as there are plenty of those in other places. If you haven’t already, I suggest you read it (it’s short).

I want to give a tiny example of one important, but often overlooked monad; the non-determinism monad, more commonly known as the list monad. Many explanations of why a list of type [a] is a monad start and end with pointing out that the bind operation >>= is simply equal to concatMap. For some reason, this explanation does not provide me with any insight. This post is meant help people with a similar mindset.

Lists are very useful as, well, lists of elements, but we can also view them in another light. In the lazy setting of Haskell, lists can be viewed as computations rather than data. A term of type [a] is a computation that has potentially multiple outcomes each of type a, i.e. a non-deterministic computation. To understand such computations, let’s look at an example.

State spaces are often non-deterministic, meaning that moving from a given state may put us in any number of states. A simple example of a state space is found in the game of Tic-Tac-Toe. From any given state, the player whose turn it is has (potentially) multiple choices for marking a square.

In Haskell, we can represent the state of a tic-tac-toe game with a board of nine squares, and a turn indicator that says which player should move.

import Data.Maybe
import Data.List
 
data Player = X | O deriving (Show, Eq)
 
data State = St { board :: [Maybe Player]
                , turn :: Player
                } deriving (Show)
 
initSt :: State
initSt = St (replicate 9 Nothing) X
 
-- for convenience
other X = O
other O = X

For the squares we use Nothing to represent an empty square and e.g. Just X to mark a square marked with X. Now, to define the state space we need to define a successor function. This function, given a state, returns all possible states after one move (we’ll ignore the action for now). The move rules of tic-tac-toe are simple, the player whose turn it is can mark any empty square, and the other player will have the next turn.

Without thinking about things, I came up with this

succ :: State -> [State]
succ s = map (\b -> St b (other $ turn s)) $ mark [] [] (board s) (turn s)
    where
      mark acc pre (Nothing:post) p = mark ((pre ++ (Just p):post):acc) (pre ++ [Nothing]) post p
      mark acc pre (x:post)       p = mark acc                          (pre ++ [x])       post p
      mark acc _   []             _ = acc

Ugh, that’s ugly! The mark function keeps a current square, a list of squares preceding it and a list of squares following it. One by one it marks the current square, accumulating new board configurations in acc and moves one square from the post list to the pre list. (Before you make a comment on list comprehensions, read on :)

The problem here is that we’re considering all the successor states, i.e. we’re reading State -> [State] as a function from states to lists of states. If we instead read it as a non-deterministic function from states to states, we can write code that only worries about one successor, using the fact that [State] is a monad.

succ' :: State -> [State]
succ' s = do emptyIdx <- findIndices isNothing (board s)
             let (a,b) = splitAt emptyIdx (board s)
                 player = turn s
             return $ St (a ++ [Just player] ++ tail b) (other player)

What is happening here is that using do notation, we are binding several operations in the non-deterministic monad. The only non-determinism rises from the first one, emptyIdx <- findIndices isNothing (board s). It is important not to think of this line as assigning a list to a variable, but assigning a non-deterministic variable. In do blocks, we use <- to pluck values out of monads, and in the case of the non-determinism monad the line says: let emptyIdx be any of the values returned by the non-deterministic function findIndices.

From that point, we simply use the value emptyIdx as if it referred to one specific index of an empty cell, and construct a state where we mark that particular cell. Although it looks like we’re just constructing one successor state, the result will be non-deterministic, i.e. a list of all possible successor states.

*Main> succ' initSt
[ St {board = [Just X,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing], turn = O}
, St {board = [Nothing,Just X,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing], turn = O}
, St {board = [Nothing,Nothing,Just X,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing], turn = O}
, St {board = [Nothing,Nothing,Nothing,Just X,Nothing,Nothing,Nothing,Nothing,Nothing], turn = O}
, St {board = [Nothing,Nothing,Nothing,Nothing,Just X,Nothing,Nothing,Nothing,Nothing], turn = O}
, St {board = [Nothing,Nothing,Nothing,Nothing,Nothing,Just X,Nothing,Nothing,Nothing], turn = O}
, St {board = [Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Just X,Nothing,Nothing], turn = O}
, St {board = [Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Just X,Nothing], turn = O}
, St {board = [Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Just X], turn = O}
]

Simplifying by extracting generic patterns

Being good Haskell citizens, we realize that we can extract a generic pattern from this, namely replacing a single element of a list by its index.

replace :: Int -> a -> [a] -> [a]
replace idx el xs = before ++ (el : tail after)
    where (before,after) = splitAt idx xs

Note also that in succ we only use the state variable s to access the fields of the ADT. In that case, it is simpler to use pattern matching. This, and our new replace function, yields a much more readable successor function.

succ'' :: State -> [State]
succ'' (St board player) =
    do empty <- findIndices isNothing board
       return $ St (replace empty (Just player) board) (other player)

List comprehensions

If you are alert, you might have noticed that the latest successor function looks very much like a list comprehension. In fact, it is precisely equivalent to this:

succ''' :: State -> [State]
succ''' (St board player) =
    [St (replace empty (Just player) board) (other player)
     | empty <- findIndices isNothing board]

I’ll leave it up to you to decide which is more readable, but it shows a strong semantic link between do-notation over the non-deterministic monad and list comprehensions.

I find the non-deterministic viewpoint a useful one, and sometimes it is more powerful than list comprehensions. For an example, think of one of the most basic non-deterministic processes: throwing a die. Say you have two six-sided dice to throw, and the outcome is the sum. What are the possible outcomes (in other words, what is the non-deterministic outcome)? We can easily answer this with a list comprehension.

Prelude> [d1+d2 | d1 <- [1..6], d2 <- [1..6]]
[2,3,4,5,6,7,3,4,5,6,7,8,4,5,6,7,8,9,5,6,7,8,9,10,6,7,8,9,10,11,7,8,9,10,11,12]

Thinking in terms of the non-determinism monad, it looks similar.

Prelude> do { d1 <- [1..6] ; d2 <- [1..6] ; return (d1 + d2) }
[2,3,4,5,6,7,3,4,5,6,7,8,4,5,6,7,8,9,5,6,7,8,9,10,6,7,8,9,10,11,7,8,9,10,11,12]

Note that the last expression uses return. In the non-determinism monad, it is equivalent to writing [d1 + d2], i.e. the result is also non-deterministic. For example, say that we throw the two dice, but after the throw we have a choice to either use their sum as the outcome, or the product. This adds one level of non-determinism, which is very simple to express.

Prelude> do { d1 <- [1..6] ; d2 <- [1..6] ; [d1 + d2, d1 * d2] }
[2,1,3,2,4,3,5,4,6,5,7,6,3,2,4,4,5,6,6,8,7,10,8,12,4,3,5,6,6,9,7,12,8,15,9,18,5,4,6,8,7,12,8,16,9,20,10,24,6,5,7,10,8,15,9,20,10,25,11,30,7,6,8,12,9,18,10,24,11,30,12,36]

Neat, isn’t it? Non-determinism was one of the properties that Moggi had in mind when choosing his notion of computations, and viewing lists as non-deterministic values can be a valuable tool to understand and write succinct code.

Arnar Haskell , ,

Generating legal subsets of a dependency graph

October 13th, 2008

This post is literate Haskell

Update: Almost immediately after posting this, I started receiving helpful comments that perhaps introducing multithreading is an overkill here. While I do like the notion of asynchronous message passing via MVars, the comments do have a strong point. Instead of changing the post, and thereby doing the commenters a disservice by invalidating their comments, I just put a non-threaded version of this on gist.github. Indeed, it is simpler than what is found on this page and even a little bit faster :)

I lurk on several mailing lists, one of them being the MochiKit list. One thing recently discussed there is a tool to let users of MochiKit create their own customized version of it. This entails a GUI that allows a user to select which module he or she needs, and the webapp takes care of including all the dependencies as well.

This led me to a fun exercise. Say we have a dependency graph, which by nature must by a directed acyclic graph (DAG), otherwise you could have circular dependencies. Now, if you have N modules with no interdependencies, then there are 2^N possible ways of choosing a subset. However, if you restrict yourself only to those subsets that work, i.e. those where all dependencies of included modules are also included, then that puts a limit on this number.

For example, if you have two modules Base and Sub, and the latter depends on the former, then you would not allow the set containing just Sub. In mathematical terms, you only allow closed subsets of modules, where closed means:

A subset S of modules is closed if and only if the following holds:

  • If M is a module in S, and M depends on module P, then P is also in S

MochiKit contains around 15 modules and their dependency graph looks like this:

MochiKit dependencies diagram

I wanted to know if it was feasible to pre-generate all legal combinations of modules, and the main factor of that is exactly how many are there. Clearly there are fewer than the roughly 16.000 which there would be if there were no dependencies. Generating all dependency-closed subsets seemed like a very classical graph theoretical problem so someone must have written an algorithm for it. I asked my friend and professor, Magnús Már, for some terms to put into Google. Instead he gave me a hint on how to solve this.

The hint is this: To every closed subset of modules, there is a distinct anti-chain of modules that represents that set. An anti-chain of modules is a collection of modules such that no two depend on each other, neither directly nor indirectly.

For example, say we want the modules Logging, DOM, Style, Selector and Visual. That means we have to take all the modules in the green area below. The anti-chain for this set of modules are the blue modules.

Antichain visualization

You can see how the set of the blue modules is the smallest set of modules needed to pull in all the modules that we want.

Since there is a one-to-one correspondence between anti-chains and legal subsets, then we need only find the anti-chains and we can generate the subsets.

First some preliminaries,

> module Main where
> 
> import Control.Concurrent (forkIO)
> import Control.Concurrent.MVar
> import Data.Maybe (fromJust)
> import Data.List (nub, intersect)

We then set up our data as a simple list of pairs, where the first element is the module name and the second is a list of its dependencies. This list must be ordered in topological order, meaning that a module may not appear before any of its dependencies. This is easy to do by hand by looking at the above picture.

> dependencies = [
>  ("Base",[]),
>  ("DateTime",["Base"]),
>  ("Format",["Base"]),
>  ("Iter",["Base"]),
>  ("Async",["Base"]),
>  ("DOM",["Base"]),
>  ("Style",["Base"]),
>  ("Color",["DOM", "Style"]),
>  ("Logging",["Base"]),
>  ("LoggingPane",["Logging"]),
>  ("Selector",["DOM", "Style"]),
>  ("Signal",["DOM", "Style"]),
>  ("Visual",["Color"]),
>  ("DragAndDrop",["Iter","Signal","Visual"]),
>  ("Sortable",["DragAndDrop"])
>  ]
> 
> -- For convenience, derive the list of module names 
> modules = map fst dependencies

The first thing we need to do is to generate all anti-chains. We will do this in a dedicated thread and feed the anti-chains as we find them, one by one, on an MVar. Note that the MVar holds a Maybe value, this is so that we can indicate the end by sending Nothing.

The algorithm is simple backtracking through recursion, we start with the empty set which is always an anti-chain. At each step, we do the following.

  1. We check if the candidate element is dependent on any element in the anti-chain already. If it is not, we can add it and recurse.

  2. Since the anti-chain is still valid without the element, we also recurse on the unchanged anti-chain but discard the candidate element.

The recursion bottoms up when there are no more candidate elements to consider. Note that the topological order of candidates is crucial here.

The helper function accepts determines if a candidate is allowed in a chain. This is simply done by closing the candidate (see later) and seeing if there is an overlap with the current anti-chain.

> antichains :: MVar (Maybe [String]) -> IO ()
> antichains channel = antichains' [] modules >> putMVar channel Nothing
>     where
>       antichains' :: [String] -> [String] -> IO ()
>       antichains' ac [] = putMVar channel (Just ac) >> return ()
>       antichains' ac (candidate:rest) =
>           do if ac `accepts` candidate
>                 then antichains' (candidate:ac) rest
>                 else return ()
>              antichains' ac rest
>       chain `accepts` candidate = null $ intersect (closure [candidate]) chain

Above we mentioned closing a module, or more precisely a set of modules. This means to take a set of modules and adding all modules needed to make it closed with respect to dependencies. This is used both in the accepts helper above but also to map the anti-chains to what we ultimately want, the closed sets.

> closure :: [String] -> [String]
> closure ms = 
>     let missing = nub $ filter (not . flip elem ms) 
>                   $ concatMap (fromJust . flip lookup dependencies) ms
>     in
>       if null missing
>          then ms
>          else closure (ms ++ missing)

The anti-chain generator above feeds the anti-chains to an MVar. The following IO action is run in a second thread and it eats off the anti-chains from the MVar, closes them up and prints. We use a second MVar to notify the main thread when we are done.

> printer :: MVar () -> MVar (Maybe [String]) -> IO ()
> printer stop in_ = 
>     do v <- takeMVar in_
>        case v of
>          Just xs -> putStrLn $ show $ closure xs
>          Nothing -> putMVar stop ()
>        printer stop in_

Finally, to tie it all together, the main action creates the MVars for communication and sparks separate threads for the anti-chain generation and for the printing.

> main :: IO ()
> main = do chan <- newEmptyMVar
>           stop <- newEmptyMVar
>           forkIO $ printer stop chan
>           forkIO $ antichains chan
>           takeMVar stop
>           return ()

That’s it! The result: For the dependency graph of MochiKit above, there are only 817 legal combinations of modules. Actually - 816 if you discount the empty one :)

I’ll leave it up to someone else to answer the original question about feasibility of generating and storing that number of files.

Arnar Haskell, Programming , , , ,