Generating legal subsets of a dependency graph

This post is literate Haskell

Update: Almost immediately after posting this, I started receiving helpful comments that perhaps introducing multithreading is an overkill here. While I do like the notion of asynchronous message passing via MVars, the comments do have a strong point. Instead of changing the post, and thereby doing the commenters a disservice by invalidating their comments, I just put a non-threaded version of this on gist.github. Indeed, it is simpler than what is found on this page and even a little bit faster :)

I lurk on several mailing lists, one of them being the MochiKit list. One thing recently discussed there is a tool to let users of MochiKit create their own customized version of it. This entails a GUI that allows a user to select which module he or she needs, and the webapp takes care of including all the dependencies as well.

This led me to a fun exercise. Say we have a dependency graph, which by nature must by a directed acyclic graph (DAG), otherwise you could have circular dependencies. Now, if you have N modules with no interdependencies, then there are 2^N possible ways of choosing a subset. However, if you restrict yourself only to those subsets that work, i.e. those where all dependencies of included modules are also included, then that puts a limit on this number.

For example, if you have two modules Base and Sub, and the latter depends on the former, then you would not allow the set containing just Sub. In mathematical terms, you only allow closed subsets of modules, where closed means:

A subset S of modules is closed if and only if the following holds:

  • If M is a module in S, and M depends on module P, then P is also in S

MochiKit contains around 15 modules and their dependency graph looks like this:

MochiKit dependencies diagram

I wanted to know if it was feasible to pre-generate all legal combinations of modules, and the main factor of that is exactly how many are there. Clearly there are fewer than the roughly 16.000 which there would be if there were no dependencies. Generating all dependency-closed subsets seemed like a very classical graph theoretical problem so someone must have written an algorithm for it. I asked my friend and professor, Magnús Már, for some terms to put into Google. Instead he gave me a hint on how to solve this.

The hint is this: To every closed subset of modules, there is a distinct anti-chain of modules that represents that set. An anti-chain of modules is a collection of modules such that no two depend on each other, neither directly nor indirectly.

For example, say we want the modules Logging, DOM, Style, Selector and Visual. That means we have to take all the modules in the green area below. The anti-chain for this set of modules are the blue modules.

Antichain visualization

You can see how the set of the blue modules is the smallest set of modules needed to pull in all the modules that we want.

Since there is a one-to-one correspondence between anti-chains and legal subsets, then we need only find the anti-chains and we can generate the subsets.

First some preliminaries,

> module Main where
> 
> import Control.Concurrent (forkIO)
> import Control.Concurrent.MVar
> import Data.Maybe (fromJust)
> import Data.List (nub, intersect)

We then set up our data as a simple list of pairs, where the first element is the module name and the second is a list of its dependencies. This list must be ordered in topological order, meaning that a module may not appear before any of its dependencies. This is easy to do by hand by looking at the above picture.

> dependencies = [
>  ("Base",[]),
>  ("DateTime",["Base"]),
>  ("Format",["Base"]),
>  ("Iter",["Base"]),
>  ("Async",["Base"]),
>  ("DOM",["Base"]),
>  ("Style",["Base"]),
>  ("Color",["DOM", "Style"]),
>  ("Logging",["Base"]),
>  ("LoggingPane",["Logging"]),
>  ("Selector",["DOM", "Style"]),
>  ("Signal",["DOM", "Style"]),
>  ("Visual",["Color"]),
>  ("DragAndDrop",["Iter","Signal","Visual"]),
>  ("Sortable",["DragAndDrop"])
>  ]
> 
> -- For convenience, derive the list of module names 
> modules = map fst dependencies

The first thing we need to do is to generate all anti-chains. We will do this in a dedicated thread and feed the anti-chains as we find them, one by one, on an MVar. Note that the MVar holds a Maybe value, this is so that we can indicate the end by sending Nothing.

The algorithm is simple backtracking through recursion, we start with the empty set which is always an anti-chain. At each step, we do the following.

  1. We check if the candidate element is dependent on any element in the anti-chain already. If it is not, we can add it and recurse.

  2. Since the anti-chain is still valid without the element, we also recurse on the unchanged anti-chain but discard the candidate element.

The recursion bottoms up when there are no more candidate elements to consider. Note that the topological order of candidates is crucial here.

The helper function accepts determines if a candidate is allowed in a chain. This is simply done by closing the candidate (see later) and seeing if there is an overlap with the current anti-chain.

> antichains :: MVar (Maybe [String]) -> IO ()
> antichains channel = antichains' [] modules >> putMVar channel Nothing
>     where
>       antichains' :: [String] -> [String] -> IO ()
>       antichains' ac [] = putMVar channel (Just ac) >> return ()
>       antichains' ac (candidate:rest) =
>           do if ac `accepts` candidate
>                 then antichains' (candidate:ac) rest
>                 else return ()
>              antichains' ac rest
>       chain `accepts` candidate = null $ intersect (closure [candidate]) chain

Above we mentioned closing a module, or more precisely a set of modules. This means to take a set of modules and adding all modules needed to make it closed with respect to dependencies. This is used both in the accepts helper above but also to map the anti-chains to what we ultimately want, the closed sets.

> closure :: [String] -> [String]
> closure ms = 
>     let missing = nub $ filter (not . flip elem ms) 
>                   $ concatMap (fromJust . flip lookup dependencies) ms
>     in
>       if null missing
>          then ms
>          else closure (ms ++ missing)

The anti-chain generator above feeds the anti-chains to an MVar. The following IO action is run in a second thread and it eats off the anti-chains from the MVar, closes them up and prints. We use a second MVar to notify the main thread when we are done.

> printer :: MVar () -> MVar (Maybe [String]) -> IO ()
> printer stop in_ = 
>     do v <- takeMVar in_
>        case v of
>          Just xs -> putStrLn $ show $ closure xs
>          Nothing -> putMVar stop ()
>        printer stop in_

Finally, to tie it all together, the main action creates the MVars for communication and sparks separate threads for the anti-chain generation and for the printing.

> main :: IO ()
> main = do chan <- newEmptyMVar
>           stop <- newEmptyMVar
>           forkIO $ printer stop chan
>           forkIO $ antichains chan
>           takeMVar stop
>           return ()

That’s it! The result: For the dependency graph of MochiKit above, there are only 817 legal combinations of modules. Actually - 816 if you discount the empty one :)

I’ll leave it up to someone else to answer the original question about feasibility of generating and storing that number of files.

8 Comments

  1. Thomas hartman:

    Neat article.

    One thing. I just skimmed this, but if you’re only computing the transitive closure couldn’t you use the fgl library function at

    http://www.haskell.org/ghc/docs/latest/html/libraries/fgl/Data-Graph-Inductive-Query-TransClos.html

    and avoid all the IO?

  2. Per Cederberg:

    Actually, the Style module depends on DOM. So the dependency graph is a bit deeper than what is outlined above… And probably, there are fewer valid combinations as well.

  3. Arnar:

    Thomas: Thanks for the link. However, calculating the closure is trivial so I don’t want to introduce an external dependency for that. The real problem here is finding all the anti-chains.

    Per: Ah, right. The magic number is 769 then :)

  4. Josef Svenningsson:

    Why do you use concurrent programming to solve the task? There is nothing concurrent about the problem.

  5. Arnar:

    Josef: I just find the problem more easily expressed in a CSP like manner, generating objects and feeding them on a channel where someone else picks them up - rather than generating a list.

    Concurrency is useful for more things than simply parallelizing computation.

    So, I ask in return: Why not?

  6. Saizan:

    I agree with Josef Svenningsson that there’s no need to introduce forkIO or MVars in this program. Using the Writer monad you can still very easily print the sets of modules as soon as they are produced, but avoiding IO in antichains and the need to synchronize threads.

  7. Arnar:

    Saizan: I can see how the Writer monad lets me easily collect up the anti-chains, but how does it allow one to print them as soon as they are produced, without pulling the antichains function in IO?

    Edit: Ah ok, got it - adding note to the top of the blog

  8. Josef Svenningsson:

    Arnar, thanks for your update. I’ve elaborated a bit about what I mean in a post on reddit.

Leave a comment

Sorry about the captcha, we were getting buried in spam. At least this one serves a purpose.