Posts tagged ‘Computer Science’

Generating legal subsets of a dependency graph

This post is literate Haskell

Update: Almost immediately after posting this, I started receiving helpful comments that perhaps introducing multithreading is an overkill here. While I do like the notion of asynchronous message passing via MVars, the comments do have a strong point. Instead of changing the post, and thereby doing the commenters a disservice by invalidating their comments, I just put a non-threaded version of this on gist.github. Indeed, it is simpler than what is found on this page and even a little bit faster :)

I lurk on several mailing lists, one of them being the MochiKit list. One thing recently discussed there is a tool to let users of MochiKit create their own customized version of it. This entails a GUI that allows a user to select which module he or she needs, and the webapp takes care of including all the dependencies as well.

This led me to a fun exercise. Say we have a dependency graph, which by nature must by a directed acyclic graph (DAG), otherwise you could have circular dependencies. Now, if you have N modules with no interdependencies, then there are 2^N possible ways of choosing a subset. However, if you restrict yourself only to those subsets that work, i.e. those where all dependencies of included modules are also included, then that puts a limit on this number.

For example, if you have two modules Base and Sub, and the latter depends on the former, then you would not allow the set containing just Sub. In mathematical terms, you only allow closed subsets of modules, where closed means:

A subset S of modules is closed if and only if the following holds:

  • If M is a module in S, and M depends on module P, then P is also in S

MochiKit contains around 15 modules and their dependency graph looks like this:

MochiKit dependencies diagram

I wanted to know if it was feasible to pre-generate all legal combinations of modules, and the main factor of that is exactly how many are there. Clearly there are fewer than the roughly 16.000 which there would be if there were no dependencies. Generating all dependency-closed subsets seemed like a very classical graph theoretical problem so someone must have written an algorithm for it. I asked my friend and professor, Magnús Már, for some terms to put into Google. Instead he gave me a hint on how to solve this.

The hint is this: To every closed subset of modules, there is a distinct anti-chain of modules that represents that set. An anti-chain of modules is a collection of modules such that no two depend on each other, neither directly nor indirectly.

For example, say we want the modules Logging, DOM, Style, Selector and Visual. That means we have to take all the modules in the green area below. The anti-chain for this set of modules are the blue modules.

Antichain visualization

You can see how the set of the blue modules is the smallest set of modules needed to pull in all the modules that we want.

Since there is a one-to-one correspondence between anti-chains and legal subsets, then we need only find the anti-chains and we can generate the subsets.

First some preliminaries,

> module Main where
> 
> import Control.Concurrent (forkIO)
> import Control.Concurrent.MVar
> import Data.Maybe (fromJust)
> import Data.List (nub, intersect)

We then set up our data as a simple list of pairs, where the first element is the module name and the second is a list of its dependencies. This list must be ordered in topological order, meaning that a module may not appear before any of its dependencies. This is easy to do by hand by looking at the above picture.

> dependencies = [
>  ("Base",[]),
>  ("DateTime",["Base"]),
>  ("Format",["Base"]),
>  ("Iter",["Base"]),
>  ("Async",["Base"]),
>  ("DOM",["Base"]),
>  ("Style",["Base"]),
>  ("Color",["DOM", "Style"]),
>  ("Logging",["Base"]),
>  ("LoggingPane",["Logging"]),
>  ("Selector",["DOM", "Style"]),
>  ("Signal",["DOM", "Style"]),
>  ("Visual",["Color"]),
>  ("DragAndDrop",["Iter","Signal","Visual"]),
>  ("Sortable",["DragAndDrop"])
>  ]
> 
> -- For convenience, derive the list of module names 
> modules = map fst dependencies

The first thing we need to do is to generate all anti-chains. We will do this in a dedicated thread and feed the anti-chains as we find them, one by one, on an MVar. Note that the MVar holds a Maybe value, this is so that we can indicate the end by sending Nothing.

The algorithm is simple backtracking through recursion, we start with the empty set which is always an anti-chain. At each step, we do the following.

  1. We check if the candidate element is dependent on any element in the anti-chain already. If it is not, we can add it and recurse.

  2. Since the anti-chain is still valid without the element, we also recurse on the unchanged anti-chain but discard the candidate element.

The recursion bottoms up when there are no more candidate elements to consider. Note that the topological order of candidates is crucial here.

The helper function accepts determines if a candidate is allowed in a chain. This is simply done by closing the candidate (see later) and seeing if there is an overlap with the current anti-chain.

> antichains :: MVar (Maybe [String]) -> IO ()
> antichains channel = antichains' [] modules >> putMVar channel Nothing
>     where
>       antichains' :: [String] -> [String] -> IO ()
>       antichains' ac [] = putMVar channel (Just ac) >> return ()
>       antichains' ac (candidate:rest) =
>           do if ac `accepts` candidate
>                 then antichains' (candidate:ac) rest
>                 else return ()
>              antichains' ac rest
>       chain `accepts` candidate = null $ intersect (closure [candidate]) chain

Above we mentioned closing a module, or more precisely a set of modules. This means to take a set of modules and adding all modules needed to make it closed with respect to dependencies. This is used both in the accepts helper above but also to map the anti-chains to what we ultimately want, the closed sets.

> closure :: [String] -> [String]
> closure ms = 
>     let missing = nub $ filter (not . flip elem ms) 
>                   $ concatMap (fromJust . flip lookup dependencies) ms
>     in
>       if null missing
>          then ms
>          else closure (ms ++ missing)

The anti-chain generator above feeds the anti-chains to an MVar. The following IO action is run in a second thread and it eats off the anti-chains from the MVar, closes them up and prints. We use a second MVar to notify the main thread when we are done.

> printer :: MVar () -> MVar (Maybe [String]) -> IO ()
> printer stop in_ = 
>     do v <- takeMVar in_
>        case v of
>          Just xs -> putStrLn $ show $ closure xs
>          Nothing -> putMVar stop ()
>        printer stop in_

Finally, to tie it all together, the main action creates the MVars for communication and sparks separate threads for the anti-chain generation and for the printing.

> main :: IO ()
> main = do chan <- newEmptyMVar
>           stop <- newEmptyMVar
>           forkIO $ printer stop chan
>           forkIO $ antichains chan
>           takeMVar stop
>           return ()

That’s it! The result: For the dependency graph of MochiKit above, there are only 817 legal combinations of modules. Actually - 816 if you discount the empty one :)

I’ll leave it up to someone else to answer the original question about feasibility of generating and storing that number of files.

ICALP’08, pre-conference workshops and first day

So ICALP 2008 is on. I’m fortunate enough that it is held at my school (Reykjavik University) and that I was offered the chance to sit in on the talks in return for helping a little with the organization and administration.

The activities started on Friday with the DYNAMO 2008 PhD school. I didn’t attend many of the lectures, but the ones I did were very interesting. To name one, Elias Koutsoupias gave informative talks on Game Theory and its use for example in online auctions.

Yesterday, Sunday, there were several workshops. I mostly attended SOS’08, a workshop on Structural Operational Semantics. Dale Miller, a hard-core logician, gave an excellent invited talk on formalizing SOS specifications in logic and although I didn’t understand half of it I somehow found it very interesting. His slides are available at his website.

Today was the first day of the conference proper and started off with an invited talk by Google’s S. Muthukrishnan, where he explained how Google runs auctions to decide what ads are displayed along with search results. Attendance to the talk was excellent, and while I had my hands full with other duties, I can safely say it was interesting on many levels.

After lunch I mostly attended track B, on logic, semantics and theory of programming. I’m afraid I can’t say much of merit about the talks as I had to do my best to understand them properly myself. I can say that this first day of ICALP went very smoothly in my opinion. As a bonus, the weather turned out to be very nice so in the afternoon and during the welcome reception, people could stand outside and enjoy the fresh air.

For photos from ICALP, check out our Picasa web album (updated at the end of each day). For more authoritative information on some of the talks, check out MohammadReza’s guest posts on Luca’s blog