Archive

Archive for October, 2008

Generating legal subsets of a dependency graph

October 13th, 2008

This post is literate Haskell

Update: Almost immediately after posting this, I started receiving helpful comments that perhaps introducing multithreading is an overkill here. While I do like the notion of asynchronous message passing via MVars, the comments do have a strong point. Instead of changing the post, and thereby doing the commenters a disservice by invalidating their comments, I just put a non-threaded version of this on gist.github. Indeed, it is simpler than what is found on this page and even a little bit faster :)

I lurk on several mailing lists, one of them being the MochiKit list. One thing recently discussed there is a tool to let users of MochiKit create their own customized version of it. This entails a GUI that allows a user to select which module he or she needs, and the webapp takes care of including all the dependencies as well.

This led me to a fun exercise. Say we have a dependency graph, which by nature must by a directed acyclic graph (DAG), otherwise you could have circular dependencies. Now, if you have N modules with no interdependencies, then there are 2^N possible ways of choosing a subset. However, if you restrict yourself only to those subsets that work, i.e. those where all dependencies of included modules are also included, then that puts a limit on this number.

For example, if you have two modules Base and Sub, and the latter depends on the former, then you would not allow the set containing just Sub. In mathematical terms, you only allow closed subsets of modules, where closed means:

A subset S of modules is closed if and only if the following holds:

  • If M is a module in S, and M depends on module P, then P is also in S

MochiKit contains around 15 modules and their dependency graph looks like this:

MochiKit dependencies diagram

I wanted to know if it was feasible to pre-generate all legal combinations of modules, and the main factor of that is exactly how many are there. Clearly there are fewer than the roughly 16.000 which there would be if there were no dependencies. Generating all dependency-closed subsets seemed like a very classical graph theoretical problem so someone must have written an algorithm for it. I asked my friend and professor, Magnús Már, for some terms to put into Google. Instead he gave me a hint on how to solve this.

The hint is this: To every closed subset of modules, there is a distinct anti-chain of modules that represents that set. An anti-chain of modules is a collection of modules such that no two depend on each other, neither directly nor indirectly.

For example, say we want the modules Logging, DOM, Style, Selector and Visual. That means we have to take all the modules in the green area below. The anti-chain for this set of modules are the blue modules.

Antichain visualization

You can see how the set of the blue modules is the smallest set of modules needed to pull in all the modules that we want.

Since there is a one-to-one correspondence between anti-chains and legal subsets, then we need only find the anti-chains and we can generate the subsets.

First some preliminaries,

> module Main where
> 
> import Control.Concurrent (forkIO)
> import Control.Concurrent.MVar
> import Data.Maybe (fromJust)
> import Data.List (nub, intersect)

We then set up our data as a simple list of pairs, where the first element is the module name and the second is a list of its dependencies. This list must be ordered in topological order, meaning that a module may not appear before any of its dependencies. This is easy to do by hand by looking at the above picture.

> dependencies = [
>  ("Base",[]),
>  ("DateTime",["Base"]),
>  ("Format",["Base"]),
>  ("Iter",["Base"]),
>  ("Async",["Base"]),
>  ("DOM",["Base"]),
>  ("Style",["Base"]),
>  ("Color",["DOM", "Style"]),
>  ("Logging",["Base"]),
>  ("LoggingPane",["Logging"]),
>  ("Selector",["DOM", "Style"]),
>  ("Signal",["DOM", "Style"]),
>  ("Visual",["Color"]),
>  ("DragAndDrop",["Iter","Signal","Visual"]),
>  ("Sortable",["DragAndDrop"])
>  ]
> 
> -- For convenience, derive the list of module names 
> modules = map fst dependencies

The first thing we need to do is to generate all anti-chains. We will do this in a dedicated thread and feed the anti-chains as we find them, one by one, on an MVar. Note that the MVar holds a Maybe value, this is so that we can indicate the end by sending Nothing.

The algorithm is simple backtracking through recursion, we start with the empty set which is always an anti-chain. At each step, we do the following.

  1. We check if the candidate element is dependent on any element in the anti-chain already. If it is not, we can add it and recurse.

  2. Since the anti-chain is still valid without the element, we also recurse on the unchanged anti-chain but discard the candidate element.

The recursion bottoms up when there are no more candidate elements to consider. Note that the topological order of candidates is crucial here.

The helper function accepts determines if a candidate is allowed in a chain. This is simply done by closing the candidate (see later) and seeing if there is an overlap with the current anti-chain.

> antichains :: MVar (Maybe [String]) -> IO ()
> antichains channel = antichains' [] modules >> putMVar channel Nothing
>     where
>       antichains' :: [String] -> [String] -> IO ()
>       antichains' ac [] = putMVar channel (Just ac) >> return ()
>       antichains' ac (candidate:rest) =
>           do if ac `accepts` candidate
>                 then antichains' (candidate:ac) rest
>                 else return ()
>              antichains' ac rest
>       chain `accepts` candidate = null $ intersect (closure [candidate]) chain

Above we mentioned closing a module, or more precisely a set of modules. This means to take a set of modules and adding all modules needed to make it closed with respect to dependencies. This is used both in the accepts helper above but also to map the anti-chains to what we ultimately want, the closed sets.

> closure :: [String] -> [String]
> closure ms = 
>     let missing = nub $ filter (not . flip elem ms) 
>                   $ concatMap (fromJust . flip lookup dependencies) ms
>     in
>       if null missing
>          then ms
>          else closure (ms ++ missing)

The anti-chain generator above feeds the anti-chains to an MVar. The following IO action is run in a second thread and it eats off the anti-chains from the MVar, closes them up and prints. We use a second MVar to notify the main thread when we are done.

> printer :: MVar () -> MVar (Maybe [String]) -> IO ()
> printer stop in_ = 
>     do v <- takeMVar in_
>        case v of
>          Just xs -> putStrLn $ show $ closure xs
>          Nothing -> putMVar stop ()
>        printer stop in_

Finally, to tie it all together, the main action creates the MVars for communication and sparks separate threads for the anti-chain generation and for the printing.

> main :: IO ()
> main = do chan <- newEmptyMVar
>           stop <- newEmptyMVar
>           forkIO $ printer stop chan
>           forkIO $ antichains chan
>           takeMVar stop
>           return ()

That’s it! The result: For the dependency graph of MochiKit above, there are only 817 legal combinations of modules. Actually - 816 if you discount the empty one :)

I’ll leave it up to someone else to answer the original question about feasibility of generating and storing that number of files.

Arnar Haskell, Programming , , , ,

Two handy Ubiquity commands

October 8th, 2008

This blog has been very quiet, I guess our contributors are pretty busy with school work. Or at least they were. :) Since I’m pretty busy with mine, here are two ultra short tips.

Ubiquity is an ultra-handy (and geek friendly) tool for Firefox that allows for text commands. Today I created two Ubiquity commands that I find rather handy - especially for redditors :)

Automatic Markdown links

I write this blog in the Markdown markup format, which is the same format that is used for reddit comments (which I have been known to make occasionally). When writing blog posts, I often find that I mention a word or a phrase that is naturally turned into a link. This means I have to open a new tab, Google the word or phrase - open the relevant page and copy the URL into my post or comment, making sure it is in the correct Markdown format for links. Now, Ubiquity is very apt at solving these problems, so I wrote a command that allows me to do one of two things:

  1. I write a phrase, say “Reykjavik University”. I then select that phrase, invoke ubiquity (in my case it is Option+Space) and type “mdlink” and hit enter. This performs a Google search on the phrase, picks the first result and uses its URL to make the phrase a proper Markdown link, replacing the selected text: [Reykjavik University](http://www.ru.is/)

  2. Instead of typing the phrase and selecting it, I can also just invoke Ubuiquity and type “mdlink Reykjavik University” directly in the prompt - with the same result.

The command can be found and installed from Gist.github. Notice the (very cool) feature of Gist that it recognizes Ubiquity commands and inserts the appropriate markup to make FF offer you to install it.

Follow this link to see and/or install the “Markdown link” command

jQuery injection

When browsing, I often find the need to do it programmatically, executing Javascript in the context of a 3rd party webpage. For this it is very handy to have something like jQuery on hand and thus I’ve used a bookmarklet to inject jQuery into any web-page. That makes it very easy to do fancy stuff (like down-voting every comment by a reddit user) through the Firebug JS console.

I converted the bookmarklet linked above into a simple Ubiquity command, allowing for quicker access to the “Inject jQuery” operation. You can get the bookmarklet on its gist.github entry:

Follow this link to see and/or install the “Inject jQuery” command

Arnar Uncategorized