Inline
On Sun, Apr 8, 2012 at 7:10 AM, Matthew Farrellee<matt@xxxxxxxxxx> wrote:
inline
On 04/07/2012 09:15 PM, Ziliang Guo wrote:
As some of you may be aware of, I'm working on parallelizing classad
matchmaking for my ME964 project. Right now I have an implementation
that takes in a single job classad and multiple machine ads and spins
up multiple threads to search for a match amongst the machine ads.
Another possibility is to have multiple threads each trying to find
matches for a different job, though that one is a bit more complicated
to pull off thanks to us modifying the machine ads during the
matchmaking process (old classad compatibility hacks ftw....). Does
anyone else have any suggestions for other variations that may be
useful or comments on the above two methods (keeping in mind only the
single job vs multiple machine match has been implemented and tested
so far)?
i can think of two high level objectives here: 0) show that the
algorithm can be parallelized; 1) make matchmaking faster. are you
aiming for one of those or something else?
i'd aim for making matchmaking faster.
let me know what your objective is and i'll comment more.
A little bit of both, mostly to make things faster. I haven't dug
deeply enough into the actual evaluation code yet to determine if the
individual evaluations could be done in parallel, but my gut reaction
is unless we can do like a thousand of them at the same time, the
sequential evaluation is so fast it's not worth it.
-
an over-simplification of the algorithm is,
0: for j in jobs (a)
1: for m in machines (b)
2: if j accepts m and m accepts j
3: add m to j's matches
current implementation parallelizes (b)?
parallelization of (a) blocked by mutating classads?
have you considered possibilities if you could swap (0) and (1)?
Yes and it's complicated by the changes we make to the two classads we
try to match. Since we set the, I think it was alternate, member of
each classad to each other, we can't match the same instance of a
classad against multiple instances at the same time. It's why for the
current implementation I have to basically have four copies of the job
classad in order to do four matchmaking operations at the same time.
If we flip 0 and 1, we'd either need to make copies of the machine ad
for every candidate, or stagger the evaluations and force each set of
evaluations to all finish before moving onto the next set. At least
with the current order of operations, I can spin up the copies of the
job ad before I spawn the threads and not do any memory allocations or
clones inside the threads.
-
Long term there is also a question of whether there is any point in
trying to parallelize the parsing of a classad from the raw text into
its expression tree, or whether this is even possible. Really long
i believe we can flyweight (save on memory and re-parsing)
expressions as the result of work tim did on classads. if the
objective is optimization, this isn't going to bear fruit.
term, would it be possible to generate data structures such that we
could attempt a match making operation on a GPU.
that would be interesting.