Re: [Condor-users] [Birdbath Related] Strange behaviour - 6.7.17
- Date: Sun, 5 Mar 2006 11:23:45 -0600
- From: Matthew Farrellee <matt@xxxxxxxxxxx>
- Subject: Re: [Condor-users] [Birdbath Related] Strange behaviour - 6.7.17
On Mar 5, 2006, at 10:14 AM, Afrasyab Bashir wrote:
[snip]
I added code to catch exceptions on about every line, as per your
instructions. To my utter surprise, the code has started working on its
own: transactions are being carried out and files are being sent with
the same code. On one hand I'm happy, but on the other it's very
frustrating because I can't understand what caused the problem. Anyway,
I have a few more queries.
This is troubling. I was suggesting you change how errors are
reported so that you can see why things are actually failing. I never
would expect something to all of a sudden start working...
Strange Behaviour (it might not be strange to others, though)
[In what follows, 'personal computer' means the computer I am typing on
and interacting with.]
1. When I have only 6.7.17 installed on both of the computers in my
Condor pool and I start the embedded mini SOAP web server on the remote
computer, condor_status -l returns only the machine running the central
manager (master). It does not matter which computer I run the
condor_status -l query on.
condor_status contacts the central manager by default. If your central
manager started after your personal computer, you may have to wait up
to 5 minutes before your personal computer shows up in the list.
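The 5-minute figure comes from how often each machine re-advertises
itself to the central manager. The following is a minimal Java sketch,
assuming a machine sends its ad when it starts and then every 300
seconds (the default UPDATE_INTERVAL, as far as I know), and that an ad
sent while the central manager is down is simply lost. The class and
method names are hypothetical.

```java
/**
 * Hypothetical model: a machine sends its ad at machineStart,
 * machineStart + interval, machineStart + 2 * interval, ...
 * Ads sent before the collector is up are assumed lost, so the machine
 * first becomes visible at the first send time at or after collectorStart.
 */
class AdVisibility {

    /** Seconds after collectorStart until the machine's ad reaches the collector. */
    static long secondsUntilVisible(long machineStart, long collectorStart, long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (collectorStart <= machineStart) {
            // Collector already running: the very first ad is received.
            return machineStart - collectorStart;
        }
        long elapsed = collectorStart - machineStart;
        // Index of the first update sent at or after the collector came up (ceiling division).
        long k = (elapsed + interval - 1) / interval;
        return machineStart + k * interval - collectorStart;
    }

    public static void main(String[] args) {
        long interval = 300; // assumed UPDATE_INTERVAL in seconds (5 minutes)
        // Personal computer up at t=0, central manager up at t=10: next ad at t=300.
        System.out.println(secondsUntilVisible(0, 10, interval));  // 290
        // Central manager up exactly when an ad is sent.
        System.out.println(secondsUntilVisible(0, 300, interval)); // 0
        // Central manager already running when the machine starts at t=50.
        System.out.println(secondsUntilVisible(50, 0, interval));  // 50
    }
}
```

In the worst case the central manager comes up just after an update was
sent, so the wait approaches the full interval.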
2. When I install 6.6.9 on my personal computer and 6.7.17 on the remote
computer, and submit a job from the 6.6.9 computer (using Birdbath) to
the 6.7.17 computer, I cannot see the job in the queue. To check the
queue I use condor_q, but the queue is empty: no jobs at all, idle or
otherwise. So I can't tell whether my job was submitted or not.
That would be because the jobs are in the 6.7.17 schedd's queue, and
condor_q, by default, looks only at your local schedd.
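A toy Java model of this point: each schedd keeps its own queue, and a
plain condor_q asks only the local one. The schedd names are
hypothetical. With the real tool, condor_q -name <schedd> queries a
specific schedd, and condor_q -global asks every schedd known to the
central manager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of two schedds, each owning its own job queue.
 * Querying only the local schedd misses jobs held by the remote one.
 */
class ScheddQueues {
    private final Map<String, List<String>> queues = new HashMap<>();

    /** Records a job id in the queue of the named schedd. */
    void submit(String schedd, String jobId) {
        queues.computeIfAbsent(schedd, s -> new ArrayList<>()).add(jobId);
    }

    /** Returns the jobs held by one schedd (empty if it has none). */
    List<String> query(String schedd) {
        return queues.getOrDefault(schedd, Collections.emptyList());
    }

    public static void main(String[] args) {
        ScheddQueues pool = new ScheddQueues();
        String local = "personal-6.6.9";  // hypothetical schedd names
        String remote = "remote-6.7.17";
        pool.submit(remote, "1.0");       // the Birdbath submission lands on the remote schedd
        System.out.println(pool.query(local));  // []
        System.out.println(pool.query(remote)); // [1.0]
    }
}
```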
3. When I install 6.7.17 on both computers, run the SOAP server on my
personal computer, and then submit the job using Birdbath, the job can
be seen idle in the queue, and it stays idle forever.
You can see them because they are in the queue that condor_q looks at
by default. As for being there forever, you might not have a machine
available that they match.
condor_q -analyze says, "2 match but reject the job for unknown
reasons". condor_status -java returns nothing, even though the path is
correct on both machines and the job exits gracefully after execution
when run from the command prompt.
You should look at the Requirements attribute of your job and see if
you can figure out why the match might be failing -- maybe both
machines in your pool think they have an active user?
It sounds like a configuration issue here, which is good in the sense
that you have things working with the SOAP API...
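The "match but reject" message reflects that matching in Condor is
two-sided: the job's Requirements must accept the machine, and the
machine's own policy (its START expression) must accept the job. Below
is a minimal Java sketch of that idea, not Condor's implementation; the
OpSys value, the 15-minute KeyboardIdle threshold in the START policy,
and all class names are illustrative assumptions.

```java
/**
 * Hypothetical sketch of two-way matching. A machine can run a job only
 * if the job's Requirements accept the machine AND the machine's START
 * policy accepts the job. Names mirror common ClassAd attributes
 * (OpSys, KeyboardIdle); values and policy are made up.
 */
class TwoWayMatch {

    static final class Machine {
        final String name;
        final String opSys;
        final long keyboardIdleSeconds;

        Machine(String name, String opSys, long keyboardIdleSeconds) {
            this.name = name;
            this.opSys = opSys;
            this.keyboardIdleSeconds = keyboardIdleSeconds;
        }
    }

    static final class Job {
        final String requiredOpSys;

        Job(String requiredOpSys) {
            this.requiredOpSys = requiredOpSys;
        }
    }

    enum Result { NO_MATCH, MATCH_BUT_REJECTED, MATCH }

    /** Job side: these Requirements only look at the operating system. */
    static boolean jobRequirements(Job job, Machine m) {
        return job.requiredOpSys.equals(m.opSys);
    }

    /** Machine side: assumed START policy, run jobs only after 15 idle minutes at the keyboard. */
    static boolean machineStart(Machine m) {
        return m.keyboardIdleSeconds > 15 * 60;
    }

    /** Distinguishes "no match" from "match but reject", as -analyze does. */
    static Result evaluate(Job job, Machine m) {
        if (!jobRequirements(job, m)) {
            return Result.NO_MATCH;
        }
        if (!machineStart(m)) {
            return Result.MATCH_BUT_REJECTED;
        }
        return Result.MATCH;
    }

    public static void main(String[] args) {
        Job job = new Job("WINNT51");
        Machine[] pool = {
            new Machine("personal", "WINNT51", 30), // someone is typing on it
            new Machine("remote", "WINNT51", 120),  // used two minutes ago
        };
        for (Machine m : pool) {
            System.out.println(m.name + ": " + evaluate(job, m));
        }
    }
}
```

Both machines print MATCH_BUT_REJECTED: the job's side of the match
succeeds, but each machine believes a user is active and declines the
job.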
4. When I submit a job as a user whose credentials are not stored on
the computer, I can't remove it from the queue on the command line,
even when I try as the administrator.
You should repost this separately as a general issue.
5. Jobs that were submitted as a user with stored credentials can be
marked for removal from the command prompt, but condor_q keeps
displaying them as marked for removal, and condor_q -analyze just shows
the job id with the remark "Request is removed".
Jobs submitted with the SOAP API have their LeaveJobInQueue attribute
set (or some similar name). While it evaluates to TRUE, the job will
sit in the X (removed) state in the queue. You can change the attribute
with condor_qedit or with the CloseSpool() SOAP call. condor_rm -forcex
might also work.
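The behavior described above can be modelled as a small state check.
This is a hypothetical Java sketch, assuming, as the reply implies, that
CloseSpool() has the same effect as setting LeaveJobInQueue to FALSE;
the class and method names are made up.

```java
/**
 * Hypothetical model of a SOAP-submitted job after condor_rm. The job
 * stays in the queue in the X (removed) state until LeaveJobInQueue is
 * false; clearing it stands in for condor_qedit or CloseSpool().
 */
class RemovedJob {

    enum Status { ACTIVE, REMOVED }

    private Status status = Status.ACTIVE;
    private boolean leaveJobInQueue = true; // assumed set for every SOAP submission

    /** condor_rm: the job is marked removed (shown as X by condor_q). */
    void remove() {
        status = Status.REMOVED;
    }

    /** Stand-in for setting LeaveJobInQueue to FALSE. */
    void clearLeaveJobInQueue() {
        leaveJobInQueue = false;
    }

    /** True while the job would still appear in condor_q output. */
    boolean visibleInQueue() {
        return status != Status.REMOVED || leaveJobInQueue;
    }

    public static void main(String[] args) {
        RemovedJob job = new RemovedJob();
        job.remove();
        System.out.println(job.visibleInQueue()); // true: stuck in X
        job.clearLeaveJobInQueue();
        System.out.println(job.visibleInQueue()); // false: it can now leave the queue
    }
}
```

From the command line, the corresponding edit would presumably look
like condor_qedit 12.0 LeaveJobInQueue False, where 12.0 is a
hypothetical job id.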
Birdbath Specific Queries
6. condor_q shows all the jobs in a job queue along with their details.
What's the substitute for it in Birdbath? I ask because it seems I have
to specify the transaction, clusterId, jobId, etc. to retrieve the
information, whereas condor_q has no such restriction. What if I want
to manage the jobs with Birdbath?
In addition to GetJobAd(), there is a GetJobAds() call that takes a
ClassAd expression and returns the matching jobs; for example, the
expression "TRUE" gives you everything. You can pass null (in Java) for
the transaction.
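The idea behind GetJobAds() is a filter: the schedd returns every job
ad for which the constraint evaluates to true. This Java sketch replaces
the ClassAd constraint string with a java.util.function.Predicate, so it
illustrates the semantics only and is not the Birdbath API; the
getJobAds method, the owners, and the jobs are hypothetical. JobStatus
uses Condor's numeric codes (1 = Idle, 2 = Running, 3 = Removed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for a GetJobAds()-style query: return every job
 * ad for which the constraint holds.
 */
class JobAdQuery {

    /** Builds a tiny job ad using a few standard attribute names. */
    static Map<String, Object> ad(int cluster, int proc, String owner, int jobStatus) {
        Map<String, Object> ad = new HashMap<>();
        ad.put("ClusterId", cluster);
        ad.put("ProcId", proc);
        ad.put("Owner", owner);
        ad.put("JobStatus", jobStatus); // 1 = Idle, 2 = Running, 3 = Removed
        return ad;
    }

    /** Returns every ad that satisfies the constraint, in queue order. */
    static List<Map<String, Object>> getJobAds(List<Map<String, Object>> queue,
                                               Predicate<Map<String, Object>> constraint) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> ad : queue) {
            if (constraint.test(ad)) {
                matches.add(ad);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> queue = new ArrayList<>();
        queue.add(ad(1, 0, "afrasyab", 1));
        queue.add(ad(2, 0, "afrasyab", 3));
        queue.add(ad(3, 0, "marie", 2));

        // Analogue of the constraint "TRUE": every job in the queue.
        System.out.println(getJobAds(queue, a -> true).size()); // 3
        // Analogue of "JobStatus == 1": idle jobs only.
        System.out.println(getJobAds(queue, a -> a.get("JobStatus").equals(1)).size()); // 1
    }
}
```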
7. Can I specify a requirement such as Machine = "marie-LAPTOP" so that
the job runs on that particular computer?
You sure can, but you might need an "==".
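When the Requirements expression is built inside Java source, the inner
quotes have to be escaped, and the comparison needs == because a single
= defines an attribute in a ClassAd rather than comparing values. A
minimal sketch follows; the pinTo helper is hypothetical, and depending
on the pool the Machine attribute may contain a fully qualified host
name rather than just marie-LAPTOP.

```java
/**
 * Builds a Requirements expression that pins a job to one machine.
 * Inner double quotes are escaped as \" inside the Java string literal.
 */
class MachineRequirement {

    /** Returns, e.g., Machine == "marie-LAPTOP" (hypothetical helper). */
    static String pinTo(String machineName) {
        return "Machine == \"" + machineName + "\"";
    }

    public static void main(String[] args) {
        System.out.println(pinTo("marie-LAPTOP")); // Machine == "marie-LAPTOP"
    }
}
```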
matt