Re: [Condor-users] condor_submit never return with condor 7.2.1
- Date: Tue, 31 Mar 2009 12:27:27 -0400
- From: Frédéric Bastien <nouiz@xxxxxxxxx>
- Subject: Re: [Condor-users] condor_submit never return with condor 7.2.1
Hi,
This doesn't give me any more output on the console. The only thing printed is:
Submitting job(s).
Logging submit event(s)
Where is the debug output supposed to go? Into a log?
Here is the list of Condor processes:
condorr 14978 0.0 0.1 30832 3540 ? Ss 11:58 0:00 /opt/condor/sbin/condor_master
condorr 14979 0.0 0.2 31188 4980 ? Ss 11:58 0:00 condor_schedd -f
condorr 14980 1.1 0.2 30632 4592 ? Ss 11:58 0:04 condor_startd -f
root 14981 0.0 0.1 20040 3276 ? S 11:58 0:00 condor_procd -A /tmp/condor-lock.atchoum0.780986706101/procd_pipe.SCHEDD -S 60 -C 51860
bastienf 15349 4.0 0.1 29168 3116 pts/6 R+ 12:02 0:05 condor_submit -debug LOGS.NOBACKUP/echo_1_2009-03-31_12:02:42.874654/submit_file.condor
bastienf 15577 4.1 0.1 30248 4028 ? S 12:03 0:03 condor_shadow -f 1.0 --schedd=<132.204.26.92:9601> --xfer-queue=limit=upload,download;addr=<132.204.26.92:9666> <132.204.26.92:9666> -
As you can see, condor_submit has been running for more than 5 minutes. The
condor_shadow has been started, but the job's stdout, stderr and log files
are empty. Can you confirm that condor_submit should finish before the
job is matched?
condor_q tells me that the job is running, even though there is nothing on
the compute server.
In the SchedLog on the submit node I have:
3/31 12:03:04 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:04 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Negotiating for owner: bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) AutoCluster:config() significant atttributes changed to OWNER,IOJob,JobUniverse,LastCheckpointPlatform,NumCkpts,slot1_IOJob,slot2_IOJob,slot3_IOJob,slot4_IOJob,slot5_IOJob,slot6_IOJob,slot7_IOJob,slot8_IOJob
3/31 12:03:56 (pid:14979) Checking consistency running and runnable jobs
3/31 12:03:56 (pid:14979) Tables are consistent
3/31 12:03:56 (pid:14979) Rebuilt prioritized runnable job list in 0.000s.
3/31 12:03:56 (pid:14979) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
3/31 12:03:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Sent REQUEST_CLAIM to startd slot3@xxxxxxxxxxxxxxxxxxxxxxx <132.204.27.64:47146> for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Starting add_shadow_birthdate(1.0)
3/31 12:03:56 (pid:14979) Started shadow for job 1.0 on slot3@xxxxxxxxxxxxxxxxxxxxxxx <132.204.27.64:47146> for bastienf@xxxxxxxxxxxxxxxx, (shadow pid = 15577)
3/31 12:04:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:04:56 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:04:58 (pid:14979) Activity on stashed negotiator socket
3/31 12:04:58 (pid:14979) Negotiating for owner: bastienf@xxxxxxxxxxxxxxxx
3/31 12:04:58 (pid:14979) Out of servers - 0 jobs matched, 1 jobs idle, 0 jobs rejected
3/31 12:05:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:05:56 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:06:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
In the shadow log:
3/31 12:03:56 Initializing a VANILLA shadow for job 1.0
3/31 12:03:56 (1.0) (15577): Request to run on slot3@xxxxxxxxxxxxxxxxxxxxxxx <132.204.27.64:47146> was ACCEPTED
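When log excerpts like these pass through a mail client, long entries can get hard-wrapped, which makes them awkward to search. Below is a minimal Python sketch for re-joining such entries and picking out the ones that mention a given job. The helper names `join_wrapped` and `entries_mentioning` are hypothetical and not part of Condor. The sketch assumes every entry begins with a `M/D HH:MM:SS` timestamp, as in the excerpts above.

```python
import re

# Assumption: each log entry starts with a timestamp such as "3/31 12:03:56 ".
ENTRY_START = re.compile(r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")


def join_wrapped(lines):
    """Merge hard-wrapped continuation lines back into whole log entries.

    Continuation lines are appended with a single space, so a line that was
    wrapped in the middle of a token will gain a spurious space.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            # A timestamp (or the very first line) starts a new entry.
            entries.append(line)
        else:
            # Anything else continues the previous entry.
            entries[-1] += " " + line.strip()
    return entries


def entries_mentioning(entries, needle):
    """Return the entries that contain the given substring."""
    return [entry for entry in entries if needle in entry]


if __name__ == "__main__":
    # Hypothetical sample in the same shape as the SchedLog excerpt.
    sample = [
        "3/31 12:03:56 (pid:14979) Started shadow for job 1.0 on",
        "slot3@host <132.204.27.64:47146> for",
        "user@domain, (shadow pid = 15577)",
        "3/31 12:04:56 (pid:14979) Sent ad to central manager for",
        "user@domain",
    ]
    for entry in entries_mentioning(join_wrapped(sample), "job 1.0"):
        print(entry)
```

Running it on a saved copy of the SchedLog excerpt and filtering on `job 1.0` or on the shadow pid should make it easier to see which entries concern the stuck job.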
On the central node, the NegotiatorLog and MatchLog files tell me
that the job has been matched.
I would really appreciate help solving this issue, as it happens more often
than before and I can reproduce it every time on some computers with
some users.
Also, I use FS and FS_REMOTE as authentication methods.
Thanks for your time,
Frédéric Bastien
On Mon, Mar 30, 2009 at 5:13 PM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>> Do you have any advice on where I could look to solve this? What log
>> would help you?
>
> You can try turning on tool debugging:
>
> TOOL_DEBUG = True
>
> in your condor_config file. And then running condor_submit with -debug.
>
> That might help give you a little more information on the tool side as to
> what's going on.
>
> - Ian