Re: [Condor-users] CheckpointPlatform error
- Date: Mon, 10 May 2010 10:33:57 +0200
- From: antoni artigues <tartigues@xxxxxxx>
- Subject: Re: [Condor-users] CheckpointPlatform error
Hello,
Thank you for all the answers.
OK. I have reduced my job to just: echo "hello world". All the files are
world-readable and world-writable.
After submitting, "condor_q -ana -l <Clusterid>" returns:
----------------------------
slot2@xxxxxxxxx Failed offer constraint
---
011.000: Run analysis summary. Of 6 machines,
0 are rejected by your job's requirements
2 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 match but are currently offline
0 are available to run your job
----------------------------
In the NegotiatorLog I found this error:
05/10 10:27:08 Phase 4.1: Negotiating with schedds ...
05/10 10:27:08 Negotiating with condor@xxxxxxxxxxxxxxxx at <192.168.1.40:44936>
05/10 10:27:08 0 seconds so far
05/10 10:27:08 condor_read() failed: recv() returned -1, errno = 104 Connection reset by peer, reading 5 bytes from schedd condor@xxxxxxxxxxxxxxxxx
05/10 10:27:08 IO: Failed to read packet header
05/10 10:27:08 Failed to get reply from schedd
05/10 10:27:08 Error: Ignoring submitter for this cycle
In the SchedLog I found this other error:
05/10 10:27:08 (pid:2505) Sent ad to central manager for condor@xxxxxxxxxxxxxxxx
05/10 10:27:08 (pid:2505) Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxx
05/10 10:27:08 (pid:2505) Can't find address for startd dalia.intranet.iac3.eu
05/10 10:27:08 (pid:2505) PERMISSION DENIED to unauthenticated user from host 192.168.1.40 for command 493 (NEGOTIATE_WITH_SIGATTRS), access level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the full reason
I think there may be an error in my Condor configuration.
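To make entries like the one above easier to spot in a long SchedLog, here is a minimal Python sketch that extracts the host, command, and access level from each PERMISSION DENIED message. The regular expression is an assumption based only on the single log entry quoted in this message; other Condor versions may word it differently. The function name find_denials is hypothetical.

```python
import re

# Pattern derived only from the SchedLog entry quoted in this message;
# it is an assumption that other entries follow the same wording.
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host (?P<host>\S+) "
    r"for command (?P<num>\d+) \((?P<name>\w+)\), "
    r"access level (?P<level>\w+)"
)


def find_denials(log_text):
    """Return one dict per PERMISSION DENIED entry found in log_text."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in DENIED.finditer(flat)]


sample = """05/10 10:27:08 (pid:2505) PERMISSION DENIED to unauthenticated user from
host 192.168.1.40 for command 493 (NEGOTIATE_WITH_SIGATTRS), access
level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case
for the full reason"""

for d in find_denials(sample):
    print(d["host"], d["num"], d["name"], d["level"])
```

Since the entry says "cached result ... see first case for the full reason", the earliest match in the log is probably the one carrying the full reason.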
Thanks for any advice
Regards
On Fri, 07-05-2010 at 09:38 -0500, Steven Timm wrote:
> Todd--the message is saying that CheckpointPlatform is not in
> the _job_ classad, this has nothing to do with machine classads.
> I have been seeing this warning message from condor_q -better-analyze
> for a very long time and, like you say, it is harmless.
>
> To explore the "unknown reasons" why a job is rejected by certain machines,
> condor_q -ana -l <Clusterid>
> will give you the last reason that a particular
> job was rejected. Typical causes are group quotas, or sometimes the
> negotiator just hasn't run yet.
>
> Steve Timm
>
>
>
> >> Queue
> >>
> >> But after the job has been submitted, condor_q -better-analyze
> >> returns:
> >> -----------------------
> >> 002.000: Run analysis summary. Of 6 machines,
> >> 0 are rejected by your job's requirements
> >> 2 reject your job because of their own requirements
> >> 0 match but are serving users with a better priority in the pool
> >> 4 match but reject the job for unknown reasons
> >> 0 match but will not currently preempt their existing job
> >> 0 match but are currently offline
> >> 0 are available to run your job
> >>
> >> The following attributes are missing from the job ClassAd:
> >>
> >> CheckpointPlatform
> >> ----------------------
> >> Where is the error? What is the CheckpointPlatform?
> >
> > From the Condor Manual -
> >
> >> CheckpointPlatform: A string which opaquely encodes various aspects
> >> about a machine's operating system, hardware, and kernel attributes.
> >> It is used to identify systems where previously taken checkpoints for
> >> the standard universe may resume.
> >
> > But it is strange to see better-analyze saying it is missing.
> > CheckpointPlatform should appear by default in all the machine classads; the
> > above message from condor_analyze would imply that some of your machines are
> > not advertising a checkpoint platform. I would be curious to see what this
> > command
> > condor_status -con 'CheckpointPlatform =?= UNDEFINED'
> > returns (it will print out all machines in your pool that do not have
> > CheckpointPlatform defined).