
Re: [Condor-users] job rejected for unknow reason.....machine Failedrank condition: MY.Rank > MY.CurrentRank



Hi,
I opened the ShadowLog file on the submit machine.
This is what it contains:
10/08 16:29:47 Using config source: /opt/condor/etc/condor_config
10/08 16:29:47 Using local config sources: 
10/08 16:29:47    /opt/condor/local/condor_config.local
10/08 16:29:47 DaemonCore: Command Socket at <xxx.xxx.xxx.xxx:9606>
10/08 16:29:47 Initializing a VANILLA shadow for job 2779.0
10/08 16:29:47 (2780.0) (10934): Request to run on ec2-174-12-15-17.compute-1.amazonaws.com <174.129.158.17:9699> was ACCEPTED
10/08 16:29:47 (2778.0) (10935): Request to run on ec2-174-12-133-94.compute-1.amazonaws.com <174.129.133.94:9607> was ACCEPTED
10/08 16:29:47 (2781.0) (10936): Request to run on ec2-184-73-90-177.compute-1.amazonaws.com <184.73.90.177:9675> was ACCEPTED
10/08 16:29:47 (2779.0) (10937): Request to run on ec2-184-73-97-53.compute-1.amazonaws.com <184.73.97.53:9620> was ACCEPTED
10/08 18:47:54 (2780.0) (10934): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-174-129-158-17.compute-1.amazonaws.com.
10/08 18:47:54 (2780.0) (10934): IO: Failed to read packet header
10/08 18:47:54 (2780.0) (10934): Can no longer talk to condor_starter <174.129.158.17:9699>
10/08 18:47:54 (2780.0) (10934): Trying to reconnect to disconnected job
10/08 18:47:54 (2780.0) (10934): LastJobLeaseRenewal: 1286548599 Fri Oct  8 16:36:39 2010
10/08 18:47:54 (2780.0) (10934): JobLeaseDuration: 1200 seconds
10/08 18:47:54 (2780.0) (10934): JobLeaseDuration remaining: EXPIRED!
10/08 18:47:54 (2780.0) (10934): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:47:54 (2780.0) (10934): **** condor_shadow (condor_SHADOW) pid 10934 EXITING WITH STATUS 107
10/08 18:48:38 (2779.0) (10937): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-184-73-97-53.compute-1.amazonaws.com.
10/08 18:48:38 (2779.0) (10937): IO: Failed to read packet header
10/08 18:48:38 (2779.0) (10937): Can no longer talk to condor_starter <184.73.97.53:9620>
10/08 18:48:38 (2779.0) (10937): Trying to reconnect to disconnected job
10/08 18:48:38 (2779.0) (10937): LastJobLeaseRenewal: 1286548643 Fri Oct  8 16:37:23 2010
10/08 18:48:38 (2779.0) (10937): JobLeaseDuration: 1200 seconds
10/08 18:48:38 (2779.0) (10937): JobLeaseDuration remaining: EXPIRED!
10/08 18:48:38 (2779.0) (10937): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:48:38 (2779.0) (10937): **** condor_shadow (condor_SHADOW) pid 10937 EXITING WITH STATUS 107
10/08 18:48:55 (2778.0) (10935): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-174-129-133-94.compute-1.amazonaws.com.
10/08 18:48:55 (2778.0) (10935): IO: Failed to read packet header
10/08 18:48:55 (2778.0) (10935): Can no longer talk to condor_starter <174.129.133.94:9607>
10/08 18:48:55 (2778.0) (10935): Trying to reconnect to disconnected job
10/08 18:48:55 (2778.0) (10935): LastJobLeaseRenewal: 1286548659 Fri Oct  8 16:37:39 2010
10/08 18:48:55 (2778.0) (10935): JobLeaseDuration: 1200 seconds
10/08 18:48:55 (2778.0) (10935): JobLeaseDuration remaining: EXPIRED!
10/08 18:48:55 (2778.0) (10935): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:48:55 (2778.0) (10935): **** condor_shadow (condor_SHADOW) pid 10935 EXITING WITH STATUS 107
10/08 18:49:04 (2781.0) (10936): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-184-73-90-177.compute-1.amazonaws.com.
10/08 18:49:04 (2781.0) (10936): IO: Failed to read packet header
10/08 18:49:04 (2781.0) (10936): Can no longer talk to condor_starter <184.73.90.177:9675>
10/08 18:49:04 (2781.0) (10936): Trying to reconnect to disconnected job
10/08 18:49:04 (2781.0) (10936): LastJobLeaseRenewal: 1286548668 Fri Oct  8 16:37:48 2010
10/08 18:49:04 (2781.0) (10936): JobLeaseDuration: 1200 seconds
10/08 18:49:04 (2781.0) (10936): JobLeaseDuration remaining: EXPIRED!
10/08 18:49:04 (2781.0) (10936): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:49:04 (2781.0) (10936): **** condor_shadow (condor_SHADOW) pid 10936 EXITING WITH STATUS 107
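Each shadow logs both the last lease renewal and the lease length, so the size of the gap can be worked out from the entries themselves. Below is a minimal Python sketch, not part of Condor: the function name `lease_overrun` and the line pattern are assumptions based only on the lines quoted above. It compares each entry's timestamp with the human-readable renewal time printed on the same line, so it should not depend on the time zone of the machine running it.

```python
import re
from datetime import datetime

# Shape of the ShadowLog lines quoted above: "MM/DD HH:MM:SS (job) (pid): message"
LINE = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d) \((\d+\.\d+)\) \(\d+\): (.*)$")

def lease_overrun(lines):
    """Map each job id to the seconds elapsed beyond its job lease."""
    silent_for = {}  # job id -> seconds between last renewal and the log entry
    overrun = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        stamp, job, msg = m.groups()
        if msg.startswith("LastJobLeaseRenewal:"):
            # Message: "LastJobLeaseRenewal: <epoch> Fri Oct  8 16:36:39 2010"
            parts = msg.split()
            renewed = datetime.strptime(" ".join(parts[3:]), "%b %d %H:%M:%S %Y")
            # The log timestamp carries no year, so borrow it from the renewal time.
            logged = datetime.strptime(f"{renewed.year}/{stamp}", "%Y/%m/%d %H:%M:%S")
            silent_for[job] = (logged - renewed).total_seconds()
        elif msg.startswith("JobLeaseDuration:") and job in silent_for:
            lease = int(msg.split()[1])
            overrun[job] = silent_for[job] - lease
    return overrun

sample = [
    "10/08 18:47:54 (2780.0) (10934): LastJobLeaseRenewal: 1286548599 Fri Oct  8 16:36:39 2010",
    "10/08 18:47:54 (2780.0) (10934): JobLeaseDuration: 1200 seconds",
]
print(lease_overrun(sample))  # {'2780.0': 6675.0}
```

For job 2780.0 the last renewal was 7875 seconds before the failed read, which is 6675 seconds past the 1200-second lease. This is consistent with the "Job disconnected too long" message in the log.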

Thanks.

--- On Mon 11/10/10, michele pierri <pierm4ci@xxxxxxxx> wrote:

From: michele pierri <pierm4ci@xxxxxxxx>
Subject: Re: [Condor-users] job rejected for unknow reason.....machine Failedrank condition: MY.Rank > MY.CurrentRank
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Date: Monday, 11 October 2010, 21:30

I am using Linux-based machines (Ubuntu).
A few days ago everything worked fine, but now I get this error.
What could the problem be?

Thanks

--- On Mon 11/10/10, Alas, Alex [FEDI] <aalas@xxxxxxxxxxxxx> wrote:

From: Alas, Alex [FEDI] <aalas@xxxxxxxxxxxxx>
Subject: Re: [Condor-users] job rejected for unknow reason.....machine Failedrank condition: MY.Rank > MY.CurrentRank
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Date: Monday, 11 October 2010, 18:38

Hello Michele,

I usually see this error message in my Windows Condor pool when users don't have their credentials stored on the execute nodes.

I don't know whether you are dealing with a Windows or a Unix-based Condor pool, since your e-mail doesn't say.

Hope this helps,

Alex

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of michele pierri
Sent: Monday, October 11, 2010 10:35 AM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] job rejected for unknow reason.....machine Failedrank condition: MY.Rank > MY.CurrentRank

 

Hi,

I have this problem: all the jobs that I submit are rejected for unknown reasons.

If I run condor_q -ana -l job_id, it returns:

 

-- Submitter: submitter_machine : <xxx.xxx.xxx.xxx:xxxx> :
machine_name Failed rank condition: MY.Rank > MY.CurrentRank
---
2795.000:  Run analysis summary.  Of 1 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      1 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job

 

 

What is the problem? What should I do?

Thank you so much.

 


-----Attachment follows-----

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

