Hi,

I have opened the ShadowLog file on the submit machine. This is what it contains:

10/08 16:29:47 Using config source: /opt/condor/etc/condor_config
10/08 16:29:47 Using local config sources:
10/08 16:29:47 /opt/condor/local/condor_config.local
10/08 16:29:47 DaemonCore: Command Socket at <xxx.xxx.xxx.xxx:9606>
10/08 16:29:47 Initializing a VANILLA shadow for job 2779.0
10/08 16:29:47 (2780.0) (10934): Request to run on ec2-174-12-15-17.compute-1.amazonaws.com <174.129.158.17:9699> was ACCEPTED
10/08 16:29:47 (2778.0) (10935): Request to run on ec2-174-12-133-94.compute-1.amazonaws.com <174.129.133.94:9607> was ACCEPTED
10/08 16:29:47 (2781.0) (10936): Request to run on ec2-184-73-90-177.compute-1.amazonaws.com <184.73.90.177:9675> was ACCEPTED
10/08 16:29:47 (2779.0) (10937): Request to run on ec2-184-73-97-53.compute-1.amazonaws.com <184.73.97.53:9620> was ACCEPTED
10/08 18:47:54 (2780.0) (10934): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-174-129-158-17.compute-1.amazonaws.com.
10/08 18:47:54 (2780.0) (10934): IO: Failed to read packet header
10/08 18:47:54 (2780.0) (10934): Can no longer talk to condor_starter <174.129.158.17:9699>
10/08 18:47:54 (2780.0) (10934): Trying to reconnect to disconnected job
10/08 18:47:54 (2780.0) (10934): LastJobLeaseRenewal: 1286548599 Fri Oct 8 16:36:39 2010
10/08 18:47:54 (2780.0) (10934): JobLeaseDuration: 1200 seconds
10/08 18:47:54 (2780.0) (10934): JobLeaseDuration remaining: EXPIRED!
10/08 18:47:54 (2780.0) (10934): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:47:54 (2780.0) (10934): **** condor_shadow (condor_SHADOW) pid 10934 EXITING WITH STATUS 107
10/08 18:48:38 (2779.0) (10937): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-184-73-97-53.compute-1.amazonaws.com.
10/08 18:48:38 (2779.0) (10937): IO: Failed to read packet header
10/08 18:48:38 (2779.0) (10937): Can no longer talk to condor_starter <184.73.97.53:9620>
10/08 18:48:38 (2779.0) (10937): Trying to reconnect to disconnected job
10/08 18:48:38 (2779.0) (10937): LastJobLeaseRenewal: 1286548643 Fri Oct 8 16:37:23 2010
10/08 18:48:38 (2779.0) (10937): JobLeaseDuration: 1200 seconds
10/08 18:48:38 (2779.0) (10937): JobLeaseDuration remaining: EXPIRED!
10/08 18:48:38 (2779.0) (10937): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:48:38 (2779.0) (10937): **** condor_shadow (condor_SHADOW) pid 10937 EXITING WITH STATUS 107
10/08 18:48:55 (2778.0) (10935): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-174-129-133-94.compute-1.amazonaws.com.
10/08 18:48:55 (2778.0) (10935): IO: Failed to read packet header
10/08 18:48:55 (2778.0) (10935): Can no longer talk to condor_starter <174.129.133.94:9607>
10/08 18:48:55 (2778.0) (10935): Trying to reconnect to disconnected job
10/08 18:48:55 (2778.0) (10935): LastJobLeaseRenewal: 1286548659 Fri Oct 8 16:37:39 2010
10/08 18:48:55 (2778.0) (10935): JobLeaseDuration: 1200 seconds
10/08 18:48:55 (2778.0) (10935): JobLeaseDuration remaining: EXPIRED!
10/08 18:48:55 (2778.0) (10935): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:48:55 (2778.0) (10935): **** condor_shadow (condor_SHADOW) pid 10935 EXITING WITH STATUS 107
10/08 18:49:04 (2781.0) (10936): condor_read() failed: recv() returned -1, errno = 110 Connection timed out, reading 5 bytes from startd ec2-184-73-90-177.compute-1.amazonaws.com.
10/08 18:49:04 (2781.0) (10936): IO: Failed to read packet header
10/08 18:49:04 (2781.0) (10936): Can no longer talk to condor_starter <184.73.90.177:9675>
10/08 18:49:04 (2781.0) (10936): Trying to reconnect to disconnected job
10/08 18:49:04 (2781.0) (10936): LastJobLeaseRenewal: 1286548668 Fri Oct 8 16:37:48 2010
10/08 18:49:04 (2781.0) (10936): JobLeaseDuration: 1200 seconds
10/08 18:49:04 (2781.0) (10936): JobLeaseDuration remaining: EXPIRED!
10/08 18:49:04 (2781.0) (10936): Reconnect FAILED: Job disconnected too long: JobLeaseDuration (1200 seconds) expired
10/08 18:49:04 (2781.0) (10936): **** condor_shadow (condor_SHADOW) pid 10936 EXITING WITH STATUS 107

Thanks.

--- On Mon 11/10/10, michele pierri <pierm4ci@xxxxxxxx> wrote: