
[Condor-users] Job is going into Hold state automatically



Hi all,

We have a Condor pool of 70 Windows hosts.
A few days ago, we saw strange behaviour in the job state:
job 43534.0 went from the Idle state into the Hold state by itself.
What happened in this case?

I have attached excerpts from the StartLog, StarterLog.vm2 and ShadowLog below.
What does "Download acknowledgment missing attribute" mean?
Can you help me find out what is going wrong?

Thanks,
Kohei

-- StartLog on execution host --

11/9 20:45:44 vm2: Remote job ID is 43534.0
11/9 20:45:44 vm2: Got universe "VANILLA" (5) from request classad
11/9 20:45:44 vm2: State change: claim-activation protocol successful
11/9 20:45:44 vm2: Changing activity: Idle -> Busy
11/9 20:45:45 Starter pid 2784 exited with status 4
11/9 20:45:45 vm2: State change: starter exited
11/9 20:45:45 vm2: Changing activity: Busy -> Idle

-- StarterLog.vm2 on execution host --

11/9 20:45:44 ******************************************************
11/9 20:45:44 ** condor_starter (CONDOR_STARTER) STARTING UP
11/9 20:45:44 ** C:\apl\condor\bin\condor_starter.exe
11/9 20:45:44 ** $CondorVersion: 6.8.0 Jul 19 2006 $
11/9 20:45:44 ** $CondorPlatform: INTEL-WINNT50 $
11/9 20:45:44 ** PID = 2784
11/9 20:45:44 ** Log last touched 11/9 20:45:41
11/9 20:45:44 ******************************************************
11/9 20:45:44 Using config source: C:\apl\condor\condor_config
11/9 20:45:44 Using local config sources:
11/9 20:45:44    C:\apl\condor/condor_config.local
11/9 20:45:44 DaemonCore: Command Socket at <133.189.59.157:1468>
11/9 20:45:44 Setting resource limits not implemented!
11/9 20:45:45 Communicating with shadow <133.189.59.145:2234>
11/9 20:45:45 Submitting machine is "xxxx.xxxx.xxxx.co.jp"
11/9 20:45:45 Download acknowledgment missing Result
11/9 20:45:45 DoDownload: STARTER failed to receive file(s) from <133.189.59.157:1066>; Download acknowledgment missing attribute: Result
11/9 20:45:45 File transfer failed (status=0).
11/9 20:45:45 ERROR "Failed to transfer files" at line 1649 in file ..\src\condor_starter.V6.1\jic_shadow.C
11/9 20:45:45 ShutdownFast all jobs.

-- ShadowLog on submission host --

11/9 20:45:43 ******************************************************
11/9 20:45:43 ** condor_shadow (CONDOR_SHADOW) STARTING UP
11/9 20:45:43 ** C:\apl\condor\bin\condor_shadow.exe
11/9 20:45:43 ** $CondorVersion: 6.8.0 Jul 19 2006 $
11/9 20:45:43 ** $CondorPlatform: INTEL-WINNT50 $
11/9 20:45:43 ** PID = 2148
11/9 20:45:43 ** Log last touched 11/9 20:45:40
11/9 20:45:43 ******************************************************
11/9 20:45:43 Using config source: C:\apl\condor\condor_config
11/9 20:45:43 Using local config sources:
11/9 20:45:43    C:\apl\condor/condor_config.local
11/9 20:45:43 DaemonCore: Command Socket at <133.189.59.145:2234>
11/9 20:45:43 Initializing a VANILLA shadow for job 43534.0
11/9 20:45:44 (43534.0) (2148): Request to run on <133.189.59.157:1066> was ACCEPTED
11/9 20:45:45 (43534.0) (2148): DoUpload: (Condor error code 11, subcode 0) SHADOW at 133.189.59.145 failed to send file(s) to <133.189.59.145:2234>; STARTER failed to receive file(s) from <133.189.59.157:1066>
11/9 20:45:45 (43534.0) (2148): Job 43534.0 going into Hold state (code 11,0): Error from starter on vm2@xxxxxxxxxxxxxxx: STARTER failed to receive file(s) from <133.189.59.157:1066>; Download acknowledgment missing attribute: Result
11/9 20:45:45 (43534.0) (2148): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 112
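With 70 execution hosts, it may help to check whether this same hold (code 11, subcode 0) shows up for other jobs too. Below is a minimal Python sketch that pulls every "going into Hold state" message out of a ShadowLog. It assumes the line layout shown above (date, time, then a "(job) (pid):" prefix). `find_holds`, `find_holds_in_file` and `HoldEvent` are hypothetical helper names, not part of Condor. It uses `re.search`, so it still finds the hold message if two log entries end up fused on one line.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Matches one shadow hold message, ASSUMING the ShadowLog layout shown above:
#   "11/9 20:45:45 (43534.0) (2148): Job 43534.0 going into Hold state (code 11,0): <reason>"
# re.search() is used, so the message is found even if two log entries
# were fused onto a single line.
HOLD_RE = re.compile(
    r"(?P<ts>\d+/\d+ \d{2}:\d{2}:\d{2}) "
    r"\((?P<job>\d+\.\d+)\) \(\d+\): "
    r"Job \S+ going into Hold state "
    r"\(code (?P<code>\d+),(?P<subcode>\d+)\): (?P<reason>.*)"
)


@dataclass
class HoldEvent:
    """One hold message extracted from a ShadowLog (hypothetical helper type)."""
    timestamp: str
    job_id: str
    code: int
    subcode: int
    reason: str


def find_holds(lines: Iterable[str]) -> List[HoldEvent]:
    """Return a HoldEvent for every hold message in the given log lines."""
    events = []
    for line in lines:
        m = HOLD_RE.search(line)
        if m:
            events.append(
                HoldEvent(
                    timestamp=m["ts"],
                    job_id=m["job"],
                    code=int(m["code"]),
                    subcode=int(m["subcode"]),
                    reason=m["reason"].strip(),
                )
            )
    return events


def find_holds_in_file(path: str) -> List[HoldEvent]:
    """Scan a ShadowLog file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_holds(f)


if __name__ == "__main__":
    # Two lines copied from the ShadowLog above; only the second is a hold.
    sample = [
        "11/9 20:45:44 (43534.0) (2148): Request to run on <133.189.59.157:1066> was ACCEPTED",
        "11/9 20:45:45 (43534.0) (2148): Job 43534.0 going into Hold state (code 11,0): "
        "Error from starter on vm2@xxxxxxxxxxxxxxx: STARTER failed to receive file(s) "
        "from <133.189.59.157:1066>; Download acknowledgment missing attribute: Result",
    ]
    for ev in find_holds(sample):
        print(ev.timestamp, ev.job_id, ev.code, ev.subcode)
```

On the sample lines this prints one event for job 43534.0 with code 11 and subcode 0. To scan a real log, pass its path to `find_holds_in_file` and group the results by `reason` to see whether the failures cluster on particular execution hosts.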