Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Starter exited with status -1073740940

Date: Thu, 10 Jul 2014 17:43:33 -0500
From: "John (TJ) Knoeller" <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Starter exited with status -1073740940

That error code is 0xC0000374, which is STATUS_HEAP_CORRUPTION, defined in ntstatus.h.
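
As a quick check of that conversion, here is a minimal sketch in Python (not part of HTCondor; the helper name is made up for illustration). It reinterprets the signed 32-bit exit status from the StartLog as the unsigned hex value you would look up in ntstatus.h:

```python
def exit_status_to_ntstatus(status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex NTSTATUS."""
    # Masking with 0xFFFFFFFF yields the 32-bit two's-complement bit pattern.
    return "0x{:08X}".format(status & 0xFFFFFFFF)


print(exit_status_to_ntstatus(-1073740940))  # 0xC0000374
```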

Logging of the priv change only happens after the change was successful, so the crash is in whatever happens next after this line:

c:\condor\execute\dir_18128\userdir\src\condor_starter.v6.1\basestarter.cpp:1789

The next things to happen are:
  mkdir the working directory
  write the machine ad and job ad into the working directory
  set ACLs on the working directory
  if (ENCRYPT_EXECUTE_DIRECTORY) load ADVAPI32.dll to get the EncryptFile() function and use it to encrypt the working dir
  chdir to the working dir

dprintf( D_FULLDEBUG, "Done moving to directory \"%s\"\n",WorkingDir.Value() );

Since you aren't seeing "Done moving to directory...", the problem must happen while we are setting up the working directory.


Can you tell how far into the process we got?
Was the working directory made?
Were the ads written to it?
Do you have encryption enabled?
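
One way to answer the first two questions is a small script along these lines. This is only a sketch under assumptions: it assumes the scratch directory is still on disk after the crash, and that the ads are written as ".machine.ad" and ".job.ad" (the usual names, but check your version). The function name and the example path are hypothetical.

```python
import os

# Assumed file names for the machine ad and job ad in the scratch directory.
AD_FILES = (".machine.ad", ".job.ad")


def setup_progress(working_dir):
    """Report which working-directory setup steps left evidence on disk."""
    exists = os.path.isdir(working_dir)
    return {
        "directory_created": exists,
        # Only meaningful if the directory itself exists.
        "ads_written": exists and all(
            os.path.isfile(os.path.join(working_dir, name)) for name in AD_FILES
        ),
    }


# Hypothetical path; substitute the real scratch directory under EXECUTE.
print(setup_progress(r"C:\condor\execute\dir_336"))
```

If the directory exists but the ads do not, the failure is likely between the mkdir and the ad-writing step; if both are there, look at the ACL and encryption steps.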

None of this code ever calls exit, so the exit must be happening down inside some library.


-tj


On 7/10/2014 1:21 PM, Ben Cotton wrote:

I'm running HTCondor 8.2.1 in a small cluster on AWS and I'm having a
hard time getting my Windows jobs to run. The Windows execute node is
Server 2k8 R2 (which HTCondor identifies as Windows 7). The job
matches, appears to start, but then the condor_starter.exe dies. The
StartLog records:

07/10/14 17:48:47 condor_read() failed: recv(fd=1012) returned
-1, errno = 10054 , reading 5 bytes from <127.0.0.1:50882>.
07/10/14 17:48:47 IO: Failed to read packet header
07/10/14 17:48:47 Closing job ClassAd update socket from starter.
07/10/14 17:48:47 Starter pid 336 exited with status -1073740940

From the StarterLog:
07/10/14 17:48:47 (fd:7) (pid:336) (D_HOSTNAME) Daemon client (shadow)
address determined: name: "ip-10-151-7-218.ec2.internal", pool:
"NULL", alias: "NULL", addr: "<10.151.7.218:48140?noUDP>"
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) Communicating with
shadow <10.151.7.218:48140?noUDP>
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) Submitting machine is
"ip-10-151-7-218.ec2.internal"
07/10/14 17:48:47 (fd:7) (pid:336) (D_SYSCALLS) Doing
CONDOR_register_starter_info
07/10/14 17:48:47 (fd:7) (pid:336) (D_NETWORK) condor_write(fd=604
<10.151.7.218:59144>,,size=515,timeout=300,flags=0,non_blocking=0)
07/10/14 17:48:47 (fd:7) (pid:336) (D_NETWORK) condor_read(fd=604
<10.151.7.218:59144>,,size=5,timeout=300,flags=0,non_blocking=0)
07/10/14 17:48:47 (fd:7) (pid:336) (D_NETWORK) condor_read(fd=604
<10.151.7.218:59144>,,size=8,timeout=300,flags=0,non_blocking=0)
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) setting the orig job
name in starter
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) setting the orig job iwd
in starter
07/10/14 17:48:47 (fd:7) (pid:336) (D_PRIV) PRIV_CONDOR -->
PRIV_CONDOR at c:\condor\execute\dir_18128\userdir\src\condor_starter.v6.1\basestarter.cpp:1789

And then it goes poof. I see on the MagicNumbers page[1] that negative
statuses might mean "Possibly missing libraries or missing functions
in libraries on Windows. Try running from the command line to see if
you get any errors." I tried running from the command line and got no
output, error or otherwise. The other daemons seem to be fine. Any
ideas what's going on here?

[1] https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers


Thanks,
BC

