Re: [Condor-users] ACCESS_VIOLATION under Windows
- Date: Wed, 01 Aug 2007 14:12:58 -0500
- From: Ben Burnett <burnett@xxxxxxxxxxx>
- Subject: Re: [Condor-users] ACCESS_VIOLATION under Windows
James:
That's strange. You have set the configuration correctly, so you are not
missing anything; it sounds as if the core files simply haven't been created.
Could you turn your debugging level up (STARTER_DEBUG = D_ALL), re-run the
job, and repost the resulting logs in full?
-B
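
With D_ALL the logs get long, so it can help to pull out one process's lines while reading them (the full logs are still what should be posted). Below is a minimal Python sketch, assuming every line carries the "(fd:N) (pid:N)" prefix seen in the logs quoted further down; the names LOG_LINE and lines_for_pid are illustrative and not part of Condor.

```python
import re

# Pattern for lines shaped like the ones quoted in this thread:
#   7/25 16:04:52 (fd:3) (pid:3636) message text
LOG_LINE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\(fd:(?P<fd>\d+)\) \(pid:(?P<pid>\d+)\) ?(?P<msg>.*)$"
)


def lines_for_pid(lines, pid):
    """Return the message text of every log line written by the given pid."""
    out = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and int(m.group("pid")) == pid:
            out.append(m.group("msg"))
    return out


# Two lines copied from the logs in this thread (startd pid 3636, starter pid 3940).
sample = [
    "7/25 16:04:52 (fd:3) (pid:3636) State change: starter exited",
    "7/25 16:04:52 (fd:6) (pid:3940) Job 71.0 set to execute immediately",
]
print(lines_for_pid(sample, 3940))
```

Lines that do not match the pattern (for example, wrapped continuation lines) are skipped, so this is only a reading aid.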
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Wojtek Goscinski
Sent: Wednesday, August 01, 2007 12:08 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] ACCESS_VIOLATION under Windows
Hi Ben,
SMTP is currently unavailable from that machine because of a firewall
issue, which I'm getting fixed.
I set CREATE_CORE_FILES = true, which I assume should give me a core
file in the log directory. However, I do not get a core file in
either the machine's log directory or the directory I submitted the
Java job from.
Am I missing something? Do I have to set something else for core files
to be dumped to the log directory, or is it possible that a core file is
not created?
Regards,
James
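
One way to double-check that no core file was written anywhere is to scan the candidate directories for it. This is a rough Python sketch only: the "core" file-name prefix and the C:\condor\log path are assumptions for illustration (only C:\condor\execute appears in the logs below), and find_core_files is a hypothetical helper, not a Condor tool.

```python
from pathlib import Path


def find_core_files(directories, prefix="core"):
    """List files whose names start with `prefix`, newest first.

    The "core" prefix is an assumption; check the Condor manual for the
    file name your version actually writes.
    """
    found = []
    for d in directories:
        p = Path(d)
        if not p.is_dir():
            continue  # skip directories that do not exist on this machine
        for f in p.iterdir():
            if f.is_file() and f.name.lower().startswith(prefix):
                found.append(f)
    return sorted(found, key=lambda f: f.stat().st_mtime, reverse=True)


# Assumed locations: the daemon log directory and the execute directory.
for path in find_core_files([r"C:\condor\log", r"C:\condor\execute"]):
    print(path)
```

If this prints nothing for the directories Condor is configured to use, that supports the idea that the file was never created rather than written somewhere unexpected.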
On 7/31/07, Ben Burnett <burnett@xxxxxxxxxxx> wrote:
>
>
>
>
> Hi James:
>
>
>
> I wonder if you could post the core file from the execute node's
> starter; it should have been emailed to your admin email after the crash.
>
>
>
> -B
>
>
>
>
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
> Wojtek Goscinski
> Sent: Sunday, July 29, 2007 8:19 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] ACCESS_VIOLATION under Windows
>
>
>
>
> Hi All,
>
> I'm experiencing a problem setting up a Windows box as a Condor execute
> node, specifically to execute Java jobs.
>
> I have a Windows box running XP SP2. It is set up purely as an execute
> node. The start daemon successfully picks up the job and attempts to
> execute it. It spawns the condor_starter, but the condor_starter seems to
> crash with an exception (an ACCESS_VIOLATION).
>
> As you can see in the log below, the starter process seems to try to launch
> Java, but this ends in an exception. The starter crashes immediately after
> the last log line. I've confirmed that Java exists at the location
> specified.
>
> I assume this might be some sort of Windows security issue, but I'm not
> sure how to debug it. The condor vm user was given execute rights on the
> Java directory, though I'm not sure whether this is enough.
>
> Any help or tips for debugging are most welcome.
>
> -james
>
>
> Start Log
> -------------
>
> 7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: setting sock->decode()
> 7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: allowing an empty message
> for sock.
> 7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: Success.
> 7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: Command received via UDP from
> host < 172.19.189.3:9629>
> 7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: received command 60011
> (DC_NOP), calling handler (handle_nop())
> 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
> 7/25 16:04:52 (fd:3) (pid:3636) Calling Handler
> <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
> 7/25 16:04:52 (fd:3) (pid:3636) KEYCACHEX: removing session
> hp-test-02:3636:1185343491:6 for <172.19.189.3:9618 >
> 7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: pid 3940 exited with status
> -1073741819, invoking reaper 1 <reaper>
> 7/25 16:04:52 (fd:3) (pid:3636) Starter pid 3940 died on signal -1073741819
> (exception ACCESS_VIOLATION)
> 7/25 16:04:52 (fd:3) (pid:3636) Entering ProcFamily::hardkill
> 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_c++_util\killfamily.C:274
> 7/25 16:04:52 (fd:3) (pid:3636) Destroying Daemon object:
> 7/25 16:04:52 (fd:3) (pid:3636) Type: 1 (any), Name: (null), Addr: <
> 172.19.189.3:9611>
> 7/25 16:04:52 (fd:3) (pid:3636) FullHost: (null), Host: (null), Pool:
> (null), Port: -1
> 7/25 16:04:52 (fd:3) (pid:3636) IsLocal: N, IdStr: (null), Error: (null)
> 7/25 16:04:52 (fd:3) (pid:3636) --- End of Daemon object info ---
> 7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found
> (OpenProcess err=1308)
> 7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found
> (OpenProcess err=1308)
> 7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: parent: 3940 family:
> 7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: alive_cpu_user = 0, exited_cpu
> = 0, max_image = 3624k
> 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_c++_util\killfamily.C:475
> 7/25 16:04:52 (fd:3) (pid:3636) Attempting to remove
> C:\condor\execute\dir_3940 as SuperUser (system)
> 7/25 16:04:52 (fd:3) (pid:3636) Deleted ProcFamily w/ pid 3940 as parent
> 7/25 16:04:52 (fd:3) (pid:3636) State change: starter exited
> 7/25 16:04:52 (fd:3) (pid:3636) Changing activity: Busy -> Idle
> 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
> 7/25 16:04:52 (fd:3) (pid:3636) In cancel_timer(), id=66
> 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
> 7/25 16:04:52 (fd:3) (pid:3636) In DaemonCore Timeout()
> 7/25 16:04:52 (fd:3) (pid:3636)
>
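
A note on the exit status in the log above: -1073741819 is the signed 32-bit form of the Windows NTSTATUS code 0xC0000005, which is STATUS_ACCESS_VIOLATION, matching the exception name the log prints. A minimal Python sketch of that conversion follows; the helper name to_ntstatus and the small lookup table are illustrative and not part of Condor.

```python
def to_ntstatus(exit_status):
    """Reinterpret a signed 32-bit exit status as an unsigned NTSTATUS value."""
    return exit_status & 0xFFFFFFFF


# A few well-known NTSTATUS codes from the Windows headers.
KNOWN = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}

code = to_ntstatus(-1073741819)
print(hex(code), KNOWN.get(code, "unknown"))  # prints: 0xc0000005 STATUS_ACCESS_VIOLATION
```

The code only confirms what kind of crash occurred in the starter (an invalid memory access); it does not say where or why it happened.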
> Starter Log
> ----------------
> 7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: setting sock->decode()
> 7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: allowing an empty message
> for sock.
> 7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: Success.
> 7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: Command received via UDP from
> host < 172.19.189.3:9614>
> 7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: received command 60011
> (DC_NOP), calling handler (handle_nop())
> 7/25 16:04:51 (fd:8) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
> 7/25 16:04:51 (fd:8) (pid:3940) Calling Handler
> <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
> 7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: tid 3300 exited with status 1,
> invoking reaper 2 <FileTransfer::Reaper()>
> 7/25 16:04:51 (fd:8) (pid:3940) File transfer completed successfully.
> 7/25 16:04:51 (fd:6) (pid:3940) Destroying Daemon object:
> 7/25 16:04:51 (fd:6) (pid:3940) Type: 1 (any), Name: (null), Addr:
> <172.19.189.3:9618>
> 7/25 16:04:51 (fd:6) (pid:3940) FullHost: (null), Host: (null), Pool:
> (null), Port: -1
> 7/25 16:04:51 (fd:6) (pid:3940) IsLocal: N, IdStr: (null), Error: (null)
> 7/25 16:04:51 (fd:6) (pid:3940) --- End of Daemon object info ---
> 7/25 16:04:52 (fd:6) (pid:3940) Calling client FileTransfer handler
> function.
> 7/25 16:04:52 (fd:6) (pid:3940) in DaemonCore NewTimer()
> 7/25 16:04:52 (fd:6) (pid:3940)
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492,
> period = 0, handler_descrip=<deferred job start>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551,
> period = 0, handler_descrip=<dc_touch_log_file>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731,
> period = 240, handler_descrip=<self_monitor>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791,
> period = 300, handler_descrip=<check_session_cache>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661,
> period = 1170,
> handler_descrip=<DaemonCore::SendAliveToParent>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292,
> period = 1801, handler_descrip=<handle_cookie_refresh>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290,
> period = 0, handler_descrip=<DaemonCore::ReInit()>
> 7/25 16:04:52 (fd:6) (pid:3940)
> 7/25 16:04:52 (fd:6) (pid:3940) leaving DaemonCore NewTimer, id=7
> 7/25 16:04:52 (fd:6) (pid:3940) Job 71.0 set to execute immediately
> 7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
> 7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
> 7/25 16:04:52 (fd:6) (pid:3940) In DaemonCore Timeout()
> 7/25 16:04:52 (fd:6) (pid:3940)
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492,
> period = 0, handler_descrip=<deferred job start>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551,
> period = 0, handler_descrip=<dc_touch_log_file>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731,
> period = 240, handler_descrip=<self_monitor>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791,
> period = 300, handler_descrip=<check_session_cache>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661,
> period = 1170,
> handler_descrip=<DaemonCore::SendAliveToParent>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292,
> period = 1801, handler_descrip=<handle_cookie_refresh>
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290,
> period = 0, handler_descrip=<DaemonCore::ReInit()>
> 7/25 16:04:52 (fd:6) (pid:3940)
> 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore: Calling handler for Timer 7
> (deferred job start)
> 7/25 16:04:52 (fd:6) (pid:3940) Starting a JAVA universe job with ID: 71.0
> 7/25 16:04:52 (fd:6) (pid:3940) In OsProc::OsProc()
> 7/25 16:04:52 (fd:6) (pid:3940) Main job KillSignal: 15 (Unknown)
> 7/25 16:04:52 (fd:6) (pid:3940) Main job RmKillSignal: 15 (Unknown)
> 7/25 16:04:52 (fd:6) (pid:3940) Main job HoldKillSignal: 15 (Unknown)
> 7/25 16:04:52 (fd:6) (pid:3940) SYSAPI_GET_LOADAVG is undefined, using
> default value of True
> 7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Cmd="C:\\Program
> Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
> 7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Args=-Xmx247m -classpath
> C:\condor/lib;C:\condor/lib/scimark2lib.jar;.
> -Dchirp.config=C:\condor\execute\dir_3940\chirp.config
> CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start
> C:\condor\execute\dir_3940\jvm.end JavaTest
> 7/25 16:04:52 (fd:6) (pid:3940) in VanillaProc::StartJob()
> 7/25 16:04:52 (fd:6) (pid:3940) in OsProc::StartJob()
> 7/25 16:04:52 (fd:6) (pid:3940) IWD: C:\condor/execute\dir_3940
> 7/25 16:04:52 (fd:6) (pid:3940) get_port_range - (LOWPORT,HIGHPORT) is
> (9600,9700).
> 7/25 16:04:52 (fd:6) (pid:3940) TokenCache contents:
> condor-reuse-vm1@.
> 7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_USER at
> ..\src\condor_starter.V6.1\os_proc.C:227
> 7/25 16:04:52 (fd:7) (pid:3940) Input file: NUL
> 7/25 16:04:52 (fd:8) (pid:3940) Output file:
> C:\condor/execute\dir_3940\JavaTest.output.0
> 7/25 16:04:52 (fd:9) (pid:3940) Error file:
> C:\condor/execute\dir_3940\JavaTest.error.0
> 7/25 16:04:52 (fd:9) (pid:3940) Doing CONDOR_begin_execution
> 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
> 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
> 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
> 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
> 7/25 16:04:52 (fd:9) (pid:3940) Renice expr "10" evaluated to 10
> 7/25 16:04:52 (fd:9) (pid:3940) About to exec
> C:\condor/execute\dir_3940\"C:\\Program
> Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE" -Xmx247m
> -classpath C:\condor/lib;C:\condor/lib/scimark2lib.jar;.
> -Dchirp.config=C:\condor\execute\dir_3940\chirp.config
> CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start
> C:\condor\execute\dir_3940\jvm.end JavaTest
> 7/25 16:04:52 (fd:9) (pid:3940) Env =
> _CONDOR_SCRATCH_DIR=C:\condor\execute\dir_3940
> _CONDOR_HIGHPORT=9700 _CONDOR_LOWPORT=9600
> 7/25 16:04:52 (fd:9) (pid:3940)
> JOB_INHERITS_STARTER_ENVIRONMENT is undefined, using
> default value of False
> 7/25 16:04:52 (fd:9) (pid:3940) PRIV_USER --> PRIV_CONDOR at
> ..\src\condor_starter.V6.1\os_proc.C:343
> 7/25 16:04:52 (fd:9) (pid:3940) In
> DaemonCore::Create_Process(C:\condor/execute\dir_3940\"C:\\Program
> Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE",...)
>
>
>
>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>
>