Ben,

Please find attached the logs and configs for the box in question.

Regards,

james

On 8/2/07, Ben Burnett <burnett@xxxxxxxxxxx> wrote:
> James:
>
> That's strange; however, you have set the configuration correctly, so it's
> nothing you're missing--it sounds as if they haven't been created. Could
> you try turning your debugging level up (STARTER_DEBUG = D_ALL), re-run the
> job, and repost the resulting logs in full?
>
> -B
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Wojtek Goscinski
> Sent: Wednesday, August 01, 2007 12:08 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] ACCESS_VIOLATION under Windows
>
> Hi Ben,
>
> SMTP is currently unavailable from that machine - a firewall issue
> which I'm getting fixed.
> I set CREATE_CORE_FILES = true - which I assume should give me a core
> file in the log directory? However, I do not receive a core file in
> either the machine's log directory or the directory I submitted the
> java job from.
>
> Am I missing something? Do I have to set something else for core files
> to be dumped to log, or is it possible that a core file is not
> created?
>
> Regards,
>
> James
>
>
> On 7/31/07, Ben Burnett <burnett@xxxxxxxxxxx> wrote:
> >
> > Hi James:
> >
> > I wonder if you could post the core file from the execute node's
> > starter; it should have been emailed to your admin email after the crash.
> >
> > -B
> >
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
> > Wojtek Goscinski
> > Sent: Sunday, July 29, 2007 8:19 PM
> > To: condor-users@xxxxxxxxxxx
> > Subject: [Condor-users] ACCESS_VIOLATION under Windows
> >
> > Hi All,
> >
> > I'm experiencing a problem setting up a windows box as a condor execute
> > node - specifically to execute java jobs.
> >
> > I have a windows box running xp sp2.
> > It is purely set up as an execute
> > node. The start daemon successfully picks up the job and attempts to
> > execute it. It spawns the condor_starter - but the condor_starter seems to
> > crash with an exception (an ACCESS_VIOLATION).
> >
> > As you can see in the log below, the starter process seems to try to launch
> > java, but this ends in an exception. The starter crashes immediately after
> > that last log entry. I've confirmed that java exists at the location
> > specified.
> >
> > I assume this might be some sort of windows security issue, but I'm not
> > sure how to debug it. The condor vm user was given rights to execute the
> > java directory - though I'm not sure whether this is enough.
> >
> > Any help or tips for debugging are most welcome.
> >
> > -james
> >
> >
> > Start Log
> > -------------
> >
> > 7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: setting sock->decode()
> > 7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: allowing an empty message for sock.
> > 7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: Success.
> > 7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: Command received via UDP from host <172.19.189.3:9629>
> > 7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
> > 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
> > 7/25 16:04:52 (fd:3) (pid:3636) Calling Handler <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
> > 7/25 16:04:52 (fd:3) (pid:3636) KEYCACHEX: removing session hp-test-02:3636:1185343491:6 for <172.19.189.3:9618>
> > 7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: pid 3940 exited with status -1073741819, invoking reaper 1 <reaper>
> > 7/25 16:04:52 (fd:3) (pid:3636) Starter pid 3940 died on signal -1073741819 (exception ACCESS_VIOLATION)
> > 7/25 16:04:52 (fd:3) (pid:3636) Entering ProcFamily::hardkill
> > 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_c++_util\killfamily.C:274
> > 7/25 16:04:52 (fd:3) (pid:3636) Destroying Daemon object:
> > 7/25 16:04:52 (fd:3) (pid:3636) Type: 1 (any), Name: (null), Addr: <172.19.189.3:9611>
> > 7/25 16:04:52 (fd:3) (pid:3636) FullHost: (null), Host: (null), Pool: (null), Port: -1
> > 7/25 16:04:52 (fd:3) (pid:3636) IsLocal: N, IdStr: (null), Error: (null)
> > 7/25 16:04:52 (fd:3) (pid:3636) --- End of Daemon object info ---
> > 7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found (OpenProcess err=1308)
> > 7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found (OpenProcess err=1308)
> > 7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: parent: 3940 family:
> > 7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: alive_cpu_user = 0, exited_cpu = 0, max_image = 3624k
> > 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_c++_util\killfamily.C:475
> > 7/25 16:04:52 (fd:3) (pid:3636) Attempting to remove C:\condor\execute\dir_3940 as SuperUser (system)
> > 7/25 16:04:52 (fd:3) (pid:3636) Deleted ProcFamily w/ pid 3940 as parent
> > 7/25 16:04:52 (fd:3) (pid:3636) State change: starter exited
> > 7/25 16:04:52 (fd:3) (pid:3636) Changing activity: Busy -> Idle
> > 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
> > 7/25 16:04:52 (fd:3) (pid:3636) In cancel_timer(), id=66
> > 7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
> > 7/25 16:04:52 (fd:3) (pid:3636) In DaemonCore Timeout()
> > 7/25 16:04:52 (fd:3) (pid:3636)
> >
> > Starter Log
> > ----------------
> > 7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: setting sock->decode()
> > 7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: allowing an empty message for sock.
> > 7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: Success.
> > 7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: Command received via UDP from host <172.19.189.3:9614>
> > 7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
> > 7/25 16:04:51 (fd:8) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
> > 7/25 16:04:51 (fd:8) (pid:3940) Calling Handler <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
> > 7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: tid 3300 exited with status 1, invoking reaper 2 <FileTransfer::Reaper()>
> > 7/25 16:04:51 (fd:8) (pid:3940) File transfer completed successfully.
> > 7/25 16:04:51 (fd:6) (pid:3940) Destroying Daemon object:
> > 7/25 16:04:51 (fd:6) (pid:3940) Type: 1 (any), Name: (null), Addr: <172.19.189.3:9618>
> > 7/25 16:04:51 (fd:6) (pid:3940) FullHost: (null), Host: (null), Pool: (null), Port: -1
> > 7/25 16:04:51 (fd:6) (pid:3940) IsLocal: N, IdStr: (null), Error: (null)
> > 7/25 16:04:51 (fd:6) (pid:3940) --- End of Daemon object info ---
> > 7/25 16:04:52 (fd:6) (pid:3940) Calling client FileTransfer handler function.
> > 7/25 16:04:52 (fd:6) (pid:3940) in DaemonCore NewTimer()
> > 7/25 16:04:52 (fd:6) (pid:3940)
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492, period = 0, handler_descrip=<deferred job start>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551, period = 0, handler_descrip=<dc_touch_log_file>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731, period = 240, handler_descrip=<self_monitor>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791, period = 300, handler_descrip=<check_session_cache>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661, period = 1170, handler_descrip=<DaemonCore::SendAliveToParent>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292, period = 1801, handler_descrip=<handle_cookie_refresh>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290, period = 0, handler_descrip=<DaemonCore::ReInit()>
> > 7/25 16:04:52 (fd:6) (pid:3940)
> > 7/25 16:04:52 (fd:6) (pid:3940) leaving DaemonCore NewTimer, id=7
> > 7/25 16:04:52 (fd:6) (pid:3940) Job 71.0 set to execute immediately
> > 7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
> > 7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> > ..\src\condor_daemon_core.V6\daemon_core.C:2743
> > 7/25 16:04:52 (fd:6) (pid:3940) In DaemonCore Timeout()
> > 7/25 16:04:52 (fd:6) (pid:3940)
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492, period = 0, handler_descrip=<deferred job start>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551, period = 0, handler_descrip=<dc_touch_log_file>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731, period = 240, handler_descrip=<self_monitor>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791, period = 300, handler_descrip=<check_session_cache>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661, period = 1170, handler_descrip=<DaemonCore::SendAliveToParent>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292, period = 1801, handler_descrip=<handle_cookie_refresh>
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290, period = 0, handler_descrip=<DaemonCore::ReInit()>
> > 7/25 16:04:52 (fd:6) (pid:3940)
> > 7/25 16:04:52 (fd:6) (pid:3940) DaemonCore: Calling handler for Timer 7 (deferred job start)
> > 7/25 16:04:52 (fd:6) (pid:3940) Starting a JAVA universe job with ID: 71.0
> > 7/25 16:04:52 (fd:6) (pid:3940) In OsProc::OsProc()
> > 7/25 16:04:52 (fd:6) (pid:3940) Main job KillSignal: 15 (Unknown)
> > 7/25 16:04:52 (fd:6) (pid:3940) Main job RmKillSignal: 15 (Unknown)
> > 7/25 16:04:52 (fd:6) (pid:3940) Main job HoldKillSignal: 15 (Unknown)
> > 7/25 16:04:52 (fd:6) (pid:3940) SYSAPI_GET_LOADAVG is undefined, using default value of True
> > 7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Cmd="C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
> > 7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Args=-Xmx247m -classpath
> > C:\condor/lib;C:\condor/lib/scimark2lib.jar;. -Dchirp.config=C:\condor\execute\dir_3940\chirp.config CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start C:\condor\execute\dir_3940\jvm.end JavaTest
> > 7/25 16:04:52 (fd:6) (pid:3940) in VanillaProc::StartJob()
> > 7/25 16:04:52 (fd:6) (pid:3940) in OsProc::StartJob()
> > 7/25 16:04:52 (fd:6) (pid:3940) IWD: C:\condor/execute\dir_3940
> > 7/25 16:04:52 (fd:6) (pid:3940) get_port_range - (LOWPORT,HIGHPORT) is (9600,9700).
> > 7/25 16:04:52 (fd:6) (pid:3940) TokenCache contents:
> > condor-reuse-vm1@.
> > 7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_USER at ..\src\condor_starter.V6.1\os_proc.C:227
> > 7/25 16:04:52 (fd:7) (pid:3940) Input file: NUL
> > 7/25 16:04:52 (fd:8) (pid:3940) Output file: C:\condor/execute\dir_3940\JavaTest.output.0
> > 7/25 16:04:52 (fd:9) (pid:3940) Error file: C:\condor/execute\dir_3940\JavaTest.error.0
> > 7/25 16:04:52 (fd:9) (pid:3940) Doing CONDOR_begin_execution
> > 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
> > 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
> > 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
> > 7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
> > 7/25 16:04:52 (fd:9) (pid:3940) Renice expr "10" evaluated to 10
> > 7/25 16:04:52 (fd:9) (pid:3940) About to exec C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE" -Xmx247m -classpath C:\condor/lib;C:\condor/lib/scimark2lib.jar;.
> > -Dchirp.config=C:\condor\execute\dir_3940\chirp.config CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start C:\condor\execute\dir_3940\jvm.end JavaTest
> > 7/25 16:04:52 (fd:9) (pid:3940) Env = _CONDOR_SCRATCH_DIR=C:\condor\execute\dir_3940 _CONDOR_HIGHPORT=9700 _CONDOR_LOWPORT=9600
> > 7/25 16:04:52 (fd:9) (pid:3940) JOB_INHERITS_STARTER_ENVIRONMENT is undefined, using default value of False
> > 7/25 16:04:52 (fd:9) (pid:3940) PRIV_USER --> PRIV_CONDOR at ..\src\condor_starter.V6.1\os_proc.C:343
> > 7/25 16:04:52 (fd:9) (pid:3940) In DaemonCore::Create_Process(C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE",...)
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to
> > condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
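
A note on the exit status quoted in the Start Log: the value -1073741819 is the signed 32-bit rendering of a Windows NTSTATUS code. Masking it to 32 bits gives the familiar hex form. The snippet below is only a minimal sketch of that conversion in Python; the helper name `ntstatus_hex` is made up for illustration and is not part of Condor.

```python
def ntstatus_hex(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned NTSTATUS hex string.

    Hypothetical helper for reading the starter's exit status; not a Condor API.
    """
    # Masking with 0xFFFFFFFF maps negative values onto their
    # unsigned 32-bit two's-complement equivalent.
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


# Exit status reported for starter pid 3940 in the Start Log.
print(ntstatus_hex(-1073741819))  # prints 0xC0000005
```

0xC0000005 is the Windows code for STATUS_ACCESS_VIOLATION, which matches the "exception ACCESS_VIOLATION" text the startd printed next to the number.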
Attachment:
Logs.zip
Description: Zip archive