Re: [HTCondor-users] job does not run
- Date: Tue, 18 Jun 2019 07:16:19 +0200
- From: Valerio Bellizzomi <valerio@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] job does not run
Hi,
To try to get rid of the -1001 error, I have set the following on the execute node:
SLOT_TYPE_1 = cpus=1, gpus=1
NUM_SLOTS_TYPE_1 = 2
and
SLOT1_USER = root
SLOT2_USER = root
but the error is still present.
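
For reference, -1001 is CL_PLATFORM_NOT_FOUND_KHR, the code an OpenCL ICD loader returns when it finds no usable vendor entries. A minimal sketch for checking that from a given environment follows. It assumes the common Linux layout in which vendor .icd files live in /etc/OpenCL/vendors and each file names one library; the function name check_icd_dir is hypothetical, and the check only tells whether each listed library can be loaded, not whether a device is usable.

```python
import ctypes
import os


def check_icd_dir(vendors_dir="/etc/OpenCL/vendors"):
    """Return (icd_file, library, status) tuples for one vendors directory.

    The default path is an assumption (the usual Linux location).
    An empty list means no .icd files were found at all.
    """
    if not os.path.isdir(vendors_dir):
        return [(vendors_dir, None, "directory missing or not accessible")]
    try:
        names = sorted(os.listdir(vendors_dir))
    except OSError as exc:
        return [(vendors_dir, None, "cannot list: %s" % exc)]

    results = []
    for name in names:
        if not name.endswith(".icd"):
            continue
        path = os.path.join(vendors_dir, name)
        try:
            with open(path) as fh:
                # An .icd file holds the name or path of the vendor library.
                library = fh.readline().strip()
        except OSError as exc:
            results.append((path, None, "cannot read: %s" % exc))
            continue
        if not library:
            results.append((path, None, "empty"))
            continue
        try:
            ctypes.CDLL(library)  # same dynamic-loader lookup the process would use
            status = "loadable"
        except OSError as exc:
            status = "cannot load: %s" % exc
        results.append((path, library, status))
    return results
```

Printing the result of check_icd_dir() once from an interactive shell and once from the environment in which condor_startd is started would show whether the two see the same vendor files and can load the same libraries. A difference in environment (for example LD_LIBRARY_PATH) is one possible cause, but that is an assumption and not something confirmed in this thread.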
On Mon, 2019-06-17 at 20:08 +0200, Valerio Bellizzomi wrote:
> Hi,
> Apart from the other issues, I ran a test on the execute node. I think
> the job remains idle because of an error: I ran condor_startd by hand
> on machine compute02 and got this:
>
> ocl.getPlatformIDs returned error=-1001 and 0 platforms
>
> That means the OpenCL ICD is not found, which is odd because I can run
> the job locally on the execute node and OpenCL is installed correctly.
> The only explanation I can think of is that the process lacks the
> privileges to access the OpenCL platform, but I am running condor_startd
> as root.
>
>
>
> -------- Forwarded Message --------
> From: Valerio Bellizzomi <valerio@xxxxxxxxxx>
> Reply-to: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] job does not run
> Date: Mon, 17 Jun 2019 17:31:46 +0200
>
> On Mon, 2019-06-17 at 11:52 +0000, Bockelman, Brian wrote:
> >
> > > On Jun 17, 2019, at 2:28 AM, Steffen Grunewald <steffen.grunewald@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On Sun, 2019-06-16 at 16:10:00 +0200, Valerio Bellizzomi wrote:
> > >> Greetings,
> > >> after submitting a job, the job is in idle state. Diagnostics with
> > >> condor_q -analyze show "no match found".
> > >>
> > >> In the submit file I have:
> > >>
> > >> RANK = (Machine == "compute02")
> > >
> > > Please verify (using e.g. condor_status -l compute02) that the machine
> > > name is correct (is there no domain part?)
> > >
> > >> 1) is this sufficient to select the target machine ?
> > >
> > > With the correct string, IMHO yes
> >
> > Do note that you used "RANK" and not "REQUIREMENTS" -- the job will show a preference for "compute02" if there are multiple available compute hosts. However, it will still be allowed to run on any host.
> >
> > It might be useful to post the output of "condor_q -better-analyze". Another thing that could be going wrong is that the Machine attribute is using a FQDN ("compute02.example.com") whereas you are only querying the host ("compute02").
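
The two points above (a rank only expresses a preference, and a short host name does not equal an FQDN) can be illustrated with a toy model in plain Python. This is not HTCondor's matchmaking code: select_machine and the host names are hypothetical, and the model only captures the idea that a requirement filters machines while a rank orders the ones that remain.

```python
def select_machine(machines, requirement=None, rank=None):
    """Toy matchmaking: filter by requirement, then prefer the highest rank."""
    candidates = [m for m in machines if requirement is None or requirement(m)]
    if not candidates:
        return None  # comparable to "no match found"
    if rank is None:
        return candidates[0]
    # max() keeps the first candidate when all ranks are equal.
    return max(candidates, key=rank)


# Hypothetical pool in which machines advertise fully qualified names.
machines = ["compute01.example.com", "compute02.example.com"]


def short_name(machine):
    return machine == "compute02"


def full_name(machine):
    return machine == "compute02.example.com"


# Rank with the short name: nothing is preferred, any machine may be chosen.
rank_short = select_machine(machines, rank=short_name)
# Requirement with the short name: no machine qualifies.
req_short = select_machine(machines, requirement=short_name)
# Requirement with the FQDN: only compute02 qualifies.
req_full = select_machine(machines, requirement=full_name)

print(rank_short)  # compute01.example.com
print(req_short)   # None
print(req_full)    # compute02.example.com
```

In this model the short-name rank silently has no effect, while the short-name requirement leaves the job without a match, which is why checking the advertised Machine value matters in both cases.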
>
> Hi,
> I have verified that the compute02 node has a problem: the ps command
> shows condor_procd running but not condor_startd. Master and Startd are
> listed in the configuration, but condor_startd does not start at first.
>
> A second problem, which I found and corrected: the schedd was not
> running on the central manager machine. I was using the DAEMON_LIST
> generated by the condor_configure --type=manager command, and the
> schedd was not in the list.
>
> > Brian
> >
> > >
> > >> 2) where is the htcondor log file for the job ?
> > >
> > > Did you specify a path in your submit file?
> > >
> > > - S
> > > _______________________________________________
> > > HTCondor-users mailing list
> > > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > > subject: Unsubscribe
> > > You can also unsubscribe by visiting
> > > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > >
> > > The archives can be found at:
> > > https://lists.cs.wisc.edu/archive/htcondor-users/