Re: [HTCondor-users] Parallel environment
- Date: Fri, 03 Jun 2016 10:59:42 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Parallel environment
On 6/3/2016 10:30 AM, Francesca Maccarone wrote:
Thanks Michael,
I took your advice and added these lines to the file
/etc/condor/config.d/00debconf on every machine:
DedicatedScheduler = "DedicatedScheduler@Master"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
The above is not sufficient... section 3.12.8 of the manual goes on to
say, for instance, that your startd config needs a RANK expression
preferring your dedicated scheduler. I suggest you read all of section
3.12.8 :).
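For illustration, a minimal startd-side sketch along the lines of the manual's dedicated-resource example might look like the following. The scheduler hostname is a placeholder; section 3.12.8 and the example file describe the full set of knobs.

```
# Execute-node config sketch (hedged; adapt the names to your pool).
# The value must name the submit machine that runs the dedicated
# scheduler, e.g. "DedicatedScheduler@" plus its full hostname.
DedicatedScheduler = "DedicatedScheduler@submit.example.org"

# Advertise the attribute in the machine ClassAd.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# Prefer jobs coming from the dedicated scheduler over all others.
RANK = Scheduler =?= $(DedicatedScheduler)
```

A condor_reconfig (or restart) of the execute nodes is needed before the new attribute shows up in condor_status.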
Also, I think the condor_config.local.dedicated.resource example config
file could be very helpful to you; it is a template of the config knobs
to add to support the parallel universe, with lots of comments. It is
probably easier to follow than the manual. If you installed via the RPM,
you will typically find the examples in
/usr/share/doc/condor-X.X.X/examples.
For your convenience, here is a link to it: https://is.gd/plLPVn
Finally, if you happen to be using partitionable slots on your execute
nodes (if you don't know what this is, you are not using them and can
ignore this), you will need to set ALLOW_PSLOT_PREEMPTION=True.
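As a hedged illustration of that last point, a partitionable-slot setup on an execute node commonly looks like the fragment below; the slot-type lines are a generic example, not a copy of any particular config, and only the last knob is the one named above.

```
# Execute node with one partitionable slot covering all resources
# (generic example layout).
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = True

# Needed so the dedicated scheduler can claim resources carved out
# of partitionable slots:
ALLOW_PSLOT_PREEMPTION = True
```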
Hope the above helps
Todd
The problem is that when I try to run my jobs, they remain idle. The
log file contains only:
Job submitted from host 192.168.56.101
Only the first job is submitted; the rest are not. I don't understand
where the problem is.
Thanks in advance
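A standard HTCondor troubleshooting step for idle jobs like the ones described above (not something suggested in this thread itself) is condor_q's analyze mode, which reports why a job fails to match any slot:

```
# Explain why job 42.0 is not matching (42.0 is a hypothetical job id)
condor_q -better-analyze 42.0

# Check whether execute nodes actually advertise the attribute
condor_status -constraint 'DedicatedScheduler =!= undefined' \
              -af Name DedicatedScheduler
```

If the second command prints nothing, the STARTD_ATTRS change has not reached the execute nodes' machine ads.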
2016-06-02 20:14 GMT+02:00 Michael V Pelletier
<Michael.V.Pelletier@xxxxxxxxxxxx
<mailto:Michael.V.Pelletier@xxxxxxxxxxxx>>:
From: Francesca Maccarone <dike991@xxxxxxxxx
<mailto:dike991@xxxxxxxxx>>
Date: 06/02/2016 11:07 AM
> The problem is that all jobs in the queue remain idle and are never
> executed. I want to run my jobs in parallel: what changes should I
> make to get the desired behavior?
Ciao, Francesca,
Take a look at section 3.12.8 of the 8.4.6 manual. In order for a
machine
to match a parallel universe job, it must be advertising the
"DedicatedScheduler" attribute which is set in the configuration and
pushed to the machine ad using the STARTD_ATTRS config.
Once this is set up correctly, you should be good to go. The idea here
is that parallel jobs cannot tolerate having any one of the
parallel processes on any of the machines being terminated
unexpectedly,
so machines set up in this way are presumed to prevent eviction and
thus be safe for parallel universe submissions.
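For completeness, a minimal parallel universe submit description along these lines might look as follows. This is a sketch only; the executable, machine_count, and file names are placeholders, not taken from this thread.

```
# parallel.sub -- hedged example submit file
universe      = parallel
executable    = /bin/sleep
arguments     = 60
machine_count = 4
log           = parallel.log
output        = parallel.$(Node).out
error         = parallel.$(Node).err
should_transfer_files   = yes
when_to_transfer_output = on_exit
queue
```

Submitted with condor_submit parallel.sub, this asks the dedicated scheduler for 4 machines at once; $(Node) expands to each process's node number.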
-Michael Pelletier.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
<mailto:htcondor-users-request@xxxxxxxxxxx> with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>
HTCondor Technical Lead
Center for High Throughput Computing, Department of Computer Sciences
University of Wisconsin-Madison
1210 W. Dayton St. Rm #4257, Madison, WI 53706-1685
Phone: (608) 263-7132