Jaime,
my submit file is:
Executable = PQL
Universe = vanilla
Output = pql.out
Log = pql.log
Error = pql.err
Arguments = -p params.in -t temps.in
notification = Error
notify_user = codytrey@xxxxxxxx
should_transfer_files = YES
Queue 20
I had it queue 20 jobs to see whether that would force jobs onto other machines once the submit node had all of its processors in use, but it just ran 4 at a time on the submit node until the queue was complete.
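To confirm which machines the jobs actually ran on, something like the following should work (a sketch using standard HTCondor tools; LastRemoteHost is the job attribute that normally records the execute slot for completed jobs):

condor_q -run
condor_history -format "%s\n" LastRemoteHost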
Same results with:
Executable = test.py
Universe = vanilla
Output = /Volumes/Scratch/test/test.out.$(Process)
Log = /Volumes/Scratch/test/test.log
Error = /Volumes/Scratch/test/test.err
should_transfer_files = ALWAYS
Queue 10
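For what it's worth, the HTCondor manual lists YES, NO, and IF_NEEDED as the accepted values for should_transfer_files; ALWAYS is not among them, so the line above may not do what is intended. A minimal stanza, assuming output should come back when the job exits, would be:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT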
-Cody
On 2013-02-26 10:29, Jaime Frey wrote:
What does your submit file look like? A common problem is that the machines don't have a shared filesystem, and HTCondor's file transfer option isn't being requested in the submit file. In this case, HTCondor will only run the jobs on the submit machine.

-- Jaime
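A sketch of the file-transfer request Jaime is referring to: IF_NEEDED tells HTCondor to transfer files only when the job lands on a machine that does not share a filesystem with the submit node, and transfer_input_files names the job's inputs (here inferred from the first submit file's Arguments line):

should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
transfer_input_files = params.in, temps.in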
On Feb 26, 2013, at 9:09 AM, Cody Belcher <codytrey@xxxxxxxxxxxxxxxx> wrote:
I do see all of the machines in condor_status
"codytrey@metis:~$ condor_config_val DAEMON_LIST
MASTER, SCHEDD, STARTD"
This is the submit machine; it is the same on an execute node I just tried.

-Cody
On 2013-02-26 08:47, Cotton, Benjamin J wrote:
Cody,

The first question is: are you sure they're all in the same pool? To check this, do they all show up in the output of condor_status?

My suspicion is that your submit/execute machine might be running its own condor_collector and condor_negotiator processes. You can check this with:

condor_config_val DAEMON_LIST

If that's the case, then your execute-only nodes might be as well.
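A sketch of those checks as commands, run on each machine in the pool (COLLECTOR_HOST shows which central manager the machine reports to, which is another quick way to confirm pool membership):

condor_status
condor_config_val DAEMON_LIST
condor_config_val COLLECTOR_HOST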
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project