Hi All,
I have set up a dedicated HPC cluster running Condor.
I successfully set up the CondorView server, so the collector now
keeps history data.
It seems to me that parallel jobs are not displayed correctly in the
viewhist files.
A job only appears in viewhist3.* while it is in the Idle state, as a
job of DedicatedScheduler.
The jobs disappear completely once they start to run.
Here is part of the viewhist3.0.new file:
1345035660 Total : 0 1
1345035660 DedicatedScheduler@xxxxxxxxxxxxxxxxxxx : 0 1
1345035660 nanotio2@xxxxxxxxxx : 0 0
1345035900 CompMag@xxxxxxxxxx : 0 0
1345035900 Total : 0 0
1345035900 DedicatedScheduler@xxxxxxxxxxxxxxxxxxx : 0 0
1345035900 nanotio2@xxxxxxxxxx : 0 0
....
1345037820 CompMag@xxxxxxxxxx : 0 0
1345037820 Total : 0 0
1345037820 nanotio2@xxxxxxxxxx : 0 0
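To make the excerpt easier to read, here is a minimal Python sketch that parses these lines. It assumes each record has the form "<unix timestamp> <submitter> : <running> <idle>". The meaning of the two numeric columns is only a guess based on the behaviour described above (the 1 shows up only while the job is idle), and parse_viewhist is a hypothetical helper name, not part of Condor.

```python
from collections import defaultdict

# A few lines copied from the viewhist3.0.new excerpt above.
SAMPLE = """\
1345035660 Total : 0 1
1345035660 DedicatedScheduler@xxxxxxxxxxxxxxxxxxx : 0 1
1345035660 nanotio2@xxxxxxxxxx : 0 0
1345035900 CompMag@xxxxxxxxxx : 0 0
1345035900 Total : 0 0
"""

def parse_viewhist(text):
    """Return {timestamp: {submitter: (running, idle)}}.

    The (running, idle) interpretation of the two counters is an assumption.
    """
    records = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        # Expect: timestamp, submitter, ":", and two integer counters.
        if len(parts) != 5 or parts[2] != ":":
            continue  # skip elisions such as "...." and malformed lines
        try:
            ts, running, idle = int(parts[0]), int(parts[3]), int(parts[4])
        except ValueError:
            continue  # skip lines whose counters are not integers
        records[ts][parts[1]] = (running, idle)
    return dict(records)

records = parse_viewhist(SAMPLE)
for ts in sorted(records):
    for submitter, (running, idle) in sorted(records[ts].items()):
        print(ts, submitter, "running(assumed)=%d idle(assumed)=%d" % (running, idle))
```

Under that assumption, a running parallel job should show a non-zero first counter for some submitter, and none of the records in the excerpt does.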
and the output of the condor_q command:
975.0   CompMag    8/12 20:06   2+19:32:15 R  0   73.2 condor_openmpi.sh
975.1   CompMag    8/12 20:06   0+00:00:00 R  0   0.0  condor_openmpi.sh
977.0   CompMag    8/13 14:56   2+00:32:25 R  0   73.2 condor_openmpi.sh
977.1   CompMag    8/13 14:56   0+00:00:00 R  0   0.0  condor_openmpi.sh
981.0   nanotio2   8/14 13:51   1+01:31:34 R  0   73.2 condor_openmpi-1.4
981.1   nanotio2   8/14 13:51   0+00:00:00 R  0   0.0  condor_openmpi-1.4
982.0   nanotio2   8/15 05:21   0+10:10:05 R  0   73.2 condor_openmpi-1.4
982.1   nanotio2   8/15 05:21   0+00:00:00 R  0   0.0  condor_openmpi-1.4
983.0   nanotio2   8/15 14:45   0+00:39:21 R  0   73.2 condor_openmpi-1.4
983.1   nanotio2   8/15 14:45   0+00:00:00 R  0   0.0  condor_openmpi-1.4
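For comparison with the viewhist counters, here is a small sketch that tallies the running procs per owner from a condor_q listing. It assumes the usual condor_q column order (ID, owner, submit date, submit time, run time, state, priority, size, command) and that each job sits on a single line. count_running is a hypothetical helper, and only four of the ten lines are included to keep it short.

```python
from collections import Counter

# Four lines from the condor_q output above, one job per line.
CONDOR_Q = """\
975.0 CompMag 8/12 20:06 2+19:32:15 R 0 73.2 condor_openmpi.sh
975.1 CompMag 8/12 20:06 0+00:00:00 R 0 0.0 condor_openmpi.sh
981.0 nanotio2 8/14 13:51 1+01:31:34 R 0 73.2 condor_openmpi-1.4
981.1 nanotio2 8/14 13:51 0+00:00:00 R 0 0.0 condor_openmpi-1.4
"""

def count_running(text):
    """Count procs in state "R" per owner (column order is an assumption)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a complete job line
        owner, state = fields[1], fields[5]
        if state == "R":
            counts[owner] += 1
    return counts

running = count_running(CONDOR_Q)
for owner, n in sorted(running.items()):
    print(owner, n)
```

Applied to the full listing, this gives 4 running procs for CompMag and 6 for nanotio2, while the viewhist excerpt reports 0 for both submitters.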
Are there any configuration details related to parallel jobs and
viewhist that I have missed?
Thank you,
Imre
_______________________________________________
Condor-users mailing list