Re: [HTCondor-users] How to find the name of the condor_collecter and the name of condor

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Dear Todd:

Thank you very much for your reply.

But the problem is still there. Could you help me resolve it? Thank you very much.

In fact, I have already modified the /etc/condor/condor_config file for pool A (named 181.nodeljA), adding the following content to the file:

FLOCK_TO =188.nodeljB

FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)

FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)

ALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS), $(IP_ADDRESS)

CONDOR_GAHP = $(SBIN)/condor_c-gahp

C_GAHP_LOG = /tmp/CGAHPLog.$(USERNAME)

C_GAHP_WORKER_THREAD_LOG = /tmp/CGAHPWorkerLog.$(USERNAME)

C_GAHP_WORKER_THREAD_LOCK = /tmp/CGAHPWorkerLock.$(USERNAME)

The same file for pool B (named 188.nodeljB) was modified as follows:

FLOCK_FROM=181.nodeljA

FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)

FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)

ALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(IP_ADDRESS)

ALLOW_OWNER = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR)

ALLOW_READ=*.nodeljB

ALLOW_WRITE=*.nodeljB

ALLOW_NEGOTIATOR = xxx@$(CONDOR_HOST), $(IP_ADDRESS)

ALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS), $(IP_ADDRESS)

ALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE), $(FLOCK_FROM)

ALLOW_WRITE_STARTD = $(ALLOW_WRITE), $(FLOCK_FROM)

ALLOW_READ_COLLECTOR = $(ALLOW_READ), $(FLOCK_FROM)

ALLOW_READ_STARTD = $(ALLOW_READ), $(FLOCK_FROM)

LOCK = $(LOCAL_DIR)/lock/condor

SEC_DEFAULT_NEGOTIATION = OPTIONAL

SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE

The condor submit description file (named sub.txt) looks like this:

executable=/data/condor_test/CondorTest.class

input=/data/condor_test/list.txt

arguments=CondorTest181795_2014-05-14_152801.mp4

log=/data/condor_test/condor.log

error=/data/condor_test/condor.error

grid_resource=condor 188.nodeljB 188.nodeljB

+remote_universe=10

+remote_requirements=True

+remote_ShouldTransferFiles='YES'

When I run the command condor_submit sub.txt, all the jobs are held. The job log tells me:

012 (031.239.000) 06/14 08:54:52 Job was held.
GridResource missing pool name
Code 0 Subcode 0

The content of /var/log/condor/GridmanagerLog.xxx is as follows:

06/14/16 09:01:06 ******************************************************
06/14/16 09:01:06 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
06/14/16 09:01:06 ** /usr/sbin/condor_gridmanager
06/14/16 09:01:06 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(12) class=DAEMON(1)
06/14/16 09:01:06 ** Configuration: subsystem:GRIDMANAGER local:<NONE> class:DAEMON
06/14/16 09:01:06 ** $CondorVersion: 8.4.7 Jun 03 2016 BuildID: 369249 $
06/14/16 09:01:06 ** $CondorPlatform: x86_64_RedHat6 $
06/14/16 09:01:06 ** PID = 3607
06/14/16 09:01:06 ** Log last touched 6/14 08:54:57
06/14/16 09:01:06 ******************************************************
06/14/16 09:01:06 Using config source: /etc/condor/condor_config
06/14/16 09:01:06 Using local config sources:
06/14/16 09:01:06 /etc/condor/condor_config.local
06/14/16 09:01:06 config Macros = 62, Sorted = 62, StringBytes = 1644, TablesBytes = 2272
06/14/16 09:01:06 CLASSAD_CACHING is ENABLED
06/14/16 09:01:06 Daemon Log is logging: D_ALWAYS D_ERROR
06/14/16 09:01:06 Daemoncore: Listening at <0.0.0.0:55344> on TCP (ReliSock) and UDP (SafeSock).
06/14/16 09:01:06 DaemonCore: command socket at <192.168.1.181:55344?addrs=192.168.1.181-55344>
06/14/16 09:01:06 DaemonCore: private command socket at <192.168.1.181:55344?addrs=192.168.1.181-55344>
06/14/16 09:01:09 [3607] Found job 32.0 --- inserting
06/14/16 09:01:09 [3607] Found job 32.1 --- inserting
06/14/16 09:01:10 [3607] (32.0) doEvaluateState called: gmState GM_HOLD, remoteState -1
06/14/16 09:01:10 [3607] (32.1) doEvaluateState called: gmState GM_HOLD, remoteState -1
06/14/16 09:01:15 [3607] No jobs left, shutting down
06/14/16 09:01:15 [3607] Got SIGTERM. Performing graceful shutdown.
06/14/16 09:01:15 [3607] **** condor_gridmanager (condor_GRIDMANAGER) pid 3607 EXITING WITH STATUS 0

I think the reason is that I have not given the right parameter for grid_resource in the submit description file.

The output of the command "condor_status -schedd" is as follows:

[root@188 ~]# condor_status -schedd

and the output of the command "condor_status -collector" is as follows:

[root@188 ~]# condor_status -collector
Name                                             Machine                                          RunningJobs IdleJobs HostsTotal

"Condor Pool of LJ"@151.nodelj               151.nodelj                                                 0        0          0
"CPLJ"@188.nodeljB                           188.nodeljB                                                0        0          4
"Condor Pool of LJ"@188.nodeljB              188.nodeljB                                                0        0          0

Can you help me find a solution to this problem?

Thank you very much.

Date: Mon, 13 Jun 2016 12:46:37 -0500

From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>

To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>

Subject: Re: [HTCondor-users] How to find the name of the

condor_collecter and the name of condor_schedd daemon?Thank you very

much.

Message-ID: <575EF17D.3010607@xxxxxxxxxxx>

Content-Type: text/plain; CHARSET=US-ASCII; format=flowed

On 6/12/2016 3:23 AM, HTCondor wrote:

> Dear all,

> I am configuring HTCondor flocking. There is a parameter in the

> submit description file for a job, namely grid_resource.

> According to the HTCondor manual, the third field is the name of

> the remote pool's condor_collector, and the second field is the name

> of the remote condor_schedd daemon.

> Can you tell me how to find their actual values?

> Thank you very much.

> Best regards

> David

Hi David,

Flocking is HTCondor's way of allowing jobs that cannot immediately run

within the pool of machines where the job was submitted to instead run

on a different HTCondor pool. If a machine within HTCondor pool A can

send jobs to be run on HTCondor pool B, then we say that jobs from

machine A flock to pool B.

If Flocking is what you want, you don't need to mess around with grid

universe, grid_resource, or any of that. On the condor_config for

machines in pool A just modify the FLOCK_TO line to include the hostname

of pool B central manager, and on the condor_configs for machines in

pool B just modify the FLOCK_FROM line to include the hostname of pool A

central manager.

Details are in:

http://research.cs.wisc.edu/htcondor/manual/v8.4/5_2Connecting_HTCondor.html
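
As a minimal sketch of the two settings described above (not a tested configuration): the hostnames 181.nodeljA and 188.nodeljB are taken from this thread and are assumed here to be the central managers of pool A and pool B respectively, so substitute your own. Any access-control (ALLOW_*) settings your pools need are left out.

```
# condor_config on the machines in pool A (jobs flock FROM here).
# Assumption: 188.nodeljB is the central manager of pool B.
FLOCK_TO = 188.nodeljB

# condor_config on the machines in pool B (jobs flock TO here).
# Assumption: 181.nodeljA is the central manager of pool A.
FLOCK_FROM = 181.nodeljA
```

After editing, running condor_reconfig on the affected machines should make the daemons pick up the change. Note that FLOCK_COLLECTOR_HOSTS and FLOCK_NEGOTIATOR_HOSTS are written in terms of $(FLOCK_TO), so they belong on the pool A side; on a pool B machine that never sets FLOCK_TO, $(FLOCK_TO) would most likely expand to an empty value. With flocking configured this way, the job is submitted to the local condor_schedd as usual, with no grid_resource line in the submit description file.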

regards,

Todd
