Re: [Condor-users] Condor Quill Problem
- Date: Mon, 10 Jan 2011 09:28:59 -0600 (CST)
- From: Steven Timm <timm@xxxxxxxx>
- Subject: Re: [Condor-users] Condor Quill Problem
With QUILL_USE_SQL_LOG = TRUE
it will try to log information about all daemons, including
the schedd, so it should be logging something.
Are there any files of the form *sql.log in your $(LOG) directory?
What does QuillLog say? If you turn it up to D_FULLDEBUG,
that should give you a hint as well.
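A minimal sketch of the two checks above, assuming a POSIX shell. The helper name list_sql_logs is hypothetical (it is not a Condor tool), and the QUILL_DEBUG setting in the comments assumes Quill follows Condor's usual <SUBSYS>_DEBUG naming.

```shell
#!/bin/sh
# list_sql_logs is a hypothetical helper (not part of Condor): it prints
# every file in the given directory whose name ends in "sql.log" and
# returns non-zero when there are none.
list_sql_logs() {
    log_dir=$1
    found=1
    for f in "$log_dir"/*sql.log; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        echo "$f"
        found=0
    done
    return "$found"
}

# Typical use on the submit node (requires a Condor installation):
#   list_sql_logs "$(condor_config_val LOG)" || echo "no *sql.log files found"
#
# To raise Quill's log verbosity, assuming the usual <SUBSYS>_DEBUG naming,
# add this line to the local config file and then run condor_reconfig:
#   QUILL_DEBUG = D_FULLDEBUG
```

If no *sql.log files show up, the daemons are probably not writing SQL log events at all, which would point at the configuration rather than at the database.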
Also, you have to make a .pgpass file in your SPOOL
directory that contains the password of the quillwriter user
and is readable only by the condor user. Did you do that?
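As a rough sketch, the .pgpass file could be created as follows. This assumes the file uses the standard PostgreSQL password-file layout (hostname:port:database:username:password); write_pgpass is a hypothetical helper, and the host, port, database, and user values in the usage comment are taken from the configuration dump quoted later in this thread.

```shell
#!/bin/sh
# write_pgpass is a hypothetical helper (not a Condor tool). It writes one
# entry in the assumed PostgreSQL password-file layout:
#   hostname:port:database:username:password
# A password containing ':' or '\' would need backslash escaping, which
# this sketch does not do.
write_pgpass() {
    spool_dir=$1; host=$2; port=$3; db=$4; user=$5; password=$6
    pgpass="$spool_dir/.pgpass"
    # Use a restrictive umask so the file is never created world-readable.
    ( umask 077
      printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$password" > "$pgpass" )
    # Enforce owner-only read/write even if the file already existed.
    chmod 600 "$pgpass"
}

# Typical use as root on the submit node (requires a Condor installation):
#   write_pgpass "$(condor_config_val SPOOL)" vserv03 5432 quill_vserv03 \
#       quillwriter 'THE_QUILLWRITER_PASSWORD'
#   chown condor "$(condor_config_val SPOOL)/.pgpass"
```

The chown step matters when the file is created as root: the daemons run as the condor user and must be able to read it, while nobody else should.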
Steve Timm
On Mon, 10 Jan 2011, Santanu Das wrote:
Hi Steve,
I was running that on submit node only.
Also, I do have QUILL_USE_SQL_LOG set to true but I don't have
SCHEDD.QUILL_USE_SQL_LOG defined:
[root@serv07 JobManager]# condor_config_val -dump | grep QUILL
DAEMON_LIST = MASTER, SCHEDD, QUILL
QUILL = $(SBIN)/condor_quill
QUILL_ADDRESS_FILE = $(LOG)/.quill_address
QUILL_DB_IP_ADDR = vserv03:5432
QUILL_DB_NAME = quill_vserv03
QUILL_DB_QUERY_PASSWORD = reader
QUILL_DB_TYPE = PGSQL
QUILL_DB_USER = quillwriter
QUILL_DBSIZE_LIMIT = 20
QUILL_ENABLED = TRUE
QUILL_HISTORY_DURATION = 30
QUILL_IS_REMOTELY_QUERYABLE = TRUE
QUILL_JOB_HISTORY_DURATION = 3650
QUILL_LOG = $(LOG)/QuillLog
QUILL_MAINTAIN_DB_CONN = TRUE
QUILL_MANAGE_VACUUM = FALSE
QUILL_NAME = quill@$(FULL_HOSTNAME)
QUILL_POLLING_PERIOD = 10
QUILL_RESOURCE_HISTORY_DURATION = 7
QUILL_RUN_HISTORY_DURATION = 7
QUILL_USE_SQL_LOG = TRUE
Is this the problem?
cheers,
Santanu
On 10/01/11 14:56, Steven Timm wrote:
On Mon, 10 Jan 2011, Santanu Das wrote:
Thanks Steve and all, for the explanation. I don't think I really need to
log the startd info.
At the moment, I'm running the QUILL daemon on the Central Manager and the
Submit host, and DBMSD only on the Central Manager - is this the correct
way (well, in my case) of running Quill? Even though I can run condor_q, I
don't get any answer back when I run condor_history with
"-completedsince":
You should have the following settings defined then:
QUILL_USE_SQL_LOG = FALSE
SCHEDD.QUILL_USE_SQL_LOG = TRUE
Also note that condor_history would only give you output
if you ran it from the submit node.
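One way to confirm which of the two settings is actually defined is to filter a configuration dump by exact name, in the same spirit as the condor_config_val -dump | grep QUILL check used in this thread. The sketch below assumes "NAME = value" lines on standard input; has_setting is a hypothetical helper, and whether subsystem-prefixed names such as SCHEDD.QUILL_USE_SQL_LOG appear in the dump on a given Condor version is an assumption.

```shell
#!/bin/sh
# has_setting is a hypothetical helper (not a Condor tool). It reads
# "NAME = value" lines on stdin and succeeds only if a line defines
# exactly the given NAME. Unlike a plain grep for QUILL_USE_SQL_LOG,
# it does not confuse the bare name with the SCHEDD.-prefixed one.
has_setting() {
    awk -v name="$1" '
        index($0, "=") > 0 {
            split($0, parts, "=")
            key = parts[1]
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim surrounding blanks
            if (key == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use (requires a Condor installation):
#   condor_config_val -dump | has_setting SCHEDD.QUILL_USE_SQL_LOG \
#       || echo "SCHEDD.QUILL_USE_SQL_LOG is not defined"
```

The same filter can be pointed at a local configuration file instead of the dump if the prefixed name is not listed there.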
Steve Timm
[root@serv07 JobManager]# condor_history -completedsince '01/01/2011 13:00'
-- Quill: quill@xxxxxxxxxxxxxxxxxxxxxxxx :<vserv03:5432> : quill_vserv03
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
No historical jobs in the database match your query
Does it mean Quill is not logging anything at all?
Cheers,
Santanu
On 08/01/11 01:57, Steven Timm wrote:
On Sat, 8 Jan 2011, Santanu Das wrote:
Thanks Erik and Wancheng, for pointing out the DBMSD bit - it's fixed
now.
Two points:
1. You should only run the DBMSD on one machine. The manual says: "One
machine should run the condor_dbmsd daemon. On this machine, add it to
the DAEMON_LIST configuration variable. All Quill-enabled machines
should also run the condor_quill daemon. The machine running the
condor_dbmsd daemon can also run a condor_quill daemon."
One question: What do you mean by "Quill-enabled machines"? Does it mean
all the Execute nodes? If yes, what's the benefit of running QUILL on
every single node?
The only reason you would run Quill on every single node is if you
are keeping track of all the startd information in quill. The more
common configuration is only to run on the nodes that have a schedd.
If you are keeping track of all daemon info in quill then you
need a huge database machine; there are some presentations from
Condor Week 2007 and 2008 that describe just how big.
Steve Timm
Cheers,
Santanu
--
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm@xxxxxxxx http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.