
Re: [HTCondor-users] Hibernate and Cron interference



Hi ToddM,

Even if you want EXPIRE_INVALIDATE_ADS to be true for other reasons,
if you set it to false and re-run this experiment, does the offline ad
with the proper HibernationLevel get preserved?  (An equivalent test
should be to set ABSENT_REQUIREMENTS to FALSE.)

I just tried twice, and in the end the EP is present in neither Absent nor Offline. The CollectorLog looks the same:


21:17:05 In OfflineCollectorPlugin::update ( 77 )
21:17:06 StartdAd : Updating ... "< slot1@xxxxxxxxxxxxxxxxxxxxxx , 172.22.0.145 >"
21:17:06 Want private ads, but no socket given!
21:17:06 In OfflineCollectorPlugin::update ( 60 )
21:17:06 Machine ad lifetime: 604800
21:17:06 Added ad to persistent store key=<slot1@xxxxxxxxxxxxxxxxxxxxxx,172.22.0.145>
21:17:06 Got INVALIDATE_MASTER_ADS
21:17:06 **** Removed(1) ad(s): "< render0412.sta.buf.com >"
21:17:06 (Invalidated 1 ads)
21:17:06 In OfflineCollectorPlugin::update ( 15 )
21:17:12 ScheddAd : Updating ... "< htcondor-slots.sta.buf.com , 172.22.0.3 >"
21:17:12 In OfflineCollectorPlugin::update ( 1 )
21:17:15 Got QUERY_STARTD_PVT_ADS
21:17:15 QueryWorker: forked new high priority worker with id 24050 ( max 16 active 1 pending 0 )
21:17:15 Query after modification: *(true) && (Absent =!= True)*
21:17:15 (Sending 1 ads in response to query)
21:17:15 Query info: matched=1; skipped=0; query_time=0.001341; send_time=0.000500; type=MachinePrivate; requi (true) && (Absent =!= true)}; locate=0; limit=0; from=COLLECTOR; peer=<172.22.0.3:5115>; projection={}; filter
attrs=0
21:17:15 QueryWorker: Child 24050 done
21:17:15 Got QUERY_ANY_ADS
21:17:15 QueryWorker: forked new high priority worker with id 24051 ( max 16 active 1 pending 0 )
21:17:15 Query after modification: *((((MyType == "Submitter")) || ((MyType == "Machine")))) && (Absent =!= Tr

21:17:15 (Sending 1 ads in response to query)
21:17:15 Query info: matched=1; skipped=5; query_time=0.001116; send_time=0.002251; type=Any; requirements={(( == "Submitter")) || ((MyType == "Machine")))) && (Absent =!= true)}; locate=0; limit=0; from=COLLECTOR; peer=<
3:26855>; projection={}; filter_private_attrs=0
21:17:15 QueryWorker: Child 24051 done
21:17:15 AccountingAd : Updating ... "< <none>htcondor-slots.sta.buf.com >"
21:17:15 In OfflineCollectorPlugin::update ( 77 )

21:17:16 Got INVALIDATE_STARTD_ADS
21:17:16 **** Removed(1) ad(s): "< slot1@xxxxxxxxxxxxxxxxxxxxxx , 172.22.0.145 >"
21:17:16 (Invalidated 1 ads)
21:17:16 **** Removed(1) ad(s): "< slot1@xxxxxxxxxxxxxxxxxxxxxx , 172.22.0.145 >"
21:17:16 (Invalidated 1 ads)
21:17:16 In OfflineCollectorPlugin::update ( 13 )
21:17:16 Removed ad from persistent store key=<slot1@xxxxxxxxxxxxxxxxxxxxxx,172.22.0.145>
21:17:16 condor_read(): Socket closed when trying to read 5 bytes from <172.22.0.145:43215> in non-blocking mo

21:17:16 IO: EOF reading packet header
21:17:16 DaemonCore: Can't receive command request from 172.22.0.145 (perhaps a timeout?)
21:17:16 condor_read(): Socket closed when trying to read 5 bytes from <172.22.0.145:45405> in non-blocking mo

21:17:16 IO: EOF reading packet header
21:17:16 DaemonCore: Can't receive command request from 172.22.0.145 (perhaps a timeout?)
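
To make the sequence easier to see, the events that matter for the slot ad can be pulled out of the log with a small script. This is only a sketch: the LOG excerpt is copied from the lines above, while the EVENTS list and the timeline() helper are names chosen for illustration and are not part of HTCondor. It shows the ad being added to the persistent store at 21:17:06 and removed again at 21:17:16, right after INVALIDATE_STARTD_ADS arrives.

```python
import re

# Abbreviated excerpt of the CollectorLog above (timestamps and messages copied from it).
LOG = """\
21:17:06 StartdAd : Updating ... "< slot1@xxxxxxxxxxxxxxxxxxxxxx , 172.22.0.145 >"
21:17:06 Machine ad lifetime: 604800
21:17:06 Added ad to persistent store key=<slot1@xxxxxxxxxxxxxxxxxxxxxx,172.22.0.145>
21:17:06 Got INVALIDATE_MASTER_ADS
21:17:16 Got INVALIDATE_STARTD_ADS
21:17:16 **** Removed(1) ad(s): "< slot1@xxxxxxxxxxxxxxxxxxxxxx , 172.22.0.145 >"
21:17:16 Removed ad from persistent store key=<slot1@xxxxxxxxxxxxxxxxxxxxxx,172.22.0.145>
"""

# Substrings marking the events of interest; picked by hand from the log (an assumption).
EVENTS = (
    "Added ad to persistent store",
    "Got INVALIDATE_STARTD_ADS",
    "Removed ad from persistent store",
)

# Each log line starts with an HH:MM:SS timestamp followed by the message.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (.*)$")


def timeline(log_text):
    """Return (time, event) pairs for the lines that contain one of EVENTS."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or wrapped continuation lines
        stamp, message = match.groups()
        for event in EVENTS:
            if event in message:
                found.append((stamp, event))
                break
    return found


if __name__ == "__main__":
    for stamp, event in timeline(LOG):
        print(stamp, event)
```

Running the same helper over a full CollectorLog (read from a file instead of the inline excerpt) should give the same kind of timeline, assuming the log keeps this timestamp format.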


Thought: the "stray" ad you originally noted being sent over UDP is
being sent over TCP and then the startd invalidates its ad when the
machine shutting down sends the SIGTERMs.  At that point, the
HibernationLevel is 0 and the ad becomes absent.

That's my intuition, but I don't understand where this rogue ad comes from, or why it works for others :).


Thanks !

Charles (going to get some sleep)