
Re: [HTCondor-users] job factory universe changed?



Hi TJ,

Using your advice, I was able to change my job transform to make this work. The new transform looks like this (there's probably a more concise way to express the logic, but this worked):

JOB_TRANSFORM_TagJob @=end
[
    EVAL_SET_LigoSearchTag = IfThenElse(LigoSearchTag isnt null, LigoSearchTag, AcctGroup);
    EVAL_SET_LigoSearchUser = IfThenElse(LigoSearchUser isnt null, LigoSearchUser, AcctGroupUser);
]
@end

Thanks for the help,

--Mike

On 6/19/23 10:44, John M Knoeller via HTCondor-users wrote:
There was a change.  A bug fix actually.

Transforms and submit requirements are now applied both to the factory at submit time and to the jobs as they materialize.  You can see that happening in the log:

06/16/23 15:00:56 (pid:5534) job_transforms for 19803671.-1: 2 considered, 2 applied (TagJob,RemoveAcctGroup)
...
06/16/23 15:00:56 (pid:5534) Trying to Materializing new job 19803671.0 step=0 row=0
06/16/23 15:00:56 (pid:5534) Trying to Materializing new job 19803671.1 step=0 row=1
06/16/23 15:00:56 (pid:5534) job_transforms for 19803671.0: 2 considered, 2 applied (TagJob,RemoveAcctGroup)
06/16/23 15:00:56 (pid:5534) CommitTransaction() failed for cluster 19803671 rval=-1 (Invalid value for search tag: None)

The first line is applying the transform to the factory.  When that finishes, the factory has no value for AccountingGroup, AcctGroupUser, and AcctGroup.

So when job 19803671.0 is materialized, it *also* has no value for these attributes, because it inherits its attributes from the factory.  The transform then does a COPY on these missing attributes and ends up replacing the LigoSearchTag, which this job also inherited, with undefined.  The EVAL_SET that follows turns that undefined value into "None", which is the value shown in the log.

Then the submit requirement rejects the job because LigoSearchTag no longer holds a valid tag.

What you need to do is change the TagJob transform so it does not overwrite a LigoSearchTag value if the job already has one.
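
The double application can be modeled outside HTCondor. The sketch below is a toy Python model, not HTCondor code: plain dicts stand in for job ClassAds, a missing key stands in for an undefined attribute, and the function names are made up for illustration. The owner name and the AccountingGroup string are assumed values, and the model assumes that a COPY from a missing attribute leaves the target undefined, as described above.

```python
# Toy model only. Dicts stand in for ClassAds; a missing key means "undefined".
# None of these names are HTCondor APIs.

PAIRS = (("AcctGroup", "LigoSearchTag"), ("AcctGroupUser", "LigoSearchUser"))


def tag_job_original(ad):
    """Model of the original TagJob: unconditional COPY, then default."""
    for src, dst in PAIRS:
        if src in ad:
            ad[dst] = ad[src]
        else:
            ad.pop(dst, None)  # COPY from a missing source: target undefined
    # EVAL_SET_x = x ?: default
    ad["LigoSearchTag"] = ad.get("LigoSearchTag", "None")
    ad["LigoSearchUser"] = ad.get("LigoSearchUser", ad["Owner"])


def tag_job_guarded(ad):
    """Model of a guarded TagJob: keep an existing value, else copy."""
    for src, dst in PAIRS:
        if dst not in ad and src in ad:
            ad[dst] = ad[src]


def remove_acct_group(ad):
    """Model of RemoveAcctGroup: skipped for scheduler universe (7)."""
    if ad.get("JobUniverse") != 7:
        for name in ("AccountingGroup", "AcctGroup", "AcctGroupUser"):
            ad.pop(name, None)


def submit_twice(tag_job):
    """Apply the transforms to the factory, then again to a materialized job."""
    factory = {
        "Owner": "michael.thomas",                     # assumed value
        "JobUniverse": 5,                              # vanilla
        "AccountingGroup": "llo.test.michael.thomas",  # assumed value
        "AcctGroup": "llo.test",
        "AcctGroupUser": "michael.thomas",             # assumed value
    }
    for transform in (tag_job, remove_acct_group):
        transform(factory)   # pass 1: factory at submit time
    job = dict(factory)      # materialized job inherits from the factory
    for transform in (tag_job, remove_acct_group):
        transform(job)       # pass 2: job at materialization
    return job.get("LigoSearchTag")


if __name__ == "__main__":
    print(submit_twice(tag_job_original))  # None
    print(submit_twice(tag_job_guarded))   # llo.test
```

In this model the original transform is not safe to apply twice, because the second pass sees the accounting attributes already deleted. The guarded variant gives the same result whether it runs once or twice.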

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Michael Thomas
Sent: Friday, June 16, 2023 3:40 PM
To: condor-users@xxxxxxxxxxx
Subject: [HTCondor-users] job factory universe changed?

I'm trying to submit a set of jobs using the schedd late materialization
job factory in condor 10.0.4.  I note that the same submit file and
schedd configuration worked fine in condor v9, so I'm guessing there was
some behavior change that I overlooked.

My submit file contains an accounting_group, which a job transform turns
into a LigoSearchTag and validates that it has an acceptable value.

To start, here is my submit file:

executable = validate_files.sh
log = /home/michael.thomas/condor/rawtrend/job.log.$(Process)
universe = vanilla
accounting_group=llo.test
request_disk = 2048MB
notification = Always
notify_user = michael.thomas@xxxxxxxx
should_transfer_files = YES
stream_output = True
request_HeavyNetwork = 1
max_materialize = 5
arguments = input/condor_input_$(Process)
error = /home/michael.thomas/condor/rawtrend/validation/job.err.$(Process)
output = /home/michael.thomas/condor/rawtrend/validation/job.out.$(Process)
transfer_input_files = input/condor_input_$(Process),validaterawtrend
transfer_output_files = validation
preserve_relative_paths = True
queue 10

...and here are the job transforms:

JOB_TRANSFORM_NAMES = TagJob,RemoveAcctGroup

JOB_TRANSFORM_TagJob @=end
[
    COPY_AcctGroup = "LigoSearchTag";
    COPY_AcctGroupUser = "LigoSearchUser";
    EVAL_SET_LigoSearchTag = LigoSearchTag ?: "None";
    EVAL_SET_LigoSearchUser = LigoSearchUser ?: Owner;
]
@end

# do not strip accounting classads from scheduler universe
# because their presence is necessary to propagate to child
# jobs and sub-DAGs
JOB_TRANSFORM_RemoveAcctGroup @=end
[
Requirements = JobUniverse != 7;
delete_AccountingGroup = True;
delete_AcctGroup = True;
delete_AcctGroupUser = True;
]
@end

SCHEDD_CLASSAD_USER_MAP_NAMES = $(SCHEDD_CLASSAD_USER_MAP_NAMES) ValidSearchTags ValidSearchUsers
CLASSAD_USER_MAPFILE_ValidSearchTags = /etc/condor/accounting/valid_tags
CLASSAD_USER_MAPFILE_ValidSearchUsers = /etc/condor/accounting/valid_users

SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) ValidateSearchTag ValidateSearchUser

SUBMIT_REQUIREMENT_ValidateSearchTag = JobUniverse == 7 || \
    userMap("ValidSearchTags",LigoSearchTag) isnt undefined
SUBMIT_REQUIREMENT_ValidateSearchTag_REASON = \
    strcat("Invalid value for search tag: ",LigoSearchTag ?: "<undefined>")

SUBMIT_REQUIREMENT_ValidateSearchUser = \
    JobUniverse == 7 || \
    userMap("ValidSearchUsers",Owner,LigoSearchUser) is LigoSearchUser || \
    userMap("ValidSearchUsers",Owner) is undefined && Owner =?= LigoSearchUser
SUBMIT_REQUIREMENT_ValidateSearchUser_REASON = \
    strcat("Invalid value for search user: ", LigoSearchUser ?: "<undefined>", "\n", \
           "       Valid values are: ",userMap("ValidSearchUsers",Owner))


Now when I submit, I'm getting an error that my search tag isn't valid:

06/16/23 15:00:56 (pid:5534) Calling HandleReq <handle_q> (0) for command 1112 (QMGMT_WRITE_CMD) from michael.thomas@xxxxxxxxxxxxxxxxxxxxxxxx <10.13.5.32:27419>
06/16/23 15:00:56 (pid:5534) job_transforms for 19803671.-1: 2 considered, 2 applied (TagJob,RemoveAcctGroup)
06/16/23 15:00:56 (pid:5534) Return from HandleReq <handle_q> (handler: 0.045252s, sec: 0.002s, payload: 0.001s)
06/16/23 15:00:56 (pid:5534) Return from Handler <DaemonCore::HandleReqPayloadReady> 0.045702s
06/16/23 15:00:56 (pid:5534) Trying to Materializing new job 19803671.0 step=0 row=0
06/16/23 15:00:56 (pid:5534) Trying to Materializing new job 19803671.1 step=0 row=1
06/16/23 15:00:56 (pid:5534) job_transforms for 19803671.0: 2 considered, 2 applied (TagJob,RemoveAcctGroup)
06/16/23 15:00:56 (pid:5534) CommitTransaction() failed for cluster 19803671 rval=-1 (Invalid value for search tag: None)

Which I presume means that either the transform failed to copy
AccountingGroup to LigoSearchTag, or that it didn't execute in the
scheduler universe and deleted the AccountingGroup tag.  Any tips on how
to debug this or what might have changed between v9 and v10 are appreciated.

--Mike
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
