Hi Jeff,
MaxIdle is something we are actively working on at the CHTC, along with improving how DAGMan manages nodes that contain more than a single job. DAGMan's MaxIdle behaves very differently from the max_idle command in the Job Description
Language (JDL): the JDL version creates a late materialization factory in the Schedd, while DAGMan's MaxIdle acts as a threshold for placing more jobs on the AP. I say threshold because DAGMan can currently place jobs past this 'max' limit
of idle jobs in various situations.
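To make the contrast concrete, here is a minimal sketch of the two knobs. The file names, executable, and the numbers are placeholders for illustration, not anything taken from your setup.

```
# job.sub (hypothetical): JDL max_idle, handled by a late
# materialization factory in the Schedd
executable = run.sh
arguments  = $(Process)
max_idle   = 100
queue 10000

# DAGMan's MaxIdle, set when submitting the DAG (workflow.dag is hypothetical):
#   condor_submit_dag -maxidle 100 workflow.dag
# or as a default in the configuration:
#   DAGMAN_MAX_JOBS_IDLE = 100
```

With the first, the Schedd only materializes more jobs from the factory while fewer than 100 are idle. With the second, DAGMan holds off on submitting further nodes once it counts roughly that many idle jobs, which is why it behaves as a soft threshold rather than a hard cap.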
The issue you are likely experiencing stems from DAGMan having two methods of placing jobs on the AP. The first is shelling out to condor_submit on behalf of the user. The second is materializing the jobs itself and placing them on the AP directly. Until recently
(v24.2.1), the latter did not respect any late materialization settings defined in the job description. If your AP is running a version prior to this one, try setting
DAGMAN_USE_DIRECT_SUBMIT = False in the AP configuration.
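As a sketch, assuming your AP reads drop-in files from a config.d directory (the path and file name below are hypothetical), the change would look something like this:

```
# /etc/condor/config.d/99-dagman-submit.conf  (hypothetical path and name)
# Have DAGMan shell out to condor_submit instead of placing jobs directly
DAGMAN_USE_DIRECT_SUBMIT = False
```

You can check the value the AP sees with condor_config_val DAGMAN_USE_DIRECT_SUBMIT. I would expect DAGs submitted after the change to pick it up, while DAGs that are already running most likely keep the old behavior.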
Cheers,
Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jeff Templon <templon@xxxxxxxxx>
Sent: Monday, January 6, 2025 2:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Max Idle and DAGS

Hi,
We’re trying to get our users to use the late materialisation factory stuff, to help avoid tens of thousands of queued jobs.
It doesn’t seem to work though, with DAGs - even though the docs suggest it should. Quoting one of our users:
|