
[HTCondor-users] Schedule jobs after each other on same node with shared scratch (GPU preproccessing)

Date: Wed, 18 Mar 2026 12:59:34 +0100
From: Emily Kooistra <a66@xxxxxxxxx>
Subject: [HTCondor-users] Schedule jobs after each other on same node with shared scratch (GPU preproccessing)

Hi All,

At NIKHEF we are seeing more and more GPU usage, and those jobs often have quite a long preprocessing stage that reformats the training inputs or other data files, which takes quite some time compared to the total runtime of the slot.

As a result, we end up with slots that request a GPU but don't actually use it for a large part of the claimed period, which leads to non-optimal GPU usage.

Copying this prepared data back to network storage, and then copying it again to the scratch disk of the slot with a GPU, is rather wasteful of network bandwidth.

So I was wondering whether it would be possible, in a DAG or some other way, to run a CPU-intensive preprocessing job on a node with a GPU, and later in the process either attach the GPU to this slot or have a way to do an internal copy between the two jobs.

Any other suggestions that would work within the current limitations of condor are also more than welcome (for example, having a node-local scratch and some constraints so that the jobs run after each other, although then you miss the cleanup that condor does of the scratch).


Emily Kooistra
NIKHEF

Follow-Ups:
- Re: [HTCondor-users] Schedule jobs after each other on same node with shared scratch (GPU preproccessing)
  - From: Beyer, Christoph
