
[HTCondor-users] [Python bindings] Submitted job stuck on hold for Spooling input data files



Hello,

I am using python-htcondor v10.1.0 (even though import htcondor; print(htcondor.__version__) reports 0.1.0; is this a bug?)
to submit a job to INFN-T1 at the Italian CNAF (scheduler name "sn-02.cr.cnaf.infn.it").
I first contacted their support, but they replied that they do not support the HTCondor Python bindings and suggested that I contact this mailing list.
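
As a side check on the version mismatch above, the following is a minimal sketch that compares the version recorded in the installed package metadata with the HTCondor version the module reports it is linked against. The helper name installed_version is hypothetical, and the distribution name "htcondor" is an assumption (it is the PyPI name; a conda install of python-htcondor may register its metadata differently, or not at all).

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the bindings are registered under the distribution name "htcondor".
    print("package metadata:", installed_version("htcondor"))
    try:
        import htcondor

        # htcondor.version() reports the HTCondor release the module is linked against.
        print("linked HTCondor:", htcondor.version())
    except ImportError:
        print("htcondor module is not importable in this environment")
```

If the two values agree on 10.1.0, the 0.1.0 shown by __version__ is probably only a stale attribute and not a sign of a wrong installation.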

My jobs always end up in the HELD state with the hold reason "Spooling input data files".

This doesn't happen when I submit the same job with the standard command-line interface:
condor_submit -name sn-02.cr.cnaf.infn.it -spool test_tutorial.sub

I am following the tutorial described here:

https://htcondor.readthedocs.io/en/latest/apis/python-bindings/tutorials/Submitting-and-Managing-Jobs.html

I have attached the script I am using to submit the job (test_tutorial.py).

From the API documentation,

https://htcondor.readthedocs.io/en/latest/apis/python-bindings/api/htcondor.html#htcondor.Schedd.submit

and a very old (but still open) GitHub issue,

https://github.com/htcondor/htcondor-python-bindings-tutorials/issues/21


I understood that I have to call the spool() method of the Schedd object, since my site apparently requires spooling.

I tried both the jobs() method of the Submit object,

scheduler.spool( [j for j in job.jobs()] )

and the query() method of the Schedd object,

query = scheduler.query(constraint='JobStatus==5 && Owner == "peresano"')
scheduler.spool(query)

but in both cases I get a similar error,

HTCondorIOError: DCSchedd::spoolJobFiles:7002:File transfer failed for target job 4818132.0: Failed to receive GoAhead message from 131.154.192.42.
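
For reference, the submit-then-spool sequence described in the Schedd.submit and Schedd.spool documentation can be sketched as below. This is only an illustration under assumptions, not a confirmed fix for the GoAhead failure: the helper name submit_and_spool is hypothetical, schedd is assumed to behave like htcondor.Schedd and submit_description like htcondor.Submit, and the key difference from the attempts above is that the job ads are generated with the cluster id returned by the submit call (jobs() otherwise uses its default clusterid).

```python
def submit_and_spool(schedd, submit_description):
    """Submit with spool=True, then spool the input files for the new cluster.

    Hypothetical helper: `schedd` is expected to behave like htcondor.Schedd
    and `submit_description` like htcondor.Submit.
    """
    # Jobs submitted with spool=True stay on hold until spool() is called.
    result = schedd.submit(submit_description, spool=True)

    # Build the job ads with the cluster id the schedd actually assigned,
    # so that spool() targets the jobs that were just submitted.
    job_ads = list(submit_description.jobs(clusterid=result.cluster()))

    # Transfer the input files to the remote schedd.
    schedd.spool(job_ads)
    return result.cluster()
```

With the objects from the attached script, this would be called as submit_and_spool(scheduler, job_description). If the transfer still fails with the same GoAhead error, the cause is more likely on the connection or authorization side than in how the ads are built.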

Can you help me? I'd really like to use the Python bindings to deal with my jobs.

Best regards


from htcondor import Collector, Schedd, DaemonTypes, Submit

job_description = Submit(
    {
        "executable": "/bin/hostname",
        "output": "hostname.out",
        "error": "hostname.err",
        "log": "hostname.log",
    }
)

collector = Collector()
scheduler_ad = collector.locate(
    daemon_type=DaemonTypes.Schedd, name="sn-02.cr.cnaf.infn.it"
)
scheduler = Schedd(scheduler_ad)

job = Submit(**job_description)
submit_result = scheduler.submit(job, spool=True)

print(submit_result)

# I did the following after checking that the job was on hold forever
query = scheduler.query(constraint='JobStatus==5 && Owner == "peresano"')
scheduler.spool(query)
# and also tried this with the same result
job_ads = [j for j in job.jobs()]
scheduler.spool(job_ads)

________________________________

Michele Peresano
Postdoctoral Researcher

Department of Physics
University of Turin and INFN
via Pietro Giuria, 1
10125, Turin, Italy