
Re: [HTCondor-users] [Python bindings] Submitted job stuck on hold for Spooling input data files



Dear Stefano,

I don't know what happened in the meantime, but the same code now triggers a different but similar error:

HTCondorIOError: DCSchedd::spoolJobFiles:7002:File transfer failed for target job 4832629.0: TOOL at 131.154.193.196 failed to send file(s) to <131.154.192.42:9618>: |Error: reading from file /var/lib/condor/spool/2629/0/cluster4832629.proc0.subproc0/hostname: (errno 2) No such file or directory; SCHEDD failed to receive file(s) from <131.154.193.196:41975>

Probably it is the same cause, but I have no idea...

Anyway, I can try what you suggest: do you have a quick command, or a link that shows how to do it?
I can't really work it out from the documentation of the Python bindings.

Thanks in advance,
Michele

On 23 Nov 2022, at 16:29, Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx> wrote:

Hello,
I suspect it is a matter of telling the Python bindings where your user IDTOKEN is.
On your User Interface it should be located at
$HOME/.condor/tokens.d/`id -nu`@`condor_config_val UID_DOMAIN`
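
As a rough sketch, the same path can be built from Python to check that the token file is where the bindings would be expected to find it. The helper names idtoken_path and local_idtoken_path are hypothetical, not part of the bindings; the sketch assumes UID_DOMAIN is defined in the local configuration, and uses getpass.getuser() as an approximation of `id -nu`.

```python
import getpass
from pathlib import Path


def idtoken_path(user: str, uid_domain: str, home: Path) -> Path:
    """Build $HOME/.condor/tokens.d/<user>@<uid_domain> (hypothetical helper)."""
    return home / ".condor" / "tokens.d" / f"{user}@{uid_domain}"


def local_idtoken_path() -> Path:
    """Resolve the path for the current user from the local HTCondor configuration."""
    # Imported lazily so that idtoken_path() stays usable without the bindings.
    import htcondor

    # Assumption: UID_DOMAIN is defined; htcondor.param raises KeyError otherwise.
    uid_domain = htcondor.param["UID_DOMAIN"]
    return idtoken_path(getpass.getuser(), uid_domain, Path.home())
```

Calling local_idtoken_path().is_file() on the User Interface then tells you whether the token is present at that location.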

Regards
Stefano


On 23/11/22 12:01, Michele Peresano wrote:
Hello,

I am using python-htcondor v10.1.0 (even though import htcondor; print(htcondor.__version__) says 0.1.0 - bug?)
to submit a job to INFN-T1 at the Italian CNAF (scheduler name "sn-02.cr.cnaf.infn.it").
I first contacted their support, but they answered that they don't provide support for the HTCondor Python bindings and suggested that I contact this mailing list.

My jobs are always stuck in HELD status with the hold reason "Spooling input data files".

This doesn't happen when submitting the same job with the standard command-line interface:
condor_submit -name sn-02.cr.cnaf.infn.it -spool test_tutorial.sub
I am following the tutorial described here

https://htcondor.readthedocs.io/en/latest/apis/python-bindings/tutorials/Submitting-and-Managing-Jobs.html

I attach the script I am using to submit the job (test_tutorial.py).

From the API,

https://htcondor.readthedocs.io/en/latest/apis/python-bindings/api/htcondor.html#htcondor.Schedd.submit

and a very old (but still open) GitHub issue,

https://github.com/htcondor/htcondor-python-bindings-tutorials/issues/21


I understand that I have to call the spool() method of the scheduler object, since it seems that jobs must be spooled at my site.

I tried with both the jobs() method of the Submit object

scheduler.spool( [j for j in job.jobs()] )

and the query() method of the scheduler object,

query = scheduler.query(constraint='JobStatus==5 && Owner == "peresano"')
scheduler.spool(query)

but in both cases I get a similar error,

HTCondorIOError: DCSchedd::spoolJobFiles:7002:File transfer failed for target job 4818132.0: Failed to receive GoAhead message from 131.154.192.42.
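
For reference, the sequence described in the Schedd.submit documentation (submit with spool=True, then pass the job ads to spool()) can be sketched as below. This is only an illustration under assumptions: the function name submit_and_spool is hypothetical, the remote schedd is assumed to be reachable through the collector configured on the submitting host, and the ads are rebuilt with the cluster id returned by submit() instead of the default used by jobs(). Whether this avoids the error above at this particular site is not something the sketch can confirm.

```python
def submit_and_spool(description: dict, schedd_name: str) -> int:
    """Submit to a remote schedd in spooling mode, then send the input files.

    Hypothetical helper: returns the cluster id of the submitted job.
    """
    # Imported lazily so this module can be loaded without the bindings.
    import htcondor

    # Locate the remote schedd through the locally configured collector.
    collector = htcondor.Collector()
    schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd, schedd_name)
    schedd = htcondor.Schedd(schedd_ad)

    job = htcondor.Submit(description)

    # spool=True leaves the job on hold until its input files are spooled.
    result = schedd.submit(job, spool=True)

    # Rebuild the job ads with the cluster id assigned by the schedd,
    # then transfer the input files for those ads.
    ads = list(job.jobs(clusterid=result.cluster()))
    schedd.spool(ads)
    return result.cluster()
```

A call would look like submit_and_spool(my_description, "sn-02.cr.cnaf.infn.it"), where my_description is a placeholder for the key/value pairs of the submit file.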

Can you help me? I'd really like to use the Python bindings to deal with my jobs.

Best regards




________________________________

Michele Peresano
Postdoctoral Researcher

Department of Physics
University of Turin and INFN
via Pietro Giuria, 1
10125, Turin, Italy

