
[HTCondor-users] HTCondor Job Failure due to Missing Executables (java.lang.ClassNotFoundException)



Hello to the entire HTCondor community.

I am new to HTCondor and have run into a problem that I cannot solve. HTCondor 23.0.8 is installed on a Red Hat Enterprise Linux 9.3 server that acts as both central manager and submit machine. HTCondor 23.0.8 is also installed on a Windows 10 machine that acts as the execute node.

When I submit a job from the Red Hat machine, the resources match and the job starts on the Windows execute machine, but it always fails with a java.lang.ClassNotFoundException error. My executables are never found, even though the submit file originally specified transfer_executable = true. At first I thought HTCondor was not transferring the executables, so I set transfer_executable = false, placed the executables in a directory, and gave their absolute paths in the submit file. I still get the same error even though the executables are present in that directory. Below are my HTCondor config files, my submit file, and some relevant logs.

HTCondor config files:

condor_config:

######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file. KEEP THAT IN MIND! To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value. Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /usr

##  Where is the local condor directory for each host? This is where the local config file(s), logs and
##  spool/execute directories are located. This is the default for Linux and Unix systems.
LOCAL_DIR = /var

##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = true

##  The normal way to do configuration with RPM and Debian packaging is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
##  Multiple directories may be specified, separated by commas; directories
##  are read in left-to-right order.
LOCAL_CONFIG_DIR = /usr/share/condor/config.d,/etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

##
## Do NOT use host-based security by default.
##
## This was the default for the 8.8 series (and earlier), but it is
## intrinsically insecure. To make the 9.0 series secure by default, we
## commented it out.
##
## You should seriously consider improving your security configuration.
##
## To continue to use your old security configuration, knowing that it is
## insecure, add the line 'use SECURITY:HOST_BASED' to your local
## configuration directory. Don't just uncomment the final line in this
## comment block; changes in this file may be lost during your next upgrade.
## The following shell command will make the change on most Linux systems.
##
## echo 'use SECURITY:HOST_BASED' >> $(condor_config_val LOCAL_CONFIG_DIR)/00-insecure.config
##

##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu

##--------------------------------------------------------------------
## Values set by the rpm patch script:
##--------------------------------------------------------------------

## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password

##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor

JAVA_CLASSPATH_DEFAULT = $(SHARE) .

##  Install the minicondor package to run HTCondor on a single node

condor_config.local:

CONDOR_HOST=my_host
DEFAULT_DOMAIN_NAME = my_domain
CONDOR_ADMIN=me@xxxxxxxxx
COLLECTOR_NAME=DCN

DAEMON_LIST = $(DAEMON_LIST) CREDD

MAX_JOBS_PER_OWNER      = 10000000
MAX_JOBS_RUNNING        = 500
MAX_JOBS_SUBMITTED      = 10000000
MAX_JOBS_PER_SUBMISSION = 10000000

PREEN_ARGS=-r

EVENT_LOG_FORMAT_OPTIONS=ISO_DATE
DEFAULT_USERLOG_FORMAT_OPTIONS=ISO_DATE

WARN_ON_UNUSED_SUBMIT_FILE_MACROS=FALSE

PRIORITY_HALFLIFE = 1.0e100

IsWeekday = (ClockDay > 0 && ClockDay < 6)
IsWeekend = (Clockday == 0 || ClockDay == 6)
IsBusinessHours = (ClockMin >= 420 && ClockMin < 1140)
isNightTime = (ClockMin < 420 || ClockMin >= 1140)
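
A note on the four time expressions at the end of condor_config.local: ClockDay is the day of the week (0 = Sunday through 6 = Saturday) and ClockMin is the number of minutes since midnight, so 420 and 1140 correspond to 07:00 and 19:00. Below is a minimal Python sketch of the same four tests. The function name clock_flags is hypothetical, and the code only illustrates the arithmetic; it is not something HTCondor runs.

```python
from datetime import datetime


def clock_flags(now: datetime) -> dict[str, bool]:
    """Mirror the four time-of-day expressions from condor_config.local."""
    # HTCondor's ClockDay counts 0 = Sunday ... 6 = Saturday, while
    # datetime.weekday() counts 0 = Monday ... 6 = Sunday, so shift by one.
    clock_day = (now.weekday() + 1) % 7
    # ClockMin is the number of minutes since midnight.
    clock_min = now.hour * 60 + now.minute
    return {
        "IsWeekday": 0 < clock_day < 6,
        "IsWeekend": clock_day == 0 or clock_day == 6,
        "IsBusinessHours": 420 <= clock_min < 1140,  # 07:00 up to 18:59
        "IsNightTime": clock_min < 420 or clock_min >= 1140,
    }


if __name__ == "__main__":
    # 6 May 2024 at 09:56 was a Monday morning.
    print(clock_flags(datetime(2024, 5, 6, 9, 56)))
```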

00-minicondor:

# HTCONDOR CONFIGURATION TO CREATE A POOL WITH ONE MACHINE
#
# This file was created upon initial installation of HTCondor.
# It contains configuration settings to set up a secure HTCondor
# installation consisting of **just one single machine**.
# YOU WILL WANT TO REMOVE THIS FILE IF/WHEN YOU DECIDE TO ADD ADDITIONAL
# MACHINES TO YOUR HTCONDOR INSTALLATION! Most of these settings do
# not make sense if you have a multi-server pool.
#
# See the Quick Start Installation guide at:
#     https://htcondor.org/manual/quickstart.html
#

# ---  NODE ROLES  ---

# Every pool needs one Central Manager, some number of Submit nodes and
# as many Execute nodes as you can find. Consult the manual to learn
# about additional roles.

use ROLE: CentralManager
use ROLE: Submit
#use ROLE: Execute

# --- NETWORK SETTINGS ---

# Configure HTCondor services to listen to port 9618 on the IPv4
# loopback interface.
#NETWORK_INTERFACE = 127.0.0.1
#BIND_ALL_INTERFACES = False
CONDOR_HOST=My_CentralManager_SubmitMachine

EXECUTE_MACHINES @=BNQ
        ## Group: DCN    Notes: workstation of Germans
        BNQ135020.bnquebec.ca,
        ## Group: DCN    Notes: Yves-Thériault room (new workstation, 2023)
        My_Execute_machine
        #BNQ134179
@BNQ

# --- SECURITY SETTINGS ---

# Verify authenticity of HTCondor services by checking if they are
# running with an effective user id of user "condor".
#SEC_DEFAULT_AUTHENTICATION = REQUIRED
#SEC_DEFAULT_INTEGRITY = REQUIRED
ALLOW_DAEMON = condor@$(UID_DOMAIN) $(CONDOR_HOST) $(EXECUTE_MACHINES)
ALLOW_NEGOTIATOR = condor@$(UID_DOMAIN) $(EXECUTE_MACHINES)

# Configure so only user root or user condor can run condor_on,
# condor_off, condor_restart, and condor_userprio commands to manage
# HTCondor on this machine.
# If you wish any user to do so, comment out the line below.
ALLOW_ADMINISTRATOR = root@$(UID_DOMAIN) condor@$(UID_DOMAIN) $(CONDOR_HOST)

# Allow anyone (on the loopback interface) to submit jobs.
ALLOW_WRITE = *
# Allow anyone (on the loopback interface) to run condor_q or condor_status.
ALLOW_READ = *

# --- PERFORMANCE TUNING SETTINGS ---

# Since there is just one server in this pool, we can tune various
# polling intervals to be much more responsive than the system defaults
# (which are tuned for pools with thousands of servers). This will
# enable jobs to be scheduled faster, and job monitoring to happen more
# frequently.
SCHEDD_INTERVAL = 5
NEGOTIATOR_INTERVAL = 2
NEGOTIATOR_CYCLE_DELAY = 5
STARTER_UPDATE_INTERVAL = 5
SHADOW_QUEUE_UPDATE_INTERVAL = 10
UPDATE_INTERVAL = 5
RUNBENCHMARKS = 0

# --- COMMON CHANGES ---

# Uncomment the lines below and do 'sudo condor_reconfig' if you wish
# condor_q to show jobs from all users with one line per job by default.
#CONDOR_Q_DASH_BATCH_IS_DEFAULT = False
#CONDOR_Q_ONLY_MY_JOBS = False

00-insecure.config

use SECURITY:HOST_BASED

My submit file:

cmd_parameters          = ${INPUT}[0] -rotate 180 ${OUTPUT}
extensions              = *
json_parameters         =
path_input              = N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\input
path_output             = N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\output
timestamp               = 1714742017803
label                   = Test-image
type                    = CONV.IMAGE
jvm_locator_class       = ca.qc.banq.dnum.JVMLocator
java_execute_jar        = transform-core.jar
java_launcher           = ca.qc.banq.dnum.Launcher
java_main_class         = ca.qc.banq.dnum.transform.cmd.ImagickConversion
input_line_count        = 3

JobBatchName            = $(label) - $(type) - $(timestamp)
accounting_group_user   = sady.doucoure
executable              = C:\condor\apps\jvm-locator.jar
universe                = java
initialdir              = /data01/condor/jobs/1714742017803-CONV.IMAGE-Test-image

jar_files               = C:\condor\apps\transform-core.jar, C:\condor\apps\jvm-locator.jar
arguments               = "$(jvm_locator_class) $(java_execute_jar) $(java_launcher) $(java_main_class) -pathInput '$(path_input)' -pathOutput '$(path_output)' -cmd '$(cmd_parameters)' -fileInput '$(line)'"

range_start             = $INT(ItemIndex)/1000*1000
range_end               = ($INT(ItemIndex)/1000+1)*1000-1

log                     = $(ClusterId).log
output                  = $INT(range_start)-$INT(range_end)/$(ClusterId).$INT(ItemIndex,%09d).stdout
error                   = $INT(range_start)-$INT(range_end)/$(ClusterId).$INT(ItemIndex,%09d).stderr
list                    = input.txt

transfer_executable     = false
should_transfer_files   = true
transfer_input_files    =
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files   = ""

request_memory          = 500MB
request_disk            = 250MB
requirements            = (Arch == "INTEL" || Arch == "X86_64")

materialize_max_idle    = 20
max_retries             = 5

on_exit_remove          = (ExitCode == 0 || ExitCode == 0 || ExitCode == 1500 || ExitCode == 1501 || ExitCode == 1502 || ExitCode == 1503 || ExitCode == 1504 || ExitCode == 1505)

on_exit_hold            = false
on_exit_hold_reason     = "Job exit with error and will run again after a delay"
on_exit_hold_subcode    = 62

periodic_hold           = (JobStatus == 2) && (time() - EnteredCurrentStatus) > (3 * 24 * 60 * 60)
periodic_hold_reason    = "Job ran for more than 3 days"
periodic_hold_subcode   = 42
periodic_release        = NumJobCompletions <= JobMaxRetries && (time() - EnteredCurrentStatus) > 20 * 60 && (HoldReasonCode != 1 && HoldReasonSubCode != 99)

queue line from $(initialdir)/$(list)
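
The range_start and range_end macros in the submit file are meant to group the stdout and stderr files into directories of 1000 items each. Here is a small Python sketch of that arithmetic, assuming $INT() evaluates the expression with integer division and that ItemIndex is never negative. The function names output_bucket and stdout_path are hypothetical and exist only for this illustration.

```python
def output_bucket(item_index: int, bucket_size: int = 1000) -> tuple[int, int]:
    """Return (range_start, range_end) for the bucket containing item_index."""
    # Same arithmetic as the submit file: ItemIndex/1000*1000 and
    # (ItemIndex/1000+1)*1000-1, using integer (floor) division.
    start = item_index // bucket_size * bucket_size
    end = (item_index // bucket_size + 1) * bucket_size - 1
    return start, end


def stdout_path(cluster_id: int, item_index: int) -> str:
    """Build the relative stdout path the 'output' line would expand to."""
    start, end = output_bucket(item_index)
    # %09d in the submit file zero-pads ItemIndex to nine digits.
    return f"{start}-{end}/{cluster_id}.{item_index:09d}.stdout"


if __name__ == "__main__":
    for index in (0, 2, 999, 1000, 1234):
        print(index, output_bucket(index), stdout_path(67, index))
```

With these assumptions, items 0 through 999 share one directory, items 1000 through 1999 share the next, and so on.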


LOGS

Red Hat (central manager, submit machine)

MasterLog

05/04/24 16:01:47 ******************************************************

05/04/24 16:01:47 Using config source: /etc/condor/condor_config

05/04/24 16:01:47 Using local config sources:

05/04/24 16:01:47    /etc/condor/config.d/00-insecure.config

05/04/24 16:01:47    /etc/condor/config.d/00-minicondor

05/04/24 16:01:47    /etc/condor/config.d/10-stash-plugin.conf

05/04/24 16:01:47    /etc/condor/condor_config.local

05/04/24 16:01:47 config Macros = 87, Sorted = 87, StringBytes = 2679, TablesBytes = 3204

05/04/24 16:01:47 CLASSAD_CACHING is OFF

05/04/24 16:01:47 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

05/04/24 16:01:48 SharedPortEndpoint: waiting for connections to named socket master_1104_d4a5

05/04/24 16:01:48 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory

05/04/24 16:01:48 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

05/04/24 16:01:48 DaemonCore: private command socket at <x.x.x.x:0?alias=my_CentralManager_SubmitMachine&sock=master_1104_d4a5>

05/04/24 16:01:48 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)

05/04/24 16:01:48 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port

05/04/24 16:01:48 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1712851859)

05/04/24 16:01:48 Collector port not defined, will use default: 9618

05/04/24 16:01:48 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 95395

05/04/24 16:01:48 Waiting for /var/lock/condor/shared_port_ad to appear.

05/04/24 16:01:49 Found /var/lock/condor/shared_port_ad.

05/04/24 16:01:49 Collector port not defined, will use default: 9618

05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 95396

05/04/24 16:01:49 Waiting for /var/log/condor/.collector_address to appear.

05/04/24 16:01:49 Found /var/log/condor/.collector_address.

05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 95397

05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 95398

05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_credd", pid and pgroup = 95399

05/04/24 16:01:49 Daemons::StartAllDaemons all daemons were started

05/04/24 16:02:35 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)

05/04/24 16:02:35 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port

05/04/24 16:02:35 Reconfiguring all managed daemons.

05/04/24 16:02:35 Sent SIGHUP to COLLECTOR (pid 95396)

05/04/24 16:02:35 Sent SIGHUP to CREDD (pid 95399)

05/04/24 16:02:35 Sent SIGHUP to NEGOTIATOR (pid 95397)

05/04/24 16:02:35 Sent SIGHUP to SCHEDD (pid 95398)

05/04/24 16:02:35 Sent SIGHUP to SHARED_PORT (pid 95395)

05/04/24 16:16:04 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)

05/04/24 16:16:04 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port

05/04/24 16:16:04 Reconfiguring all managed daemons.

05/04/24 16:16:04 Sent SIGHUP to COLLECTOR (pid 95396)

05/04/24 16:16:04 Sent SIGHUP to CREDD (pid 95399)

05/04/24 16:16:04 Sent SIGHUP to NEGOTIATOR (pid 95397)

05/04/24 16:16:04 Sent SIGHUP to SCHEDD (pid 95398)

05/04/24 16:16:04 Sent SIGHUP to SHARED_PORT (pid 95395)

05/04/24 17:01:49 Preen pid is 99905

05/04/24 17:01:49 Preen (pid 99905) exited with status 0

05/05/24 17:01:49 Preen pid is 196541

05/05/24 17:01:49 Preen (pid 196541) exited with status 0

ShadowLog

05/06/24 09:56:41 ******************************************************

05/06/24 09:56:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP

05/06/24 09:56:41 ** /usr/sbin/condor_shadow

05/06/24 09:56:41 ** SubsystemInfo: name=SHADOW type=SHADOW(5) class=DAEMON(1)

05/06/24 09:56:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON

05/06/24 09:56:41 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 PackageID: 23.0.8-1 $

05/06/24 09:56:41 ** $CondorPlatform: x86_64_AlmaLinux9 $

05/06/24 09:56:41 ** PID = 265638

05/06/24 09:56:41 ** Log last touched time unavailable (No such file or directory)

05/06/24 09:56:41 ******************************************************

05/06/24 09:56:41 Using config source: /etc/condor/condor_config

05/06/24 09:56:41 Using local config sources:

05/06/24 09:56:41    /etc/condor/config.d/00-insecure.config

05/06/24 09:56:41    /etc/condor/config.d/00-minicondor

05/06/24 09:56:41    /etc/condor/config.d/10-stash-plugin.conf

05/06/24 09:56:41    /etc/condor/condor_config.local

05/06/24 09:56:41 config Macros = 90, Sorted = 90, StringBytes = 2824, TablesBytes = 1512

05/06/24 09:56:41 CLASSAD_CACHING is OFF

05/06/24 09:56:41 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

05/06/24 09:56:41 SharedPortEndpoint: waiting for connections to named socket shadow_95398_4560_248

05/06/24 09:56:41 ******************************************************

05/06/24 09:56:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP

05/06/24 09:56:41 ** /usr/sbin/condor_shadow

05/06/24 09:56:41 ** SubsystemInfo: name=SHADOW type=SHADOW(5) class=DAEMON(1)

05/06/24 09:56:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON

05/06/24 09:56:41 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 PackageID: 23.0.8-1 $

05/06/24 09:56:41 ** $CondorPlatform: x86_64_AlmaLinux9 $

05/06/24 09:56:41 ** PID = 265639

05/06/24 09:56:41 ** Log last touched 5/6 09:56:41

05/06/24 09:56:41 ******************************************************

05/06/24 09:56:41 Using config source: /etc/condor/condor_config

05/06/24 09:56:41 Using local config sources:

05/06/24 09:56:41    /etc/condor/config.d/00-insecure.config

05/06/24 09:56:41    /etc/condor/config.d/00-minicondor

05/06/24 09:56:41    /etc/condor/config.d/10-stash-plugin.conf

05/06/24 09:56:41    /etc/condor/condor_config.local

05/06/24 09:56:41 config Macros = 90, Sorted = 90, StringBytes = 2824, TablesBytes = 1512

05/06/24 09:56:41 CLASSAD_CACHING is OFF

05/06/24 09:56:41 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

05/06/24 09:56:41 SharedPortEndpoint: waiting for connections to named socket shadow_95398_4560_249

05/06/24 09:56:41 DaemonCore: command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_248>

05/06/24 09:56:41 DaemonCore: private command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_248>

05/06/24 09:56:41 DaemonCore: command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_249>

05/06/24 09:56:41 DaemonCore: private command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_249>

05/06/24 09:56:41 Initializing a JAVA shadow for job 67.0

05/06/24 09:56:41 (67.0) (265638): LIMIT_DIRECTORY_ACCESS = <unset>

05/06/24 09:56:41 Initializing a JAVA shadow for job 67.1

05/06/24 09:56:41 (67.1) (265639): LIMIT_DIRECTORY_ACCESS = <unset>

05/06/24 09:56:41 ******************************************************

05/06/24 09:56:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP

05/06/24 09:56:41 ** /usr/sbin/condor_shadow

05/06/24 09:56:41 ** SubsystemInfo: name=SHADOW type=SHADOW(5) class=DAEMON(1)

05/06/24 09:56:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON

05/06/24 09:56:41 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 PackageID: 23.0.8-1 $

05/06/24 09:56:41 ** $CondorPlatform: x86_64_AlmaLinux9 $

05/06/24 09:56:41 ** PID = 265640

05/06/24 09:56:41 ** Log last touched 5/6 09:56:41

05/06/24 09:56:41 ******************************************************

05/06/24 09:56:41 (67.0) (265638): Request to run on slot1_1@My_Execute_machine <y.y.y.y:9618?addrs=y.y.y.y-9618&alias=My_Execute_machine&noUDP&sock=startd_15928_377e> was ACCEPTED

05/06/24 09:56:41 Using config source: /etc/condor/condor_config

05/06/24 09:56:41 Using local config sources:

05/06/24 09:56:41    /etc/condor/config.d/00-insecure.config

05/06/24 09:56:41    /etc/condor/config.d/00-minicondor

05/06/24 09:56:41    /etc/condor/config.d/10-stash-plugin.conf

05/06/24 09:56:41    /etc/condor/condor_config.local

05/06/24 09:56:41 config Macros = 90, Sorted = 90, StringBytes = 2824, TablesBytes = 1512

05/06/24 09:56:41 CLASSAD_CACHING is OFF

05/06/24 09:56:41 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

05/06/24 09:56:41 SharedPortEndpoint: waiting for connections to named socket shadow_95398_4560_250

05/06/24 09:56:41 DaemonCore: command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_250>

05/06/24 09:56:41 DaemonCore: private command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_250>

05/06/24 09:56:41 Initializing a JAVA shadow for job 67.2

05/06/24 09:56:41 (67.2) (265640): LIMIT_DIRECTORY_ACCESS = <unset>

05/06/24 09:56:41 (67.1) (265639): Request to run on slot1_2@My_Execute_machine <y.y.y.y:9618?addrs=y.y.y.y-9618&alias=My_Execute_machine&noUDP&sock=startd_15928_377e> was ACCEPTED

05/06/24 09:56:41 (67.2) (265640): Request to run on slot1_3@My_Execute_machine <y.y.y.y:9618?addrs=y.y.y.y-9618&alias=My_Execute_machine&noUDP&sock=startd_15928_377e> was ACCEPTED

05/06/24 09:56:41 (67.2) (265640): File transfer completed successfully.

05/06/24 09:56:41 (67.1) (265639): File transfer completed successfully.

05/06/24 09:56:41 (67.0) (265638): File transfer completed successfully.

05/06/24 09:56:41 (67.0) (265638): File transfer completed successfully.

05/06/24 09:56:41 (67.0) (265638): Job 67.0 going into Hold state (code 3,62): Job exit with error and will run again after a delay

05/06/24 09:56:41 (67.0) (265638): **** condor_shadow (condor_SHADOW) pid 265638 EXITING WITH STATUS 112

05/06/24 09:56:41 (67.1) (265639): File transfer completed successfully.

05/06/24 09:56:41 (67.2) (265640): File transfer completed successfully.

05/06/24 09:56:41 (67.1) (265639): Job 67.1 going into Hold state (code 3,62): Job exit with error and will run again after a delay

05/06/24 09:56:41 (67.2) (265640): Job 67.2 going into Hold state (code 3,62): Job exit with error and will run again after a delay

05/06/24 09:56:41 (67.1) (265639): **** condor_shadow (condor_SHADOW) pid 265639 EXITING WITH STATUS 112

05/06/24 09:56:41 (67.2) (265640): **** condor_shadow (condor_SHADOW) pid 265640 EXITING WITH STATUS 112


Windows 10 (Execute machine)

MasterLog

05/06/24 09:48:52 ******************************************************

05/06/24 09:48:52 ** condor (CONDOR_MASTER) STARTING UP

05/06/24 09:48:52 ** C:\condor\condor-8.8.10\bin\condor_master.exe

05/06/24 09:48:52 ** SubsystemInfo: name=MASTER type=MASTER(1) class=DAEMON(1)

05/06/24 09:48:52 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON

05/06/24 09:48:52 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 $

05/06/24 09:48:52 ** $CondorPlatform: x86_64_Windows10 $

05/06/24 09:48:52 ** PID = 15928

05/06/24 09:48:52 ** Log last touched time unavailable (No such file or directory)

05/06/24 09:48:52 ******************************************************

05/06/24 09:48:52 Using config source: C:\condor\condor-8.8.10\condor_config

05/06/24 09:48:52 Using local config sources:

05/06/24 09:48:52    condor_urlfetch -MASTER http://MyDomain:8080/data/htcondor/config/condor_config.My_executeMachine C:\condor\condor-8.8.10\condor_config.url_cache |

05/06/24 09:48:52 config Macros = 62, Sorted = 62, StringBytes = 2101, TablesBytes = 2280

05/06/24 09:48:52 CLASSAD_CACHING is OFF

05/06/24 09:48:52 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

05/06/24 09:48:52 SharedPortEndpoint: failed to open C:\condor\condor-8.8.10\log/shared_port_ad: No such file or directory

05/06/24 09:48:52 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

05/06/24 09:48:52 DaemonCore: private command socket at <y.y.y.y:0?alias=My_executeMachine&sock=master_15928_377e>

05/06/24 09:48:52 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)

05/06/24 09:48:52 Master restart (GRACEFUL) is watching C:\condor\condor-8.8.10\bin\condor_master.exe (mtime:1712849588)

05/06/24 09:48:52 Adding/Checking Windows firewall exceptions for all daemons

05/06/24 09:48:52 Starting shared port with port: 9618

05/06/24 09:48:52 Started DaemonCore process "C:\condor\condor-8.8.10\bin\condor_shared_port.exe", pid and pgroup = 16112

05/06/24 09:48:52 Waiting for C:\condor\condor-8.8.10\log/shared_port_ad to appear.

05/06/24 09:48:52 Found C:\condor\condor-8.8.10\log/shared_port_ad.

05/06/24 09:48:52 Started DaemonCore process "C:\condor\condor-8.8.10\bin\condor_startd.exe", pid and pgroup = 16252

05/06/24 09:48:52 Started DaemonCore process "C:\condor\condor-8.8.10\bin\condor_kbdd.exe", pid and pgroup = 16268

05/06/24 09:48:52 Daemons::StartAllDaemons all daemons were started

05/06/24 09:48:53 Setting ready state 'Ready' for STARTD

05/06/24 09:53:52 SharedPortEndpoint: failed to open C:\condor\condor-8.8.10\log/shared_port_ad: No such file or directory

05/06/24 09:53:52 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

StarterLog.slot1_1

05/06/24 09:56:41 (pid:3328) ******************************************************

05/06/24 09:56:41 (pid:3328) ** condor_starter (CONDOR_STARTER) STARTING UP

05/06/24 09:56:41 (pid:3328) ** C:\condor\condor-8.8.10\bin\condor_starter.exe

05/06/24 09:56:41 (pid:3328) ** SubsystemInfo: name=STARTER type=STARTER(7) class=DAEMON(1)

05/06/24 09:56:41 (pid:3328) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON

05/06/24 09:56:41 (pid:3328) ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 $

05/06/24 09:56:41 (pid:3328) ** $CondorPlatform: x86_64_Windows10 $

05/06/24 09:56:41 (pid:3328) ** PID = 3328

05/06/24 09:56:41 (pid:3328) ** Log last touched time unavailable (No such file or directory)

05/06/24 09:56:41 (pid:3328) ******************************************************

05/06/24 09:56:41 (pid:3328) Using config source: C:\condor\condor-8.8.10\condor_config

05/06/24 09:56:41 (pid:3328) Using local config sources:

05/06/24 09:56:41 (pid:3328)    condor_urlfetch -STARTER http://myDomain:8080/data/htcondor/config/condor_config.my_executeMachine C:\condor\condor-8.8.10\condor_config.url_cache |

05/06/24 09:56:41 (pid:3328) config Macros = 66, Sorted = 65, StringBytes = 2237, TablesBytes = 2424

05/06/24 09:56:41 (pid:3328) CLASSAD_CACHING is OFF

05/06/24 09:56:41 (pid:3328) Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

05/06/24 09:56:41 (pid:3328) SharedPortEndpoint: listener already created.

05/06/24 09:56:41 (pid:3328) SharedPortEndpoint: failed to open C:\condor\condor-8.8.10\log/shared_port_ad: No such file or directory

05/06/24 09:56:41 (pid:3328) SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

05/06/24 09:56:41 (pid:3328) DaemonCore: private command socket at <y.y.y.y:0?alias=My_executeMachine&sock=slot1_1_16252_9eec_5>

05/06/24 09:56:41 (pid:3328) Communicating with shadow <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_centralManager_submitMachine&noUDP&sock=shadow_95398_4560_248>

05/06/24 09:56:41 (pid:3328) Submitting machine is "My_centralManager_submitMachine"

05/06/24 09:56:41 (pid:3328) setting the orig job name in starter

05/06/24 09:56:41 (pid:3328) setting the orig job iwd in starter

05/06/24 09:56:41 (pid:3328) Chirp config summary: IO false, Updates false, Delayed updates true.

05/06/24 09:56:41 (pid:3328) Initialized IO Proxy.

05/06/24 09:56:41 (pid:3328) Setting resource limits not implemented!

05/06/24 09:56:41 (pid:3328) Set filetransfer runtime ads to C:\condor\condor-8.8.10\execute\dir_3328\.job.ad and C:\condor\condor-8.8.10\execute\dir_3328\.machine.ad.

05/06/24 09:56:41 (pid:3328) File transfer completed successfully.

05/06/24 09:56:41 (pid:3328) Job 67.0 set to execute immediately

05/06/24 09:56:41 (pid:3328) Starting a JAVA universe job with ID: 67.0

05/06/24 09:56:41 (pid:3328) JavaProc::StartJob could not stat jar file C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar: errno 2

05/06/24 09:56:41 (pid:3328) JavaProc::StartJob could not stat jar file C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar: errno 2

05/06/24 09:56:41 (pid:3328) JavaProc: Cmd=C:\PROGRA~1\Java\jre1.8.0_101\bin\java.exe

05/06/24 09:56:41 (pid:3328) JavaProc: Args=-classpath C:\condor\condor-8.8.10\bin;.;C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar;C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar -Dchirp.config=C:\condor\condor-8.8.10\execute\dir_3328\chirp.config CondorJavaWrapper C:\condor\condor-8.8.10\execute\dir_3328\jvm.start C:\condor\condor-8.8.10\execute\dir_3328\jvm.end ca.qc.banq.dnum.JVMLocator transform-core.jar ca.qc.banq.dnum.Launcher ca.qc.banq.dnum.transform.cmd.ImagickConversion -pathInput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\input -pathOutput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\output -cmd ${INPUT}[0]' '-rotate' '180' '${OUTPUT} -fileInput 82238_01a_0010.tif

05/06/24 09:56:41 (pid:3328) Tracking process family by login "condor-slot1_1"

05/06/24 09:56:41 (pid:3328) IWD: C:\condor\condor-8.8.10\execute\dir_3328

05/06/24 09:56:41 (pid:3328) Output file: C:\condor\condor-8.8.10\execute\dir_3328\_condor_stdout

05/06/24 09:56:41 (pid:3328) Error file: C:\condor\condor-8.8.10\execute\dir_3328\_condor_stderr

05/06/24 09:56:41 (pid:3328) Renice expr "10" evaluated to 10

05/06/24 09:56:41 (pid:3328) Running job as user condor-slot1_1

05/06/24 09:56:41 (pid:3328) About to exec C:\PROGRA~1\Java\jre1.8.0_101\bin\java.exe -classpath C:\condor\condor-8.8.10\bin;.;C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar;C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar -Dchirp.config=C:\condor\condor-8.8.10\execute\dir_3328\chirp.config CondorJavaWrapper C:\condor\condor-8.8.10\execute\dir_3328\jvm.start C:\condor\condor-8.8.10\execute\dir_3328\jvm.end ca.qc.banq.dnum.JVMLocator transform-core.jar ca.qc.banq.dnum.Launcher ca.qc.banq.dnum.transform.cmd.ImagickConversion -pathInput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\input -pathOutput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\output -cmd ${INPUT}[0]' '-rotate' '180' '${OUTPUT} -fileInput 82238_01a_0010.tif

05/06/24 09:56:41 (pid:3328) Create_Process succeeded, pid=15584

05/06/24 09:56:41 (pid:3328) Process exited, pid=15584, status=0

05/06/24 09:56:41 (pid:3328) JavaProc: JVM pid 15584 has finished

05/06/24 09:56:41 (pid:3328) JavaProc: JVM exited normally with code 0

05/06/24 09:56:41 (pid:3328) JavaProc: Wrapper left start record C:\condor\condor-8.8.10\execute\dir_3328\jvm.start

05/06/24 09:56:41 (pid:3328) JavaProc: Wrapper left end record C:\condor\condor-8.8.10\execute\dir_3328\jvm.end

05/06/24 09:56:41 (pid:3328) JavaProc: Job could not be executed

05/06/24 09:56:41 (pid:3328) JavaProc: unlinking C:\condor\condor-8.8.10\execute\dir_3328\jvm.start and C:\condor\condor-8.8.10\execute\dir_3328\jvm.end

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionHierarchy "java.lang.Object java.lang.Throwable java.lang.Exception java.lang.ReflectiveOperationException java.lang.ClassNotFoundException "

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionName "java.lang.ClassNotFoundException"

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionType "java.lang.Exception"

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionHierarchy "java.lang.Object java.lang.Throwable java.lang.Exception java.lang.ReflectiveOperationException java.lang.ClassNotFoundException "

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionName "java.lang.ClassNotFoundException"

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionType "java.lang.Exception"

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionHierarchy "java.lang.Object java.lang.Throwable java.lang.Exception java.lang.ReflectiveOperationException java.lang.ClassNotFoundException "

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionName "java.lang.ClassNotFoundException"

05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionType "java.lang.Exception"

05/06/24 09:56:41 (pid:3328) Failed to open '.update.ad' to read update ad: No such file or directory (2).

05/06/24 09:56:41 (pid:3328) Failed to open '.update.ad' to read update ad: No such file or directory (2).

05/06/24 09:56:41 (pid:3328) All jobs have exited... starter exiting

05/06/24 09:56:41 (pid:3328) **** condor_starter (condor_STARTER) pid 3328 EXITING WITH STATUS 0
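
One thing that stands out in the StarterLog above: the two "could not stat jar file" lines name paths inside the job's scratch directory (dir_3328), not C:\condor\apps, and the same scratch-directory paths appear on the -classpath. Below is a small Python sketch for pulling those details out of a StarterLog. The helper names are hypothetical, and the regular expressions assume the message formats shown in this log and paths that contain no spaces.

```python
import re

# Patterns assume the StarterLog message formats shown above and
# paths without spaces; other HTCondor versions may word these differently.
MISSING_JAR = re.compile(r"could not stat jar file (\S+): errno (\d+)")
CLASSPATH = re.compile(r"JavaProc: Args=-classpath (\S+)")

# Two lines taken from the StarterLog, with the Args line shortened.
SAMPLE_LINES = [
    r"05/06/24 09:56:41 (pid:3328) JavaProc::StartJob could not stat jar file "
    r"C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar: errno 2",
    r"05/06/24 09:56:41 (pid:3328) JavaProc: Args=-classpath "
    r"C:\condor\condor-8.8.10\bin;.;"
    r"C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar;"
    r"C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar "
    r"-Dchirp.config=C:\condor\condor-8.8.10\execute\dir_3328\chirp.config",
]


def missing_jars(lines):
    """Return (path, errno) pairs for jars the starter could not stat."""
    return [(m.group(1), int(m.group(2)))
            for line in lines if (m := MISSING_JAR.search(line))]


def classpath_entries(lines):
    """Return the ';'-separated classpath entries from the first Args line."""
    for line in lines:
        if (m := CLASSPATH.search(line)):
            return m.group(1).split(";")
    return []


if __name__ == "__main__":
    for path, errno in missing_jars(SAMPLE_LINES):
        print(f"missing (errno {errno}): {path}")
    for entry in classpath_entries(SAMPLE_LINES):
        print(f"classpath entry: {entry}")
```

To run it against a full log, read the file into a list of lines and pass that list to both helpers in place of SAMPLE_LINES.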


Legend:

- x.x.x.x is the IP address of the Red Hat server, which is both the central manager and the submit machine.

- y.y.y.y is the IP address of my Windows 10 machine, which is the execute machine.


Any help will be greatly appreciated, and I thank you in advance.

Sincerely,
Sady

Attachment: HTcondor doc.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document