Hello to the entire HTCondor community.
I am new to HTCondor and have run into a problem that I can't solve. I have HTCondor version 23.0.8 installed on a Red Hat Enterprise Linux 9.3 server, which acts as central manager and submit machine, and HTCondor version 23.0.8 installed on a Windows 10 machine, which acts as the execute node.
My problem is that when I submit a job from the Red Hat machine, all resources match and the job starts on the Windows execute machine, but it always fails with a java.lang.ClassNotFoundException error. My executables are never found, even though I specified transfer_executable = true in the submit file. Initially I thought that HTCondor was not transferring the executables, so I changed the submit file to transfer_executable = false, placed the executables in a directory, and gave their absolute paths in the submit file. I still get the same error, even though the executables are indeed present in that directory.
Below are the config files of my HTCondor installation, some relevant logs, and my submit file.
HTCondor config files:
condor_config:
######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file. KEEP THAT IN MIND! To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value. Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################
##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /usr
##  Where is the local condor directory for each host? This is where the local config file(s), logs and
##  spool/execute directories are located. this is the default for Linux and Unix systems.
LOCAL_DIR = /var
##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = true
##  The normal way to do configuration with RPM and Debian packaging is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
##  Multiple directories may be specified, separated by commas; directories
##  are read in left-to-right order.
LOCAL_CONFIG_DIR = /usr/share/condor/config.d,/etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$
##
## Do NOT use host-based security by default.
##
## This was the default for the 8.8 series (and earlier), but it is
## intrinsically insecure. To make the 9.0 series secure by default, we
## commented it out.
##
## You should seriously consider improving your security configuration.
##
## To continue to use your old security configuration, knowing that it is
## insecure, add the line 'use SECURITY:HOST_BASED' to your local
## configuration directory. Don't just uncomment the final line in this
## comment block; changes in this file may be lost during your next upgrade.
## The following shell command will make the change on most Linux systems.
##
## echo 'use SECURITY:HOST_BASED' >> $(condor_config_val LOCAL_CONFIG_DIR)/00-insecure.config
##
##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu
##--------------------------------------------------------------------
## Values set by the rpm patch script:
##--------------------------------------------------------------------
## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password
##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor
JAVA_CLASSPATH_DEFAULT = $(SHARE) .
##  Install the minicondor package to run HTCondor on a single node
condor_config.local:
CONDOR_HOST=my_host
DEFAULT_DOMAIN_NAME = my_domain
CONDOR_ADMIN=me@xxxxxxxxx
COLLECTOR_NAME=DCN
DAEMON_LIST = $(DAEMON_LIST) CREDD
MAX_JOBS_PER_OWNER      = 10000000
MAX_JOBS_RUNNING        = 500
MAX_JOBS_SUBMITTED      = 10000000
MAX_JOBS_PER_SUBMISSION = 10000000
PREEN_ARGS=-r
EVENT_LOG_FORMAT_OPTIONS=ISO_DATE
DEFAULT_USERLOG_FORMAT_OPTIONS=ISO_DATE
WARN_ON_UNUSED_SUBMIT_FILE_MACROS=FALSE
PRIORITY_HALFLIFE = 1.0e100
IsWeekday = (ClockDay > 0 && ClockDay < 6)
IsWeekend = (Clockday == 0 || ClockDay == 6)
IsBusinessHours = (ClockMin >= 420 && ClockMin < 1140)
isNightTime = (ClockMin < 420 || ClockMin >= 1140)
00-minicondor:
# HTCONDOR CONFIGURATION TO CREATE A POOL WITH ONE MACHINE
#
# This file was created upon initial installation of HTCondor.
# It contains configuration settings to set up a secure HTCondor
# installation consisting of **just one single machine**.
# YOU WILL WANT TO REMOVE THIS FILE IF/WHEN YOU DECIDE TO ADD ADDITIONAL
# MACHINES TO YOUR HTCONDOR INSTALLATION! Most of these settings do
# not make sense if you have a multi-server pool.
#
# See the Quick Start Installation guide at:
#     https://htcondor.org/manual/quickstart.html
#

# ---  NODE ROLES  ---

# Every pool needs one Central Manager, some number of Submit nodes and
# as many Execute nodes as you can find. Consult the manual to learn
# about additional roles.

use ROLE: CentralManager
use ROLE: Submit
#use ROLE: Execute

# --- NETWORK SETTINGS ---

# Configure HTCondor services to listen to port 9618 on the IPv4
# loopback interface.
#NETWORK_INTERFACE = 127.0.0.1
#BIND_ALL_INTERFACES = False
CONDOR_HOST=My_CentralManager_SubmitMachine


EXECUTE_MACHINES @=BNQ
        ## Group: DCN    Notes: Germans' workstation
        BNQ135020.bnquebec.ca,
        ## Group: DCN    Notes: Yves-Thériault room (new workstation 2023)
        My_Execute_machine
        #BNQ134179
@BNQ


# --- SECURITY SETTINGS ---

# Verify authenticity of HTCondor services by checking if they are
# running with an effective user id of user "condor".
#SEC_DEFAULT_AUTHENTICATION = REQUIRED
#SEC_DEFAULT_INTEGRITY = REQUIRED
ALLOW_DAEMON = condor@$(UID_DOMAIN) $(CONDOR_HOST) $(EXECUTE_MACHINES)
ALLOW_NEGOTIATOR = condor@$(UID_DOMAIN) $(EXECUTE_MACHINES)

# Configure so only user root or user condor can run condor_on,
# condor_off, condor_restart, and condor_userprio commands to manage
# HTCondor on this machine.
# If you wish any user to do so, comment out the line below.
ALLOW_ADMINISTRATOR = root@$(UID_DOMAIN) condor@$(UID_DOMAIN) $(CONDOR_HOST)

# Allow anyone (on the loopback interface) to submit jobs.
ALLOW_WRITE = *
# Allow anyone (on the loopback interface) to run condor_q or condor_status.
ALLOW_READ = *

# --- PERFORMANCE TUNING SETTINGS ---

# Since there is just one server in this pool, we can tune various
# polling intervals to be much more responsive than the system defaults
# (which are tuned for pools with thousands of servers). This will
# enable jobs to be scheduled faster, and job monitoring to happen more
# frequently.
SCHEDD_INTERVAL = 5
NEGOTIATOR_INTERVAL = 2
NEGOTIATOR_CYCLE_DELAY = 5
STARTER_UPDATE_INTERVAL = 5
SHADOW_QUEUE_UPDATE_INTERVAL = 10
UPDATE_INTERVAL = 5
RUNBENCHMARKS = 0

# --- COMMON CHANGES ---

# Uncomment the lines below and do 'sudo condor_reconfig' if you wish
# condor_q to show jobs from all users with one line per job by default.
#CONDOR_Q_DASH_BATCH_IS_DEFAULT = False
#CONDOR_Q_ONLY_MY_JOBS = False
00-insecure.config:
use SECURITY:HOST_BASED
My submit file:
cmd_parameters          = ${INPUT}[0] -rotate 180 ${OUTPUT}
extensions              = *
json_parameters         =
path_input              = N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\input
path_output             = N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\output
timestamp               = 1714742017803
label                   = Test-image
type                    = CONV.IMAGE
jvm_locator_class       = ca.qc.banq.dnum.JVMLocator
java_execute_jar        = transform-core.jar
java_launcher           = ca.qc.banq.dnum.Launcher
java_main_class         = ca.qc.banq.dnum.transform.cmd.ImagickConversion
input_line_count        = 3

JobBatchName            = $(label) - $(type) - $(timestamp)
accounting_group_user   = sady.doucoure
executable              = C:\condor\apps\jvm-locator.jar
universe                = java
initialdir              = /data01/condor/jobs/1714742017803-CONV.IMAGE-Test-image

jar_files               = C:\condor\apps\transform-core.jar, C:\condor\apps\jvm-locator.jar
arguments               = "$(jvm_locator_class) $(java_execute_jar) $(java_launcher) $(java_main_class) -pathInput '$(path_input)' -pathOutput '$(path_output)' -cmd '$(cmd_parameters)' -fileInput '$(line)'"

range_start             = $INT(ItemIndex)/1000*1000
range_end               = ($INT(ItemIndex)/1000+1)*1000-1

log                     = $(ClusterId).log
output                  = $INT(range_start)-$INT(range_end)/$(ClusterId).$INT(ItemIndex,%09d).stdout
error                   = $INT(range_start)-$INT(range_end)/$(ClusterId).$INT(ItemIndex,%09d).stderr
list                    = input.txt

transfer_executable     = false
should_transfer_files   = true
transfer_input_files    =
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files   = ""

request_memory          = 500MB
request_disk            = 250MB
requirements            = (Arch == "INTEL" || Arch == "X86_64")

materialize_max_idle    = 20
max_retries             = 5

on_exit_remove          = (ExitCode == 0 || ExitCode == 0 || ExitCode == 1500 || ExitCode == 1501 || ExitCode == 1502 || ExitCode == 1503 || ExitCode == 1504 || ExitCode == 1505)

on_exit_hold            = false
on_exit_hold_reason     = "Job exit with error and will run again after a delay"
on_exit_hold_subcode    = 62

periodic_hold           = (JobStatus == 2) && (time() - EnteredCurrentStatus) > (3 * 24 * 60 * 60)
periodic_hold_reason    = "Job ran for more than 3 days"
periodic_hold_subcode   = 42
periodic_release        = NumJobCompletions <= JobMaxRetries && (time() - EnteredCurrentStatus) > 20 * 60 && (HoldReasonCode != 1 && HoldReasonSubCode != 99)

queue line from $(initialdir)/$(list)

LOGS
Red Hat (Central manager, submit machine)
MasterLog
05/04/24 16:01:47 ******************************************************
05/04/24 16:01:47 Using config source: /etc/condor/condor_config
05/04/24 16:01:47 Using local config sources:
05/04/24 16:01:47    /etc/condor/config.d/00-insecure.config
05/04/24 16:01:47    /etc/condor/config.d/00-minicondor
05/04/24 16:01:47    /etc/condor/config.d/10-stash-plugin.conf
05/04/24 16:01:47    /etc/condor/condor_config.local
05/04/24 16:01:47 config Macros = 87, Sorted = 87, StringBytes = 2679, TablesBytes = 3204
05/04/24 16:01:47 CLASSAD_CACHING is OFF
05/04/24 16:01:47 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
05/04/24 16:01:48 SharedPortEndpoint: waiting for connections to named socket master_1104_d4a5
05/04/24 16:01:48 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
05/04/24 16:01:48 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
05/04/24 16:01:48 DaemonCore: private command socket at <x.x.x.x:0?alias=my_CentralManager_SubmitMachine&sock=master_1104_d4a5>
05/04/24 16:01:48 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
05/04/24 16:01:48 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port
05/04/24 16:01:48 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1712851859)
05/04/24 16:01:48 Collector port not defined, will use default: 9618
05/04/24 16:01:48 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 95395
05/04/24 16:01:48 Waiting for /var/lock/condor/shared_port_ad to appear.
05/04/24 16:01:49 Found /var/lock/condor/shared_port_ad.
05/04/24 16:01:49 Collector port not defined, will use default: 9618
05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 95396
05/04/24 16:01:49 Waiting for /var/log/condor/.collector_address to appear.
05/04/24 16:01:49 Found /var/log/condor/.collector_address.
05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 95397
05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 95398
05/04/24 16:01:49 Started DaemonCore process "/usr/sbin/condor_credd", pid and pgroup = 95399
05/04/24 16:01:49 Daemons::StartAllDaemons all daemons were started
05/04/24 16:02:35 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
05/04/24 16:02:35 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port
05/04/24 16:02:35 Reconfiguring all managed daemons.
05/04/24 16:02:35 Sent SIGHUP to COLLECTOR (pid 95396)
05/04/24 16:02:35 Sent SIGHUP to CREDD (pid 95399)
05/04/24 16:02:35 Sent SIGHUP to NEGOTIATOR (pid 95397)
05/04/24 16:02:35 Sent SIGHUP to SCHEDD (pid 95398)
05/04/24 16:02:35 Sent SIGHUP to SHARED_PORT (pid 95395)
05/04/24 16:16:04 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
05/04/24 16:16:04 SHARED_PORT is in front of a COLLECTOR, so it will use the configured collector port
05/04/24 16:16:04 Reconfiguring all managed daemons.
05/04/24 16:16:04 Sent SIGHUP to COLLECTOR (pid 95396)
05/04/24 16:16:04 Sent SIGHUP to CREDD (pid 95399)
05/04/24 16:16:04 Sent SIGHUP to NEGOTIATOR (pid 95397)
05/04/24 16:16:04 Sent SIGHUP to SCHEDD (pid 95398)
05/04/24 16:16:04 Sent SIGHUP to SHARED_PORT (pid 95395)
05/04/24 17:01:49 Preen pid is 99905
05/04/24 17:01:49 Preen (pid 99905) exited with status 0
05/05/24 17:01:49 Preen pid is 196541
05/05/24 17:01:49 Preen (pid 196541) exited with status 0

ShadowLog

05/06/24 09:56:41 ******************************************************
05/06/24 09:56:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP
05/06/24 09:56:41 ** /usr/sbin/condor_shadow
05/06/24 09:56:41 ** SubsystemInfo: name=SHADOW type=SHADOW(5) class=DAEMON(1)
05/06/24 09:56:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
05/06/24 09:56:41 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 PackageID: 23.0.8-1 $
05/06/24 09:56:41 ** $CondorPlatform: x86_64_AlmaLinux9 $
05/06/24 09:56:41 ** PID = 265638
05/06/24 09:56:41 ** Log last touched time unavailable (No such file or directory)
05/06/24 09:56:41 ******************************************************
05/06/24 09:56:41 Using config source: /etc/condor/condor_config
05/06/24 09:56:41 Using local config sources:
05/06/24 09:56:41    /etc/condor/config.d/00-insecure.config
05/06/24 09:56:41    /etc/condor/config.d/00-minicondor
05/06/24 09:56:41    /etc/condor/config.d/10-stash-plugin.conf
05/06/24 09:56:41    /etc/condor/condor_config.local
05/06/24 09:56:41 config Macros = 90, Sorted = 90, StringBytes = 2824, TablesBytes = 1512
05/06/24 09:56:41 CLASSAD_CACHING is OFF
05/06/24 09:56:41 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
05/06/24 09:56:41 SharedPortEndpoint: waiting for connections to named socket shadow_95398_4560_248
05/06/24 09:56:41 ******************************************************
05/06/24 09:56:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP
05/06/24 09:56:41 ** /usr/sbin/condor_shadow
05/06/24 09:56:41 ** SubsystemInfo: name=SHADOW type=SHADOW(5) class=DAEMON(1)
05/06/24 09:56:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
05/06/24 09:56:41 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 PackageID: 23.0.8-1 $
05/06/24 09:56:41 ** $CondorPlatform: x86_64_AlmaLinux9 $
05/06/24 09:56:41 ** PID = 265639
05/06/24 09:56:41 ** Log last touched 5/6 09:56:41
05/06/24 09:56:41 ******************************************************
05/06/24 09:56:41 Using config source: /etc/condor/condor_config
05/06/24 09:56:41 Using local config sources:
05/06/24 09:56:41    /etc/condor/config.d/00-insecure.config
05/06/24 09:56:41    /etc/condor/config.d/00-minicondor
05/06/24 09:56:41    /etc/condor/config.d/10-stash-plugin.conf
05/06/24 09:56:41    /etc/condor/condor_config.local
05/06/24 09:56:41 config Macros = 90, Sorted = 90, StringBytes = 2824, TablesBytes = 1512
05/06/24 09:56:41 CLASSAD_CACHING is OFF
05/06/24 09:56:41 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
05/06/24 09:56:41 SharedPortEndpoint: waiting for connections to named socket shadow_95398_4560_249
05/06/24 09:56:41 DaemonCore: command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_248>
05/06/24 09:56:41 DaemonCore: private command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_248>
05/06/24 09:56:41 DaemonCore: command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_249>
05/06/24 09:56:41 DaemonCore: private command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_249>
05/06/24 09:56:41 Initializing a JAVA shadow for job 67.0
05/06/24 09:56:41 (67.0) (265638): LIMIT_DIRECTORY_ACCESS = <unset>
05/06/24 09:56:41 Initializing a JAVA shadow for job 67.1
05/06/24 09:56:41 (67.1) (265639): LIMIT_DIRECTORY_ACCESS = <unset>
05/06/24 09:56:41 ******************************************************
05/06/24 09:56:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP
05/06/24 09:56:41 ** /usr/sbin/condor_shadow
05/06/24 09:56:41 ** SubsystemInfo: name=SHADOW type=SHADOW(5) class=DAEMON(1)
05/06/24 09:56:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
05/06/24 09:56:41 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 PackageID: 23.0.8-1 $
05/06/24 09:56:41 ** $CondorPlatform: x86_64_AlmaLinux9 $
05/06/24 09:56:41 ** PID = 265640
05/06/24 09:56:41 ** Log last touched 5/6 09:56:41
05/06/24 09:56:41 ******************************************************
05/06/24 09:56:41 (67.0) (265638): Request to run on slot1_1@My_Execute_machine <y.y.y.y:9618?addrs=y.y.y.y-9618&alias=My_Execute_machine&noUDP&sock=startd_15928_377e> was ACCEPTED
05/06/24 09:56:41 Using config source: /etc/condor/condor_config
05/06/24 09:56:41 Using local config sources:
05/06/24 09:56:41    /etc/condor/config.d/00-insecure.config
05/06/24 09:56:41    /etc/condor/config.d/00-minicondor
05/06/24 09:56:41    /etc/condor/config.d/10-stash-plugin.conf
05/06/24 09:56:41    /etc/condor/condor_config.local
05/06/24 09:56:41 config Macros = 90, Sorted = 90, StringBytes = 2824, TablesBytes = 1512
05/06/24 09:56:41 CLASSAD_CACHING is OFF
05/06/24 09:56:41 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
05/06/24 09:56:41 SharedPortEndpoint: waiting for connections to named socket shadow_95398_4560_250
05/06/24 09:56:41 DaemonCore: command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_250>
05/06/24 09:56:41 DaemonCore: private command socket at <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_CentralManager_SubmitMachine&noUDP&sock=shadow_95398_4560_250>
05/06/24 09:56:41 Initializing a JAVA shadow for job 67.2
05/06/24 09:56:41 (67.2) (265640): LIMIT_DIRECTORY_ACCESS = <unset>
05/06/24 09:56:41 (67.1) (265639): Request to run on slot1_2@My_Execute_machine <y.y.y.y:9618?addrs=y.y.y.y-9618&alias=My_Execute_machine&noUDP&sock=startd_15928_377e> was ACCEPTED
05/06/24 09:56:41 (67.2) (265640): Request to run on slot1_3@My_Execute_machine <y.y.y.y:9618?addrs=y.y.y.y-9618&alias=My_Execute_machine&noUDP&sock=startd_15928_377e> was ACCEPTED
05/06/24 09:56:41 (67.2) (265640): File transfer completed successfully.
05/06/24 09:56:41 (67.1) (265639): File transfer completed successfully.
05/06/24 09:56:41 (67.0) (265638): File transfer completed successfully.
05/06/24 09:56:41 (67.0) (265638): File transfer completed successfully.
05/06/24 09:56:41 (67.0) (265638): Job 67.0 going into Hold state (code 3,62): Job exit with error and will run again after a delay
05/06/24 09:56:41 (67.0) (265638): **** condor_shadow (condor_SHADOW) pid 265638 EXITING WITH STATUS 112
05/06/24 09:56:41 (67.1) (265639): File transfer completed successfully.
05/06/24 09:56:41 (67.2) (265640): File transfer completed successfully.
05/06/24 09:56:41 (67.1) (265639): Job 67.1 going into Hold state (code 3,62): Job exit with error and will run again after a delay
05/06/24 09:56:41 (67.2) (265640): Job 67.2 going into Hold state (code 3,62): Job exit with error and will run again after a delay
05/06/24 09:56:41 (67.1) (265639): **** condor_shadow (condor_SHADOW) pid 265639 EXITING WITH STATUS 112
05/06/24 09:56:41 (67.2) (265640): **** condor_shadow (condor_SHADOW) pid 265640 EXITING WITH STATUS 112

Windows 10 (Execute machine)
MasterLog
05/06/24 09:48:52 ******************************************************
05/06/24 09:48:52 ** condor (CONDOR_MASTER) STARTING UP
05/06/24 09:48:52 ** C:\condor\condor-8.8.10\bin\condor_master.exe
05/06/24 09:48:52 ** SubsystemInfo: name=MASTER type=MASTER(1) class=DAEMON(1)
05/06/24 09:48:52 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
05/06/24 09:48:52 ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 $
05/06/24 09:48:52 ** $CondorPlatform: x86_64_Windows10 $
05/06/24 09:48:52 ** PID = 15928
05/06/24 09:48:52 ** Log last touched time unavailable (No such file or directory)
05/06/24 09:48:52 ******************************************************
05/06/24 09:48:52 Using config source: C:\condor\condor-8.8.10\condor_config
05/06/24 09:48:52 Using local config sources:
05/06/24 09:48:52    condor_urlfetch -MASTER http://MyDomain:8080/data/htcondor/config/condor_config.My_executeMachine C:\condor\condor-8.8.10\condor_config.url_cache |
05/06/24 09:48:52 config Macros = 62, Sorted = 62, StringBytes = 2101, TablesBytes = 2280
05/06/24 09:48:52 CLASSAD_CACHING is OFF
05/06/24 09:48:52 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
05/06/24 09:48:52 SharedPortEndpoint: failed to open C:\condor\condor-8.8.10\log/shared_port_ad: No such file or directory
05/06/24 09:48:52 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
05/06/24 09:48:52 DaemonCore: private command socket at <y.y.y.y:0?alias=My_executeMachine&sock=master_15928_377e>
05/06/24 09:48:52 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
05/06/24 09:48:52 Master restart (GRACEFUL) is watching C:\condor\condor-8.8.10\bin\condor_master.exe (mtime:1712849588)
05/06/24 09:48:52 Adding/Checking Windows firewall exceptions for all daemons
05/06/24 09:48:52 Starting shared port with port: 9618
05/06/24 09:48:52 Started DaemonCore process "C:\condor\condor-8.8.10\bin\condor_shared_port.exe", pid and pgroup = 16112
05/06/24 09:48:52 Waiting for C:\condor\condor-8.8.10\log/shared_port_ad to appear.
05/06/24 09:48:52 Found C:\condor\condor-8.8.10\log/shared_port_ad.
05/06/24 09:48:52 Started DaemonCore process "C:\condor\condor-8.8.10\bin\condor_startd.exe", pid and pgroup = 16252
05/06/24 09:48:52 Started DaemonCore process "C:\condor\condor-8.8.10\bin\condor_kbdd.exe", pid and pgroup = 16268
05/06/24 09:48:52 Daemons::StartAllDaemons all daemons were started
05/06/24 09:48:53 Setting ready state 'Ready' for STARTD
05/06/24 09:53:52 SharedPortEndpoint: failed to open C:\condor\condor-8.8.10\log/shared_port_ad: No such file or directory
05/06/24 09:53:52 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
StarterLog.slot1_1
05/06/24 09:56:41 (pid:3328) ******************************************************
05/06/24 09:56:41 (pid:3328) ** condor_starter (CONDOR_STARTER) STARTING UP
05/06/24 09:56:41 (pid:3328) ** C:\condor\condor-8.8.10\bin\condor_starter.exe
05/06/24 09:56:41 (pid:3328) ** SubsystemInfo: name=STARTER type=STARTER(7) class=DAEMON(1)
05/06/24 09:56:41 (pid:3328) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
05/06/24 09:56:41 (pid:3328) ** $CondorVersion: 23.0.8 2024-04-11 BuildID: 726317 $
05/06/24 09:56:41 (pid:3328) ** $CondorPlatform: x86_64_Windows10 $
05/06/24 09:56:41 (pid:3328) ** PID = 3328
05/06/24 09:56:41 (pid:3328) ** Log last touched time unavailable (No such file or directory)
05/06/24 09:56:41 (pid:3328) ******************************************************
05/06/24 09:56:41 (pid:3328) Using config source: C:\condor\condor-8.8.10\condor_config
05/06/24 09:56:41 (pid:3328) Using local config sources:
05/06/24 09:56:41 (pid:3328)    condor_urlfetch -STARTER http://myDomain:8080/data/htcondor/config/condor_config.my_executeMachine C:\condor\condor-8.8.10\condor_config.url_cache |
05/06/24 09:56:41 (pid:3328) config Macros = 66, Sorted = 65, StringBytes = 2237, TablesBytes = 2424
05/06/24 09:56:41 (pid:3328) CLASSAD_CACHING is OFF
05/06/24 09:56:41 (pid:3328) Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
05/06/24 09:56:41 (pid:3328) SharedPortEndpoint: listener already created.
05/06/24 09:56:41 (pid:3328) SharedPortEndpoint: failed to open C:\condor\condor-8.8.10\log/shared_port_ad: No such file or directory
05/06/24 09:56:41 (pid:3328) SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
05/06/24 09:56:41 (pid:3328) DaemonCore: private command socket at <y.y.y.y:0?alias=My_executeMachine&sock=slot1_1_16252_9eec_5>
05/06/24 09:56:41 (pid:3328) Communicating with shadow <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=my_centralManager_submitMachine&noUDP&sock=shadow_95398_4560_248>
05/06/24 09:56:41 (pid:3328) Submitting machine is "My_centralManager_submitMachine"
05/06/24 09:56:41 (pid:3328) setting the orig job name in starter
05/06/24 09:56:41 (pid:3328) setting the orig job iwd in starter
05/06/24 09:56:41 (pid:3328) Chirp config summary: IO false, Updates false, Delayed updates true.
05/06/24 09:56:41 (pid:3328) Initialized IO Proxy.
05/06/24 09:56:41 (pid:3328) Setting resource limits not implemented!
05/06/24 09:56:41 (pid:3328) Set filetransfer runtime ads to C:\condor\condor-8.8.10\execute\dir_3328\.job.ad and C:\condor\condor-8.8.10\execute\dir_3328\.machine.ad.
05/06/24 09:56:41 (pid:3328) File transfer completed successfully.
05/06/24 09:56:41 (pid:3328) Job 67.0 set to execute immediately
05/06/24 09:56:41 (pid:3328) Starting a JAVA universe job with ID: 67.0
05/06/24 09:56:41 (pid:3328) JavaProc::StartJob could not stat jar file C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar: errno 2
05/06/24 09:56:41 (pid:3328) JavaProc::StartJob could not stat jar file C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar: errno 2
05/06/24 09:56:41 (pid:3328) JavaProc: Cmd=C:\PROGRA~1\Java\jre1.8.0_101\bin\java.exe
05/06/24 09:56:41 (pid:3328) JavaProc: Args=-classpath C:\condor\condor-8.8.10\bin;.;C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar;C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar -Dchirp.config=C:\condor\condor-8.8.10\execute\dir_3328\chirp.config CondorJavaWrapper C:\condor\condor-8.8.10\execute\dir_3328\jvm.start C:\condor\condor-8.8.10\execute\dir_3328\jvm.end ca.qc.banq.dnum.JVMLocator transform-core.jar ca.qc.banq.dnum.Launcher ca.qc.banq.dnum.transform.cmd.ImagickConversion -pathInput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\input -pathOutput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\output -cmd ${INPUT}[0]' '-rotate' '180' '${OUTPUT} -fileInput 82238_01a_0010.tif
05/06/24 09:56:41 (pid:3328) Tracking process family by login "condor-slot1_1"
05/06/24 09:56:41 (pid:3328) IWD: C:\condor\condor-8.8.10\execute\dir_3328
05/06/24 09:56:41 (pid:3328) Output file: C:\condor\condor-8.8.10\execute\dir_3328\_condor_stdout
05/06/24 09:56:41 (pid:3328) Error file: C:\condor\condor-8.8.10\execute\dir_3328\_condor_stderr
05/06/24 09:56:41 (pid:3328) Renice expr "10" evaluated to 10
05/06/24 09:56:41 (pid:3328) Running job as user condor-slot1_1
05/06/24 09:56:41 (pid:3328) About to exec C:\PROGRA~1\Java\jre1.8.0_101\bin\java.exe -classpath C:\condor\condor-8.8.10\bin;.;C:\condor\condor-8.8.10\execute\dir_3328\transform-core.jar;C:\condor\condor-8.8.10\execute\dir_3328\jvm-locator.jar -Dchirp.config=C:\condor\condor-8.8.10\execute\dir_3328\chirp.config CondorJavaWrapper C:\condor\condor-8.8.10\execute\dir_3328\jvm.start C:\condor\condor-8.8.10\execute\dir_3328\jvm.end ca.qc.banq.dnum.JVMLocator transform-core.jar ca.qc.banq.dnum.Launcher ca.qc.banq.dnum.transform.cmd.ImagickConversion -pathInput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\input -pathOutput N:\TraitementNumerique\ZONE51\Sady_Doucoure\condor_test\output -cmd ${INPUT}[0]' '-rotate' '180' '${OUTPUT} -fileInput 82238_01a_0010.tif
05/06/24 09:56:41 (pid:3328) Create_Process succeeded, pid=15584
05/06/24 09:56:41 (pid:3328) Process exited, pid=15584, status=0
05/06/24 09:56:41 (pid:3328) JavaProc: JVM pid 15584 has finished
05/06/24 09:56:41 (pid:3328) JavaProc: JVM exited normally with code 0
05/06/24 09:56:41 (pid:3328) JavaProc: Wrapper left start record C:\condor\condor-8.8.10\execute\dir_3328\jvm.start
05/06/24 09:56:41 (pid:3328) JavaProc: Wrapper left end record C:\condor\condor-8.8.10\execute\dir_3328\jvm.end
05/06/24 09:56:41 (pid:3328) JavaProc: Job could not be executed
05/06/24 09:56:41 (pid:3328) JavaProc: unlinking C:\condor\condor-8.8.10\execute\dir_3328\jvm.start and C:\condor\condor-8.8.10\execute\dir_3328\jvm.end
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionHierarchy "java.lang.Object java.lang.Throwable java.lang.Exception java.lang.ReflectiveOperationException java.lang.ClassNotFoundException "
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionName "java.lang.ClassNotFoundException"
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionType "java.lang.Exception"
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionHierarchy "java.lang.Object java.lang.Throwable java.lang.Exception java.lang.ReflectiveOperationException java.lang.ClassNotFoundException "
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionName "java.lang.ClassNotFoundException"
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionType "java.lang.Exception"
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionHierarchy "java.lang.Object java.lang.Throwable java.lang.Exception java.lang.ReflectiveOperationException java.lang.ClassNotFoundException "
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionName "java.lang.ClassNotFoundException"
05/06/24 09:56:41 (pid:3328) JavaProc: ExceptionType "java.lang.Exception"
05/06/24 09:56:41 (pid:3328) Failed to open '.update.ad' to read update ad: No such file or directory (2).
05/06/24 09:56:41 (pid:3328) Failed to open '.update.ad' to read update ad: No such file or directory (2).
05/06/24 09:56:41 (pid:3328) All jobs have exited... starter exiting
05/06/24 09:56:41 (pid:3328) **** condor_starter (condor_STARTER) pid 3328 EXITING WITH STATUS 0

**Legend:**
- x.x.x.x is the IP address of the Red Hat server, which is the central manager and the submit machine.
- y.y.y.y is the IP address of my Windows 10 machine, which is the execute machine.
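
To make the path mismatch concrete: the two "could not stat jar file" lines in StarterLog.slot1_1 show the starter looking for both jars inside the job's scratch directory, not in C:\condor\apps as written in jar_files. The Python sketch below only reproduces that observed pattern (scratch directory plus the jar's base name) using values copied from this post. The helper name scratch_jar_paths is hypothetical, and the mapping is an assumption drawn from the log, not HTCondor's actual implementation.

```python
import ntpath  # Windows path rules, regardless of the OS running this sketch

# Values copied from the submit file and StarterLog.slot1_1 in this post.
jar_files = [
    r"C:\condor\apps\transform-core.jar",
    r"C:\condor\apps\jvm-locator.jar",
]
scratch_dir = r"C:\condor\condor-8.8.10\execute\dir_3328"


def scratch_jar_paths(jars, scratch):
    # Hypothetical helper: join the scratch directory with each jar's
    # base name, which is the pattern seen in the "could not stat" lines.
    return [ntpath.join(scratch, ntpath.basename(jar)) for jar in jars]


for path in scratch_jar_paths(jar_files, scratch_dir):
    print(path)
```

The printed paths match the ones the starter failed to stat. If the jars are not present at those scratch paths when the JVM starts, the -classpath entries in the log point at files that do not exist, which would be consistent with the ClassNotFoundException.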
Any help will be greatly appreciated, and I thank you in advance.
Sincerely,
Sady
Attachment:
HTcondor doc.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document