How about TRUST_UID_DOMAIN = TRUE?
Best,
matt
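
For reference, a minimal sketch of how that suggestion could look next to the settings quoted below. It assumes the change goes into the local config file named in the starter log (/home/condor/condor_config.local) on each machine, and it reuses the UID_DOMAIN value from the quoted message. Whether this clears the errno 13 failure is not confirmed anywhere in this thread.

```
# Assumed location: /home/condor/condor_config.local on every machine
UID_DOMAIN       = 192.168.0.109
SOFT_UID_DOMAIN  = True
# Skip the check that UID_DOMAIN is a substring of the submit machine's
# fully qualified host name (an IP address is unlikely to pass that check)
TRUST_UID_DOMAIN = True
```

The daemons have to re-read their configuration afterwards, for example with condor_reconfig, and condor_config_val TRUST_UID_DOMAIN should then report the new value.
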
On 01/13/2010 12:49 PM, Genie Jhang wrote:
> Is there anyone who can help me?
>
> Our machines still don't work, except for the central machine.
>
> The central machine submits jobs to the clients, but the clients cannot run them.
>
> Which files do you need to find out what the problem is?
>
> Please help me.
>
> 2010/1/13 Genie Jhang <geniejhang@xxxxxxxxxxx>
> Thanks for your reply, Dan.
>
> As you suggested, I changed the permissions of the directory
> /home/condor/execute to 777 on all machines.
>
> And I don't use NFS.
>
> Now I'm getting this kind of error.
>
> --------------------------------------------------
>
> 022 (218.000.000) 01/13 18:03:15 Job disconnected, attempting to
> reconnect
> Socket between submit and execute hosts closed unexpectedly
> Trying to reconnect to slot3@pheko05 <192.168.0.105:33682>
> ...
> 024 (218.000.000) 01/13 18:03:15 Job reconnection failed
> Job not found at execution machine
> Can not reconnect to slot3@pheko05, rescheduling job
>
> -------------------------------------------------------------
>
> I set
> UID_DOMAIN = 192.168.0.109
> FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> USE_NFS = False
> SOFT_UID_DOMAIN = TRUE.
>
>
>
> 2010/1/13 Dan Bradley <dan@xxxxxxxxxxxx>
> Genie,
>
> Is your condor execute directory on NFS with root squashing?
> The following line is what makes me guess that it might be:
>
>
> 01/13 06:32:30 get_file(): Failed to open file
> /home/condor/execute/dir_22496/condor_exec.exe, errno = 13:
> Permission denied.
>
>
> If EXECUTE is on an NFS mount with root squashing, then it needs
> to be world-writable.
>
> --Dan
>
>
> Genie Jhang wrote:
>
> Hello, again.
> Thanks to all of you, I succeeded in getting all the machines
> in our lab running and connected.
> But when I finally tried to submit jobs to the machines, I
> found that none of the machines except the central manager
> works!!
> So I dug through the log files.
> Here's the log.
> ----------------------------------------------------------------------------------------------------------------------------------
> 01/13 06:32:30
> ******************************************************
> 01/13 06:32:30 ** condor_starter (CONDOR_STARTER) STARTING UP
> 01/13 06:32:30 ** /condor/sbin/condor_starter
> 01/13 06:32:30 ** SubsystemInfo: name=STARTER
> type=STARTER(8) class=DAEMON(1)
> 01/13 06:32:30 ** Configuration: subsystem:STARTER
> local:<NONE> class:DAEMON
> 01/13 06:32:30 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID:
> 204351 $
> 01/13 06:32:30 ** $CondorPlatform: I386-LINUX_RHEL3 $
> 01/13 06:32:30 ** PID = 22496
> 01/13 06:32:30 ** Log last touched time unavailable (No such
> file or directory)
> 01/13 06:32:30
> ******************************************************
> 01/13 06:32:30 Using config source: /condor/etc/condor_config
> 01/13 06:32:30 Using local config sources:
> 01/13 06:32:30 /home/condor/condor_config.local
> 01/13 06:32:30 DaemonCore: Command Socket at
> <192.168.0.105:33714>
> 01/13 06:32:30 Done setting resource limits
> 01/13 06:32:30 Communicating with shadow
> <192.168.0.109:55237>
> 01/13 06:32:30 Submitting machine is "pheko09"
> 01/13 06:32:30 setting the orig job name in starter
> 01/13 06:32:30 setting the orig job iwd in starter
> 01/13 06:32:30 get_file(): Failed to open file
> /home/condor/execute/dir_22496/condor_exec.exe, errno = 13:
> Permission denied.
> 01/13 06:32:30 get_file(): consumed 28023 bytes of file
> transmission
> 01/13 06:32:30 DoDownload: consuming rest of transfer and
> failing after encountering the following error: STARTER at
> 192.168.0.105 failed to write to file
> /home/condor/execute/dir_22496/condor_exec.exe: (errno 13)
> Permission denied
> 01/13 06:32:30 WARNING: File
> /home/condor/execute/dir_22496/condor_exec.exe can not be
> accessed by Quill file transfer tracking.
> 01/13 06:32:30 File transfer failed (status=0).
> 01/13 06:32:30 ERROR "Failed to transfer files" at line 1882
> in file jic_shadow.cpp
> 01/13 06:32:30 ShutdownFast all jobs.
> ------------------------------------------------------------------------------------------------------------------------------------
> What on earth is the problem?
> I set ALLOW_WRITE = * in the condor_config file on all the
> machines.
> ------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/