
Re: [Condor-users] Condor and KVM: cannot connect to qemu:///session



Ryan - 

If you could enable D_FULLDEBUG for VM_GAHP and send the log, that would
greatly help in diagnosing the root cause of the failure.
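
A minimal sketch of one way to turn that on, assuming the knob follows
Condor's usual <SUBSYS>_DEBUG naming (so VM_GAHP_DEBUG here) and that the
local config file is passed as the first argument. The default path below
is only a placeholder, not a real install location:

```shell
#!/bin/sh
# Sketch: raise condor_vm-gahp logging to D_FULLDEBUG.
# ASSUMPTION: the setting is named VM_GAHP_DEBUG (<SUBSYS>_DEBUG convention).
# ASSUMPTION: ./condor_config.local is a placeholder; pass the real path as $1.
LOCAL_CONFIG="${1:-./condor_config.local}"

# Append the setting only once so that reruns do not duplicate it.
if ! grep -q '^VM_GAHP_DEBUG' "$LOCAL_CONFIG" 2>/dev/null; then
    echo 'VM_GAHP_DEBUG = D_FULLDEBUG' >> "$LOCAL_CONFIG"
fi

# If Condor is on PATH, ask the daemons to re-read their configuration
# and print where the vm-gahp log is written.
if command -v condor_reconfig >/dev/null 2>&1; then
    condor_reconfig
    condor_config_val VM_GAHP_LOG
fi
```

The next vm universe job should then log the extra detail to the file
named by VM_GAHP_LOG.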

Cheers,
Tim

On Wed, 2011-01-12 at 15:35 -0500, Ryan Jansen wrote:
> Hi Tim,
> 
> Sorry for emailing you again, but I am still running into the same
> problem, and I can't get it sorted out. I've played with permissions,
> and Condor is, in fact, being started as root, so I'm not sure where
> to go from there.
> 
> One detail that I suspect may be causing problems is that our user home
> directories are all stored in AFS. So, if I am running on a machine as
> root and I su to a user, I cannot access their home directory, because
> I do not have the AFS credentials to do so.
> 
> To try and get around this, I gave public permissions to the .libvirt
> folder in my user's home directory, but it still failed. In addition,
> I tried creating a user with a local home directory at one point and
> using that, but it still didn't work.
> 
> Do you know exactly what directories/files Condor has to be able to
> write to in the user's home directory to be able to start up a virtual
> machine? Does Condor define the XML file for the VM in the user's home
> directory? Or does it get defined in /etc/libvirt?
> 
> Seeing as KVM and libvirt appear to be working on their own, I'd have
> to agree with you that it's most likely some sort of permissions
> issue. Any other ideas as to how to go about troubleshooting?
> 
> Thanks,
> Ryan Jansen
> 
> On Wed, Dec 15, 2010 at 9:14 AM, Timothy St. Clair
> <tstclair@xxxxxxxxxx> wrote:
>         
>         
>         On Tue, 2010-12-14 at 15:59 -0500, Ryan Jansen wrote:
>         > Seeing as user rjansen didn't have permission to read from his home
>         > directory in AFS, I changed the permissions to allow access. Now, I
>         > can run virsh and connect to qemu:///session as rjansen without AFS
>         > credentials. I'm able to list his machines.
>         >
>         > Condor, however, gives me the same error. Does a vm universe job user
>         > need anything other than an accessible home directory to start a
>         > virtual machine under Condor? Any ideas as to what the problem is?
>         >
>         > Also, does the condor_vm-gahp run as root
>         
>         
>         It (Condor) needs to be started as superuser and will periodically
>         elevate privileges on certain function calls.
>         
>         > , but then switch to the job user to create and start up a virtual
>         > machine? Is it possible to make Condor just connect to qemu:///system
>         > as root (or even as the user)?
>         
>         
>         e.g. - my condor_config.local
>         
>         ALWAYS_VM_UNIV_USE_NOBODY = TRUE
>         VM_UNIV_NOBODY_USER = tstclair
>         
>         Then I have a script which starts with elevated privs:
>         
>         sudo env PATH=$PATH CONDOR_CONFIG=$CONDOR_CONFIG condor_master
>         
>         
>         Hope this helps,
>         Tim
>         
>         
>         >
>         > Any insight would be very much appreciated.
>         >
>         > Thanks,
>         > Ryan
>         >
>         > On Mon, Dec 13, 2010 at 3:48 PM, Ryan Jansen <rjansen@xxxxxx> wrote:
>         >         Tim,
>         >
>         >         I currently have two VMs defined on the host machine,
>         >         ryankvm01, which is defined under qemu:///system for user
>         >         rjansen, and ryankvm02, which is defined under qemu:///system
>         >         for root.
>         >
>         >         Running as root, I can connect to both URIs and run VMs under
>         >         each:
>         >
>         >         # virsh -c qemu:///system list --all
>         >          Id Name                 State
>         >         ----------------------------------
>         >           - ryankvm02            shut off
>         >
>         >         # virsh -c qemu:///session list --all
>         >          Id Name                 State
>         >         ----------------------------------
>         >           - ryankvm02            shut off
>         >
>         >
>         >         Logging in as rjansen and running virsh, I can do the same
>         >         thing (although it shows ryankvm01 under qemu:///session).
>         >
>         >         However (and I think this may be part of the problem), when I
>         >         just su into user rjansen from root, I don't have the
>         >         appropriate AFS tokens to access rjansen's home. As such,
>         >         libvirt gives me an error:
>         >
>         >         # su rjansen
>         >         # virsh -c qemu:///session list --all
>         >         libvir: Network Config error : Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu/networks': Permission denied
>         >         Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/storage': Permission denied
>         >         libvir: Domain Config error : Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu': Permission denied
>         >         libvir: error : could not connect to qemu:///session
>         >         error: could not connect to qemu:///session
>         >         error: failed to connect to the hypervisor
>         >
>         >
>         >         Following up on the error with rjansen, I also tried creating
>         >         a local user with a home directory on the host machine,
>         >         condor_vm. condor_vm can use virsh, but Condor still gives me
>         >         the same error when I try to run the job with condor_vm set as
>         >         the nobody user. condor_vm can connect just fine with a simple
>         >         su.
>         >
>         >         # su condor_vm
>         >         # virsh -c qemu:///session list --all
>         >          Id Name                 State
>         >         ----------------------------------
>         >
>         >         The error with rjansen seems to be on the right track, but I
>         >         don't understand why the condor_vm user didn't work. Any
>         >         ideas?
>         >
>         >         Thanks,
>         >         Ryan
>         >
>         >
>         >
>         >
>         >         On Mon, Dec 13, 2010 at 3:08 PM, Timothy St. Clair
>         >         <tstclair@xxxxxxxxxx> wrote:
>         >                 What happens when you try to open via virsh?
>         >
>         >
>         >                 On Mon, 2010-12-13 at 13:55 -0500, Ryan Jansen wrote:
>         >                 > Tim,
>         >                 >
>         >                 > That's what I suspected at first, but it looks like the vm-gahp is
>         >                 > running as root. Here's the vm-gahp log with D_FULLDEBUG on:
>         >                 >
>         >                 > 12/13 13:41:37 Running as root.  Enabling specialized core dump routines
>         >                 > 12/13 13:41:37 DaemonCore: Command Socket at <10.32.72.74:9077>
>         >                 > 12/13 13:41:37 Will use UDP to update collector cclweb00.cse.nd.edu <129.74.152.166:9618>
>         >                 > 12/13 13:41:37 VMGAHP[916]: VM-GAHP initialized with run-mode 3
>         >                 > 12/13 13:41:37 VMGAHP[916]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
>         >                 > 12/13 13:41:37 VMGAHP[916]: Initialize Uids: caller=root, job user=rjansen
>         >                 > 12/13 13:41:37 VMGAHP[916]: Constructed VMGahp
>         >                 > 12/13 13:41:37 VMGAHP[916]: Command: COMMANDS
>         >                 > 12/13 13:41:38 VMGAHP[916]: Command: SUPPORT_VMS
>         >                 > 12/13 13:41:38 VMGAHP[916]: Execute commands: S xen kvm vmware
>         >                 > 12/13 13:41:39 VMGAHP[916]: Command: ASYNC_MODE_ON
>         >                 > 12/13 13:41:40 VMGAHP[916]: Command: CLASSAD
>         >                 > 12/13 13:41:43 VMGAHP[916]: Command: CONDOR_VM_START
>         >                 > 12/13 13:41:43 VMGAHP[916]: Constructed VM_Type.
>         >                 > 12/13 13:41:43 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
>         >                 >
>         >                 > Based on the log output, it appears to be running as root, and it
>         >                 > knows that the job user is rjansen. Does that look normal, or do you
>         >                 > still think it's most likely a permissions problem? Is there any way
>         >                 > to get some more useful output from libvirt, maybe explaining why it
>         >                 > couldn't connect?
>         >                 >
>         >                 > Thanks,
>         >                 > Ryan
>         >                 >
>         >                 >
>         >                 > On Mon, Dec 13, 2010 at 1:11 PM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
>         >                 >         If you can verify that libvirtd is running and that qemu+kvm are
>         >                 >         installed properly (check via the virsh command prompt), then it
>         >                 >         is likely a permissions issue. Condor's vm-gahp must be started
>         >                 >         with elevated privileges (effectively root) in order to
>         >                 >         communicate with libvirtd.
>         >                 >
>         >                 >         Cheers,
>         >                 >         Tim
>         >                 >
>         >                 >
>         >                 >         On Mon, 2010-12-13 at 12:01 -0500, Ryan Jansen wrote:
>         >                 >         > Hi Tim,
>         >                 >         >
>         >                 >         > Thanks for the email and sorry for taking so long to get back to you.
>         >                 >         >
>         >                 >         > I'm using libvirt version 0.6.3.
>         >                 >         >
>         >                 >         > Ryan
>         >                 >         >
>         >                 >         > On Wed, Dec 8, 2010 at 11:13 AM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
>         >                 >         >         What version of libvirt are you using?
>         >                 >         >
>         >                 >         >         Cheers,
>         >                 >         >         Tim
>         >                 >         >
>         >                 >         >
>         >                 >         >         On Tue, 2010-12-07 at 16:36 -0500, Ryan Jansen wrote:
>         >                 >         >         > Hi everyone,
>         >                 >         >         >
>         >                 >         >         > I'm having a problem getting Condor to start up a KVM virtual machine.
>         >                 >         >         > I posted an email before, and with advice from a few people, I was
>         >                 >         >         > able to sort out my KVM problems. But now, whenever I run a vm
>         >                 >         >         > universe job, the condor_vm-gahp fails with the following error:
>         >                 >         >         >
>         >                 >         >         > 12/07 16:18:12 ** condor_vm-gahp (CONDOR_VM_GAHP) STARTING UP
>         >                 >         >         > 12/07 16:18:12 ** /afs/nd.edu/user37/condor/software/versions/amd64-redhat5/condor-7.4.2-dynamic/sbin/condor_vm-gahp
>         >                 >         >         > 12/07 16:18:12 ** SubsystemInfo: name=VM_GAHP type=GAHP(9) class=DAEMON(1)
>         >                 >         >         > 12/07 16:18:12 ** Configuration: subsystem:VM_GAHP local:<NONE> class:DAEMON
>         >                 >         >         > 12/07 16:18:12 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
>         >                 >         >         > 12/07 16:18:12 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
>         >                 >         >         > 12/07 16:18:12 ** PID = 13583
>         >                 >         >         > 12/07 16:18:12 ** Log last touched 12/7 16:18:10
>         >                 >         >         > 12/07 16:18:12 ******************************************************
>         >                 >         >         > 12/07 16:18:12 Using config source: /afs/nd.edu/user37/condor/condor_config
>         >                 >         >         > 12/07 16:18:12 Using local config sources:
>         >                 >         >         > 12/07 16:18:12    /afs/nd.edu/user37/condor/software/config/machines/dqcneh100.local
>         >                 >         >         > 12/07 16:18:12 DaemonCore: Command Socket at <10.32.72.74:9118>
>         >                 >         >         > 12/07 16:18:12 VMGAHP[13583]: VM-GAHP initialized with run-mode 3
>         >                 >         >         > 12/07 16:18:12 VMGAHP[13583]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
>         >                 >         >         > 12/07 16:18:12 VMGAHP[13583]: Initialize Uids: caller=root, job user=rjansen
>         >                 >         >         > 12/07 16:18:18 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
>         >                 >         >         >
>         >                 >         >         > Now, I have adjusted /etc/libvirt/libvirt.conf to allow the libvirt
>         >                 >         >         > group to access the libvirt rw socket, and I added the users root,
>         >                 >         >         > rjansen, and condor to that group.
>         >                 >         >         >
>         >                 >         >         > Additionally, I can connect just fine (as root and rjansen) to
>         >                 >         >         > qemu:///session, through virsh, and through the libvirt C library
>         >                 >         >         > using example code from the qemu website. In fact, the code I use to
>         >                 >         >         > connect to the library in the example program is essentially the same
>         >                 >         >         > as the code on line 989 in xen_type.cpp, which is failing.
>         >                 >         >         >
>         >                 >         >         > I'm not sure if I'm doing something wrong with Condor or something
>         >                 >         >         > wrong with KVM/libvirt, but I'd like to get this working.
>         >                 >         >         >
>         >                 >         >         > Does anyone have any ideas on how to fix this problem?
>         >                 >         >         >
>         >                 >         >         > Thanks,
>         >                 >         >         > Ryan
>         >                 >         >
>         >                 >         >         >
>         >                 >         >         > _______________________________________________
>         >                 >         >         > Condor-users mailing list
>         >                 >         >         > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>         >                 >         >         > subject: Unsubscribe
>         >                 >         >         > You can also unsubscribe by visiting
>         >                 >         >         > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>         >                 >         >         >
>         >                 >         >         > The archives can be found at:
>         >                 >         >         > https://lists.cs.wisc.edu/archive/condor-users/