
Re: [Condor-users] Condor and KVM: cannot connect to qemu:///session



Hi Tim,

Sorry for emailing you again, but I am still running into the same problem, and I can't get it sorted out. I've played with permissions, and Condor is, in fact, being started as root, so I'm not sure where to go from there.

One detail that I suspect may be causing problems is that our user home directories are all stored in AFS. So, if I am running on a machine as root and I su to a user, I cannot access that user's home directory, because I do not have the AFS credentials to do so.

To try and get around this, I gave public permissions to the .libvirt folder in my user's home directory, but it still failed. In addition, I tried creating a user with a local home directory at one point and using that, but it still didn't work.

Do you know exactly what directories/files Condor has to be able to write to in the user's home directory to be able to start up a virtual machine? Does Condor define the XML file for the VM in the user's home directory? Or does it get defined in /etc/libvirt?

Seeing as KVM and libvirt appear to be working on their own, I'd have to agree with you that it's most likely some sort of permissions issue. Any other ideas as to how to go about troubleshooting?

Thanks,
Ryan Jansen
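
One way to narrow down which directories matter is to try opening, as the job user, the per-user libvirt directories named in the error messages quoted later in this thread. The following is only a sketch in Python: the list of subdirectories is taken from those error messages, and the function name probe_libvirt_dirs is made up for illustration. Whether Condor needs anything else under the home directory is not established here.

```python
import os

# Subdirectories named in the libvirt errors quoted in this thread.
# Whether Condor needs anything beyond these is an assumption, not confirmed.
LIBVIRT_SUBDIRS = ("qemu", os.path.join("qemu", "networks"), "storage")


def probe_libvirt_dirs(home):
    """Try to list each per-user libvirt directory under `home`.

    Returns a dict mapping each path to None on success, or to the
    OS error message if the directory could not be opened.
    """
    results = {}
    for sub in LIBVIRT_SUBDIRS:
        path = os.path.join(home, ".libvirt", sub)
        try:
            # Actually open the directory; on AFS, mode bits alone may mislead.
            os.listdir(path)
            results[path] = None
        except OSError as exc:
            results[path] = exc.strerror
    return results


if __name__ == "__main__":
    home = os.path.expanduser("~")
    for path, error in probe_libvirt_dirs(home).items():
        print(f"{path}: {error or 'ok'}")
```

Running it once from a normal login and once after a plain su from root (without AFS tokens) should show whether the two environments see the same directories.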

On Wed, Dec 15, 2010 at 9:14 AM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:


On Tue, 2010-12-14 at 15:59 -0500, Ryan Jansen wrote:
> Seeing as user rjansen didn't have permission to read from his home
> directory in AFS, I changed the permissions to allow access. Now, I
> can run virsh and connect to qemu:///session as rjansen without AFS
> credentials. I'm able to list his machines.
>
> Condor, however, gives me the same error. Does a vm universe job user
> need anything other than an accessible home directory to start a
> virtual machine under Condor? Any ideas as to what the problem is?
>
> Also, does the condor_vm-gahp run as root

Condor needs to be started as superuser, and it will periodically
elevate privileges for certain function calls.

> , but then switch to the job user to create and start up a virtual
> machine? Is it possible to make Condor just connect to qemu:///system
> as root (or even as the user)?

For example, from my condor_config.local:

ALWAYS_VM_UNIV_USE_NOBODY = TRUE
VM_UNIV_NOBODY_USER = tstclair

Then I have a script which starts with elevated privs:

sudo env PATH=$PATH CONDOR_CONFIG=$CONDOR_CONFIG condor_master
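
As a quick sanity check on a local config file, the two settings above can be read back with a few lines of Python. This is a deliberately simplified sketch: it only understands plain NAME = value lines (no macro expansion, includes, or line continuations), it treats names as case-insensitive, and the helper names read_simple_settings and nobody_user are invented for the example.

```python
def read_simple_settings(config_text):
    """Collect plain `NAME = value` lines; comments and blank lines are skipped.

    Simplified on purpose: no macro expansion, includes, or line continuations.
    """
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip().upper()] = value.strip()
    return settings


def nobody_user(settings):
    """Return the configured VM universe user, or None if the pair is incomplete."""
    enabled = settings.get("ALWAYS_VM_UNIV_USE_NOBODY", "").upper() == "TRUE"
    user = settings.get("VM_UNIV_NOBODY_USER")
    return user if enabled and user else None


example = """
ALWAYS_VM_UNIV_USE_NOBODY = TRUE
VM_UNIV_NOBODY_USER = tstclair
"""
print(nobody_user(read_simple_settings(example)))
```

If the function returns None, one of the two settings is missing or the first is not TRUE, which would be worth ruling out before looking further at permissions.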


Hope this helps,
Tim
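
The vm-gahp log quoted further down records the IDs the process started with ("Initial UID/GUID=0/0, EUID/EGUID=126019/1313"). A small Python sketch like the one below can pull those numbers out of a log to confirm that the gahp began as root and is acting under a different effective UID. The regular expression matches the line only as it appears in this thread, other Condor versions may format it differently, and parse_gahp_ids is a made-up helper name.

```python
import re

# Pattern for the identity line as it appears in the quoted vm-gahp log;
# other Condor versions may format it differently.
ID_LINE = re.compile(
    r"Initial UID/GUID=(\d+)/(\d+),\s*EUID/EGUID=(\d+)/(\d+)"
)


def parse_gahp_ids(log_text):
    """Return (uid, gid, euid, egid) from the first identity line, or None."""
    match = ID_LINE.search(log_text)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())


line = ("12/13 13:41:37 VMGAHP[916]: Initial UID/GUID=0/0, "
        "EUID/EGUID=126019/1313, Condor UID/GID=108172,40")
uid, gid, euid, egid = parse_gahp_ids(line)
print("started as root:", uid == 0)
print("effective uid differs from root:", euid != 0)
```

For the line from this thread both checks print True, which is consistent with a process started as root that has switched its effective UID, presumably to the job user named on the following log line.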

>
> Any insight would be very much appreciated.
>
> Thanks,
> Ryan
>
> On Mon, Dec 13, 2010 at 3:48 PM, Ryan Jansen <rjansen@xxxxxx> wrote:
>         Tim,
>
>         I currently have two VMs defined on the host machine:
>         ryankvm01, which is defined under qemu:///session for user
>         rjansen, and ryankvm02, which is defined under qemu:///system
>         for root.
>
>         Running as root, I can connect to both URIs and run VMs under
>         each:
>
>         # virsh -c qemu:///system list --all
>          Id Name                 State
>         ----------------------------------
>           - ryankvm02            shut off
>
>         # virsh -c qemu:///session list --all
>          Id Name                 State
>         ----------------------------------
>           - ryankvm02            shut off
>
>
>         Logging in as rjansen and running virsh, I can do the same
>         thing (although it shows ryankvm01 under qemu:///session).
>
>         However (and I think this may be part of the problem), when I
>         just su into user rjansen from root, I don't have the
>         appropriate AFS tokens to access rjansen's home. As such,
>         libvirt gives me an error:
>
>         # su rjansen
>         # virsh -c qemu:///session list --all
>         libvir: Network Config error : Failed to open dir
>         '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu/networks':
>         Permission denied
>         Failed to open dir
>         '/afs/crc.nd.edu/user/r/rjansen/.libvirt/storage': Permission
>         denied
>         libvir: Domain Config error : Failed to open dir
>         '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu': Permission
>         denied
>         libvir: error : could not connect to qemu:///session
>         error: could not connect to qemu:///session
>         error: failed to connect to the hypervisor
>
>
>         Following up on the error with rjansen, I also tried creating
>         a local user with a home directory on the host machine,
>         condor_vm. condor_vm can use virsh, but Condor still gives me
>         the same error when I try to run the job with condor_vm set as
>         the nobody user. condor_vm can connect just fine with a simple
>         su.
>
>         # su condor_vm
>         # virsh -c qemu:///session list --all
>          Id Name                 State
>         ----------------------------------
>
>         The error with rjansen seems to be on the right track, but I
>         don't understand why the condor_vm user didn't work. Any
>         ideas?
>
>         Thanks,
>         Ryan
>
>
>
>
>         On Mon, Dec 13, 2010 at 3:08 PM, Timothy St. Clair
>         <tstclair@xxxxxxxxxx> wrote:
>                 What happens when you try to open via virsh?
>
>
>                 On Mon, 2010-12-13 at 13:55 -0500, Ryan Jansen wrote:
>                 > Tim,
>                 >
>                 > That's what I suspected at first, but it looks like the vm-gahp is
>                 > running as root. Here's the vm-gahp log with D_FULLDEBUG on:
>                 >
>                 > 12/13 13:41:37 Running as root.  Enabling specialized core dump routines
>                 > 12/13 13:41:37 DaemonCore: Command Socket at <10.32.72.74:9077>
>                 > 12/13 13:41:37 Will use UDP to update collector cclweb00.cse.nd.edu <129.74.152.166:9618>
>                 > 12/13 13:41:37 VMGAHP[916]: VM-GAHP initialized with run-mode 3
>                 > 12/13 13:41:37 VMGAHP[916]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
>                 > 12/13 13:41:37 VMGAHP[916]: Initialize Uids: caller=root, job user=rjansen
>                 > 12/13 13:41:37 VMGAHP[916]: Constructed VMGahp
>                 > 12/13 13:41:37 VMGAHP[916]: Command: COMMANDS
>                 > 12/13 13:41:38 VMGAHP[916]: Command: SUPPORT_VMS
>                 > 12/13 13:41:38 VMGAHP[916]: Execute commands: S xen kvm vmware
>                 > 12/13 13:41:39 VMGAHP[916]: Command: ASYNC_MODE_ON
>                 > 12/13 13:41:40 VMGAHP[916]: Command: CLASSAD
>                 > 12/13 13:41:43 VMGAHP[916]: Command: CONDOR_VM_START
>                 > 12/13 13:41:43 VMGAHP[916]: Constructed VM_Type.
>                 > 12/13 13:41:43 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
>                 >
>                 > Based on the log output, it appears to be running as root, and it
>                 > knows that the job user is rjansen. Does that look normal, or do you
>                 > still think it's most likely a permissions problem? Is there any way
>                 > to get some more useful output from libvirt, maybe explaining why it
>                 > couldn't connect?
>                 >
>                 > Thanks,
>                 > Ryan
>                 >
>                 >
>                 > On Mon, Dec 13, 2010 at 1:11 PM, Timothy St. Clair
>                 > <tstclair@xxxxxxxxxx> wrote:
>                 >         If you can verify that libvirtd is running and that qemu+kvm are
>                 >         installed properly (check via the virsh command prompt), then it is
>                 >         likely a permissions issue. Condor's vm-gahp must be started with
>                 >         elevated privileges (~root) in order to communicate with libvirtd.
>                 >
>                 >         Cheers,
>                 >         Tim
>                 >
>                 >
>                 >         On Mon, 2010-12-13 at 12:01 -0500, Ryan Jansen wrote:
>                 >         > Hi Tim,
>                 >         >
>                 >         > Thanks for the email and sorry for taking so long to get back to you.
>                 >         >
>                 >         > I'm using libvirt version 0.6.3.
>                 >         >
>                 >         > Ryan
>                 >         >
>                 >         > On Wed, Dec 8, 2010 at 11:13 AM, Timothy St. Clair
>                 >         > <tstclair@xxxxxxxxxx> wrote:
>                 >         >         what version of libvirt are you using?
>                 >         >
>                 >         >         Cheers,
>                 >         >         Tim
>                 >         >
>                 >         >
>                 >         >         On Tue, 2010-12-07 at 16:36 -0500, Ryan Jansen wrote:
>                 >         >         > Hi everyone,
>                 >         >         >
>                 >         >         > I'm having a problem getting Condor to start up a KVM virtual
>                 >         >         > machine. I posted an email before, and with advice from a few
>                 >         >         > people, I was able to sort out my KVM problems. But now, whenever I
>                 >         >         > run a vm universe job, the condor_vm-gahp fails with the following
>                 >         >         > error:
>                 >         >         >
>                 >         >         > 12/07 16:18:12 ** condor_vm-gahp (CONDOR_VM_GAHP) STARTING UP
>                 >         >         > 12/07 16:18:12 ** /afs/nd.edu/user37/condor/software/versions/amd64-redhat5/condor-7.4.2-dynamic/sbin/condor_vm-gahp
>                 >         >         > 12/07 16:18:12 ** SubsystemInfo: name=VM_GAHP type=GAHP(9) class=DAEMON(1)
>                 >         >         > 12/07 16:18:12 ** Configuration: subsystem:VM_GAHP local:<NONE> class:DAEMON
>                 >         >         > 12/07 16:18:12 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
>                 >         >         > 12/07 16:18:12 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
>                 >         >         > 12/07 16:18:12 ** PID = 13583
>                 >         >         > 12/07 16:18:12 ** Log last touched 12/7 16:18:10
>                 >         >         > 12/07 16:18:12 ******************************************************
>                 >         >         > 12/07 16:18:12 Using config source: /afs/nd.edu/user37/condor/condor_config
>                 >         >         > 12/07 16:18:12 Using local config sources:
>                 >         >         > 12/07 16:18:12    /afs/nd.edu/user37/condor/software/config/machines/dqcneh100.local
>                 >         >         > 12/07 16:18:12 DaemonCore: Command Socket at <10.32.72.74:9118>
>                 >         >         > 12/07 16:18:12 VMGAHP[13583]: VM-GAHP initialized with run-mode 3
>                 >         >         > 12/07 16:18:12 VMGAHP[13583]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
>                 >         >         > 12/07 16:18:12 VMGAHP[13583]: Initialize Uids: caller=root, job user=rjansen
>                 >         >         > 12/07 16:18:18 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
>                 >         >         >
>                 >         >         > Now, I have adjusted /etc/libvirt/libvirt.conf to allow the libvirt
>                 >         >         > group to access the libvirt rw socket, and I added the users root,
>                 >         >         > rjansen, and condor to that group.
>                 >         >         >
>                 >         >         > Additionally, I can connect just fine (as root and rjansen) to
>                 >         >         > qemu:///session, through virsh, and through the libvirt C library
>                 >         >         > using example code from the qemu website. In fact, the code I use to
>                 >         >         > connect to the library in the example program is essentially the same
>                 >         >         > as the code on line 989 in xen_type.cpp, which is failing.
>                 >         >         >
>                 >         >         > I'm not sure if I'm doing something wrong with Condor or something
>                 >         >         > wrong with KVM/libvirt, but I'd like to get this working.
>                 >         >         >
>                 >         >         > Does anyone have any ideas on how to fix this problem?
>                 >         >         >
>                 >         >         > Thanks,
>                 >         >         > Ryan
>                 >         >
>                 >         >         > _______________________________________________
>                 >         >         > Condor-users mailing list
>                 >         >         > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>                 >         >         > subject: Unsubscribe
>                 >         >         > You can also unsubscribe by visiting
>                 >         >         > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>                 >         >         >
>                 >         >         > The archives can be found at:
>                 >         >         > https://lists.cs.wisc.edu/archive/condor-users/
>                 >         >
>                 >         >
>                 >         >
>                 >
>                 >
>                 >
>
>
>
>
>