Re: [Condor-users] Condor and KVM: cannot connect to qemu:///session
- Date: Thu, 13 Jan 2011 07:58:11 -0600
- From: "Timothy St. Clair" <tstclair@xxxxxxxxxx>
- Subject: Re: [Condor-users] Condor and KVM: cannot connect to qemu:///session
Ryan -
If you could enable D_FULLDEBUG for VM_GAHP and send the log, that would
greatly help in diagnosing the possible root cause of the failure.
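
A minimal sketch of one way to turn that on, assuming Condor's usual <SUBSYS>_DEBUG naming applies to the VM_GAHP subsystem (so the setting would be VM_GAHP_DEBUG). The helper name and the config path in the usage note are hypothetical, not part of Condor:

```shell
#!/bin/sh
# Hypothetical helper: append the vm-gahp debug setting to a Condor
# local config file, unless a VM_GAHP_DEBUG line is already present.
# Assumption: VM_GAHP_DEBUG follows Condor's <SUBSYS>_DEBUG convention.
enable_vm_gahp_debug() {
    config_file="$1"
    if ! grep -q '^VM_GAHP_DEBUG' "$config_file" 2>/dev/null; then
        # Leading newline guards against a file that lacks a final newline.
        printf '\nVM_GAHP_DEBUG = D_FULLDEBUG\n' >> "$config_file"
    fi
}
```

For example, run `enable_vm_gahp_debug /path/to/condor_config.local`, then `condor_reconfig` (or restart Condor), rerun the vm universe job, and send the resulting vm-gahp log.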
Cheers,
Tim
On Wed, 2011-01-12 at 15:35 -0500, Ryan Jansen wrote:
> Hi Tim,
>
> Sorry for emailing you again, but I am still running into the same
> problem, and I can't get it sorted out. I've played with permissions,
> and Condor is, in fact, being started as root, so I'm not sure where
> to go from there.
>
> One detail that I suspect may be causing problems is that our user
> home directories are all stored in AFS. So, if I am running on a
> machine as root and I su into a user, I cannot access their home
> directory, because I do not have the AFS credentials to do so.
>
> To try and get around this, I gave public permissions to the .libvirt
> folder in my user's home directory, but it still failed. In addition,
> I tried creating a user with a local home directory at one point and
> using that, but it still didn't work.
>
> Do you know exactly what directories/files Condor has to be able to
> write to in the user's home directory to be able to start up a virtual
> machine? Does Condor define the XML file for the VM in the user's home
> directory? Or does it get defined in /etc/libvirt?
>
> Seeing as KVM and libvirt appear to be working on their own, I'd have
> to agree with you that it's most likely some sort of permissions
> issue. Any other ideas as to how to go about troubleshooting?
>
> Thanks,
> Ryan Jansen
>
> On Wed, Dec 15, 2010 at 9:14 AM, Timothy St. Clair
> <tstclair@xxxxxxxxxx> wrote:
>
>
> On Tue, 2010-12-14 at 15:59 -0500, Ryan Jansen wrote:
> > Seeing as user rjansen didn't have permission to read from his home
> > directory in AFS, I changed the permissions to allow access. Now, I
> > can run virsh and connect to qemu:///session as rjansen without AFS
> > credentials. I'm able to list his machines.
> >
> > Condor, however, gives me the same error. Does a vm universe job user
> > need anything other than an accessible home directory to start a
> > virtual machine under Condor? Any ideas as to what the problem is?
> >
> > Also, does the condor_vm-gahp run as root
>
>
> Condor needs to be started as superuser and will periodically
> elevate privileges on certain function calls.
>
> > , but then switch to the job user to create and start up a virtual
> > machine? Is it possible to make Condor just connect to qemu:///system
> > as root (or even as the user)?
>
>
> e.g. - my condor_config.local
>
> ALWAYS_VM_UNIV_USE_NOBODY = TRUE
> VM_UNIV_NOBODY_USER = tstclair
>
> Then I have a script which starts with elevated privs:
>
> sudo env PATH=$PATH CONDOR_CONFIG=$CONDOR_CONFIG condor_master
>
>
> Hope this helps,
> Tim
>
>
> >
> > Any insight would be very much appreciated.
> >
> > Thanks,
> > Ryan
> >
> > On Mon, Dec 13, 2010 at 3:48 PM, Ryan Jansen <rjansen@xxxxxx> wrote:
> > Tim,
> >
> > I currently have two VMs defined on the host machine,
> > ryankvm01, which is defined under qemu:///system for user
> > rjansen, and ryankvm02, which is defined under qemu:///system
> > for root.
> >
> > Running as root, I can connect to both URIs and run VMs under
> > each:
> >
> > # virsh -c qemu:///system list --all
> > Id Name State
> > ----------------------------------
> > - ryankvm02 shut off
> >
> > virsh -c qemu:///session list --all
> > Id Name State
> > ----------------------------------
> > - ryankvm02 shut off
> >
> >
> > Logging in as rjansen and running virsh, I can do the same
> > thing (although it shows ryankvm01 under qemu:///session).
> >
> > However (and I think this may be part of the problem), when I
> > just su into user rjansen from root, I don't have the
> > appropriate AFS tokens to access rjansen's home. As such,
> > libvirt gives me an error:
> >
> > # su rjansen
> > # virsh -c qemu:///session list --all
> > libvir: Network Config error : Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu/networks': Permission denied
> > Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/storage': Permission denied
> > libvir: Domain Config error : Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu': Permission denied
> > libvir: error : could not connect to qemu:///session
> > error: could not connect to qemu:///session
> > error: failed to connect to the hypervisor
> >
> >
> > Following up on the error with rjansen, I also tried creating
> > a local user with a home directory on the host machine,
> > condor_vm. condor_vm can use virsh, but Condor still gives me
> > the same error when I try to run the job with condor_vm set as
> > the nobody user. condor_vm can connect just fine with a simple
> > su.
> >
> > # su condor_vm
> > # virsh -c qemu:///session list --all
> > Id Name State
> > ----------------------------------
> >
> > The error with rjansen seems to be on the right track, but I
> > don't understand why the condor_vm user didn't work. Any
> > ideas?
> >
> > Thanks,
> > Ryan
> >
> >
> >
> >
> > On Mon, Dec 13, 2010 at 3:08 PM, Timothy St. Clair
> > <tstclair@xxxxxxxxxx> wrote:
> > What happens when you try to open via virsh?
> >
> >
> > On Mon, 2010-12-13 at 13:55 -0500, Ryan Jansen wrote:
> > > Tim,
> > >
> > > That's what I suspected at first, but it looks like the vm-gahp is
> > > running as root. Here's the vm-gahp log with D_FULLDEBUG on:
> > >
> > > 12/13 13:41:37 Running as root. Enabling specialized core dump routines
> > > 12/13 13:41:37 DaemonCore: Command Socket at <10.32.72.74:9077>
> > > 12/13 13:41:37 Will use UDP to update collector cclweb00.cse.nd.edu <129.74.152.166:9618>
> > > 12/13 13:41:37 VMGAHP[916]: VM-GAHP initialized with run-mode 3
> > > 12/13 13:41:37 VMGAHP[916]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
> > > 12/13 13:41:37 VMGAHP[916]: Initialize Uids: caller=root, job user=rjansen
> > > 12/13 13:41:37 VMGAHP[916]: Constructed VMGahp
> > > 12/13 13:41:37 VMGAHP[916]: Command: COMMANDS
> > > 12/13 13:41:38 VMGAHP[916]: Command: SUPPORT_VMS
> > > 12/13 13:41:38 VMGAHP[916]: Execute commands: S xen kvm vmware
> > > 12/13 13:41:39 VMGAHP[916]: Command: ASYNC_MODE_ON
> > > 12/13 13:41:40 VMGAHP[916]: Command: CLASSAD
> > > 12/13 13:41:43 VMGAHP[916]: Command: CONDOR_VM_START
> > > 12/13 13:41:43 VMGAHP[916]: Constructed VM_Type.
> > > 12/13 13:41:43 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
> > >
> > > Based on the log output, it appears to be running as root, and it
> > > knows that the job user is rjansen. Does that look normal, or do you
> > > still think it's most likely a permissions problem? Is there any way
> > > to get some more useful output from libvirt, maybe explaining why it
> > > couldn't connect?
> > >
> > > Thanks,
> > > Ryan
> > >
> > >
> > > On Mon, Dec 13, 2010 at 1:11 PM, Timothy St. Clair
> > > <tstclair@xxxxxxxxxx> wrote:
> > > If you can verify that your libvirtd is running & qemu+kvm are
> > > installed properly (check via virsh command prompt), then it is
> > > likely a permissions issue. Condor's vm-gahp requires it be started
> > > with elevated priv's(~root) in order to communicate with the
> > > libvirtd.
> > >
> > > Cheers,
> > > Tim
> > >
> > >
> > > On Mon, 2010-12-13 at 12:01 -0500, Ryan Jansen wrote:
> > > > Hi Tim,
> > > >
> > > > Thanks for the email and sorry for taking so long to get
> > > > back to you.
> > > >
> > > > I'm using libvirt version 0.6.3.
> > > >
> > > > Ryan
> > > >
> > > > On Wed, Dec 8, 2010 at 11:13 AM, Timothy St. Clair
> > > > <tstclair@xxxxxxxxxx> wrote:
> > > > what version of libvirt are you using?
> > > >
> > > > Cheers,
> > > > Tim
> > > >
> > > >
> > > > On Tue, 2010-12-07 at 16:36 -0500, Ryan Jansen wrote:
> > > > > Hi everyone,
> > > > >
> > > > > I'm having a problem getting Condor to start up a KVM virtual
> > > > > machine. I posted an email before, and with advice from a few
> > > > > people, I was able to sort out my KVM problems. But now, whenever I
> > > > > run a vm universe job, the condor_vm-gahp fails with the following
> > > > > error:
> > > > >
> > > > > 12/07 16:18:12 ** condor_vm-gahp (CONDOR_VM_GAHP) STARTING UP
> > > > > 12/07 16:18:12 ** /afs/nd.edu/user37/condor/software/versions/amd64-redhat5/condor-7.4.2-dynamic/sbin/condor_vm-gahp
> > > > > 12/07 16:18:12 ** SubsystemInfo: name=VM_GAHP type=GAHP(9) class=DAEMON(1)
> > > > > 12/07 16:18:12 ** Configuration: subsystem:VM_GAHP local:<NONE> class:DAEMON
> > > > > 12/07 16:18:12 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
> > > > > 12/07 16:18:12 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
> > > > > 12/07 16:18:12 ** PID = 13583
> > > > > 12/07 16:18:12 ** Log last touched 12/7 16:18:10
> > > > > 12/07 16:18:12 ******************************************************
> > > > > 12/07 16:18:12 Using config source: /afs/nd.edu/user37/condor/condor_config
> > > > > 12/07 16:18:12 Using local config sources:
> > > > > 12/07 16:18:12 /afs/nd.edu/user37/condor/software/config/machines/dqcneh100.local
> > > > > 12/07 16:18:12 DaemonCore: Command Socket at <10.32.72.74:9118>
> > > > > 12/07 16:18:12 VMGAHP[13583]: VM-GAHP initialized with run-mode 3
> > > > > 12/07 16:18:12 VMGAHP[13583]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
> > > > > 12/07 16:18:12 VMGAHP[13583]: Initialize Uids: caller=root, job user=rjansen
> > > > > 12/07 16:18:18 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
> > > > >
> > > > > Now, I have adjusted /etc/libvirt/libvirt.conf to allow the libvirt
> > > > > group to access the libvirt rw socket, and I added the users root,
> > > > > rjansen, and condor to that group.
> > > > >
> > > > > Additionally, I can connect just fine (as root and rjansen) to
> > > > > qemu:///session, through virsh, and through the libvirt C library
> > > > > using example code from the qemu website. In fact, the code I use to
> > > > > connect to the library in the example program is essentially the same
> > > > > as the code on line 989 in xen_type.cpp, which is failing.
> > > > >
> > > > > I'm not sure if I'm doing something wrong with Condor or something
> > > > > wrong with KVM/libvirt, but I'd like to get this working.
> > > > >
> > > > > Does anyone have any ideas on how to fix this problem?
> > > > >
> > > > > Thanks,
> > > > > Ryan
> > > > >
> > > > > _______________________________________________
> > > > > Condor-users mailing list
> > > > > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > > > > subject: Unsubscribe
> > > > > You can also unsubscribe by visiting
> > > > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > > > >
> > > > > The archives can be found at:
> > > > > https://lists.cs.wisc.edu/archive/condor-users/