Last modified 2 years ago

FAQ

Table of Contents

  1. The XGE cannot start the VNodesManager
  2. The XGE refuses to start
  3. I cannot start VMs and see a message about the network bridge
  4. The XGE won't receive any jobs from Torque
  5. The XGE aborts with ImportError: No module named libtorrent
  6. All LXGEds are running, but the XGE aborts with Connection refused

The XGE cannot start the VNodesManager

Question: When I launch the XGE, it aborts because it cannot start the VNodesManager. I see messages similar to the following on the console:

Password: 
libvir: Remote error : cannot recv data: Connection reset by peer
[ERROR] Failed to open connection to the Backend hypervisor
[ERROR] Could not start VNodesManager. Abort.

Answer: Set up passwordless SSH access for the privileged user (root) to:

  • All physical machines
  • All VMs
  • The headnode itself (this means login to 127.0.0.1)

See the Configuration chapter for further instructions.
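
The requirement above can be verified before launching the XGE. The following is a minimal sketch, not part of the XGE itself: it assumes the OpenSSH client is installed, and the function names and the host list in the usage note are hypothetical. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a non-zero exit status indicates that passwordless access is not set up.

```python
import subprocess

def ssh_check_command(host, user="root", timeout=5):
    """Build an ssh command line that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on dead hosts
        f"{user}@{host}",
        "true",                             # trivial remote command
    ]

def has_passwordless_ssh(host, user="root", timeout=5):
    """Return True if a non-interactive login to the host succeeds."""
    result = subprocess.run(
        ssh_check_command(host, user, timeout),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def report(hosts):
    """Print one status line per host."""
    for host in hosts:
        status = "ok" if has_passwordless_ssh(host) else "FAILED"
        print(f"{host}: {status}")
```

Run from the head node, a call such as `report(["127.0.0.1", "node01", "node02"])` (placeholder host names) should print `ok` for the head node itself, every physical machine, and every VM.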

The XGE refuses to start

Question: I have libvirt installed, but the XGE won't start.

Answer: Make sure that the Xen daemon (xend) is running on all machines. libvirt refuses to run if it cannot talk to the backend hypervisor. In all versions after 2010.1, the xged should detect this on the head node and refuse to start.
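
One way to check for the daemon on a machine is to look for it in the process table. The sketch below is only an illustration under assumptions: it assumes a POSIX `ps`, and it assumes that xend shows up with `xend` as the basename of one of the words on its command line (for example as a script run by the Python interpreter). The function names are hypothetical.

```python
import os
import subprocess

def daemon_in_process_list(ps_output, name="xend"):
    """Return True if any command line in ps_output mentions the daemon."""
    for line in ps_output.splitlines():
        # Compare the basename of every word, so that both "xend" and
        # "python /usr/sbin/xend start" are recognized.
        if any(os.path.basename(word) == name for word in line.split()):
            return True
    return False

def xend_is_running():
    """Query the local process table with ps and look for xend."""
    output = subprocess.run(
        ["ps", "-e", "-o", "args="],   # full command line of every process
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return daemon_in_process_list(output)
```

This only confirms that a process exists. It does not confirm that libvirt can actually connect to it.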

I cannot start VMs and see a message about the network bridge

Question: When I want to start VMs with the XGE, it aborts with the following message:

libvir: Xen Daemon error : POST operation failed: xend_post: error from xen daemon:
(xend.err 'Device 0 (vif) could not be connected. Could not find bridge device xenbr0')

Answer: Check whether the bridge name specified in xge.conf is correct (xenbr0 in this example). Your distribution's bridge name might differ from the default.
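
To find out which bridge devices actually exist on a machine, the network interfaces can be inspected directly. This is a minimal sketch that assumes a Linux host, where a bridge interface has a `bridge` subdirectory under `/sys/class/net/<name>`. The function names are hypothetical and nothing here reads xge.conf.

```python
import os

def bridge_exists(name, sysfs_root="/sys/class/net"):
    """Return True if a Linux bridge device with this name exists."""
    # A bridge interface exposes a "bridge" directory in sysfs.
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))

def list_bridges(sysfs_root="/sys/class/net"):
    """Return the sorted names of all bridge devices under sysfs_root."""
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(
        name for name in os.listdir(sysfs_root)
        if bridge_exists(name, sysfs_root)
    )
```

If `bridge_exists("xenbr0")` returns False, `list_bridges()` shows the candidates whose name could be entered in xge.conf instead.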

The XGE won't receive any jobs from Torque

Question: I submit dozens of jobs through Torque, but nothing happens.

Answer: The XGE only recognizes jobs in one specific queue, which is named virtual by default. Jobs submitted to any other queue are ignored by the XGE. Check the log file in /opt/xge/jobs/log for entries similar to the following:

pbs_prolog.sh node02c1 152.int12909 virtual testjob.sh matthias users
pbs_epilog.sh testjob.sh 152.int12909 matthias users cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:00 virtual
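
The log entries can also be filtered by queue name. The following sketch infers the field positions purely from the two sample lines above (queue as the fourth field of a prolog line and as the last field of an epilog line). That layout is an assumption, not a documented format, and the function names are hypothetical.

```python
def queue_of(log_line):
    """Return the queue name found in a prolog or epilog log line, or None."""
    fields = log_line.split()
    if not fields:
        return None
    if fields[0] == "pbs_prolog.sh" and len(fields) >= 4:
        return fields[3]     # assumed: queue is the fourth field
    if fields[0] == "pbs_epilog.sh" and len(fields) >= 2:
        return fields[-1]    # assumed: queue is the last field
    return None

def entries_for_queue(lines, queue="virtual"):
    """Return only the log lines that belong to the given queue."""
    return [line for line in lines if queue_of(line) == queue]
```

If the log contains entries but `entries_for_queue` returns an empty list for the configured queue name, the jobs were most likely submitted to a different queue.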

The XGE aborts with ImportError: No module named libtorrent

Question: The XGE aborts with a message similar to the following:

  File "/opt/xge/modules/server/main.py", line 15, in <module>
    from GridWatchdog import GridWatchdog
  File "/home/matthias/stuff/xge/modules/server/GridWatchdog.py", line 14, in <module>
    import Job, HLTransfer, common.XGESocketServer
  File "/home/matthias/stuff/xge/modules/server/Job.py", line 14, in <module>
    import VMManager, XgeMessage
  File "/home/matthias/stuff/xge/modules/server/VMManager.py", line 21, in <module>
    from VNodesManager import *
  File "/home/matthias/stuff/xge/modules/server/VNodesManager.py", line 23, in <module>
    from ImageManager import ImageManager
  File "/home/matthias/stuff/xge/modules/server/ImageManager.py", line 13, in <module>
    import libtorrent as lt
ImportError: No module named libtorrent

Answer: Install a recent version of libtorrent-rasterbar and make sure that its Python bindings are installed as well.
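
Whether the bindings are visible to the interpreter can be checked without starting the XGE. This is a minimal sketch that assumes Python 3.4 or later for `importlib.util.find_spec`. On an older interpreter, a plain `import libtorrent` inside a try/except block serves the same purpose. The helper name is hypothetical.

```python
import importlib.util

def module_available(name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None

def describe(name="libtorrent"):
    """Return a one-line status message for the module."""
    if module_available(name):
        return f"{name}: found"
    return f"{name}: NOT found, install the Python bindings"
```

Run the check with the same interpreter that runs the XGE, since bindings installed for one Python version are not visible to another.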

All LXGEds are running, but the XGE aborts with Connection refused

Question: The XGE is running and reports that all LXGEds are running, and I can see the processes on the local systems. However, when the XGE tries to deploy a VM disk image, I see a Connection Refused exception.

Answer: Make sure that the /etc/hosts file on each compute node does not contain an entry similar to the following:

127.0.0.1 hostname

Each LXGEd tries to resolve its own IP address from the host name. If the host name maps to 127.0.0.1, the LXGEd binds to localhost and is therefore unreachable from the XGE.
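
The problematic mapping can be detected on a compute node with a short script. This sketch assumes the standard /etc/hosts layout of an address followed by one or more names. It does not reproduce the LXGEd's own resolution code, and the function names are hypothetical.

```python
import ipaddress
import socket

def loopback_entries(hosts_text, hostname):
    """Return the hosts-file lines that map hostname to a loopback address."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue                          # first column is not an IP address
        if is_loopback and hostname in names:
            hits.append(raw)
    return hits

def resolves_to_loopback(hostname=None):
    """Return True if the host name resolves to a loopback address."""
    hostname = hostname or socket.gethostname()
    return ipaddress.ip_address(socket.gethostbyname(hostname)).is_loopback
```

On a correctly configured node, `resolves_to_loopback()` returns False, and `loopback_entries(open("/etc/hosts").read(), socket.gethostname())` returns an empty list.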

Visit the XGE project at
http://mage.uni-marburg.de/trac/xge