VMware vSphere Common Issues Summary (Ⅲ)

Hello everyone. The previous two installments of this VMware vSphere common issues summary received great feedback, so today we continue with part three.

1. Unable to Create a Quiesced Snapshot Using VDR Backup

Problem Description

When VMware Data Recovery (VDR) runs a backup, it cannot create a quiesced snapshot because the snapshot operation exceeds the time limit for holding off I/O in the frozen virtual machine.

Solution

  1. Stop the VMware Tools service.
  2. Open the VMware Tools installer and choose “Modify”.
  3. Deselect Volume Shadow Copy Services Support so that it is not installed.
  4. Restart the virtual machine after the installation completes.

2. How to Upload and Download Files to ESXi Host Locally

Solution

  1. Use the scp command on the ESXi host to upload and download files. If no third-party tool is available, transfer the required files through an intermediate Linux host.
  2. After logging in to vCenter you can see the shared storage and the local disks, which means the ESXi host must expose them through a file system. On the host, both are mounted under the /vmfs/volumes/ directory, so files can be uploaded to and downloaded from the ESXi host through this directory.
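
As a minimal sketch of a copy in each direction, the helper functions below only print the scp command lines so that they can be reviewed before being run. The host name esxi01, the datastore name datastore1, and the file names are hypothetical, and the sketch assumes that SSH access to the ESXi host is enabled and that the root account is used.

```shell
# Print (do not run) an scp command that uploads a local file to a datastore.
# Arguments: local file, ESXi host, datastore name (all hypothetical examples).
esxi_upload_cmd() {
    printf 'scp %s root@%s:/vmfs/volumes/%s/\n' "$1" "$2" "$3"
}

# Print an scp command that downloads a file from a datastore.
# Arguments: ESXi host, datastore name, path below the datastore, local directory.
esxi_download_cmd() {
    printf 'scp root@%s:/vmfs/volumes/%s/%s %s\n' "$1" "$2" "$3" "$4"
}

esxi_upload_cmd ./tools.iso esxi01 datastore1
esxi_download_cmd esxi01 datastore1 vm01/vm01.vmx ./backup/
```

Printing the command first makes it easy to confirm the datastore path under /vmfs/volumes/ before anything is transferred.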

3. ESX 4.0 Update 2 Host May Crash After Upgrading vCenter Server to 5.0

Problem Description

After upgrading vCenter Server to version 5.0, an ESX 4.0 Update 2 host may crash, displaying the following message on the purple screen: NOT_IMPLEMENTED bora/vmkernel/filesystems/visorfs/visorfsObj.c:3391.

Solution

Before upgrading to vCenter Server 5.0, upgrade all ESX 4.0 Update 2 hosts managed by vCenter Server to ESX 4.0 Update 3.

4. HA Configuration Fails at 90%, Error: Internal AAM Error - Agent Could Not Start

Problem Description

  1. The first host joins the cluster without any issues, but the second host fails at 90% with the error: Internal AAM Error - agent could not start.
  2. The aam_config_util_addnode.log file contains error messages similar to the following:
01/23/10 16:20:49 [myexit] Failure location:
01/23/10 16:20:49 [myexit] function main::myexit called from line 2199
01/23/10 16:20:49 [myexit] function main::start_agent called from line 1168
01/23/10 16:20:49 [myexit] function main::add_aam_node called from line 171
01/23/10 16:20:49 [myexit] VMwareresult=failure
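
To confirm that a host shows this failure, the log can be searched for the markers above. The following is a rough sketch only: the function names are made up for illustration, and the path to aam_config_util_addnode.log is passed as an argument because its location on your host is not stated here.

```shell
# Return 0 if the given AAM log records a failed agent start, 1 otherwise.
aam_log_failed() {
    grep -q -F 'VMwareresult=failure' "$1"
}

# Print the call chain recorded by [myexit], one "function ... line N" per line.
aam_failure_chain() {
    grep -F '[myexit] function ' "$1" | sed 's/.*\[myexit] //'
}

# Usage (replace the path with the real location of the log on your host):
#   aam_log_failed /path/to/aam_config_util_addnode.log && \
#       aam_failure_chain /path/to/aam_config_util_addnode.log
```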

Analysis

This issue is usually caused by UDP port 8043 being unreachable.

Solution

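To check whether UDP port 8043 is reachable, capture the traffic arriving on the service console interface with the following command; if no packets are captured, unblock the port:

tcpdump -i vswif0 -s 900 -n udp port 8043 -w $(hostname).pcap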

5. Checking and Reinstalling VirtualCenter Server Agents (vpxa) Service

Problem Description

  • VMware High Availability (HA) configuration fails.
  • Reconfiguring VMware HA returns the error: Could not Enable aam firewall ruleset.fault.HostConfigFault.
  • Unable to add ESX to VirtualCenter.
  • Attempting to re-add ESX to VirtualCenter returns the error: unable to access the specified host, either it doesn’t exist, the server software is not responding, or there is a network problem.
  • The hostd.log file contains the following content:
[2010-05-24 10:45:51.463 'Vmomi' 15752112 info] Throw vim.fault.AlreadyExists
[2008-05-26 10:45:51.463 'Vmomi' 15752112 info] Result:
(vim.fault.AlreadyExists) {
  name = "vpxuser"
  msg = ""
}
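
A quick way to look for this symptom is to scan hostd.log for an AlreadyExists fault that names vpxuser. The snippet below is a sketch under assumptions: the function name is hypothetical, the log path is supplied by the caller, and the fault is assumed to be laid out exactly as in the excerpt above.

```shell
# Return 0 if the log contains a vim.fault.AlreadyExists fault whose name is "vpxuser".
hostd_vpxuser_conflict() {
    awk '
        /[(]vim[.]fault[.]AlreadyExists[)] [{]/ { in_fault = 1; next }  # fault body starts
        in_fault && /name = "vpxuser"/          { found = 1 }
        in_fault && /^[}]/                      { in_fault = 0 }        # fault body ends
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage (replace the path with the location of hostd.log on your host):
#   hostd_vpxuser_conflict /path/to/hostd.log && echo "vpxuser already exists"
```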

Solution

If you encounter the following error, reinstall vpxa on the ESX host:

unable to access the specified host, either it doesn’t exist, the server software is not responding, or there is a network problem.

To check the version of the VirtualCenter agent (vpxa) installed on the ESX/ESXi server, follow these steps:

  1. Determine the version of VirtualCenter: Click the Help button and then click About.
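  2. Use the following command to list the installed vpxa package together with its version (rpm -V would only verify the package files, not print the version):
rpm -qa | grep -i vpxa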

6. Using IBM Servers May Cause ESXi/ESX 4.1 Server HBA Card and PCI Devices to Stop Responding

Problem Description

When using IBM x3650 M3 or BladeCenter HS22V servers, the following issues may occur on ESXi/ESX 4.1:

  1. HBA card stops responding.
  2. Some PCI devices are unresponsive.
  3. The ALT+F12 screen and logs contain the following information:
vmkernel: 6:01:34:46.970 cpu0:4120)ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40

  4. The logs show the HBA card failing to respond, for example:
vmkernel: 6:01:42:36.189 cpu15:4274)<6>qla2xxx 0000:1a:00.0: qla2x00_abort_isp: **** FAILED ****
vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: Failed mailbox send register test
  5. The HBA card may go offline, for example:
vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: ISP error recovery failed - board disabled
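
To check a host for these symptoms, the vmkernel log can be searched for the message fragments quoted above. This is only a sketch: the function name is hypothetical, the log file path is supplied by the caller, and the fragments are copied from the excerpts in this section.

```shell
# Print how many lines of a vmkernel log match each symptom described above,
# as "<count><TAB><pattern>", one line per pattern.
vmkernel_ir_symptoms() {
    for pattern in 'ALERT: APIC:' \
                   'qla2x00_abort_isp: **** FAILED ****' \
                   'Failed mailbox send register test' \
                   'ISP error recovery failed - board disabled'; do
        # -F treats the pattern as a fixed string, so the asterisks are literal.
        printf '%s\t%s\n' "$(grep -cF -- "$pattern" "$1")" "$pattern"
    done
}

# Usage (replace the path with the vmkernel log on your host):
#   vmkernel_ir_symptoms /path/to/vmkernel.log
```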

Analysis

The root cause has not been fully explained, and we hope VMware will provide an explanation and release the relevant patches.

Solution

In ESXi/ESX 4.1, interrupt remapping is enabled by default, and it is incompatible with some IBM servers. As a temporary workaround, disable interrupt remapping:

  1. Enter the command line interface and execute the following commands:
# esxcfg-advcfg -k TRUE iovDisableIR
# init 6
  2. After rebooting, check that the option is still set:
# esxcfg-info -c
iovDisableIR=TRUE

Note: iovDisableIR=TRUE means that interrupt remapping is disabled. You can also change this setting in the vSphere Client GUI.
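
The check after the reboot can be scripted as a small filter over the esxcfg-info -c output. This is a minimal sketch; the function name is hypothetical, and it assumes the output contains the literal text iovDisableIR=TRUE as shown above.

```shell
# Read `esxcfg-info -c` output on stdin; return 0 if interrupt remapping is disabled.
ir_is_disabled() {
    grep -q -F 'iovDisableIR=TRUE'
}

# Usage on the host:
#   esxcfg-info -c | ir_is_disabled && echo "interrupt remapping is disabled"
```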

7. Virtual Machine Cannot Shut Down, Stopping at 95%

Problem Description

During a virtual machine reclamation task, the target virtual machine was running antivirus software with high CPU usage. Remote connections to it were unresponsive, and operations in the VC console were extremely slow. The virtual machine was therefore powered off directly, but the task progress bar stopped at 95%.

Solution Approach

This issue involves a “communication” problem, which should be addressed from two aspects:

  1. Whether vCenter successfully transmitted the command to ESX.
  2. Whether ESXi/ESX received and executed the command.

Solution Process

  1. SSH into the ESX host.
  2. Use the following command to determine the state of the virtual machine:
vmware-cmd <path.vmx> getstate

where <path.vmx> is the full path to the .vmx file of the virtual machine (escape spaces and other special characters); run vmware-cmd -l to list the paths of the registered virtual machines.

  3. According to the official documentation, if the state is On, other commands can be used to stop the virtual machine. If the state is Off, ESX has already shut down the virtual machine, and the stalled task is a communication problem.
  4. Log in to VC and try to disconnect and then reconnect the host that runs the problematic virtual machine. Note that if the earlier shutdown task has not completed, subsequent tasks may be queued without response.
  5. If tasks cannot be executed, restart the related VC and ESX services.

For VC: restart the VC service on the Windows machine that hosts VirtualCenter. For ESX: restart the following two services via SSH (note: virtual machine services will be briefly interrupted):

service mgmt-vmware restart
service vmware-vpxa restart

  6. After restarting the services, repeat step 2. If the state is On, the virtual machine is still running and, according to the official documentation, the following commands can be used.
  7. Find the PID of the virtual machine process:
ps -auxwww | grep -i <VMNAME>.vmx

Then kill the process:

kill -9 <PID>
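
The state check and the PID lookup used in the steps above can be wrapped in two small filters. This is a sketch under assumptions: the function names are hypothetical, the getstate output is assumed to have the form "getstate() = on", and vm01 and the datastore path in the usage lines are made-up examples.

```shell
# Read `vmware-cmd <path.vmx> getstate` output on stdin and print only the state.
# Assumption: the output has the form "getstate() = on"; verify this on your host.
vm_state() {
    sed -n 's/^getstate() = //p'
}

# Read `ps auxwww` output on stdin and print the PID (second column) of every
# process whose command line mentions <VMNAME>.vmx, skipping the grep itself.
vm_pids() {
    grep -i -F -- "$1.vmx" | grep -v 'grep' | awk '{ print $2 }'
}

# Usage on the ESX service console (vm01 and the datastore path are hypothetical):
#   vmware-cmd "/vmfs/volumes/datastore1/vm01/vm01.vmx" getstate | vm_state
#   ps auxwww | vm_pids vm01
```

Quoting the .vmx path in double quotes, as in the usage line, is an alternative to escaping each space by hand.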

Conclusion

This summary addresses common VMware vSphere issues with practical solutions, enhancing system stability and performance, and guiding users through troubleshooting steps for efficient problem resolution.
