Wednesday, September 22, 2010


    ESX tips
    Module 1: Course Introduction
    Module 2: ESXi Command-Line Troubleshooting Methods
    • Configure ESXi technical support mode and SSH access
    To use Tech Support Mode:
    1. Log in to your ESXi host at the console.
    2. Press Alt+F1 to switch to the console window.
    3. Enter unsupported to start the Tech Support Mode login process. Note that no text will appear on the console window.
    4. Enter the password for the root user. Tech Support Mode is now active.
    5. Complete tasks in Tech Support Mode.
    6. Enter the command clear to clear the screen of any residual data from step 5. This may be required by your local security policies.
    7. Enter the command exit to exit Tech Support Mode.
    8. Press Alt+F2 to return the server to DCUI mode.
    Module 3: ESX, ESXi, and vCenter Server Log Files
    • View ESX, ESXi, and vCenter Server log files
    • Configure a centralized ESX/ESXi log host
    ESX host
    Service console log
    /var/log/messages
    VMkernel messages
      /var/log/vmkernel in the service console
      SCSI error fields:
      "Device Status"/"Host Status" "Sense Key" "Additional Sense Code" "Additional Sense Code Qualifier"
      cpu0)SCSI: 8879: vmhba2:1:5:0 status = 24/0 0x0 0x0 0x0
      24/0 0x0 0x0 0x0 (SCSI Reservation Conflict)
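As a rough sketch of how such a line can be picked apart, the function below splits the status field into device status and host status and names the device status. The function name `decode_scsi_status` is hypothetical, and the mapping assumes the log prints standard SCSI status codes in decimal, as the 24/0 sample above suggests.

```shell
#!/bin/sh
# Hypothetical helper: extract the "device/host" status pair from a vmkernel
# SCSI line and name the device status.
# Assumption: the line uses the "status = D/H ..." layout shown above and the
# device status is a standard SCSI status code printed in decimal.
decode_scsi_status() {
    line=$1
    # Keep only the two numbers that follow "status = ".
    pair=$(printf '%s\n' "$line" |
        sed -n 's|.*status = \([0-9][0-9]*\)/\([0-9][0-9]*\).*|\1 \2|p')
    [ -n "$pair" ] || { echo "no status field found"; return 1; }
    set -- $pair
    dev=$1
    host=$2
    case $dev in
        0)  name="GOOD" ;;
        2)  name="CHECK CONDITION" ;;
        8)  name="BUSY" ;;
        24) name="RESERVATION CONFLICT" ;;
        40) name="TASK SET FULL" ;;
        *)  name="UNKNOWN" ;;
    esac
    echo "device=$dev ($name) host=$host"
}

decode_scsi_status 'cpu0)SCSI: 8879: vmhba2:1:5:0 status = 24/0 0x0 0x0 0x0'
```

The sense key and additional sense code fields are left untouched here; they only carry meaning when the device status is CHECK CONDITION.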
    VMkernel warning
    /var/log/vmkwarning
    The vmkwarning log is a subset of the vmkernel log and contains only the warning events
    Process log
    /proc/vmware/log
    All events since last vmkernel load are also in memory in /proc/vmware/log
    Service Console
      This is the log of the Linux kernel (service console). It is generally useful only after a host hang, crash, or authentication issue, or when a third-party application misbehaves. It has nothing to do with virtual machines: the service console (Red Hat kernel) has no awareness of the VMs (worlds) running on the VMkernel.
      /var/log/messages
    1. Console events
    2. Logon events
    3. iSCSI Authentication events
    Init log
    Located in /var/log/initrdlogs
    Events during initial boot from RAM Disk
    Logs include:
    vmklog.vmk
    messages
    vmklog. (e.g. vmklog.qla2300_7xx)
    Host agent
    1. Located in /var/log/vmware
    2. Sym-linked to the current rotated hostd log file
    3. Contains information on the agent that manages and configures the ESX Server host and its virtual machines (Search the file date/time stamps to find the log file it is currently outputting to).
    4. Hostd events
      • VI Client communications when directly connected to ESX
      • Events done on behalf of
        • VPXA
        • System Services
        • Firewall System
        • HA services
        • VMware Converter
    VirtualCenter agent
    1. Located in /var/log/vmware/vpx/vpxa.log
    2. Sym-linked to the current rotated vpxa.log
    3. Events of interactions with VirtualCenter Server
    Firewall
    1. Located in /var/log/vmware/esxcfg-firewall.log
    2. All VMware Firewall rules events
    Update
    1. In /var/log/vmware/esxupdate.log
    2. History of all updates done via esxupdate tool
    3. Date and PID
    4. Packages installed
    5. Results of the installation
    Kernel Version
    1. /var/log/vmkernel-version
    2. Lists current and all previous kernel build numbers
    Vmkernel Summary
    /var/log/vmksummary – Used to determine uptime and availability statistics for ESX Server; human-readable summary found in /var/log/vmksummary.txt
    Authentication log
    – /var/log/secure – Contains records of connections that require authentication, such as VMware daemons and actions initiated by the xinetd daemon.
    Virtual Machines
    – The same directory as the affected virtual machine’s configuration files; named vmware.log – Contain information when a virtual machine crashes or ends abnormally.
    VC
    Installation
      Install logs are located in the %TEMP% directory of the user who installed the software
    1. vmlic.log test results for served license file during install
    2. redist.log MDAC/MCAD QFE rollup install results
    3. vmmsde.log MSDE installation log
    4. vmls.log License server installation log
    5. vmosql.log Creation of database/trans logs for VCDB
    6. vminst.log Log of VC server installation and subtasks
    7. VCDatabaseUpgrade.log Details of upgrading from VC 1.x DB
    8. vmmsi.log VI client installation log
    9. vpx\vpxd-0.log small stub from first time starting service
    VC server
    1. Location: %TEMP%\vpx (relative to the user account running vpxd)
    2. Name: vpxd-#.log (# is one digit, 0-9)
    3. vpxd-index contains the # of the currently active log file
    4. Logs rotate each time vpxd is started, and also when it reaches 5 MB in size
    VC Client
    1. Intended for client-specific diagnostics
    2. Location: %TEMP%\vpx (relative to the user running the client)
    3. Name: viclient-#.log (# is one digit, 0-9)
    4. No index file
    5. Logs rotate each time VI Client is started
    Other Logs
    1. Core dump location %USERPROFILE%\Application Data\VMware
    2. License Server debug log %ALLUSERSPROFILE%\Application Data\VMware\VMware License Server\lmgrd.log (reset each time the service starts; no rotation)
    3. %ALLUSERSPROFILE%\Application Data\Macrovision\FLEXlm\
    4. Web Access (Tomcat) logs C:\Program Files\VMware\VMware VirtualCenter 2.0\tomcat\logs
    File/Folders:
    /etc/modules.conf
    This file lists the devices in the system that are available to the Service Console. Devices that physically exist in the system but are allocated solely to VMs usually appear here as commented-out ("#") lines. This is an important file for root and administrators.
    /etc/fstab
    This file defines the local and remote filesystems which are mounted at ESX Server boot
    /etc/rc.d/rc.local
    This file is for server local customisations required at the server bootup. Potential additions to this file are public/shared vmfs mounts.
    /etc/syslog.conf
    This file configures what things are logged and where. Some examples are given below:
    • *.crit /dev/tty12
    This example logs all log items at level "crit" (critical) or higher to the virtual terminal at tty12. You can see this log by pressing [Alt]-[F12] on the console.
    • *.=err /dev/tty11
    This example logs all log items at exactly level "err" (error) to the virtual terminal at tty11. You can see this log by pressing [Alt]-[F11] on the console.
    • *.=warning /dev/tty10
    This example logs all log items at exactly level "warning" to the virtual terminal at tty10. You can see this log by pressing [Alt]-[F10] on the console.
    • *.* @192.168.31.3
    This example forwards everything (all syslog entries) unencrypted to another (central) syslog server; the "@" prefix marks the target as a remote host. Pay attention to that server's security.
    /etc/logrotate.conf
    This is the main configuration file for log file rotation program. It defines the defaults for log file rotation, log file compression, and time to keep the old log files. Processing the contents of /etc/logrotate.d/ directory is also defined here
    /etc/logrotate.d/
    This directory contains instructions service by service for log file rotation, log file compression, and time to keep the old log files. For the three vmk* files, raise "250k" to "4096k", and enable compression.
    /etc/inittab
    Here you can change the number of virtual terminals available on the Service Console. Default is 6, but you can go up to 9. I almost always go :-)
    /etc/bashrc
    The system default $PS1 is defined here. It is a good idea to change "\W" to "\w" here to always see the full path while logged on the Service Console. This is one of my favourites.
    /etc/profile.d/colorls.sh
    Command "ls" is aliased to "ls --color=tty" here. Many admins don't like this colouring. You can comment out ("#") this line. I always do this one, too.
    /etc/init.d/
    This directory contains the actual start-up scripts
    /etc/rc3.d/
    This directory contains the K(ill) and S(tart) scripts for the default runlevel 3. The services starting with "S" are started on this runlevel, and the services starting with "K" are killed, i.e. not started.
    /var/log/
    This directory contains all the log files. VMware's log files start with letters "vm". The general main log file is "messages".
    /etc/ssh/
    This directory contains all the SSH daemon configuration files and the public and private keys. The defaults are both secure and flexible and rarely need changing. The only exception is a change to the /etc/ssh/sshd_config file if you want to restrict logins for the root user.
    /etc/vmware/
    This directory contains the most important vmkernel configuration files.
    /etc/vmware/vm-list
    A file containing a list of registered VMs on this ESX Server
    /etc/xinetd.conf
    This is the main configuration file for the xinetd daemon; it holds the default settings. Processing the contents of the /etc/xinetd.d/ directory is also defined here.
    /etc/xinetd.d/
    This directory contains instructions service by service for if and how to start the service. Of the services here, vmware-authd, wu-ftpd, and telnet are most interesting to us. Two of the most interesting parameter lines are "bind =" and "only_from =", which allows limiting service usage
    /etc/ntp.conf
    This file configures the NTP daemon. Usable public NTP servers in Finland are fi.pool.ntp.org, elsewhere in Europe europe.pool.ntp.org. You should always list two to four NTP servers in the ntp.conf file. Because *.pool.ntp.org names resolve to a rotating set of servers, you can simply repeat the same server line four times in the configuration file.
    Remember to set the service to start automatically at runlevel 3 with the command chkconfig --level 3 ntpd on.
    Commands
    grep
    1. Search for the given string in a single file:
       grep "literal_string" filename
    2. Search for the given string in multiple files:
       grep "this" demo_*
    3. Case-insensitive search:
       grep -i "string" FILE
       grep -iw "string" FILE: match whole words only
       grep -n "go" FILE: show line numbers
    4. Colored output:
       export GREP_OPTIONS='--color=auto' GREP_COLOR='100;8'
    5. Repetition operators in regular expressions:
       ? The preceding item is optional and matched at most once.
       * The preceding item will be matched zero or more times.
       + The preceding item will be matched one or more times.
       {n} The preceding item is matched exactly n times.
       {n,} The preceding item is matched n or more times.
       {,m} The preceding item is matched at most m times.
       {n,m} The preceding item is matched at least n times, but not more than m times.
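The repetition operators can be tried out on a throwaway file. This is a minimal sketch, assuming a grep that supports extended regular expressions via -E; the sample words and the helper name `count_matches` are made up for illustration.

```shell
#!/bin/sh
# Build a small sample file, then apply the repetition operators with grep -E.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'color' 'colour' 'goooal' 'goal' 'gl' > "$tmp"

# Hypothetical helper: print how many lines of the sample file match
# the extended regular expression given as $1.
count_matches() {
    grep -E -c -- "$1" "$tmp"
}

count_matches '^colou?r$'     # ?     : the "u" is optional
count_matches '^go+al$'       # +     : one or more "o"
count_matches '^go*a?l$'      # *     : zero or more "o", optional "a"
count_matches '^go{3}al$'     # {n}   : exactly three "o"
count_matches '^go{1,3}al$'   # {n,m} : between one and three "o"
```

Anchoring each pattern with ^ and $ keeps the counts easy to reason about, because a line only matches when the whole word fits the expression.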
    ps
    -u username: all processes owned by specific user
    -e List information about every process now running.
    -f Generate a full listing.
    -j Print session ID and process group ID.
    -C cmdlist
    kill
    kill -s signal pid: send the given signal to the process
    ls
    -d, --directory
    -h, --human-readable :
    -l : use a long listing format
    -R, --recursive: list subdirectories recursively
    Module 4: Network Troubleshooting
    • Identify and configure vNetwork components
    • Configure and use a network traffic sniffer
    Change Host IP information:
    /etc/hosts
    Local DNS/IP lookup
    /etc/sysconfig/network
    1. To change the default gateway address and the hostname, edit the /etc/sysconfig/network file and change the GATEWAY and HOSTNAME parameters to the proper values.
    2. To make the changes take effect, reboot the host or restart the network service with the command:
       service network restart
    Change host name
    hostname newname
    This change is lost when the system is rebooted.
    DNS
    To change the DNS server settings, update the nameserver IPs and search domain in the /etc/resolv.conf file.
    IP address
    /etc/sysconfig/network-scripts/ifcfg-vswif0
    PCI NIC card
    cat /proc/vmware/pci
    Command:
    ifconfig
    Shows the status of all network interfaces on the system.
    esxcfg-vswif
    View the status of or reconfigure the VMware Service Console network interface. The SC network interface is called "vswif", and the first interface is always "vswif0".
    esxcfg-vswitch
      view the status of or reconfigure the VMware virtual switches (called vswitch). These vswitches are used to connect the physical NIC in the server (called vmnic) to the ESX port groups (such as the "Service Console" and the "VM Network" port groups).
    1. To assign a VLAN, use the command:
       esxcfg-vswitch -v -p "Service Console" vSwitch0
    2. To uplink vmnic1 to the new virtual switch, use the command:
       esxcfg-vswitch -L vmnic1 vSwitch1
    esxcfg-nics
    used to view the status of or reconfigure the VMware Physical Network interface cards that are installed in the physical server. These physical NICs are called "vmnic" and they start with "vmnic0". The vmnics are connected to vswitches to connect the physical network to the virtual networks.
    Route
    To display the routing table, run the command:
    [root@server root]# route -n
    esxcfg-linuxnet --setup: creates ifcfg-eth0
    esxcfg-linuxnet --remove: removes ifcfg-eth0
    Module 5: Management Troubleshooting
    • Troubleshoot vSphere management components
    Restart management agent
    Restart ESX management service:
    service mgmt-vmware restart
    Restart VC agent service:
    service vmware-vpxa restart
    Verify the installation of the VirtualCenter agent
    From the command-line, run:
    rpm -V VMware-vpxa
    There is only output from the command if errors are found. For example:
    rpm -V VMware-vpxa
    S.5....T /opt/vmware/vpxa/sbin/vpxa
    Check Log
    Verify that the vpxa process is not exceeding allocated memory.
    To change to the vpxa log directory, run:
    cd /var/log/vmware/vpx
    View the vpxa log file, run:
    more vpxa.log
    Examine the log for errors. For example:
    [2007-07-28 17:57:25.416 'Memory checker' 5458864 error] Current value 143700 exceeds hard limit 128000. Shutting down process.
    [2007-07-28 17:57:25.420 'Memory checker' 3076453280 info] Resource checker stopped.
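Rather than paging through the log by hand, the hard-limit lines can be filtered out. The sketch below assumes the message layout of the two sample lines above; the helper name `memory_limit_errors` is hypothetical, and the demonstration runs against a temporary copy of those samples rather than a real vpxa.log.

```shell
#!/bin/sh
# Hypothetical helper: print the current value and hard limit from each
# 'Memory checker' error line in a vpxa log.
# Assumption: messages look like the sample lines shown above.
memory_limit_errors() {
    # $1 = path to a vpxa log file
    grep -F "'Memory checker'" "$1" | grep -F ' error] ' |
        sed -n 's/.*Current value \([0-9]*\) exceeds hard limit \([0-9]*\)\..*/current=\1 limit=\2/p'
}

# Demonstrate on a temporary copy of the two sample lines.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
[2007-07-28 17:57:25.416 'Memory checker' 5458864 error] Current value 143700 exceeds hard limit 128000. Shutting down process.
[2007-07-28 17:57:25.420 'Memory checker' 3076453280 info] Resource checker stopped.
EOF
memory_limit_errors "$sample"
```

On a host, the same function would be pointed at the active vpxa.log instead of the sample file.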
    Module 6: Storage Troubleshooting
    • View, configure, and diagnose storage access problems
    • Configure iSCSI authentication and digests
    Check Kernel Logs
    /var/log/vmkernel
    /var/log/messages
    /var/log/vmkwarning
    Command
    • esxcfg-vmhbadevs:
    display the HBA address as it is presented to the ESX Server and the attached drives as the system sees them.
    • esxcfg-mpath
    command to display the multipathing configuration of the ESX Server
    • esxcfg-module -l: list the loaded kernel modules (drivers) for the hardware in the physical ESX Server
    • esxcfg-rescan: rescan HBA for storage
    • vmkfstools -V:
    Verify that the LUN is detected by the ESX host at boot time:
    cd /vmfs/devices/disks
    # ls vmh*
    view active devices installed or attached to your server
    dmesg
    vdf
    free space on devices
    Tasks:
    Obtaining LUN pathing information for ESX hosts
    Run esxcfg-mpath -l and press Enter.
    The following is an analysis of the first LUN:
    • Canonical name

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      This is the canonical device name the ESX host used to refer to the LUN.

      Note : When there are multiple paths to a LUN, the canonical name is the first path that was detected for this LUN.

      vmhba2 (the first field of vmhba2:1:4) is one of the Host Bus Adapters (HBAs).
      1 (the second field) is the second storage target (numbering starts at 0) that was detected by this HBA.
      4 (the third field) is the number of the LUN on this storage target. For multipathing to work properly, each LUN must present the same LUN number to all ESX hosts.

      Note: If the vmhba number for the HBA is a single digit number, it is a physical adapter. If the address is vmhba40 or vmhba32, it is a software iSCSI device for ESX 3.0 and ESX 3.5 respectively.
    • Linux device name

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      This is the associated Linux device handle for the LUN. You must use this reference when using utilities like fdisk.
    • LUN capacity

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      The disk capacity of the LUN. In the example, the LUN capacity is 30GB.
    • Failover policy

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      This is the policy the ESX host uses when it determines which path to use in the event of a failover.

      The choices are:
      • Most Recently Used: The path used by a LUN is not altered unless an event (user, ESX host, or array initiated) instructs the path to change. If the path changed because of a service interruption along the original path, the path does not fail back when service is restored. This policy is used for Active/Passive arrays and many pseudo active/active arrays.
      • Fixed: The path used by a LUN is always the one marked as preferred, unless that path is unavailable. As soon as the path becomes available again, the preferred becomes the active path again. This policy is used for Active/Active arrays. An Active/Passive array should never be set to Fixed unless specifically instructed to do so. This can lead to path thrashing, performance degradations and crashes.
      • Round Robin: This is experimentally supported in ESX 3.x. It is fully supported in ESX 4.x

    • LUN disk type

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      There are three possible values for LUN disk type:
      • FC: This LUN is presented through a fibre channel device.
      • iScsi: This LUN is presented through an iSCSI device.
      • Local: This LUN is a local disk.
    • PCI slot identifier

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      PCI slot identifier indicates the physical bus location this HBA is plugged in to.
    • HBA World Wide Port Numbers (WWPN)

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      These numbers are the hardware addresses (much like the MAC address on a network adapter) of the HBAs.
    • Storage processor port World Wide Port Numbers (WWPN)

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      These numbers are the hardware addresses of the ports on the storage processors of the array.
    • True path address

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      This is the true name for this path. In this example, there are two possible paths to the LUN (vmhba2:1:4 and vmhba2:3:4).

    • Path status

      Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
      FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
      FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

      Path status contains the status of the path.

      There are six attributes that comprise the status:
      • On: This path is active and able to process I/O. When queried, it returns a status of READY.
      • Off: The path has been disabled by the administrator.
      • Dead: This path is no longer available for processing I/O. This can be caused by physical medium error, switch, or array misconfiguration.
      • Standby: This path is inactive and cannot process I/O. When queried, it returns a status of NOT_READY.
      • Active: This path is processing I/O for the ESX Server host.
      • Preferred: This is the path that is preferred to be active. This attribute is ignored when the policy is set to Most Recently Used (mru).
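The field-by-field breakdown above can be sketched as a small parser. This is only an illustration: the helper names `split_path` and `summarize_mpath` are hypothetical, and the parsing assumes output laid out exactly like the sample (one "Disk" line followed by one line per path). Whether iScsi and Local path lines share the FC layout is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: split a path name such as vmhba2:1:4 into its parts.
split_path() {
    echo "$1" | awk -F: '{ printf "adapter=%s target=%s lun=%s\n", $1, $2, $3 }'
}

# Hypothetical helper: summarize "esxcfg-mpath -l" style output read from stdin.
summarize_mpath() {
    awk '
        $1 == "Disk" {
            # Disk <canonical> <linux device> (<capacity>) has N paths and policy of <policy>
            policy = $0
            sub(/.*policy of /, "", policy)
            gsub(/[()]/, "", $4)
            printf "lun=%s device=%s capacity=%s policy=%s\n", $2, $3, $4, policy
            next
        }
        $1 == "FC" || $1 == "iScsi" || $1 == "Local" {
            # <type> <pci slot> <hba wwpn><-><sp wwpn> <path> <status words...>
            status = ""
            for (i = 5; i <= NF; i++) status = status (status == "" ? "" : ",") $i
            printf "  path=%s type=%s pci=%s status=%s\n", $4, $1, $2, status
        }
    '
}

split_path vmhba2:1:4
summarize_mpath <<'EOF'
Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby
EOF
```

On a host the input would come from `esxcfg-mpath -l | summarize_mpath`; the status words are joined with commas so that a path with several attributes (On, active, preferred) stays on one line.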
    SCSI reservation
    cd /vmfs/devices/disks
    # ls vmh*
    Run dmesg and look for lines that may provide some information if the LUN is having an issue. To check for reserved LUNs or pending reservations, run:
    # esxcfg-info | egrep -B5 "s Reserved|Pending"
    1. Perform a LUN reset to clear the lock with the command:

      # vmkfstools --lock lunreset /vmfs/devices/disks/vmhba1\:0\:52\:0
    2. Verify that the LUN no longer has any Pending Reserves with the command:

      # tail -1 /proc/vmware/scsi/vmhba1/0\:52
    Module 7: vMotion Troubleshooting
    • Troubleshoot VMotion and Storage VMotion errors
    Module 8: VMware Infrastructure Troubleshooting
    • Troubleshoot DRS Cluster errors with shares, pools, and limits
    • Troubleshoot HA Cluster errors with slot calculations, admission control, and host monitoring
    • Review virtual machine power on requirements
    • Troubleshoot virtual machine power on failures
    vmware-cmd
    Use the vmware-cmd utility to perform various operations on a virtual machine, including registering a virtual machine (on the local server), getting the power state of a virtual machine, setting configuration variables, and so on.
    vmware-cmd -l
    Lists the virtual machines on the local server. Unlike the other server operations, this option does not require the -s option.
    vmware-cmd -s register
    Registers the specified virtual machine on the server.
    vmware-cmd -s unregister
    Unregisters the specified virtual machine from the server.
    vmware-cmd -s getresource
    Gets the value of the specified ESX Server system resource variable. This method applies only to ESX Server.
    vmware-cmd -s setresource
    Sets the value of the specified ESX Server system resource variable. This method applies only to ESX Server.
    Kill a VM
      Method 1:
    1. vmware-cmd /vmfs/volumes///.vmx stop soft
    2. vmware-cmd /vmfs/volumes///.vmx stop hard
    3. If this for some reason doesn't work (like me accidentally deleting the .vmx file), you can try the following:
      1. Run the following command (to get a list of running VMs):
      sudo vm-support -x
      The output will look something like this:
      VMware ESX Server Support Script 1.29
      Available worlds to debug:
      vmid=1126 vm-01
      vmid=1151 vm-02
      vmid=1272 vm-03
      vmid=1291 vm-04
      vmid=1150 vm-05
      vmid=1420 vm-06
      vmid=1433 vm-07
      2. Then run the command (remember to replace the number with the “vmid” of your VM):
      less -S /proc/vmware/vm/1433/cpu/status
      Press the right arrow key.
      In the right corner there should be some info about the “group”:
      group
      vm.1432
      3. By running the following command, you can safely kill your VM without risking corrupting it
      (remember to replace the number with your “group number”):
      sudo /usr/lib/vmware/bin/vmkload_app -k 9 1432
      4. If successful, you should see a message like this:
      Warning: Jan 06 06:42:49.717: Sending signal ‘9′ to world 1432.
      Method 2:
      ps auxfww | grep
      kill -9 PID
      Method 3:
      Run vm-support -x to list the running VMs and their World IDs, then run vm-support -X worldid. This prompts you with a couple of questions, runs a debug stop of the VM, and creates a set of log files that you can forward to VMware technical support if you so desire.
    Performance Issue:
    esxtop
    esxtop modes:
    • Interactive: view data in real time.
    • Batch (-b): output piped to a csv file.
    • Replay (-R): view data from a vm-support log bundle.
    Only root can run esxtop. csv files from batch mode can also be opened in perfmon or Excel.
    resxtop: can run remotely from vMA or a vSphere CLI installation. resxtop has no replay mode.
    Interactive screens:
    • c CPU, m memory,
    • d disk adapter,
    • u disk device,
    • v disk vm,
    • n network,
    • I interrupts.
    commands:
    • h help, q quit, f add fields, o order, s set refresh delay (default 5 secs, min 2 secs), space refresh now, W save to default, V only VMs
    (Some metrics need fields added with the extra character shown in the Field column.)
    Display  Metric    Field  Threshold  Explanation
    CPU      %RDY             10 x vCPU  Too many vCPUs, excessive vSMP, or limit set (see %MLMTD).
    CPU      %CSTP            100        Excessive vSMP usage. Reduce vCPUs to improve scheduling opportunities.
    CPU      %MLMTD           0          World is being throttled, maybe limit set on CPU.
    CPU      %SWPWT           1          VM waiting on swapped pages from disk, maybe host overcommitted or mem limit.
    CPU      TIMER/S   h      1000       High timer-interrupt rate, reduce in guest if possible. Increases with vCPUs.
    Mem      MCTLSZ    i      1          > 0, VMs forced to use balloon driver, maybe host overcommitted or memory limit.
    Mem      SWCUR     j      1          > 0, host previously swapped mem pages, maybe host overcommitted or mem limit.
    Mem      SWR/s     j      1          > 0, host actively reading swap (vswp), maybe host overcommitted or mem limit.
    Mem      SWW/s     j      1          > 0, host actively writing to swap (vswp), maybe host overcommitted or mem limit.
    Mem      N%L       f      <>         CPU's local mem. Uses remote mem via interconnect, not NUMA.
    Network  %DRPTX           1          Dropped tx packets, HW overworked, maybe network utilization.
    Network  %DRPRX           1          Dropped rx packets, HW overworked, maybe network utilization.
    Disk     GAVG      h      25         Look at DAVG and KAVG, as the sum of both is GAVG.
    Disk     DAVG      h      25         Disk latency most likely caused by array.
    Disk     KAVG      h      5          Disk latency caused by VMkernel, usually means queuing, see QUED.
    Disk     QUED      f      1          Queue maxed out, maybe queue depth too low, see vendor settings.
    Disk     ABRTS/s   k      1          Storage not responding, maybe failed paths or array not taking IO.
    Disk     RESETS/s  k      1          Number of commands reset per second.
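The relationship between the three disk latency metrics can be sketched as a quick check. The helper name `check_disk_latency` is hypothetical; it takes DAVG and KAVG in milliseconds, derives GAVG as their sum, and flags values strictly above the thresholds listed above (treating the threshold as "greater than" is an assumption).

```shell
#!/bin/sh
# Hypothetical helper: flag disk latency values (ms) against the thresholds
# listed above (DAVG 25, KAVG 5, GAVG 25). GAVG is computed as DAVG + KAVG.
check_disk_latency() {
    # $1 = DAVG, $2 = KAVG
    awk -v davg="$1" -v kavg="$2" 'BEGIN {
        gavg = davg + kavg
        printf "GAVG=%.2f", gavg
        if (davg > 25) printf " DAVG-high(array)"
        if (kavg > 5)  printf " KAVG-high(vmkernel-queuing)"
        if (gavg > 25) printf " GAVG-high"
        printf "\n"
    }'
}

check_disk_latency 30 1
check_disk_latency 4 0.5
```

Splitting the flags this way mirrors the table: a high DAVG points at the array, while a high KAVG points at queuing inside the VMkernel.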
    Links: http://communities.vmware.com/docs/DOC-9279 - Interpreting esxtop Statistics
    http://www.yellow-bricks.com/esxtop - Yellow Brick's esxtop values/thresholds
    http://communities.vmware.com/docs/DOC-10095 - Using vscsiStats for Storage Performance
    http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf - Performance best practices
    http://communities.vmware.com/docs/DOC-10352 - Performance Troubleshooting vSphere/ESX4
    http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_resource_mgmt.pdf - Resource Guide
    vscsiStats:
    vscsiStats: monitors IO of VM's virtual SCSI controllers.
    1) Change to appropriate directory: cd /usr/lib/vmware/bin
    2) Reset the stats: sudo ./vscsiStats -r
    3) List VMs (worldgroup) & disks (handle): sudo ./vscsiStats -l
    4) Start stat collection: sudo ./vscsiStats -s -w
    5) View stats: sudo ./vscsiStats -w worldgroup_id -p all
    6) Stop stat collection: sudo ./vscsiStats -x
    Can specify a disk instead of the whole VM with -i handle_id after the -w option.
    The -p option specifies the stats to show: all, ioLength, seekDistance, outstandingIOs, latency, interarrival.
    Can export the stats using command 5) above appended with -c > /tmp/outputfile.csv
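To keep the order of the steps straight, the sequence can be printed as a dry run before anything is executed. This is a sketch only: the helper name `vscsi_plan` and the IDs in the usage line are made up, the -w argument on the start step is an assumption based on steps 4) and 5), and the function prints commands instead of running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the vscsiStats command sequence for one VM.
# Nothing is executed; review the lines before running them on a host.
vscsi_plan() {
    wg=$1          # worldgroup ID, as listed by "vscsiStats -l"
    handle=$2      # optional virtual disk handle ID
    target="-w $wg"
    if [ -n "$handle" ]; then
        target="$target -i $handle"
    fi
    bin=/usr/lib/vmware/bin/vscsiStats
    echo "$bin -r"                     # reset the stats
    echo "$bin -s $target"             # start collection
    echo "$bin $target -p latency"     # view one histogram
    echo "$bin -x"                     # stop collection
}

vscsi_plan 1432
```

Passing a second argument, for example `vscsi_plan 1432 8192`, narrows the plan to a single virtual disk.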
    Module 9: vSphere 4 DRS Cluster Troubleshooting
    • Complete a final multihour, multiproblem troubleshooting exercise
