Space Reclamation Requests from Guest OS

ESXi supports the unmap commands issued directly from a guest operating system to reclaim storage space. The level of support and requirements depend on the type of datastore where your virtual machine resides.

Inside a virtual machine, storage space is freed when, for example, you delete files on the thin virtual disk. The guest operating system notifies VMFS about freed space by sending the unmap command. The unmap command sent from the guest operating system releases space within the VMFS datastore. The command then proceeds to the array, so that the array can reclaim the freed blocks of space.

Space Reclamation for VMFS6 Virtual Machines

VMFS6 generally supports automatic space reclamation requests that generate from the guest operating systems, and passes these requests to the array. Many guest operating systems can send the unmap command and do not require any additional configuration. The guest operating systems that do not support the automatic unmaps might require user intervention. For information about guest operating systems that support the automatic space reclamation on VMFS6, contact your vendor.

Generally, the guest operating systems send the unmap commands based on the unmap granularity they advertise. For details, see documentation provided with your guest operating system.
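
A hedged sketch (not from the VMware documentation) of checking, inside a Linux guest, whether a virtual disk advertises discard (unmap) support and with what granularity; the device name sda is a placeholder:

Show discard granularity and maximum discard size for all block devices:
lsblk -D

Show the advertised discard granularity of one disk, in bytes (0 means unmap is not supported):
cat /sys/block/sda/queue/discard_granularity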

The following considerations apply when you use space reclamation with VMFS6:

  • VMFS6 processes the unmap request from the guest OS only when the space to reclaim equals 1 MB or is a multiple of 1 MB. If the space is less than 1 MB or is not aligned to 1 MB, the unmap requests are not processed.
  • For VMs with snapshots in the default SEsparse format, VMFS6 supports the automatic space reclamation only on ESXi hosts version 6.7 or later. If you migrate VMs to ESXi hosts version 6.5 or earlier, the automatic space reclamation stops working for the VMs with snapshots.

    Space reclamation affects only the top snapshot and works when the VM is powered on.

Space Reclamation for VMFS5 Virtual Machines

Typically, the unmap command that generates from the guest operating system on VMFS5 cannot be passed directly to the array. You must run the esxcli storage vmfs unmap command to trigger unmaps for the array.
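
For example, a minimal sketch of a manual reclaim run from the ESXi shell; the datastore name Datastore01 is a placeholder:

esxcli storage vmfs unmap -l Datastore01

You can optionally pass -n <number> to control how many VMFS blocks are reclaimed per iteration; if omitted, the default is used.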

However, for a limited number of the guest operating systems, VMFS5 supports the automatic space reclamation requests.

To send the unmap requests from the guest operating system to the array, the virtual machine must meet the following prerequisites:

  • The virtual disk must be thin-provisioned.
  • Virtual machine hardware must be of version 11 (ESXi 6.0) or later.
  • The advanced setting EnableBlockDelete must be set to 1 (a CLI sketch follows this list).
  • The guest operating system must be able to identify the virtual disk as thin.
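
A minimal sketch of checking and setting the EnableBlockDelete advanced option from the ESXi shell (verify the option in your environment before changing it):

Check the current value:
esxcli system settings advanced list -o /VMFS3/EnableBlockDelete

Set the value to 1:
esxcli system settings advanced set -o /VMFS3/EnableBlockDelete -i 1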

Reclaiming Space with SCSI Unmap

vSAN 6.7 Update 1 and later supports SCSI UNMAP commands that enable you to reclaim storage space that is mapped to a deleted vSAN object.

Deleting or removing files frees space within the file system. This free space is mapped to a storage device until the file system releases or unmaps it. vSAN supports reclamation of free space, which is also called the unmap operation. You can free storage space in the vSAN datastore when you delete or migrate a VM, consolidate a snapshot, and so on.

Reclaiming storage space can provide higher host-to-flash I/O throughput and improve flash endurance.

vSAN also supports the SCSI UNMAP commands issued directly from a guest operating system to reclaim storage space. vSAN supports offline unmaps as well as inline unmaps. On Linux, offline unmaps are performed with the fstrim(8) command, and inline unmaps are performed when a file system is mounted with the discard option (mount -o discard). On Windows, NTFS performs inline unmaps by default.
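
For example, a hedged sketch from inside a Linux guest; mount points and device names are placeholders:

Run an offline (on-demand) unmap of an already mounted file system:
fstrim -v /mnt/data

Mount a file system with inline unmap (discard) enabled:
mount -o discard /dev/sdb1 /mnt/data

On Windows, NTFS issues inline unmaps by default, so no equivalent step is needed.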

Unmap capability is disabled by default. To enable unmap on a vSAN cluster, use the following RVC command: vsan.unmap_support --enable

When you enable unmap on a vSAN cluster, you must power off and then power on all VMs. VMs must use virtual hardware version 13 or above to perform unmap operations.

Force Unmount a Temporary Datastore Used for vSAN Traces from vSAN Cluster ESXi Hosts

>> Disable vsantraced startup by running this command:

chkconfig vsantraced off

>> Stop the vsantraced service by running this command:

/etc/init.d/vsantraced stop

>> Change the syslog configuration to point to the vSAN datastore.
>> Delete any coredump files that are present, after checking that they are not required.
>> Sub-steps to direct vSAN traces to syslog:

If not planned for or incorrectly configured, vSAN trace-level messages may be:
  • Taking up a large amount of space on ESXi hosts running from a RAM disk
  • Written to non-persistent storage

By default, vSAN traces are saved to /var/log/vsantraces. The default maximum file size is 180 MB, with rotation across 8 files.

By default, vSAN urgent traces are redirected through the ESXi syslog system. If an external syslog server is defined, the urgent traces are forwarded to the external collector.

Run this command to determine whether vSAN urgent traces are currently configured to redirect through syslog, and to view the log rotation settings:
# esxcli vsan trace get
You see output similar to:

VSAN Traces Directory: /vmfs/volumes/568ec568-06d68562-e655-001018ed2950/scratch/vsantraces
Number Of Files To Rotate: 8
Maximum Trace File Size: 180 MB
Log Urgent Traces To Syslog: true

Run this command to send urgent traces through syslog:

# esxcli vsan trace set --logtosyslog true

To change the default settings, run esxcli vsan trace set with the desired parameters:

# esxcli vsan trace set

-l|--logtosyslog Boolean value to enable or disable logging urgent traces to syslog.
-f|--numfiles=<long> Log file rotation for vSAN trace files.
-p|--path=<str> Path to store vSAN trace files (a sketch follows the note below).
-r|--reset When set to true, reset defaults for vSAN trace files.
-s|--size=<long> Maximum size of vSAN trace files in MB.

For example, to reduce the number of files to rotate to 4 and the maximum size to which these files can grow to 200 MB, run this command:

# esxcli vsan trace set -f 4 -s 200

Note: If you reduce the number of files, the older files that are not compliant are removed immediately.
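
In the context of this procedure, a hedged sketch of redirecting the vSAN trace files to a different location with the -p parameter listed above; the path is a placeholder, and it should be persistent storage other than the datastore you intend to unmount:

# esxcli vsan trace set -p /vmfs/volumes/datastore1/vsantraces

Verify the change by running esxcli vsan trace get again.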

>> Reboot the ESXi host.
>> Unmount the datastore.

Using RAID 5 or RAID 6 Erasure Coding in vSAN

You can use RAID 5 or RAID 6 erasure coding to protect against data loss and increase storage efficiency.
Erasure coding can provide the same level of data protection as mirroring (RAID 1), while using less storage capacity.

RAID 5 or RAID 6 erasure coding enables vSAN to tolerate the failure of up to two capacity devices in the datastore. You can configure RAID 5 on all-flash clusters with four or more fault domains. You can configure RAID 5 or RAID 6 on all-flash clusters with six or more fault domains.

RAID 5 or RAID 6 erasure coding requires less additional capacity to protect your data than RAID 1 mirroring.
For example, a VM protected by a Primary level of failures to tolerate value of 1 with RAID 1 requires twice the virtual disk size, but with RAID 5 it requires 1.33 times the virtual disk size. This is because vSAN RAID 5 stripes data with parity across a 3+1 layout (capacity = 4/3 of the data size), while RAID 6 uses a 4+2 layout (capacity = 1.5 times the data size).
The following table shows a general comparison between RAID 1 and RAID 5 or RAID 6.

Capacity Required to Store and Protect Data at Different RAID Levels:

RAID Configuration                                          Primary level of failures to tolerate    Data Size    Capacity Required
RAID 1 (mirroring)                                          1                                        100 GB       200 GB
RAID 5 or RAID 6 (erasure coding) with four fault domains   1                                        100 GB       133 GB
RAID 1 (mirroring)                                          2                                        100 GB       300 GB
RAID 5 or RAID 6 (erasure coding) with six fault domains    2                                        100 GB       150 GB

RAID 5 or RAID 6 erasure coding is a policy attribute that you can apply to virtual machine components.
>> To use RAID 5, set Failure tolerance method to RAID-5/6 (Erasure Coding) - Capacity and Primary level of failures to tolerate to 1.
>> To use RAID 6, set Failure tolerance method to RAID-5/6 (Erasure Coding) - Capacity and Primary level of failures to tolerate to 2.
>> RAID 5 or RAID 6 erasure coding does not support a Primary level of failures to tolerate value of 3.
>> To use RAID 1, set Failure tolerance method to RAID-1 (Mirroring) - Performance. RAID 1 mirroring requires fewer I/O operations to the storage devices, so it can provide better performance. For example, a cluster resynchronization takes less time to complete with RAID 1.

Note: In a vSAN stretched cluster, the Failure tolerance method of RAID-5/6 (Erasure Coding) - Capacity applies only to the Secondary level of failures to tolerate.

RAID 5 or RAID 6 Design Considerations:

Consider these guidelines when you configure RAID 5 or RAID 6 erasure coding in a vSAN cluster.
>> RAID 5 or RAID 6 erasure coding is available only on all-flash disk groups.
>> On-disk format version 3.0 or later is required to support RAID 5 or RAID 6.
>> You must have a valid license to enable RAID 5/6 on a cluster.
>> You can achieve additional space savings by enabling deduplication and compression on the vSAN cluster.

vSAN SDKs

The vSAN Management SDKs bundle language bindings for accessing the vSAN Management API and creating client applications for automating vSAN management tasks.

The vSAN Management API

The vSAN Management API is an extension of the vSphere API. Both vCenter Server and ESXi hosts expose the vSAN Management API. You can use the vSAN Management API to implement client applications that perform the following tasks:

>>Configure a vSAN cluster – Configure all aspects of a vSAN cluster, such as setting up VMkernel networking, claiming disks, configuring fault domains, enabling deduplication and compression on all-flash clusters, and assigning the vSAN license.

>>Configure a vSAN stretched cluster – Deploy the vSAN Witness Appliance and configure a vSAN stretched cluster.

>>Upgrade the vSAN on-disk format.

>>Track the vSAN performance.

>>Monitor the vSAN health.

The vSAN Management SDKs are available for five different programming languages: Java, .NET, Python, Perl, and Ruby.

Each of the five vSAN Management SDKs depends on the vSphere SDK with similar functionality delivered for the corresponding programming language.

You can download these vSphere SDKs from https://code.vmware.com/home or from GitHub.

1:vSAN Management SDK for Java

2:vSAN Management SDK for .NET

3:vSAN Management SDK for Python

4:vSAN Management SDK for Perl

5:vSAN Management SDK for Ruby

1: Running the Sample Applications (vSAN Management SDK for Java)

The vSAN Management SDK for Java includes sample applications, build and run scripts, and dependent libraries. They are located under the samplecode directory in the SDK. You can use the sample code to get vSAN managed objects on vCenter Server or ESXi hosts.

Before running the sample applications, make sure that you have the vSphere Web Services SDK on your development environment, with the following directory structure:

VMware-vSphere-SDK-build

           SDK

                  vsphere-ws

Then copy the vsan-sdk-java directory to the same level as the vsphere-ws directory in the vSphere Web Services SDK:

VMware-vSphere-SDK-build

         SDK

                vsphere-ws

                vsan-sdk-java

Build the sample applications by running the build.py command. Run the sample applications using the run.sh script on Linux, or the run.bat script on Windows:

./run.sh com.vmware.vsan.samples.<Sample_name>

      --url https://<vcenter_or_host_address>/sdk

      --username <username>

      --password <password>

2:vSAN Management SDK for .NET

The vSAN Management SDK for .NET provides libraries, sample code, and API reference for developing custom .NET clients against the vSAN Management API. The vSAN Management SDK for .NET depends on the vSphere Web Services SDK of similar level. You use the vSphere Web Services SDK for logging in to vCenter Server and for retrieving vCenter Server managed objects.

Building the vSAN C# DLL

You must have the following components to build the vSAN C# DLL:

>> csc.exe. A C# compiler

>> sgen.exe. An XML serializer generator tool

>> wsdl.exe. Web Service Description Language 4.0 for Microsoft .NET

>> Microsoft.Web.Services3.dll

>> .NET Framework 4.0

>> Python 2.7.6

To build the vSAN C# DLL, run the following command:

$ python builder.py vsan_wsdl vsanservice_wsdl

This command generates the following DLL files:

>> VsanhealthService.dll

>> VsanhealthService.XmlSerializers.dll

Running the Sample Applications

To run the sample applications, run the following command:

.\VsanHealth.exe --username <username>

--url https://<vcenter_name>/sdk

--hostName <cluster_name> --ignorecert --disablesso

To view information about the parameters, use --help.

For further reference, see https://code.vmware.com/web/sdk/6.7U1/vsan-python and https://code.vmware.com/apis/444/vsan

Guest and HA Application Monitoring SDK Programming

You can download the Guest SDK for monitoring guest virtual machine statistics; it also includes facilities for High Availability (HA) Application Monitoring. The SDK version number is 10.2.

HA Application Monitoring. The vSphere High Availability (HA) feature for ESXi hosts in a cluster provides protection for a guest OS and applications running in a virtual machine by restarting the virtual machine if a failure occurs. Using the HA Application Monitoring APIs, developers can write software to monitor the guest OS and process heartbeats.

Guest SDK. The vSphere Guest SDK provides read-only APIs for monitoring various virtual machine statistics. Management agents running in the guest OS of a virtual machine can use this data for reacting to changes in the application layer.

Compatibility Notices

HA Application Monitoring applications must be recompiled to work with vSphere 6.0 because of changes to the communication interface (see below).

For vSphere 6.0, HA Application Monitoring communication was revised to use VMCI (the virtual machine communication interface). The VMCI driver is preinstalled in Linux kernel 3.9 and higher; in earlier kernel versions it can be installed with VMware Tools. On Windows, VMware Tools must be installed to obtain the VMCI driver.

This SDK supports the C and C++ programming languages. You can support Java with wrapper classes, for example by using JNI.

Changes and New Features

The checksystem utility, which verifies the proper glib version, was added in the vSphere 6.5 release.

Tools for fetching extended guest statistics were added in vSphere 6.0, but not publicly documented until April 2015.

In the vSphere 6.0 release, high availability VM Component Protection was introduced, and FT (fault tolerance) was extended for symmetric multiprocessing (SMP). Also, the communication interface was changed to use VMCI.

In the vSphere 5.5 release, the HA application monitoring facility was changed to reset the guest virtual machine if the application monitoring program requested a reset. Previously, HA application monitoring had to determine when the guest stopped sending a heartbeat.

In vSphere 5.1, HA Application Monitoring facilities were merged into the Guest SDK previously available.

Known Issues and Workarounds

Security enforcement for the Guest and HA application monitoring SDK using the secure authentication VMX parameter guest_rpc.rpci.auth.app.APP_MONITOR=TRUE does not work for FT (fault tolerant) VMs. The vSphere platform supports only the non-secure channel for FT virtual machines.

Displaying vSphere Guest Library Statistics:

On a Linux virtual machine hosted by ESXi, go to the include directory and compile the vmGuestLibTest.c program, then run the output program vmguestlibtest:

gcc -g -o vmguestlibtest -ldl vmGuestLibTest.c
./vmguestlibtest

Guest statistics appear repeatedly until you interrupt the program.

Controlling the Application Monitoring Heartbeat:

To run HA application monitoring programs, the virtual machine must be running on an ESXi host, and application monitoring must have been enabled when configuring HA.

You can enable heartbeats with the compiled vmware-appmonitor program. A usage sketch follows the option list below.

Usage is as follows: vmware-appmonitor { enable | disable | markActive | isEnabled | getAppStatus | postAppState }

>>enable – Enable application heartbeat so vSphere HA starts listening and monitoring the heartbeat count from this guest virtual machine. The heartbeats should be sent at least once every 30 seconds.

>>disable – Disable the application heartbeat so vSphere HA stops listening to heartbeats from this guest.

>>markActive – This starts sending the actual heartbeat every 30 seconds or less.

>>isEnabled – Indicates whether the heartbeat monitoring was enabled.

>>getAppStatus – Gets the status of the application, either Green, Red, or Gray.
>>postAppState – Posts the state of the application. Arguments can be:

>>appStateOk – Sends an “Application State is OK” signal to the HA agent running on the host.

>>appStateNeedReset – Sends an “Immediate Reset” signal to the HA agent running on the host.
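
A minimal usage sketch built only from the subcommands above (hedged; the 15-second interval mirrors the sample program described later, and the loop stands in for your own application health check):

Enable monitoring, then send heartbeats in a loop while the application is healthy:

vmware-appmonitor enable
while true
do
    vmware-appmonitor markActive
    sleep 15
done

Query status or stop monitoring cleanly without triggering a reset:

vmware-appmonitor getAppStatus
vmware-appmonitor disable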

Compiling the Sample Program on Linux:

You need a C compiler and the make program.

Procedure

1: Go to the docs/VMGuestAppMonitor/samples/C directory.

2: Run the make command. On a 64-bit machine you might want to change lib32 to lib64 in the makefile.

3: Set LD_LIBRARY_PATH to include the directory containing the Guest SDK shared libraries (a sketch follows this procedure).

4: Run the sample program (see below for program usage): ./sample
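
A consolidated, hedged sketch of the procedure; the lib64 path is an assumption about where the Guest SDK shared libraries were unpacked, so adjust it to your layout:

cd docs/VMGuestAppMonitor/samples/C
make
export LD_LIBRARY_PATH=$PWD/../../lib64:$LD_LIBRARY_PATH
./sample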

Compiling Sample Programs on Windows:

You need Visual Studio 2008 or later.

Procedure

1: Go to the docs/VMGuestAppMonitor/samples/visualstudio folder.

2: Open the appmon.vcproj file and build the solution.

3: Click Debug > Start Debugging to run appmon.exe. See below for program usage.

Demonstrating the HA Application Monitoring API

The sample program enables HA application monitoring and sends a heartbeat every 15 seconds. After the program starts running, typing Ctrl+C displays three choices:

s – stop sending heartbeats and exit the program. The virtual machine will reset.

d – disable application monitoring and exit the program. This does not cause a reset.

c – continue sending heartbeats.

For further reference, see https://code.vmware.com/web/sdk/6.7/vsphere-guest

VDDK 6.7.1

The Virtual Disk Development Kit (VDDK) 6.7.1 is an update to support vSphere 6.7 Update 1 and to resolve issues discovered in previous releases. VDDK 6.7 added support for ESXi 6.7 and vCenter Server 6.7, and was tested for backward compatibility against vSphere 6.0 and 6.5.

VDDK is used with vSphere Storage APIs for Data Protection (VADP) to develop backup and restore software. For general information about this development kit, how to obtain the software, programming details, and redistribution, see the VDDK landing page on VMware {Code}.

The VMware policy concerning backward and forward compatibility is for VDDK to support N-2 and N+1 releases. In other words, VDDK 6.7 and all its update releases support vSphere 6.0, 6.5 (except for new features) and the next major release.

Changes and New Features

The VixMntapi library on Linux systems now supports:

  • Advanced transport modes: HotAdd, SAN, and NBD/NBDSSL.
  • Read-only mounting of VMDK files.
  • Diagnostic logging as set by vixMntapi.cfg.LogLevel in the VDDK configuration file. Levels are the same as for vixDiskLib.transport – Panic, Error, Warning, Audit, Info, Verbose, Trivia. The output file named vixMntapi.log appears in the same directory as other log files. Not available for Windows.

In addition to those previously qualified for use as a backup proxy, the following operating systems were tested with VDDK 6.7.1:

  • Red Hat Enterprise Linux RHEL 6.9
  • CentOS 7.4
  • SUSE Linux Enterprise Server SLES 15
  • Windows Server 2016 version 1803

Compatibility Notices

In earlier releases it was an error to close parentHandle after VixDiskLib_Attach succeeds. The VDDK library now marks parentHandle internally to prevent closure and ensure cleanup. Proper calling sequences are as follows:

  1. First open a disk for attach with this call:
    VixDiskLib_Open(remoteConnection, virtualDiskPath, flags, &parentHandle);
  2. Create a local connection using: VixDiskLib_Connect(NULL, &localConnection);
  3. With the backed-up disk (referred to as parent disk) still open, make this call, creating the child disk with a unique name: VixDiskLib_CreateChild(parentHandle, "C:\tmp.vmdk", VIXDISKLIB_DISK_MONOLITHIC_SPARSE, NULL, NULL);
  4. Open tmp.vmdk (referred to as the redo log): VixDiskLib_Open(localConnection, "C:\tmp.vmdk", VIXDISKLIB_FLAG_OPEN_SINGLE_LINK, &redoHandle);
  5. Attach the redo log to its parent disk with: VixDiskLib_Attach(parentHandle, redoHandle);

If VixDiskLib_Attach fails, the system now automatically cleans up the local file handle.

  1. To end, close the redo log. Whether to close the parent disk handle is release dependent:
    VixDiskLib_Close(redoHandle);
    if (VIXDISKLIB_VERSION_MAJOR > 7) {
    VixDiskLib_Close(parentHandle); // to avoid memory leaks
    }
  2. Unlink the redo log from the parent disk.

Recently Resolved Issues

The VDDK 6.7.1 release resolves the following issues.

  • XML library upgraded. The XML library libxml2 was upgraded from version 2.9.6 to version 2.9.8 because of a known security vulnerability.
  • OpenSSL library upgraded. The OpenSSL library openssl was upgraded from an earlier version to version 1.0.2p because of a known security vulnerability.
  • NBD transport in VDDK 6.7 is slow when running against vSphere 6.5. When data protection software is compiled with VDDK 6.7 libraries, NBD/NBDSSL mode backup and restore is significantly slower than before on vSphere 6.5 or 6.0. This was caused by dropping the OPEN_BUFFERED flag when it became the default in VDDK 6.7. This backward compatibility issue is fixed in the VDDK 6.7.1 release. When performance is important, VMware recommends use of NBD Asynchronous I/O, calling VixDiskLib_WriteAsync and VixDiskLib_Wait.
  • With HotAdd transport VDDK could hang after many NFC connections. For programs compiled with VDDK 6.7 or 6.7 EP1 libraries, VDDK may eventually hang in VixDiskLib_Open when building server connections to ESXi. After the log entry "NBD_ClientOpen: attempting to create connection" VDDK hangs. The cause is that after HotAdd mode retrieves the disk signature, it fails to close the NFC connection, so many NFC server threads continue running and prevent new NFC connections. This regression is fixed in the VDDK 6.7.1 release.
  • HotAdd backup of VM template crashed if NoNfcSession was enabled. In VDDK 6.5.1 and later, when it became possible to avoid creating an NFC session for backup in cloud environments where local connections are disallowed, if vixDiskLib.transport.hotadd.NoNfcSession=1 was set in the proxy's VDDK configuration file, HotAdd mode crashed due to null pointer access of an attribute in the VM template object.
  • VixMntapi on Linux did not work with advanced transport modes. VDDK partners use VixDiskLib for block-oriented image backup and restore, while they use VixMntapi for file-oriented backup and restore. The Windows implementation of VixMntapi has supported advanced transports for many releases, but the Linux implementation supported only NBD mode. In the VDDK 6.7.1 release, VixMntapi supports HotAdd or SAN transport and NBD/NBDSSL on Linux, so it can be used in VMC environments for file-oriented backup and restore of Linux VMs.
  • VDDK hangs during restore when directly connecting to ESXi hosts. When doing restores with direct ESXi connections, VDDK may hang intermittently. The cause is that NfcServer on ESXi enters the wrong state, waiting for new messages that never arrive. The fix for NfcServer was to avoid waiting when no data remains. To resolve this issue, customers must upgrade ESXi hosts to 6.7 U1 or later.
  • VixMntapi on Linux could not open files as read-only. In previous releases, opening files read-only was not supported by VixMntapi; when read-only mode was requested at open time, the file was opened read/write. In this release, VixMntapi actually opens files as read-only on Linux VMs.
  • HotAdd proxy failed with Windows Server backups. If there was a SATA controller in the Windows backup proxy, HotAdd mode did not work. The cause was that VDDK did not rescan SATA controllers after HotAdding, so if there were multiple SATA or AHCI controllers, VDDK sometimes used the wrong controller ID and could not find the HotAdded disk. Disk open failed, resulting in "HotAdd ManagerLoop caught an exception" and "Error 13 (You do not have access rights to this file)" messages. The workaround was to remove the SATA controller from the Windows backup proxy. The issue is fixed in this release, so the workaround is no longer needed (https://kb.vmware.com/s/article/2151091).

For further reference, see https://code.vmware.com/web/sdk/6.7/vddk

Failed to lock the file

  • Powering on a virtual machine fails.
  • Unable to power on a virtual machine.
  • Adding an existing virtual machine disk (VMDK) to a virtual machine that is already powered on fails.

You see the error:

Cannot open the disk ‘/vmfs/volumes/UUID/VMName/Test-000001.vmdk’ or one of the snapshot disks it depends on. Reason: Failed to lock the file.

Cause:

+++++++

This issue occurs when one of the files required by the virtual machine has been opened by another application.

During a Create or Delete Snapshot operation while a virtual machine is running, all the disk files are momentarily closed and reopened. During this window, the files could be opened by another virtual machine, management process, or third-party utility. If that application creates and maintains a lock on the required disk files, the virtual machine cannot reopen the file and resume running.

Resolution:

+++++++++++

If the file is no longer locked, try to power on the virtual machine again. This should succeed. To determine the cause of the previously locked file, review the VMkernel, hostd, and vpxa log files and attempt to determine:

  • When the hostd and vpxa management agents open VMDK descriptor files, they log messages similar to:
    info 'DiskLib'] DISKLIB-VMFS : "/vmfs/volumes/UUID/VMName/Test-000001.vmdk" : open successful (21) size = 32227695616, hd = 0. Type 8
    info 'DiskLib'] DISKLIB-VMFS : "/vmfs/volumes/UUID/VMName/Test-000001.vmdk" : closed.
  • When the VMkernel attempts to open a locked file, it reports:
    31:16:46:55.498 cpu7:8715)FS3: 2928: [Requested mode: 2] Lock [type 10c00001 offset 11401216 v 2035, hb offset 3178496
    gen 26643, mode 1, owner 4ca72d14-84dc8dd4-0da3-0017a4770038 mtime 2213195] is not free on volume 'norr_prod_vmfs_data08'
  • The file may have been locked by third-party software running on an ESXi/ESX host or externally. Review the logs of any third-party software that may have acted on the virtual machine's VMDK files at the time.

Situation 1:

++++++++

Error : Failed to get exclusive lock on the configuration file, another VM process could be running, using this configuration file

Solution : This issue may occur if there is a lack of disk space on the root drive. The ESX host is unable to start a virtual machine because there is insufficient disk space to commit changes.
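
A hedged sketch of checking free space from the ESXi shell before retrying the power-on; vdf is specific to ESXi and also shows ramdisk usage:

vdf -h
df -h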

Situation 2:

++++++++

Error : Failed to lock the file when creating a snapshot

Solution :

To work around this issue in ESX or earlier ESXi releases, use the vmkfstools -D command to identify the MAC address of the machine locking the file, then reboot or power off the machine that owns that MAC address to release the lock.
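
For example (the path is a placeholder), run the command against the locked file and look at the owner field of the lock:

vmkfstools -D /vmfs/volumes/<datastore>/<VMName>/test-000001-delta.vmdk

The owner field has the same form as in the VMkernel log excerpt earlier, for example owner 4ca72d14-84dc8dd4-0da3-0017a4770038, where the last six octets (00:17:a4:77:00:38) are the MAC address of the host holding the lock.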

Notes: 

  • If the vmkfstools -D test-000001-delta.vmdk command does not return a valid MAC address in the top field (returns all zeros), review the RO Owner line below it to see which MAC address owns the read-only/multiwriter lock on the file.
  • In some cases, it may be a Service Console-based lock, an NFS lock, or a lock generated by another system or product that can use or read the VMFS file systems. The file is locked by a VMkernel child or cartel world and the offending host running the process/world must be rebooted to clear it.
  • After you have identified the host or backup tool (the machine that owns the MAC address) locking the file, power it off or stop the responsible service, and then restart the management agents on the host running the virtual machine to release the lock (see the sketch after these notes).
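
A hedged sketch of restarting the management agents from the ESXi shell; this briefly disconnects the host from vCenter Server but does not affect running virtual machines:

/etc/init.d/hostd restart
/etc/init.d/vpxa restart

Alternatively, services.sh restart restarts all management services on the host.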

Situation 3:

+++++++++

Error : Failed to add disk scsi0:1. Failed to power on scsi0:1

To prevent concurrent changes to critical virtual machine files and file systems, ESXi/ESX hosts establish locks on these files. In certain circumstances, these locks may not be released when the virtual machine is powered off. The files cannot be accessed by the servers while locked, and the virtual machine is unable to power on.

These virtual machine files are locked during runtime:

  • VMNAME.vswp
  • DISKNAME-flat.vmdk
  • DISKNAME-ITERATION-delta.vmdk
  • VMNAME.vmx
  • VMNAME.vmxf
  • vmware.log

>> There is a manual procedure to locate the host and virtual machine holding locks.

To work around this issue, run the vmfsfilelockinfo script from the host experiencing difficulties with one or more locked files:

  1. To find out the IP address of the host holding the lock, run the /bin/vmfsfilelockinfo Python script. The script takes these parameters:
    • File being tested
    • Username and password for accessing VMware vCenter Server (when tracing the MAC address to an ESX host). For example:

      Run this command:

      ~ # vmfsfilelockinfo -p /vmfs/volumes/iscsi-lefthand-2/VM1/VM1_1-000001-delta.vmdk -v 192.168.1.10 -u administrator@vsphere.local

      You see output similar to:

      vmfsfilelockinfo Version 1.0
      Looking for lock owners on “VM1_1-000001-delta.vmdk”
      “VM1_1-000001-delta.vmdk” is locked in Exclusive mode by host having mac address [‘xx:xx:xx:xx:xx:xx’]
      Trying to make use of Fault Domain Manager
      ———————————————————————-
      Found 0 ESX hosts using Fault Domain Manager.
      ———————————————————————-
      Could not get information from Fault domain manager
      Connecting to 192.168.1.10 with user administrator@vsphere.local
      Password: xXxXxXxXxXx
      ———————————————————————-
      Found 3 ESX hosts from Virtual Center Server.
      ———————————————————————-
      Searching on Host 192.168.1.178
      Searching on Host 192.168.1.179
      Searching on Host 192.168.1.180
      MAC Address : xx:xx:xx:xx:xx:xx

      Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive

      Total time taken : 0.27 seconds.

      Note: During the life cycle of a powered-on virtual machine, several of its files transition between various legitimate lock states. The lock state mode indicates the type of lock that is on the file. The list of lock modes is:

    • mode 0 = no lock
    • mode 1 = exclusive lock (the .vmx file of a powered-on virtual machine, the currently used disk (flat or delta), *.vswp, and so on)
    • mode 2 = read-only lock (for example, on the ..-flat.vmdk of a running virtual machine with snapshots)
    • mode 3 = multi-writer lock (for example, used for MSCS cluster disks or FT VMs)
  2. To get the name of the process holding the lock, run the lsof command on the host holding the lock and filter the output for the file name in question:

     ~ # lsof | egrep 'Cartel|VM1_1-000001-delta.vmdk'

    You see output similar to:

    Cartel | World name | Type | fd | Description
    36202 vmx FILE 80 /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/VM1/VM1_1-000001-delta.vmdk

    This shows that the file is locked by a virtual machine having Cartel ID 36202. Now display the list of active Cartel IDs by executing this command:

    ~ # esxcli vm process list

    This displays information for active virtual machines grouped by virtual machine name and having a format similar to:

    Alternate_VM27
    World ID: 36205
    Process ID: 0
    VMX Cartel ID: 36202
    UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a9 dc 9f bf
    Display Name: Alternate_VM27
    Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM27/Alternate_VM27.vmx
    Alternate_VM20
    World ID: 36207
    Process ID: 0
    VMX Cartel ID: 36206
    UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a5 dc 94 5f
    Display Name: Alternate_VM20
    Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM20/Alternate_VM20.vmx

    The virtual machine entry having VMX Cartel ID 36202 shows the display name of the virtual machine holding the lock on file VM1_1-000001-delta.vmdk, which in this example, is Alternate_VM27.

  3. Shut down the virtual machine holding the lock to release the lock.

Related Information

This script performs these actions in this sequence:

  1. Identifies the lock state: Exclusive, Read-Only, or not locked.
  2. Identifies MAC address of locking host [‘xx:xx:xx:xx:xx:xx’].
  3. Queries the Fault Domain Manager (HA) for information on discovered MAC address.
  4. Queries vCenter Server for information on discovered MAC address.
  5. Outputs final status.
    For example:

Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive.

  • The script outputs total execution time when it terminates.

 

Notes:

  • The script does not attempt to break/remove locks. The script only identifies the potential ESX host which holds the lock.
  • If the script is not run with a vCenter Server username and password, it prompts for them after querying the Fault Domain Manager.
  • This script works on a single file parameter, without wildcards. If multiple queries are required, you must execute the script repeatedly in a wrapper script.

For further clarification, see https://kb.vmware.com/s/article/10051

APD and PDL

Permanent Device Loss (PDL):

  • A datastore is shown as unavailable in the Storage view
  • A storage adapter indicates the Operational State of the device as Lost Communication

All-Paths-Down (APD):

  • A datastore is shown as unavailable in the Storage view.
  • A storage adapter indicates the Operational State of the device as Dead or Error.

PDL:

In vSphere 4.x, an All-Paths-Down (APD) situation occurs when all paths to a device are down. As there is no indication whether this is a permanent or temporary device loss, the ESXi host keeps reattempting to establish connectivity. APD-style situations commonly occur when the LUN is incorrectly unpresented from the ESXi/ESX host. The ESXi/ESX host, still believing the device is available, retries all SCSI commands indefinitely. This has an impact on the management agents, as their commands are not responded to until the device is again accessible. This causes the ESXi/ESX host to become inaccessible/not-responding in vCenter Server.

In vSphere 5.x/6.x, a clear distinction has been made between a device that is permanently lost (PDL) and a transient issue where all paths are down (APD) for an unknown reason.

For example, in the VMkernel logs, if a SCSI sense code of H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 or Logical Unit Not Supported is logged by the storage device to the ESXi 5.x/6.x host, this indicates that the device is permanently inaccessible to the ESXi host, or is in a Permanent Device Loss (PDL) state. The ESXi host no longer attempts to re-establish connectivity or issue commands to the device.

Devices that suffer a non-recoverable hardware error are also recognized as being in a Permanent Device Loss (PDL) state.

Note: Some iSCSI arrays map LUN-to-Target as a one-to-one relationship. That is, there is only ever a single LUN per Target. In this case, the iSCSI arrays do not return the appropriate SCSI sense code, so PDL on these array types cannot be detected. However, in ESXi 5.1, enhancements were made so that the iSCSI initiator attempts to log in to the target again after a dropped session. If the device is not accessible, the storage system rejects the host's effort to access the storage. Depending on the response from the array, the host can now mark the device as PDL.

Vmkernel.log

++++++++++

2018-01-09T12:42:09.365Z cpu0:32888)ScsiDevice: 6878: Device naa.xxxxxxxxxxxxxxxxxxxxx APD Notify PERM LOSS; token num:1

2018-01-09T12:42:09.366Z cpu1:32916)StorageApdHandler: 1066: Freeing APD handle 0x430180b88880 [naa.xxxxxxxxxxxxxxxxxxxxx]

2018-01-09T12:49:01.260Z cpu1:32786)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0xc1) PDL error (0x5/0x25/0x0) – path vmhba33:C0:T3:L0 device naa.xxxxxxxxxxxxxxxxxxxxx – triggering path evaluation

2018-01-09T12:49:01.260Z cpu1:32786)ScsiDeviceIO: 2651: Cmd(0x439d802ec580) 0xfe, CmdSN 0x4b7 from world 32776 to dev “naa.xxxxxxxxxxxxxxxxxxxxx” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2018-01-09T12:49:01.300Z cpu0:40210)WARNING: NMP: vmk_NmpSatpIssueTUR:1043: Device naa.xxxxxxxxxxxxxxxxxxxxx path vmhba33:C0:T3:L0 has been unmapped from the array

After some time passes you will see this message:

2018-01-09T13:13:11.942Z cpu0:32872)ScsiDevice: 1718: Permanently inaccessible device :naa.xxxxxxxxxxxxxxxxxxxxx has no more open connections. It is now safe to unmount datastores (if any) and delete the device.

In this case the LUN was unmapped from the array for this host, so this is not a transient issue. Sense data 0x5 0x25 0x0 corresponds to "LOGICAL UNIT NOT SUPPORTED", which indicates the device is in a Permanent Device Loss (PDL) state. Once ESXi knows the device is in a PDL state, it does not wait for the device to come back.

ESXi checks only the ASC/ASCQ values; if they are 0x25/0x0 or 0x68/0x0, it marks the device as PDL.
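
A hedged sketch of searching the live VMkernel log on an affected host for the PDL indicators described above; /var/log/vmkernel.log is the standard ESXi log location:

grep -i "0x5 0x25 0x0" /var/log/vmkernel.log
grep -i "Permanently inaccessible device" /var/log/vmkernel.log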

 

All-Paths-Down (APD)

If PDL SCSI sense codes are not returned from a device (when unable to contact the storage array, or with a storage array that does not return the supported PDL SCSI codes), then the device is in an All-Paths-Down (APD) state, and the ESXi host continues to send I/O requests until the host receives a response.

As the ESXi host is not able to determine if the device loss is permanent (PDL) or transient (APD), it indefinitely retries SCSI I/O, including:

  • Userworld I/O (hostd management agent)
  • Virtual machine guest I/O. Note: If an I/O request is issued from a guest, the operating system should time out and abort the I/O.

Due to the nature of an APD situation, there is no clean way to recover.

  • The APD situation needs to be resolved at the storage array/fabric layer to restore connectivity to the host.
  • All affected ESXi hosts may require a reboot to remove any residual references to the affected devices that are in an APD state.

Note:

  • Performing a vMotion migration of unaffected virtual machines is not possible, as the management agents may be affected by the APD condition, and the ESXi host may become unmanaged. As a result, a reboot of an affected ESXi host forces an outage to all non-affected virtual machines on that host.
  • vSphere 6.0 and later have a powerful new feature as part of vSphere HA called VM Component Protection (VMCP). VMCP protects virtual machines from storage related events, specifically Permanent Device Loss (PDL) and All Paths Down (APD) incidents.

vmkernel log:

++++++++++

2018-01-10T13:04:26.803Z cpu1:32896)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.xxxxxxxxxxxxxxxxxxxxx] has entered the All Paths Down state.

2018-01-10T13:04:26.818Z cpu0:32896)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.xxxxxxxxxxxxxxxxxxxxx] has entered the All Paths Down state.

vobd log:

+++++++

2018-01-10T13:04:26.905Z: [scsiCorrelator] 475204262us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.xxxxxxxxxxxxxxxxxxxxx. Path vmhba33:C0:T1:L0 is down. Affected datastores: "Green".

2018-01-10T13:04:26.905Z: [scsiCorrelator] 475204695us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.xxxxxxxxxxxxxxxxxxxxx. Path vmhba33:C0:T0:L0 is down. Affected datastores: "Grey".

To clean up an unplanned PDL (an equivalent command-line sketch follows this procedure):

  1. All running virtual machines from the datastore must be powered off and unregistered from the vCenter Server.
  2. From the vSphere Client, go to the Configuration tab of the ESXi host, and click Storage.
  3. Right-click the datastore being removed, and click Unmount.
    The Confirm Datastore Unmount window displays. When the prerequisite criteria have been passed, the OK button appears.
    If you see this error when unmounting the LUN: Call datastore refresh for object <name_of_LUN> on vCenter server <name_of_vCenter> failed

    You may have a snapshot LUN presented. To resolve this issue, remove that snapshot LUN on the array side.

  4. Perform a rescan on all of the ESXi hosts that had visibility to the LUN. Note: If there are active references to the device or pending I/O, the ESXi host still lists the device after the rescan. Check for virtual machines, templates, ISO images, floppy images, and raw device mappings that may still have an active reference to the device or datastore.
  5. If the LUN is still required and becomes available again, go to each host, right-click the LUN, and click Mount. Note: One possible cause for an unplanned PDL is that the LUN ran out of space, causing it to become inaccessible.
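
An equivalent hedged sketch using the ESXi command line; the datastore name and device identifier are placeholders, so confirm them with the list command before unmounting or detaching anything:

List VMFS volumes and note the affected datastore and its backing device:
esxcli storage filesystem list

Unmount the datastore (the equivalent of the Unmount action in the client):
esxcli storage filesystem unmount -l <datastore_name>

Detach the device so that the host stops issuing I/O to it:
esxcli storage core device set --state=off -d naa.xxxxxxxxxxxxxxxxxxxxx

Rescan all storage adapters:
esxcli storage core adapter rescan --all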