PowerShell script to clone 100 VMs from one datastore to another and set a 10,000 IOPS limit on each VM

# Connect to the vCenter Server
Connect-VIServer -Server <vCenterServer> -User <username> -Password <password>

# Specify the source VM and storage
$sourceVM = "SourceVM"
$sourceDatastore = "SourceDatastore"

# Specify the destination datastore
$destinationDatastore = "DestinationDatastore"

# Specify the number of VMs to create and clone
$numberOfVMs = 100

# Specify the number of IOPS to send on each VM
$iops = 10000

# Loop through and clone the VMs
for ($i = 1; $i -le $numberOfVMs; $i++) {
    $newVMName = "VM$i"

    # Clone the VM from the source onto the destination datastore
    # (in PowerCLI, New-VM with the -VM parameter performs the clone,
    #  so the raw VirtualMachineCloneSpec API object is not needed)
    $sourceVMObj = Get-VM -Name $sourceVM
    New-VM -Name $newVMName -VM $sourceVMObj -VMHost $sourceVMObj.VMHost -Datastore (Get-Datastore -Name $destinationDatastore)

    # Power on the newly created VM
    Start-VM -VM $newVMName

    # Apply the IOPS value to every hard disk on the new VM
    # (Set-HardDisk has no -Iops parameter; the per-disk IOPS limit is set
    #  through the VM resource configuration instead)
    $vm = Get-VM -Name $newVMName
    $disks = $vm | Get-HardDisk
    $vm | Get-VMResourceConfiguration | Set-VMResourceConfiguration -Disk $disks -DiskLimitIOPerSecond $iops
}

# Disconnect from the vCenter Server
Disconnect-VIServer -Server <vCenterServer> -Confirm:$false

Make sure to replace the <vCenterServer>, <username>, and <password> placeholders, as well as the source VM and datastore names, with your actual vCenter Server details. Please note that this script assumes you have the VMware PowerCLI module installed and properly configured.
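Once the loop finishes, a quick verification pass can confirm that the clones are powered on and carry the expected per-disk IOPS limit. This is a minimal sketch that reuses the naming pattern above; the DiskResourceConfiguration property name is an assumption worth verifying against your PowerCLI version.

# Report power state and per-disk IOPS limits for the cloned VMs (names follow the "VM<number>" pattern used above)
Get-VM -Name "VM*" | ForEach-Object {
    $config = $_ | Get-VMResourceConfiguration
    [PSCustomObject]@{
        Name       = $_.Name
        PowerState = $_.PowerState
        # DiskResourceConfiguration holds one entry per hard disk; DiskLimitIOPerSecond is -1 when unlimited
        IopsLimit  = ($config.DiskResourceConfiguration | ForEach-Object { $_.DiskLimitIOPerSecond }) -join ", "
    }
}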

Reclaiming Space with SCSI Unmap

vSAN 6.7 Update 1 and later supports SCSI UNMAP commands that enable you to reclaim storage space that is mapped to a deleted vSAN object.

Deleting or removing files frees space within the file system. This free space is mapped to a storage device until the file system releases or unmaps it. vSAN supports reclamation of free space, which is also called the unmap operation. You can free storage space in the vSAN datastore when you delete or migrate a VM, consolidate a snapshot, and so on.

Reclaiming storage space can provide higher host-to-flash I/O throughput and improve flash endurance.

vSAN also supports the SCSI UNMAP commands issued directly from a guest operating system to reclaim storage space. vSAN supports offline unmaps as well as inline unmaps. On Linux OS, offline unmaps are performed with the fstrim(8) command, and inline unmaps are performed when the mount -o discard command is used. On Windows OS, NTFS performs inline unmaps by default.
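As an illustration of the guest-level unmap path, the following PowerCLI sketch runs fstrim inside a Linux guest through VMware Tools. The VM name, guest credentials, and mount point are placeholders, and Invoke-VMScript requires VMware Tools to be running in the guest:

# Trigger an offline unmap (fstrim) inside a Linux guest via VMware Tools
# "VM1", the guest credentials, and the mount point are placeholder values
Invoke-VMScript -VM "VM1" -ScriptType Bash -ScriptText "fstrim -v /" -GuestUser "root" -GuestPassword "GuestPassword"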

Unmap capability is disabled by default. To enable unmap on a vSAN cluster, use the following RVC command: vsan.unmap_support --enable

When you enable unmap on a vSAN cluster, you must power off and then power on all VMs. VMs must use virtual hardware version 13 or above to perform unmap operations.
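A minimal PowerCLI sketch of that power cycle, assuming the cluster is named vSAN-Cluster and that a cold power-off of every VM is acceptable, could look like this:

# Power off and power back on every running VM in the cluster so unmap takes effect
$vms = Get-Cluster -Name "vSAN-Cluster" | Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" }
$vms | Stop-VM -Confirm:$false    # use Shutdown-VMGuest instead for a clean guest OS shutdown
$vms | Start-VM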

Force Unmount temporary datastore used for vSAN traces from vSAN cluster ESXi hosts

>> Disable vsantraced startup by running this command:

chkconfig vsantraced off

>> Stop the vsantraced service by running this command:

/etc/init.d/vsantraced stop

>> Change the syslog configuration to point to the vSAN datastore.
>> Delete any coredump files that are present, after checking that they are not required.
>> Sub-step to redirect vSAN traces to syslog:

If not planned for or incorrectly configured, vSAN trace-level messages may be:
>> Taking up a lot of space on ESXi hosts running from a RAM disk
>> Written to non-persistent storage

By default, vSAN traces are saved to /var/log/vsantraces. Default maximum file size is 180MB with rotation of 8 files.

By default, vSAN urgent traces are redirected through the ESXi syslog system. If an external syslog server is defined, the urgent traces are forwarded to the external collector.

Run this command to determine whether vSAN urgent traces are currently configured to redirect through syslog, and to see the log rotation settings:
# esxcli vsan trace get
You see output similar to:

VSAN Traces Directory: /vmfs/volumes/568ec568-06d68562-e655-001018ed2950/scratch/vsantraces
Number Of Files To Rotate: 8
Maximum Trace File Size: 180 MB
Log Urgent Traces To Syslog: true

Run this command to send urgent traces through syslog:

# esxcli vsan trace set --logtosyslog true

To change the default settings, run with the desired parameter:

# esxcli vsan trace set

-l|--logtosyslog Boolean value to enable or disable logging urgent traces to syslog.
-f|--numfiles=<long> Log file rotation for vSAN trace files.
-p|--path=<str> Path to store vSAN trace files.
-r|--reset When set to true, reset defaults for vSAN trace files.
-s|--size=<long> Maximum size of vSAN trace files in MB.

For example, to reduce the number of files to rotate to 4 and the maximum size to which these files can grow to 200 MB, run this command (a PowerCLI sketch for applying the same settings to every host in a cluster follows this procedure):

# esxcli vsan trace set -f 4 -s 200

Note: If you reduce the number of files, the older files that exceed the new limit are removed immediately.

>> Reboot the ESXi host.
>> Unmount the datastore.
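To apply the trace settings shown in this procedure to every host in the cluster rather than one host at a time, a PowerCLI sketch along the following lines can be used. The cluster name is a placeholder, and the argument names passed to Invoke are assumed to match the long esxcli option names (numfiles, size):

# Set vSAN trace rotation (4 files, 200 MB each) on every host in the cluster
foreach ($esxHost in (Get-Cluster -Name "vSAN-Cluster" | Get-VMHost)) {
    $esxcli = Get-EsxCli -VMHost $esxHost -V2
    # Equivalent of: esxcli vsan trace set -f 4 -s 200
    $esxcli.vsan.trace.set.Invoke(@{ numfiles = 4; size = 200 })
    # Show the resulting configuration (equivalent of: esxcli vsan trace get)
    $esxcli.vsan.trace.get.Invoke()
}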

Using RAID 5 or RAID 6 Erasure Coding in vSAN

You can use RAID 5 or RAID 6 erasure coding to protect against data loss and increase storage efficiency.
Erasure coding can provide the same level of data protection as mirroring (RAID 1), while using less storage capacity.

RAID 5 or RAID 6 erasure coding enables vSAN to tolerate the failure of up to two capacity devices in the datastore. You can configure RAID 5 on all-flash clusters with four or more fault domains. You can configure RAID 5 or RAID 6 on all-flash clusters with six or more fault domains.

RAID 5 or RAID 6 erasure coding requires less additional capacity to protect your data than RAID 1 mirroring.
For example, a VM protected by a Primary level of failures to tolerate value of 1 with RAID 1 requires twice the virtual disk size, but with RAID 5 it requires 1.33 times the virtual disk size.
The following table shows a general comparison between RAID 1 and RAID 5 or RAID 6.

Capacity Required to Store and Protect Data at Different RAID Levels:

RAID Configuration                                          Primary Level of Failures to Tolerate   Data Size   Capacity Required
RAID 1 (mirroring)                                          1                                       100 GB      200 GB
RAID 5 or RAID 6 (erasure coding) with four fault domains   1                                       100 GB      133 GB
RAID 1 (mirroring)                                          2                                       100 GB      300 GB
RAID 5 or RAID 6 (erasure coding) with six fault domains    2                                       100 GB      150 GB

RAID 5 or RAID 6 erasure coding is a policy attribute that you can apply to virtual machine components. To use RAID 5, set Failure tolerance method to RAID-5/6 (Erasure Coding) – Capacity and Primary level of failures to tolerate to 1. To use RAID 6, set Failure tolerance method to RAID-5/6 (Erasure Coding) – Capacity and Primary level of failures to tolerate to 2. RAID 5 or RAID 6 erasure coding does not support a Primary level of failures to tolerate value of 3. To use RAID 1, set Failure tolerance method to RAID-1 (Mirroring) – Performance. RAID 1 mirroring requires fewer I/O operations to the storage devices, so it can provide better performance. For example, a cluster resynchronization takes less time to complete with RAID 1.

Note: In a vSAN stretched cluster, the Failure tolerance method of RAID-5/6 (Erasure Coding) - Capacity applies only to the Secondary level of failures to tolerate.
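As an illustration of the policy settings described above, the following PowerCLI sketch builds a RAID-5 style policy (FTT=1 with erasure coding). Treat it as a hedged example: the policy name is arbitrary, and the vSAN capability names (VSAN.hostFailuresToTolerate, VSAN.replicaPreference) and the exact replica-preference value string should be verified with Get-SpbmCapability in your environment:

# List the vSAN capabilities exposed through SPBM (names and values vary by release)
Get-SpbmCapability | Where-Object { $_.Name -like "VSAN.*" } | Select-Object Name

# Build a RAID-5 style policy: FTT=1 plus the erasure-coding failure tolerance method
$rules = @(
    New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
    New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.replicaPreference") -Value "RAID-5/6 (Erasure Coding) - Capacity"
)
New-SpbmStoragePolicy -Name "vSAN-R5-FTT1" -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules $rules)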

RAID 5 or RAID 6 Design Considerations:

Consider these guidelines when you configure RAID 5 or RAID 6 erasure coding in a vSAN cluster.
>> RAID 5 or RAID 6 erasure coding is available only on all-flash disk groups.
>> On-disk format version 3.0 or later is required to support RAID 5 or RAID 6.
>> You must have a valid license to enable RAID 5/6 on a cluster.
>> You can achieve additional space savings by enabling deduplication and compression on the vSAN cluster.

vSAN SDKs

The vSAN Management SDKs bundle language bindings for accessing the vSAN Management API and creating client applications for automating vSAN management tasks.

The vSAN Management API is an extension of the vSphere API. Both vCenter Server and ESXi hosts expose the vSAN Management API. You can use the vSAN Management API to implement client applications that perform the following tasks:

>>Configure a vSAN cluster – Configure all aspects of a vSAN cluster, such as setting up VMkernel networking, claiming disks, configuring fault domains, enabling deduplication and compression on all-flash clusters, and assigning the vSAN license.

>>Configure a vSAN stretched cluster – Deploy the vSAN Witness Appliance and configure a vSAN stretched cluster.

>>Upgrade the vSAN on-disk format.

>>Track the vSAN performance.

>>Monitor the vSAN health.

The vSAN Management SDKs are provided for five different programming languages: Java, .NET, Python, Perl, and Ruby.

Each of the five vSAN Management SDKs depends on the vSphere SDK with similar functionality delivered for the corresponding programming language.

You can download these vSphere SDKs from https://code.vmware.com/home or from Github.

1:vSAN Management SDK for Java

2:vSAN Management SDK for .NET

3:vSAN Management SDK for Python

4:vSAN Management SDK for Perl

5:vSAN Management SDK for Ruby

1: vSAN Management SDK for Java

Running the Sample Applications

The vSAN Management SDK for Java includes sample applications, build and run scripts, and dependent libraries. They are located under the samplecode directory in the SDK. You can use the sample code to get vSAN managed objects on vCenter Server or ESXi hosts.

Before running the sample applications, make sure that you have the vSphere Web Services SDK on your development environment, with the following directory structure:

VMware-vSphere-SDK–build

           SDK

                  vsphere-ws

Then copy the vsan-sdk-java directory to the same level as the vsphere-ws directory in the vSphere Web Services SDK:

VMware-vSphere-SDK–build

         SDK

                vsphere-ws

                vsan-sdk-java

Build the sample applications by running the build.py command. Run the sample applications using the run.sh script on Linux, or the run.bat script on Windows:

./run.sh com.vmware.vsan.samples.<Sample_name>

      --url https://vcenter/host_address/sdk

      --username <username>

      --password <password>

2:vSAN Management SDK for .NET

The vSAN Management SDK for .NET provides libraries, sample code, and API reference for developing custom .NET clients against the vSAN Management API. The vSAN Management SDK for .NET depends on the vSphere Web Services SDK of similar level. You use the vSphere Web Services SDK for logging in to vCenter Server and for retrieving vCenter Server managed objects.

Building the vSAN C# DLL

You must have the following components to build the vSAN C# DLL:

>> csc.exe. A C# compiler

>> sgen.exe. An XML serializer generator tool

>> wsdl.exe. Web Service Description Language 4.0 for Microsoft .NET

>> Microsoft.Web.Services3.dll

>> .NET Framework 4.0

>> Python 2.7.6

To build the vSAN C# DLL, run the following command:

$ python builder.py vsan_wsdl vsanservice_wsdl

This command generates the following DLL files:

>> VsanhealthService.dll

>> VsanhealthService.XmlSerializers.dll

Running the Sample Applications

To run the sample applications, run the following command:

.\VsanHealth.exe --username <username>

--url https://vcenter_name/sdk

--hostName <cluster_name> --ignorecert --disablesso

To view information about the parameters, use --help.

For further reference, please follow: https://code.vmware.com/web/sdk/6.7U1/vsan-python & https://code.vmware.com/apis/444/vsan

Guest and HA Application Monitoring SDK Programming

You can download the Guest SDK for monitoring guest virtual machine statistics; it also includes facilities for High Availability (HA) Application Monitoring. The SDK version number is 10.2.

HA Application Monitoring. The vSphere High Availability (HA) feature for ESXi hosts in a cluster provides protection for a guest OS and applications running in a virtual machine by restarting the virtual machine if a failure occurs. Using the HA Application Monitoring APIs, developers can write software to monitor guest OS and process heartbeat.

Guest SDK. The vSphere Guest SDK provides read-only APIs for monitoring various virtual machine statistics. Management agents running in the guest OS of a virtual machine can use this data for reacting to changes in the application layer.

Compatibility Notices

HA Application Monitoring applications must be recompiled to work with vSphere 6.0 because of changes to the communication interface (see below).

For vSphere 6.0, HA Application Monitoring communication was revised to use VMCI (the virtual machine communication interface). The VMCI driver is preinstalled in Linux kernel 3.9 and higher; in earlier kernel versions it can be installed with VMware Tools. On Windows, VMware Tools must be installed to obtain the VMCI driver.

This SDK supports C and C++ programming languages. You can support Java with wrapper classes, as in JNI.

Changes and New Features

The checksystem utility to verify proper glib version was added in the vSphere 6.5 release.

Tools for fetching extended guest statistics were added in vSphere 6.0, but not publicly documented until April 2015.

In the vSphere 6.0 release, high availability VM component protection, and FT (fault tolerance) has been extended for symmetric multiprocessing (SMP). Also, the communication interface was changed to use VMCI.

In the vSphere 5.5 release, the HA application monitoring facility was changed to reset the guest virtual machine if the application monitoring program requested a reset. Previously, HA application monitoring had to determine when the guest stopped sending a heartbeat.

In vSphere 5.1, HA Application Monitoring facilities were merged into the Guest SDK previously available.

Known Issues and Workarounds

Security enforcement for the Guest and HA application monitoring SDK using the secure authentication VMX parameter guest_rpc.rpci.auth.app.APP_MONITOR=TRUE does not work for FT (fault tolerant) VMs. The vSphere platform supports only the non-secure channel for FT virtual machines.

Displaying vSphere Guest Library Statistics :

On a Linux virtual machine hosted by ESXi, go to the include directory, compile the vmGuestLibTest.c program, and run the output program vmguestlibtest:

gcc -g -o vmguestlibtest -ldl vmGuestLibTest.c
./vmguestlibtest

Guest statistics appear repeatedly until you interrupt the program.

Controlling the Application Monitoring Heartbeat :

To run HA application monitoring programs, the virtual machine must be running on an ESXi host, and application monitoring must have been enabled when configuring HA.

You can enable heartbeats with the compiled vmware-appmonitor program.

Usage is as follows (a scripted heartbeat loop is sketched after this list): vmware-appmonitor { enable | disable | markActive | isEnabled | getAppStatus | postAppState }

>>enable – Enable application heartbeat so vSphere HA starts listening and monitoring the heartbeat count from this guest virtual machine. The heartbeats should be sent at least once every 30 seconds.

>>disable – Disable the application heartbeat so vSphere HA stops listening to heartbeats from this guest.

>>markActive – This starts sending the actual heartbeat every 30 seconds or less.

>>isEnabled – Indicates whether the heartbeat monitoring was enabled.

>>getAppStatus – Gets the status of the application, either Green, Red, or Gray.
>>postAppState – Posts the state of the application. Arguments can be:

>>appStateOk – Sends an “Application State is OK” signal to the HA agent running on the host.

>>appStateNeedReset – Sends an “Immediate Reset” signal to the HA agent running on the host.
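A scripted heartbeat loop built on these commands might look like the following PowerShell sketch. The path to vmware-appmonitor is an assumption; adjust it to wherever VMware Tools placed the binary in your guest:

# Enable application monitoring, then heartbeat every 15 seconds (well inside the 30-second window)
$appmon = "C:\Program Files\VMware\VMware Tools\VMGuestAppMonitor\bin\vmware-appmonitor.exe"   # assumed path
& $appmon enable
try {
    while ($true) {
        & $appmon markActive
        Start-Sleep -Seconds 15
    }
}
finally {
    # Disable monitoring cleanly so vSphere HA does not reset the VM when the loop exits
    & $appmon disable
}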

Compiling the Sample Program on Linux:

You need a C compiler and the make program.

Procedure

1: Go to the docs/VMGuestAppMonitor/samples/C directory.

2: Run the make command. On a 64-bit machine you might want to change lib32 to lib64 in the makefile.

3: Set LD_LIBRARY_PATH as described above.

4: Run the sample program (see below for program usage):

./sample

Compiling Sample Programs on Windows :

You need Visual Studio 2008 or later.

Procedure

1: Go to the docs/VMGuestAppMonitor/samples/visualstudio folder.

2: Open the appmon.vcproj file and build the solution.

3: Click Debug > Start Debugging to run appmon.exe. See below for program usage

Demonstrating the HA Application Monitoring API

The sample program enables HA application monitoring and sends a heartbeat every 15 seconds. After the program starts running, typing Ctrl+C displays three choices:

s – stop sending heartbeats and exit the program. The virtual machine will reset.

d – disable application monitoring and exit the program. This does not cause a reset.

c – continue sending heartbeats.

For further reference, please check: https://code.vmware.com/web/sdk/6.7/vsphere-guest

VDDK 6.7.1

The Virtual Disk Development Kit (VDDK) 6.7.1 is an update to support vSphere 6.7 Update 1 and to resolve issues discovered in previous releases. VDDK 6.7 added support for ESXi 6.7 and vCenter Server 6.7, and was tested for backward compatibility against vSphere 6.0 and 6.5.

VDDK is used with vSphere Storage APIs for Data Protection (VADP) to develop backup and restore software. For general information about this development kit, how to obtain the software, programming details, and redistribution, see the VDDK landing page on VMware {Code}.

The VMware policy concerning backward and forward compatibility is for VDDK to support N-2 and N+1 releases. In other words, VDDK 6.7 and all its update releases support vSphere 6.0, 6.5 (except for new features) and the next major release.

Changes and New Features

The VixMntapi library on Linux systems now supports:

  • Advanced transports modes: HotAdd, SAN, and NBD/NBDSSL.
  • Read-only mounting of VMDK files.
  • Diagnostic logging as set by vixMntapi.cfg.LogLevel in the VDDK configuration file. Levels are the same as for vixDiskLib.transport – Panic, Error, Warning, Audit, Info, Verbose, Trivia. The output file named vixMntapi.log appears in the same directory as other log files. Not available for Windows.

In addition to those previously qualified for use as a backup proxy, the following operating systems were tested with VDDK 6.7.1:

  • Red Hat Enterprise Linux RHEL 6.9
  • CentOS 7.4
  • SUSE Linux Enterprise Server SLES 15
  • Windows Server 2016 version 1803

Compatibility Notices

In earlier releases it was an error to close parentHandle after VixDiskLib_Attach succeeds. The VDDK library now marks parentHandle internally to prevent closure and ensure cleanup. Proper calling sequences are as follows:

  1. First open a disk for attach with this call:
    VixDiskLib_Open(remoteConnection, virtualDiskPath, flags, &parentHandle);
  2. Create a local connection using: VixDiskLib_Connect(NULL, &localConnection);
  3. With the backed-up disk (referred to as the parent disk) still open, make this call, creating the child disk with a unique name: VixDiskLib_CreateChild(parentHandle, "C:\tmp.vmdk", VIXDISKLIB_DISK_MONOLITHIC_SPARSE, NULL, NULL);
  4. Open tmp.vmdk (referred to as the redo log): VixDiskLib_Open(localConnection, "C:\tmp.vmdk", VIXDISKLIB_FLAG_OPEN_SINGLE_LINK, &redoHandle);
  5. Attach the redo log to its parent disk with: VixDiskLib_Attach(parentHandle, redoHandle);

If VixDiskLib_Attach fails, the system now automatically cleans up the local file handle.

  6. To end, close the redo log. Whether to close the parent disk handle is release dependent:
     VixDiskLib_Close(redoHandle);
     if (VIXDISKLIB_VERSION_MAJOR > 7) {
         VixDiskLib_Close(parentHandle); // to avoid memory leaks
     }
  7. Unlink the redo log from the parent disk.

Recently Resolved Issues

The VDDK 6.7.1 release resolves the following issues.

  • XML library upgraded. The XML library libxml2 was upgraded from version 2.9.6 to version 2.9.8 because of a known security vulnerability.
  • OpenSSL library upgraded. The OpenSSL library openssl was upgraded from an earlier version to version 1.0.2p because of a known security vulnerability.
  • NBD transport in VDDK 6.7 is slow when running against vSphere 6.5. When data protection software is compiled with VDDK 6.7 libraries, NBD/NBDSSL mode backup and restore is significantly slower than before on vSphere 6.5 or 6.0. This was caused by dropping the OPEN_BUFFERED flag when it became the default in VDDK 6.7. This backward compatibility issue is fixed in the VDDK 6.7.1 release. When performance is important, VMware recommends use of NBD asynchronous I/O, calling VixDiskLib_WriteAsync and VixDiskLib_Wait.
  • With HotAdd transport, VDDK could hang after many NFC connections. For programs compiled with VDDK 6.7 or 6.7 EP1 libraries, VDDK may eventually hang in VixDiskLib_Open when building server connections to ESXi. After the log entry "NBD_ClientOpen: attempting to create connection", VDDK hangs. The cause is that after HotAdd mode retrieves the disk signature, it fails to close the NFC connection, so many NFC server threads continue running and prevent new NFC connections. This regression is fixed in the VDDK 6.7.1 release.
  • HotAdd backup of a VM template crashed if NoNfcSession was enabled. In VDDK 6.5.1 and later, an option became available to avoid creating an NFC session for backup in cloud environments where local connections are disallowed. If vixDiskLib.transport.hotadd.NoNfcSession=1 was set in the proxy's VDDK configuration file, HotAdd mode crashed due to null pointer access of an attribute in the VM template object.
  • VixMntapi on Linux did not work with advanced transport modes. VDDK partners use VixDiskLib for block-oriented image backup and restore, while they use VixMntapi for file-oriented backup and restore. The Windows implementation of VixMntapi has supported advanced transports for many releases, but the Linux implementation supported only NBD mode. In the VDDK 6.7.1 release, VixMntapi supports HotAdd or SAN transport and NBD/NBDSSL on Linux, so it can be used in VMC environments for file-oriented backup and restore of Linux VMs.
  • VDDK hangs during restore when directly connecting to ESXi hosts. When doing a restore with direct ESXi connections, VDDK may hang intermittently. The cause is that NfcServer on ESXi enters the wrong state, waiting for new messages that never arrive. The fix for NfcServer was to avoid waiting when no data remains. To resolve this issue, customers must upgrade ESXi hosts to 6.7 U1 or later.
  • VixMntapi on Linux could not open files as read-only. In previous releases, opening files read-only was not supported by VixMntapi; when read-only mode was requested at open time, the file was opened read/write. In this release, VixMntapi actually opens files as read-only on Linux VMs.
  • HotAdd proxy failed with Windows Server backups. If there was a SATA controller in the Windows backup proxy, HotAdd mode did not work. The cause was that VDDK did not rescan SATA controllers after HotAdding, so if there were multiple SATA or AHCI controllers, VDDK sometimes used the wrong controller ID and could not find the HotAdded disk. Disk open failed, resulting in "HotAdd ManagerLoop caught an exception" and "Error 13 (You do not have access rights to this file)" messages. The workaround was to remove the SATA controller from the Windows backup proxy. The issue is fixed in this release, so the workaround is no longer needed (https://kb.vmware.com/s/article/2151091).

For further reference, please check: https://code.vmware.com/web/sdk/6.7/vddk

Failed to lock the file

  • Powering on a virtual machine fails.
  • Unable to power on a virtual machine.
  • Adding an existing virtual machine disk (VMDK) to a virtual machine that is already powered on fails.

You see the error:

Cannot open the disk '/vmfs/volumes/UUID/VMName/Test-000001.vmdk' or one of the snapshot disks it depends on. Reason: Failed to lock the file.

Cause:

+++++++

This issue occurs when one of the files required by the virtual machine has been opened by another application.

During a Create or Delete Snapshot operation while a virtual machine is running, all the disk files are momentarily closed and reopened. During this window, the files could be opened by another virtual machine, management process, or third-party utility. If that application creates and maintains a lock on the required disk files, the virtual machine cannot reopen the file and resume running.

Resolution:

+++++++++++

If the file is no longer locked, try to power on the virtual machine again. This should succeed. To determine the cause of the previously locked file, review the VMkernel, hostd, and vpxa log files and attempt to determine:

  • When the hostd and vpxa management agents open VMDK descriptor files, they log messages similar to:

    info 'DiskLib'] DISKLIB-VMFS : "/vmfs/volumes/UUID/VMName/Test-000001.vmdk" : open successful (21) size = 32227695616, hd = 0. Type 8
    info 'DiskLib'] DISKLIB-VMFS : "/vmfs/volumes/UUID/VMName/Test-000001.vmdk" : closed.

  • When the VMkernel attempts to open a locked file, it reports:

    31:16:46:55.498 cpu7:8715)FS3: 2928: [Requested mode: 2] Lock [type 10c00001 offset 11401216 v 2035, hb offset 3178496
    gen 26643, mode 1, owner 4ca72d14-84dc8dd4-0da3-0017a4770038 mtime 2213195] is not free on volume 'norr_prod_vmfs_data08'

  • The file may have been locked by third-party software running on an ESXi/ESX host or externally. Review the logs of any third-party software that may have acted on the virtual machine's VMDK files at the time.

Situation 1:

++++++++

Error: Failed to get exclusive lock on the configuration file, another VM process could be running, using this configuration file

Solution: This issue may occur if there is a lack of disk space on the root drive. The ESX host is unable to start a virtual machine because there is insufficient disk space to commit changes.

Situation 2:

++++++++

Error : Failed to lock the file when creating a snapshot

Solution :

To work around this issue in ESX or earlier ESXi releases, use the vmkfstools -D command to identify the MAC address of the machine locking the file, then reboot or power off the machine that owns that MAC address to release the lock.

Notes: 

  • If the vmkfstools -D test-000001-delta.vmdk command does not return a valid MAC address in the top field (returns all zeros), review the RO Owner line below it to see which MAC address owns the read-only/multi-writer lock on the file.
  • In some cases, it may be a Service Console-based lock, an NFS lock, or a lock generated by another system or product that can use or read the VMFS file systems. The file is locked by a VMkernel child or cartel world and the offending host running the process/world must be rebooted to clear it.
  • After you have identified the host or backup tool (machine that owns the MAC) locking the file, power it off or stop the responsible service and then restart the management agents on the host running the virtual machine to release the lock.

Situation 3:

+++++++++

Error : Failed to add disk scsi0:1. Failed to power on scsi0:1

To prevent concurrent changes to critical virtual machine files and file systems, ESXi/ESX hosts establish locks on these files. In certain circumstances, these locks may not be released when the virtual machine is powered off. The files cannot be accessed by the servers while locked, and the virtual machine is unable to power on.

These virtual machine files are locked during runtime:

  • VMNAME.vswp
  • DISKNAME-flat.vmdk
  • DISKNAME-ITERATION-delta.vmdk
  • VMNAME.vmx
  • VMNAME.vmxf
  • vmware.log

>> There is a manual procedure to locate the host and virtual machine holding locks.

To work around this issue, run the vmfsfilelockinfo script from the host experiencing difficulties with one or more locked files:

  1. To find out the IP address of the host holding the lock, run the /bin/vmfsfilelockinfo Python script. The script takes these parameters:
    • File being tested
    • Username and password for accessing VMware vCenter Server (when tracing the MAC address to an ESX host). For example:

      Run this command:

      ~ # vmfsfilelockinfo -p /vmfs/volumes/iscsi-lefthand-2/VM1/VM1_1-000001-delta.vmdk -v 192.168.1.10 -u administrator@vsphere.local

      You see output similar to:

      vmfsflelockinfo Version 1.0
      Looking for lock owners on "VM1_1-000001-delta.vmdk"
      "VM1_1-000001-delta.vmdk" is locked in Exclusive mode by host having mac address ['xx:xx:xx:xx:xx:xx']
      Trying to make use of Fault Domain Manager
      ———————————————————————-
      Found 0 ESX hosts using Fault Domain Manager.
      ———————————————————————-
      Could not get information from Fault domain manager
      Connecting to 192.168.1.10 with user administrator@vsphere.local
      Password: xXxXxXxXxXx
      ———————————————————————-
      Found 3 ESX hosts from Virtual Center Server.
      ———————————————————————-
      Searching on Host 192.168.1.178
      Searching on Host 192.168.1.179
      Searching on Host 192.168.1.180
      MAC Address : xx:xx:xx:xx:xx:xx

      Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive

      Total time taken : 0.27 seconds.

      Note: During the life-cycle of a powered on virtual machine, several of its files transition between various legitimate lock states. The lock state mode indicates the type of lock that is on the file. The list of lock modes is:

    • mode 0 = no lock
    • mode 1 = exclusive lock (vmx file of a powered on virtual machine, the currently used disk (flat or delta), *.vswp, and so on)
    • mode 2 = read-only lock (for example, on the ..-flat.vmdk of a running virtual machine with snapshots)
    • mode 3 = multi-writer lock (for example, used for MSCS cluster disks or FT VMs)
  2. To get the name of the process holding the lock, run the lsof command on the host holding the lock and filter the output for the file name in question:

    ~ # lsof | egrep 'Cartel|VM1_1-000001-delta.vmdk'

    You see output similar to:

    Cartel | World name | Type | fd | Description
    36202 vmx FILE 80 /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/VM1/VM1_1-000001-delta.vmdk

    This shows that the file is locked by a virtual machine having Cartel ID 36202. Now display the list of active Cartel IDs by executing this command:

    ~ # esxcli vm process list

    This displays information for active virtual machines grouped by virtual machine name and having a format similar to:

    Alternate_VM27
    World ID: 36205
    Process ID: 0
    VMX Cartel ID: 36202
    UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a9 dc 9f bf
    Display Name: Alternate_VM27
    Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM27/Alternate_VM27.vmx
    Alternate_VM20
    World ID: 36207
    Process ID: 0
    VMX Cartel ID: 36206
    UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a5 dc 94 5f
    Display Name: Alternate_VM20
    Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM20/Alternate_VM20.vmx

    The virtual machine entry having VMX Cartel ID 36202 shows the display name of the virtual machine holding the lock on file VM1_1-000001-delta.vmdk, which in this example, is Alternate_VM27.

  3. Shut down the virtual machine holding the lock to release the lock.

Related Information

This script performs these actions in this sequence:

  1. Identifies locked state Exclusive, Read-Only, not locked.
  2. Identifies MAC address of locking host ['xx:xx:xx:xx:xx:xx'].
  3. Queries the Fault Domain Manager (HA) for information on discovered MAC address.
  4. Queries vCenter Server for information on discovered MAC address.
  5. Outputs final status.
    For example:

Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive.

  • The script outputs total execution time when it terminates.

 

Notes:

  • The script does not attempt to break/remove locks. The script only identifies the potential ESX host which holds the lock.
  • If the script is not run with a vCenter Server username and password, it prompts for them after querying the Fault Domain Manager.
  • This script works on a single file parameter, without wildcards. If multiple queries are required, you must execute the script repeatedly in a wrapper script.

For further clarification, please follow: https://kb.vmware.com/s/article/10051

SRM advanced parameters which can be used for troubleshooting

VMware vCenter Site Recovery Manager (SRM) has a default setting of 300 seconds for the elapsed time for SRA commands (such as discoverDevices, discoverArrays). If the requested information is not passed back from the SRA in five minutes, SRM flags a timeout and terminates the command.

Example Error:

+++++++++++

"Timed out (300 seconds) while waiting for SRA to complete '<commandtype>' command"

Resolution :

+++++++++

To resolve this issue, increase the VMware vCenter Site Recovery Manager (SRM) timeout value for SRA commands:

  1. Log in to vSphere Web Client and click Site Recovery Manager plugin.
  2. Click Sites in the left pane.
  3. Click on a Site > go to Advanced Settings > click Storage.
  4. To change the SRA command timeout, enter a new value greater than the current value (for example, 600 or 900) in the storage.commandTimeout field.
  5. Perform test recovery again.

 

ADVANCED SETTING                             DEFAULT VALUE   MY VALUE   DESCRIPTION
Recovery.powerOffTimeout                     300             600        Change the timeout for the guest OS to power off.
Recovery.powerOnTimeout                      120             300        Change the timeout to wait for VMware Tools when powering on virtual machines.
StorageProvider.fixRecoveredDatastoreNames   Not checked     Checked    Force removal, upon successful completion of a recovery, of the snap-xx prefix applied to recovered datastore names.
StorageProvider.hostRescanRepeatCount        1               3          Repeat host scans during testing and recovery.
StorageProvider.hostRescanTimeoutSec         300             600        Change the interval that Site Recovery Manager waits for each HBA rescan to complete.
Storage.commandTimeout                       300             600        Change the timeout in seconds for executing an SRA command.