IOPS LIMIT FOR OBJECT

A number of customers have expressed a wish to limit the amount of I/O that a single VM can generate against a VSAN datastore. The main reason for this request is to prevent a high-I/O VM (or, to be more precise, an IOPS-intensive application inside a VM) from impacting other VMs running on the same datastore. With the introduction of IOPS Limits, implemented via storage policies, administrators can cap the number of IOPS that a VM can drive.

VSAN 6.2 introduces a new quality of service mechanism which we are referring to as “IOPS limit for object”. Through a policy setting, a customer can set an IOPS limit on a per-object basis (typically a VMDK), which ensures that the object cannot exceed that number of IOPS. This is very useful if you have a virtual machine that might be consuming more than its fair share of resources. The policy setting places “guard rails” on this virtual machine so it does not impact other VMs, or the overall performance of the VSAN datastore.

The screenshot below shows what the new “IOPS limit for object” capability looks like in the VM Storage Policy. Simply select “IOPS limit for object” for your policy, and then select an integer value for the IOPS limit. Any object (VMDK) that has this policy assigned will not be able to generate more than that number of IOPS.

Normalized to 32KB:

++++++++++++++++

The I/O size for the IOPS limit is normalized to 32KB, so an I/O larger than 32KB counts as more than one I/O against the limit. Note that this is a hard limit on the number of IOPS: even if there are enough resources available on the system to do more, the VM/VMDK will be prevented from doing so.
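For example (a back-of-the-envelope sketch with made-up numbers, not an official formula): a VMDK limited to 10,000 IOPS that receives 64KB I/Os is effectively limited to roughly 5,000 of those I/Os per second, because each 64KB I/O counts as two 32KB normalized I/Os. The arithmetic can be sanity-checked in any POSIX shell:

# IO_SIZE_KB=64 ; IOPS_ISSUED=5000
# echo $(( IOPS_ISSUED * ( (IO_SIZE_KB + 31) / 32 ) ))   # normalized IOPS counted against the limit
10000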

Considerations:

++++++++++++

One thing to consider is that not only is read and write I/O counted towards the limit, but also any snapshot I/O that occurs against the VM/VMDK is added to the IOPS limit.

If the I/O against a particular VM/VMDK rises above the IOPS limit threshold, i.e. the limit is set to 10,000 IOPS and we receive the 10,001st I/O, then that I/O is delayed/throttled.

New Security Features in 6.5 Release

With new features such as VM Encryption, Encrypted vMotion, Secure Boot Support for Virtual Machines, and Secure Boot Plus Cryptographic Hypervisor Assurance for ESXi, vSphere 6.5 Security brings together security and operational efficiency that are both universal and scalable. In addition, vSphere 6.5 introduces audit-quality logging of vSphere events via Syslog.

Virtual Machine Encryption:

++++++++++++++++++++

VM Encryption is a VM-agnostic method of encryption for VMs that is scalable, easy to implement, and easy to manage.

There are numerous advantages:

1. Because encryption occurs at the hypervisor level and not in the VM, VM Encryption works with any guest OS and datastore type.

2. Encryption is managed via policy. The policy can be applied to many VMs, regardless of their guest OS. Verifying that the VM is encrypted can be done by confirming that the policy is applied. The policy framework being used leverages vSphere Storage Policy Based Management (SPBM).

3. Encryption is not managed “within” the VM. This is a key differentiator. There are no encryption “special cases” that require in-guest configuration and monitoring. Encryption keys are not contained in the memory of the VM or accessible to the VM in any way.

4. Key Management is based on the industry-standard Key Management Interoperability Protocol (KMIP). We are qualifying against KMIP version 1.1. vCenter Server is considered a KMIP client, and it works with many KMIP 1.1 key managers. This provides customers with choice and flexibility. It also provides a separation of duty between key usage and key management. In a large enterprise, key management would be done by the security team, and key usage would be done by IT, in this example via vCenter Server.

5. VM Encryption leverages the latest CPU hardware advances in AES-NI encryption. Advanced Encryption Standard Instruction Set is an extension to the x86 instruction set and provides accelerated encryption and decryption functions on a per-core basis in the CPU.

Encrypted vMotion:

++++++++++++++

Encrypted vMotion is set on a per-VM basis. It encrypts the data traveling over the network rather than encrypting the network itself. This enables more flexibility and easier implementation. A 256-bit random key and a 64-bit nonce, used only once for this VMware vSphere vMotion® migration, are generated. The nonce is used to generate a unique counter for every packet sent over the network. This prevents replay attacks and enables the encryption of 2^64 128-bit blocks of data. The key and the nonce are packaged into a vSphere vMotion migration specification. The migration specification is sent to both systems in the cluster via the existing encrypted management connections between the vCenter Server instance and the ESXi hosts. The vSphere vMotion traffic begins with every packet being encrypted with the key and the nonce on host A. Each uniquely encrypted packet is decrypted on the receiving host, host B, completing the vSphere vMotion migration.

Secure Boot Support:

++++++++++++++++

vSphere 6.5 introduces Secure Boot Support for Virtual Machines and for the ESXi hypervisor. UEFI Secure Boot is a mechanism that ensures that only trusted code is loaded by EFI firmware prior to OS handoff. Trust is determined by keys and certificates managed by the firmware. Implementation of this feature for a virtual machine enables secure boot of EFI-aware OSs in a VM.

Virtual Machine Secure Boot: Virtual machines must be booted from EFI firmware to enable Secure Boot. The EFI firmware supports Windows, Linux, and nested ESXi. For Secure Boot to work, the guest OS must also support Secure Boot. Examples include Windows 8 and Windows Server 2012 and newer, VMware Photon™ OS, RHEL/CentOS 7.0, Ubuntu 14.04, and ESXi 6.5. Enabling Secure Boot for a virtual machine is as simple as checking a box in the UI.

ESXi Host Secure Boot:

+++++++++++++++++

When Secure Boot is enabled, the UEFI firmware validates the digitally signed kernel of an OS against a digital certificate stored in the UEFI firmware. For ESXi 6.5, this capability is further leveraged by the ESXi kernel, adding cryptographic assurance of ESXi components. ESXi is already composed of digitally signed packages called vSphere installation bundles (VIBs). These packages are never broken open. At boot time, the ESXi file system maps to the content of those packages. By leveraging the same digital certificate in the host UEFI firmware used to validate the signed ESXi kernel, the kernel then validates each VIB using the Secure Boot verifier against the firmware-based certificate, ensuring a cryptographically “clean” boot.
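To check in advance whether a host's installed VIBs would pass this validation, ESXi 6.5 includes a validation script that can be run from the ESXi shell (the path below is the commonly documented location; verify it on your build):

# /usr/lib/vmware/secureboot/bin/secureBoot.py -c

If the script reports that Secure Boot cannot be enabled, the usual cause is an unsigned or improperly signed VIB on the host.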

Object Store File System

OSFS (the Object Store File System) is the layer that presents the vSAN object store as a single mounted datastore on each host. Data on a VSAN datastore is stored in the form of data containers called objects, which are distributed across the cluster. An object can be a VMDK, a snapshot, or the VM home folder. The VM home (namespace) object is formatted as a VMFS volume and stores the virtual machine metadata files.

Logs for OSFS are captured in /var/log/osfsd.log

How do you create and remove directories on a vSAN datastore with osfs-mkdir and osfs-rmdir?

You cannot simply create or remove a directory on the vSAN datastore with standard commands, because the vSAN datastore is object based. Special vSAN-related commands exist to perform these tasks.

If you try to create a directory on the vSAN datastore with mkdir, you get an error such as:

# cd /vmfs/volumes/vsanDatastore

# mkdir TEST

mkdir: can’t create directory ‘TEST’: Function not implemented

How can we create/remove a directory?

Step 1: Log in to the ESXi host and change to the OSFS bin directory

# cd /usr/lib/vmware/osfs/bin

Step 2: List the contents 

# ls

objtool     osfs-ls     osfs-mkdir  osfs-rmdir  osfsd

Step 3: Verify that a directory called TEST does not exist

# ls -lh /vmfs/volumes/vsanDatastore/TEST

ls: /vmfs/volumes/vsanDatastore/TEST: No such file or directory

Step 4: Let’s create a directory named TEST using osfs-mkdir

# ./osfs-mkdir /vmfs/volumes/vsanDatastore/TEST

54c0ba65-0c45-xxxx-b1f2-xxxxxxxxxxxx

Step 5: Verify that it exists

# ls -lh /vmfs/volumes/vsanDatastore/TEST

lrwxr-xr-x    1 root     root          12 Jan 09 21:03 /vmfs/volumes/vsanDatastore/TEST -> 54c0ba65-0c45-xxxx-b1f2-xxxxxxxxxxxx

Step 6: Let’s delete the directory using osfs-rmdir

# ./osfs-rmdir /vmfs/volumes/vsanDatastore/TEST

Deleting directory 54c0ba65-0c45-xxxx-b1f2-xxxxxxxxxxxx in container id xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

backed by vsan (force=False)

Step 7: Verify that it has been removed

# ls -lh /vmfs/volumes/vsanDatastore/TEST

ls: /vmfs/volumes/vsanDatastore/TEST: No such file or directory
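The same directory also contains osfs-ls, which lists the object-backed entries of a vSAN directory. As an assumption based on the tool's name and location (check its usage output on your host before relying on it), a basic invocation would look like:

# ./osfs-ls /vmfs/volumes/vsanDatastore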

Storage connectivity to vSphere

The VMware vSphere storage architecture consists of layers of abstraction that hide the differences and manage the complexity among physical storage subsystems.

To the applications and guest operating systems inside each virtual machine, the storage subsystem appears as a virtual SCSI controller connected to one or more virtual SCSI disks. These virtual controllers are the only types of SCSI controllers that a virtual machine can see and access; the available types are BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, and VMware Paravirtual.

The virtual SCSI disks are provisioned from datastore elements in the datacenter. A datastore is like a storage appliance that delivers storage space for virtual machines across multiple physical hosts. Multiple datastores can be aggregated into a single logical, load-balanced pool called a datastore cluster.

The datastore abstraction is a model that assigns storage space to virtual machines while insulating the guest from the complexity of the underlying physical storage technology. The guest virtual machine is not exposed to Fibre Channel SAN, iSCSI SAN, direct attached storage, or NAS.

Each datastore is a physical VMFS volume on a storage device. NAS datastores are an NFS volume with VMFS characteristics. Datastores can span multiple physical storage subsystems. A single VMFS volume can contain one or more LUNs from a local SCSI disk array on a physical host, a Fibre Channel SAN disk farm, or iSCSI SAN disk farm. New LUNs added to any of the physical storage subsystems are detected and made available to all existing or new datastores. Storage capacity on a previously created datastore can be extended without powering down physical hosts or storage subsystems. If any of the LUNs within a VMFS volume fails or becomes unavailable, only virtual machines that use that LUN are affected. An exception is the LUN that has the first extent of the spanned volume. All other virtual machines with virtual disks residing in other LUNs continue to function as normal.
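On an ESXi host, the mapping between datastores, VMFS volumes, and their backing extents (LUNs) can be inspected with standard esxcli commands; the names in the output are simply whatever exists in your environment:

# esxcli storage filesystem list

# esxcli storage vmfs extent list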

Each virtual machine is stored as a set of files in a directory in the datastore. The disk storage associated with each virtual guest is a set of files within the guest’s directory. You can operate on the guest disk storage as an ordinary file. The disk storage can be copied, moved, or backed up. New virtual disks can be added to a virtual machine without powering it down. In that case, a virtual disk file  (.vmdk) is created in VMFS to provide new storage for the added virtual disk or an existing virtual disk file is associated with a virtual machine.
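For example, a new thin-provisioned virtual disk file can be created directly on a VMFS datastore with vmkfstools and then attached to a VM (the datastore and folder names below are placeholders):

# vmkfstools -c 10G -d thin /vmfs/volumes/datastore1/myvm/myvm_1.vmdk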

VMFS is a clustered file system that leverages shared storage to allow multiple physical hosts to read and write to the same storage simultaneously. VMFS provides on-disk locking to ensure that the same virtual machine is not powered on by multiple servers at the same time. If a physical host fails, the on-disk lock for each virtual machine is released so that virtual machines can be restarted on other physical hosts.

VMFS also features failure consistency and recovery mechanisms, such as distributed journaling, a failure-consistent virtual machine I/O path, and virtual machine state snapshots. These mechanisms can aid quick identification of the cause and recovery from virtual machine, physical host, and storage subsystem failures.

VMFS also supports raw device mapping (RDM). RDM provides a mechanism for a virtual machine to have direct access to a LUN on the physical storage subsystem (Fibre Channel or iSCSI only). RDM supports two typical types of applications:

SAN snapshot or other layered applications that run in the virtual machines. RDM better enables scalable backup offloading systems using features inherent to the SAN.

Microsoft Clustering Services (MSCS) spanning physical hosts and using virtual-to-virtual clusters as well as physical-to-virtual clusters. Cluster data and quorum disks must be configured as RDMs rather than files on a shared VMFS.
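An RDM is created as a small mapping file that points at the LUN. With vmkfstools, -r creates a virtual compatibility mapping and -z a physical compatibility mapping (the device identifier and paths below are placeholders):

# vmkfstools -z /vmfs/devices/disks/naa.60000000000000000000000000000001 /vmfs/volumes/datastore1/myvm/quorum_rdmp.vmdk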

 

Supported Storage Adapters:

+++++++++++++++++++++++++

Storage adapters provide connectivity for your ESXi host to a specific storage unit or network.

ESXi supports different classes of adapters, including SCSI, iSCSI, RAID, Fibre Channel, Fibre Channel over Ethernet (FCoE), and Ethernet. ESXi accesses the adapters directly through device drivers in the VMkernel.
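In addition to the vSphere Client procedure below, the same information is available from the command line on the host:

# esxcli storage core adapter list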

View Storage Adapters Information:

++++++++++++++++++++++++++++++

Use the vSphere Client to display storage adapters that your host uses and to review their information.

Procedure

1: In Inventory, select Hosts and Clusters.

2: Select a host and click the Configuration tab.

3: In Hardware, select Storage Adapters.

4: To view details for a specific adapter, select the adapter from the Storage Adapters list.

5: To list all storage devices the adapter can access, click Devices.

6: To list all paths the adapter uses, click Paths.

Types of Physical Storage:

++++++++++++++++++++++

The ESXi storage management process starts with storage space that your storage administrator preallocates on different storage systems.

ESXi supports the following types of storage:

Local Storage: Stores virtual machine files on internal or directly connected external storage disks.

Networked Storage: Stores virtual machine files on external storage disks or arrays attached to your host through a direct connection or through a high-speed network.

Local Storage:

Local storage can be internal hard disks located inside your ESXi host, or it can be external storage systems located outside and connected to the host directly through protocols such as SAS or SATA.

Local storage does not require a storage network to communicate with your host. You need a cable connected to the storage unit and, when required, a compatible HBA in your host.

ESXi supports a variety of internal or external local storage devices, including SCSI, IDE, SATA, USB, and SAS storage systems. Regardless of the type of storage you use, your host hides a physical storage layer from virtual machines.
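To see which of the devices detected by a host are considered local, list the storage devices and check the Is Local field in the output (a standard command):

# esxcli storage core device list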

Networked Storage:

Networked storage consists of external storage systems that your ESXi host uses to store virtual machine files remotely. Typically, the host accesses these systems over a high-speed storage network.

Networked storage devices are shared. Datastores on networked storage devices can be accessed by multiple hosts concurrently. ESXi supports the following networked storage technologies.

Note

Accessing the same storage through different transport protocols, such as iSCSI and Fibre Channel, at the same time is not supported.

Fibre Channel (FC):

Stores virtual machine files remotely on an FC storage area network (SAN). FC SAN is a specialized high-speed network that connects your hosts to high-performance storage devices. The network uses Fibre Channel protocol to transport SCSI traffic from virtual machines to the FC SAN devices.

Fibre Channel Storage

In this configuration, a host connects to a SAN fabric, which consists of Fibre Channel switches and storage arrays, using a Fibre Channel adapter. LUNs from a storage array become available to the host. You can access the LUNs and create datastores for your storage needs. The datastores use the VMFS format.
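On hosts with Fibre Channel HBAs, the adapters and their WWNs can also be listed from the command line (the san namespace is present on recent ESXi builds; output columns vary by release):

# esxcli storage san fc list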

Internet SCSI (iSCSI):

Stores virtual machine files on remote iSCSI storage devices. iSCSI packages SCSI storage traffic into the TCP/IP protocol so that it can travel through standard TCP/IP networks instead of the specialized FC network. With an iSCSI connection, your host serves as the initiator that communicates with a target, located in remote iSCSI storage systems.

ESXi offers the following types of iSCSI connections:

Hardware iSCSI: Your host connects to storage through a third-party adapter capable of offloading the iSCSI and network processing. Hardware adapters can be dependent or independent.

Software iSCSI: Your host uses a software-based iSCSI initiator in the VMkernel to connect to storage. With this type of iSCSI connection, your host needs only a standard network adapter for network connectivity.

You must configure iSCSI initiators for the host to access and display iSCSI storage devices.
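As a brief command-line sketch (the adapter name and target address are placeholders; your vmhba number will differ), enabling the software iSCSI initiator and adding a dynamic discovery target looks like this:

# esxcli iscsi software set --enabled=true

# esxcli iscsi adapter list

# esxcli iscsi adapter discovery sendtarget add -A vmhba65 -a 192.168.10.20:3260

# esxcli storage core adapter rescan -A vmhba65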

The iSCSI Storage figure below depicts the different types of iSCSI initiators.

iSCSI Storage

In the left example, the host uses the hardware iSCSI adapter to connect to the iSCSI storage system.

In the right example, the host uses a software iSCSI adapter and an Ethernet NIC to connect to the iSCSI storage.

iSCSI storage devices from the storage system become available to the host. You can access the storage devices and create VMFS datastores for your storage needs.

Network-attached Storage (NAS)

Stores virtual machine files on remote file servers accessed over a standard TCP/IP network. The NFS client built into ESXi uses Network File System (NFS) protocol version 3 to communicate with the NAS/NFS servers. For network connectivity, the host requires a standard network adapter.

NFS Storage
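An NFS v3 export can also be mounted as a datastore from the host's command line (the server, export path, and datastore name below are placeholders):

# esxcli storage nfs add -H nfs01.example.com -s /exports/vmstore -v nfsDatastore1

# esxcli storage nfs list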

Shared Serial Attached SCSI (SAS)

Stores virtual machines on direct-attached SAS storage systems that offer shared access to multiple hosts. This type of access permits multiple hosts to access the same VMFS datastore on a LUN.

What is CLOMD in vSAN?

CLOMD (Cluster Level Object Manager Daemon) plays a key role in the operation of a vSAN cluster. It runs on every ESXi host and is responsible for new object creation, initiating repair of existing objects after failures, all types of data moves and evacuations (For example: Enter Maintenance Mode, Evacuate data on disk removal from vSAN), maintaining balance and thus triggering rebalancing, implementing policy changes, etc. 

It does not actually participate in the data path, but it triggers data path operations and as such is a critical component during a number of management workflows and failure handling scenarios. 

Virtual machine power-on and Storage vMotion to vSAN are two (less obvious) operations where CLOMD is required, because both involve the creation of a swap object, and object creation requires CLOMD.

Similarly, starting with vSAN 6.0, memory snapshots are maintained as objects, so taking a snapshot with memory state also requires CLOMD.

Cluster health – CLOMD liveness check:

This checks if the Cluster Level Object Manager (CLOMD) daemon is alive or not. It does so by first checking that the service is running on all ESXi hosts, and then contacting the service to retrieve run-time statistics to verify that CLOMD can respond to inquiries. 

Note: This does not ensure that all of the functionalities discussed above (For example: Object creation, rebalancing) actually work, but it gives a first level assessment as to the health of CLOMD.

CLOMD ERROR 

If any ESXi host is disconnected, the CLOMD liveness state of the disconnected host is shown as unknown. If the Health service is not installed on a particular ESXi host, the CLOMD liveness state of all the ESXi hosts is also reported as unknown.

If the CLOMD service is not running on a particular ESXi host, the CLOMD liveness state of that host is reported as abnormal.

For this test to succeed, the health service needs to be installed on the ESXi host and the CLOMD service needs to be running. To get the status of the CLOMD service, run this command on the ESXi host:

/etc/init.d/clomd status
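If the service is stopped, restart it with the same init script and review its recent activity in the CLOMD log (the same log file referenced below):

/etc/init.d/clomd restart

tail /var/run/log/clomd.log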

If the CLOMD health check is still failing after these steps or if the CLOMD health check continues to fail on a regular basis, open a support request with VMware Support.

Examples:

++++++++

In the /var/run/log/clomd.log file, you see logs similar to:

2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMProcessWorkItem: Op REPAIR starts:1804289387 
2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMReconfigure: Reconfiguring ae9cf268-cd5e-abc4-448d-050010d45c96 workItem type REPAIR
2017-04-19T03:59:32.408Z 120360 (482850097440)(opID:1804289387)CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for ae9cf268-cd5e-abc4-448d-050010d45c96 found 

^^^ Here, CLOMD crashed while attempting to repair the object with UUID ae9cf268-cd5e-abc4-448d-050010d45c96. The vSAN health check will report a CLOMD liveness issue. A CLOMD restart will not help, because each time the daemon is restarted it fails again while attempting to repair the zero-sized object. Swap objects are the only vSAN objects that can be zero sized, so this issue can occur only with swap objects.

Host crash Diagnostic Partitions

A diagnostic partition can be on the local disk where the ESXi software is installed. This is the default configuration for ESXi Installable. You can also use a diagnostic partition on a remote disk shared between multiple hosts. If you want to use a network diagnostic partition, you can install ESXi Dump Collector and configure the networked partition.

The following considerations apply:

>> A diagnostic partition cannot be located on an iSCSI LUN accessed through the software iSCSI or dependent hardware iSCSI adapter. For more information about diagnostic partitions with iSCSI, see General Boot from iSCSI SAN Recommendations in the vSphere Storage documentation.

>> Each host must have a diagnostic partition of 110MB. If multiple hosts share a diagnostic partition on a SAN LUN, the partition should be large enough to accommodate core dumps of all hosts.

>> If a host that uses a shared diagnostic partition fails, reboot the host and extract log files immediately after the failure. Otherwise, the second host that fails before you collect the diagnostic data of the first host might not be able to save the core dump.

Diagnostic Partition Creation:

++++++++++++++++++++++

You can use the vSphere Client to create the diagnostic partition on a local disk or on a private or shared SAN LUN. You cannot use vicfg-dumppart to create the diagnostic partition. The SAN LUN can be set up with Fibre Channel or hardware iSCSI. SAN LUNs accessed through a software iSCSI initiator are not supported.

Managing Core Dumps:

+++++++++++++++++++

With esxcli system coredump, you can manage local diagnostic partitions or set up core dump on a remote server in conjunction with ESXi Dump Collector. For information about ESXi Dump Collector, see the vSphere Networking documentation.

Managing Local Core Dumps with ESXCLI:

++++++++++++++++++++++++++++++

The following example scenario changes the local diagnostic partition with ESXCLI. Specify one of the connection options listed in Connection Options in place of <conn_options>.

To manage a local diagnostic partition

1: Show the diagnostic partition the VMkernel uses and display information about all partitions that can be used as diagnostic partitions.

esxcli <conn_options> system coredump partition list

2: Deactivate the current diagnostic partition.

esxcli <conn_options> system coredump partition set --unconfigure

The ESXi system is now without a diagnostic partition, and you must immediately set a new one.

3: Set the active partition to naa.<naa_ID>.

esxcli <conn_options> system coredump partition set --partition=naa.<naa_ID>

4: List partitions again to verify that a diagnostic partition is set.

esxcli <conn_options> system coredump partition list

If a diagnostic partition is set, the command displays information about it. Otherwise, the command shows that no partition is activated and configured.
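Alternatively, the active and configured partition can be shown directly (a standard subcommand in the same namespace):

esxcli <conn_options> system coredump partition get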

Managing Core Dumps with ESXi Dump Collector:

++++++++++++++++++++++++++++++++++++

By default, a core dump is saved to the local disk. You can use ESXi Dump Collector to keep core dumps on a network server for use during debugging. ESXi Dump Collector is especially useful for Auto Deploy, but is supported for any ESXi 5.0 or later host. ESXi Dump Collector supports other customization, including sending core dumps to the local disk.

ESXi Dump Collector is included with the vCenter Server autorun.exe application. You can install ESXi Dump Collector on the same system as the vCenter Server service or on a different Windows or Linux machine.

You can configure ESXi Dump Collector by using the vSphere Client or ESXCLI. Specify one of the connection options listed in Connection Options in place of <conn_options>.

To manage core dumps with ESXi Dump Collector:

++++++++++++++++++++++++++++++++++++

1: Set up an ESXi system to use ESXi Dump Collector by running esxcli system coredump.

esxcli <conn_options> system coredump network set --interface-name vmk0 --server-ipv4=1-XX.XXX --port=6500

You must specify a VMkernel port with --interface-name, and the IP address and port of the server to send the core dumps to. If you configure an ESXi system that is running inside a virtual machine, you must choose a VMkernel port that is in promiscuous mode.

2: Enable ESXi Dump Collector.

esxcli <conn_options> system coredump network set --enable=true

3: (Optional) Check that ESXi Dump Collector is configured correctly.

esxcli <conn_options> system coredump network get

The host on which you have set up ESXi Dump Collector sends core dumps to the specified server by using the specified VMkernel NIC and optional port.
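You can also ask the host to verify that the configured network dump server is reachable (this subcommand is available in recent esxcli releases; on very old builds it may be absent):

esxcli <conn_options> system coredump network check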

Additional Information: ESXi Network Dump Collector in VMware vSphere 5.x/6.0

 

Core Dumps for VSAN

If your vSAN cluster uses encryption and an error occurs on an ESXi host, the resulting core dump is encrypted to protect customer data. Core dumps that are included in the vm-support package are also encrypted.

Note:

Core dumps can contain sensitive information. Check with your Data Security Team and Privacy Policy when handling core dumps.

Core Dumps on ESXi Hosts

When an ESXi host crashes, an encrypted core dump is generated and the host reboots. The core dump is encrypted with the host key that is in the ESXi key cache.

  • In most cases, vCenter Server retrieves the key for the host from the KMS and attempts to push the key to the ESXi host after reboot. If the operation is successful, you can generate the vm-support package and you can decrypt or re-encrypt the core dump.

  • If vCenter Server cannot access the ESXi host, you might be able to retrieve the key from the KMS.

  • If the host used a custom key, and that key differs from the key that vCenter Server pushes to the host, you cannot manipulate the core dump. Avoid using custom keys.

Core Dumps and vm-support Packages

When you contact VMware Technical Support because of a serious error, your support representative usually asks you to generate a vm-support package. The package includes log files and other information, including core dumps. If support representatives cannot resolve the issues by looking at log files and other information, you can decrypt the core dumps to make relevant information available. Follow your organization’s security and privacy policy to protect sensitive information, such as host keys.

Core Dumps on vCenter Server Systems

A core dump on a vCenter Server system is not encrypted. vCenter Server already contains potentially sensitive information. At the minimum, ensure that the Windows system on which vCenter Server runs or the vCenter Server Appliance is protected. You also might consider turning off core dumps for the vCenter Server system. Other information in log files can help determine the problem.

vSAN issues fixed in the 6.5 release

  • An ESXi host fails with a purple diagnostic screen when mounting a vSAN disk group: Due to an internal race condition in vSAN, an ESXi host might fail with a purple diagnostic screen when you attempt to mount a vSAN disk group. This issue is resolved in this release.
  • Using objtool on a vSAN witness host causes an ESXi host to fail with a purple diagnostic screen: If you use objtool on a vSAN witness host, it performs an I/O control (ioctl) call which leads to a NULL pointer in the ESXi host and the host crashes. This issue is resolved in this release.
  • Hosts in a vSAN cluster have high congestion which leads to host disconnects: When vSAN components with invalid metadata are encountered while an ESXi host is booting, a leak of reference counts to SSD blocks can occur. If these components are later removed by a policy change, disk decommission, or other method, the leaked reference counts cause the next I/O to the SSD block to get stuck. The log files can build up, which causes high congestion and host disconnects. This issue is resolved in this release.
  • Cannot enable vSAN or add an ESXi host into a vSAN cluster due to corrupted disks: When you enable vSAN or add a host to a vSAN cluster, the operation might fail if there are corrupted storage devices on the host. Python zdumps are present on the host after the operation, and the vdq -q command fails with a core dump on the affected host. This issue is resolved in this release.
  • vSAN Configuration Assist issues a physical NIC warning for lack of redundancy when LAG is configured as the active uplink: When the uplink port is a member of a Link Aggregation Group (LAG), the LAG provides redundancy. If the uplink port number is 1, vSAN Configuration Assist issues a warning that the physical NIC lacks redundancy. This issue is resolved in this release.
  • vSAN cluster becomes partitioned after the member hosts and vCenter Server reboot: If the hosts in a unicast vSAN cluster and the vCenter Server are rebooted at the same time, the cluster might become partitioned. The vCenter Server does not properly handle unstable vpxd property updates during a simultaneous reboot of hosts and vCenter Server. This issue is resolved in this release.
  • An ESXi host fails with a purple diagnostic screen due to incorrect adjustment of read cache quota: The vSAN mechanism that controls read cache quota might make incorrect adjustments that result in a host failure with a purple diagnostic screen. This issue is resolved in this release.
  • Large file system overhead reported by the vSAN capacity monitor: When deduplication and compression are enabled on a vSAN cluster, the Used Capacity Breakdown (Monitor > vSAN > Capacity) incorrectly displays the percentage of storage capacity used for file system overhead. This number does not reflect the actual capacity being used for file system activities. The display needs to correctly reflect the file system overhead for a vSAN cluster with deduplication and compression enabled. This issue is resolved in this release.
  • vSAN health check reports a CLOMD liveness issue due to swap objects with a size of 0 bytes: If a vSAN cluster has objects with a size of 0 bytes, and those objects have any components in need of repair, CLOMD might crash. The CLOMD log in /var/run/log/clomd.log might display entries similar to the following:

2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMProcessWorkItem: Op REPAIR starts:1804289387
2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMReconfigure: Reconfiguring ae9cf658-cd5e-dbd4-668d-020010a45c75 workItem type REPAIR 
2017-04-19T03:59:32.408Z 120360 (482850097440)(opID:1804289387)CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for ae9cf658-cd5e-dbd4-668d-020010a45c75 found   

The vSAN health check reports a CLOMD liveness issue. Each time CLOMD is restarted, it crashes while attempting to repair the affected object. Swap objects are the only vSAN objects that can have a size of zero bytes.

This issue is resolved in this release.

  • vSphere API FileManager.DeleteDatastoreFile_Task fails to delete DOM objects in vSAN: If you delete VMDKs from the vSAN datastore using the FileManager.DeleteDatastoreFile_Task API, through the datastore file browser or SDK scripts, the underlying DOM objects are not deleted. These objects can build up over time and take up space on the vSAN datastore. This issue is resolved in this release.
  • A host in a vSAN cluster fails with a purple diagnostic screen due to an internal race condition: When a host in a vSAN cluster reboots, a race condition might occur between the PLOG relog code and the vSAN device discovery code. This condition can corrupt memory tables and cause the ESXi host to fail and display a purple diagnostic screen. This issue is resolved in this release.
  • Attempts to install or upgrade an ESXi host with ESXCLI or vSphere PowerCLI commands might fail for the esx-base, vsan, and vsanhealth VIBs: From ESXi 6.5 Update 1 and later, there is a dependency between the esx-tboot VIB and the esx-base VIB, and you must also include the esx-tboot VIB in the vib update command for a successful installation or upgrade of ESXi hosts. Workaround: Include the esx-tboot VIB in the vib update command. For example: esxcli software vib update -n esx-base -n vsan -n vsanhealth -n esx-tboot -d /vmfs/volumes/datastore1/update-from-esxi6.5-6.5_update01.zip

Configure vSAN Stretched Cluster

Stretched clusters extend the vSAN cluster from a single data site to two sites for a higher level of availability and intersite load balancing. Stretched clusters are typically deployed in environments where the distance between data centers is limited, such as metropolitan or campus environments.

You can use stretched clusters to manage planned maintenance and avoid disaster scenarios, because maintenance or loss of one site does not affect the overall operation of the cluster. In a stretched cluster configuration, both data sites are active sites. If either site fails, vSAN uses the storage on the other site. vSphere HA restarts any VM that must be restarted on the remaining active site.

Configure a vSAN cluster that stretches across two geographic locations or sites.

Prerequisites

  • Verify that you have a minimum of three hosts: one for the preferred site, one for the secondary site, and one host to act as a witness.

  • Verify that you have configured one host to serve as the witness host for the stretched cluster. Verify that the witness host is not part of the vSAN cluster, and that it has only one VMkernel adapter configured for vSAN data traffic.

  • Verify that the witness host is empty and does not contain any components. To configure an existing vSAN host as a witness host, first evacuate all data from the host and delete the disk group.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Click the Configure tab.
  3. Under vSAN, click Fault Domains and Stretched Cluster.
  4. Click the Stretched Cluster Configure button to open the stretched cluster configuration wizard.
  5. Select the fault domain that you want to assign to the secondary site and click >>.

    The hosts that are listed under the Preferred fault domain are in the preferred site.

  6. Click Next.
  7. Select a witness host that is not a member of the vSAN stretched cluster and click Next.
  8. Claim storage devices on the witness host and click Next.

    Select one flash device for the cache tier, and one or more devices for the capacity tier.

  9. On the Ready to complete page, review the configuration and click Finish.
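After the wizard completes, you can confirm from any member host (including the witness) that the cluster has formed and that the expected hosts are listed as members (a standard vSAN command):

# esxcli vsan cluster get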

 

You can change the witness host for a vSAN stretched cluster.

Change the ESXi host used as a witness host for your vSAN stretched cluster.

Prerequisites

Verify that the witness host is not in use.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Click the Configure tab.
  3. Under vSAN, click Fault Domains and Stretched Cluster.
  4. Click the Change witness host button.
  5. Select a new host to use as a witness host, and click Next.
  6. Claim disks on the new witness host, and click Next.
  7. On the Ready to complete page, review the configuration, and click Finish.

 

You can configure the secondary site as the preferred site. The current preferred site becomes the secondary site.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Click the Configure tab.
  3. Under vSAN, click Fault Domains and Stretched Cluster.
  4. Select the secondary fault domain and click the Mark Fault Domain as preferred for Stretched Cluster icon.
  5. Click Yes to confirm.

    The selected fault domain is marked as the preferred fault domain.