Boot Device for VSAN

Booting an ESXi host that is part of a vSAN cluster from a flash device imposes certain restrictions.

When you boot a vSAN host from a USB/SD device, you must use a high-quality USB or SD flash drive of 4 GB or larger.

When you boot a vSAN host from a SATADOM device, you must use a single-level cell (SLC) device. The size of the boot device must be at least 16 GB.

During installation, the ESXi installer creates a coredump partition on the boot device. The default size of the coredump partition satisfies most installation requirements, but depending on the memory of the host you might need to resize it, as described below.

  • If the ESXi host has 512 GB of memory or less, you can boot the host from a USB, SD, or SATADOM device.

  • If the ESXi host has more than 512 GB of memory, consider the following guidelines.

    • You can boot the host from a SATADOM or disk device with a size of at least 16 GB. When you use a SATADOM device, use a single-level cell (SLC) device.

    • If you are using vSAN 6.5 or later, you must resize the coredump partition on ESXi hosts to boot from USB/SD devices. For more information, see the VMware knowledge base article at http://kb.vmware.com/kb/2147881.
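
To check which coredump partition is currently active on a host before and after resizing, you can use esxcli. This is a quick sketch, assuming you have shell or SSH access to the host; refer to the KB article above for the actual resize procedure.

esxcli system coredump partition get
esxcli system coredump partition list

The first command shows the active and configured coredump partition, and the second lists all partitions that can be used as a coredump target.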

Hosts that boot from a disk have a local VMFS datastore. If the disk already contains a VMFS datastore that runs virtual machines, use a separate disk, on a separate controller, for the ESXi boot device so that the boot disk is not used for vSAN.

Best Practices for vSAN Networking

Consider networking best practices for vSAN to improve performance and throughput.

  • For hybrid configurations, dedicate at least a 1-GbE physical network adapter. For best networking performance, place vSAN traffic on a dedicated or shared 10-GbE physical adapter.

  • For all-flash configurations, use a dedicated or shared 10-GbE physical network adapter.

  • Provision one additional physical NIC as a failover NIC.

  • If you use a shared 10-GbE network adapter, place the vSAN traffic on a distributed switch and configure Network I/O Control to guarantee bandwidth to vSAN.
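
To confirm which VMkernel interface carries vSAN traffic on a given host, you can list the vSAN network configuration from the ESXi command line. A minimal check, assuming shell or SSH access to the host:

esxcli vsan network list

The output shows the VMkernel adapter (for example, vmk1) that is tagged for vSAN traffic.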

Monitor the Resynchronization Tasks in the vSAN Cluster

To evaluate the status of objects that are being resynchronized, you can monitor the resynchronization tasks that are currently in progress.

Prerequisites

Verify that hosts in your vSAN cluster are running ESXi 6.5 or later.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Select the Monitor tab and click vSAN.
  3. Select Resyncing Components to track the progress of resynchronization of virtual machine objects and the number of bytes that are remaining before the resynchronization is complete.

    NOTE: If your cluster has connectivity issues, the data on the Resyncing Components page might not get refreshed as expected and the fields might reflect inaccurate information.
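
If you prefer the command line, hosts running vSAN 6.6 or later also expose resynchronization information through esxcli. A quick sketch, assuming the esxcli vsan debug namespace is available on your build:

esxcli vsan debug resync summary get
esxcli vsan debug resync list

The summary command reports the number of objects and bytes left to resynchronize, and the list command shows the individual components that are currently resyncing.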

Maintenance Mode on VSAN

Before any maintenance activity on an ESXi host running vSAN, the first thing you will want to do is place the host into maintenance mode. If you have never performed this operation on a vSAN host before, be aware that there is an additional option that specifies how the vSAN data will be migrated. The vSphere Web Client presents three options, described below.

Procedure:

  1. Right-click the host and select Maintenance Mode > Enter Maintenance Mode.
  2. Select a data evacuation mode and click OK.

Ensure data accessibility from other hosts:

++++++++++++++++++++++++++++++

This is the default option. When you power off or remove the host from the cluster, vSAN ensures that all accessible virtual machines on this host remain accessible. Select this option if you want to take the host out of the cluster temporarily, for example, to install upgrades, and plan to have the host back in the cluster. This option is not appropriate if you want to remove the host from the cluster permanently.

Evacuate all data to other hosts:

+++++++++++++++++++++++++

vSAN evacuates all data to other hosts in the cluster, maintains or fixes availability compliance for the affected components, and protects data when sufficient resources exist in the cluster. Select this option if you plan to migrate the host permanently. When evacuating data from the last host in the cluster, make sure that you migrate the virtual machines to another datastore and then place the host in maintenance mode.

This evacuation mode results in the largest amount of data transfer and consumes the most time and resources. All the components on the local storage of the selected host are migrated elsewhere in the cluster. When the host enters maintenance mode, all virtual machines have access to their storage components and are still compliant with their assigned storage policies.

No data evacuation:

+++++++++++++++++

vSAN does not evacuate any data from this host. If you power off or remove the host from the cluster, some virtual machines might become inaccessible.
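
The same operation can also be performed from the ESXi command line. A minimal sketch, assuming shell or SSH access to the host; the vsanmode value maps to the three options above (ensureObjectAccessibility, evacuateAllData, or noAction):

esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility

To exit maintenance mode afterwards, run esxcli system maintenanceMode set --enable false.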

How to move vSAN Datastore into a Folder?

vSphere folders are commonly used by administrators for organizational purposes and for permission delegation. When a customer tried to move their vSAN datastore into a folder using the vSphere Web Client (this applies to the HTML5 client as well), they found that nothing happens, even though the UI indicates with the (+) symbol that the operation should be possible.

I decided to perform the operation using the vSphere API instead of the UI. Behind the scenes, the UI simply calls the MoveIntoFolder_Task() vSphere API which allows you to move various vSphere Inventory objects into a vSphere Folder.

For PowerCLI users, we can use the Move-Datastore cmdlet, which I will be using for this.

In my setup, I have one vSAN datastore, from a vSphere 6.0u3 environment. Let's say I want to move that datastore into a folder called TEST. The following PowerCLI snippet does exactly that:

Move-Datastore -Datastore (Get-Datastore "vsanDatastore") -Destination (Get-Folder "TEST")

You can see that the vSAN datastore is now part of the TEST vSphere folder!

For now, if you need to move a vSAN-based datastore into a vSphere folder, simply use the vSphere API as a workaround.

Storage and Availability Technical Documents from VMware

This was something I came across accidentally, so I thought it may be worth a very brief post, as I found some amazing content there.

VMware Storage and Availability Technical Documents Hub 

This is an online repository of technical documents and “how to” guides, including video content, for all storage and availability products within VMware. Namely, it has some very useful content for four VMware product categories (as of now):

  • VSAN
  • SRM
  • Virtual Volumes
  • vSphere Replication

Let us check the vSAN section:

Then I just clicked on USE CASES:

What an amazing view of use cases; this is exactly what we need in our day-to-day job.

Similarly, there is some good technical documentation around vVols, including an overview and how to set up and implement vVols. In comparison, the content for the other products is a little light relative to vSAN, but I'm sure more will be added as the portal gets developed further.

All the information is presented in an HTML5 interface that is easy to navigate, with a handy print-to-PDF option on every page if you want to download the content for offline reading, which is great.

One amazing section is: vSAN Remote Office Deployment

Also: VMware vSAN Disaster Recovery

vSAN memory or SSD congestion reached threshold limit

This is a common issue that we see in vSAN.
Examples of this alert are:

LSOM Memory Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 201.
LSOM SSD Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 201.

Congestion in vSAN occurs when the I/O rate of the lower layers of the storage subsystem fails to keep up with the I/O rate of the higher layers.

Local Log Structured Object Management (LSOM) is an internal component of vSAN that works at the physical disk level (both flash devices and magnetic disks). LSOM also handles the read caching and write buffering for the components.

The SSD referred to here is the cache device of a vSAN disk group.

The LSOM memory congestion and LSOM SSD congestion states occur when vSAN artificially introduces latency into virtual machine I/O in order to slow down writes to the flash device layer or layers.

Impact:
++++++++++++++
During an observed congestion period, higher virtual machine latencies occur.

Short periods of congestion might occur as vSAN uses a throttling mechanism to ensure that all layers run at the same I/O rate.

Smaller congestion values are preferable, as a higher value signifies latency. Sustained congestion is not usual; in most cases, congestion should be close to zero.
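
To check whether a host is currently reporting congestion-related health issues, you can run the vSAN health checks locally from the ESXi command line. A quick sketch, assuming ESXi 6.0 U2 or later, where the esxcli vsan health namespace is available:

esxcli vsan health cluster list

The output lists the vSAN health checks and their current status.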

Possible Solutions:
++++++++++++++++
If virtual machines perform a high number of write operations, write buffers could fill up on flash cache devices. In hybrid configurations, these buffers must be de-staged to magnetic disks, and de-staging can only proceed at a rate that the magnetic disks can handle.

Other reasons for congestion could be related to:

  • Faulty hardware
  • Corrupted or incorrectly functioning drivers or firmware
  • Insufficient I/O controller queue depths
  • An under-specified (undersized) vSAN deployment

Tag Devices as SSD

You can use PSA SATP claim rules to tag SSD devices that are not detected automatically.

About this task:
++++++++++++++++
Only devices that are consumed by the PSA Native Multipathing (NMP) plugin can be tagged.

Procedure:

  1. Identify the device to be tagged and its SATP.

     esxcli storage nmp device list

     The command returns information similar to the following:

     naa.6006016015301d00167ce6e2ddb3de11
        Device Display Name: DGC Fibre Channel Disk (naa.6006016015301d00167ce6e2ddb3de11)
        Storage Array Type: VMW_SATP_CX
        Storage Array Type Device Config: {navireg ipfilter}
        Path Selection Policy: VMW_PSP_MRU
        Path Selection Policy Device Config: Current Path=vmhba4:C0:T0:L25
        Working Paths: vmhba4:C0:T0:L25

  2. Note down the SATP associated with the device.

  3. Add a PSA claim rule to mark the device as SSD. You can add the claim rule in any of the following ways.

     By specifying the device name:

     esxcli storage nmp satp rule add -s SATP --device device_name --option="enable_ssd"

     By specifying the vendor name and the model name:

     esxcli storage nmp satp rule add -s SATP -V vendor_name -M model_name --option="enable_ssd"

     Based on the transport protocol:

     esxcli storage nmp satp rule add -s SATP --transport transport_protocol --option="enable_ssd"

     Based on the driver name:

     esxcli storage nmp satp rule add -s SATP --driver driver_name --option="enable_ssd"

  4. Reclaim the device.

     esxcli storage core claiming reclaim --device device_name

  5. Verify whether the device is tagged as SSD.

     esxcli storage core device list -d device_name

     The command output indicates whether the listed device is tagged as SSD:

     Is SSD: true

What to do next:

If the SSD device that you want to tag is shared among multiple hosts, make sure that you tag the device from all the hosts that share the device.

What are the common types of congestion that are reported, and how can I address them in VSAN?

The types of congestion and remedies for each type are listed below:
  1. SSD Congestion: SSD congestion is typically raised when the active working set of write IOs for a specific disk group is much larger than the size of the cache tier of the disk group. In both hybrid and all-flash vSAN clusters, data is first written to the write cache (also known as the write buffer). A process known as de-staging moves the data from the write buffer to the capacity disks. The write cache absorbs a high write rate, ensuring that write performance is not limited by the capacity disks. However, if a benchmark fills the write cache at a very fast rate, the de-staging process may not be able to keep pace with the arriving IO rate. In such cases, SSD congestion is raised to signal the vSAN DOM client layer to slow down IOs to a rate that the vSAN disk group can handle.

    Remedy: To avoid SSD congestion, tune the size of the VM disks that the benchmark uses. For best results, we recommend that the size of the VM disks (the active working set) be no larger than 40% of the cumulative size of the write caches across all disk groups. Keep in mind that for a hybrid vSAN cluster, the size of the write cache is 30% of the size of the cache tier disk. In an all-flash cluster, the size of the write cache is the size of the cache tier disk, but no greater than 600GB. For example, a hybrid disk group with a 400 GB cache device has a write buffer of roughly 120 GB (30% of 400 GB); with two such disk groups, the cumulative write cache is about 240 GB, so the working set should be kept at or below roughly 96 GB (40% of 240 GB).


  2. Log Congestion: Log congestion is typically raised when the vSAN LSOM logs (which store the metadata of IO operations that have not yet been de-staged) consume significant space in the write cache.

    Typically, a large volume of small writes to a small working set can create a large number of vSAN LSOM log entries and cause this type of congestion. Additionally, if the benchmark does not issue 4K-aligned IOs, the number of IOs on the vSAN stack gets inflated to account for 4K alignment. The higher number of IOs can lead to log congestion.

    Remedy: Check whether your benchmark aligns IO requests on the 4K boundary. If not, then check whether your benchmark uses a very small working set (a small working set is one where the total size of the accessed VM disks is less than 10% of the size of the caching tier; see above for how to calculate the size of the write cache). If yes, increase the working set to 40% of the size of the caching tier. If neither of the above two conditions holds true, you will need to reduce write traffic by either reducing the number of outstanding IOs that your benchmark issues or decreasing the number of VMs that the benchmark creates.


  3. Component Congestion (Comp-Congestion): This congestion indicates that there is a large volume of outstanding commit operations for some components, resulting from the IO requests to those components getting queued. This can lead to higher latency. Typically, a heavy volume of writes to a few VM disks causes this congestion.

    Remedy: Increase the number of VM disks that your benchmark uses. Make sure that your benchmark does not direct all of its IOs to only a few VM disks.

  4. Memory and Slab Congestion: Memory and slab congestion usually means that the vSAN LSOM layer is running out of heap memory space or slab space to maintain its internal data structures. vSAN provisions a certain amount of system memory for its internal operations. However, if a benchmark aggressively issues IOs without any throttling, it can lead to vSAN using up all of its allocated memory space.

    Remedy: Reduce the working set of your benchmark. Alternatively, while experimenting with benchmarks, increase the following settings to raise the amount of memory reserved for the vSAN LSOM layer. Note that these settings are per disk group, and we do not recommend using them on a production cluster. These settings can be changed via esxcli (see KB 1038578) as follows; an example invocation is shown after the settings list:

    /LSOM/blPLOGCacheLines, default=128K, increase to 512K
    /LSOM/blPLOGLsnCacheLines, default=4K, tuned=32K
    /LSOM/blLLOGCacheLines, default=128, increase to 32K
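
    A minimal sketch of changing one of these advanced options with esxcli, assuming the option accepts a plain integer value (512K = 524288) and that you repeat the command on every host in the cluster; verify the current value first:

    esxcli system settings advanced list -o /LSOM/blPLOGCacheLines
    esxcli system settings advanced set -o /LSOM/blPLOGCacheLines -i 524288

    The same pattern applies to /LSOM/blPLOGLsnCacheLines and /LSOM/blLLOGCacheLines with their respective tuned values.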