vSAN Networking Best Practices

Consider networking best practices for vSAN to improve performance and throughput.

  • For hybrid configurations, dedicate at least a 1-GbE physical network adapter to vSAN. Place vSAN traffic on a dedicated or shared 10-GbE physical adapter for the best networking performance.

  • For all-flash configurations, use a dedicated or shared 10-GbE physical network adapter.

  • Provision one additional physical NIC as a failover NIC.

  • If you use a shared 10-GbE network adapter, place the vSAN traffic on a distributed switch and configure Network I/O Control to guarantee bandwidth to vSAN.
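To verify which VMkernel adapters carry vSAN traffic and whether the underlying physical NICs meet these speed recommendations, a quick PowerCLI check helps. This is a minimal sketch, assuming an existing vCenter connection; the cluster name "vSAN-Cluster" is a placeholder:

# List VMkernel adapters with vSAN traffic enabled, per host in the cluster
Get-Cluster "vSAN-Cluster" | Get-VMHost | ForEach-Object {
    $vmhost = $_
    Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel |
        Where-Object { $_.VsanTrafficEnabled } |
        Select-Object @{N='Host';E={$vmhost.Name}}, Name, IP
}

# Report physical NIC link speeds (in Mbps) to spot sub-10GbE uplinks
Get-Cluster "vSAN-Cluster" | Get-VMHost | Get-VMHostNetworkAdapter -Physical |
    Select-Object VMHost, Name, BitRatePerSec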

Monitor the Resynchronization Tasks in the vSAN Cluster

To evaluate the status of objects that are being resynchronized, you can monitor the resynchronization tasks that are currently in progress.

Prerequisites

Verify that hosts in your vSAN cluster are running ESXi 6.5 or later.
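You can confirm the host versions quickly with PowerCLI (a minimal sketch, assuming an existing vCenter connection; the cluster name is a placeholder):

Get-Cluster "vSAN-Cluster" | Get-VMHost | Select-Object Name, Version, Build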

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Select the Monitor tab and click vSAN.
  3. Select Resyncing Components to track the progress of resynchronization of virtual machine objects and the number of bytes that are remaining before the resynchronization is complete.

    NOTE: If your cluster has connectivity issues, the data on the Resyncing Components page might not get refreshed as expected and the fields might reflect inaccurate information.
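If the page is not refreshing, the list of syncing objects can also be pulled directly from the vSAN API. Below is a hedged PowerCLI sketch; the cluster name is a placeholder, QuerySyncingVsanObjects returns a JSON string, and its exact layout can vary between releases:

# Ask one host's VsanInternalSystem for all objects currently resyncing
$vmhost = Get-Cluster "vSAN-Cluster" | Get-VMHost | Select-Object -First 1
$vis = Get-View $vmhost.ExtensionData.ConfigManager.VsanInternalSystem
$json = $vis.QuerySyncingVsanObjects(@())   # empty UUID list = all syncing objects
$json | ConvertFrom-Json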

Maintenance Mode on vSAN

For any maintenance activity on an ESXi host running vSAN, the first thing you will want to do is place the host into maintenance mode. If you have never performed this operation on a vSAN host before, you should be aware that there is an option to specify how the vSAN data will be migrated. The vSphere Web Client provides the three options described below.

Procedure:

  1. Right-click the host and select Maintenance Mode > Enter Maintenance Mode.
  2. Select a data evacuation mode and click OK.

Ensure data accessibility from other hosts:

++++++++++++++++++++++++++++++

This is the default option. When you power off or remove the host from the cluster, vSAN ensures that all accessible virtual machines on this host remain accessible. Select this option if you want to take the host out of the cluster temporarily, for example, to install upgrades, and plan to have the host back in the cluster. This option is not appropriate if you want to remove the host from the cluster permanently.

Evacuate all data to other hosts:

+++++++++++++++++++++++++

vSAN evacuates all data to other hosts in the cluster, maintains or fixes availability compliance for the affected components, and protects data when sufficient resources exist in the cluster. Select this option if you plan to migrate the host permanently. When evacuating data from the last host in the cluster, make sure that you migrate the virtual machines to another datastore and then place the host in maintenance mode.

This evacuation mode results in the largest amount of data transfer and consumes the most time and resources. All the components on the local storage of the selected host are migrated elsewhere in the cluster. When the host enters maintenance mode, all virtual machines have access to their storage components and are still compliant with their assigned storage policies.

No data evacuation:

+++++++++++++++++

vSAN does not evacuate any data from this host. If you power off or remove the host from the cluster, some virtual machines might become inaccessible.
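All three modes can also be selected from PowerCLI, which exposes them through the -VsanDataMigrationMode parameter of Set-VMHost (a minimal sketch; the host name is a placeholder):

# Enter maintenance mode while keeping vSAN data accessible (the default)
Set-VMHost -VMHost "esxi01.example.com" -State Maintenance -VsanDataMigrationMode EnsureAccessibility

# The other modes are Full (evacuate all data) and NoDataMigration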

How to Move a vSAN Datastore into a Folder?

vSphere folders are commonly used by administrators for organizational purposes and/or permission delegation. When a customer tried to move their vSAN datastore into a folder using the vSphere Web Client (this applies to the HTML5 client as well), they found that nothing happens, even though the UI indicates the operation should be possible with the (+) symbol.

I decided to perform the operation using the vSphere API instead of the UI. Behind the scenes, the UI simply calls the MoveIntoFolder_Task() vSphere API which allows you to move various vSphere Inventory objects into a vSphere Folder.

For PowerCLI users, we can use the Move-Datastore cmdlet, which I will be using for this:

In my setup, I have one vSAN datastore, from a vSphere 6.0u3 environment. Let's say I want to move this datastore into a folder named TEST. The following PowerCLI snippet does exactly that:

Move-Datastore -Datastore (Get-Datastore "vsanDatastore") -Destination (Get-Folder "TEST")

You can see the vSAN datastore is now part of the TEST vSphere folder!

For now, if you need to move a vSAN-based datastore into a vSphere folder, simply use the vSphere API as a workaround.
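If you want to call the API directly instead, the same MoveIntoFolder_Task() method is reachable from PowerCLI through the folder's ExtensionData view (a sketch using the same folder and datastore names as above):

# Invoke the Folder managed object's MoveIntoFolder method directly
(Get-Folder "TEST").ExtensionData.MoveIntoFolder(@((Get-Datastore "vsanDatastore").ExtensionData.MoRef))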

Storage and Availability Technical Documents from VMware

This was something I came across accidentally, so I thought it may be worth a very brief post, as I found some amazing content there.

VMware Storage and Availability Technical Documents Hub 

This is an online repository of technical documents and "how to" guides, including video content, for all storage and availability products within VMware. Namely, it has some very useful content for four VMware product categories (as of now):

  • vSAN
  • SRM
  • Virtual Volumes
  • vSphere Replication

Let us check the vSAN section:

Then I clicked on USE CASES:

What an amazing view of use cases; this is what we need in our day-to-day job.

Similarly, there is some good technical documentation around Virtual Volumes, including an overview and how to set up and implement VVols. In comparison, the content for the other products is a little light next to vSAN, but I'm sure more will be added as the portal is developed further.

All the information is presented in an HTML5 interface that is easy to navigate, with a handy print-to-PDF option on every page if you want to download the content for offline reading, which is great.

One amazing section is vSAN Remote Office Deployment.

Also: VMware vSAN Disaster Recovery

 

FlexPod Datacenter with Cisco UCS Unified Software Release and VMware vSphere 6.0 U1

FlexPod has been gaining lots of market traction as the preferred converged solution platform of choice for many customers over the last 4 years. This has been due to the amazing hardware technologies that underpin the solution offering (Cisco UCS compute + Cisco Nexus unified networking + the NetApp FAS range of clustered Data ONTAP SAN). Often, customers deploy FlexPod solutions together with VMware vSphere or Microsoft Hyper-V on top (other hypervisors are also supported), which together provide a complete, ready-to-go-live private and hybrid cloud platform that has been pre-validated to run most, if not all, typical enterprise data center workloads.

FlexPod Datacenter with Cisco UCS Unified Software release and VMware vSphere 6.0 U1 is designed to be fully redundant in the compute, network, and storage layers. There is no single point of failure from a device or traffic path perspective. Figure 2 illustrates a FlexPod topology using the Cisco UCS 6300 Fabric Interconnect top-of-rack model while Figure 3 shows the same network and storage elements paired with the Cisco UCS 6200 series Fabric Interconnects.

The Cisco UCS 6300 Fabric Interconnect FlexPod Datacenter model enables a high-performance, low-latency, and lossless fabric supporting applications with these elevated requirements. The 40GbE compute and network fabric with optional 4/8/16G FC support increases the overall capacity of the system while maintaining the uniform and resilient design of the FlexPod solution. The remainder of this section describes the network, compute, and storage connections and enabled features.

Network: Link aggregation technologies play an important role in this FlexPod design, providing improved aggregate bandwidth and link resiliency across the solution stack. The NetApp storage controllers, Cisco Unified Computing System, and Cisco Nexus 9000 platforms support active port channeling using 802.3ad standard Link Aggregation Control Protocol (LACP). Port channeling is a link aggregation technique offering link fault tolerance and traffic distribution (load balancing) for improved aggregate bandwidth across member ports. In addition, the Cisco Nexus 9000 series features virtual Port Channel (vPC) capabilities. vPC allows links that are physically connected to two different Cisco Nexus 9000 Series devices to appear as a single “logical” port channel to a third device, essentially offering device fault tolerance. The Cisco UCS Fabric Interconnects and NetApp FAS storage controllers benefit from the Cisco Nexus vPC abstraction, gaining link and device resiliency as well as full utilization of a non-blocking Ethernet fabric.

Compute: Each Cisco UCS Fabric Interconnect (FI) is connected to the Cisco Nexus 9000. Figure 2 illustrates the use of vPC-enabled 40GbE uplinks between the Cisco Nexus 9000 switches and Cisco UCS 6300 Fabric Interconnects. Figure 3 shows vPCs configured with 10GbE uplinks to a pair of Cisco Nexus 9000 switches from a Cisco UCS 6200 FI. Note that additional ports can easily be added to the design for increased bandwidth, redundancy, and workload distribution. The Cisco UCS unified software release 3.1 provides a common policy feature set that can be readily applied to the appropriate Fabric Interconnect platform based on the organization's workload requirements.

Note:  For SAN environments, NetApp clustered Data ONTAP allows up to 4 HA pairs or 8 nodes. For NAS environments, it allows 12 HA pairs or 24 nodes to form a logical entity.

The HA interconnect allows each node in an HA pair to assume control of its partner’s storage (disks and shelves) directly. The local physical HA storage failover capability does not extend beyond the HA pair. Furthermore, a cluster of nodes does not have to include similar hardware. Rather, individual nodes in an HA pair are configured alike, allowing customers to scale as needed, as they bring additional HA pairs into the larger cluster.

For more details, please see:

FlexPod Datacenter with Cisco UCS 6300 Fabric Interconnect and VMware vSphere 6.0 U1 Design Guide

Confirming connectivity to a TCP port with telnet

While the ping command confirms connectivity, it does not necessarily mean that all TCP ports on the remote host can be reached. It is possible for a network firewall to allow or block access to certain ports on a host.

 

To check whether a specific TCP port is open on the remote host, you can use the telnet command to confirm the port is reachable.

# telnet destination-ip destination-port

When you successfully establish a telnet connection to TCP port 80, you see output similar to:

 

# telnet 192.168.1.11 80

Trying 192.168.1.11…

Connected to 192.168.1.11.

Escape character is ‘^]’.

In this sample output, you can see that you are connected to port 80 (http) on the server with IP address 192.168.1.11.

 

If you choose a port number for a service that is not running on the host, you see output similar to:

 

# telnet 192.168.1.11 81

Trying 192.168.1.11…

telnet: Unable to connect to remote host: Connection timed out

In this case, you can see that there is no response when you attempt to connect to port 81 on the server 192.168.1.11.

Note: Telnet is an application that operates using the TCP protocol. UDP connectivity cannot be tested using telnet.
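On Windows machines where the telnet client is not installed, PowerShell's Test-NetConnection cmdlet performs a comparable TCP check (like telnet, it does not test UDP):

# TcpTestSucceeded is True when the port is reachable
Test-NetConnection -ComputerName 192.168.1.11 -Port 80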

 

Viewing active TCP/UDP connections with netstat and esxcli network

When troubleshooting network connectivity issues, it may be helpful to see all the active incoming and outgoing TCP/UDP connections on an ESX/ESXi host. ESX hosts can use the netstat command and ESXi 4.1 and later hosts can use esxcli network to show the list of TCP/UDP connections. The commands are: 

ESX 3.5/4.x – # netstat -tnp

ESXi 4.1 – # esxcli network connection list

ESXi 5.0 and later – # esxcli network ip connection list

Sample output from an ESXi 4.1 host:

 

# esxcli network connection list 

Proto  Recv-Q  Send-Q  Local Address       Foreign Address     State        World ID

tcp    0       52      192.168.1.11:22   192.168.25.1:55169  ESTABLISHED  0

tcp    0       0       127.0.0.1:62024     127.0.0.1:5988      TIME_WAIT    0

tcp    0       0       127.0.0.1:57867     127.0.0.1:5988      TIME_WAIT    0

tcp    0       0       127.0.0.1:62196     127.0.0.1:5988      TIME_WAIT    0

tcp    0       0       127.0.0.1:8307      127.0.0.1:52943     ESTABLISHED  5790

tcp    0       0       127.0.0.1:52943     127.0.0.1:8307      ESTABLISHED  5790

tcp    0       0       127.0.0.1:80        127.0.0.1:55629     ESTABLISHED  5785

tcp    0       0       127.0.0.1:55629     127.0.0.1:80        ESTABLISHED  6613

tcp    0       0       127.0.0.1:8307      127.0.0.1:56319     ESTABLISHED  5785

tcp    0       0       127.0.0.1:56319     127.0.0.1:8307      ESTABLISHED  5785

tcp    0       0       127.0.0.1:80        127.0.0.1:62782     ESTABLISHED  5166

tcp    0       0       127.0.0.1:62782     127.0.0.1:80        ESTABLISHED  6613

tcp    0       0       127.0.0.1:5988      127.0.0.1:53808     FIN_WAIT_2   0

tcp    0       0       127.0.0.1:53808     127.0.0.1:5988      CLOSE_WAIT   5166

tcp    0       0       127.0.0.1:8307      127.0.0.1:56963     CLOSE_WAIT   5788

tcp    0       0       127.0.0.1:56963     127.0.0.1:8307      FIN_WAIT_2   5785

tcp    0       0       127.0.0.1:8307      0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:8309      0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:5988      0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:5989        0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:80          0.0.0.0:0           LISTEN       5031

tcp    0       0       0.0.0.0:443         0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:12001     0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:8889      0.0.0.0:0           LISTEN       5331

tcp    0       0       192.168.1.11:427  0.0.0.0:0           LISTEN       0

tcp    0       0       127.0.0.1:427       0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:22          0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:902         0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:8000        0.0.0.0:0           LISTEN       4801

tcp    0       0       0.0.0.0:8100        0.0.0.0:0           LISTEN       4795

udp    0       0       192.168.1.11:427  0.0.0.0:0                        0

udp    0       0       0.0.0.0:427         0.0.0.0:0                        0

udp    0       0       192.168.1.11:68   0.0.0.0:0                        4693

udp    0       0       0.0.0.0:8200        0.0.0.0:0                        4795

udp    0       0       0.0.0.0:8301        0.0.0.0:0                        4686

udp    0       0       0.0.0.0:8302        0.0.0.0:0                        4686

To retrieve errors and statistics for a network adapter, run this command:

# esxcli network nic stats get -n <vmnicX>

Where <vmnicX> is the name of a NIC in your ESXi host.
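The same esxcli namespaces can also be reached remotely with PowerCLI's Get-EsxCli cmdlet, avoiding SSH to the host. A minimal sketch; the host and NIC names are placeholders:

$esxcli = Get-EsxCli -VMHost "esxi01.example.com" -V2

# Equivalent of: esxcli network ip connection list
$esxcli.network.ip.connection.list.Invoke()

# Equivalent of: esxcli network nic stats get -n vmnic0
$esxcli.network.nic.stats.get.Invoke(@{nicname = "vmnic0"})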

How Does vMotion Work?

If you need to take a host offline for maintenance, you can move the virtual machine to another host. Migration with vMotion™ allows virtual machine processes to continue working throughout a migration.

With vMotion, you can change the host on which a virtual machine is running, or you can change both the host and the datastore of the virtual machine.

When you migrate virtual machines with vMotion and choose to change only the host, the entire state of the virtual machine is moved to the new host. The associated virtual disk remains in the same location on storage that is shared between the two hosts.

When you choose to change both the host and the datastore, the virtual machine state is moved to a new host and the virtual disk is moved to another datastore. vMotion migration to another host and datastore is possible in vSphere environments without shared storage.

After the virtual machine state is migrated to the alternate host, the virtual machine runs on the new host. Migrations with vMotion are completely transparent to the running virtual machine.

The state information includes the current memory content and all the information that defines and identifies the virtual machine. The memory content includes transaction data and the bits of the operating system and applications that are in the memory. The defining and identification information stored in the state includes all the data that maps to the virtual machine hardware elements, such as BIOS, devices, CPU, MAC addresses for the Ethernet cards, chip set states, registers, and so forth.

When you migrate a virtual machine with vMotion, the new host for the virtual machine must meet compatibility requirements so that the migration can proceed.

Below are the steps:

  1. The first step is to ensure that the source VM can run on the chosen destination host (same CPU architecture; also make sure you configure a vMotion network, otherwise vMotion traffic uses the management network of the host).
  2. Then a second VM process is started on the target host and the resources are reserved.
  3. Next, a system memory checkpoint is created. This means all changes to the source VM are written to an extra memory area.
  4. The contents of the system memory recorded at the checkpoint are transferred to the target VM(on the destination host).
  5. The checkpoint/checkpoint-restore process is repeated until only the smallest change sets remain in the target VM’s memory.
  6. The CPU of the source VM is stopped.
  7. The last modifications to the main memory are transferred to the target VM in milliseconds.
  8. The vMotion process is ended and a reverse ARP packet is sent to the physical switch (important: Notify Switches must be activated in the properties of the virtual switch). Hard disk access is taken over by the target ESXi host.
  9. The source VM is shut down. This means the VM process on the source ESXi is deleted.
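In practice, a vMotion is triggered with a right-click Migrate in the client or, from PowerCLI, with Move-VM; the steps above then happen behind the scenes (a sketch with placeholder names):

# Host-only vMotion: the virtual disks stay on shared storage
Move-VM -VM "web01" -Destination (Get-VMHost "esxi02.example.com")

# Host-plus-datastore vMotion: no shared storage required
Move-VM -VM "web01" -Destination (Get-VMHost "esxi02.example.com") -Datastore (Get-Datastore "vsanDatastore")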

A very good article on troubleshooting vMotion issues is Understanding and troubleshooting vMotion (KB 1003734).

VADP Overview

VMware vSphere Storage APIs – Data Protection (VADP) is the next generation of VMware's data protection framework, originally introduced in vSphere 4.0, which enables backup products to perform centralized, efficient, off-host, LAN-free backup of vSphere virtual machines.
A backup product using VADP can back up vSphere virtual machines from a central backup server or virtual machine without requiring backup agents or backup processing inside each guest virtual machine on the ESX host. This offloads backup processing from ESX hosts and reduces costs by allowing each ESX host to run more virtual machines.
VADP leverages the snapshot capabilities of VMware vSphere to enable backup across the SAN without requiring downtime for virtual machines. As a result, backups can be performed non-disruptively at any time of the day, without requiring extended backup windows and the downtime to applications and users associated with them.
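The snapshot-based flow that VADP products automate can be illustrated with plain PowerCLI. This is only an illustration of the sequence, not the API a backup product actually calls; the VM name is a placeholder:

# Create a quiesced snapshot so the disks present a consistent view
$snap = New-Snapshot -VM "app01" -Name "backup-temp" -Quiesce -Memory:$false

# ...the backup product reads the VM's disks via SAN or NBD here...

# Remove the snapshot once the backup completes
Remove-Snapshot -Snapshot $snap -Confirm:$false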

Backup software can take backups of VMs at the ESXi host level as VMDK files.
This enables full VM recovery in the case of a system recovery, which helps reduce the recovery time and gives a better recovery time objective (RTO).
Taking backups at the VM host level can be integrated with vSphere for centralized management of backups, or integrated with the individual ESXi hosts.

Using this VMware backup method, a full VM can be recovered, and individual directories and files can be recovered as well.

>> Backups are handled at the ESXi host level, so there is no performance or resource-utilization impact at the VM guest level.
>> Full VM recovery is possible to the same location or to a different location (datastore, ESXi host, resource pool, or vCenter).
>> Backups can be taken even when the VM guest is powered off.
>> Snapshot technology is used to minimize backup access to the original VM and reduce the performance impact.

Port requirements:

++++++++++++++++++
TCP ports 443 and 902 are required for communication; each has a specific purpose in the VADP setup.
Port 443: used to communicate with vCenter for VM discovery and for backup and restore operations such as snapshot creation and deletion.
Port 902: used to communicate with the ESXi hosts when using the NBD or NBDSSL transport method.

If SAN transport is used for backup and restore activities, communication over port 902 is not required.
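From the backup host, this reachability can be verified with the same kind of TCP test shown in the telnet section earlier, for example with PowerShell (placeholder addresses):

# vCenter: VM discovery and snapshot create/delete operations
Test-NetConnection -ComputerName vcenter.example.com -Port 443

# ESXi hosts: needed only for the NBD/NBDSSL transport
Test-NetConnection -ComputerName esxi01.example.com -Port 902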

Transport methods:

++++++++++++++
The transport defines the path the data travels from the source to the backup host. Multiple transport methods are available to send the data to the backup host.
Three different transport methods are available in NetBackup:
1) SAN Transport
2) NBD Transport
3) NBDSSL Transport

SAN transport and NBD transport are widely used methods.

1) SAN Transport:
Backup data moves over the SAN directly from storage to the backup host. The SAN transport method requires mapping the storage LUNs that are being used by the datastores to the backup host as well. This enables SAN transport for the backups and sends data directly from storage to the backup server; LAN communication is required only for VM discovery and for backup and restore operations such as snapshot creation and deletion over TCP port 443.

2) NBD (LAN) Transport:
Backup data travels over the network from the ESXi hosts or vCenter (when vCenter is used) to the backup host using port 902. This transport depends on the network and increases the processing load on the ESXi servers.

vCenter for SAN backups:
vCenter should be used for environments that are configured to manage their ESXi servers through vCenter; this scenario fits very well for large environments where multiple ESXi hosts are managed by vCenter.

>> vCenter credentials need to be provided in the backup software as a VMware vCenter.
>> These credentials are used by the backup host at backup time to discover the VMs in vCenter and to initiate the snapshot requests for backups.
>> All the storage LUNs assigned to the ESXi hosts for datastores also need to be presented to the backup host to enable SAN transport.
>> The datastore LUNs presented to the backup host should not be initialized on the backup host.
>> Port 443 needs to be open to communicate with vCenter.
>> Configure the policy in NetBackup using the VMware policy type, and select the SAN transport method in the VMware tab of the policy to make use of SAN transport for backups.

vCenter for backups using NBD:
NBD transport enables backups over the LAN using the Network Block Device (NBD) protocol. In this method, vCenter receives the backup request from the NetBackup backup host.
>> vCenter creates the snapshots for the VMs.
>> vCenter performs the snapshot activity, and the ESXi host sends the data to the backup host.
>> Communication is required from the backup host to TCP port 443 on vCenter and to port 902 on the ESXi hosts. Backups over vCenter are slower than SAN transport backups.
>> The ESXi kernel port communicates with the backup host and sends the backup data; it is directly impacted by the backup traffic and may encounter performance issues.
>> Does not require any LUN masking to the backup hosts.