Cisco UCS Manager and Cisco UCS C-Series

Most x86-architecture servers today include a management function commonly known as a baseboard management controller (BMC). The BMC is usually embedded on the motherboard or main circuit board of the server and includes a specialized service processor and firmware to monitor and manage the physical state of the server hardware. BMC functions and standards are defined in the IPMI specifications, originally developed jointly by Intel, Hewlett-Packard, Dell, and NEC. The specification is maintained and published at Intel’s corporate website, helping ensure that BMC functions are consistently implemented on all x86 managed server platforms.

Intel includes BMCs on its customer reference board (CRB) designs, which are given to original equipment manufacturers (OEMs) and original design manufacturers (ODMs) to accelerate time to market and help ensure compliance with industry standards such as IPMI. Cisco has added value to the basic BMC functions by reengineering the BMC to make it an important part of the Cisco UCS architecture. This integration helps enable powerful, industry-leading unified computing features and the use of service profiles for server provisioning and change management.

Cisco UCS Manager runs in the Cisco UCS 6100, 6200, and 6300 Series Fabric Interconnects. It provides a wide range of powerful features for the integrated and unified computing, networking, and storage environment of Cisco UCS. The features include the rapid provisioning of infrastructure from shared pools of computing, networking, and storage resources and the rapid scaling and provisioning of IT infrastructure through the model-based management approach of Cisco service profiles.

Service profiles are used to provision and manage Cisco UCS C-Series Rack Servers and their I/O properties within a single management domain. They are created by server, network, and storage administrators and are stored in the fabric interconnects. Infrastructure policies needed to deploy applications are encapsulated in the service profile. The policies coordinate and automate element management at every layer of the hardware stack, including RAID levels, BIOS settings, firmware revisions and settings, server identities, adapter settings, VLAN and VSAN network settings, network quality of service (QoS), and data center connectivity.

Service profile templates are used to simplify the creation of new service profiles, helping ensure consistent policies within the system for a given service or application. Whereas a service profile is a description of a logical server and there is a one-to-one relationship between the profile and the physical server, a service profile template can be used to define multiple servers. The template approach makes it just as easy to configure one server as it is to configure hundreds of servers with perhaps thousands of virtual machines. This automation reduces the number of manual steps needed, helping reduce the opportunities for human error, improving consistency, and further reducing server and network deployment times.
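As a concrete illustration of this model-based approach, the short sketch below lists the service profiles in a UCS domain through the Cisco UCS Python SDK (ucsmsdk). It is a minimal example under stated assumptions, not the definitive method: the UCS Manager address and credentials are placeholders, and the attribute names assume the lsServer managed-object class as documented for the SDK.

from ucsmsdk.ucshandle import UcsHandle

# Placeholder address and credentials for the UCS Manager domain.
handle = UcsHandle("ucsm.example.com", "admin", "password")
handle.login()

# lsServer is the managed-object class that represents a service profile.
for sp in handle.query_classid("lsServer"):
    print(sp.name, sp.assoc_state, sp.pn_dn)

handle.logout()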

Cisco IMC communicates vital information about each individual server to Cisco UCS Manager. Cisco IMC provides many diagnostic and health monitoring services that contribute to the holistic management environment enabled by Cisco UCS.

Diagnostic and health monitoring features provided with Cisco IMC include:

●   SNMP

●   XML API event subscription and configurable alerts

●   System event log

●   Audit log

●   Monitoring of field-replaceable units (FRUs), HDD faults, dual inline memory module (DIMM) faults, network interface card (NIC) MAC addresses, CPU, and thermal faults

●   Configurable alerts and thresholds

●   Watchdog timer

●   RAID configuration and monitoring

●   Predictive failure analysis of HDD and DIMM

●   Converged network adapters (CNAs)

●   Reliability, availability, and serviceability (RAS)

●   Network Time Protocol (NTP)

●   Graphical and command-line client

Cisco IMC in Standalone Mode on Cisco UCS C-Series Servers

Many customers deploy Cisco UCS C-Series servers in a standalone environment as x86 servers. In such a deployment, the servers are not integrated with other Cisco UCS components, such as the fabric interconnects, fabric extenders, or Cisco UCS Manager. When Cisco UCS C-Series servers operate in standalone mode, administrators can use Cisco IMC as an industry-standard BMC through a web-based GUI, a Secure Shell (SSH)–based command-line interface (CLI), or the native API to configure, administer, and monitor the server (a minimal API login sketch follows the task list below). Cisco IMC gives users full control of the server for configuration and management. It can be configured to operate in several network modes, using the dedicated management port or sharing the same physical interface as the host in Shared LOM or Cisco Card mode. With Cisco IMC, administrators can perform the following server management tasks:

●   Power on, power off, power cycle, reset, and shut down the server

●   Toggle the locator LED

●   Configure the server boot order

●   View server properties and sensors

●   Complete out-of-band storage configuration

●   Manage remote presence

●   Firmware management

●   Create and manage local user accounts and enable authentication through Active Directory and LDAP

●   Configure network-related settings, including NIC properties, IPv4, IPv6, VLANs, and network security

●   Configure communication services, including HTTP, SSH, and IPMI over LAN

●   Manage certificates

●   Configure platform event filters

●   Monitor faults, alarms, and server status
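For the native API mentioned above, the following minimal sketch logs in to a standalone IMC over its XML API and reads the rack-unit inventory. It is a sketch under stated assumptions: it assumes the documented /nuova endpoint and the aaaLogin, configResolveClass, and aaaLogout methods; the address and credentials are placeholders, and certificate verification is disabled only for illustration.

import requests
import xml.etree.ElementTree as ET

CIMC_URL = "https://192.0.2.10/nuova"   # placeholder IMC address

# Log in and capture the session cookie from the response attributes.
resp = requests.post(CIMC_URL, data="<aaaLogin inName='admin' inPassword='password'/>", verify=False)
cookie = ET.fromstring(resp.text).attrib["outCookie"]

# Resolve the computeRackUnit class to read basic server inventory.
query = "<configResolveClass cookie='%s' inHierarchical='false' classId='computeRackUnit'/>" % cookie
print(requests.post(CIMC_URL, data=query, verify=False).text)

# Always log out to release the session.
requests.post(CIMC_URL, data="<aaaLogout inCookie='%s'/>" % cookie, verify=False)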

Cisco IMC is included with each Cisco UCS C-Series server at no additional cost to customers.

The latest release of IMC, version 3.0(1), introduces a number of new features to better align with the needs of our customers. The most notable of these new capabilities are the HTML5 web UI and KVM, Redfish support, and XML API transactional support. HTML5 provides customers with a simplified user interface and, along with a reliable KVM, eliminates the need for Java to use IMC.

The IPMI interface cannot address the scale-out and cloud-based requirements for simplicity and security that modern programming interfaces provide, so Intel and other server vendors developed the new Redfish standard. Redfish is an open industry-standard specification and schema that specifies a RESTful interface and uses JSON and OData to help customers integrate solutions within their existing tool chains. It establishes a new management model for system control that is scalable, easy to use, and secure. Redfish is sponsored by the Distributed Management Task Force (DMTF), a peer-review standards body recognized throughout the industry. Cisco UCS has adopted support for the Redfish standard in IMC version 3.0.

Redfish introduces a RESTful API to the IMC and is a simple, secure replacement for IPMI. Finally, XML API transactional support caters to users who rely on the programmability aspects of the IMC: users can now configure multiple managed objects in a single transaction, allowing for quicker, simpler deployments.
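As a simple illustration of the RESTful model, the sketch below walks the standard Redfish Systems collection on the IMC and prints basic inventory for each server. The paths follow the DMTF Redfish schema; the address and credentials are placeholders, and properties such as Model, SerialNumber, and PowerState are standard Redfish ComputerSystem fields.

import requests

BASE = "https://192.0.2.10"              # placeholder IMC address
AUTH = ("admin", "password")             # placeholder credentials

# The service root and Systems collection are defined by the Redfish standard.
systems = requests.get(BASE + "/redfish/v1/Systems", auth=AUTH, verify=False).json()

for member in systems["Members"]:
    system = requests.get(BASE + member["@odata.id"], auth=AUTH, verify=False).json()
    print(system["Model"], system["SerialNumber"], system["PowerState"])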

Along with the many new software capabilities, Cisco has enhanced several of the utilities that rely on the IMC:

●   Cisco IMC Emulator

●   Non-Interactive Server Configuration Utility (NI-SCU)

●   Separation of SCU and diagnostics

●   Driver Update (Linux)

 

For More Information:

●   Cisco UCS Services: Accelerate Your Transition to a Unified Computing Architecture: http://www.cisco.com/en/US/services/ps2961/ps10312/Unified_Computing_Services_Overview.pdf

●   Cisco UCS C-Series Servers Integrated Management Controller CLI Configuration Guide, Release 3.0: http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/sw/cli/config/guide/3_0/b_Cisco_UCS_C-Series_CLI_Configuration_Guide_301.html

●   Setup for Cisco IMC on Cisco UCS C-Series Servers: http://www.cisco.com/en/US/partner/products/ps10493/products_configuration_example09186a0080b10d66.shtml

●   Cisco UCS C-Series Rack-Mount Servers: http://www.cisco.com/en/US/partner/products/ps10493/tsd_products_support_series_home.html

●   Unified computing: http://www.cisco.com/en/US/partner/netsol/ns944/index.html

●   Intelligent Platform Management Interface (IPMI) Specifications: http://www.intel.com/design/servers/ipmi/spec.htm

vSAN Issues Fixed in the 6.5 Release

  • An ESXi host fails with a purple diagnostic screen when mounting a vSAN disk group: Due to an internal race condition in vSAN, an ESXi host might fail with a purple diagnostic screen when you attempt to mount a vSAN disk group. This issue is resolved in this release.
  • Using objtool on a vSAN witness host causes an ESXi host to fail with a purple diagnostic screen: If you use objtool on a vSAN witness host, it performs an I/O control (ioctl) call that leads to a NULL pointer dereference on the ESXi host, and the host crashes. This issue is resolved in this release.
  • Hosts in a vSAN cluster have high congestion, which leads to host disconnects: When vSAN components with invalid metadata are encountered while an ESXi host is booting, a leak of reference counts to SSD blocks can occur. If these components are removed by policy change, disk decommission, or another method, the leaked reference counts cause the next I/O to the SSD block to get stuck. The log files can build up, which causes high congestion and host disconnects. This issue is resolved in this release.
  • Cannot enable vSAN or add an ESXi host into a vSAN cluster due to corrupted disks: When you enable vSAN or add a host to a vSAN cluster, the operation might fail if there are corrupted storage devices on the host. Python zdumps are present on the host after the operation, and the vdq -q command fails with a core dump on the affected host. This issue is resolved in this release.
  • vSAN Configuration Assist issues a physical NIC warning for lack of redundancy when a LAG is configured as the active uplink: When the uplink port is a member of a Link Aggregation Group (LAG), the LAG provides redundancy. If only one uplink port is configured, vSAN Configuration Assist issues a warning that the physical NIC lacks redundancy. This issue is resolved in this release.
  • vSAN cluster becomes partitioned after the member hosts and vCenter Server reboot: If the hosts in a unicast vSAN cluster and the vCenter Server are rebooted at the same time, the cluster might become partitioned. The vCenter Server does not properly handle unstable vpxd property updates during a simultaneous reboot of hosts and vCenter Server. This issue is resolved in this release.
  • An ESXi host fails with a purple diagnostic screen due to incorrect adjustment of read cache quota: The vSAN mechanism that controls the read cache quota might make incorrect adjustments that result in a host failure with a purple diagnostic screen. This issue is resolved in this release.
  • Large file system overhead reported by the vSAN capacity monitor: When deduplication and compression are enabled on a vSAN cluster, the Used Capacity Breakdown (Monitor > vSAN > Capacity) incorrectly displays the percentage of storage capacity used for file system overhead. This number does not reflect the actual capacity being used for file system activities. The display needs to correctly reflect the file system overhead for a vSAN cluster with deduplication and compression enabled. This issue is resolved in this release.
  • vSAN health check reports a CLOMD liveness issue due to swap objects with a size of 0 bytes: If a vSAN cluster has objects with a size of 0 bytes, and those objects have any components in need of repair, CLOMD might crash. The CLOMD log in /var/run/log/clomd.log might display entries similar to the following:

2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMProcessWorkItem: Op REPAIR starts:1804289387
2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMReconfigure: Reconfiguring ae9cf658-cd5e-dbd4-668d-020010a45c75 workItem type REPAIR 
2017-04-19T03:59:32.408Z 120360 (482850097440)(opID:1804289387)CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for ae9cf658-cd5e-dbd4-668d-020010a45c75 found   

The vSAN health check reports a CLOMD liveness issue. Each time CLOMD is restarted, it crashes while attempting to repair the affected object. Swap objects are the only vSAN objects that can have a size of zero bytes.

This issue is resolved in this release.

  • vSphere API FileManager.DeleteDatastoreFile_Task fails to delete DOM objects in vSAN: If you delete VMDKs from the vSAN datastore using the FileManager.DeleteDatastoreFile_Task API, through the file browser or SDK scripts, the underlying DOM objects are not deleted. These objects can build up over time and take up space on the vSAN datastore. This issue is resolved in this release. (A minimal sketch of this API call appears after this list.)
  • A host in a vSAN cluster fails with a purple diagnostic screen due to an internal race condition: When a host in a vSAN cluster reboots, a race condition might occur between the PLOG relog code and the vSAN device discovery code. This condition can corrupt memory tables and cause the ESXi host to fail and display a purple diagnostic screen. This issue is resolved in this release.
  • Attempts to install or upgrade an ESXi host with ESXCLI or vSphere PowerCLI commands might fail for the esx-base, vsan, and vsanhealth VIBs: From ESXi 6.5 Update 1 onward, there is a dependency between the esx-tboot VIB and the esx-base VIB, so you must also include the esx-tboot VIB in the vib update command for a successful installation or upgrade of ESXi hosts. Workaround: Include the esx-tboot VIB in the vib update command. For example:

    esxcli software vib update -n esx-base -n vsan -n vsanhealth -n esx-tboot -d /vmfs/volumes/datastore1/update-from-esxi6.5-6.5_update01.zip
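For reference, here is a minimal pyVmomi sketch of the FileManager.DeleteDatastoreFile_Task call mentioned in the list above. It is only a sketch: the vCenter address, credentials, and datastore path are placeholders, and it assumes the first inventory child is the target datacenter.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask

context = ssl._create_unverified_context()          # lab use only
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()
datacenter = content.rootFolder.childEntity[0]      # assumes the first child is a datacenter

# Delete a virtual disk from the vSAN datastore and wait for the task to finish.
task = content.fileManager.DeleteDatastoreFile_Task(
    name="[vsanDatastore] testvm/testvm_1.vmdk", datacenter=datacenter)
WaitForTask(task)
Disconnect(si)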

Configure vSAN Stretched Cluster

Stretched clusters extend the vSAN cluster from a single data site to two sites for a higher level of availability and intersite load balancing. Stretched clusters are typically deployed in environments where the distance between data centers is limited, such as metropolitan or campus environments.

You can use stretched clusters to manage planned maintenance and avoid disaster scenarios, because maintenance or loss of one site does not affect the overall operation of the cluster. In a stretched cluster configuration, both data sites are active sites. If either site fails, vSAN uses the storage on the other site. vSphere HA restarts any VM that must be restarted on the remaining active site.

Configure a vSAN cluster that stretches across two geographic locations or sites.

Prerequisites

  • Verify that you have a minimum of three hosts: one for the preferred site, one for the secondary site, and one host to act as a witness.

  • Verify that you have configured one host to serve as the witness host for the stretched cluster. Verify that the witness host is not part of the vSAN cluster, and that it has only one VMkernel adapter configured for vSAN data traffic.

  • Verify that the witness host is empty and does not contain any components. To configure an existing vSAN host as a witness host, first evacuate all data from the host and delete the disk group.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Click the Configure tab.
  3. Under vSAN, click Fault Domains and Stretched Cluster.
  4. Click the Stretched Cluster Configure button to open the stretched cluster configuration wizard.
  5. Select the fault domain that you want to assign to the secondary site and click >>.

    The hosts that are listed under the Preferred fault domain are in the preferred site.

  6. Click Next.
  7. Select a witness host that is not a member of the vSAN stretched cluster and click Next.
  8. Claim storage devices on the witness host and click Next.

    Select one flash device for the cache tier, and one or more devices for the capacity tier.

  9. On the Ready to complete page, review the configuration and click Finish.

 

You can change the witness host for a vSAN stretched cluster.

Change the ESXi host used as a witness host for your vSAN stretched cluster.

Prerequisites

Verify that the witness host is not in use.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Click the Configure tab.
  3. Under vSAN, click Fault Domains and Stretched Cluster.
  4. Click the Change witness host button.
  5. Select a new host to use as a witness host, and click Next.
  6. Claim disks on the new witness host, and click Next.
  7. On the Ready to complete page, review the configuration, and click Finish.

 

You can configure the secondary site as the preferred site. The current preferred site becomes the secondary site.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client.
  2. Click the Configure tab.
  3. Under vSAN, click Fault Domains and Stretched Cluster.
  4. Select the secondary fault domain and click the Mark Fault Domain as preferred for Stretched Cluster icon.
  5. Click Yes to confirm.

    The selected fault domain is marked as the preferred fault domain.

Networking Issues Fixed in the Latest Release, 6.5 U1d

  • Host profile might fail to extract after import of a VMware vSphere Distributed Switch from a backup file: If you import a vSphere Distributed Switch (VDS) from a previous backup file with the Preserve original distributed switch and port group identifiers option selected, and vCenter Server restarts, it does not load the port group from the database correctly, and the operation to extract the host profile might fail. You might see an error similar to: A specified parameter was not correct: dvportgroup-XXXXX Error: specified value.

    This issue is resolved in this release.

    NOTE: If you have already encountered this issue, after you upgrade to vCenter Server 6.5 Update 1d, you must disconnect and reconnect the respective host.

  • vNICs of virtual machines might lose connection after a refresh of a VMware Horizon view pool: The virtual network interface cards (vNICs) of virtual machines might lose connection after a refresh of a VMware Horizon view pool, because while VMs shut down and power on, the vCenter Server and the ESXi host might be out of sync, which can break the existing connection between vNICs and ephemeral port groups.

    This issue is resolved in this release.

  • vMotion of virtual machines with imported ephemeral port groups might fail as ports are not created after migration: vMotion of virtual machines with imported ephemeral port groups, or of a VMware vSphere Distributed Switch (VDS) with such port groups, might fail with an error similar to A general system error occurred: vim.fault.NotFound, because the ports might not be created after the migration.

    This issue is resolved in this release.

    NOTE: If you already cannot migrate virtual machines due to this issue, after the upgrade to vCenter Server 6.5 Update 1d you must delete the imported ephemeral port groups, or the VDS with such port groups, and reimport them.

  • vCenter Server does not accept SNMP users with non-alphanumeric characters: With this fix, you can create users in the vCenter Server SNMP configuration with non-alphanumeric characters. Previously, only SNMP configurations for ESXi allowed users with non-alphanumeric characters.

    This issue is resolved in this release.

  • vSphere Web Client might show an error in the vSphere Distributed Switch topology after a restart of vpxd due to outdated Link Aggregation Group data: The vSphere Web Client might show an error in the vSphere Distributed Switch topology after a restart of vpxd due to outdated Link Aggregation Group (LAG) data, because the vCenter Server database might not be updated with data for physical adapters added to the LAG before the restart, and the daemon might not be able to find it.

    This issue is resolved in this release.

    NOTE: If you have already encountered this issue, you must reassign the physical adapters to the LAG.

Download 6.5 U1d here: https://my.vmware.com/web/vmware/details?downloadGroup=VC65U1D&productId=614&rPId=20189&src=so_5a314d05e49f5&cid=70134000001SkJn

Storage and vSAN Related Issues Fixed in 6.5 U1d

Storage Issue:

++++++++++

  • vSphere Web Client might not properly display storage devices after EMC PowerPath 6.0.1 installation: The vSphere Web Client might not properly display storage devices after EMC PowerPath 6.0.1 installation if the ESXi host has only LUNs and the host is set up to boot from a SAN. This issue is resolved in this release.
  • Datastores might falsely appear as incompatible for a storage policy that includes I/O filter rules: When you create or edit an existing storage policy containing I/O filter common rules, and while checking for storage compatibility, you might observe messages like: Datastore does not match current VM policy or Datastore does not satisfy compatibility since it does not support one or more required properties.

    Some of the datastores that you expect to be compatible, might appear in the incompatible list when you are checking for storage compatibility. This might also happen when you provision a virtual machine with a storage policy, containing I/O filter rules.This issue is resolved in this release.

  • Map Disk Region task might fill up the vCenter Server database task table: The Map Disk Region task alone might fill up the vpx_task table in the vCenter Server database, which might cause vCenter Server to become unresponsive or cause other performance issues.

vSAN Issues:

+++++++++++

  • vSAN iSCSI messages might flood the Tasks view of the vSphere Web Client: If you upgrade a vSAN cluster from a vSphere 6.0 environment to a vSphere 6.5 environment, a message like Edit vSAN iSCSI target service in cluster might flood the Tasks view in the vSphere Web Client, even if the vSAN iSCSI feature is not enabled in this cluster. This issue is resolved in this release.
  • New icon for vSAN witness appliances: This release introduces a new icon for vSAN witness appliances. When you deploy a vSAN witness appliance, you will see a nested host with a new icon in the UI, which indicates that this host can only be used as a witness in a stretched cluster. This issue is resolved in this release.
  • vSAN Health Service logs might consume all log storage space in vCenter Server on Windows: vSAN Health Service logs might consume all log storage space in vCenter Server on Windows if log rotation coincides with a running task for collecting a support bundle or phone-home data for the Customer Experience Improvement Plan (CEIP). This issue is resolved in this release.
  • vSphere Update Manager baselines might change from patch to upgrade when only one node in a cluster is remediated: vSAN build recommendations for system baselines and baseline groups for use with vSphere Update Manager might not work if there is a vCenter Server system-wide HTTP proxy. With this fix, vSAN build recommendations in a system-wide HTTP proxy setup are as available and consistent as the vSAN build recommendations created with a direct Internet connection.

Download the release from: https://my.vmware.com/web/vmware/details?downloadGroup=VC65U1D&productId=614&rPId=20189&src=so_5a314d05e49f5&cid=70134000001SkJn

Troubleshooting a Performance Issue

This is a pain point for most of us: how to isolate and understand what the culprit is in a performance issue. I will try to pull all the points together and summarize.

Symptoms:
++++++++++
>>Services running in guest virtual machines respond slowly.
>>Applications running in the guest virtual machines respond intermittently.
>>The guest virtual machine may seem slow or unresponsive.

There are four major areas that affect performance:

1: CPU
2: Memory
3: Storage
4: Networking

1:CPU:

>>Use the esxtop command to determine if the ESXi/ESX server is being overloaded.
>>Examine the load average on the first line of the command output.
A load average of 1.00 means that the ESXi/ESX Server machine’s physical CPUs are fully utilized, and a load average of 0.5 means that they are half utilized. A load average of 2.00 means that the system as a whole is overloaded.
>>Examine the %READY field for the percentage of time that the virtual machine was ready but could not be scheduled to run on a physical CPU.

For more info please follow : https://kb.vmware.com/s/article/1033115?r=2&KM_Utility.getArticleLanguage=1&KM_Utility.getArticle=1&KM_Utility.getGUser=1&KM_Utility.getArticleData=1&Quarterback.validateRoute=1

If the load average is too high, and the ready time is not caused by CPU limiting, adjust the CPU load on the host. To adjust the CPU load on the host, either:

Increase the number of physical CPUs on the host
OR
Decrease the number of virtual CPUs allocated to the host. To decrease the number of virtual CPUs allocated to the host, either:

Reduce the total number of CPUs allocated to all of the virtual machines running on the ESX host. For more information, see Determining if multiple virtual CPUs are causing performance issues (1005362).
OR
Reduce the number of virtual machines running on the host.

2:Memory:

>>Use the esxtop command to determine whether the ESXi/ESX server’s memory is overcommitted.
>>Examine the MEM overcommit avg on the first line of the command output. This value reflects the ratio of the requested memory to the available memory, minus 1.

Examples:

If the virtual machines require 4 GB of RAM, and the host has 4 GB of RAM, then there is a 1:1 ratio. After subtracting 1 (from 1/1), the MEM overcommit avg field reads 0. There is no overcommitment and no extra RAM is required.
If the virtual machines require 6 GB of RAM, and the host has 4 GB of RAM, then there is a 1.5:1 ratio. After subtracting 1 (from 1.5/1), the MEM overcommit avg field reads 0.5. The RAM is overcommitted by 50%, meaning that 50% more than the available RAM is required.
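To make the arithmetic concrete, here is a small sketch that reproduces the two examples above; the memory figures are hypothetical.

def mem_overcommit_avg(requested_gb, available_gb):
    # Ratio of requested memory to available memory, minus 1 (as esxtop reports it).
    return requested_gb / available_gb - 1

print(mem_overcommit_avg(4, 4))   # 0.0 -> no overcommitment
print(mem_overcommit_avg(6, 4))   # 0.5 -> RAM overcommitted by 50%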

>>Determine whether the virtual machines are ballooning and/or swapping.

To detect any ballooning or swapping:

++Run esxtop.
++Type m for memory
++Type f for fields
++Select the letter J for Memory Ballooning Statistics (MCTL)
++Look at the MCTLSZ value.

MCTLSZ (MB) displays the amount of guest physical memory reclaimed by the balloon driver.

++Type f for Field
++Select the letter for Memory Swap Statistics (SWAP STATS).
++Look at the SWCUR value.

SWCUR (MB) displays the current Swap Usage.

3:Storage:

To determine whether the poor performance is due to storage latency:

>>Determine whether the problem is with the local storage. Migrate the virtual machines to a different storage location.
>>Reduce the number of Virtual Machines per LUN.
>>Look for log entries in the Windows guests that look like this:

The device, \Device\ScsiPort0, did not respond within the timeout period.
>>Using esxtop, look for a high DAVG latency time
>>Determine the maximum I/O throughput you can get with the Iometer tool.
>>Compare the iometer results for a VM to the results for a physical machine attached to the same storage.
>>Check for SCSI reservation conflicts.
>>If you are using iSCSI storage and jumbo frames, ensure that everything is properly configured.
>>If you are using iSCSI storage and multipathing with the iSCSI software initiator, ensure that everything is properly configured.

4:Network:

Network performance can be highly affected by CPU performance. Rule out a CPU performance issue before investigating network latency.

To determine whether the poor performance is due to network latency:

>>Test the maximum bandwidth from the virtual machine with the Iperf tool. This tool is available from https://github.com/esnet/iperf

Note: VMware does not endorse or recommend any particular third-party utility.

While using Iperf, change the TCP window size to 64 K; performance also depends on this value. To change the TCP window size:

On the server side, enter this command:

#iperf -s

On the client side, enter this command:

#iperf.exe -c sqlsed -P 1 -i 1 -p 5001 -w 64K -f m -t 10 900M

 

Some additional Information on ESXTOP:
++++++++++++++++++++++++++++++++++++++

Configuring monitoring using esxtop:
====================================
To monitor storage performance per HBA:

>>Start esxtop by typing esxtop at the command line.
>>Press d to switch to disk view (HBA mode).
>>To view the entire Device name, press SHIFT + L and enter 36 in Change the name field size.
>>Press f to modify the fields that are displayed.
>>Press b, c, d, e, h, and j to toggle the fields and press Enter.
>>Press s and then 2 to alter the update time to every 2 seconds and press Enter.

To monitor storage performance on a per-LUN basis:

>>Start esxtop by typing esxtop from the command line.
>>Press u to switch to disk view (LUN mode).
>>Press f to modify the fields that are displayed.
>>Press b, c, f, and h to toggle the fields and press Enter.
>>Press s and then 2 to alter the update time to every 2 seconds and press Enter.

 

For more information please follow the VMware KB : https://kb.vmware.com/s/article/1008205

 

VixDiskLib API

On ESXi hosts, virtual machine disk (VMDK) files are usually located under one of the /vmfs/volumes directories, perhaps on shared storage. Storage volumes are visible from the vSphere Client, in the inventory of hosts and clusters. Typical names are datastore1 and datastore2. To see a VMDK file, click Summary > Resources > Datastore, right-click Browse Datastore, and select a virtual machine.
On Workstation, VMDK files are stored in the same directory with virtual machine configuration (VMX) files, for example, /path/to/disk on Linux or C:\My Documents\My Virtual Machines on Windows.
VMDK files store data representing a virtual machine’s hard disk drive. Almost the entire portion of a VMDK file is the virtual machine’s data, with a small portion allotted to overhead.

Initialize the Library:
+++++++++++++++++

VixDiskLib_Init() initializes the old virtual disk library. The arguments majorVersion and minorVersion represent the VDDK library’s release number and dot-release number. The optional third, fourth, and fifth arguments specify log, warning, and panic handlers. DLLs and shared objects may be located in libDir.
VixError vixError = VixDiskLib_Init(majorVer, minorVer, &logFunc, &warnFunc, &panicFunc, libDir);
Because of internationalization restrictions, you should call VixDiskLib_Init() only once per process, at the beginning of your program. You should call VixDiskLib_Exit() at the end of your program for cleanup. For multithreaded programs, you should write your own logFunc, because the default function is not thread safe.
In most cases you should replace VixDiskLib_Init() with VixDiskLib_InitEx(), which allows you to specify a configuration file.

Virtual Disk Types:
++++++++++++++++

The following disk types are defined in the virtual disk library:
>>>
VIXDISKLIB_DISK_MONOLITHIC_SPARSE – Growable virtual disk contained in a single virtual disk file. This is the default type for hosted disk, and the only setting in the Virtual Disk API.
>>>
VIXDISKLIB_DISK_MONOLITHIC_FLAT – Preallocated virtual disk contained in a single virtual disk file. This takes time to create and occupies a lot of space, but might perform better than sparse.
>>>
VIXDISKLIB_DISK_SPLIT_SPARSE – Growable virtual disk split into 2GB extents (s sequence). These files can grow to 2GB, then continue growing in a new extent. This type works on older file systems.
>>>
VIXDISKLIB_DISK_SPLIT_FLAT – Preallocated virtual disk split into 2GB extents (f sequence). These files start at 2GB, so they take a while to create, but available space can grow in 2GB increments.
>>>
VIXDISKLIB_DISK_VMFS_FLAT – Preallocated virtual disk compatible with ESX 3 and later. Also known as thick disk. This managed disk type is discussed in Managed Disk and Hosted Disk.
>>>
VIXDISKLIB_DISK_VMFS_SPARSE – Employs a copy-on-write (COW) mechanism to save storage space.
>>>
VIXDISKLIB_DISK_VMFS_THIN – A growable virtual disk that consumes only as much space as needed, compatible with ESX 3 or later, supported by VDDK 1.1 or later, and highly recommended.
>>>
VIXDISKLIB_DISK_STREAM_OPTIMIZED – Monolithic sparse format compressed for streaming. Stream optimized format does not support random reads or writes.

Check the sample programs on :

http://pubs.vmware.com/vsphere-65/index.jsp#com.vmware.vddk.pg.doc/vddkSample.7.2.html#995259

 

vSAN Prerequisites and Requirements for Deployment

Before delving into the installation and configuration of vSAN, it is necessary to discuss the requirements and prerequisites. VMware vSphere is the foundation of every vSAN-based virtual infrastructure.

VMware vSphere:
+++++++++++++++

vSAN was first released with VMware vSphere 5.5 U1. Additional versions of vSAN were released with VMware vSphere 6.0 (vSAN 6.0), VMware vSphere 6.0 U1 (vSAN 6.1), and VMware vSphere 6.0 U2 (vSAN 6.2). Each of these releases included additional vSAN features.

VMware vSphere consists of two major components: the vCenter Server management tool and the ESXi hypervisor. To install and configure vSAN, both vCenter Server and ESXi are required.
VMware vCenter Server provides a centralized management platform for VMware vSphere environments. It is the solution used to provision new virtual machines (VMs), configure hosts, and perform many other operational tasks associated with managing a virtualized infrastructure.
To run a fully supported vSAN environment, vCenter Server 5.5 U1 is the minimum requirement, although VMware strongly recommends using the latest version of vSphere where possible. vSAN can be managed by both the Windows version of vCenter Server and the vCenter Server Appliance (VCSA). vSAN is configured and monitored via the vSphere Web Client, which also requires a minimum version of 5.5 U1 for support. vSAN can also be fully configured and managed through the command-line interface (CLI) and the vSphere application programming interface (API) for those wanting to automate some (or all) aspects of vSAN configuration, monitoring, or management. Although a single cluster can contain only one vSAN datastore, a vCenter Server can manage multiple vSAN and compute clusters.

ESXi:
+++++

VMware ESXi is an enterprise-grade virtualization product that allows you to run multiple instances of an operating system in a fully isolated fashion on a single server. It is a bare-metal solution, meaning that it does not require a host operating system and has an extremely thin footprint. ESXi is the foundation for the large majority of virtualized environments worldwide. For standard datacenter deployments, vSAN requires a minimum of three ESXi hosts (where each host has local storage and contributes this storage to the vSAN datastore) to form a supported vSAN cluster. This allows the cluster to meet the minimum availability requirement of tolerating at least one host failure.

With vSAN 6.1 (released with vSphere 6.0 U1), VMware introduced the concept of a 2-node vSAN cluster, primarily for remote office/branch office deployments. There are some additional considerations around the use of a 2-node vSAN cluster, including the concept of a witness host. As of vSAN 6.0, a maximum of 64 ESXi hosts in a cluster is supported, a significant increase from the 32 hosts that were supported in the initial vSAN release that was part of vSphere 5.5, from here on referred to as vSAN 5.5. However, the ESXi hosts must be running version 6.0 at a minimum to support 64 hosts. At a minimum, it is recommended that a host have at least 6 GB of memory. If you configure a host to contain the maximum number of disk groups, we recommend that the host be configured with a minimum of 32 GB of memory. vSAN does not consume all of this memory, but it is required for the maximum configuration. The vSAN host memory requirement is directly related to the number of physical disks in the host and the number of disk groups configured on the host. In all cases, we recommend going with more than 32 GB per host to ensure that your workloads, vSAN, and the hypervisor have sufficient resources for an optimal user experience. (Figure: minimum number of hosts contributing storage.)

Cache and Capacity Devices:
+++++++++++++++++++++++++++
With the release of vSAN 6.0, VMware introduced a new all-flash version of vSAN; with version 5.5, vSAN was only available as a hybrid configuration. A hybrid configuration is one where the cache tier is made up of flash-based devices and the capacity tier is made up of magnetic disks. In the all-flash version, both the cache tier and the capacity tier are made up of flash devices. The flash devices of the cache and capacity tiers are typically different grades of flash device in terms of performance and endurance. This allows you, under certain circumstances, to create all-flash configurations at a cost comparable to SAS-based magnetic disk configurations.

 

vSAN Requirements:
++++++++++++++++++
Before enabling vSAN, it is highly recommended that the vSphere administrator validate that the environment meets all the prerequisites and requirements. To enhance resilience, this list also includes recommendations from an infrastructure perspective:
>>Minimum of three ESXi hosts for standard datacenter deployments. Minimum of two ESXi hosts and a witness host for the smallest deployment, for example, remote office/branch office.
>>Minimum of 6 GB memory per host to install ESXi.
>>VMware vCenter Server.
>>At least one device for the capacity tier. One hard disk for hosts contributing storage to vSAN datastore in a hybrid configuration; one flash device for hosts contributing storage to vSAN datastore in an all-flash configuration.
>>At least one flash device for the cache tier for hosts contributing storage to vSAN datastore, whether hybrid or all-flash.
>>One boot device to install ESXi.
>>At least one disk controller. Pass-through/JBOD mode capable disk controller preferred.
>>Dedicated network port for vSAN–VMkernel interface. 10 GbE preferred, but 1 GbE supported for smaller hybrid configurations. With 10 GbE, the adapter does not need to be dedicated to vSAN traffic, but can be shared with other traffic types, such as management traffic, vMotion traffic, etc.
>>L3 multicast is required on the vSAN network.

vSAN Ready Nodes:
++++++++++++++++++
vSAN ready nodes are a great alternative to manually selecting components, and they are also the preferred way of building a vSAN configuration. Various vendors have gone through the exercise for you and created configurations called vSAN ready nodes. These nodes consist of tested and certified hardware only and, in our opinion, provide an additional guarantee.

For more information please follow : https://www.vsan-essentials.com/

vSAN Performance Capabilities

It is difficult to predict what your performance will be because every workload and every combination of hardware will provide different results. After the initial vSAN launch, VMware announced the results of multiple performance tests
(http://blogs.vmware.com/vsphere/2014/03/supercharge-virtual-san-cluster-2-millioniops.html).

The results were impressive, to say the least, but were only the beginning. With the 6.1 release, the performance of hybrid had doubled and so had the scale, allowing for 8 million IOPS per cluster. The introduction of all-flash, however, completely changed the game. This allowed vSAN to reach 45K IOPS per disk group, and remember you can have 5 disk groups per host, but it also introduced sub-millisecond latency. (Just for completeness' sake, theoretically it would be possible to design a vSAN cluster that could deliver over 16 million IOPS with sub-millisecond latency using an all-flash configuration.)

Do note that these performance numbers should not be used as a guarantee of what you can achieve in your environment. These are theoretical tests that are not necessarily (and most likely not) representative of the I/O patterns you will see in your own environment, and so results will vary. Nevertheless, it does prove that vSAN is capable of delivering a high-performance environment. At the time of writing, the latest performance document available is for vSAN 6.0, which can be found here:
http://www.vmware.com/files/pdf/products/vsan/VMware-Virtual-San6-ScalabilityPerformance-Paper.pdf.

We highly recommend, however, searching for the latest version, as we are certain there will be an updated version with the 6.2 release of vSAN. One thing that stands out when reading these types of papers is that all publicly available performance tests and reference architectures from VMware have been done with 10 GbE networking configurations. For our design scenarios, we will use 10 GbE as the gold standard because it is heavily recommended by VMware and increases throughput and lowers latency. The only configuration where this does not apply is ROBO (remote office/branch office). This 2-node vSAN configuration is typically deployed using 1 GbE, since the number of VMs running is typically relatively low (up to 20 in total). There are different configuration options for networking, including the use of Network I/O Control.