VMware Site Recovery Manager 6.5

VMware Site Recovery Manager (SRM) 6.5 is an exciting release for VMware. It adds several features that make SRM easier to use and monitor, along with new integrations with other VMware products. These improvements enhance what is already the premier virtualization BC/DR solution and deliver additional value, time savings, security, and risk reduction to customers.

vSphere 6.5 Compatibility:

+++++++++++++++++++
First off, SRM 6.5 is compatible with vSphere 6.5 including:

  • Full integration with the new vCenter HA feature. SRM will continue working normally if vCenter HA fails over
  • Full support for Windows vCenter to VCSA migration. If a customer uses the migration tool to upgrade and migrate their environment from vCenter 6.0 on Windows to vCenter 6.5 on the VCSA, from an SRM standpoint this is just seen as an upgrade and is fully compatible with a standard SRM upgrade
  • SRM supports protecting VMs that are using VM encryption when using Storage Policy-Based Protection Groups (SPPGs)
  • There is now support for two-factor authentication, such as RSA SecurID, with SRM 6.5
  • Integration with the new vSphere Guest Operations API. This means that changes to VM IP addresses and scripts running on VMs will now be even more secure

vSAN 6.5 Compatibility:

+++++++++++++++++

SRM 6.5 is fully supported and compatible with vSAN 6.5, as well as all previous vSAN versions, using vSphere Replication.

SRM and VVOLs interoperability:

++++++++++++++++++++++++

In addition to all of the cool new features that are part of Virtual Volumes (VVOLs) 2.0, SRM 6.5 now supports protecting VMs located on VVOLs using vSphere Replication. If you are using storage other than vSAN, you owe it to yourself to take a look at VVOLs and see what you can get from using them.

API and vRO plug-in enhancements – Scripted/Unattended install and upgrade:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

There have been a number of enhancements to programmatic interaction with SRM 6.5. These take two primary forms: exposure of a number of new options in the SRM API, and significant enhancements to the vRealize Orchestrator (vRO) plug-in for SRM that take advantage of them. The new functions available through the API and vRO plug-in are listed below; refer to the SRM API and vRO plug-in documentation for details on the existing interfaces.

  • Add a Test Network Mapping
  • Get Test Network Mappings
  • Remove a Test Network Mapping
  • Remove Folder Mapping
  • Remove Network Mapping
  • Remove Resource Mapping
  • Remove Protection Group
  • Remove Replicated VM From VR Group
  • Unprotect Virtual Machines
  • Add Test Network Mapping to Recovery Plan
  • Create Recovery Plan
  • Delete Recovery Plan
  • Initiate Planned Migration Recovery Plan
  • Remove Protection Group From Recovery Plan
  • Remove Test Network Mapping From Recovery Plan
  • Discover Replicated Devices

These functions along with the functions previously exposed provide programmatic access to almost the full range of SRM functionality. This makes it that much easier to manage and maintain SRM programmatically, saving time and improving accuracy and efficiency. Additionally, SRM 6.5 now fully supports unattended/silent installation and upgrades. This makes the deployment of SRM much faster and easier, saving you time and money.
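To make the shape of these scripted workflows concrete, here is a minimal in-memory mock. The class and method names are invented for illustration and do not match the real SRM SOAP API methods or vRO workflow names; in practice these calls would go through the SRM API endpoint or the vRO plug-in.

```python
# Hypothetical in-memory mock mirroring a few of the SRM-style operations
# listed above (create/delete recovery plan, protection group and test
# network mapping management). Illustration only, not the real SRM API.
class MockSrm:
    def __init__(self):
        self.recovery_plans = {}         # plan name -> set of protection groups
        self.test_network_mappings = {}  # plan name -> {prod net: test net}

    def create_recovery_plan(self, name):
        self.recovery_plans[name] = set()
        self.test_network_mappings[name] = {}

    def add_protection_group(self, plan, group):
        self.recovery_plans[plan].add(group)

    def remove_protection_group(self, plan, group):
        self.recovery_plans[plan].discard(group)

    def add_test_network_mapping(self, plan, prod_net, test_net):
        self.test_network_mappings[plan][prod_net] = test_net

    def delete_recovery_plan(self, name):
        del self.recovery_plans[name]
        del self.test_network_mappings[name]

srm = MockSrm()
srm.create_recovery_plan("Tier1-Failover")
srm.add_protection_group("Tier1-Failover", "PG-SQL")
srm.add_test_network_mapping("Tier1-Failover", "VM-Network", "Test-Bubble")
print(srm.recovery_plans["Tier1-Failover"])  # {'PG-SQL'}
```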

vSphere Replication RPO:

+++++++++++++++++++

Another exciting new enhancement for SRM is actually part of a related solution, vSphere Replication. vSphere Replication now supports per-VM RPOs as low as 5 minutes on most VMware compatible storage. This takes the RPO available with vSphere Replication to the point where it covers most use cases. And remember that vSphere Replication is included in vSphere Essentials Plus licensing and above.
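As a quick sanity check on what a 5-minute RPO implies (assuming each replication cycle completes within its RPO interval):

```python
# With a 5-minute RPO, the worst-case data loss window is the RPO itself,
# and vSphere Replication runs roughly 288 cycles per day per protected VM.
rpo_minutes = 5
worst_case_loss_minutes = rpo_minutes
cycles_per_day = 24 * 60 // rpo_minutes
print(worst_case_loss_minutes, cycles_per_day)  # 5 288
```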

vROps SRM Management Pack:

+++++++++++++++++++++++

Last but definitely not least, SRM 6.5 marks the first time that vROps has a management pack for monitoring SRM. This will allow for monitoring of the SRM server, Protection Group and Recovery Plan status from within vROps. This makes it easier to monitor, manage and troubleshoot SRM.

IOPS LIMIT FOR OBJECT

A number of customers have asked for a way to limit the amount of I/O that a single VM can generate against a vSAN datastore. The main reason for this request is to prevent a noisy-neighbor VM (or, to be more precise, an IOPS-intensive application inside a VM) from impacting other VMs running on the same datastore. With the introduction of IOPS limits, implemented via storage policies, administrators can cap the number of IOPS that a VM can issue.

vSAN 6.2 has a new quality-of-service mechanism referred to as "IOPS limit for object". Through a policy setting, a customer can set an IOPS limit on a per-object basis (typically a VMDK), which guarantees that the object cannot exceed that number of IOPS. This is very useful if you have a virtual machine that might be consuming more than its fair share of resources. The policy setting places "guard rails" on the virtual machine so it doesn't impact other VMs or the overall performance of the vSAN datastore.

The screenshot below shows what the new “IOPS limit for object” capability looks like in the VM Storage Policy. Simply select “IOPS limit for object” for your policy, and then select an integer value for the IOPS limit. Any object (VMDK) that has this policy assigned will not be able to generate more than that number of IOPS.

Normalized to 32KB:

++++++++++++++++

The I/O size for the IOPS limit is normalized to 32KB. Note that this is a hard limit on the number of IOPS, so even if there are enough resources available on the system to do more, the VM/VMDK is prevented from exceeding the limit.
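A sketch of how the 32KB normalization affects the effective limit (assuming normalized units scale linearly with I/O size above 32KB):

```python
# Each I/O is counted in 32 KB normalized units: anything up to 32 KB
# counts as one unit, a 64 KB I/O counts as two, and so on.
import math

NORMALIZE_KB = 32

def normalized_iops(io_size_kb, io_count):
    """Return how many normalized units io_count I/Os of a given size consume."""
    units_per_io = max(1, math.ceil(io_size_kb / NORMALIZE_KB))
    return units_per_io * io_count

print(normalized_iops(4, 1000))   # 1000 -> small I/Os count as 1 unit each
print(normalized_iops(64, 1000))  # 2000 -> 64 KB I/Os count as 2 units each
```

So a workload issuing 64KB I/Os against a 10,000 IOPS limit is effectively capped at 5,000 real I/Os per second.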

Considerations:

++++++++++++

One thing to consider is that not only read and write I/O is counted toward the limit; any snapshot I/O that occurs against the VM/VMDK also counts against it.

If the I/O against a particular VM/VMDK rises above the IOPS limit threshold (for example, the limit is set to 10,000 IOPS and the 10,001st I/O arrives), that I/O is delayed/throttled.
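The hard-limit behavior can be modeled as a simple per-second budget; this is a toy model, not vSAN's actual scheduler:

```python
# Toy model of the hard limit: once the per-second IOPS budget is spent,
# further I/Os in that interval are deferred to the next interval.
def schedule_ios(io_count, iops_limit):
    """Return the 1-second interval in which each I/O is serviced."""
    return [i // iops_limit for i in range(io_count)]

intervals = schedule_ios(10_001, 10_000)
print(intervals[9_999])   # 0 -> the 10,000th I/O fits in the first second
print(intervals[10_000])  # 1 -> the 10,001st is delayed to the next second
```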

New Security Features in 6.5 Release

With new features such as VM Encryption, Encrypted vMotion, Secure Boot Support for Virtual Machines, and Secure Boot Plus Cryptographic Hypervisor Assurance for ESXi, vSphere 6.5 Security brings together security and operational efficiency that are both universal and scalable. In addition, vSphere 6.5 introduces audit-quality logging of vSphere events via Syslog.

Virtual Machine Encryption:

++++++++++++++++++++

VM Encryption is a VM-agnostic method of encryption for VMs that is scalable, easy to implement, and easy to manage.

There are numerous advantages:

1. Because encryption occurs at the hypervisor level and not in the VM, VM Encryption works with any guest OS and datastore type.

2. Encryption is managed via policy. The policy can be applied to many VMs, regardless of their guest OS. Verifying that the VM is encrypted can be done by confirming that the policy is applied. The policy framework being used leverages vSphere Storage Policy Based Management (SPBM).

3. Encryption is not managed “within” the VM. This is a key differentiator. There are no encryption “special cases” that require in-guest configuration and monitoring. Encryption keys are not contained in the memory of the VM or accessible to the VM in any way.

4. Key Management is based on the industry-standard Key Management Interoperability Protocol (KMIP). We are qualifying against KMIP version 1.1. vCenter Server is considered a KMIP client, and it works with many KMIP 1.1 key managers. This provides customers with choice and flexibility. It also provides a separation of duty between key usage and key management. In a large enterprise, key management would be done by the security team, and key usage would be done by IT, in this example via vCenter Server.

5. VM Encryption leverages the latest CPU hardware advances in AES-NI encryption. Advanced Encryption Standard Instruction Set is an extension to the x86 instruction set and provides accelerated encryption and decryption functions on a per-core basis in the CPU.

Encrypted vMotion:

++++++++++++++

Encrypted vMotion is set on a per-VM basis. It encrypts the data traveling over the network rather than encrypting the network itself, which enables more flexibility and easier implementation. A 256-bit random key and a 64-bit nonce, used only once for this VMware vSphere vMotion® migration, are generated. The nonce is used to generate a unique counter for every packet sent over the network. This prevents replay attacks and enables the encryption of 2^64 128-bit blocks of data. The key and the nonce are packaged into a vSphere vMotion migration specification. The migration specification is sent to both systems in the cluster via the existing encrypted management connections between the vCenter Server instance and the ESXi hosts. The vSphere vMotion traffic begins with every packet being encrypted with the key and the nonce on host A. Each uniquely encrypted packet is decrypted on the receiving host, host B, completing the vSphere vMotion migration.
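The key/nonce/counter scheme can be illustrated with a small counter-mode sketch. This is a teaching sketch only (SHA-256 used as a keystream generator), not VMware's actual cipher or packet format:

```python
# Illustrative counter-mode encryption: a random 256-bit key plus a
# never-reused 64-bit nonce derive a unique keystream per packet counter,
# so identical payloads encrypt differently and replayed counters stand out.
import hashlib, os

key = os.urandom(32)   # 256-bit random key, as in Encrypted vMotion
nonce = os.urandom(8)  # 64-bit nonce, used once per migration

def keystream(counter, length):
    """Derive a per-packet keystream from key, nonce, and packet counter."""
    stream, block = b"", 0
    while len(stream) < length:
        stream += hashlib.sha256(
            key + nonce + counter.to_bytes(8, "big") + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return stream[:length]

def encrypt_packet(counter, payload):
    ks = keystream(counter, len(payload))
    return bytes(a ^ b for a, b in zip(payload, ks))

pkt = b"vMotion memory page"
ct = encrypt_packet(0, pkt)
assert encrypt_packet(0, ct) == pkt  # XOR keystream is symmetric
assert encrypt_packet(1, pkt) != ct  # new counter -> new ciphertext
```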

Secure Boot Support:

++++++++++++++++

vSphere 6.5 introduces Secure Boot Support for Virtual Machines and for the ESXi hypervisor. UEFI Secure Boot is a mechanism that ensures that only trusted code is loaded by EFI firmware prior to OS handoff. Trust is determined by keys and certificates managed by the firmware. Implementation of this feature for a virtual machine enables secure boot of EFI-aware OSs in a VM.

Virtual Machine Secure Boot: Virtual machines must be booted from EFI firmware to enable Secure Boot. The EFI firmware supports Windows, Linux, and nested ESXi. For Secure Boot to work, the guest OS must also support Secure Boot. Examples include Windows 8 and Windows Server 2012 and newer, VMware Photon™ OS, RHEL/CentOS 7.0, Ubuntu 14.04, and ESXi 6.5. It is easy to enable Secure Boot for virtual machines by checking the box in the UI.

ESXi Host Secure Boot:

+++++++++++++++++

When Secure Boot is enabled, the UEFI firmware validates the digitally signed kernel of an OS against a digital certificate stored in the UEFI firmware. For ESXi 6.5, this capability is further leveraged by the ESXi kernel, adding cryptographic assurance of ESXi components. ESXi is already composed of digitally signed packages called vSphere installation bundles (VIBs). These packages are never broken open. At boot time, the ESXi file system maps to the content of those packages. By leveraging the same digital certificate in the host UEFI firmware used to validate the signed ESXi kernel, the kernel then validates each VIB using the Secure Boot verifier against the firmware-based certificate, ensuring a cryptographically “clean” boot.
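The trust chain can be illustrated with a toy verifier. Here an HMAC against a single trust anchor stands in for the real X.509 signature check against the firmware-held certificate; names and payloads are invented for illustration:

```python
# Sketch of the idea: each package (VIB) ships with a digest signed by a
# trusted authority; at boot, every package is re-verified against the
# trust anchor before its contents are mapped into the file system.
import hmac, hashlib

TRUST_ANCHOR = b"firmware-held-certificate"  # stand-in for the UEFI cert

def sign(vib_payload):
    return hmac.new(TRUST_ANCHOR, vib_payload, hashlib.sha256).hexdigest()

def verify_boot(vibs):
    """vibs: list of (payload, signature); all must verify or boot fails."""
    return all(hmac.compare_digest(sign(p), s) for p, s in vibs)

good = [(b"esx-base", sign(b"esx-base")), (b"vsan", sign(b"vsan"))]
tampered = [(b"esx-base", sign(b"esx-base")), (b"vsan", sign(b"rootkit"))]
print(verify_boot(good))      # True
print(verify_boot(tampered))  # False -> a modified VIB breaks the boot
```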

Object Store File System

OSFS (Object Store File System) enables the vSAN object store to be mounted as a single datastore on each host. Data on a vSAN datastore is stored in the form of data containers called objects, which are distributed across the cluster. An object can be a VMDK file, a snapshot, or the VM home folder. A reference (namespace) object is also created; it holds a VMFS volume and stores the virtual machine metadata files.

Logs for OSFS are captured in /var/log/osfsd.log.

How do you create and remove directories on a vSAN datastore with osfs-mkdir and osfs-rmdir?

It is not straightforward to create or remove a directory on the vSAN datastore because it is object based. Special vSAN-specific commands exist to perform these tasks.

If you try to create a directory on vSAN with mkdir, you will get an error such as:

# cd /vmfs/volumes/vsanDatastore

# mkdir TEST

mkdir: can’t create directory ‘TEST’: Function not implemented

How can we create/remove a directory?

Step 1: Log in to the ESXi host and change to the OSFS bin folder

# cd /usr/lib/vmware/osfs/bin

Step 2: List the contents 

# ls

objtool     osfs-ls     osfs-mkdir  osfs-rmdir  osfsd

Step 3: Verify that a directory called TEST does not exist

# ls -lh /vmfs/volumes/vsanDatastore/TEST

ls: /vmfs/volumes/vsanDatastore/TEST: No such file or directory

Step 4: Let's create a directory with the name TEST using osfs-mkdir

# ./osfs-mkdir /vmfs/volumes/vsanDatastore/TEST

54c0ba65-0c45-xxxx-b1f2-xxxxxxxxxxxx

Step 5: Verify that it exists

# ls -lh /vmfs/volumes/vsanDatastore/TEST

lrwxr-xr-x    1 root     root          12 Jan 09 21:03 /vmfs/volumes/vsanDatastore/TEST -> 54c0ba65-0c45-xxxx-b1f2-xxxxxxxxxxxx

Step 6: Let's try to delete the directory now using osfs-rmdir

# ./osfs-rmdir /vmfs/volumes/vsanDatastore/TEST

Deleting directory 54c0ba65-0c45-xxxx-b1f2-xxxxxxxxxxxx in container id xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

backed by vsan (force=False)

Step 7: Verify that it has been removed

# ls -lh /vmfs/volumes/vsanDatastore/TEST

ls: /vmfs/volumes/vsanDatastore/TEST: No such file or directory

Storage connectivity to vSphere

The VMware vSphere storage architecture consists of layers of abstraction that hide the differences and manage the complexity among physical storage subsystems.

To the applications and guest operating systems inside each virtual machine, the storage subsystem appears as a virtual SCSI controller connected to one or more virtual SCSI disks. These controllers are the only types of SCSI controllers that a virtual machine can see and access. These controllers include BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, and VMware Paravirtual.

The virtual SCSI disks are provisioned from datastore elements in the datacenter. A datastore is like a storage appliance that delivers storage space for virtual machines across multiple physical hosts. Multiple datastores can be aggregated into a single logical, load-balanced pool called a datastore cluster.

The datastore abstraction is a model that assigns storage space to virtual machines while insulating the guest from the complexity of the underlying physical storage technology. The guest virtual machine is not exposed to Fibre Channel SAN, iSCSI SAN, direct attached storage, and NAS.

Each datastore is a physical VMFS volume on a storage device. NAS datastores are an NFS volume with VMFS characteristics. Datastores can span multiple physical storage subsystems. A single VMFS volume can contain one or more LUNs from a local SCSI disk array on a physical host, a Fibre Channel SAN disk farm, or iSCSI SAN disk farm. New LUNs added to any of the physical storage subsystems are detected and made available to all existing or new datastores. Storage capacity on a previously created datastore can be extended without powering down physical hosts or storage subsystems. If any of the LUNs within a VMFS volume fails or becomes unavailable, only virtual machines that use that LUN are affected. An exception is the LUN that has the first extent of the spanned volume. All other virtual machines with virtual disks residing in other LUNs continue to function as normal.

Each virtual machine is stored as a set of files in a directory in the datastore. The disk storage associated with each virtual guest is a set of files within the guest’s directory. You can operate on the guest disk storage as an ordinary file. The disk storage can be copied, moved, or backed up. New virtual disks can be added to a virtual machine without powering it down. In that case, a virtual disk file  (.vmdk) is created in VMFS to provide new storage for the added virtual disk or an existing virtual disk file is associated with a virtual machine.

VMFS is a clustered file system that leverages shared storage to allow multiple physical hosts to read and write to the same storage simultaneously. VMFS provides on-disk locking to ensure that the same virtual machine is not powered on by multiple servers at the same time. If a physical host fails, the on-disk lock for each virtual machine is released so that virtual machines can be restarted on other physical hosts.
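The locking semantics described above can be modeled in a few lines; this is an in-memory analogy of the behavior, not the actual on-disk lock format:

```python
# Toy model of VMFS on-disk locking: a host must take the per-VM lock
# before power-on; a second host attempting power-on is refused, and a
# host failure releases its locks so VMs can restart elsewhere.
class Datastore:
    def __init__(self):
        self.locks = {}  # vm name -> owning host

    def power_on(self, vm, host):
        if self.locks.get(vm) not in (None, host):
            return False           # lock held by another host
        self.locks[vm] = host
        return True

    def host_failed(self, host):
        for vm in [v for v, h in self.locks.items() if h == host]:
            del self.locks[vm]     # release the failed host's locks

ds = Datastore()
print(ds.power_on("vm01", "esx-a"))  # True
print(ds.power_on("vm01", "esx-b"))  # False -> already powered on elsewhere
ds.host_failed("esx-a")
print(ds.power_on("vm01", "esx-b"))  # True -> restart after host failure
```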

VMFS also features failure consistency and recovery mechanisms, such as distributed journaling, a failure-consistent virtual machine I/O path, and virtual machine state snapshots. These mechanisms can aid quick identification of the cause and recovery from virtual machine, physical host, and storage subsystem failures.

VMFS also supports raw device mapping (RDM). RDM provides a mechanism for a virtual machine to have direct access to a LUN on the physical storage subsystem (Fibre Channel or iSCSI only). RDM supports two typical types of applications:

SAN snapshot or other layered applications that run in the virtual machines. RDM better enables scalable backup offloading systems using features inherent to the SAN.

Microsoft Clustering Services (MSCS) spanning physical hosts and using virtual-to-virtual clusters as well as physical-to-virtual clusters. Cluster data and quorum disks must be configured as RDMs rather than files on a shared VMFS.

 

Supported Storage Adapters:

+++++++++++++++++++++++++

Storage adapters provide connectivity for your ESXi host to a specific storage unit or network.

ESXi supports different classes of adapters, including SCSI, iSCSI, RAID, Fibre Channel, Fibre Channel over Ethernet (FCoE), and Ethernet. ESXi accesses the adapters directly through device drivers in the VMkernel.

View Storage Adapters Information:

++++++++++++++++++++++++++++++

Use the vSphere Client to display storage adapters that your host uses and to review their information.

Procedure

1: In Inventory, select Hosts and Clusters.

2: Select a host and click the Configuration tab.

3: In Hardware, select Storage Adapters.

4: To view details for a specific adapter, select the adapter from the Storage Adapters list.

5: To list all storage devices the adapter can access, click Devices.

6: To list all paths the adapter uses, click Paths.

Types of Physical Storage:

++++++++++++++++++++++

The ESXi storage management process starts with storage space that your storage administrator preallocates on different storage systems.

ESXi supports the following types of storage:

Local Storage: Stores virtual machine files on internal or directly connected external storage disks.

Networked Storage: Stores virtual machine files on external storage disks or arrays attached to your host through a direct connection or through a high-speed network.

Local Storage:

Local storage can be internal hard disks located inside your ESXi host, or it can be external storage systems located outside and connected to the host directly through protocols such as SAS or SATA.

Local storage does not require a storage network to communicate with your host. You need a cable connected to the storage unit and, when required, a compatible HBA in your host.

ESXi supports a variety of internal or external local storage devices, including SCSI, IDE, SATA, USB, and SAS storage systems. Regardless of the type of storage you use, your host hides a physical storage layer from virtual machines.

Networked Storage:

Networked storage consists of external storage systems that your ESXi host uses to store virtual machine files remotely. Typically, the host accesses these systems over a high-speed storage network.

Networked storage devices are shared. Datastores on networked storage devices can be accessed by multiple hosts concurrently. ESXi supports the following networked storage technologies.

Note

Accessing the same storage through different transport protocols, such as iSCSI and Fibre Channel, at the same time is not supported.

Fibre Channel (FC):

Stores virtual machine files remotely on an FC storage area network (SAN). FC SAN is a specialized high-speed network that connects your hosts to high-performance storage devices. The network uses Fibre Channel protocol to transport SCSI traffic from virtual machines to the FC SAN devices.

Fibre Channel Storage

In this configuration, a host connects to a SAN fabric, which consists of Fibre Channel switches and storage arrays, using a Fibre Channel adapter. LUNs from a storage array become available to the host. You can access the LUNs and create datastores for your storage needs. The datastores use the VMFS format.

Internet SCSI (iSCSI):

Stores virtual machine files on remote iSCSI storage devices. iSCSI packages SCSI storage traffic into the TCP/IP protocol so that it can travel through standard TCP/IP networks instead of the specialized FC network. With an iSCSI connection, your host serves as the initiator that communicates with a target, located in remote iSCSI storage systems.

ESXi offers the following types of iSCSI connections:

Hardware iSCSI: Your host connects to storage through a third-party adapter capable of offloading the iSCSI and network processing. Hardware adapters can be dependent or independent.

Software iSCSI: Your host uses a software-based iSCSI initiator in the VMkernel to connect to storage. With this type of iSCSI connection, your host needs only a standard network adapter for network connectivity.

You must configure iSCSI initiators for the host to access and display iSCSI storage devices.

iSCSI Storage depicts different types of iSCSI initiators.

iSCSI Storage

In the left example, the host uses the hardware iSCSI adapter to connect to the iSCSI storage system.

In the right example, the host uses a software iSCSI adapter and an Ethernet NIC to connect to the iSCSI storage.

iSCSI storage devices from the storage system become available to the host. You can access the storage devices and create VMFS datastores for your storage needs.

Network-attached Storage (NAS)

Stores virtual machine files on remote file servers accessed over a standard TCP/IP network. The NFS client built into ESXi uses Network File System (NFS) protocol version 3 to communicate with the NAS/NFS servers. For network connectivity, the host requires a standard network adapter.

NFS Storage

Shared Serial Attached SCSI (SAS)

Stores virtual machines on direct-attached SAS storage systems that offer shared access to multiple hosts. This type of access permits multiple hosts to access the same VMFS datastore on a LUN.

What is CLOMD in vSAN?

CLOMD (Cluster Level Object Manager Daemon) plays a key role in the operation of a vSAN cluster. It runs on every ESXi host and is responsible for new object creation, initiating repair of existing objects after failures, all types of data moves and evacuations (For example: Enter Maintenance Mode, Evacuate data on disk removal from vSAN), maintaining balance and thus triggering rebalancing, implementing policy changes, etc. 

It does not actually participate in the data path, but it triggers data path operations and as such is a critical component during a number of management workflows and failure handling scenarios. 

Virtual machine power on, or Storage vMotion to vSAN are two operations where CLOMD is required (and which are not that obvious), as those operations require the creation of a swap object, and object creation requires CLOMD. 

Similarly, starting with vSAN 6.0, memory snapshots are maintained as objects, so taking a snapshot with memory state will also require the CLOMD.

Cluster health – CLOMD liveness check :

This checks if the Cluster Level Object Manager (CLOMD) daemon is alive or not. It does so by first checking that the service is running on all ESXi hosts, and then contacting the service to retrieve run-time statistics to verify that CLOMD can respond to inquiries. 

Note: This does not ensure that all of the functionalities discussed above (For example: Object creation, rebalancing) actually work, but it gives a first level assessment as to the health of CLOMD.

CLOMD ERROR 

If any of the ESXi hosts are disconnected, the CLOMD liveness state of the disconnected host is shown as unknown. If the Health service is not installed on a particular ESXi host, the CLOMD liveness state of all the ESXi hosts is also reported as unknown.

If the CLOMD service is not running on a particular ESXi host, the CLOMD liveness state of that host is reported as abnormal.

For this test to succeed, the health service needs to be installed on the ESXi host and the CLOMD service needs to be running. To get the status of the CLOMD service, run this command on the ESXi host:

/etc/init.d/clomd status

If the CLOMD health check is still failing after these steps or if the CLOMD health check continues to fail on a regular basis, open a support request with VMware Support.

Examples:

++++++++

In the /var/run/log/clomd.log file, you see logs similar to:

2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMProcessWorkItem: Op REPAIR starts:1804289387 
2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMReconfigure: Reconfiguring aae9cf268-cd5e-abc4-448d-050010d45c96 workItem type REPAIR 
2017-04-19T03:59:32.408Z 120360 (482850097440)(opID:1804289387)CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for ae9cf268-cd5e-abc4-448d-050010d45c96 found 

^^^ Here, CLOMD crashed while attempting to repair the object with UUID ae9cf268-cd5e-abc4-448d-050010d45c96. The vSAN health check will report a CLOMD liveness issue. A CLOMD restart will fail because each time it is restarted, it fails again while attempting to repair the zero-sized object. Swap objects are the only vSAN objects that can be zero sized, so this issue can occur only with swap objects.
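When triaging such a crash loop, a small script can pull the affected object UUIDs out of clomd.log. The line layout is assumed from the sample lines above; masked hex digits ('x') are accepted:

```python
# Triage helper: extract object UUIDs from CLOMD log lines such as
# "CLOMReconfigure: Reconfiguring <uuid> workItem type REPAIR".
import re

UUID = (r"[0-9a-fx]{8}-[0-9a-fx]{4}-[0-9a-fx]{4}"
        r"-[0-9a-fx]{4}-[0-9a-fx]{12}")
LINE_RE = re.compile(r"CLOM\w+:.*?(" + UUID + r")", re.IGNORECASE)

def find_object_uuids(log_lines):
    """Return the object UUID referenced on each matching CLOMD log line."""
    return [m.group(1) for line in log_lines
            if (m := LINE_RE.search(line))]

line = ("CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded "
        "data components for ae9cf268-cd5e-abc4-448d-050010d45c96 found")
print(find_object_uuids([line]))  # ['ae9cf268-cd5e-abc4-448d-050010d45c96']
```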

Host Crash Diagnostic Partitions

A diagnostic partition can be on the local disk where the ESXi software is installed. This is the default configuration for ESXi Installable. You can also use a diagnostic partition on a remote disk shared between multiple hosts. If you want to use a network diagnostic partition, you can install ESXi Dump Collector and configure the networked partition.

The following considerations apply:

>> A diagnostic partition cannot be located on an iSCSI LUN accessed through the software iSCSI or dependent hardware iSCSI adapter. For more information about diagnostic partitions with iSCSI, see General Boot from iSCSI SAN Recommendations in the vSphere Storage documentation.

>> Each host must have a diagnostic partition of 110MB. If multiple hosts share a diagnostic partition on a SAN LUN, the partition should be large enough to accommodate core dumps of all hosts.

>>If a host that uses a shared diagnostic partition fails, reboot the host and extract log files immediately after the failure. Otherwise, the second host that fails before you collect the diagnostic data of the first host might not be able to save the core dump.
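Sizing a shared diagnostic partition then reduces to simple arithmetic, assuming the stated 110MB per host:

```python
# A shared SAN diagnostic partition must hold core dumps for every host
# that uses it: at least N x 110 MB for N hosts.
PER_HOST_MB = 110

def shared_partition_mb(host_count):
    return host_count * PER_HOST_MB

print(shared_partition_mb(8))  # 880 -> MB needed for an 8-host cluster
```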

Diagnostic Partition Creation:

++++++++++++++++++++++

You can use the vSphere Client to create the diagnostic partition on a local disk or on a private or shared SAN LUN. You cannot use vicfg-dumppart to create the diagnostic partition. The SAN LUN can be set up with Fibre Channel or hardware iSCSI. SAN LUNs accessed through a software iSCSI initiator are not supported.

Managing Core Dumps:

+++++++++++++++++++

With esxcli system coredump, you can manage local diagnostic partitions or set up core dump on a remote server in conjunction with ESXi Dump Collector. For information about ESXi Dump Collector, see the vSphere Networking documentation.

Managing Local Core Dumps with ESXCLI:

++++++++++++++++++++++++++++++

The following example scenario changes the local diagnostic partition with ESXCLI. Specify one of the connection options listed in Connection Options in place of <conn_options>.

To manage a local diagnostic partition

1: Show the diagnostic partition the VMkernel uses and display information about all partitions that can be used as diagnostic partitions.

esxcli <conn_options> system coredump partition list

2: Deactivate the current diagnostic partition.

esxcli <conn_options> system coredump partition set --unconfigure

The ESXi system is now without a diagnostic partition, and you must immediately set a new one.

3: Set the active partition to naa.<naa_ID>.

esxcli <conn_options> system coredump partition set --partition=naa.<naa_ID>

4: List partitions again to verify that a diagnostic partition is set.

esxcli <conn_options> system coredump partition list

If a diagnostic partition is set, the command displays information about it. Otherwise, the command shows that no partition is activated and configured.

Managing Core Dumps with ESXi Dump Collector:

++++++++++++++++++++++++++++++++++++

By default, a core dump is saved to the local disk. You can use ESXi Dump Collector to keep core dumps on a network server for use during debugging. ESXi Dump Collector is especially useful for Auto Deploy, but supported for any ESXi 5.0 host. ESXi Dump Collector supports other customization, including sending core dumps to the local disk.

ESXi Dump Collector is included with the vCenter Server autorun.exe application. You can install ESXi Dump Collector on the same system as the vCenter Server service or on a different Windows or Linux machine.

You can configure ESXi Dump Collector by using the vSphere Client or ESXCLI. Specify one of the connection options listed in Connection Options in place of <conn_options>.

To manage core dumps with ESXi Dump Collector:

++++++++++++++++++++++++++++++++++++

1: Set up an ESXi system to use ESXi Dump Collector by running esxcli system coredump.

esxcli <conn_options> system coredump network set --interface-name vmk0 --server-ipv4=1-XX.XXX --port=6500

You must specify a VMkernel port with --interface-name, and the IP address and port of the server to send the core dumps to. If you configure an ESXi system that is running inside a virtual machine, you must choose a VMkernel port that is in promiscuous mode.

2: Enable ESXi Dump Collector.

esxcli <conn_options> system coredump network set --enable=true

3: (Optional) Check that ESXi Dump Collector is configured correctly.

esxcli <conn_options> system coredump network get

The host on which you have set up ESXi Dump Collector sends core dumps to the specified server by using the specified VMkernel NIC and optional port.

Additional Information : ESXi Network Dump Collector in VMware vSphere 5.x/6.0 

 

Core Dumps for VSAN

If your vSAN cluster uses encryption and an error occurs on an ESXi host, the resulting core dump is encrypted to protect customer data. Core dumps included in the vm-support package are also encrypted.

Note:

Core dumps can contain sensitive information. Check with your Data Security Team and Privacy Policy when handling core dumps.

Core Dumps on ESXi Hosts

When an ESXi host crashes, an encrypted core dump is generated and the host reboots. The core dump is encrypted with the host key that is in the ESXi key cache.

  • In most cases, vCenter Server retrieves the key for the host from the KMS and attempts to push the key to the ESXi host after reboot. If the operation is successful, you can generate the vm-support package and you can decrypt or re-encrypt the core dump.

  • If vCenter Server cannot access the ESXi host, you might be able to retrieve the key from the KMS.

  • If the host used a custom key, and that key differs from the key that vCenter Server pushes to the host, you cannot manipulate the core dump. Avoid using custom keys.

Core Dumps and vm-support Packages

When you contact VMware Technical Support about a serious error, your support representative usually asks you to generate a vm-support package. The package includes log files and other information, including core dumps. If support representatives cannot resolve the issue from the log files and other information, you can decrypt the core dumps to make the relevant information available. Follow your organization's security and privacy policy to protect sensitive information, such as host keys.

Core Dumps on vCenter Server Systems

A core dump on a vCenter Server system is not encrypted. vCenter Server already contains potentially sensitive information. At a minimum, ensure that the Windows system on which vCenter Server runs, or the vCenter Server Appliance, is protected. You might also consider turning off core dumps for the vCenter Server system; other information in the log files can help determine the problem.

Troubleshooting Memory Errors in UCS

Memory errors are encountered when an attempt is made to read a memory location and the value read does not match the value that is supposed to be there. Memory errors can be classified as follows.

Detected Versus Undetected Errors:

A system without error-correcting code (ECC) memory cannot detect hardware errors, so memory errors silently lead to data corruption, incorrect processing by the operating system or applications, and eventually system failures. Cisco Unified Computing System™ (Cisco UCS®) servers use ECC memory, and powerful error-correcting codes, such as those provided by the Intel® Xeon® processors in Cisco UCS servers, detect memory errors so that silent data corruption does not occur.

Hard Versus Soft Errors:

Errors that are caused by a persistent physical defect are traditionally referred to as "hard" errors. A hard error may be caused by an assembly defect, such as a solder bridge or a cracked solder joint, or may be the result of a defect in the memory chip itself. Rewriting the memory location and retrying the read access will not eliminate a hard error; it will continue to repeat.

Errors caused by a brief electrical disturbance, either inside the DRAM chip or on an external interface, are referred to as "soft" errors. Soft errors are transient and do not continue to repeat. If a soft error was the result of a disturbance during the read operation, simply retrying the read may yield correct data. If it was caused by a disturbance that upset the contents of the memory array, rewriting the memory location will correct the error.

Hard errors are typically detected by memory tests run by the Cisco UCS BIOS at boot time, and any modules containing hard errors are mapped out so that they cannot cause errors during runtime. Cisco UCS servers employ memory patrol scrubbing to automatically detect and correct soft errors during runtime.

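The hard/soft distinction can be made concrete with a small model. The sketch below is illustrative only (it is not UCS diagnostic code): a hard error, such as a stuck bit from a cracked solder joint, corrupts every read of the location, while a soft upset disturbs only a single read, so a retry returns correct data.

```python
class MemoryCell:
    """Toy model of a memory location with an optional hard or soft fault."""

    def __init__(self, value, stuck_bit=None):
        self.value = value
        self.stuck_bit = stuck_bit    # hard defect: this bit always reads as 1
        self.transient_flip = None    # soft upset: flips the next read only

    def read(self):
        v = self.value
        if self.stuck_bit is not None:
            v |= 1 << self.stuck_bit        # hard error repeats on every access
        if self.transient_flip is not None:
            v ^= 1 << self.transient_flip   # soft error is transient...
            self.transient_flip = None      # ...and does not repeat
        return v

soft = MemoryCell(0b0000)
soft.transient_flip = 2
print(soft.read(), soft.read())   # 4 0 -- the retry returns correct data

hard = MemoryCell(0b0000, stuck_bit=2)
print(hard.read(), hard.read())   # 4 4 -- the error persists across retries
```

This is exactly why boot-time memory tests catch hard errors reliably (they repeat on every access), while patrol scrubbing is needed to catch soft errors that strike at runtime.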

Correctable Versus Uncorrectable Errors:

Whether a particular error is correctable or uncorrectable depends on the strength of the ECC code employed in the memory system. Dedicated hardware is able to fix correctable errors when they occur with no impact on program processing. Uncorrectable errors generally cannot be fixed and may make it impossible for the application or operating system to continue processing.
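To illustrate why correctability depends on the strength of the code, the sketch below implements an extended Hamming (8,4) SECDED code. This is a teaching example, not the actual Xeon ECC scheme, which applies a wider SECDED-class code over full memory words: a single flipped bit is located and corrected from the syndrome, while two flipped bits produce a syndrome/parity combination that can only be flagged as uncorrectable.

```python
def secded_encode(data):
    """Encode 4 data bits as 8 code bits: Hamming(7,4) plus overall parity."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers code positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers code positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers code positions 4, 5, 6, 7
    code = [p1, p2, d1, p3, d2, d3, d4]
    code.append(code[0] ^ code[1] ^ code[2] ^ code[3] ^ code[4] ^ code[5] ^ code[6])
    return code

def secded_decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    c = code[:7]
    overall = 0
    for bit in code:
        overall ^= bit                       # even parity across all 8 bits
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3               # 1-based position of a single error
    if pos == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                       # odd parity: one error, correctable
        if pos:
            c[pos - 1] ^= 1
        status = "corrected"
    else:                                    # syndrome set but parity even: two errors
        status = "uncorrectable"
    return [c[2], c[4], c[5], c[6]], status

word = secded_encode([1, 0, 1, 1])
word[4] ^= 1                                  # single-bit error: hardware can fix it
print(secded_decode(word))                    # ([1, 0, 1, 1], 'corrected')

word = secded_encode([1, 0, 1, 1])
word[0] ^= 1
word[4] ^= 1                                  # double-bit error: detected, not fixed
print(secded_decode(word)[1])                 # 'uncorrectable'
```

The "corrected" path is what dedicated ECC hardware does transparently, with no impact on program processing; the "uncorrectable" path is what forces the operating system or application to stop.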

Cisco UCS B-Series and C-Series Operating in UCSM 2.2 and 3.1:

To reset memory-error counters on a Cisco UCS B-Series or C-Series server in UCSM 2.2 and 3.1, run the following commands in the CLI:

ca-1-A# scope server 1/8

ca-1-A /chassis/server # reset-all-memory-errors

ca-1-A /chassis/server* # commit

Cisco UCS B-Series and C-Series Operating in UCSM 2.1:

To reset memory-error counters on a Cisco UCS B-Series or C-Series server in UCSM 2.1, run the following commands in the CLI:

Switch-A # scope server 1/1

Switch-A /chassis/server # scope memory-array 1

Switch-A /chassis/server/memory-array # scope dimm 2

Switch-A /chassis/server/memory-array/dimm # reset-errors

Cisco UCS C-Series Rack Servers Operating in Standalone Mode

To reset memory-error counters on a Cisco UCS C-Series Rack Server operating in standalone mode, run the following commands in the CLI:

C240-FCH092779J# scope reset-ecc

C240-FCH092779J /reset-ecc # set enabled yes

C240-FCH092779J /reset-ecc *# commit

 

For additional information about memory, refer to the Cisco white paper DIMMs: Reasons to Use Only Cisco Qualified Memory on Cisco UCS Servers.