Clone Operation (New-VM) and Storage vMotion (Move-VM)

In VMware PowerCLI, New-VM and Move-VM are two distinct cmdlets used for different purposes related to virtual machines. VAAI (vStorage APIs for Array Integration) is a VMware feature that offloads certain storage operations to the storage array to improve performance and efficiency. While VAAI is not directly related to the New-VM and Move-VM cmdlets, I will provide examples of how these cmdlets are used, and then explain the relationship between VAAI and storage operations.

  1. New-VM: The New-VM cmdlet is used to create a new virtual machine (VM) on a specified host or cluster. It allows you to define configuration settings for the new VM, such as the VM name, guest operating system, CPU, memory, disk, network settings, and more. When given an existing VM as the source (via the -VM parameter), New-VM performs a clone operation, which is where VAAI-accelerated copying can come into play.

Example of New-VM:

# Create a new virtual machine
New-VM -Name "NewVM" -VMHost "ESXiHost" -Datastore "Datastore1" -MemoryGB 4 -NumCPU 2 -NetworkName "VM Network" -DiskGB 50

In this example, the New-VM cmdlet is used to create a new VM named “NewVM” on the host “ESXiHost,” with 4GB of memory, 2 CPUs, connected to the “VM Network” for networking, and a 50GB virtual disk on “Datastore1.”

  2. Move-VM: The Move-VM cmdlet is used to migrate a VM from one host or datastore to another. It allows you to perform live migrations between hosts (vMotion), live migrations between datastores (Storage vMotion), and cold migrations of powered-off VMs across hosts and datastores in a vSphere environment.

Example of Move-VM:

# Migrate a virtual machine to a different datastore
Move-VM -VM "MyVM" -Datastore "NewDatastore"

In this example, the Move-VM cmdlet is used to migrate the VM named “MyVM” to a different datastore named “NewDatastore.”

Now, let’s briefly discuss VAAI:

VAAI (vStorage APIs for Array Integration): VAAI is a set of APIs provided by VMware that allows vSphere to offload certain storage operations from the ESXi hosts to the storage array. This offloading improves performance and efficiency by leveraging the capabilities of the underlying storage hardware.

Examples of storage operations offloaded to VAAI-enabled storage arrays include:

  • Full Copy (XCOPY): Offloads data copying to the array, accelerating VM cloning, snapshot operations, and Storage vMotion.
  • Block Zeroing (WRITE SAME): Lets the array zero out blocks itself, reducing the burden on the ESXi host and speeding up provisioning of thick disks.
  • Hardware Assisted Locking (ATS): Replaces SCSI reservations with per-block atomic locking, improving metadata operations on shared VMFS datastores. (A PowerCLI check of these primitives follows this list.)
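
Whether these offloads actually take effect depends on the array and on the host's VAAI configuration. As a minimal sketch (assuming an existing PowerCLI connection; the host name is illustrative, and property names can vary slightly between PowerCLI versions), you can review per-device VAAI status through the esxcli interface exposed by Get-EsxCli:

# Query per-device VAAI primitive status on one host (host name is an example)
$vmhost = Get-VMHost -Name "ESXiHost"
$esxcli = Get-EsxCli -VMHost $vmhost -V2

# Each device reports its ATS, Clone (XCOPY), Zero (WRITE SAME), and Delete (UNMAP) status
$esxcli.storage.core.device.vaai.status.get.Invoke() |
    Select-Object Device, ATSStatus, CloneStatus, ZeroStatus, DeleteStatus |
    Format-Table -AutoSize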

In VMware PowerCLI, the Move-VM cmdlet automatically leverages VAAI (vStorage APIs for Array Integration) if the underlying storage array supports VAAI and the necessary VAAI primitives are enabled. VAAI allows the storage array to offload certain storage operations, making VM migrations faster and more efficient. Let’s take a look at an example of using Move-VM with VAAI:

# Connect to vCenter Server
Connect-VIServer -Server "vcenter.example.com" -User "username" -Password "password"

# Define the source VM and its current datastore
$sourceVM = Get-VM -Name "MyVM"
$sourceDatastore = Get-Datastore -VM $sourceVM

# Define the destination datastore
$destinationDatastore = Get-Datastore -Name "NewDatastore"

# Perform the VM migration using VAAI (Storage vMotion)
Move-VM -VM $sourceVM -Datastore $destinationDatastore -Confirm:$false -DiskStorageFormat Thin

# Disconnect from vCenter Server
Disconnect-VIServer -Server "vcenter.example.com" -Confirm:$false

In this example, we perform a Storage vMotion using the Move-VM cmdlet with VAAI. Here’s what each step does:

  1. We start by connecting to the vCenter Server using Connect-VIServer.
  2. We define the source VM ($sourceVM) for the migration and get the current datastore ($sourceDatastore) where the VM is located.
  3. Next, we define the destination datastore ($destinationDatastore) where we want to move the VM.
  4. Finally, we use the Move-VM cmdlet to perform the VM migration. The -DiskStorageFormat Thin parameter specifies that the virtual disks should be written to the destination with thin provisioning. The -Confirm:$false parameter suppresses any confirmation prompts during the migration.

The Move-VM cmdlet will automatically utilize VAAI primitives if they are supported and enabled on the underlying storage array. VAAI accelerates the data movement between datastores, resulting in faster and more efficient VM migrations.
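
For long-running Storage vMotion operations it can be convenient to start the move asynchronously and poll the resulting task. The following is a minimal sketch (VM and datastore names are illustrative):

# Start the Storage vMotion asynchronously and capture the task object
$task = Move-VM -VM "MyVM" -Datastore "NewDatastore" -RunAsync

# Poll until the task leaves the Running state
while ($task.State -eq "Running") {
    Start-Sleep -Seconds 10
    $task = Get-Task -Id $task.Id
}

# Report the final state (Success or Error)
"Migration finished with state: $($task.State)"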

NAS Troubleshooting

Troubleshooting network-attached storage (NAS) issues is essential for maintaining optimal performance and data availability. NAS serves as a central repository for data, and any problems can impact multiple users and applications. In this comprehensive guide, we’ll explore common NAS troubleshooting scenarios, along with examples and best practices for resolving issues.

Table of Contents:

  1. Introduction to NAS Troubleshooting
  2. Network Connectivity Issues
    • Example 1: NAS Unreachable on the Network
    • Example 2: Slow Data Transfer Speeds
    • Example 3: Intermittent Connection Drops
  3. NAS Configuration and Permissions Issues
    • Example 4: Incorrect NFS Share Permissions
    • Example 5: Incorrect SMB Share Configuration
    • Example 6: Invalid iSCSI Initiator Settings
  4. Storage and Disk-Related Problems
    • Example 7: Disk Failure or Degraded RAID Array
    • Example 8: Low Disk Space on NAS
    • Example 9: Disk S.M.A.R.T. Errors
  5. Performance Bottlenecks and Load Balancing
    • Example 10: Network Bottleneck
    • Example 11: CPU or Memory Overload
    • Example 12: Overloaded Disk I/O
  6. Firmware and Software Updates
    • Example 13: Outdated NAS Firmware
    • Example 14: Compatibility Issues with OS Updates
  7. Backup and Disaster Recovery Concerns
    • Example 15: Backup Job Failures
    • Example 16: Data Corruption in Backups
  8. Security and Access Control
    • Example 17: Unauthorized Access Attempts
    • Example 18: Ransomware Attack on NAS
  9. NAS Logs and Monitoring
    • Example 19: Analyzing NAS Logs
    • Example 20: Proactive Monitoring and Alerts
  10. Best Practices for NAS Troubleshooting

1. Introduction to NAS Troubleshooting:

Troubleshooting NAS issues requires a systematic approach and an understanding of the NAS architecture, networking, storage, and access protocols (NFS, SMB/CIFS, iSCSI). It is crucial to gather relevant information, perform tests, and use appropriate tools for diagnostics. In this guide, we’ll cover various scenarios and provide step-by-step solutions for each.

2. Network Connectivity Issues:

Network connectivity problems can cause NAS access failures or slow performance.

Example 1: NAS Unreachable on the Network

Symptoms: The NAS is not accessible from client machines, and it does not respond to ping requests.

Possible Causes:

  • Network misconfiguration (IP address, subnet mask, gateway)
  • Network switch or cable failure
  • Firewall or security rules blocking NAS traffic

Solution Steps:

  1. Check network configurations on the NAS and clients to ensure correct IP settings and subnet masks.
  2. Test network connectivity using the ping command to verify that the NAS is reachable from clients (a client-side sketch follows these steps).
  3. Check for physical network issues such as faulty cables or switch ports.
  4. Review firewall and security settings to ensure that NAS traffic is allowed.
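
As a quick illustration of steps 1 and 2, the following PowerShell sketch (run from a recent Windows client; the NAS address is an example) checks basic reachability and the TCP ports used by common NAS protocols:

# Basic ICMP reachability check against the NAS (address is illustrative)
Test-Connection -ComputerName 192.168.1.50 -Count 4

# Check the TCP ports for SMB (445), NFS (2049), and iSCSI (3260)
foreach ($port in 445, 2049, 3260) {
    Test-NetConnection -ComputerName 192.168.1.50 -Port $port |
        Select-Object ComputerName, RemotePort, TcpTestSucceeded
}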

Example 2: Slow Data Transfer Speeds

Symptoms: Data transfers to/from the NAS are unusually slow, affecting file access and application performance.

Possible Causes:

  • Network congestion or bandwidth limitations
  • NAS hardware limitations (e.g., slow CPU, insufficient memory)
  • Disk performance issues (slow HDDs or degraded RAID arrays)

Solution Steps:

  1. Use network monitoring tools to identify any bottlenecks or network congestion; a quick client-side throughput check is sketched after these steps.
  2. Check NAS hardware specifications to ensure it meets the workload requirements.
  3. Review disk health and RAID status for any disk failures or degraded arrays.
  4. Optimize network settings, such as jumbo frames and link aggregation (if supported).
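
To put a number on step 1 from the client side, you can time a copy of a known-size test file to the NAS share. The sketch below (share path and file are illustrative) reports the effective throughput in MB/s:

# Time the copy of a local test file to the NAS share (paths are examples)
$testFile = "C:\Temp\testfile.bin"
$destination = "\\nas01\share\testfile.bin"

$elapsed = Measure-Command { Copy-Item -Path $testFile -Destination $destination }

# Compute and display throughput in MB/s
$sizeMB = (Get-Item $testFile).Length / 1MB
"{0:N1} MB copied in {1:N1} s ({2:N1} MB/s)" -f $sizeMB, $elapsed.TotalSeconds, ($sizeMB / $elapsed.TotalSeconds)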

Example 3: Intermittent Connection Drops

Symptoms: NAS connections drop intermittently, causing data access disruptions.

Possible Causes:

  • Network instability or intermittent outages
  • NAS firmware or driver issues
  • Overloaded NAS or network components

Solution Steps:

  1. Monitor the network for intermittent failures and investigate the root cause.
  2. Check for firmware updates for the NAS and network components to address known issues.
  3. Review NAS resource utilization (CPU, memory, and storage) during connection drops.
  4. Investigate any client-side issues that may be causing disconnects.

3. NAS Configuration and Permissions Issues:

Incorrect NAS configurations or permission settings can lead to access problems for users and applications.

Example 4: Incorrect NFS Share Permissions

Symptoms: Clients are unable to access NFS shares or face “permission denied” errors.

Possible Causes:

  • Incorrect NFS export configurations on the NAS
  • Mismatched UID/GID on the client and server
  • Firewall or SELinux blocking NFS traffic

Solution Steps:

  1. Verify NFS export configurations on the NAS, including allowed clients and permissions.
  2. Check UID/GID mappings between the client and server to ensure consistency.
  3. Disable firewall or SELinux temporarily to rule out any blocking issues.

Example 5: Incorrect SMB Share Configuration

Symptoms: Windows clients cannot access SMB/CIFS shares on the NAS.

Possible Causes:

  • SMB version compatibility issues between clients and NAS
  • Domain or workgroup mismatch
  • Incorrect SMB share permissions

Solution Steps:

  1. Ensure the NAS supports the SMB versions required by the client OS (the sketch after these steps shows how to check the negotiated SMB dialect from a Windows client).
  2. Check the domain or workgroup settings on both the NAS and client systems.
  3. Verify SMB share permissions on the NAS to grant appropriate access.
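
On a Windows client, the SMB dialect actually negotiated with the NAS (step 1) can be confirmed with the built-in SMB cmdlets. A minimal sketch (server and share names are illustrative):

# Touch the share so a session exists, then inspect the negotiated SMB dialect
Get-ChildItem \\nas01\share | Out-Null
Get-SmbConnection -ServerName nas01 | Select-Object ServerName, ShareName, Dialect, UserName

# Review relevant client-side SMB settings (for example, signing requirements)
Get-SmbClientConfiguration | Select-Object EnableSecuritySignature, RequireSecuritySignature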

Example 6: Invalid iSCSI Initiator Settings

Symptoms: iSCSI initiators fail to connect or experience slow performance.

Possible Causes:

  • Incorrect iSCSI target settings on the NAS
  • Network misconfiguration between initiator and target
  • Initiator authentication issues

Solution Steps:

  1. Verify iSCSI target configurations on the NAS, including allowed initiators (a PowerCLI sketch for reviewing the initiator side follows these steps).
  2. Check network settings (IP addresses, subnet masks, and gateways) between initiator and target.
  3. Review authentication settings for the iSCSI target to ensure proper access.
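
If the initiator is an ESXi host, the settings in steps 1 and 2 can be reviewed from PowerCLI. A minimal sketch (host name is illustrative):

# Locate the software iSCSI adapter on the host
$vmhost = Get-VMHost -Name "ESXiHost"
$hba = Get-VMHostHba -VMHost $vmhost -Type IScsi | Where-Object { $_.Model -match "Software" }

# List the configured send/static targets and their addresses
Get-IScsiHbaTarget -IScsiHba $hba | Select-Object Address, Port, Type

# Review the VMkernel adapters that carry the iSCSI traffic
Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel | Select-Object Name, IP, SubnetMask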

4. Storage and Disk-Related Problems:

Storage-related issues can impact NAS performance and data availability.

Example 7: Disk Failure or Degraded RAID Array

Symptoms: Disk errors reported by the NAS, or degraded RAID status.

Possible Causes:

  • Disk failure due to hardware issues
  • RAID array degradation from multiple disk failures
  • Unrecognized disks or disk format issues

Solution Steps:

  1. Identify the failed disks and replace them following RAID rebuild procedures.
  2. Monitor RAID rebuild status to ensure data redundancy is restored.
  3. Check for unrecognized disks or disks with incompatible formats.

Example 8: Low Disk Space on NAS

Symptoms: The NAS is running low on storage space, leading to performance degradation and potential data loss.

Possible Causes:

  • Insufficient capacity planning for data growth
  • Uncontrolled data retention or lack of data archiving

Solution Steps:

  1. Monitor NAS storage capacity regularly and plan for adequate storage expansion (a client-side free-space check is sketched after these steps).
  2. Implement data retention policies and archive infrequently accessed data.
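
A simple client-side way to keep an eye on step 1 is to query free space on the NAS share from PowerShell. A minimal sketch (the share path is illustrative; the values reported depend on the share and any quotas configured on the NAS):

# Map the NAS share temporarily and report free versus total space
New-PSDrive -Name NAS -PSProvider FileSystem -Root "\\nas01\share" | Out-Null

$drive = Get-PSDrive -Name NAS
"{0:N1} GB free of {1:N1} GB" -f ($drive.Free / 1GB), (($drive.Free + $drive.Used) / 1GB)

Remove-PSDrive -Name NAS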

Example 9: Disk S.M.A.R.T. Errors

Symptoms: Disk S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) alerts indicating potential disk failures.

Possible Causes:

  • Disk age and wear leading to potential failures
  • Disk temperature or environmental issues affecting disk health

Solution Steps:

  1. Review S.M.A.R.T. data and take appropriate action based on predictive failure alerts.
  2. Ensure proper cooling and environmental conditions to preserve disk health.

5. Performance Bottlenecks and Load Balancing:

Performance bottlenecks can hamper NAS responsiveness and affect data access.

Example 10: Network Bottleneck

Symptoms: The network becomes a performance bottleneck due to high data transfer demands.

Possible Causes:

  • Insufficient network bandwidth for concurrent data access
  • Suboptimal network configuration for NAS traffic

Solution Steps:

  1. Monitor network utilization and identify potential bottlenecks.
  2. Upgrade network infrastructure to higher bandwidth if necessary.
  3. Optimize network settings, such as link aggregation, for NAS traffic.

Example 11: CPU or Memory Overload

Symptoms: NAS performance suffers due to high CPU or memory utilization.

Possible Causes:

  • Heavy concurrent workload on the NAS
  • Insufficient NAS hardware resources for the workload

Solution Steps:

  1. Monitor NAS resource utilization (CPU, memory) during peak usage times.
  2. Optimize NAS settings or upgrade hardware to handle the workload.

Example 12: Overloaded Disk I/O

Symptoms: Disk I/O becomes a performance bottleneck, leading to slow data access.

Possible Causes:

  • Excessive I/O from multiple clients or applications
  • Disk caching and read/write operations impacting performance

Solution Steps:

  1. Monitor disk I/O usage and identify any spikes or patterns of high usage.
  2. Consider adding more disks to the NAS to distribute I/O loads.

6. Firmware and Software Updates:

Keeping NAS firmware and software up-to-date is essential for stability and performance.

Example 13: Outdated NAS Firmware

Symptoms: NAS stability or performance issues caused by outdated firmware.

Possible Causes:

  • Known bugs or performance improvements in newer firmware versions
  • Incompatibility issues with client devices or applications

Solution Steps:

  1. Check the manufacturer’s website for the latest NAS firmware updates.
  2. Plan a scheduled maintenance window to apply firmware updates after thorough testing.

Example 14: Compatibility Issues with OS Updates

Symptoms: Issues accessing the NAS after OS updates on client machines.

Possible Causes:

  • Changes in SMB/NFS/iSCSI protocols affecting compatibility
  • Firewall or security settings blocking access after OS updates

Solution Steps:

  1. Verify NAS compatibility with the updated OS versions on client devices.
  2. Review firewall or security settings on the NAS and clients for any blocking issues.

7. Backup and Disaster Recovery Concerns:

Ensuring robust backup and disaster recovery processes is vital for data protection.

Example 15: Backup Job Failures

Symptoms: Scheduled backup jobs on the NAS fail to complete successfully.

Possible Causes:

  • Insufficient storage space for backups
  • Backup software configuration issues

Solution Steps:

  1. Check backup logs to identify the cause of failure, such as disk space issues or network errors.
  2. Verify backup software settings and reconfigure if necessary.

Example 16: Data Corruption in Backups

Symptoms: Backup data integrity issues, indicating potential data corruption.

Possible Causes:

  • Unreliable storage media for backups
  • Software or hardware issues during the backup process

Solution Steps:

  1. Perform data integrity checks on backup files regularly.
  2. Consider using redundant storage media for backups, such as tape or cloud storage.

8. Security and Access Control:

Ensuring secure access to the NAS is essential to protect data from unauthorized access and attacks.

Example 17: Unauthorized Access Attempts

Symptoms: Unusual login attempts or security events on the NAS.

Possible Causes:

  • Unauthorized users attempting to access the NAS
  • Brute force attacks or compromised credentials

Solution Steps:

  1. Review NAS logs for any suspicious login attempts and security events.
  2. Strengthen NAS security measures, such as using strong passwords and enabling two-factor authentication.

Example 18: Ransomware Attack on NAS

Symptoms: Data on the NAS becomes inaccessible, and files are encrypted with ransomware.

Possible Causes:

  • NAS access exposed to the internet without proper security measures
  • Weak access controls and lack of data protection mechanisms

Solution Steps:

  1. Isolate the NAS from the network to prevent further damage.
  2. Restore data from backups and verify data integrity.
  3. Review NAS security measures to prevent future ransomware attacks.

9. NAS Logs and Monitoring:

NAS logs and proactive monitoring help identify potential issues and allow for quick resolution.

Example 19: Analyzing NAS Logs

Symptoms: NAS performance issues or access problems with no apparent cause.

Possible Causes:

  • Undetected errors or issues recorded in NAS logs
  • Resource exhaustion or system errors leading to performance degradation

Solution Steps:

  1. Regularly review NAS logs for any unusual events or error messages.
  2. Use log analysis tools to identify patterns and potential issues.

Example 20: Proactive Monitoring and Alerts

Symptoms: NAS problems go unnoticed until they impact users or applications.

Possible Causes:

  • Lack of proactive monitoring and alerting for NAS health and performance
  • Inadequate or misconfigured monitoring tools

Solution Steps:

  1. Implement proactive monitoring for NAS health, resource utilization, and performance (a minimal availability probe is sketched after these steps).
  2. Set up alerts for critical events to enable timely response to potential issues.
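
As a minimal sketch of step 1 (a placeholder for a proper monitoring platform; the NAS address and log path are illustrative), a scheduled PowerShell check can record reachability and write an alert line when the NAS stops responding:

# Basic availability probe intended to run periodically from Task Scheduler
$nas = "192.168.1.50"
$log = "C:\Temp\nas-health.log"

$reachable = Test-Connection -ComputerName $nas -Count 2 -Quiet
$stamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

if ($reachable) {
    Add-Content -Path $log -Value "$stamp OK $nas reachable"
} else {
    Add-Content -Path $log -Value "$stamp ALERT $nas not responding"
}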

10. Best Practices for NAS Troubleshooting:

To ensure effective NAS troubleshooting, follow these best practices:

  1. Documentation: Maintain comprehensive documentation of NAS configurations, network topology, and access permissions.
  2. Backup and Restore: Regularly back up critical NAS configurations and data to facilitate recovery in case of issues.
  3. Testing and Staging: Test firmware updates and configuration changes in a staging environment before applying them to production NAS.
  4. Network Segmentation: Segment the NAS network from the general network to enhance security and prevent unauthorized access.
  5. Regular Maintenance: Schedule regular maintenance windows to perform firmware updates, disk checks, and system health evaluations.
  6. Monitoring and Alerting: Implement proactive monitoring and set up alerts to detect issues and respond quickly.
  7. Security Hardening: Apply security best practices to the NAS, including secure access controls, strong passwords, and two-factor authentication.
  8. Collaboration: Foster collaboration between IT teams, including networking, storage, and server administrators, to address complex issues.

Conclusion:

Troubleshooting NAS issues involves a methodical approach, understanding of NAS architecture, and use of appropriate tools. By addressing common scenarios such as network connectivity problems, configuration issues, storage-related problems, performance bottlenecks, and security concerns, administrators can maintain the availability, performance, and data integrity of their NAS infrastructure. Implementing best practices and proactive monitoring ensures that NAS environments remain robust and reliable, meeting the demands of modern data-driven enterprises.

Troubleshooting Virtual Machines with vmkfstools

Introduction: VMware provides administrators with a powerful command-line tool called vmkfstools, which is designed to troubleshoot and manage virtual machine (VM) disk files. With vmkfstools, administrators can perform tasks such as checking disk consistency, resizing disks, repairing corrupted files, and migrating virtual disks between datastores. In this comprehensive guide, we will explore the features and capabilities of vmkfstools, along with practical examples and best practices for troubleshooting virtual machines using this tool.

1. Understanding vmkfstools: vmkfstools is a command-line utility that comes bundled with VMware ESXi. It provides a set of commands for managing and troubleshooting VM disk files. With vmkfstools, administrators can create, clone, resize, and repair virtual disks. It also offers options for disk format conversion, disk integrity checks, and reclaiming zeroed blocks from thin-provisioned disks.

2. Checking Disk Consistency: One of the primary use cases for vmkfstools is checking the consistency of VM disk files. This is particularly useful when a VM is experiencing disk-related issues or errors. The following command checks the consistency of a virtual disk:

vmkfstools -x check <path_to_vmdk_file>

This command performs a disk-level consistency check and verifies the integrity of the virtual disk file. It looks for inconsistencies, errors, or corruption within the disk file. If any issues are found, vmkfstools reports error messages that help diagnose and troubleshoot the problem.

3. Repairing Corrupted VM Disk Files: If the consistency check reports corruption or inconsistencies in a VM disk file, you can attempt a repair using the following command:

vmkfstools -x repair <path_to_vmdk_file>

The -x (--fix) option accepts two modes: check, which only reports problems, and repair, which attempts to fix corrupted or inconsistent data structures in the disk file. Take a backup of the disk file before attempting a repair.

4. Resizing VM Disks: Vmkfstools also allows administrators to resize virtual disks, either increasing or decreasing their capacity. The following command can be used to resize a virtual disk:

vmkfstools -X <new_size> <path_to_vmdk_file>

The <new_size> parameter specifies the desired new size of the virtual disk, for example 60G. Growing a disk is the common case; decreasing the size of a virtual disk is generally not supported and may result in data loss if the existing data exceeds the new disk size.

5. Converting Disk Formats: Vmkfstools provides the ability to convert virtual disk formats, which can be useful when migrating VMs between different storage platforms or when upgrading to a newer version of VMware. The following command can be used to convert the disk format:

vmkfstools -i <source_vmdk_file> -d <destination_disk_format> <path_to_destination_vmdk_file>

The <source_vmdk_file> parameter specifies the path to the source virtual disk file, while <destination_disk_format> specifies the desired format for the destination disk. Common destination formats include thin, zeroedthick, and eagerzeroedthick. This command allows for conversion between these provisioning formats.

6. Migrating VM Disks: vmkfstools enables administrators to copy virtual disks between datastores, which can be useful for load balancing, storage consolidation, or moving VMs to faster storage. The following command clones a virtual disk to a path on the destination datastore:

vmkfstools -i <source_vmdk_file> -d <disk_format> <path_on_destination_datastore>

The destination path determines the target datastore, and the -d option controls how the copy is written: thin copies only the blocks that are in use, which saves time and space for lightly used disks, while zeroedthick or eagerzeroedthick produce a fully allocated copy. After verifying the copy, reconfigure the VM to use the new disk path and remove the original file.

7. Reclaiming Zeroed Blocks: vmkfstools can punch out (deallocate) blocks that contain only zeroes in thin-provisioned or sparse virtual disks, which helps reclaim space and optimize storage utilization. The following command initiates the operation:

vmkfstools -K <path_to_vmdk_file>

This command scans the specified virtual disk and deallocates the zeroed blocks; the disk must not be in use by a running VM.

VMware High Availability (HA) Block Calculation

VMware High Availability (HA) is a critical feature in VMware vSphere that ensures the availability of virtual machines (VMs) in the event of host failures. HA uses a cluster of ESXi hosts to provide automatic failover and restart of VMs on surviving hosts. To achieve this, HA relies on a block calculation mechanism that determines the number of host failures a cluster can tolerate. In this deep dive, we will explore the HA block calculation process in VMware, including the underlying concepts, factors affecting the calculation, and best practices for optimizing HA in your vSphere environment.

1. Understanding VMware High Availability (HA): VMware HA is a feature that provides automated recovery of VMs in the event of host failures. It monitors the health of ESXi hosts and VMs and ensures that VMs are restarted on surviving hosts to minimize downtime.

2. HA Block Calculation – An Overview: The HA block calculation is a crucial step in determining the number of host failures a cluster can tolerate without impacting VM availability. It considers various factors such as host resources, VM reservation, and the cluster’s admission control policy.

3. Factors Affecting HA Block Calculation: Several factors influence the HA block calculation process. Understanding these factors is essential for accurately determining the number of host failures a cluster can tolerate:

a. Host Resources: – CPU and Memory: The total CPU and memory resources available across the cluster impact the block calculation. Each host’s CPU and memory capacity contribute to the overall cluster resources.

b. VM Reservation: – VM Reservation: VMs can have reserved resources, such as CPU and memory, which are guaranteed resources that cannot be used by other VMs or processes. These reservations impact the available resources for calculating the HA block.

c. Admission Control Policy: – Slot Size: The slot size is a key component of the admission control policy. It represents the resource requirements (CPU and memory) of a single VM in the cluster. The slot size is used to calculate the number of slots available in the cluster.

4. HA Block Calculation Process: The HA block calculation process involves the following steps (a worked PowerShell sketch with illustrative numbers follows this list):

a. Determining the Host Failover Capacity:

– Calculate the total CPU and memory resources available in the cluster by summing the resources across all hosts.

– Subtract any reserved resources from the total cluster resources.

– Divide the remaining resources by the slot size to determine the total number of slots (the failover capacity).

b. Determining the Number of Host Failures:

– Compare the spare slots (total slots minus slots consumed by powered-on VMs) against the slots provided per host to determine how many host failures the cluster can tolerate.
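
The following PowerShell sketch works through the arithmetic above with purely illustrative numbers (four identical hosts and a slot size derived from the largest VM reservation). It is a simplification for explanation, not a reimplementation of the HA admission control logic:

# Illustrative cluster: 4 identical hosts, each with 24 GHz CPU and 256 GB memory
$hostCpuMHz = 24000; $hostMemMB = 262144; $hostCount = 4

# Slot size derived from the largest VM reservation (example values)
$slotCpuMHz = 500; $slotMemMB = 2048

# Slots per host are limited by the more constrained resource
$slotsPerHost = [math]::Min([math]::Floor($hostCpuMHz / $slotCpuMHz),
                            [math]::Floor($hostMemMB / $slotMemMB))
$totalSlots = $slotsPerHost * $hostCount

# Assume 90 powered-on VMs, each consuming one slot
$usedSlots = 90
$spareSlots = $totalSlots - $usedSlots

# Rough estimate: each failed host removes one host's worth of slots
$failuresTolerated = [math]::Floor($spareSlots / $slotsPerHost)

"Slots per host: $slotsPerHost, total: $totalSlots, spare: $spareSlots, host failures tolerated: $failuresTolerated"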

5. Best Practices for Optimizing HA Block Calculation: To optimize the HA block calculation and ensure efficient VM failover in your vSphere environment, consider the following best practices:

a. Right-Sizing VMs:

– Avoid over-provisioning VMs with excessive CPU and memory reservations. Right-size the VMs to ensure efficient resource utilization.

b. Proper Slot Size Configuration:

– Configure the slot size appropriately based on the resource requirements of your VMs. An accurate slot size ensures optimal calculation of host failover capacity.

c. Monitoring and Capacity Planning:

– Regularly monitor the resource utilization across the cluster to identify potential bottlenecks or capacity constraints. Use capacity planning tools to forecast future resource requirements.

d. Network and Storage Considerations: – Ensure that the network and storage infrastructure can handle the increased load during VM failover events. Proper network and storage design can significantly impact HA performance.

6. Advanced HA Configurations: VMware offers advanced HA configurations that can enhance the availability and resilience of your vSphere environment. These configurations include:

a. HA Admission Control Policies: – Explore different admission control policies such as Host Failures Cluster Tolerates (default), Percentage of Cluster Resources Reserved, and Specify Failover Hosts to align with your specific requirements.

b. Proactive HA: – Implement Proactive HA to detect and respond to potential host failures before they happen. Proactive HA integrates with hardware vendors’ management tools to monitor hardware health and trigger VM migrations.

c. VM-Host Affinity Rules: – Use VM-Host Affinity Rules to enforce VM placement rules, ensuring that specific VMs are always placed on certain hosts. This can help maintain application dependencies or licensing requirements during failover events.

7. Troubleshooting HA Block Calculation Issues: If you encounter issues with HA block calculation or VM failover, consider the following troubleshooting steps:

a. Validate Network and Storage Connectivity:

– Ensure that the network and storage connectivity between hosts is functioning correctly. Verify that VMkernel ports and storage paths are properly configured.

b. Review VM Reservations and Resource Usage:

– Check the reservations and resource usage of individual VMs. Ensure that VMs are not overcommitted or have excessive reservations that impact the block calculation.

c. Verify HA Configuration:

– Review the HA configuration settings, including admission control policies and slot size configurations. Ensure they align with your desired HA behavior and resource requirements.

d. Check Host and Cluster Health:

– Monitor the health status of hosts and clusters using vSphere Health Check and vRealize Operations Manager. Identify and resolve any underlying issues that may impact HA block calculation.

Conclusion: Understanding the HA block calculation process in VMware High Availability is crucial for ensuring the availability and resilience of your virtual infrastructure. By considering factors such as host resources, VM reservations, and admission control policies, you can accurately determine the number of host failures a cluster can tolerate. Implementing best practices, optimizing VM sizing, and considering advanced HA configurations can further enhance the effectiveness of HA in your vSphere environment. By following these guidelines, you will be better equipped to manage and troubleshoot HA block calculation issues, ensuring high availability for your critical VM workloads.

Guest and HA Application Monitoring SDK Programming

You can download the Guest SDK for monitoring guest virtual machine statistics, and with facilities for High Availability (HA) Application Monitoring. The SDK version number is 10.2.

HA Application Monitoring. The vSphere High Availability (HA) feature for ESXi hosts in a cluster provides protection for a guest OS and applications running in a virtual machine by restarting the virtual machine if a failure occurs. Using the HA Application Monitoring APIs, developers can write software to monitor guest OS and process heartbeat.

Guest SDK. The vSphere Guest SDK provides read-only APIs for monitoring various virtual machine statistics. Management agents running in the guest OS of a virtual machine can use this data for reacting to changes in the application layer.

Compatibility Notices

HA Application Monitoring applications must be recompiled to work with vSphere 6.0 because of changes to the communication interface (see below).

For vSphere 6.0, HA Application Monitoring communication was revised to use VMCI (the virtual machine communication interface). The VMCI driver is preinstalled in Linux kernel 3.9 and higher, and in earlier kernel versions it can be installed with VMware Tools. On Windows, VMware Tools must be installed to obtain the VMCI driver.

This SDK supports C and C++ programming languages. You can support Java with wrapper classes, as in JNI.

Changes and New Features

A checksystem utility that verifies the required glib version was added in the vSphere 6.5 release.

Tools for fetching extended guest statistics were added in vSphere 6.0, but not publicly documented until April 2015.

In the vSphere 6.0 release, high availability gained VM component protection, and FT (fault tolerance) was extended to symmetric multiprocessing (SMP). Also, the communication interface was changed to use VMCI.

In the vSphere 5.5 release, the HA application monitoring facility was changed to reset the guest virtual machine when the application monitoring program requests a reset. Previously, HA application monitoring had to determine when the guest stopped sending a heartbeat.

In vSphere 5.1, HA Application Monitoring facilities were merged into the Guest SDK previously available.

Known Issues and Workarounds

Security enforcement for the Guest and HA application monitoring SDK using the secure authentication VMX parameter guest_rpc.rpci.auth.app.APP_MONITOR=TRUE does not work for FT (fault tolerant) VMs. The vSphere platform supports only the non-secure channel for FT virtual machines.

Displaying vSphere Guest Library Statistics :

On a Linux virtual machine hosted by ESXi, go to the include directory and compile the vmGuestLibTest.c program, then run the output program vmguestlibtest:

gcc -g -o vmguestlibtest -ldl vmGuestLibTest.c
./vmguestlibtest

Guest statistics appear repeatedly until you interrupt the program.

Controlling the Application Monitoring Heartbeat :

To run HA application monitoring programs, the virtual machine must be running on an ESXi host, and application monitoring must have been enabled when configuring HA.

You can enable heartbeats with the compiled vmware-appmonitor program.

Usage is as follows (a heartbeat loop built from these subcommands is sketched after the list): vmware-appmonitor { enable | disable | markActive | isEnabled | getAppStatus | postAppState }

>>enable – Enable application heartbeat so vSphere HA starts listening and monitoring the heartbeat count from this guest virtual machine. The heartbeats should be sent at least once every 30 seconds.

>>disable – Disable the application heartbeat so vSphere HA stops listening to heartbeats from this guest.

>>markActive – This starts sending the actual heartbeat every 30 seconds or less.

>>isEnabled – Indicates whether the heartbeat monitoring was enabled.

>>getAppStatus – Gets the status of the application, either Green, Red, or Gray.
>>postAppState – Posts the state of the application. Arguments can be:

>>appStateOk – Sends an “Application State is OK” signal to the HA agent running on the host.

>>appStateNeedReset – Sends an “Immediate Reset” signal to the HA agent running on the host.
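
As a sketch of how these subcommands fit together inside a guest (assuming vmware-appmonitor is on the PATH; adjust the invocation for your platform), a monitoring script might enable heartbeating and then mark the application active well within the 30-second window:

# Enable application heartbeat monitoring for this guest
& vmware-appmonitor enable

try {
    while ($true) {
        # Send a heartbeat every 15 seconds (must arrive at least once every 30 seconds)
        & vmware-appmonitor markActive
        Start-Sleep -Seconds 15
    }
}
finally {
    # Stop monitoring cleanly so HA does not reset the VM when the script exits
    & vmware-appmonitor disable
}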

Compiling the Sample Program on Linux:

You need a C compiler and the make program.

Procedure

1: Go to the docs/VMGuestAppMonitor/samples/C directory.

2: Run the make command. On a 64-bit machine you might want to change lib32 to lib64 in the makefile.

3: Set LD_LIBRARY_PATH as described above.

4: Run the sample program. See below for program usage. ./sample

Compiling Sample Programs on Windows :

You need Visual Studio 2008 or later.

Procedure

1: Go to the docs/VMGuestAppMonitor/samples/visualstudio folder.

2: Open the appmon.vcproj file and build the solution.

3: Click Debug > Start Debugging to run appmon.exe. See below for program usage

Demonstrating the HA Application Monitoring API

The sample program enables HA application monitoring and sends a heartbeat every 15 seconds. After the program starts running, typing Ctrl+C displays three choices:

s – stop sending heartbeats and exit the program. The virtual machine will reset.

d – disable application monitoring and exit the program. This does not cause a reset.

c – continue sending heartbeats.

For further references please check : https://code.vmware.com/web/sdk/6.7/vsphere-guest

VDDK 6.7.1

The Virtual Disk Development Kit (VDDK) 6.7.1 is an update to support vSphere 6.7 Update 1 and to resolve issues discovered in previous releases. VDDK 6.7 added support for ESXi 6.7 and vCenter Server 6.7, and was tested for backward compatibility against vSphere 6.0 and 6.5.

VDDK is used with vSphere Storage APIs for Data Protection (VADP) to develop backup and restore software. For general information about this development kit, how to obtain the software, programming details, and redistribution, see the VDDK landing page on VMware {Code}.

The VMware policy concerning backward and forward compatibility is for VDDK to support N-2 and N+1 releases. In other words, VDDK 6.7 and all its update releases support vSphere 6.0, 6.5 (except for new features) and the next major release.

Changes and New Features

The VixMntapi library on Linux systems now supports:

  • Advanced transport modes: HotAdd, SAN, and NBD/NBDSSL.
  • Read-only mounting of VMDK files.
  • Diagnostic logging as set by vixMntapi.cfg.LogLevel in the VDDK configuration file. Levels are the same as for vixDiskLib.transport – Panic, Error, Warning, Audit, Info, Verbose, Trivia. The output file named vixMntapi.log appears in the same directory as other log files. Not available for Windows.

In addition to those previously qualified for use as a backup proxy, the following operating systems were tested with VDDK 6.7.1:

  • Red Hat Enterprise Linux RHEL 6.9
  • CentOS 7.4
  • SUSE Linux Enterprise Server SLES 15
  • Windows Server 2016 version 1803

Compatibility Notices

In earlier releases it was an error to close parentHandle after VixDiskLib_Attach succeeds. The VDDK library now marks parentHandle internally to prevent closure and ensure cleanup. Proper calling sequences are as follows:

  1. First open a disk for attach with this call:
    VixDiskLib_Open(remoteConnection, virtualDiskPath, flags, &parentHandle);
  2. Create a local connection using: VixDiskLib_Connect(NULL, &localConnection);
  3. With the backed-up disk (referred to as parent disk) still open, make this call, creating the child disk with a unique name: VixDiskLib_CreateChild(parentHandle, “C:\tmp.vmdk”, VIXDISKLIB_DISK_MONOLITHIC_SPARSE, NULL, NULL);
  4. Open tmp.vmdk (referred to as the redo log): VixDiskLib_Open(localConnection, “C:\tmp.vmdk”, VIXDISKLIB_FLAG_OPEN_SINGLE_LINK, &redoHandle);
  5. Attach the redo log to its parent disk with: VixDiskLib_Attach(parentHandle, redoHandle);

If VixDiskLib_Attach fails, the system now automatically cleans up the local file handle.

  1. To end, close the redo log. Whether to close the parent disk handle is release dependent:
    VixDiskLib_Close(redoHandle);
    if (VIXDISKLIB_VERSION_MAJOR > 7) {
    VixDiskLib_Close(parentHandle); // to avoid memory leaks
    }
  2. Unlink the redo log from the parent disk.

Recently Resolved Issues

The VDDK 6.7.1 release resolves the following issues.

  • XML library upgraded. The XML library libxml2 was upgraded from version 2.9.6 to version 2.9.8 because of a known security vulnerability.
  • OpenSSL library upgraded. The OpenSSL library was upgraded from an earlier version to version 1.0.2p because of a known security vulnerability.
  • NBD transport in VDDK 6.7 is slow when running against vSphere 6.5. When data protection software is compiled with VDDK 6.7 libraries, NBD/NBDSSL mode backup and restore is significantly slower than before on vSphere 6.5 or 6.0. This was caused by dropping the OPEN_BUFFERED flag when it became the default in VDDK 6.7. This backward compatibility issue is fixed in the VDDK 6.7.1 release. When performance is important, VMware recommends use of NBD asynchronous I/O, calling VixDiskLib_WriteAsync and VixDiskLib_Wait.
  • With HotAdd transport VDDK could hang after many NFC connections. For programs compiled with VDDK 6.7 or 6.7 EP1 libraries, VDDK may eventually hang in VixDiskLib_Open when building server connections to ESXi. After the log entry “NBD_ClientOpen: attempting to create connection” VDDK hangs. The cause is that after HotAdd mode retrieves the disk signature, it fails to close the NFC connection, so many NFC server threads continue running and prevent new NFC connections. This regression is fixed in the VDDK 6.7.1 release.
  • HotAdd backup of VM template crashed if NoNfcSession was enabled. In VDDK 6.5.1 and later, the vixDiskLib.transport.hotadd.NoNfcSession=1 setting became available to avoid creating an NFC session for backup in cloud environments where local connections are disallowed. If it was set in the proxy’s VDDK configuration file, HotAdd mode crashed due to null pointer access of an attribute in the VM template object.
  • VixMntapi on Linux did not work with advanced transport modes. VDDK partners use VixDiskLib for block-oriented image backup and restore, while they use VixMntapi for file-oriented backup and restore. The Windows implementation of VixMntapi has supported advanced transports for many releases, but the Linux implementation supported only NBD mode. In the VDDK 6.7.1 release, VixMntapi supports HotAdd or SAN transport and NBD/NBDSSL on Linux, so it can be used in VMC environments for file-oriented backup and restore of Linux VMs.
  • VDDK hangs during restore when directly connecting to ESXi hosts. When doing a restore with direct ESXi connections, VDDK may hang intermittently. The cause is that NfcServer on ESXi enters the wrong state, waiting for new messages that never arrive. The fix for NfcServer was to avoid waiting when no data remains. To resolve this issue, customers must upgrade ESXi hosts to 6.7 U1 or later.
  • VixMntapi on Linux could not open files as read-only. In previous releases, opening files read-only was not supported by VixMntapi; when read-only mode was requested at open time, the file was opened read/write. In this release, VixMntapi actually opens files as read-only on Linux VMs.
  • HotAdd proxy failed with Windows Server backups. If there was a SATA controller in the Windows backup proxy, HotAdd mode did not work. The cause was that VDDK did not rescan SATA controllers after HotAdding, so if multiple SATA or AHCI controllers existed, VDDK sometimes used the wrong controller ID and could not find the HotAdded disk. Disk open failed, resulting in “HotAdd ManagerLoop caught an exception” and “Error 13 (You do not have access rights to this file)” messages. The workaround was to remove the SATA controller from the Windows backup proxy. The issue is fixed in this release so the workaround is no longer needed (https://kb.vmware.com/s/article/2151091)

For further reference please check : https://code.vmware.com/web/sdk/6.7/vddk

Failed to lock the file

  • Powering on a virtual machine fails.
  • Unable to power on a virtual machine.
  • Adding an existing virtual machine disk (VMDK) to a virtual machine that is already powered on fails.

You see the error:

Cannot open the disk ‘/vmfs/volumes/UUID/VMName/Test-000001.vmdk’ or one of the snapshot disks it depends on. Reason: Failed to lock the file.

Cause:

+++++++

This issue occurs when one of the files required by the virtual machine has been opened by another application.

During a Create or Delete Snapshot operation while a virtual machine is running, all the disk files are momentarily closed and reopened. During this window, the files could be opened by another virtual machine, management process, or third-party utility. If that application creates and maintains a lock on the required disk files, the virtual machine cannot reopen the file and resume running.

Resolution:

+++++++++++

If the file is no longer locked, try to power on the virtual machine again. This should succeed. To determine the cause of the previously locked file, review the VMkernel, hostd, and vpxa log files and attempt to determine:

  • When the hostd and vpxa management agents open VMDK descriptor files, they log messages similar to:
    info ‘DiskLib’] DISKLIB-VMFS : “/vmfs/volumes/UUID/VMName/Test-000001.vmdk” : open successful (21) size = 32227695616, hd = 0. Type 8
    info ‘DiskLib’] DISKLIB-VMFS : “/vmfs/volumes/UUID/VMName/Test-000001.vmdk” : closed.
  • When the VMkernel attempts to open a locked file, it reports:
    31:16:46:55.498 cpu7:8715)FS3: 2928: [Requested mode: 2] Lock [type 10c00001 offset 11401216 v 2035, hb offset 3178496
    gen 26643, mode 1, owner 4ca72d14-84dc8dd4-0da3-0017a4770038 mtime 2213195] is not free on volume ‘norr_prod_vmfs_data08’
  • The file may have been locked by third-party software running on an ESXi/ESX host or externally. Review the logs of any third-party software that may have acted on the virtual machine’s VMDK files at the time.

Situation 1:

++++++++

Error : Failed to get exclusive lock on the configuration file, another VM process could be running, using this configuration file

Solution: This issue may occur when there is a lack of disk space on the root drive. The ESX host is unable to start the virtual machine because there is insufficient disk space to commit changes. Free up space on the affected volume and retry the power-on.

Situation 2:

++++++++

Error : Failed to lock the file when creating a snapshot

Solution :

To work around this issue in ESX or earlier ESXi releases, use the vmkfstools -D command to identify the MAC address of the machine locking the file, then reboot or power off the machine that owns that MAC address to release the lock.

Notes: 

  • If the vmkfstools -D test-000001-delta.vmdk command does not return a valid MAC address in the top field (returns all zeros), review the RO Owner line below it to see which MAC address owns the read-only/multi-writer lock on the file.
  • In some cases, it may be a Service Console-based lock, an NFS lock, or a lock generated by another system or product that can use or read the VMFS file systems. The file is locked by a VMkernel child or cartel world, and the offending host running that process/world must be rebooted to clear it.
  • After you have identified the host or backup tool (the machine that owns the MAC) locking the file, power it off or stop the responsible service, and then restart the management agents on the host running the virtual machine to release the lock. A PowerCLI sketch for mapping a MAC address to an ESXi host follows these notes.
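
If vCenter Server manages the hosts, a quick PowerCLI sketch (the MAC value is a placeholder) can map the locking MAC address to an ESXi host by scanning its physical and VMkernel adapters:

# Replace with the MAC address reported by vmkfstools -D or vmfsfilelockinfo
$lockMac = "00:50:56:aa:bb:cc"

Get-VMHost | ForEach-Object {
    $vmhost = $_
    Get-VMHostNetworkAdapter -VMHost $vmhost |
        Where-Object { $_.Mac -eq $lockMac } |
        Select-Object @{ N = "VMHost"; E = { $vmhost.Name } }, Name, Mac
}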

Situation 3:

+++++++++

Error : Failed to add disk scsi0:1. Failed to power on scsi0:1

To prevent concurrent changes to critical virtual machine files and file systems, ESXi/ESX hosts establish locks on these files. In certain circumstances, these locks may not be released when the virtual machine is powered off. The files cannot be accessed by the servers while locked, and the virtual machine is unable to power on.

These virtual machine files are locked during runtime:

  • VMNAME.vswp
  • DISKNAME-flat.vmdk
  • DISKNAME-ITERATION-delta.vmdk
  • VMNAME.vmx
  • VMNAME.vmxf
  • vmware.log

>> There is a manual procedure to locate the host and virtual machine holding locks.

To work around this issue, run the vmfsfilelockinfo script from the host experiencing difficulties with one or more locked files:

  1. To find out the IP address of the host holding the lock, run the /bin/vmfsfilelockinfo Python script. The script takes these parameters:
    • File being tested
    • Username and password for accessing VMware vCenter Server (when tracing the MAC address to an ESX host). For example:

      Run this command:

      ~ # vmfsfilelockinfo -p /vmfs/volumes/iscsi-lefthand-2/VM1/VM1_1-000001-delta.vmdk -v 192.168.1.10 -u administrator@vsphere.local

      You see output similar to:

      vmfsfilelockinfo Version 1.0
      Looking for lock owners on “VM1_1-000001-delta.vmdk”
      “VM1_1-000001-delta.vmdk” is locked in Exclusive mode by host having mac address [‘xx:xx:xx:xx:xx:xx’]
      Trying to make use of Fault Domain Manager
      ———————————————————————-
      Found 0 ESX hosts using Fault Domain Manager.
      ———————————————————————-
      Could not get information from Fault domain manager
      Connecting to 192.168.1.10 with user administrator@vsphere.local
      Password: xXxXxXxXxXx
      ———————————————————————-
      Found 3 ESX hosts from Virtual Center Server.
      ———————————————————————-
      Searching on Host 192.168.1.178
      Searching on Host 192.168.1.179
      Searching on Host 192.168.1.180
      MAC Address : xx:xx:xx:xx:xx:xx

      Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive

      Total time taken : 0.27 seconds.

      Note: During the life cycle of a powered-on virtual machine, several of its files transition between various legitimate lock states. The lock state mode indicates the type of lock that is on the file. The list of lock modes is:

    • mode 0 = no lock
    • mode 1 = is an exclusive lock (vmx file of a powered on virtual machine, the currently used disk (flat or delta), *vswp, and so on.)
    • mode 2 = is a read-only lock (For example on the ..-flat.vmdk of a running virtual machine with snapshots)
    • mode 3 = is a multi-writer lock (For example used for MSCS clusters disks or FT VMs)
  2. To get the name of the process holding the lock, run the lsof command on the host holding the lock and filter the output for the file name in question:

    ~ # lsof | egrep ‘Cartel|VM1_1-000001-delta.vmdk’

    You see output similar to:

    Cartel | World name | Type | fd | Description
    36202 vmx FILE 80 /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/VM1/VM1_1-000001-delta.vmdk

    This shows that the file is locked by a virtual machine having Cartel ID 36202. Now display the list of active Cartel IDs by executing this command:

    ~ # esxcli vm process list

    This displays information for active virtual machines grouped by virtual machine name and having a format similar to:

    Alternate_VM27
    World ID: 36205
    Process ID: 0
    VMX Cartel ID: 36202
    UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a9 dc 9f bf
    Display Name: Alternate_VM27
    Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM27/Alternate_VM27.vmx
    Alternate_VM20
    World ID: 36207
    Process ID: 0
    VMX Cartel ID: 36206
    UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a5 dc 94 5f
    Display Name: Alternate_VM20
    Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM20/Alternate_VM20.vmx

    The virtual machine entry having VMX Cartel ID 36202 shows the display name of the virtual machine holding the lock on file VM1_1-000001-delta.vmdk, which in this example, is Alternate_VM27.

  3. Shut down the virtual machine holding the lock to release the lock.

Related Information

This script performs these actions in this sequence:

  1. Identifies locked state Exclusive, Read-Only, not locked.
  2. Identifies MAC address of locking host [‘xx:xx:xx:xx:xx:xx’].
  3. Queries the Fault Domain Manager (HA) for information on discovered MAC address.
  4. Queries vCenter Server for information on discovered MAC address.
  5. Outputs final status.
    For example:

Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive.

  • The script outputs total execution time when it terminates.

 

Notes:

  • The script does not attempt to break/remove locks. The script only identifies the potential ESX host which holds the lock.
  • If not run with vCenter Server username and password, it prompts for the same, after querying the Fault Domain Manager.
  • This script works on a single file parameter, without wildcards. If multiple queries are required, you must execute the script repeatedly in a wrapper script.

For further clarifications please follow : https://kb.vmware.com/s/article/10051

Storage connectivity to vSphere

The VMware vSphere storage architecture consists of layers of abstraction that hide the differences and manage the complexity among physical storage subsystems.

To the applications and guest operating systems inside each virtual machine, the storage subsystem appears as a virtual SCSI controller connected to one or more virtual SCSI disks. These controllers are the only types of SCSI controllers that a virtual machine can see and access. These controllers include BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, and VMware Paravirtual.

The virtual SCSI disks are provisioned from datastore elements in the datacenter. A datastore is like a storage appliance that delivers storage space for virtual machines across multiple physical hosts. Multiple datastores can be aggregated into a single logical, load-balanced pool called a datastore cluster.

The datastore abstraction is a model that assigns storage space to virtual machines while insulating the guest from the complexity of the underlying physical storage technology. The guest virtual machine is not exposed to Fibre Channel SAN, iSCSI SAN, direct attached storage, and NAS.

Each datastore is a physical VMFS volume on a storage device. NAS datastores are an NFS volume with VMFS characteristics. Datastores can span multiple physical storage subsystems. A single VMFS volume can contain one or more LUNs from a local SCSI disk array on a physical host, a Fibre Channel SAN disk farm, or iSCSI SAN disk farm. New LUNs added to any of the physical storage subsystems are detected and made available to all existing or new datastores. Storage capacity on a previously created datastore can be extended without powering down physical hosts or storage subsystems. If any of the LUNs within a VMFS volume fails or becomes unavailable, only virtual machines that use that LUN are affected. An exception is the LUN that has the first extent of the spanned volume. All other virtual machines with virtual disks residing in other LUNs continue to function as normal.

Each virtual machine is stored as a set of files in a directory in the datastore. The disk storage associated with each virtual guest is a set of files within the guest’s directory. You can operate on the guest disk storage as an ordinary file. The disk storage can be copied, moved, or backed up. New virtual disks can be added to a virtual machine without powering it down. In that case, a virtual disk file  (.vmdk) is created in VMFS to provide new storage for the added virtual disk or an existing virtual disk file is associated with a virtual machine.

VMFS is a clustered file system that leverages shared storage to allow multiple physical hosts to read and write to the same storage simultaneously. VMFS provides on-disk locking to ensure that the same virtual machine is not powered on by multiple servers at the same time. If a physical host fails, the on-disk lock for each virtual machine is released so that virtual machines can be restarted on other physical hosts.

VMFS also features failure consistency and recovery mechanisms, such as distributed journaling, a failure-consistent virtual machine I/O path, and virtual machine state snapshots. These mechanisms can aid quick identification of the cause and recovery from virtual machine, physical host, and storage subsystem failures.

VMFS also supports raw device mapping (RDM). RDM provides a mechanism for a virtual machine to have direct access to a LUN on the physical storage subsystem (Fibre Channel or iSCSI only). RDM supports two typical types of applications:

SAN snapshot or other layered applications that run in the virtual machines. RDM better enables scalable backup offloading systems using features inherent to the SAN.

Microsoft Clustering Services (MSCS) spanning physical hosts and using virtual-to-virtual clusters as well as physical-to-virtual clusters. Cluster data and quorum disks must be configured as RDMs rather than files on a shared VMFS.
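
As a hedged illustration of attaching an RDM with PowerCLI (the VM name and device path are placeholders; physical compatibility mode is shown, which is what cluster-across-boxes MSCS configurations typically require):

# Attach a LUN to a VM as a physical-mode raw device mapping
$vm = Get-VM -Name "ClusterNode1"
$lunPath = "/vmfs/devices/disks/naa.60003ff44dc75adc0000000000000001"

New-HardDisk -VM $vm -DiskType RawPhysical -DeviceName $lunPath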

 

Supported Storage Adapters:

+++++++++++++++++++++++++

Storage adapters provide connectivity for your ESXi host to a specific storage unit or network.

ESXi supports different classes of adapters, including SCSI, iSCSI, RAID, Fibre Channel, Fibre Channel over Ethernet (FCoE), and Ethernet. ESXi accesses the adapters directly through device drivers in the VMkernel.

View Storage Adapters Information:

++++++++++++++++++++++++++++++

Use the vSphere Client to display storage adapters that your host uses and to review their information.

Procedure

1: In Inventory, select Hosts and Clusters.

2: Select a host and click the Configuration tab.

3: In Hardware, select Storage Adapters.

4: To view details for a specific adapter, select the adapter from the Storage Adapters list.

5: To list all storage devices the adapter can access, click Devices.

6: To list all paths the adapter uses, click Paths. (A PowerCLI equivalent of this procedure is sketched below.)
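
The same information can be gathered with PowerCLI, which is convenient across many hosts. A minimal sketch (the host name is illustrative):

$vmhost = Get-VMHost -Name "ESXiHost"

# List the host's storage adapters (vmhba device names, type, model, status)
Get-VMHostHba -VMHost $vmhost | Select-Object Device, Type, Model, Status

# List the storage devices (LUNs) visible to the host
Get-ScsiLun -VmHost $vmhost | Select-Object CanonicalName, CapacityGB, Vendor, MultipathPolicy

# List the paths for one of the LUNs
Get-ScsiLun -VmHost $vmhost | Select-Object -First 1 | Get-ScsiLunPath | Select-Object Name, State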

Types of Physical Storage:

++++++++++++++++++++++

The ESXi storage management process starts with storage space that your storage administrator preallocates on different storage systems.

ESXi supports the following types of storage:

Local Storage : Stores virtual machine files on internal or directly connected external storage disks.

Networked Storage: Stores virtual machine files on external storage disks or arrays attached to your host through a direct connection or through a high-speed network.

Local Storage:

Local storage can be internal hard disks located inside your ESXi host, or it can be external storage systems located outside and connected to the host directly through protocols such as SAS or SATA.

Local storage does not require a storage network to communicate with your host. You need a cable connected to the storage unit and, when required, a compatible HBA in your host.

ESXi supports a variety of internal or external local storage devices, including SCSI, IDE, SATA, USB, and SAS storage systems. Regardless of the type of storage you use, your host hides the physical storage layer from virtual machines.
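
As a quick way to see which devices a host treats as local, the ScsiLun objects returned by PowerCLI expose an IsLocal flag. A sketch; "ESXiHost" is a placeholder.

# List the local disk devices on a host
Get-ScsiLun -VMHost "ESXiHost" -LunType disk |
    Where-Object { $_.IsLocal } |
    Select-Object CanonicalName, Vendor, Model, CapacityMB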

Networked Storage:

Networked storage consists of external storage systems that your ESXi host uses to store virtual machine files remotely. Typically, the host accesses these systems over a high-speed storage network.

Networked storage devices are shared. Datastores on networked storage devices can be accessed by multiple hosts concurrently. ESXi supports the following networked storage technologies.

Note

Accessing the same storage through different transport protocols, such as iSCSI and Fibre Channel, at the same time is not supported.

Fibre Channel (FC):

Stores virtual machine files remotely on an FC storage area network (SAN). FC SAN is a specialized high-speed network that connects your hosts to high-performance storage devices. The network uses Fibre Channel protocol to transport SCSI traffic from virtual machines to the FC SAN devices.

In a typical Fibre Channel configuration, a host connects to a SAN fabric, which consists of Fibre Channel switches and storage arrays, using a Fibre Channel adapter. LUNs from a storage array become available to the host. You can access the LUNs and create datastores for your storage needs. The datastores use the VMFS format.
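
Creating a VMFS datastore on one of those FC LUNs can also be scripted; a minimal PowerCLI sketch, with placeholder host name, datastore name, and canonical (naa) device name:

# Create a VMFS datastore on a LUN presented from the FC array
New-Datastore -VMHost "ESXiHost" -Vmfs -Name "FC-Datastore01" -Path "naa.<naa_ID>"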

Internet SCSI (iSCSI):

Stores virtual machine files on remote iSCSI storage devices. iSCSI packages SCSI storage traffic into the TCP/IP protocol so that it can travel through standard TCP/IP networks instead of the specialized FC network. With an iSCSI connection, your host serves as the initiator that communicates with a target, located in remote iSCSI storage systems.

ESXi offers the following types of iSCSI connections:

Hardware iSCSI: Your host connects to storage through a third-party adapter capable of offloading the iSCSI and network processing. Hardware iSCSI adapters can be dependent or independent.

Software iSCSI: Your host uses a software-based iSCSI initiator in the VMkernel to connect to storage. With this type of iSCSI connection, your host needs only a standard network adapter for network connectivity.

You must configure iSCSI initiators for the host to access and display iSCSI storage devices; a PowerCLI sketch of this configuration appears below.

The two types of iSCSI initiators connect to storage in different ways:

With a hardware iSCSI adapter, the host connects directly to the iSCSI storage system through the offload-capable adapter.

With a software iSCSI adapter, the host connects to the iSCSI storage through a standard Ethernet NIC.

iSCSI storage devices from the storage system become available to the host. You can access the storage devices and create VMFS datastores for your storage needs.
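
To illustrate, enabling the software iSCSI initiator, adding a discovery target, and rescanning can be done with PowerCLI. The sketch below assumes an existing vCenter session; the host name and target address are placeholders.

# Enable the software iSCSI initiator on the host
Get-VMHostStorage -VMHost "ESXiHost" | Set-VMHostStorage -SoftwareIScsiEnabled $true

# Add a dynamic (Send Targets) discovery address to the software iSCSI adapter
$iscsiHba = Get-VMHostHba -VMHost "ESXiHost" -Type IScsi | Where-Object { $_.Model -like "*Software*" }
New-IScsiHbaTarget -IScsiHba $iscsiHba -Address "192.0.2.50" -Type Send

# Rescan so the new iSCSI devices and VMFS datastores appear
Get-VMHostStorage -VMHost "ESXiHost" -RescanAllHba -RescanVmfs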

Network-attached Storage (NAS):

Stores virtual machine files on remote file servers accessed over a standard TCP/IP network. The NFS client built into ESXi uses Network File System (NFS) protocol version 3 to communicate with the NAS/NFS servers. For network connectivity, the host requires a standard network adapter.
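
Mounting an NFS export as a datastore can likewise be scripted; a sketch with placeholder host, NFS server, and export path:

# Mount an NFS export as a datastore on the host
New-Datastore -VMHost "ESXiHost" -Nfs -Name "NFS-Datastore01" -NfsHost "nfs01.example.com" -Path "/exports/vmware"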

Shared Serial Attached SCSI (SAS):

Stores virtual machines on direct-attached SAS storage systems that offer shared access to multiple hosts. This type of access permits multiple hosts to access the same VMFS datastore on a LUN.

Host Crash Diagnostic Partitions:

++++++++++++++++++++++

A diagnostic partition can be on the local disk where the ESXi software is installed. This is the default configuration for ESXi Installable. You can also use a diagnostic partition on a remote disk shared between multiple hosts. If you want to use a network diagnostic partition, you can install ESXi Dump Collector and configure the networked partition.

The following considerations apply:

>> A diagnostic partition cannot be located on an iSCSI LUN accessed through the software iSCSI or dependent hardware iSCSI adapter. For more information about diagnostic partitions with iSCSI, see General Boot from iSCSI SAN Recommendations in the vSphere Storage documentation.

>> Each host must have a diagnostic partition of 110 MB. If multiple hosts share a diagnostic partition on a SAN LUN, the partition should be large enough to accommodate core dumps of all hosts.

>> If a host that uses a shared diagnostic partition fails, reboot the host and extract log files immediately after the failure. Otherwise, the second host that fails before you collect the diagnostic data of the first host might not be able to save the core dump.

Diagnostic Partition Creation:

++++++++++++++++++++++

You can use the vSphere Client to create the diagnostic partition on a local disk or on a private or shared SAN LUN. You cannot use vicfg-dumppart to create the diagnostic partition. The SAN LUN can be set up with Fibre Channel or hardware iSCSI. SAN LUNs accessed through a software iSCSI initiator are not supported.

Managing Core Dumps:

+++++++++++++++++++

With esxcli system coredump, you can manage local diagnostic partitions or set up core dump on a remote server in conjunction with ESXi Dump Collector. For information about ESXi Dump Collector, see the vSphere Networking documentation.

Managing Local Core Dumps with ESXCLI:

++++++++++++++++++++++++++++++

The following example scenario changes the local diagnostic partition with ESXCLI. Specify one of the connection options listed in Connection Options in place of <conn_options>. A PowerCLI (Get-EsxCli) equivalent is sketched after the procedure.

To manage a local diagnostic partition

1: Show the diagnostic partition the VMkernel uses and display information about all partitions that can be used as diagnostic partitions.

esxcli <conn_options> system coredump partition list

2: Deactivate the current diagnostic partition.

esxcli <conn_options> system coredump partition set --unconfigure

The ESXi system is now without a diagnostic partition, and you must immediately set a new one.

3: Set the active partition to naa.<naa_ID>.

esxcli <conn_options> system coredump partition set --partition=naa.<naa_ID>

4: List partitions again to verify that a diagnostic partition is set.

esxcli <conn_options> system coredump partition list

If a diagnostic partition is set, the command displays information about it. Otherwise, the command shows that no partition is activated and configured.
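
The same esxcli namespace is also reachable from PowerCLI through Get-EsxCli, which is convenient when you manage many hosts from one script. A sketch using the V2 interface; the host name and naa ID are placeholders.

# Get an esxcli (V2) object for the host
$esxcli = Get-EsxCli -VMHost "ESXiHost" -V2

# Show the active diagnostic partition and the usable candidates
$esxcli.system.coredump.partition.list.Invoke()

# Inspect the expected argument names, then activate a specific partition
$esxcli.system.coredump.partition.set.CreateArgs()
$esxcli.system.coredump.partition.set.Invoke(@{ partition = "naa.<naa_ID>" })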

Managing Core Dumps with ESXi Dump Collector:

++++++++++++++++++++++++++++++++++++

By default, a core dump is saved to the local disk. You can use ESXi Dump Collector to keep core dumps on a network server for use during debugging. ESXi Dump Collector is especially useful for Auto Deploy, but supported for any ESXi 5.0 host. ESXi Dump Collector supports other customization, including sending core dumps to the local disk.

ESXi Dump Collector is included with the vCenter Server autorun.exe application. You can install ESXi Dump Collector on the same system as the vCenter Server service or on a different Windows or Linux machine.

You can configure ESXi Dump Collector by using the vSphere Client or ESXCLI. Specify one of the connection options listed in Connection Options in place of <conn_options>. A PowerCLI (Get-EsxCli) sketch follows the procedure.

To manage core dumps with ESXi Dump Collector:

++++++++++++++++++++++++++++++++++++

1: Set up an ESXi system to use ESXi Dump Collector by running esxcli system coredump.

esxcli <conn_options> system coredump network set --interface-name vmk0 --server-ipv4=1-XX.XXX --port=6500

You must specify a VMkernel port with --interface-name, and the IP address and port of the server to send the core dumps to. If you configure an ESXi system that is running inside a virtual machine, you must choose a VMkernel port that is in promiscuous mode.

2: Enable ESXi Dump Collector.

esxcli <conn_options> system coredump network set --enable=true

3: (Optional) Check that ESXi Dump Collector is configured correctly.

esxcli <conn_options> system coredump network get

The host on which you have set up ESXi Dump Collector sends core dumps to the specified server by using the specified VMkernel NIC and optional port.
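
If you would rather drive this from PowerCLI than from the standalone vCLI, the same settings are exposed through Get-EsxCli. This is a sketch only: the server IP is a placeholder, and the exact argument names vary between ESXi releases, so check CreateArgs() on your build first.

$esxcli = Get-EsxCli -VMHost "ESXiHost" -V2

# List the argument names "system coredump network set" expects on this host
$esxcli.system.coredump.network.set.CreateArgs()

# Point the host at the ESXi Dump Collector and enable it
# (argument names below follow the 5.x options; adjust to what CreateArgs() reports)
$esxcli.system.coredump.network.set.Invoke(@{ interfacename = "vmk0"; serveripv4 = "192.0.2.10"; port = 6500 })
$esxcli.system.coredump.network.set.Invoke(@{ enable = $true })

# Verify the configuration
$esxcli.system.coredump.network.get.Invoke()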


Additional Information: ESXi Network Dump Collector in VMware vSphere 5.x/6.0