A LUN (Logical Unit Number) Is Disconnected from an ESXi Host: What Do We Check?

When a LUN (Logical Unit Number) is disconnected from an ESXi host, it can result in data access issues and VM disruptions. To troubleshoot and resolve the LUN disconnection, you need to check both the ESXi host and the storage side to identify the cause of the disconnection. Below are the steps to check from both ESXi and storage perspectives:

Checking from ESXi Host:

1: Review Storage Adapters: Check that the storage adapters on the ESXi host are present and online, since a LUN is unreachable if its adapter is down. Use the following command to list the storage adapters:

esxcli storage core adapter list

Verify that the adapter that connects to the storage where the LUN is located is active and working without any errors.

2: Check Storage Devices: Ensure that the storage devices are visible and accessible. Use the following command to list the storage devices:

esxcli storage core device list

Verify that the device corresponding to the LUN is present and not showing any errors.
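On a host with many devices, the full listing is long, so it can help to filter it. The following is a minimal sketch, assuming the listing prints each device identifier on an unindented line followed by an indented "Status:" field (the usual layout, but worth confirming on your build). The function name unhealthy_devices is hypothetical, not part of esxcli.

```shell
# Hypothetical helper: print "<device> <status>" for every device whose
# Status field is anything other than "on".
# Reads the output of `esxcli storage core device list` on stdin.
unhealthy_devices() {
    awk '
        /^[^ ]/      { dev = $1 }                           # unindented line = device identifier
        /^ +Status:/ { if ($2 != "on") print dev, $2 }      # indented Status field
    '
}

# Usage on the host:
#   esxcli storage core device list | unhealthy_devices
```

No output means every listed device reported Status: on. A LUN that is missing from the listing altogether will not show up here, so still confirm that the expected identifier is present.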

3: Check LUN Configuration: Verify the LUN configuration on the ESXi host. Use the following command to list the file systems known to the host, including VMFS datastores:

esxcli storage filesystem list

Ensure that the LUN’s VMFS datastore is listed and mounted correctly.

4: Check Path Status: Verify the path status to the LUN. Use the following command to list the storage paths:

esxcli storage core path list

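Ensure that every path to the LUN reports a State of "active" (or "standby" on active/passive arrays). A path in the "dead" state indicates a lost connection between the host and the storage.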
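To pick out problem paths from a long listing, the output can be filtered. This is a sketch only, assuming the path listing contains indented "Runtime Name:", "Device:", and "State:" fields in that order for each path. The function name problem_paths is hypothetical.

```shell
# Hypothetical helper: print "<runtime name> <device> <state>" for each path
# whose State is neither "active" nor "standby".
# Reads the output of `esxcli storage core path list` on stdin.
problem_paths() {
    awk '
        /^ +Runtime Name:/ { name = $3 }    # e.g. vmhba2:C0:T1:L5
        /^ +Device:/       { dev = $2 }     # e.g. naa.<id>
        /^ +State:/        { if ($2 != "active" && $2 != "standby") print name, dev, $2 }
    '
}

# Usage on the host:
#   esxcli storage core path list | problem_paths
```

The runtime name identifies the adapter, channel, target, and LUN of each reported path, which helps narrow the fault to one HBA or one storage port.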

5: Rescan Storage: If the LUN was recently connected or disconnected, perform a storage rescan on the ESXi host to refresh the storage information:

esxcli storage core adapter rescan --all

6: Check Logs: Review the ESXi logs (e.g., vmkernel.log, messages.log) for any storage-related errors or warnings around the time of the LUN disconnection. Use commands like tail or cat to view the logs.

Checking from Storage:

  1. Storage Array Management: Log in to the storage array management interface or storage management software to check the status of the LUN. Look for any errors, warnings, or status indicators related to the LUN.
  2. LUN Visibility: Ensure that the storage array is detecting the LUN and making it available to the ESXi host. Verify that the LUN is properly presented to the correct ESXi host(s).
  3. Check for Errors: Look for any specific errors or alerts related to the LUN or the storage array that may indicate a problem.
  4. Check Connectivity: Verify the connectivity between the storage array and the ESXi host(s) by checking the network connectivity, Fibre Channel (FC) or iSCSI connections, and any relevant zoning or masking configurations.
  5. Check Disk Health: Review the disk health status of the physical disks associated with the LUN. Ensure there are no reported issues with the disks.

To troubleshoot a LUN disconnection, it is essential to check the logs on the ESXi host. The primary logs to review for LUN disconnection issues are the vmkernel.log and messages.log files. Below are the steps to check these logs along with examples:

Step 1: SSH to ESXi Host:

Enable SSH on the ESXi host and use an SSH client (e.g., PuTTY) to connect to the host.

Step 2: View vmkernel.log:

Use the following command to view the last 100 lines of the vmkernel.log file:

tail -n 100 /var/log/vmkernel.log

Example 1: SCSI Errors in vmkernel.log:

Around the time of a LUN disconnection, you might see ScsiDeviceIO messages in the vmkernel.log. The latency warning below, for example, can point to a degraded storage device or to communication problems on the path.

2023-07-01T12:34:56.789Z cpu1:12345)ScsiDeviceIO: XXXX: Device  naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx performance has deteriorated. I/O latency increased from average value of X microseconds to Y microseconds.

Example 2: LUN Disconnection in vmkernel.log:

A LUN disconnection event can be logged in the vmkernel.log as well.

2023-07-01T12:34:56.789Z cpu1:12345)NMP: nmp_DeviceConnect:3779: Successfully opened device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
2023-07-01T12:34:56.790Z cpu2:12346)NMP: nmp_DeviceDisconnect:3740: Disconnect device "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" due to LUN Reset event. I/O error status: [Aborted].
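The last 100 lines may not cover the time of the disconnection, so it is often more useful to search the whole file for the affected device. The following is a sketch, assuming you already know the device identifier (naa.*) from the device list. The function name lun_log_events and the keyword list are illustrative choices, not a VMware-defined set.

```shell
# Hypothetical helper: print log lines that mention a given device and also
# contain a storage-related keyword (matched case-insensitively).
#   $1 = device identifier, e.g. naa.<id>
#   $2 = log file (optional, defaults to /var/log/vmkernel.log)
lun_log_events() {
    grep -F -- "$1" "${2:-/var/log/vmkernel.log}" |
        grep -i -E 'nmp|scsi|warning|failed|abort'
}

# Usage on the host:
#   lun_log_events naa.<id>
#   lun_log_events naa.<id> /var/log/vmkernel.log | tail -n 50
```

The timestamps on the matching lines give a window to compare against events recorded on the storage array.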

Step 3: View messages.log:

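Use the following command to view the last 100 lines of the messages.log file. ESXi 5.x and later do not ship this file; on those releases, run the same command against /var/log/vmkwarning.log, which collects the warnings and alerts from the VMkernel log: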

tail -n 100 /var/log/messages.log

Example 3: Multipath Errors in messages.log:

If there are multipath-related issues, you might see errors in the messages.log.

2023-07-01T12:34:56.789Z cpu1:12345)WARNING: NMP: nmpDeviceAttemptFailover:512: Retry world failover device "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" - issuing command X.

Example 4: LUN Disconnection in messages.log:

A LUN disconnection event might be logged in the messages.log as well.

2023-07-01T12:34:56.789Z cpu2:12346)WARNING: NMP: nmpDeviceBadLink:7152: NMP: nmp_DeviceStartLoop:984: NMP Device "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" loop reset with I/O error.

By checking the vmkernel.log and messages.log files on the ESXi host, you can gather valuable information about LUN disconnection events and any related errors or warnings. This information is essential for diagnosing the cause of the LUN disconnection and taking appropriate corrective actions to restore normal operations. If needed, involve storage and VMware support teams to assist with the troubleshooting process.
