ESXi Log Analyzer

The ESXi Universal Log Analyzer is a professional Python-based tool that automatically parses and analyzes ESXi log files to identify storage issues, hardware problems, and system errors. It processes three critical ESXi log types:

  • vmkernel.log – Storage, SCSI, path failures, and kernel-level issues
  • hostd.log – VM operations, VIM faults, task failures, and management errors
  • vobd.log – Hardware alerts, sensor warnings, and physical component issues

Key Capabilities

Feature             Description
------------------  ----------------------------------------------------
Multi-Log Support   Parse vmkernel, hostd, and vobd logs simultaneously
Auto-Detection      Automatically identifies log type from content
SCSI Decoding       Translates hex sense codes to human-readable errors
Path Analysis       Tracks storage path states (dead, working, standby)
LUN Identification  Extracts LUN IDs and detects Tintri storage
Multiple Outputs    Console, JSON, CSV, and HTML reports
Zero Dependencies   Uses only the Python standard library

What This Tool Does

In Simple Terms

This tool reads ESXi log files and finds problems automatically. Instead of manually searching through thousands of log lines, it:

  1. Scans log files for errors, warnings, and failures
  2. Categorizes issues by type (SCSI errors, path failures, VM problems, etc.)
  3. Prioritizes critical issues that need immediate attention
  4. Exports results in multiple formats for analysis and reporting

Problem It Solves

Without This Tool:

  • Manually grep through thousands of log lines
  • Look up SCSI sense codes in documentation
  • Track down multiple types of errors across different logs
  • Compile reports manually for tickets

With This Tool:

  • One command analyzes all logs
  • Automatically decodes technical error codes
  • Generates comprehensive reports in seconds
  • Provides actionable information immediately

Common Use Cases

  1. Troubleshooting Storage Issues
    • “Why is my datastore showing latency?”
    • “Why are VMs experiencing storage errors?”
  2. Post-Incident Analysis
    • “What happened during the outage last night?”
    • “Which LUNs failed and when?”
  3. Proactive Monitoring
    • Daily/weekly health checks
    • Trend analysis over time
  4. Support Ticket Creation
    • Generate detailed error reports for VMware support
    • Export evidence for vendor escalations

Prerequisites

System Requirements

Requirement       Specification
----------------  --------------------------------------
Python Version    Python 3.6 or higher
Operating System  Linux, macOS, or Windows
Memory            512 MB minimum (for large log files)
Disk Space        100 MB for script and output files

Required Files

You need one or more ESXi log files:

  • vmkernel.log
  • hostd.log
  • vobd.log

Where to Find ESXi Logs:

Method 1: SSH to ESXi Host

# SSH to ESXi
ssh root@esxi-host.example.com
# Logs are located in:
/var/run/log/vmkernel.log
/var/run/log/hostd.log
/var/run/log/vobd.log
# Copy to your workstation
scp root@esxi-host:/var/run/log/*.log /local/path/

Method 2: Download via Web UI

  1. Login to ESXi web UI (https://esxi-host)
  2. Navigate to Host → Monitor → Logs
  3. Select log file
  4. Click Download

Method 3: From Support Bundle

  1. Generate support bundle via web UI or command: vm-support
  2. Extract bundle
  3. Logs are in esx-<hostname>-<date>/var/run/log/
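If you prefer to script the extraction, the standard library's tarfile module can pull just the three logs out of a bundle. A hedged sketch (the internal layout follows the path shown above; extract_logs is an illustrative helper, not part of the analyzer):

```python
# Sketch: extract only the vmkernel/hostd/vobd logs from a support
# bundle tarball. The internal layout (esx-<hostname>-<date>/var/run/log/)
# is assumed from the steps above; extract_logs is illustrative.
import tarfile

def extract_logs(bundle_path: str, dest: str = '.') -> list:
    """Extract the three ESXi logs from a support bundle tarball."""
    wanted = ('vmkernel.log', 'hostd.log', 'vobd.log')
    extracted = []
    with tarfile.open(bundle_path) as tar:
        for member in tar.getmembers():
            if member.name.endswith(wanted):  # str.endswith accepts a tuple
                tar.extract(member, path=dest)
                extracted.append(member.name)
    return extracted
```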

Verify Python Installation

# Check Python version
python3 --version
# Expected output: Python 3.6.x or higher

Installation

Step 1: Download the Script

Save the Python script as esxi_log_analyzer.py:

# Create directory for the script
mkdir -p ~/esxi-tools
cd ~/esxi-tools
# Copy the script content to this file
nano esxi_log_analyzer.py
# (Paste the script content from the artifact)

Step 2: Make Executable

chmod +x esxi_log_analyzer.py

Step 3: Verify Installation

python3 esxi_log_analyzer.py --help

Expected Output:

usage: esxi_log_analyzer.py [-h] [--type {vmkernel,hostd,vobd}] [--json JSON]
                            [--csv CSV] [--html HTML] [--limit LIMIT] [--quiet]
                            logfiles [logfiles ...]

ESXi Universal Log Analyzer - Parse vmkernel, hostd, and vobd logs
...

Quick Start Guide

Example 1: Analyze Single Log File

python3 esxi_log_analyzer.py vmkernel.log

What Happens:

  1. Script reads vmkernel.log
  2. Automatically detects it’s a vmkernel log
  3. Parses for SCSI errors, path failures, timeouts
  4. Displays summary and top 10 critical issues

Output:

[INFO] Parsing log file: vmkernel.log
[SUCCESS] Parsed 15234 lines, found 127 issues
================================================================================
ESXi LOG ANALYSIS SUMMARY
================================================================================
[VMKERNEL ISSUES]
SCSI Errors: 45
Path Failures: 12
Storage Timeouts: 8
Reservation Conflicts: 3
HBA Errors: 2
[HOSTD ISSUES]
Errors: 0
Warnings: 0
...
[CRITICAL]
Total Critical Issues: 34
================================================================================
[TOP 10 CRITICAL ISSUES]
--------------------------------------------------------------------------------
1. [VMKERNEL] 2025-12-28T18:46:00.123Z
Issues: SCSI_ERROR, PATH_DOWN
LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
SCSI: Aborted Command
Path State: dead
2. [VMKERNEL] 2025-12-28T18:47:15.456Z
Issues: STORAGE_TIMEOUT
LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
...

Example 2: Analyze Multiple Logs

python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log

What Happens:

  • Parses all three log types
  • Combines results
  • Shows comprehensive summary across all logs

Example 3: Export to CSV for Excel

python3 esxi_log_analyzer.py vmkernel.log --csv report.csv

What Happens:

  • Analyzes log
  • Exports critical issues to report.csv
  • Open in Excel for further analysis

CSV Format:

Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message
2025-12-28T18:46:00.123Z,vmkernel,SCSI_ERROR|PATH_DOWN,naa.60a980...,Aborted Command,dead,...
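The exported CSV can be read back with the standard csv module; the multi-valued Issues column is pipe-separated and splits cleanly. A small sketch against the format shown above (the sample row is illustrative):

```python
# Sketch: parse the analyzer's CSV export. The Issues column is
# pipe-separated, so it splits into a list per row.
import csv
import io

SAMPLE = (
    "Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message\n"
    "2025-12-28T18:46:00.123Z,vmkernel,SCSI_ERROR|PATH_DOWN,"
    "naa.60a980,Aborted Command,dead,example\n"
)

def load_issues(text: str) -> list:
    """Return CSV rows as dicts with Issues split into a list."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row['Issues'] = row['Issues'].split('|')
    return rows

rows = load_issues(SAMPLE)
```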

Example 4: Generate HTML Report

python3 esxi_log_analyzer.py vmkernel.log --html report.html

What Happens:

  • Creates a styled HTML dashboard
  • Open report.html in a browser
  • View the color-coded summary and issue table

Usage Examples

Basic Analysis

Analyze vmkernel.log Only

python3 esxi_log_analyzer.py vmkernel.log

Analyze All Three Log Types

python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log

Force Specific Log Type (if auto-detection fails)

python3 esxi_log_analyzer.py esxi.log --type vmkernel

Export Options

Export to JSON (Complete Data)

python3 esxi_log_analyzer.py vmkernel.log --json results.json

Use Case: API integration, further processing, archival

Export to CSV (Critical Issues Only)

python3 esxi_log_analyzer.py vmkernel.log --csv critical-issues.csv

Use Case: Excel analysis, ticket attachments

Generate HTML Dashboard

python3 esxi_log_analyzer.py vmkernel.log --html dashboard.html

Use Case: Management reports, stakeholder visibility

Export All Formats

python3 esxi_log_analyzer.py vmkernel.log \
    --json full-data.json \
    --csv issues.csv \
    --html report.html

Advanced Options

Show More Critical Issues

python3 esxi_log_analyzer.py vmkernel.log --limit 50

Default: 10 issues. Increase to see more.

Quiet Mode (No Console Output)

python3 esxi_log_analyzer.py vmkernel.log --quiet --json results.json

Use Case: Automated scripts, cron jobs

Analyze Logs from Different Hosts

python3 esxi_log_analyzer.py \
    esxi1-vmkernel.log \
    esxi2-vmkernel.log \
    esxi3-vmkernel.log \
    --csv combined-report.csv

Understanding the Output

Console Summary Explained

[VMKERNEL ISSUES]
SCSI Errors: 45 ← SCSI sense code errors detected
Path Failures: 12 ← Storage paths marked as "dead" or "failed"
Storage Timeouts: 8 ← I/O operations that timed out
Reservation Conflicts: 3 ← SCSI reservation issues (often during vMotion)
HBA Errors: 2 ← Host Bus Adapter hardware/driver errors
[HOSTD ISSUES]
Errors: 23 ← hostd daemon errors
Warnings: 156 ← hostd warnings (less severe)
VIM Faults: 7 ← vSphere API (VIM) errors
Task Failures: 5 ← Failed ESXi tasks (VM power ops, etc.)
Datastore Errors: 2 ← Datastore not found or inaccessible
[VOBD ISSUES]
Hardware Alerts: 4 ← Physical hardware alerts
Sensor Warnings: 2 ← Temperature, voltage, fan sensor warnings
Hardware Errors: 1 ← Hardware component failures
[CRITICAL]
Total Critical Issues: 34 ← High-priority issues requiring attention

Critical Issue Details

1. [VMKERNEL] 2025-12-28T18:46:00.123Z
Issues: SCSI_ERROR, PATH_DOWN
LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
SCSI: Aborted Command
Path State: dead

What This Means:

  • Timestamp: When the error occurred
  • Issues: Types of problems (SCSI error + path failure)
  • LUN: Storage device identifier (NAA format)
  • SCSI: Human-readable SCSI error (decoded from hex)
  • Path State: Storage path is dead (not operational)

Action Required:

  • Check storage connectivity (cables, switches)
  • Verify SAN configuration
  • Rescan storage adapters
  • Contact storage vendor if persistent

SCSI Error Codes Reference

Code                         Description                            Common Cause                     Fix
---------------------------  -------------------------------------  -------------------------------  --------------------------
0xb (Aborted Command)        I/O operation aborted                  SAN overload, queue full         Increase HBA queue depth
0x2 (Not Ready)              Device not ready                       LUN offline, format in progress  Wait or check storage
0x6 (Unit Attention)         Device reset occurred                  LUN reset, rescan                Rescan storage adapters
0x47 (Reservation Conflict)  SCSI reservation held by another host  vMotion, VAAI conflict           Stagger vMotion operations
0x4 (Hardware Error)         Disk/controller error                  Physical failure                 Replace disk/HBA

JSON Output Structure

{
  "vmkernel": {
    "SCSI_ERROR": [
      {
        "timestamp": "2025-12-28T18:46:00.123Z",
        "lun": "naa.60a980...",
        "scsi_sense": "0xb",
        "scsi_description": "Aborted Command",
        "path_state": "dead",
        "issues": ["SCSI_ERROR", "PATH_DOWN"],
        "raw": "original log line..."
      }
    ],
    "PATH_DOWN": [...],
    "STORAGE_TIMEOUT": [...]
  },
  "hostd": {...},
  "vobd": {...},
  "summary": {
    "total_scsi_errors": 45,
    "path_failures": 12,
    ...
  },
  "critical_issues": [...]
}
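Because every critical issue carries its issues list, a few lines of Python can tally the export by issue type. A sketch against the structure above (the sample data here is illustrative, not real output):

```python
# Sketch: tally critical issues by type from a results.json whose
# layout matches the structure above. Sample data is illustrative.
from collections import Counter

def issue_counts(data: dict) -> Counter:
    """Count occurrences of each issue tag across critical issues."""
    counts = Counter()
    for issue in data.get('critical_issues', []):
        counts.update(issue.get('issues', []))
    return counts

sample = {'critical_issues': [
    {'issues': ['SCSI_ERROR', 'PATH_DOWN']},
    {'issues': ['SCSI_ERROR']},
]}
```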

CSV Output (Excel-Friendly)

Open report.csv in Excel:

Timestamp            Type      Issues                LUN          SCSI_Error       Path_State  Message
2025-12-28T18:46:00  vmkernel  SCSI_ERROR|PATH_DOWN  naa.60a980…  Aborted Command  dead

Excel Analysis:

  • Sort by Issues to group similar problems
  • Filter by LUN to focus on specific storage
  • Pivot table on SCSI_Error for error distribution

HTML Dashboard

Open report.html in browser to see:

Summary Cards:

  • Critical Issues (red card)
  • SCSI Errors
  • Path Failures
  • Storage Timeouts
  • Hostd Errors
  • Hardware Alerts

Critical Issues Table:

  • Timestamp
  • Log Type
  • Issue Tags (color-coded badges)
  • Details (LUN, SCSI error, path state)

Features:

  • Responsive design (works on mobile)
  • Hover effects for readability
  • Professional layout for management

Advanced Usage

Scenario 1: Daily Health Check

Create a script: daily-esxi-check.sh

#!/bin/bash
# Daily ESXi log health check
DATE=$(date +%Y%m%d)
REPORT_DIR="/reports/esxi/$DATE"
mkdir -p "$REPORT_DIR"

# Analyze logs from all hosts
for host in esxi1 esxi2 esxi3; do
    echo "Analyzing $host..."
    # Copy logs (destination directory must exist before scp)
    mkdir -p /tmp/${host}
    scp root@${host}:/var/run/log/*.log /tmp/${host}/
    # Run analysis
    python3 esxi_log_analyzer.py \
        /tmp/${host}/*.log \
        --json ${REPORT_DIR}/${host}-results.json \
        --csv ${REPORT_DIR}/${host}-issues.csv \
        --html ${REPORT_DIR}/${host}-report.html
done

echo "Reports generated in: $REPORT_DIR"

Run daily via cron:

0 2 * * * /scripts/daily-esxi-check.sh

Scenario 2: Alert on Critical Issues

#!/bin/bash
# Alert if critical issues exceed threshold
THRESHOLD=10
REPORT="/tmp/esxi-analysis.json"

python3 esxi_log_analyzer.py vmkernel.log --json "$REPORT" --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('$REPORT'))['summary']['critical_issues'])")

if [ "$CRITICAL" -gt "$THRESHOLD" ]; then
    echo "ALERT: $CRITICAL critical ESXi issues found!" | \
        mail -s "ESXi Critical Alert" admin@company.com
fi

Scenario 3: Multi-Host Analysis

# Analyze logs from multiple hosts
python3 esxi_log_analyzer.py \
    esxi1-vmkernel.log \
    esxi2-vmkernel.log \
    esxi3-vmkernel.log \
    esxi1-hostd.log \
    esxi2-hostd.log \
    esxi3-hostd.log \
    --html cluster-report.html \
    --csv cluster-issues.csv

Scenario 4: Tintri-Specific Analysis

# Analyze logs and filter Tintri LUNs
python3 esxi_log_analyzer.py vmkernel.log --json results.json

# Extract Tintri-specific issues
python3 -c "
import json
data = json.load(open('results.json'))
tintri_issues = [
    issue for issue in data['critical_issues']
    if 'lun' in issue and '60a980' in issue['lun']
]
print(f'Tintri LUN issues: {len(tintri_issues)}')
for issue in tintri_issues[:10]:
    print(f\" {issue['timestamp']}: {issue['lun']} - {issue.get('scsi_description', 'N/A')}\")
"

Scenario 5: Historical Trend Analysis

#!/bin/bash
# Weekly trend report
for week in {1..4}; do
    LOGFILE="vmkernel-week${week}.log"
    python3 esxi_log_analyzer.py $LOGFILE --json week${week}.json --quiet
done

# Compare weekly results
python3 -c "
import json
for week in range(1, 5):
    data = json.load(open(f'week{week}.json'))
    print(f'Week {week}: {data[\"summary\"][\"total_scsi_errors\"]} SCSI errors')
"

Troubleshooting Guide

Issue 1: "No module named 'xyz'"

Error Message:

ModuleNotFoundError: No module named 'xyz'

Solution: This script uses only the Python standard library, so this error should not occur. If it does:

# Verify Python version
python3 --version
# Should be 3.6 or higher
# If older, upgrade Python:
# Ubuntu/Debian:
sudo apt update && sudo apt install python3
# macOS:
brew install python3

Issue 2: “Permission Denied”

Error Message:

[ERROR] Failed to parse file: [Errno 13] Permission denied: 'vmkernel.log'

Solution:

# Make log files readable
chmod +r vmkernel.log
# Or run with appropriate permissions
sudo python3 esxi_log_analyzer.py vmkernel.log

Issue 3: “File not found”

Error Message:

[ERROR] File not found: vmkernel.log

Solution:

# Verify file exists
ls -l vmkernel.log
# Use full path
python3 esxi_log_analyzer.py /full/path/to/vmkernel.log
# Or navigate to log directory
cd /path/to/logs
python3 /path/to/esxi_log_analyzer.py vmkernel.log

Issue 4: Large Log Files (Memory Issues)

Error Message:

MemoryError: Unable to allocate array

Solution:

# Split large log files
split -l 100000 vmkernel.log vmkernel-part-
# Analyze each part
for part in vmkernel-part-*; do
    python3 esxi_log_analyzer.py $part --csv ${part}.csv
done
# Combine results
cat vmkernel-part-*.csv > combined-report.csv
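An alternative to splitting is to stream the file line by line, which keeps memory use flat regardless of log size. A sketch of the idea (the simple needles keyword match is a stand-in for the analyzer's real regex matching):

```python
# Sketch: generator that scans a log line by line so memory use stays
# flat for arbitrarily large files. The keyword match is a stand-in
# for the analyzer's actual regex patterns.
def scan_lines(path, needles=('error', 'dead', 'timeout')):
    """Yield (line_number, line) for lines matching any needle."""
    with open(path, 'r', encoding='utf-8', errors='ignore') as f:
        for n, line in enumerate(f, 1):
            if any(s in line.lower() for s in needles):
                yield n, line.rstrip()
```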

Issue 5: No Issues Found

Output:

[SUCCESS] Parsed 1000 lines, found 0 issues
Total Critical Issues: 0

Possible Causes:

  1. Logs are healthy – No issues present (good!)
  2. Wrong log type – Try forcing log type: python3 esxi_log_analyzer.py logfile --type vmkernel
  3. Log format changed – Verify log format matches ESXi expected format

Verification:

# Check log content manually
head -50 vmkernel.log
# Look for typical patterns
grep -i "error\|warning\|fail" vmkernel.log | head

Best Practices

1. Regular Analysis Schedule

Run analysis on a schedule:

Frequency        Use Case              Command
---------------  --------------------  ----------------------------------
Daily            Proactive monitoring  Automated script + email if issues
Weekly           Trend analysis        Generate weekly reports
After incidents  Root cause analysis   Immediate deep-dive
Before changes   Pre-change baseline   Document healthy state

2. Log File Management

# Compress old logs
gzip vmkernel.log.old
# Analyze compressed logs (if supported)
# Uncompress first:
gunzip vmkernel.log.gz
python3 esxi_log_analyzer.py vmkernel.log
# Archive analysis results
mkdir -p /archive/esxi-logs/$(date +%Y%m)
mv *.json *.csv *.html /archive/esxi-logs/$(date +%Y%m)/
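Compressed logs do not have to be decompressed to disk first: the standard library's gzip.open can stream them directly. A small sketch (a wrapper like read_log is an assumption for illustration, not a feature of the analyzer):

```python
# Sketch: stream a plain or gzip-compressed log without decompressing
# to disk first. read_log is an illustrative wrapper, not part of the
# analyzer itself.
import gzip

def read_log(path: str):
    """Yield decoded lines from a .log or .log.gz file."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', encoding='utf-8', errors='ignore') as f:
        yield from f
```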

3. Integration with Monitoring

# Analyze quietly, then read the critical-issue count from the JSON export
python3 esxi_log_analyzer.py vmkernel.log --json /tmp/esxi-metrics.json --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('/tmp/esxi-metrics.json'))['summary']['critical_issues'])")

# Send to monitoring (example: Prometheus push gateway)
echo "esxi_critical_issues $CRITICAL" | \
    curl --data-binary @- http://pushgateway:9091/metrics/job/esxi

4. Documentation Standards

For each analysis, document:

  • Date/Time of analysis
  • Host(s) analyzed
  • Log files processed
  • Critical issues count
  • Actions taken
  • Ticket numbers if applicable

5. Retention Policy

Recommended retention:

  • Log files: 30 days
  • JSON exports: 90 days
  • CSV reports: 1 year
  • HTML reports: 30 days (regenerate as needed)

Integration Examples

Integration 1: ServiceNow Ticket Creation

#!/usr/bin/env python3
import json
import subprocess

import requests  # third-party: pip install requests

# Run analysis
subprocess.run([
    'python3', 'esxi_log_analyzer.py',
    'vmkernel.log', '--json', 'results.json', '--quiet'
])

# Load results
with open('results.json') as f:
    data = json.load(f)

# Create ticket if critical issues
if data['summary']['critical_issues'] > 5:
    ticket_data = {
        'short_description': f"ESXi Critical Issues: {data['summary']['critical_issues']}",
        'description': json.dumps(data['summary'], indent=2),
        'urgency': '2',
        'impact': '2'
    }
    response = requests.post(
        'https://servicenow.company.com/api/now/table/incident',
        auth=('user', 'password'),
        json=ticket_data
    )
    print(f"Ticket created: {response.json()['result']['number']}")

Integration 2: Slack Notifications

#!/bin/bash
# Send Slack notification for critical issues
python3 esxi_log_analyzer.py vmkernel.log --json results.json --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('results.json'))['summary']['critical_issues'])")

if [ "$CRITICAL" -gt 5 ]; then
    curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
        -H 'Content-Type: application/json' \
        -d "{
            \"text\": \"ESXi Alert: $CRITICAL critical issues detected!\",
            \"attachments\": [{
                \"color\": \"danger\",
                \"fields\": [{
                    \"title\": \"SCSI Errors\",
                    \"value\": \"$(python3 -c "import json; print(json.load(open('results.json'))['summary']['total_scsi_errors'])")\"
                }]
            }]
        }"
fi

Integration 3: Grafana Dashboard

# Export metrics for Grafana
python3 esxi_log_analyzer.py vmkernel.log --json results.json --quiet

# Convert to Prometheus textfile format
python3 << 'EOF' > /var/lib/node_exporter/esxi_metrics.prom
import json
data = json.load(open('results.json'))
metrics = [
    f"esxi_scsi_errors {data['summary']['total_scsi_errors']}",
    f"esxi_path_failures {data['summary']['path_failures']}",
    f"esxi_timeouts {data['summary']['storage_timeouts']}",
    f"esxi_critical_issues {data['summary']['critical_issues']}"
]
for metric in metrics:
    print(metric)
EOF

Integration 4: Email Reports

#!/bin/bash
# Generate and email daily report
DATE=$(date +%Y-%m-%d)
REPORT="esxi-report-${DATE}.html"

python3 esxi_log_analyzer.py vmkernel.log hostd.log --html $REPORT

# Email with attachment
echo "Daily ESXi log analysis attached." | \
    mail -s "ESXi Report - $DATE" \
        -a $REPORT \
        esxi-team@company.com

Command Reference

All Command Options

python3 esxi_log_analyzer.py [OPTIONS] logfile1 [logfile2 ...]

Required Arguments:
  logfiles                       One or more ESXi log files to analyze

Optional Arguments:
  --type {vmkernel,hostd,vobd}   Force log type (auto-detect if not specified)
  --json FILE                    Export complete results to JSON file
  --csv FILE                     Export critical issues to CSV file
  --html FILE                    Generate HTML dashboard report
  --limit N                      Number of critical issues to display (default: 10)
  --quiet                        Suppress console output (useful for scripts)
  --help                         Show help message and exit

Common Command Patterns

# Basic analysis
python3 esxi_log_analyzer.py vmkernel.log
# Multi-log analysis
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log
# Export all formats
python3 esxi_log_analyzer.py vmkernel.log --json data.json --csv issues.csv --html report.html
# Quiet mode for automation
python3 esxi_log_analyzer.py vmkernel.log --quiet --csv output.csv
# Show top 50 issues
python3 esxi_log_analyzer.py vmkernel.log --limit 50
# Force log type
python3 esxi_log_analyzer.py unknown.log --type vmkernel

Appendix

Sample Output Files

Sample JSON Output Structure

{
  "vmkernel": {
    "SCSI_ERROR": [...],
    "PATH_DOWN": [...],
    "STORAGE_TIMEOUT": [...]
  },
  "hostd": {...},
  "vobd": {...},
  "summary": {
    "total_scsi_errors": 45,
    "path_failures": 12,
    "storage_timeouts": 8,
    "reservation_conflicts": 3,
    "hba_errors": 2,
    "hostd_errors": 23,
    "hostd_warnings": 156,
    "vim_faults": 7,
    "task_failures": 5,
    "datastore_errors": 2,
    "vobd_alerts": 4,
    "sensor_warnings": 2,
    "hardware_errors": 1,
    "critical_issues": 34
  },
  "critical_issues": [...]
}

SCSI Sense Code Quick Reference

Hex Code  Decimal  Description          Severity
--------  -------  -------------------  --------
0x0       0        No Sense (OK)        Info
0x2       2        Not Ready            Warning
0x3       3        Medium Error         Error
0x4       4        Hardware Error       Critical
0x5       5        Illegal Request      Error
0x6       6        Unit Attention       Warning
0x7       7        Data Protect         Error
0xb       11       Aborted Command      Error
0xe       14       Overlapped Commands  Warning
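The hex and decimal columns relate by base-16 conversion, which Python's int() performs directly:

```python
# The table's hex and decimal columns are base-16 conversions of each
# other; int(code, 16) accepts the '0x' prefix directly.
pairs = {'0x0': 0, '0x2': 2, '0x4': 4, '0xb': 11, '0xe': 14}
converted = {code: int(code, 16) for code in pairs}
```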

Useful ESXi Commands

# Rescan storage adapters
esxcli storage core adapter rescan --all
# List all storage paths
esxcli storage core path list
# Check path status
esxcli storage core path list | grep -E "dead|disabled"
# List SCSI LUNs
esxcli storage core device list
# Check HBA status
esxcli storage core adapter list
# View active SCSI reservations
esxcli storage core device list | grep -i reservation

Log File Locations on ESXi

Log File      Location                   Purpose
------------  -------------------------  -------------------------
vmkernel.log  /var/run/log/vmkernel.log  Kernel, storage, SCSI
hostd.log     /var/run/log/hostd.log     Management, VM operations
vobd.log      /var/run/log/vobd.log      Hardware, sensors
vpxa.log      /var/run/log/vpxa.log      vCenter agent
fdm.log       /var/run/log/fdm.log       HA operations

Support and Resources

VMware Documentation:

  • VMware vSphere Troubleshooting Guide
  • ESXi Log File Reference
  • SCSI Sense Code Documentation

Internal Resources:

Script Repository:


Quick Start Checklist

[ ] Python 3.6+ installed and verified
[ ] Script downloaded and made executable
[ ] ESXi log files collected
[ ] Test run completed: python3 esxi_log_analyzer.py vmkernel.log
[ ] Output format selected (console/JSON/CSV/HTML)
[ ] Results reviewed and understood
[ ] Integration configured (if needed)
[ ] Documentation read and bookmarked

Document End

For questions or issues with the ESXi Log Analyzer, contact:

esxi_log_analyzer.py
#!/usr/bin/env python3
"""
ESXi Universal Log Analyzer Suite
Parses vmkernel.log, hostd.log, vobd.log, and other ESXi logs
Extracts errors, warnings, SCSI issues, and generates comprehensive reports
"""
import re
import sys
import json
import argparse
from datetime import datetime
from collections import defaultdict, Counter
from typing import Dict, List, Tuple, Any
class ESXiLogAnalyzer:
"""Universal ESXi log parser for vmkernel, hostd, and vobd logs"""
# SCSI Sense Code Mappings
SCSI_SENSE = {
'0x0': 'No Sense (OK)',
'0x2': 'Not Ready',
'0x3': 'Medium Error',
'0x4': 'Hardware Error',
'0x5': 'Illegal Request',
'0x6': 'Unit Attention',
'0x7': 'Data Protect',
'0xb': 'Aborted Command',
'0xe': 'Overlapped Commands Attempted'
}
ASC_QUAL = {
'0x2800': 'LUN Not Ready, Format in Progress',
'0x3f01': 'Removed Target',
'0x3f07': 'Multiple LUN Reported',
'0x4700': 'Reservation Conflict',
'0x4c00': 'Snapshot Failed',
'0x5506': 'Illegal Message',
'0x0800': 'Logical Unit Communication Failure'
}
# Log type patterns
LOG_PATTERNS = {
'vmkernel': {
'scsi_error': r'VMW_SCSIERR_([0-9a-fA-Fx]+)',
'path_state': r'path\s+(dead|working|standby|active|disabled)',
'lun_pattern': r'naa\.([0-9a-fA-F:]+)',
'storage_timeout': r'(timeout|LUN.*timeout|NMP.*timeout)',
'reservation': r'(reservation|RESERVATION|scsi_status.*0x18)',
'path_down': r'path.*dead|path.*down|path.*failed',
'hba_error': r'vmhba\d+.*error|vmhba\d+.*fail',
'cpu_pattern': r'cpu(\d+):',
'timestamp': r'\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\]'
},
'hostd': {
'error': r'\[Originator@\d+\s+sub=(\w+).*?\]\s+(.+)',
'warning': r'warning|WARN|Warning',
'vim_fault': r'vim\.fault\.(\w+)',
'task_error': r'Task.*failed|Task.*error',
'datastore_error': r'Datastore.*not found|Datastore.*error',
'vm_operation': r'(VirtualMachine|VM).*\[(.*?)\]',
'timestamp': r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)'
},
'vobd': {
'alert': r'\[(\w+)\]\s+\[(\w+)\].*?Alert:?\s+(.+)',
'sensor': r'sensor.*?(warning|critical|alarm)',
'hardware': r'hardware.*?(error|fail|fault)',
'timestamp': r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)'
}
}
def __init__(self):
self.results = {
'vmkernel': defaultdict(list),
'hostd': defaultdict(list),
'vobd': defaultdict(list),
'summary': {},
'critical_issues': []
}
self.stats = Counter()
def detect_log_type(self, line: str) -> str:
"""Auto-detect log type from line content"""
if 'vmkernel:' in line or 'cpu' in line.lower() or 'VMW_SCSI' in line:
return 'vmkernel'
elif 'Hostd:' in line or 'vim.' in line or '[Originator@' in line:
return 'hostd'
elif 'vobd[' in line or 'Alert:' in line or 'sensor' in line.lower():
return 'vobd'
return 'unknown'
def parse_timestamp(self, line: str, log_type: str) -> str:
"""Extract timestamp from log line"""
pattern = self.LOG_PATTERNS[log_type].get('timestamp')
if pattern:
match = re.search(pattern, line)
if match:
return match.group(1)
return datetime.now().isoformat()
def parse_vmkernel_line(self, line: str) -> Dict[str, Any]:
"""Parse vmkernel.log line for SCSI errors, paths, and storage issues"""
result = {'type': 'vmkernel', 'raw': line.strip(), 'issues': []}
patterns = self.LOG_PATTERNS['vmkernel']
# Extract timestamp
result['timestamp'] = self.parse_timestamp(line, 'vmkernel')
# SCSI Error Detection
scsi_match = re.search(patterns['scsi_error'], line)
if scsi_match:
sense_code = scsi_match.group(1)
if not sense_code.startswith('0x'):
sense_code = f"0x{sense_code}"
result['scsi_sense'] = sense_code
result['scsi_description'] = self.SCSI_SENSE.get(
sense_code.lower(),
f'Unknown ({sense_code})'
)
result['issues'].append('SCSI_ERROR')
self.stats['scsi_errors'] += 1
# LUN Identification
lun_match = re.search(patterns['lun_pattern'], line)
if lun_match:
result['lun'] = f"naa.{lun_match.group(1)}"
# Check if Tintri LUN
if '60a980' in lun_match.group(1) or 'tintri' in line.lower():
result['vendor'] = 'Tintri'
# Path State
path_match = re.search(patterns['path_state'], line, re.IGNORECASE)
if path_match:
result['path_state'] = path_match.group(1).lower()
if result['path_state'] in ['dead', 'failed', 'disabled']:
result['issues'].append('PATH_DOWN')
self.stats['path_failures'] += 1
# Storage Timeout
if re.search(patterns['storage_timeout'], line, re.IGNORECASE):
result['issues'].append('STORAGE_TIMEOUT')
self.stats['timeouts'] += 1
# Reservation Conflict
if re.search(patterns['reservation'], line, re.IGNORECASE):
result['issues'].append('RESERVATION_CONFLICT')
self.stats['reservations'] += 1
# HBA Error
hba_match = re.search(patterns['hba_error'], line, re.IGNORECASE)
if hba_match:
result['issues'].append('HBA_ERROR')
result['hba'] = re.search(r'vmhba\d+', line).group(0)
self.stats['hba_errors'] += 1
# CPU Context
cpu_match = re.search(patterns['cpu_pattern'], line)
if cpu_match:
result['cpu'] = int(cpu_match.group(1))
return result if result['issues'] else None
def parse_hostd_line(self, line: str) -> Dict[str, Any]:
"""Parse hostd.log line for VM operations, tasks, and VIM faults"""
result = {'type': 'hostd', 'raw': line.strip(), 'issues': []}
patterns = self.LOG_PATTERNS['hostd']
# Extract timestamp
result['timestamp'] = self.parse_timestamp(line, 'hostd')
# Error/Warning Level
if re.search(patterns['warning'], line, re.IGNORECASE):
result['severity'] = 'WARNING'
self.stats['hostd_warnings'] += 1
if 'error' in line.lower():
result['severity'] = 'ERROR'
result['issues'].append('HOSTD_ERROR')
self.stats['hostd_errors'] += 1
# VIM Fault Detection
vim_fault = re.search(patterns['vim_fault'], line)
if vim_fault:
result['vim_fault'] = vim_fault.group(1)
result['issues'].append('VIM_FAULT')
self.stats['vim_faults'] += 1
# Originator/Subsystem
originator_match = re.search(patterns['error'], line)
if originator_match:
result['subsystem'] = originator_match.group(1)
result['message'] = originator_match.group(2)
# Task Failure
if re.search(patterns['task_error'], line, re.IGNORECASE):
result['issues'].append('TASK_FAILURE')
self.stats['task_failures'] += 1
# Datastore Issues
if re.search(patterns['datastore_error'], line, re.IGNORECASE):
result['issues'].append('DATASTORE_ERROR')
self.stats['datastore_errors'] += 1
# VM Operation
vm_match = re.search(patterns['vm_operation'], line)
if vm_match:
result['vm_name'] = vm_match.group(2) if vm_match.group(2) else 'Unknown'
return result if result['issues'] else None
def parse_vobd_line(self, line: str) -> Dict[str, Any]:
"""Parse vobd.log line for hardware alerts and sensor warnings"""
result = {'type': 'vobd', 'raw': line.strip(), 'issues': []}
patterns = self.LOG_PATTERNS['vobd']
# Extract timestamp
result['timestamp'] = self.parse_timestamp(line, 'vobd')
# Alert Detection
alert_match = re.search(patterns['alert'], line)
if alert_match:
result['alert_type'] = alert_match.group(1)
result['severity'] = alert_match.group(2)
result['message'] = alert_match.group(3)
result['issues'].append('HARDWARE_ALERT')
self.stats['vobd_alerts'] += 1
# Sensor Issues
if re.search(patterns['sensor'], line, re.IGNORECASE):
result['issues'].append('SENSOR_WARNING')
self.stats['sensor_warnings'] += 1
# Hardware Errors
if re.search(patterns['hardware'], line, re.IGNORECASE):
result['issues'].append('HARDWARE_ERROR')
self.stats['hardware_errors'] += 1
return result if result['issues'] else None
def parse_file(self, filepath: str, log_type: str = None) -> None:
"""Parse entire log file"""
print(f"[INFO] Parsing log file: {filepath}")
line_count = 0
parsed_count = 0
try:
with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
for line in f:
line_count += 1
# Auto-detect log type if not specified
if not log_type:
log_type = self.detect_log_type(line)
# Parse based on log type
if log_type == 'vmkernel':
result = self.parse_vmkernel_line(line)
elif log_type == 'hostd':
result = self.parse_hostd_line(line)
elif log_type == 'vobd':
result = self.parse_vobd_line(line)
else:
continue
# Store results
if result:
parsed_count += 1
for issue in result['issues']:
self.results[log_type][issue].append(result)
# Track critical issues
if self._is_critical(result):
self.results['critical_issues'].append(result)
print(f"[SUCCESS] Parsed {line_count} lines, found {parsed_count} issues")
except FileNotFoundError:
print(f"[ERROR] File not found: {filepath}")
except Exception as e:
print(f"[ERROR] Failed to parse file: {e}")
def _is_critical(self, result: Dict) -> bool:
"""Determine if issue is critical"""
critical_issues = [
'SCSI_ERROR', 'PATH_DOWN', 'STORAGE_TIMEOUT',
'RESERVATION_CONFLICT', 'HARDWARE_ERROR', 'DATASTORE_ERROR'
]
return any(issue in result['issues'] for issue in critical_issues)
def generate_summary(self) -> Dict:
"""Generate summary statistics"""
summary = {
'total_scsi_errors': self.stats['scsi_errors'],
'path_failures': self.stats['path_failures'],
'storage_timeouts': self.stats['timeouts'],
'reservation_conflicts': self.stats['reservations'],
'hba_errors': self.stats['hba_errors'],
'hostd_errors': self.stats['hostd_errors'],
'hostd_warnings': self.stats['hostd_warnings'],
'vim_faults': self.stats['vim_faults'],
'task_failures': self.stats['task_failures'],
'datastore_errors': self.stats['datastore_errors'],
'vobd_alerts': self.stats['vobd_alerts'],
'sensor_warnings': self.stats['sensor_warnings'],
'hardware_errors': self.stats['hardware_errors'],
'critical_issues': len(self.results['critical_issues'])
}
self.results['summary'] = summary
return summary
def print_summary(self) -> None:
"""Print analysis summary to console"""
print("\n" + "="*80)
print("ESXi LOG ANALYSIS SUMMARY")
print("="*80)
summary = self.results['summary']
print("\n[VMKERNEL ISSUES]")
print(f" SCSI Errors: {summary['total_scsi_errors']}")
print(f" Path Failures: {summary['path_failures']}")
print(f" Storage Timeouts: {summary['storage_timeouts']}")
print(f" Reservation Conflicts: {summary['reservation_conflicts']}")
print(f" HBA Errors: {summary['hba_errors']}")
print("\n[HOSTD ISSUES]")
print(f" Errors: {summary['hostd_errors']}")
print(f" Warnings: {summary['hostd_warnings']}")
print(f" VIM Faults: {summary['vim_faults']}")
print(f" Task Failures: {summary['task_failures']}")
print(f" Datastore Errors: {summary['datastore_errors']}")
print("\n[VOBD ISSUES]")
print(f" Hardware Alerts: {summary['vobd_alerts']}")
print(f" Sensor Warnings: {summary['sensor_warnings']}")
print(f" Hardware Errors: {summary['hardware_errors']}")
print("\n[CRITICAL]")
print(f" Total Critical Issues: {summary['critical_issues']}")
print("="*80)
def print_critical_issues(self, limit: int = 10) -> None:
"""Print top critical issues"""
print(f"\n[TOP {limit} CRITICAL ISSUES]")
print("-"*80)
for i, issue in enumerate(self.results['critical_issues'][:limit], 1):
print(f"\n{i}. [{issue['type'].upper()}] {issue['timestamp']}")
print(f" Issues: {', '.join(issue['issues'])}")
if 'lun' in issue:
print(f" LUN: {issue['lun']}")
if 'scsi_description' in issue:
print(f" SCSI: {issue['scsi_description']}")
if 'path_state' in issue:
print(f" Path State: {issue['path_state']}")
if 'vim_fault' in issue:
print(f" VIM Fault: {issue['vim_fault']}")
            if 'message' in issue:
                msg = issue['message']
                # Only append an ellipsis when the message was actually truncated
                print(f" Message: {msg[:100]}{'...' if len(msg) > 100 else ''}")
def export_json(self, output_file: str) -> None:
"""Export results to JSON file"""
print(f"\n[INFO] Exporting results to: {output_file}")
        try:
            with open(output_file, 'w') as f:
                json.dump(self.results, f, indent=2, default=str)
            print("[SUCCESS] Results exported successfully")
        except Exception as e:
            print(f"[ERROR] Failed to export JSON: {e}")
    def export_csv(self, output_file: str) -> None:
        """Export critical issues to CSV"""
        import csv  # stdlib; quotes embedded commas/quotes/newlines correctly
        print(f"\n[INFO] Exporting CSV to: {output_file}")
        try:
            with open(output_file, 'w', newline='') as f:
                writer = csv.writer(f)
                # CSV Header
                writer.writerow(['Timestamp', 'Type', 'Issues', 'LUN',
                                 'SCSI_Error', 'Path_State', 'Message'])
                # Write critical issues; csv.writer escapes any commas in the
                # fields, so no manual comma replacement is needed
                for issue in self.results['critical_issues']:
                    writer.writerow([
                        issue.get('timestamp', ''),
                        issue.get('type', ''),
                        '|'.join(issue.get('issues', [])),
                        issue.get('lun', ''),
                        issue.get('scsi_description', ''),
                        issue.get('path_state', ''),
                        issue.get('message', issue.get('raw', ''))[:200],
                    ])
            print("[SUCCESS] CSV exported successfully")
        except Exception as e:
            print(f"[ERROR] Failed to export CSV: {e}")
def export_html_report(self, output_file: str) -> None:
"""Generate HTML dashboard report"""
print(f"\n[INFO] Generating HTML report: {output_file}")
summary = self.results['summary']
html = f"""
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>ESXi Log Analysis Report</title>
<style>
body {{
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
margin: 20px;
background: #f5f5f5;
}}
.header {{
background: #dc2626;
color: white;
padding: 20px;
border-radius: 8px;
margin-bottom: 20px;
}}
.summary {{
display: grid;
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
gap: 15px;
margin-bottom: 30px;
}}
.card {{
background: white;
padding: 20px;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
.card h3 {{
margin-top: 0;
color: #333;
font-size: 14px;
text-transform: uppercase;
}}
.card .value {{
font-size: 32px;
font-weight: bold;
color: #dc2626;
}}
.critical {{
background: #fee;
border-left: 4px solid #dc2626;
}}
table {{
width: 100%;
background: white;
border-collapse: collapse;
border-radius: 8px;
overflow: hidden;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
th {{
background: #333;
color: white;
padding: 12px;
text-align: left;
}}
td {{
padding: 10px 12px;
border-bottom: 1px solid #eee;
}}
tr:hover {{
background: #f9f9f9;
}}
.issue-badge {{
display: inline-block;
padding: 4px 8px;
background: #dc2626;
color: white;
border-radius: 4px;
font-size: 11px;
margin: 2px;
}}
</style>
</head>
<body>
<div class="header">
<h1>ESXi Log Analysis Report</h1>
<p>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
</div>
<div class="summary">
<div class="card critical">
<h3>Critical Issues</h3>
<div class="value">{summary['critical_issues']}</div>
</div>
<div class="card">
<h3>SCSI Errors</h3>
<div class="value">{summary['total_scsi_errors']}</div>
</div>
<div class="card">
<h3>Path Failures</h3>
<div class="value">{summary['path_failures']}</div>
</div>
<div class="card">
<h3>Storage Timeouts</h3>
<div class="value">{summary['storage_timeouts']}</div>
</div>
<div class="card">
<h3>Hostd Errors</h3>
<div class="value">{summary['hostd_errors']}</div>
</div>
<div class="card">
<h3>Hardware Alerts</h3>
<div class="value">{summary['vobd_alerts']}</div>
</div>
</div>
<div class="card">
<h2>Critical Issues</h2>
<table>
<thead>
<tr>
<th>Timestamp</th>
<th>Type</th>
<th>Issues</th>
<th>Details</th>
</tr>
</thead>
<tbody>
"""
        # Add critical issues to table; escape values so raw log text
        # cannot break the report markup
        from html import escape
        for issue in self.results['critical_issues'][:50]:
            timestamp = escape(str(issue.get('timestamp', 'N/A')))
            log_type = escape(str(issue.get('type', 'N/A'))).upper()
            issues_html = ''.join(
                f'<span class="issue-badge">{escape(str(i))}</span>'
                for i in issue.get('issues', [])
            )
            details = []
            if 'lun' in issue:
                details.append(f"LUN: {escape(str(issue['lun']))}")
            if 'scsi_description' in issue:
                details.append(f"SCSI: {escape(str(issue['scsi_description']))}")
            if 'path_state' in issue:
                details.append(f"Path: {escape(str(issue['path_state']))}")
            if 'vim_fault' in issue:
                details.append(f"VIM: {escape(str(issue['vim_fault']))}")
            details_str = '<br>'.join(details) if details else 'N/A'
html += f"""
<tr>
<td>{timestamp}</td>
<td>{log_type}</td>
<td>{issues_html}</td>
<td>{details_str}</td>
</tr>
"""
html += """
</tbody>
</table>
</div>
</body>
</html>
"""
        try:
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write(html)
            print("[SUCCESS] HTML report generated successfully")
        except Exception as e:
            print(f"[ERROR] Failed to generate HTML report: {e}")
def main():
parser = argparse.ArgumentParser(
description='ESXi Universal Log Analyzer - Parse vmkernel, hostd, and vobd logs',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
Parse vmkernel log:
python3 esxi_log_analyzer.py vmkernel.log
Parse multiple logs:
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log
Export to JSON and CSV:
python3 esxi_log_analyzer.py vmkernel.log --json results.json --csv results.csv
Generate HTML report:
python3 esxi_log_analyzer.py vmkernel.log --html report.html
Parse with specific log type:
python3 esxi_log_analyzer.py esxi.log --type vmkernel
"""
)
parser.add_argument('logfiles', nargs='+', help='ESXi log files to analyze')
parser.add_argument('--type', choices=['vmkernel', 'hostd', 'vobd'],
help='Force log type (auto-detect if not specified)')
parser.add_argument('--json', help='Export results to JSON file')
parser.add_argument('--csv', help='Export critical issues to CSV file')
parser.add_argument('--html', help='Generate HTML report')
parser.add_argument('--limit', type=int, default=10,
help='Number of critical issues to display (default: 10)')
parser.add_argument('--quiet', action='store_true',
help='Suppress console output')
args = parser.parse_args()
# Create analyzer instance
analyzer = ESXiLogAnalyzer()
# Parse all log files
for logfile in args.logfiles:
analyzer.parse_file(logfile, log_type=args.type)
# Generate summary
analyzer.generate_summary()
# Print results
if not args.quiet:
analyzer.print_summary()
analyzer.print_critical_issues(limit=args.limit)
# Export results
if args.json:
analyzer.export_json(args.json)
if args.csv:
analyzer.export_csv(args.csv)
if args.html:
analyzer.export_html_report(args.html)
# Print completion message
if not args.quiet:
print("\n[INFO] Analysis complete!")
        if args.json or args.csv or args.html:
            print("[INFO] Requested export files written (see messages above for any errors)")
if __name__ == "__main__":
main()
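One detail worth noting for downstream tooling: the CSV produced by `--csv` packs multiple issue tags into the single `Issues` column, joined with `|`. A minimal consumer-side sketch using only the standard library (the row below is made-up sample data in the column layout that `export_csv` writes):

```python
import csv
import io

# Made-up sample row in the header layout emitted by export_csv
sample = (
    "Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message\n"
    "2025-01-15T10:00:00Z,vmkernel,SCSI_ERROR|PATH_DOWN,10,Medium Error,dead,I/O failed\n"
)

for row in csv.DictReader(io.StringIO(sample)):
    tags = row["Issues"].split("|")  # one column holds several issue tags
    print(row["Timestamp"], row["Type"], tags)
```

Reading from a real export is the same loop with `open("results.csv", newline="")` in place of the in-memory sample.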
