ESXi Log Analyzer

The ESXi Universal Log Analyzer is a professional Python-based tool that automatically parses and analyzes ESXi log files to identify storage issues, hardware problems, and system errors. It processes three critical ESXi log types:

  • vmkernel.log – Storage, SCSI, path failures, and kernel-level issues
  • hostd.log – VM operations, VIM faults, task failures, and management errors
  • vobd.log – Hardware alerts, sensor warnings, and physical component issues

Key Capabilities

Feature             Description
------------------  ----------------------------------------------------
Multi-Log Support   Parse vmkernel, hostd, and vobd logs simultaneously
Auto-Detection      Automatically identifies log type from content
SCSI Decoding       Translates hex sense codes to human-readable errors
Path Analysis       Tracks storage path states (dead, working, standby)
LUN Identification  Extracts LUN IDs and detects Tintri storage
Multiple Outputs    Console, JSON, CSV, and HTML reports
Zero Dependencies   Uses only the Python standard library

What This Tool Does

In Simple Terms

This tool reads ESXi log files and finds problems automatically. Instead of manually searching through thousands of log lines, it:

  1. Scans log files for errors, warnings, and failures
  2. Categorizes issues by type (SCSI errors, path failures, VM problems, etc.)
  3. Prioritizes critical issues that need immediate attention
  4. Exports results in multiple formats for analysis and reporting

Problem It Solves

Without This Tool:

  • Manually grep through thousands of log lines
  • Look up SCSI sense codes in documentation
  • Track down multiple types of errors across different logs
  • Compile reports manually for tickets

With This Tool:

  • One command analyzes all logs
  • Automatically decodes technical error codes
  • Generates comprehensive reports in seconds
  • Provides actionable information immediately

Common Use Cases

  1. Troubleshooting Storage Issues
    • “Why is my datastore showing latency?”
    • “Why are VMs experiencing storage errors?”
  2. Post-Incident Analysis
    • “What happened during the outage last night?”
    • “Which LUNs failed and when?”
  3. Proactive Monitoring
    • Daily/weekly health checks
    • Trend analysis over time
  4. Support Ticket Creation
    • Generate detailed error reports for VMware support
    • Export evidence for vendor escalations

Prerequisites

System Requirements

Requirement       Specification
----------------  --------------------------------------
Python Version    Python 3.6 or higher
Operating System  Linux, macOS, or Windows
Memory            512 MB minimum (for large log files)
Disk Space        100 MB for script and output files

Required Files

You need one or more ESXi log files:

  • vmkernel.log
  • hostd.log
  • vobd.log

Where to Find ESXi Logs:

Method 1: SSH to ESXi Host

# SSH to ESXi
ssh root@esxi-host.example.com
# Logs are located in:
/var/run/log/vmkernel.log
/var/run/log/hostd.log
/var/run/log/vobd.log
# Copy to your workstation
scp root@esxi-host:/var/run/log/*.log /local/path/

Method 2: Download via Web UI

  1. Login to ESXi web UI (https://esxi-host)
  2. Navigate to Host → Monitor → Logs
  3. Select log file
  4. Click Download

Method 3: From Support Bundle

  1. Generate support bundle via web UI or command: vm-support
  2. Extract bundle
  3. Logs are in esx-<hostname>-<date>/var/run/log/
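If you prefer to script the extraction, the standard library's tarfile module can pull just the three logs out of a bundle. A hedged sketch (the internal layout follows the path shown above; extract_logs is an illustrative helper, not part of the analyzer):

```python
# Sketch: extract only the vmkernel/hostd/vobd logs from a support
# bundle tarball. The internal layout (esx-<hostname>-<date>/var/run/log/)
# is assumed from the steps above; extract_logs is illustrative.
import tarfile

def extract_logs(bundle_path: str, dest: str = '.') -> list:
    """Extract the three ESXi logs from a support bundle tarball."""
    wanted = ('vmkernel.log', 'hostd.log', 'vobd.log')
    extracted = []
    with tarfile.open(bundle_path) as tar:
        for member in tar.getmembers():
            if member.name.endswith(wanted):  # str.endswith accepts a tuple
                tar.extract(member, path=dest)
                extracted.append(member.name)
    return extracted
```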

Verify Python Installation

# Check Python version
python3 --version
# Expected output: Python 3.6.x or higher

Installation

Step 1: Download the Script

Save the Python script as esxi_log_analyzer.py:

# Create directory for the script
mkdir -p ~/esxi-tools
cd ~/esxi-tools
# Copy the script content to this file
nano esxi_log_analyzer.py
# (Paste the script content from the artifact)

Step 2: Make Executable

chmod +x esxi_log_analyzer.py

Step 3: Verify Installation

python3 esxi_log_analyzer.py --help

Expected Output:

usage: esxi_log_analyzer.py [-h] [--type {vmkernel,hostd,vobd}] [--json JSON]
                            [--csv CSV] [--html HTML] [--limit LIMIT] [--quiet]
                            logfiles [logfiles ...]

ESXi Universal Log Analyzer - Parse vmkernel, hostd, and vobd logs
...

Quick Start Guide

Example 1: Analyze Single Log File

python3 esxi_log_analyzer.py vmkernel.log

What Happens:

  1. Script reads vmkernel.log
  2. Automatically detects it’s a vmkernel log
  3. Parses for SCSI errors, path failures, timeouts
  4. Displays summary and top 10 critical issues

Output:

[INFO] Parsing log file: vmkernel.log
[SUCCESS] Parsed 15234 lines, found 127 issues
================================================================================
ESXi LOG ANALYSIS SUMMARY
================================================================================
[VMKERNEL ISSUES]
SCSI Errors: 45
Path Failures: 12
Storage Timeouts: 8
Reservation Conflicts: 3
HBA Errors: 2
[HOSTD ISSUES]
Errors: 0
Warnings: 0
...
[CRITICAL]
Total Critical Issues: 34
================================================================================
[TOP 10 CRITICAL ISSUES]
--------------------------------------------------------------------------------
1. [VMKERNEL] 2025-12-28T18:46:00.123Z
Issues: SCSI_ERROR, PATH_DOWN
LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
SCSI: Aborted Command
Path State: dead
2. [VMKERNEL] 2025-12-28T18:47:15.456Z
Issues: STORAGE_TIMEOUT
LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
...

Example 2: Analyze Multiple Logs

python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log

What Happens:

  • Parses all three log types
  • Combines results
  • Shows comprehensive summary across all logs

Example 3: Export to CSV for Excel

python3 esxi_log_analyzer.py vmkernel.log --csv report.csv

What Happens:

  • Analyzes log
  • Exports critical issues to report.csv
  • Open in Excel for further analysis

CSV Format:

Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message
2025-12-28T18:46:00.123Z,vmkernel,SCSI_ERROR|PATH_DOWN,naa.60a980...,Aborted Command,dead,...
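The exported CSV can be read back with the standard csv module; the multi-valued Issues column is pipe-separated and splits cleanly. A small sketch against the format shown above (the sample row is illustrative):

```python
# Sketch: parse the analyzer's CSV export. The Issues column is
# pipe-separated, so it splits into a list per row.
import csv
import io

SAMPLE = (
    "Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message\n"
    "2025-12-28T18:46:00.123Z,vmkernel,SCSI_ERROR|PATH_DOWN,"
    "naa.60a980,Aborted Command,dead,example\n"
)

def load_issues(text: str) -> list:
    """Return CSV rows as dicts with Issues split into a list."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row['Issues'] = row['Issues'].split('|')
    return rows

rows = load_issues(SAMPLE)
```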

Example 4: Generate HTML Report

python3 esxi_log_analyzer.py vmkernel.log --html report.html

What Happens:

  • Creates a styled HTML dashboard
  • Open report.html in a browser
  • View the color-coded summary and issue table

Usage Examples

Basic Analysis

Analyze vmkernel.log Only

python3 esxi_log_analyzer.py vmkernel.log

Analyze All Three Log Types

python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log

Force Specific Log Type (if auto-detection fails)

python3 esxi_log_analyzer.py esxi.log --type vmkernel

Export Options

Export to JSON (Complete Data)

python3 esxi_log_analyzer.py vmkernel.log --json results.json

Use Case: API integration, further processing, archival

Export to CSV (Critical Issues Only)

python3 esxi_log_analyzer.py vmkernel.log --csv critical-issues.csv

Use Case: Excel analysis, ticket attachments

Generate HTML Dashboard

python3 esxi_log_analyzer.py vmkernel.log --html dashboard.html

Use Case: Management reports, stakeholder visibility

Export All Formats

python3 esxi_log_analyzer.py vmkernel.log \
    --json full-data.json \
    --csv issues.csv \
    --html report.html

Advanced Options

Show More Critical Issues

python3 esxi_log_analyzer.py vmkernel.log --limit 50

Default: 10 issues. Increase to see more.

Quiet Mode (No Console Output)

python3 esxi_log_analyzer.py vmkernel.log --quiet --json results.json

Use Case: Automated scripts, cron jobs

Analyze Logs from Different Hosts

python3 esxi_log_analyzer.py \
    esxi1-vmkernel.log \
    esxi2-vmkernel.log \
    esxi3-vmkernel.log \
    --csv combined-report.csv

Understanding the Output

Console Summary Explained

[VMKERNEL ISSUES]
SCSI Errors: 45 ← SCSI sense code errors detected
Path Failures: 12 ← Storage paths marked as "dead" or "failed"
Storage Timeouts: 8 ← I/O operations that timed out
Reservation Conflicts: 3 ← SCSI reservation issues (often during vMotion)
HBA Errors: 2 ← Host Bus Adapter hardware/driver errors
[HOSTD ISSUES]
Errors: 23 ← hostd daemon errors
Warnings: 156 ← hostd warnings (less severe)
VIM Faults: 7 ← vSphere API (VIM) errors
Task Failures: 5 ← Failed ESXi tasks (VM power ops, etc.)
Datastore Errors: 2 ← Datastore not found or inaccessible
[VOBD ISSUES]
Hardware Alerts: 4 ← Physical hardware alerts
Sensor Warnings: 2 ← Temperature, voltage, fan sensor warnings
Hardware Errors: 1 ← Hardware component failures
[CRITICAL]
Total Critical Issues: 34 ← High-priority issues requiring attention

Critical Issue Details

1. [VMKERNEL] 2025-12-28T18:46:00.123Z
Issues: SCSI_ERROR, PATH_DOWN
LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
SCSI: Aborted Command
Path State: dead

What This Means:

  • Timestamp: When the error occurred
  • Issues: Types of problems (SCSI error + path failure)
  • LUN: Storage device identifier (NAA format)
  • SCSI: Human-readable SCSI error (decoded from hex)
  • Path State: Storage path is dead (not operational)

Action Required:

  • Check storage connectivity (cables, switches)
  • Verify SAN configuration
  • Rescan storage adapters
  • Contact storage vendor if persistent

SCSI Error Codes Reference

Code                         Description                            Common Cause                     Fix
---------------------------  -------------------------------------  -------------------------------  --------------------------
0xb (Aborted Command)        I/O operation aborted                  SAN overload, queue full         Increase HBA queue depth
0x2 (Not Ready)              Device not ready                       LUN offline, format in progress  Wait or check storage
0x6 (Unit Attention)         Device reset occurred                  LUN reset, rescan                Rescan storage adapters
0x47 (Reservation Conflict)  SCSI reservation held by another host  vMotion, VAAI conflict           Stagger vMotion operations
0x4 (Hardware Error)         Disk/controller error                  Physical failure                 Replace disk/HBA

JSON Output Structure

{
  "vmkernel": {
    "SCSI_ERROR": [
      {
        "timestamp": "2025-12-28T18:46:00.123Z",
        "lun": "naa.60a980...",
        "scsi_sense": "0xb",
        "scsi_description": "Aborted Command",
        "path_state": "dead",
        "issues": ["SCSI_ERROR", "PATH_DOWN"],
        "raw": "original log line..."
      }
    ],
    "PATH_DOWN": [...],
    "STORAGE_TIMEOUT": [...]
  },
  "hostd": {...},
  "vobd": {...},
  "summary": {
    "total_scsi_errors": 45,
    "path_failures": 12,
    ...
  },
  "critical_issues": [...]
}
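Because every critical issue carries its issues list, a few lines of Python can tally the export by issue type. A sketch against the structure above (the sample data here is illustrative, not real output):

```python
# Sketch: tally critical issues by type from a results.json whose
# layout matches the structure above. Sample data is illustrative.
from collections import Counter

def issue_counts(data: dict) -> Counter:
    """Count occurrences of each issue tag across critical issues."""
    counts = Counter()
    for issue in data.get('critical_issues', []):
        counts.update(issue.get('issues', []))
    return counts

sample = {'critical_issues': [
    {'issues': ['SCSI_ERROR', 'PATH_DOWN']},
    {'issues': ['SCSI_ERROR']},
]}
```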

CSV Output (Excel-Friendly)

Open report.csv in Excel:

Timestamp            Type      Issues                LUN          SCSI_Error       Path_State  Message
2025-12-28T18:46:00  vmkernel  SCSI_ERROR|PATH_DOWN  naa.60a980…  Aborted Command  dead

Excel Analysis:

  • Sort by Issues to group similar problems
  • Filter by LUN to focus on specific storage
  • Pivot table on SCSI_Error for error distribution

HTML Dashboard

Open report.html in browser to see:

Summary Cards:

  • Critical Issues (red card)
  • SCSI Errors
  • Path Failures
  • Storage Timeouts
  • Hostd Errors
  • Hardware Alerts

Critical Issues Table:

  • Timestamp
  • Log Type
  • Issue Tags (color-coded badges)
  • Details (LUN, SCSI error, path state)

Features:

  • Responsive design (works on mobile)
  • Hover effects for readability
  • Professional layout for management

Advanced Usage

Scenario 1: Daily Health Check

Create a script: daily-esxi-check.sh

#!/bin/bash
# Daily ESXi log health check
DATE=$(date +%Y%m%d)
REPORT_DIR="/reports/esxi/$DATE"
mkdir -p "$REPORT_DIR"

# Analyze logs from all hosts
for host in esxi1 esxi2 esxi3; do
    echo "Analyzing $host..."
    # Copy logs (destination directory must exist before scp)
    mkdir -p /tmp/${host}
    scp root@${host}:/var/run/log/*.log /tmp/${host}/
    # Run analysis
    python3 esxi_log_analyzer.py \
        /tmp/${host}/*.log \
        --json ${REPORT_DIR}/${host}-results.json \
        --csv ${REPORT_DIR}/${host}-issues.csv \
        --html ${REPORT_DIR}/${host}-report.html
done

echo "Reports generated in: $REPORT_DIR"

Run daily via cron:

0 2 * * * /scripts/daily-esxi-check.sh

Scenario 2: Alert on Critical Issues

#!/bin/bash
# Alert if critical issues exceed threshold
THRESHOLD=10
REPORT="/tmp/esxi-analysis.json"

python3 esxi_log_analyzer.py vmkernel.log --json "$REPORT" --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('$REPORT'))['summary']['critical_issues'])")

if [ "$CRITICAL" -gt "$THRESHOLD" ]; then
    echo "ALERT: $CRITICAL critical ESXi issues found!" | \
        mail -s "ESXi Critical Alert" admin@company.com
fi

Scenario 3: Multi-Host Analysis

# Analyze logs from multiple hosts
python3 esxi_log_analyzer.py \
    esxi1-vmkernel.log \
    esxi2-vmkernel.log \
    esxi3-vmkernel.log \
    esxi1-hostd.log \
    esxi2-hostd.log \
    esxi3-hostd.log \
    --html cluster-report.html \
    --csv cluster-issues.csv

Scenario 4: Tintri-Specific Analysis

# Analyze logs and filter Tintri LUNs
python3 esxi_log_analyzer.py vmkernel.log --json results.json

# Extract Tintri-specific issues
python3 -c "
import json
data = json.load(open('results.json'))
tintri_issues = [
    issue for issue in data['critical_issues']
    if 'lun' in issue and '60a980' in issue['lun']
]
print(f'Tintri LUN issues: {len(tintri_issues)}')
for issue in tintri_issues[:10]:
    print(f\" {issue['timestamp']}: {issue['lun']} - {issue.get('scsi_description', 'N/A')}\")
"

Scenario 5: Historical Trend Analysis

#!/bin/bash
# Weekly trend report
for week in {1..4}; do
    LOGFILE="vmkernel-week${week}.log"
    python3 esxi_log_analyzer.py $LOGFILE --json week${week}.json --quiet
done

# Compare weekly results
python3 -c "
import json
for week in range(1, 5):
    data = json.load(open(f'week{week}.json'))
    print(f'Week {week}: {data[\"summary\"][\"total_scsi_errors\"]} SCSI errors')
"

Troubleshooting Guide

Issue 1: "No module named 'xyz'"

Error Message:

ModuleNotFoundError: No module named 'xyz'

Solution: This script uses only the Python standard library, so this error should not occur. If it does:

# Verify Python version
python3 --version
# Should be 3.6 or higher
# If older, upgrade Python:
# Ubuntu/Debian:
sudo apt update && sudo apt install python3
# macOS:
brew install python3

Issue 2: “Permission Denied”

Error Message:

[ERROR] Failed to parse file: [Errno 13] Permission denied: 'vmkernel.log'

Solution:

# Make log files readable
chmod +r vmkernel.log
# Or run with appropriate permissions
sudo python3 esxi_log_analyzer.py vmkernel.log

Issue 3: “File not found”

Error Message:

[ERROR] File not found: vmkernel.log

Solution:

# Verify file exists
ls -l vmkernel.log
# Use full path
python3 esxi_log_analyzer.py /full/path/to/vmkernel.log
# Or navigate to log directory
cd /path/to/logs
python3 /path/to/esxi_log_analyzer.py vmkernel.log

Issue 4: Large Log Files (Memory Issues)

Error Message:

MemoryError: Unable to allocate array

Solution:

# Split large log files
split -l 100000 vmkernel.log vmkernel-part-
# Analyze each part
for part in vmkernel-part-*; do
    python3 esxi_log_analyzer.py $part --csv ${part}.csv
done
# Combine results
cat vmkernel-part-*.csv > combined-report.csv
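An alternative to splitting is to stream the file line by line, which keeps memory use flat regardless of log size. A sketch of the idea (the simple needles keyword match is a stand-in for the analyzer's real regex matching):

```python
# Sketch: generator that scans a log line by line so memory use stays
# flat for arbitrarily large files. The keyword match is a stand-in
# for the analyzer's actual regex patterns.
def scan_lines(path, needles=('error', 'dead', 'timeout')):
    """Yield (line_number, line) for lines matching any needle."""
    with open(path, 'r', encoding='utf-8', errors='ignore') as f:
        for n, line in enumerate(f, 1):
            if any(s in line.lower() for s in needles):
                yield n, line.rstrip()
```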

Issue 5: No Issues Found

Output:

[SUCCESS] Parsed 1000 lines, found 0 issues
Total Critical Issues: 0

Possible Causes:

  1. Logs are healthy – No issues present (good!)
  2. Wrong log type – Try forcing log type: python3 esxi_log_analyzer.py logfile --type vmkernel
  3. Log format changed – Verify log format matches ESXi expected format

Verification:

# Check log content manually
head -50 vmkernel.log
# Look for typical patterns
grep -i "error\|warning\|fail" vmkernel.log | head

Best Practices

1. Regular Analysis Schedule

Run analysis on a schedule:

Frequency        Use Case              Command
---------------  --------------------  ----------------------------------
Daily            Proactive monitoring  Automated script + email if issues
Weekly           Trend analysis        Generate weekly reports
After incidents  Root cause analysis   Immediate deep-dive
Before changes   Pre-change baseline   Document healthy state

2. Log File Management

# Compress old logs
gzip vmkernel.log.old
# Analyze compressed logs (if supported)
# Uncompress first:
gunzip vmkernel.log.gz
python3 esxi_log_analyzer.py vmkernel.log
# Archive analysis results
mkdir -p /archive/esxi-logs/$(date +%Y%m)
mv *.json *.csv *.html /archive/esxi-logs/$(date +%Y%m)/
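Compressed logs do not have to be decompressed to disk first: the standard library's gzip.open can stream them directly. A small sketch (a wrapper like read_log is an assumption for illustration, not a feature of the analyzer):

```python
# Sketch: stream a plain or gzip-compressed log without decompressing
# to disk first. read_log is an illustrative wrapper, not part of the
# analyzer itself.
import gzip

def read_log(path: str):
    """Yield decoded lines from a .log or .log.gz file."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', encoding='utf-8', errors='ignore') as f:
        yield from f
```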

3. Integration with Monitoring

# Analyze quietly, then read the critical-issue count from the JSON export
python3 esxi_log_analyzer.py vmkernel.log --json /tmp/esxi-metrics.json --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('/tmp/esxi-metrics.json'))['summary']['critical_issues'])")

# Send to monitoring (example: Prometheus push gateway)
echo "esxi_critical_issues $CRITICAL" | \
    curl --data-binary @- http://pushgateway:9091/metrics/job/esxi

4. Documentation Standards

For each analysis, document:

  • Date/Time of analysis
  • Host(s) analyzed
  • Log files processed
  • Critical issues count
  • Actions taken
  • Ticket numbers if applicable

5. Retention Policy

Recommended retention:

  • Log files: 30 days
  • JSON exports: 90 days
  • CSV reports: 1 year
  • HTML reports: 30 days (regenerate as needed)

Integration Examples

Integration 1: ServiceNow Ticket Creation

#!/usr/bin/env python3
import json
import subprocess

import requests  # third-party: pip install requests

# Run analysis
subprocess.run([
    'python3', 'esxi_log_analyzer.py',
    'vmkernel.log', '--json', 'results.json', '--quiet'
])

# Load results
with open('results.json') as f:
    data = json.load(f)

# Create ticket if critical issues
if data['summary']['critical_issues'] > 5:
    ticket_data = {
        'short_description': f"ESXi Critical Issues: {data['summary']['critical_issues']}",
        'description': json.dumps(data['summary'], indent=2),
        'urgency': '2',
        'impact': '2'
    }
    response = requests.post(
        'https://servicenow.company.com/api/now/table/incident',
        auth=('user', 'password'),
        json=ticket_data
    )
    print(f"Ticket created: {response.json()['result']['number']}")

Integration 2: Slack Notifications

#!/bin/bash
# Send Slack notification for critical issues
python3 esxi_log_analyzer.py vmkernel.log --json results.json --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('results.json'))['summary']['critical_issues'])")

if [ "$CRITICAL" -gt 5 ]; then
    curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
        -H 'Content-Type: application/json' \
        -d "{
            \"text\": \"ESXi Alert: $CRITICAL critical issues detected!\",
            \"attachments\": [{
                \"color\": \"danger\",
                \"fields\": [{
                    \"title\": \"SCSI Errors\",
                    \"value\": \"$(python3 -c "import json; print(json.load(open('results.json'))['summary']['total_scsi_errors'])")\"
                }]
            }]
        }"
fi

Integration 3: Grafana Dashboard

# Export metrics for Grafana
python3 esxi_log_analyzer.py vmkernel.log --json results.json --quiet

# Convert to Prometheus textfile format
python3 << 'EOF' > /var/lib/node_exporter/esxi_metrics.prom
import json
data = json.load(open('results.json'))
metrics = [
    f"esxi_scsi_errors {data['summary']['total_scsi_errors']}",
    f"esxi_path_failures {data['summary']['path_failures']}",
    f"esxi_timeouts {data['summary']['storage_timeouts']}",
    f"esxi_critical_issues {data['summary']['critical_issues']}"
]
for metric in metrics:
    print(metric)
EOF

Integration 4: Email Reports

#!/bin/bash
# Generate and email daily report
DATE=$(date +%Y-%m-%d)
REPORT="esxi-report-${DATE}.html"

python3 esxi_log_analyzer.py vmkernel.log hostd.log --html $REPORT

# Email with attachment
echo "Daily ESXi log analysis attached." | \
    mail -s "ESXi Report - $DATE" \
        -a $REPORT \
        esxi-team@company.com

Command Reference

All Command Options

python3 esxi_log_analyzer.py [OPTIONS] logfile1 [logfile2 ...]

Required Arguments:
  logfiles                       One or more ESXi log files to analyze

Optional Arguments:
  --type {vmkernel,hostd,vobd}   Force log type (auto-detect if not specified)
  --json FILE                    Export complete results to JSON file
  --csv FILE                     Export critical issues to CSV file
  --html FILE                    Generate HTML dashboard report
  --limit N                      Number of critical issues to display (default: 10)
  --quiet                        Suppress console output (useful for scripts)
  --help                         Show help message and exit

Common Command Patterns

# Basic analysis
python3 esxi_log_analyzer.py vmkernel.log
# Multi-log analysis
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log
# Export all formats
python3 esxi_log_analyzer.py vmkernel.log --json data.json --csv issues.csv --html report.html
# Quiet mode for automation
python3 esxi_log_analyzer.py vmkernel.log --quiet --csv output.csv
# Show top 50 issues
python3 esxi_log_analyzer.py vmkernel.log --limit 50
# Force log type
python3 esxi_log_analyzer.py unknown.log --type vmkernel

Appendix

Sample Output Files

Sample JSON Output Structure

{
  "vmkernel": {
    "SCSI_ERROR": [...],
    "PATH_DOWN": [...],
    "STORAGE_TIMEOUT": [...]
  },
  "hostd": {...},
  "vobd": {...},
  "summary": {
    "total_scsi_errors": 45,
    "path_failures": 12,
    "storage_timeouts": 8,
    "reservation_conflicts": 3,
    "hba_errors": 2,
    "hostd_errors": 23,
    "hostd_warnings": 156,
    "vim_faults": 7,
    "task_failures": 5,
    "datastore_errors": 2,
    "vobd_alerts": 4,
    "sensor_warnings": 2,
    "hardware_errors": 1,
    "critical_issues": 34
  },
  "critical_issues": [...]
}

SCSI Sense Code Quick Reference

Hex Code  Decimal  Description          Severity
--------  -------  -------------------  --------
0x0       0        No Sense (OK)        Info
0x2       2        Not Ready            Warning
0x3       3        Medium Error         Error
0x4       4        Hardware Error       Critical
0x5       5        Illegal Request      Error
0x6       6        Unit Attention       Warning
0x7       7        Data Protect         Error
0xb       11       Aborted Command      Error
0xe       14       Overlapped Commands  Warning
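The hex and decimal columns relate by base-16 conversion, which Python's int() performs directly:

```python
# The table's hex and decimal columns are base-16 conversions of each
# other; int(code, 16) accepts the '0x' prefix directly.
pairs = {'0x0': 0, '0x2': 2, '0x4': 4, '0xb': 11, '0xe': 14}
converted = {code: int(code, 16) for code in pairs}
```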

Useful ESXi Commands

# Rescan storage adapters
esxcli storage core adapter rescan --all
# List all storage paths
esxcli storage core path list
# Check path status
esxcli storage core path list | grep -E "dead|disabled"
# List SCSI LUNs
esxcli storage core device list
# Check HBA status
esxcli storage core adapter list
# View active SCSI reservations
esxcli storage core device list | grep -i reservation

Log File Locations on ESXi

Log File      Location                   Purpose
------------  -------------------------  -------------------------
vmkernel.log  /var/run/log/vmkernel.log  Kernel, storage, SCSI
hostd.log     /var/run/log/hostd.log     Management, VM operations
vobd.log      /var/run/log/vobd.log      Hardware, sensors
vpxa.log      /var/run/log/vpxa.log      vCenter agent
fdm.log       /var/run/log/fdm.log       HA operations

Support and Resources

VMware Documentation:

  • VMware vSphere Troubleshooting Guide
  • ESXi Log File Reference
  • SCSI Sense Code Documentation

Internal Resources:

Script Repository:


Quick Start Checklist

[ ] Python 3.6+ installed and verified
[ ] Script downloaded and made executable
[ ] ESXi log files collected
[ ] Test run completed: python3 esxi_log_analyzer.py vmkernel.log
[ ] Output format selected (console/JSON/CSV/HTML)
[ ] Results reviewed and understood
[ ] Integration configured (if needed)
[ ] Documentation read and bookmarked

Document End

For questions or issues with the ESXi Log Analyzer, contact:

esxi_log_analyzer.py
#!/usr/bin/env python3
"""
ESXi Universal Log Analyzer Suite
Parses vmkernel.log, hostd.log, vobd.log, and other ESXi logs
Extracts errors, warnings, SCSI issues, and generates comprehensive reports
"""
import re
import sys
import json
import argparse
from datetime import datetime
from collections import defaultdict, Counter
from typing import Dict, List, Tuple, Any
class ESXiLogAnalyzer:
"""Universal ESXi log parser for vmkernel, hostd, and vobd logs"""
# SCSI Sense Code Mappings
SCSI_SENSE = {
'0x0': 'No Sense (OK)',
'0x2': 'Not Ready',
'0x3': 'Medium Error',
'0x4': 'Hardware Error',
'0x5': 'Illegal Request',
'0x6': 'Unit Attention',
'0x7': 'Data Protect',
'0xb': 'Aborted Command',
'0xe': 'Overlapped Commands Attempted'
}
ASC_QUAL = {
'0x2800': 'LUN Not Ready, Format in Progress',
'0x3f01': 'Removed Target',
'0x3f07': 'Multiple LUN Reported',
'0x4700': 'Reservation Conflict',
'0x4c00': 'Snapshot Failed',
'0x5506': 'Illegal Message',
'0x0800': 'Logical Unit Communication Failure'
}
# Log type patterns
LOG_PATTERNS = {
'vmkernel': {
'scsi_error': r'VMW_SCSIERR_([0-9a-fA-Fx]+)',
'path_state': r'path\s+(dead|working|standby|active|disabled)',
'lun_pattern': r'naa\.([0-9a-fA-F:]+)',
'storage_timeout': r'(timeout|LUN.*timeout|NMP.*timeout)',
'reservation': r'(reservation|RESERVATION|scsi_status.*0x18)',
'path_down': r'path.*dead|path.*down|path.*failed',
'hba_error': r'vmhba\d+.*error|vmhba\d+.*fail',
'cpu_pattern': r'cpu(\d+):',
'timestamp': r'\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\]'
},
'hostd': {
'error': r'\[Originator@\d+\s+sub=(\w+).*?\]\s+(.+)',
'warning': r'warning|WARN|Warning',
'vim_fault': r'vim\.fault\.(\w+)',
'task_error': r'Task.*failed|Task.*error',
'datastore_error': r'Datastore.*not found|Datastore.*error',
'vm_operation': r'(VirtualMachine|VM).*\[(.*?)\]',
'timestamp': r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)'
},
'vobd': {
'alert': r'\[(\w+)\]\s+\[(\w+)\].*?Alert:?\s+(.+)',
'sensor': r'sensor.*?(warning|critical|alarm)',
'hardware': r'hardware.*?(error|fail|fault)',
'timestamp': r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)'
}
}
def __init__(self):
self.results = {
'vmkernel': defaultdict(list),
'hostd': defaultdict(list),
'vobd': defaultdict(list),
'summary': {},
'critical_issues': []
}
self.stats = Counter()
def detect_log_type(self, line: str) -> str:
"""Auto-detect log type from line content"""
if 'vmkernel:' in line or 'cpu' in line.lower() or 'VMW_SCSI' in line:
return 'vmkernel'
elif 'Hostd:' in line or 'vim.' in line or '[Originator@' in line:
return 'hostd'
elif 'vobd[' in line or 'Alert:' in line or 'sensor' in line.lower():
return 'vobd'
return 'unknown'
def parse_timestamp(self, line: str, log_type: str) -> str:
"""Extract timestamp from log line"""
pattern = self.LOG_PATTERNS[log_type].get('timestamp')
if pattern:
match = re.search(pattern, line)
if match:
return match.group(1)
return datetime.now().isoformat()
def parse_vmkernel_line(self, line: str) -> Dict[str, Any]:
"""Parse vmkernel.log line for SCSI errors, paths, and storage issues"""
result = {'type': 'vmkernel', 'raw': line.strip(), 'issues': []}
patterns = self.LOG_PATTERNS['vmkernel']
# Extract timestamp
result['timestamp'] = self.parse_timestamp(line, 'vmkernel')
# SCSI Error Detection
scsi_match = re.search(patterns['scsi_error'], line)
if scsi_match:
sense_code = scsi_match.group(1)
if not sense_code.startswith('0x'):
sense_code = f"0x{sense_code}"
result['scsi_sense'] = sense_code
result['scsi_description'] = self.SCSI_SENSE.get(
sense_code.lower(),
f'Unknown ({sense_code})'
)
result['issues'].append('SCSI_ERROR')
self.stats['scsi_errors'] += 1
# LUN Identification
lun_match = re.search(patterns['lun_pattern'], line)
if lun_match:
result['lun'] = f"naa.{lun_match.group(1)}"
# Check if Tintri LUN
if '60a980' in lun_match.group(1) or 'tintri' in line.lower():
result['vendor'] = 'Tintri'
# Path State
path_match = re.search(patterns['path_state'], line, re.IGNORECASE)
if path_match:
result['path_state'] = path_match.group(1).lower()
if result['path_state'] in ['dead', 'failed', 'disabled']:
result['issues'].append('PATH_DOWN')
self.stats['path_failures'] += 1
# Storage Timeout
if re.search(patterns['storage_timeout'], line, re.IGNORECASE):
result['issues'].append('STORAGE_TIMEOUT')
self.stats['timeouts'] += 1
# Reservation Conflict
if re.search(patterns['reservation'], line, re.IGNORECASE):
result['issues'].append('RESERVATION_CONFLICT')
self.stats['reservations'] += 1
# HBA Error
hba_match = re.search(patterns['hba_error'], line, re.IGNORECASE)
if hba_match:
result['issues'].append('HBA_ERROR')
result['hba'] = re.search(r'vmhba\d+', line).group(0)
self.stats['hba_errors'] += 1
# CPU Context
cpu_match = re.search(patterns['cpu_pattern'], line)
if cpu_match:
result['cpu'] = int(cpu_match.group(1))
return result if result['issues'] else None
def parse_hostd_line(self, line: str) -> Dict[str, Any]:
"""Parse hostd.log line for VM operations, tasks, and VIM faults"""
result = {'type': 'hostd', 'raw': line.strip(), 'issues': []}
patterns = self.LOG_PATTERNS['hostd']
# Extract timestamp
result['timestamp'] = self.parse_timestamp(line, 'hostd')
# Error/Warning Level
if re.search(patterns['warning'], line, re.IGNORECASE):
result['severity'] = 'WARNING'
self.stats['hostd_warnings'] += 1
if 'error' in line.lower():
result['severity'] = 'ERROR'
result['issues'].append('HOSTD_ERROR')
self.stats['hostd_errors'] += 1
# VIM Fault Detection
vim_fault = re.search(patterns['vim_fault'], line)
if vim_fault:
result['vim_fault'] = vim_fault.group(1)
result['issues'].append('VIM_FAULT')
self.stats['vim_faults'] += 1
# Originator/Subsystem
originator_match = re.search(patterns['error'], line)
if originator_match:
result['subsystem'] = originator_match.group(1)
result['message'] = originator_match.group(2)
# Task Failure
if re.search(patterns['task_error'], line, re.IGNORECASE):
result['issues'].append('TASK_FAILURE')
self.stats['task_failures'] += 1
# Datastore Issues
if re.search(patterns['datastore_error'], line, re.IGNORECASE):
result['issues'].append('DATASTORE_ERROR')
self.stats['datastore_errors'] += 1
# VM Operation
vm_match = re.search(patterns['vm_operation'], line)
if vm_match:
result['vm_name'] = vm_match.group(2) if vm_match.group(2) else 'Unknown'
return result if result['issues'] else None
def parse_vobd_line(self, line: str) -> Dict[str, Any]:
"""Parse vobd.log line for hardware alerts and sensor warnings"""
result = {'type': 'vobd', 'raw': line.strip(), 'issues': []}
patterns = self.LOG_PATTERNS['vobd']
# Extract timestamp
result['timestamp'] = self.parse_timestamp(line, 'vobd')
# Alert Detection
alert_match = re.search(patterns['alert'], line)
if alert_match:
result['alert_type'] = alert_match.group(1)
result['severity'] = alert_match.group(2)
result['message'] = alert_match.group(3)
result['issues'].append('HARDWARE_ALERT')
self.stats['vobd_alerts'] += 1
# Sensor Issues
if re.search(patterns['sensor'], line, re.IGNORECASE):
result['issues'].append('SENSOR_WARNING')
self.stats['sensor_warnings'] += 1
# Hardware Errors
if re.search(patterns['hardware'], line, re.IGNORECASE):
result['issues'].append('HARDWARE_ERROR')
self.stats['hardware_errors'] += 1
return result if result['issues'] else None
def parse_file(self, filepath: str, log_type: str = None) -> None:
"""Parse entire log file"""
print(f"[INFO] Parsing log file: {filepath}")
line_count = 0
parsed_count = 0
try:
with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
for line in f:
line_count += 1
# Auto-detect log type if not specified
if not log_type:
log_type = self.detect_log_type(line)
# Parse based on log type
if log_type == 'vmkernel':
result = self.parse_vmkernel_line(line)
elif log_type == 'hostd':
result = self.parse_hostd_line(line)
elif log_type == 'vobd':
result = self.parse_vobd_line(line)
else:
continue
# Store results
if result:
parsed_count += 1
for issue in result['issues']:
self.results[log_type][issue].append(result)
# Track critical issues
if self._is_critical(result):
self.results['critical_issues'].append(result)
print(f"[SUCCESS] Parsed {line_count} lines, found {parsed_count} issues")
except FileNotFoundError:
print(f"[ERROR] File not found: {filepath}")
except Exception as e:
print(f"[ERROR] Failed to parse file: {e}")
def _is_critical(self, result: Dict) -> bool:
"""Determine if issue is critical"""
critical_issues = [
'SCSI_ERROR', 'PATH_DOWN', 'STORAGE_TIMEOUT',
'RESERVATION_CONFLICT', 'HARDWARE_ERROR', 'DATASTORE_ERROR'
]
return any(issue in result['issues'] for issue in critical_issues)
def generate_summary(self) -> Dict:
"""Generate summary statistics"""
summary = {
'total_scsi_errors': self.stats['scsi_errors'],
'path_failures': self.stats['path_failures'],
'storage_timeouts': self.stats['timeouts'],
'reservation_conflicts': self.stats['reservations'],
'hba_errors': self.stats['hba_errors'],
'hostd_errors': self.stats['hostd_errors'],
'hostd_warnings': self.stats['hostd_warnings'],
'vim_faults': self.stats['vim_faults'],
'task_failures': self.stats['task_failures'],
'datastore_errors': self.stats['datastore_errors'],
'vobd_alerts': self.stats['vobd_alerts'],
'sensor_warnings': self.stats['sensor_warnings'],
'hardware_errors': self.stats['hardware_errors'],
'critical_issues': len(self.results['critical_issues'])
}
self.results['summary'] = summary
return summary
def print_summary(self) -> None:
"""Print analysis summary to console"""
print("\n" + "="*80)
print("ESXi LOG ANALYSIS SUMMARY")
print("="*80)
summary = self.results['summary']
print("\n[VMKERNEL ISSUES]")
print(f" SCSI Errors: {summary['total_scsi_errors']}")
print(f" Path Failures: {summary['path_failures']}")
print(f" Storage Timeouts: {summary['storage_timeouts']}")
print(f" Reservation Conflicts: {summary['reservation_conflicts']}")
print(f" HBA Errors: {summary['hba_errors']}")
print("\n[HOSTD ISSUES]")
print(f" Errors: {summary['hostd_errors']}")
print(f" Warnings: {summary['hostd_warnings']}")
print(f" VIM Faults: {summary['vim_faults']}")
print(f" Task Failures: {summary['task_failures']}")
print(f" Datastore Errors: {summary['datastore_errors']}")
print("\n[VOBD ISSUES]")
print(f" Hardware Alerts: {summary['vobd_alerts']}")
print(f" Sensor Warnings: {summary['sensor_warnings']}")
print(f" Hardware Errors: {summary['hardware_errors']}")
print("\n[CRITICAL]")
print(f" Total Critical Issues: {summary['critical_issues']}")
print("="*80)
def print_critical_issues(self, limit: int = 10) -> None:
"""Print top critical issues"""
print(f"\n[TOP {limit} CRITICAL ISSUES]")
print("-"*80)
for i, issue in enumerate(self.results['critical_issues'][:limit], 1):
print(f"\n{i}. [{issue['type'].upper()}] {issue['timestamp']}")
print(f" Issues: {', '.join(issue['issues'])}")
if 'lun' in issue:
print(f" LUN: {issue['lun']}")
if 'scsi_description' in issue:
print(f" SCSI: {issue['scsi_description']}")
if 'path_state' in issue:
print(f" Path State: {issue['path_state']}")
if 'vim_fault' in issue:
print(f" VIM Fault: {issue['vim_fault']}")
            if 'message' in issue:
                msg = issue['message']
                # Only append an ellipsis when the message was actually truncated
                print(f" Message: {msg[:100]}{'...' if len(msg) > 100 else ''}")
def export_json(self, output_file: str) -> None:
"""Export results to JSON file"""
print(f"\n[INFO] Exporting results to: {output_file}")
        try:
            with open(output_file, 'w') as f:
                json.dump(self.results, f, indent=2, default=str)
            print("[SUCCESS] Results exported successfully")
        except Exception as e:
            print(f"[ERROR] Failed to export JSON: {e}")
    def export_csv(self, output_file: str) -> None:
        """Export critical issues to CSV"""
        import csv  # stdlib; quotes embedded commas/quotes/newlines correctly
        print(f"\n[INFO] Exporting CSV to: {output_file}")
        try:
            with open(output_file, 'w', newline='') as f:
                writer = csv.writer(f)
                # CSV Header
                writer.writerow(['Timestamp', 'Type', 'Issues', 'LUN',
                                 'SCSI_Error', 'Path_State', 'Message'])
                # Write critical issues; csv.writer escapes any commas in the
                # fields, so no manual comma replacement is needed
                for issue in self.results['critical_issues']:
                    writer.writerow([
                        issue.get('timestamp', ''),
                        issue.get('type', ''),
                        '|'.join(issue.get('issues', [])),
                        issue.get('lun', ''),
                        issue.get('scsi_description', ''),
                        issue.get('path_state', ''),
                        issue.get('message', issue.get('raw', ''))[:200],
                    ])
            print("[SUCCESS] CSV exported successfully")
        except Exception as e:
            print(f"[ERROR] Failed to export CSV: {e}")
def export_html_report(self, output_file: str) -> None:
"""Generate HTML dashboard report"""
print(f"\n[INFO] Generating HTML report: {output_file}")
summary = self.results['summary']
html = f"""
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>ESXi Log Analysis Report</title>
<style>
body {{
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
margin: 20px;
background: #f5f5f5;
}}
.header {{
background: #dc2626;
color: white;
padding: 20px;
border-radius: 8px;
margin-bottom: 20px;
}}
.summary {{
display: grid;
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
gap: 15px;
margin-bottom: 30px;
}}
.card {{
background: white;
padding: 20px;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
.card h3 {{
margin-top: 0;
color: #333;
font-size: 14px;
text-transform: uppercase;
}}
.card .value {{
font-size: 32px;
font-weight: bold;
color: #dc2626;
}}
.critical {{
background: #fee;
border-left: 4px solid #dc2626;
}}
table {{
width: 100%;
background: white;
border-collapse: collapse;
border-radius: 8px;
overflow: hidden;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
th {{
background: #333;
color: white;
padding: 12px;
text-align: left;
}}
td {{
padding: 10px 12px;
border-bottom: 1px solid #eee;
}}
tr:hover {{
background: #f9f9f9;
}}
.issue-badge {{
display: inline-block;
padding: 4px 8px;
background: #dc2626;
color: white;
border-radius: 4px;
font-size: 11px;
margin: 2px;
}}
</style>
</head>
<body>
<div class="header">
<h1>ESXi Log Analysis Report</h1>
<p>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
</div>
<div class="summary">
<div class="card critical">
<h3>Critical Issues</h3>
<div class="value">{summary['critical_issues']}</div>
</div>
<div class="card">
<h3>SCSI Errors</h3>
<div class="value">{summary['total_scsi_errors']}</div>
</div>
<div class="card">
<h3>Path Failures</h3>
<div class="value">{summary['path_failures']}</div>
</div>
<div class="card">
<h3>Storage Timeouts</h3>
<div class="value">{summary['storage_timeouts']}</div>
</div>
<div class="card">
<h3>Hostd Errors</h3>
<div class="value">{summary['hostd_errors']}</div>
</div>
<div class="card">
<h3>Hardware Alerts</h3>
<div class="value">{summary['vobd_alerts']}</div>
</div>
</div>
<div class="card">
<h2>Critical Issues</h2>
<table>
<thead>
<tr>
<th>Timestamp</th>
<th>Type</th>
<th>Issues</th>
<th>Details</th>
</tr>
</thead>
<tbody>
"""
        # Add critical issues to table; escape values so raw log text
        # cannot break the report markup
        from html import escape
        for issue in self.results['critical_issues'][:50]:
            timestamp = escape(str(issue.get('timestamp', 'N/A')))
            log_type = escape(str(issue.get('type', 'N/A'))).upper()
            issues_html = ''.join(
                f'<span class="issue-badge">{escape(str(i))}</span>'
                for i in issue.get('issues', [])
            )
            details = []
            if 'lun' in issue:
                details.append(f"LUN: {escape(str(issue['lun']))}")
            if 'scsi_description' in issue:
                details.append(f"SCSI: {escape(str(issue['scsi_description']))}")
            if 'path_state' in issue:
                details.append(f"Path: {escape(str(issue['path_state']))}")
            if 'vim_fault' in issue:
                details.append(f"VIM: {escape(str(issue['vim_fault']))}")
            details_str = '<br>'.join(details) if details else 'N/A'
html += f"""
<tr>
<td>{timestamp}</td>
<td>{log_type}</td>
<td>{issues_html}</td>
<td>{details_str}</td>
</tr>
"""
html += """
</tbody>
</table>
</div>
</body>
</html>
"""
        try:
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write(html)
            print("[SUCCESS] HTML report generated successfully")
        except Exception as e:
            print(f"[ERROR] Failed to generate HTML report: {e}")
def main():
parser = argparse.ArgumentParser(
description='ESXi Universal Log Analyzer - Parse vmkernel, hostd, and vobd logs',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
Parse vmkernel log:
python3 esxi_log_analyzer.py vmkernel.log
Parse multiple logs:
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log
Export to JSON and CSV:
python3 esxi_log_analyzer.py vmkernel.log --json results.json --csv results.csv
Generate HTML report:
python3 esxi_log_analyzer.py vmkernel.log --html report.html
Parse with specific log type:
python3 esxi_log_analyzer.py esxi.log --type vmkernel
"""
)
parser.add_argument('logfiles', nargs='+', help='ESXi log files to analyze')
parser.add_argument('--type', choices=['vmkernel', 'hostd', 'vobd'],
help='Force log type (auto-detect if not specified)')
parser.add_argument('--json', help='Export results to JSON file')
parser.add_argument('--csv', help='Export critical issues to CSV file')
parser.add_argument('--html', help='Generate HTML report')
parser.add_argument('--limit', type=int, default=10,
help='Number of critical issues to display (default: 10)')
parser.add_argument('--quiet', action='store_true',
help='Suppress console output')
args = parser.parse_args()
# Create analyzer instance
analyzer = ESXiLogAnalyzer()
# Parse all log files
for logfile in args.logfiles:
analyzer.parse_file(logfile, log_type=args.type)
# Generate summary
analyzer.generate_summary()
# Print results
if not args.quiet:
analyzer.print_summary()
analyzer.print_critical_issues(limit=args.limit)
# Export results
if args.json:
analyzer.export_json(args.json)
if args.csv:
analyzer.export_csv(args.csv)
if args.html:
analyzer.export_html_report(args.html)
# Print completion message
if not args.quiet:
print("\n[INFO] Analysis complete!")
        if args.json or args.csv or args.html:
            print("[INFO] Requested export files written (see messages above for any errors)")
if __name__ == "__main__":
main()
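One detail worth noting for downstream tooling: the CSV produced by `--csv` packs multiple issue tags into the single `Issues` column, joined with `|`. A minimal consumer-side sketch using only the standard library (the row below is made-up sample data in the column layout that `export_csv` writes):

```python
import csv
import io

# Made-up sample row in the header layout emitted by export_csv
sample = (
    "Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message\n"
    "2025-01-15T10:00:00Z,vmkernel,SCSI_ERROR|PATH_DOWN,10,Medium Error,dead,I/O failed\n"
)

for row in csv.DictReader(io.StringIO(sample)):
    tags = row["Issues"].split("|")  # one column holds several issue tags
    print(row["Timestamp"], row["Type"], tags)
```

Reading from a real export is the same loop with `open("results.csv", newline="")` in place of the in-memory sample.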
