The ESXi Universal Log Analyzer is a professional Python-based tool that automatically parses and analyzes ESXi log files to identify storage issues, hardware problems, and system errors. It processes three critical ESXi log types:
- vmkernel.log – Storage, SCSI, path failures, and kernel-level issues
- hostd.log – VM operations, VIM faults, task failures, and management errors
- vobd.log – Hardware alerts, sensor warnings, and physical component issues
Key Capabilities
| Feature | Description |
|---|---|
| Multi-Log Support | Parse vmkernel, hostd, and vobd logs simultaneously |
| Auto-Detection | Automatically identifies log type from content |
| SCSI Decoding | Translates hex sense codes to human-readable errors |
| Path Analysis | Tracks storage path states (dead, working, standby) |
| LUN Identification | Extracts LUN IDs and detects Tintri storage |
| Multiple Outputs | Console, JSON, CSV, and HTML reports |
| Zero Dependencies | Uses only Python standard library |
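The SCSI decoding feature is a straightforward dictionary lookup. The sketch below mirrors (in abbreviated form) the `SCSI_SENSE` table defined in `esxi_log_analyzer.py`, assuming codes are normalized to lowercase hex:

```python
# Abbreviated copy of the SCSI_SENSE lookup from esxi_log_analyzer.py
SCSI_SENSE = {
    '0x0': 'No Sense (OK)',
    '0x2': 'Not Ready',
    '0x4': 'Hardware Error',
    '0x6': 'Unit Attention',
    '0xb': 'Aborted Command',
}

def decode_sense(code: str) -> str:
    """Translate a hex SCSI sense key into a human-readable description."""
    return SCSI_SENSE.get(code.lower(), f'Unknown ({code})')

print(decode_sense('0xB'))
```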
What This Tool Does
In Simple Terms
This tool reads ESXi log files and finds problems automatically. Instead of manually searching through thousands of log lines, it:
- Scans log files for errors, warnings, and failures
- Categorizes issues by type (SCSI errors, path failures, VM problems, etc.)
- Prioritizes critical issues that need immediate attention
- Exports results in multiple formats for analysis and reporting
Problem It Solves
Without This Tool:
- Manually grep through thousands of log lines
- Look up SCSI sense codes in documentation
- Track down multiple types of errors across different logs
- Compile reports manually for tickets
With This Tool:
- One command analyzes all logs
- Automatically decodes technical error codes
- Generates comprehensive reports in seconds
- Provides actionable information immediately
Common Use Cases
- Troubleshooting Storage Issues
- “Why is my datastore showing latency?”
- “Why are VMs experiencing storage errors?”
- Post-Incident Analysis
- “What happened during the outage last night?”
- “Which LUNs failed and when?”
- Proactive Monitoring
- Daily/weekly health checks
- Trend analysis over time
- Support Ticket Creation
- Generate detailed error reports for VMware support
- Export evidence for vendor escalations
Prerequisites
System Requirements
| Requirement | Specification |
|---|---|
| Python Version | Python 3.6 or higher |
| Operating System | Linux, macOS, or Windows |
| Memory | 512 MB minimum (for large log files) |
| Disk Space | 100 MB for script and output files |
Required Files
You need one or more ESXi log files:
- vmkernel.log
- hostd.log
- vobd.log
Where to Find ESXi Logs:
Method 1: SSH to ESXi Host
```bash
# SSH to ESXi
ssh root@esxi-host.example.com

# Logs are located in:
#   /var/run/log/vmkernel.log
#   /var/run/log/hostd.log
#   /var/run/log/vobd.log

# Copy to your workstation
scp root@esxi-host:/var/run/log/*.log /local/path/
```
Method 2: Download via Web UI
- Login to ESXi web UI (https://esxi-host)
- Navigate to Host → Monitor → Logs
- Select log file
- Click Download
Method 3: From Support Bundle
- Generate a support bundle via the web UI or by running `vm-support`
- Extract the bundle
- Logs are in `esx-<hostname>-<date>/var/run/log/`
Verify Python Installation
```bash
# Check Python version
python3 --version
# Expected output: Python 3.6.x or higher
```
Installation
Step 1: Download the Script
Save the Python script as esxi_log_analyzer.py:
```bash
# Create directory for the script
mkdir -p ~/esxi-tools
cd ~/esxi-tools

# Copy the script content to this file
nano esxi_log_analyzer.py
# (Paste the script content from the artifact)
```
Step 2: Make Executable
chmod +x esxi_log_analyzer.py
Step 3: Verify Installation
python3 esxi_log_analyzer.py --help
Expected Output:
```text
usage: esxi_log_analyzer.py [-h] [--type {vmkernel,hostd,vobd}] [--json JSON]
                            [--csv CSV] [--html HTML] [--limit LIMIT] [--quiet]
                            logfiles [logfiles ...]

ESXi Universal Log Analyzer - Parse vmkernel, hostd, and vobd logs
...
```
Quick Start Guide
Example 1: Analyze Single Log File
python3 esxi_log_analyzer.py vmkernel.log
What Happens:
- Script reads `vmkernel.log`
- Automatically detects that it is a vmkernel log
- Parses for SCSI errors, path failures, timeouts
- Displays summary and top 10 critical issues
Output:
```text
[INFO] Parsing log file: vmkernel.log
[SUCCESS] Parsed 15234 lines, found 127 issues

================================================================================
ESXi LOG ANALYSIS SUMMARY
================================================================================

[VMKERNEL ISSUES]
  SCSI Errors: 45
  Path Failures: 12
  Storage Timeouts: 8
  Reservation Conflicts: 3
  HBA Errors: 2

[HOSTD ISSUES]
  Errors: 0
  Warnings: 0
  ...

[CRITICAL]
  Total Critical Issues: 34
================================================================================

[TOP 10 CRITICAL ISSUES]
--------------------------------------------------------------------------------

1. [VMKERNEL] 2025-12-28T18:46:00.123Z
   Issues: SCSI_ERROR, PATH_DOWN
   LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
   SCSI: Aborted Command
   Path State: dead

2. [VMKERNEL] 2025-12-28T18:47:15.456Z
   Issues: STORAGE_TIMEOUT
   LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
   ...
```
Example 2: Analyze Multiple Logs
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log
What Happens:
- Parses all three log types
- Combines results
- Shows comprehensive summary across all logs
Example 3: Export to CSV for Excel
python3 esxi_log_analyzer.py vmkernel.log --csv report.csv
What Happens:
- Analyzes log
- Exports critical issues to `report.csv`
- Open in Excel for further analysis
CSV Format:
```text
Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message
2025-12-28T18:46:00.123Z,vmkernel,SCSI_ERROR|PATH_DOWN,naa.60a980...,Aborted Command,dead,...
```
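Because the `Issues` column packs multiple tags into one field with a `|` separator, downstream scripts should split it before counting. A minimal stdlib-only sketch (the data row is hypothetical):

```python
import csv
import io

# One header plus one hypothetical data row in the analyzer's CSV layout
sample = (
    "Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message\n"
    "2025-12-28T18:46:00.123Z,vmkernel,SCSI_ERROR|PATH_DOWN,"
    "naa.60a980...,Aborted Command,dead,...\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
tags = rows[0]["Issues"].split("|")  # multi-valued field uses '|' as separator
print(tags)
```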
Example 4: Generate HTML Report
python3 esxi_log_analyzer.py vmkernel.log --html report.html
What Happens:
- Creates an HTML dashboard
- Open `report.html` in a browser
- View the color-coded summary and issue table
Usage Examples
Basic Analysis
Analyze vmkernel.log Only
python3 esxi_log_analyzer.py vmkernel.log
Analyze All Three Log Types
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log
Force Specific Log Type (if auto-detection fails)
python3 esxi_log_analyzer.py esxi.log --type vmkernel
Export Options
Export to JSON (Complete Data)
python3 esxi_log_analyzer.py vmkernel.log --json results.json
Use Case: API integration, further processing, archival
Export to CSV (Critical Issues Only)
python3 esxi_log_analyzer.py vmkernel.log --csv critical-issues.csv
Use Case: Excel analysis, ticket attachments
Generate HTML Dashboard
python3 esxi_log_analyzer.py vmkernel.log --html dashboard.html
Use Case: Management reports, stakeholder visibility
Export All Formats
```bash
python3 esxi_log_analyzer.py vmkernel.log \
    --json full-data.json \
    --csv issues.csv \
    --html report.html
```
Advanced Options
Show More Critical Issues
python3 esxi_log_analyzer.py vmkernel.log --limit 50
Default: 10 issues. Increase to see more.
Quiet Mode (No Console Output)
python3 esxi_log_analyzer.py vmkernel.log --quiet --json results.json
Use Case: Automated scripts, cron jobs
Analyze Logs from Different Hosts
```bash
python3 esxi_log_analyzer.py \
    esxi1-vmkernel.log \
    esxi2-vmkernel.log \
    esxi3-vmkernel.log \
    --csv combined-report.csv
```
Understanding the Output
Console Summary Explained
```text
[VMKERNEL ISSUES]
  SCSI Errors: 45            ← SCSI sense code errors detected
  Path Failures: 12          ← Storage paths marked as "dead" or "failed"
  Storage Timeouts: 8        ← I/O operations that timed out
  Reservation Conflicts: 3   ← SCSI reservation issues (often during vMotion)
  HBA Errors: 2              ← Host Bus Adapter hardware/driver errors

[HOSTD ISSUES]
  Errors: 23                 ← hostd daemon errors
  Warnings: 156              ← hostd warnings (less severe)
  VIM Faults: 7              ← vSphere API (VIM) errors
  Task Failures: 5           ← Failed ESXi tasks (VM power ops, etc.)
  Datastore Errors: 2        ← Datastore not found or inaccessible

[VOBD ISSUES]
  Hardware Alerts: 4         ← Physical hardware alerts
  Sensor Warnings: 2         ← Temperature, voltage, fan sensor warnings
  Hardware Errors: 1         ← Hardware component failures

[CRITICAL]
  Total Critical Issues: 34  ← High-priority issues requiring attention
```
Critical Issue Details
```text
1. [VMKERNEL] 2025-12-28T18:46:00.123Z
   Issues: SCSI_ERROR, PATH_DOWN
   LUN: naa.60a980006482b4f4f4f4f4f4f4f4f4f
   SCSI: Aborted Command
   Path State: dead
```
What This Means:
- Timestamp: When the error occurred
- Issues: Types of problems (SCSI error + path failure)
- LUN: Storage device identifier (NAA format)
- SCSI: Human-readable SCSI error (decoded from hex)
- Path State: Storage path is dead (not operational)
Action Required:
- Check storage connectivity (cables, switches)
- Verify SAN configuration
- Rescan storage adapters
- Contact storage vendor if persistent
SCSI Error Codes Reference
| Code | Description | Common Cause | Fix |
|---|---|---|---|
| 0xb (Aborted Command) | I/O operation aborted | SAN overload, queue full | Increase HBA queue depth |
| 0x2 (Not Ready) | Device not ready | LUN offline, format in progress | Wait or check storage |
| 0x6 (Unit Attention) | Device reset occurred | LUN reset, rescan | Rescan storage adapters |
| 0x47 (Reservation Conflict) | SCSI reservation held by another host | vMotion, VAAI conflict | Stagger vMotion operations |
| 0x4 (Hardware Error) | Disk/controller error | Physical failure | Replace disk/HBA |
JSON Output Structure
```json
{
  "vmkernel": {
    "SCSI_ERROR": [
      {
        "timestamp": "2025-12-28T18:46:00.123Z",
        "lun": "naa.60a980...",
        "scsi_sense": "0xb",
        "scsi_description": "Aborted Command",
        "path_state": "dead",
        "issues": ["SCSI_ERROR", "PATH_DOWN"],
        "raw": "original log line..."
      }
    ],
    "PATH_DOWN": [...],
    "STORAGE_TIMEOUT": [...]
  },
  "hostd": {...},
  "vobd": {...},
  "summary": {
    "total_scsi_errors": 45,
    "path_failures": 12,
    ...
  },
  "critical_issues": [...]
}
```
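The JSON export can be post-processed with the standard `json` module. The sketch below filters critical issues by tag, using a hypothetical payload shaped like the structure documented above:

```python
import json

# Hypothetical payload shaped like the analyzer's results.json
raw = json.dumps({
    "summary": {"total_scsi_errors": 45, "critical_issues": 34},
    "critical_issues": [
        {"timestamp": "2025-12-28T18:46:00.123Z",
         "issues": ["SCSI_ERROR", "PATH_DOWN"], "lun": "naa.60a980..."},
        {"timestamp": "2025-12-28T18:47:15.456Z",
         "issues": ["STORAGE_TIMEOUT"], "lun": "naa.60a981..."},
    ],
})

data = json.loads(raw)

# Keep only issues that carry a PATH_DOWN tag
path_down = [i for i in data["critical_issues"] if "PATH_DOWN" in i["issues"]]
print(len(path_down))
```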
CSV Output (Excel-Friendly)
Open report.csv in Excel:
| Timestamp | Type | Issues | LUN | SCSI_Error | Path_State | Message |
|---|---|---|---|---|---|---|
| 2025-12-28T18:46:00 | vmkernel | SCSI_ERROR|PATH_DOWN | naa.60a980… | Aborted Command | dead | … |
Excel Analysis:
- Sort by `Issues` to group similar problems
- Filter by `LUN` to focus on specific storage
- Pivot table on `SCSI_Error` for error distribution
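The same pivots work without Excel. A sketch that tallies issue tags and LUN hits with `collections.Counter` (the row values are hypothetical):

```python
from collections import Counter

# Rows as csv.DictReader would yield them (values hypothetical)
rows = [
    {"Issues": "SCSI_ERROR|PATH_DOWN", "LUN": "naa.60a980...01"},
    {"Issues": "SCSI_ERROR", "LUN": "naa.60a980...01"},
    {"Issues": "STORAGE_TIMEOUT", "LUN": "naa.60a980...02"},
]

# Count individual issue tags (splitting the multi-valued Issues column)
issue_counts = Counter(tag for row in rows for tag in row["Issues"].split("|"))

# Count hits per LUN
lun_counts = Counter(row["LUN"] for row in rows)

print(issue_counts["SCSI_ERROR"], lun_counts["naa.60a980...01"])
```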
HTML Dashboard
Open report.html in browser to see:
Summary Cards:
- Critical Issues (red card)
- SCSI Errors
- Path Failures
- Storage Timeouts
- Hostd Errors
- Hardware Alerts
Critical Issues Table:
- Timestamp
- Log Type
- Issue Tags (color-coded badges)
- Details (LUN, SCSI error, path state)
Features:
- Responsive design (works on mobile)
- Hover effects for readability
- Professional layout for management
Advanced Usage
Scenario 1: Daily Health Check
Create a script: daily-esxi-check.sh
```bash
#!/bin/bash
# Daily ESXi log health check

DATE=$(date +%Y%m%d)
REPORT_DIR="/reports/esxi/$DATE"
mkdir -p "$REPORT_DIR"

# Analyze logs from all hosts
for host in esxi1 esxi2 esxi3; do
    echo "Analyzing $host..."

    # Copy logs (destination directory must exist)
    mkdir -p /tmp/${host}
    scp root@${host}:/var/run/log/*.log /tmp/${host}/

    # Run analysis
    python3 esxi_log_analyzer.py \
        /tmp/${host}/*.log \
        --json ${REPORT_DIR}/${host}-results.json \
        --csv ${REPORT_DIR}/${host}-issues.csv \
        --html ${REPORT_DIR}/${host}-report.html
done

echo "Reports generated in: $REPORT_DIR"
```
Run daily via cron:
0 2 * * * /scripts/daily-esxi-check.sh
Scenario 2: Alert on Critical Issues
```bash
#!/bin/bash
# Alert if critical issues exceed threshold

THRESHOLD=10
REPORT="/tmp/esxi-analysis.json"

python3 esxi_log_analyzer.py vmkernel.log --json $REPORT --quiet

CRITICAL=$(python3 -c "import json; print(json.load(open('$REPORT'))['summary']['critical_issues'])")

if [ "$CRITICAL" -gt "$THRESHOLD" ]; then
    echo "ALERT: $CRITICAL critical ESXi issues found!" | \
        mail -s "ESXi Critical Alert" admin@company.com
fi
```
Scenario 3: Multi-Host Analysis
```bash
# Analyze logs from multiple hosts
python3 esxi_log_analyzer.py \
    esxi1-vmkernel.log \
    esxi2-vmkernel.log \
    esxi3-vmkernel.log \
    esxi1-hostd.log \
    esxi2-hostd.log \
    esxi3-hostd.log \
    --html cluster-report.html \
    --csv cluster-issues.csv
```
Scenario 4: Tintri-Specific Analysis
```bash
# Analyze logs and filter Tintri LUNs
python3 esxi_log_analyzer.py vmkernel.log --json results.json

# Extract Tintri-specific issues
python3 -c "
import json
data = json.load(open('results.json'))
tintri_issues = [
    issue for issue in data['critical_issues']
    if 'lun' in issue and '60a980' in issue['lun']
]
print(f'Tintri LUN issues: {len(tintri_issues)}')
for issue in tintri_issues[:10]:
    print(f\" {issue['timestamp']}: {issue['lun']} - {issue.get('scsi_description', 'N/A')}\")
"
```
Scenario 5: Historical Trend Analysis
```bash
#!/bin/bash
# Weekly trend report

for week in {1..4}; do
    LOGFILE="vmkernel-week${week}.log"
    python3 esxi_log_analyzer.py $LOGFILE --json week${week}.json --quiet
done

# Compare weekly results
python3 -c "
import json
for week in range(1, 5):
    data = json.load(open(f'week{week}.json'))
    print(f'Week {week}: {data[\"summary\"][\"total_scsi_errors\"]} SCSI errors')
"
```
Troubleshooting Guide
Issue 1: "No module named 'xyz'"
Error Message:
ModuleNotFoundError: No module named 'xyz'
Solution: This script uses only the Python standard library, so this error should not occur. If it does, check your Python installation:
```bash
# Verify Python version
python3 --version
# Should be 3.6 or higher

# If older, upgrade Python:
# Ubuntu/Debian:
sudo apt update && sudo apt install python3

# macOS:
brew install python3
```
Issue 2: “Permission Denied”
Error Message:
[ERROR] Failed to parse file: [Errno 13] Permission denied: 'vmkernel.log'
Solution:
```bash
# Make log files readable
chmod +r vmkernel.log

# Or run with appropriate permissions
sudo python3 esxi_log_analyzer.py vmkernel.log
```
Issue 3: “File not found”
Error Message:
[ERROR] File not found: vmkernel.log
Solution:
```bash
# Verify file exists
ls -l vmkernel.log

# Use full path
python3 esxi_log_analyzer.py /full/path/to/vmkernel.log

# Or navigate to log directory
cd /path/to/logs
python3 /path/to/esxi_log_analyzer.py vmkernel.log
```
Issue 4: Large Log Files (Memory Issues)
Error Message:
MemoryError: Unable to allocate array
Solution:
```bash
# Split large log files
split -l 100000 vmkernel.log vmkernel-part-

# Analyze each part
for part in vmkernel-part-*; do
    python3 esxi_log_analyzer.py $part --csv ${part}.csv
done

# Combine results (keep a single header row)
head -1 vmkernel-part-aa.csv > combined-report.csv
tail -n +2 -q vmkernel-part-*.csv >> combined-report.csv
```
Issue 5: No Issues Found
Output:
```text
[SUCCESS] Parsed 1000 lines, found 0 issues
Total Critical Issues: 0
```
Possible Causes:
- Logs are healthy – No issues present (good!)
- Wrong log type – try forcing the log type: `python3 esxi_log_analyzer.py logfile --type vmkernel`
- Log format changed – verify the log format matches the expected ESXi format
Verification:
```bash
# Check log content manually
head -50 vmkernel.log

# Look for typical patterns
grep -i "error\|warning\|fail" vmkernel.log | head
```
Best Practices
1. Regular Analysis Schedule
Run analysis on a schedule:
| Frequency | Use Case | Command |
|---|---|---|
| Daily | Proactive monitoring | Automated script + email if issues |
| Weekly | Trend analysis | Generate weekly reports |
| After incidents | Root cause analysis | Immediate deep-dive |
| Before changes | Pre-change baseline | Document healthy state |
2. Log File Management
```bash
# Compress old logs
gzip vmkernel.log.old

# The analyzer reads plain text, so uncompress before analyzing
gunzip vmkernel.log.gz
python3 esxi_log_analyzer.py vmkernel.log

# Archive analysis results
mkdir -p /archive/esxi-logs/$(date +%Y%m)
mv *.json *.csv *.html /archive/esxi-logs/$(date +%Y%m)/
```
3. Integration with Monitoring
```bash
# Send metrics to monitoring system
python3 esxi_log_analyzer.py vmkernel.log --json /tmp/esxi-results.json --quiet
CRITICAL=$(python3 -c "import json; print(json.load(open('/tmp/esxi-results.json'))['summary']['critical_issues'])")

# Send to monitoring (example: Prometheus push gateway)
echo "esxi_critical_issues $CRITICAL" | \
    curl --data-binary @- http://pushgateway:9091/metrics/job/esxi
```
4. Documentation Standards
For each analysis, document:
- Date/Time of analysis
- Host(s) analyzed
- Log files processed
- Critical issues count
- Actions taken
- Ticket numbers if applicable
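One way to keep these fields consistent across runs is to record each analysis as a small JSON document. The helper below is a hypothetical sketch, not part of the analyzer:

```python
import json
from datetime import datetime, timezone

# Hypothetical helper capturing the documentation fields listed above
def analysis_record(hosts, log_files, critical_count, actions, tickets=()):
    return {
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "hosts": list(hosts),
        "log_files": list(log_files),
        "critical_issues": critical_count,
        "actions_taken": list(actions),
        "tickets": list(tickets),
    }

record = analysis_record(
    hosts=["esxi1"],
    log_files=["vmkernel.log"],
    critical_count=34,
    actions=["rescanned storage adapters"],
    tickets=["INC0012345"],
)
print(json.dumps(record, indent=2))
```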
5. Retention Policy
Recommended retention:
- Log files: 30 days
- JSON exports: 90 days
- CSV reports: 1 year
- HTML reports: 30 days (regenerate as needed)
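A retention sweep can be scripted against those windows. The function below is a sketch; the extension-to-days mapping and the flat directory layout are assumptions:

```python
import os
import time

# Retention windows in days per export type (from the policy above)
RETENTION_DAYS = {".json": 90, ".csv": 365, ".html": 30}

def expired_files(directory, now=None):
    """Yield report files older than the retention window for their extension."""
    now = time.time() if now is None else now
    for entry in os.scandir(directory):
        ext = os.path.splitext(entry.name)[1]
        days = RETENTION_DAYS.get(ext)
        if entry.is_file() and days is not None:
            age_days = (now - entry.stat().st_mtime) / 86400
            if age_days > days:
                yield entry.path
```

Review the yielded paths before deleting them; a dry-run print is safer than an unconditional `os.remove`.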
Integration Examples
Integration 1: ServiceNow Ticket Creation
```python
#!/usr/bin/env python3
import json
import subprocess

import requests  # third-party; needed only for this integration

# Run analysis
subprocess.run([
    'python3', 'esxi_log_analyzer.py', 'vmkernel.log',
    '--json', 'results.json', '--quiet'
])

# Load results
with open('results.json') as f:
    data = json.load(f)

# Create ticket if critical issues
if data['summary']['critical_issues'] > 5:
    ticket_data = {
        'short_description': f"ESXi Critical Issues: {data['summary']['critical_issues']}",
        'description': json.dumps(data['summary'], indent=2),
        'urgency': '2',
        'impact': '2'
    }
    response = requests.post(
        'https://servicenow.company.com/api/now/table/incident',
        auth=('user', 'password'),
        json=ticket_data
    )
    print(f"Ticket created: {response.json()['result']['number']}")
```
Integration 2: Slack Notifications
```bash
#!/bin/bash
# Send Slack notification for critical issues

python3 esxi_log_analyzer.py vmkernel.log --json results.json --quiet

CRITICAL=$(python3 -c "import json; print(json.load(open('results.json'))['summary']['critical_issues'])")
SCSI=$(python3 -c "import json; print(json.load(open('results.json'))['summary']['total_scsi_errors'])")

if [ "$CRITICAL" -gt 5 ]; then
    curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
        -H 'Content-Type: application/json' \
        -d "{
            \"text\": \"ESXi Alert: $CRITICAL critical issues detected!\",
            \"attachments\": [{
                \"color\": \"danger\",
                \"fields\": [{
                    \"title\": \"SCSI Errors\",
                    \"value\": \"$SCSI\"
                }]
            }]
        }"
fi
```
Integration 3: Grafana Dashboard
```bash
# Export metrics for Grafana
python3 esxi_log_analyzer.py vmkernel.log --json results.json --quiet

# Convert to Prometheus format
python3 > /var/lib/node_exporter/esxi_metrics.prom << 'EOF'
import json
data = json.load(open('results.json'))
metrics = [
    f"esxi_scsi_errors {data['summary']['total_scsi_errors']}",
    f"esxi_path_failures {data['summary']['path_failures']}",
    f"esxi_timeouts {data['summary']['storage_timeouts']}",
    f"esxi_critical_issues {data['summary']['critical_issues']}"
]
for metric in metrics:
    print(metric)
EOF
```
Integration 4: Email Reports
```bash
#!/bin/bash
# Generate and email daily report

DATE=$(date +%Y-%m-%d)
REPORT="esxi-report-${DATE}.html"

python3 esxi_log_analyzer.py vmkernel.log hostd.log --html $REPORT

# Email with attachment
echo "Daily ESXi log analysis attached." | \
    mail -s "ESXi Report - $DATE" \
         -a $REPORT \
         esxi-team@company.com
```
Command Reference
All Command Options
```text
python3 esxi_log_analyzer.py [OPTIONS] logfile1 [logfile2 ...]

Required Arguments:
  logfiles              One or more ESXi log files to analyze

Optional Arguments:
  --type {vmkernel,hostd,vobd}
                        Force log type (auto-detect if not specified)
  --json FILE           Export complete results to JSON file
  --csv FILE            Export critical issues to CSV file
  --html FILE           Generate HTML dashboard report
  --limit N             Number of critical issues to display (default: 10)
  --quiet               Suppress console output (useful for scripts)
  --help                Show help message and exit
```
Common Command Patterns
```bash
# Basic analysis
python3 esxi_log_analyzer.py vmkernel.log

# Multi-log analysis
python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log

# Export all formats
python3 esxi_log_analyzer.py vmkernel.log --json data.json --csv issues.csv --html report.html

# Quiet mode for automation
python3 esxi_log_analyzer.py vmkernel.log --quiet --csv output.csv

# Show top 50 issues
python3 esxi_log_analyzer.py vmkernel.log --limit 50

# Force log type
python3 esxi_log_analyzer.py unknown.log --type vmkernel
```
Appendix
Sample Output Files
Sample JSON Output Structure
```json
{
  "vmkernel": {
    "SCSI_ERROR": [...],
    "PATH_DOWN": [...],
    "STORAGE_TIMEOUT": [...]
  },
  "hostd": {...},
  "vobd": {...},
  "summary": {
    "total_scsi_errors": 45,
    "path_failures": 12,
    "storage_timeouts": 8,
    "reservation_conflicts": 3,
    "hba_errors": 2,
    "hostd_errors": 23,
    "hostd_warnings": 156,
    "vim_faults": 7,
    "task_failures": 5,
    "datastore_errors": 2,
    "vobd_alerts": 4,
    "sensor_warnings": 2,
    "hardware_errors": 1,
    "critical_issues": 34
  },
  "critical_issues": [...]
}
```
SCSI Sense Code Quick Reference
| Hex Code | Decimal | Description | Severity |
|---|---|---|---|
| 0x0 | 0 | No Sense (OK) | Info |
| 0x2 | 2 | Not Ready | Warning |
| 0x3 | 3 | Medium Error | Error |
| 0x4 | 4 | Hardware Error | Critical |
| 0x5 | 5 | Illegal Request | Error |
| 0x6 | 6 | Unit Attention | Warning |
| 0x7 | 7 | Data Protect | Error |
| 0xb | 11 | Aborted Command | Error |
| 0xe | 14 | Overlapped Commands | Warning |
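The Decimal column is simply the hex key parsed with base 16, and the severity ratings above can be encoded as a lookup for scripting. A minimal sketch:

```python
# Severity lookup built from the quick-reference table above
SENSE_SEVERITY = {
    '0x0': 'Info', '0x2': 'Warning', '0x3': 'Error',
    '0x4': 'Critical', '0x5': 'Error', '0x6': 'Warning',
    '0x7': 'Error', '0xb': 'Error', '0xe': 'Warning',
}

def sense_severity(code: str) -> str:
    """Map a hex sense key to the severity rating from the table."""
    return SENSE_SEVERITY.get(code.lower(), 'Unknown')

# The Decimal column is the hex key parsed with base 16
print(sense_severity('0x4'), int('0xb', 16))
```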
Useful ESXi Commands
```bash
# Rescan storage adapters
esxcli storage core adapter rescan --all

# List all storage paths
esxcli storage core path list

# Check path status
esxcli storage core path list | grep -E "dead|disabled"

# List SCSI LUNs
esxcli storage core device list

# Check HBA status
esxcli storage core adapter list

# View active SCSI reservations
esxcli storage core device list | grep -i reservation
```
Log File Locations on ESXi
| Log File | Location | Purpose |
|---|---|---|
| vmkernel.log | /var/run/log/vmkernel.log | Kernel, storage, SCSI |
| hostd.log | /var/run/log/hostd.log | Management, VM operations |
| vobd.log | /var/run/log/vobd.log | Hardware, sensors |
| vpxa.log | /var/run/log/vpxa.log | vCenter agent |
| fdm.log | /var/run/log/fdm.log | HA operations |
Support and Resources
VMware Documentation:
- VMware vSphere Troubleshooting Guide
- ESXi Log File Reference
- SCSI Sense Code Documentation
Internal Resources:
- Infrastructure Team: infrastructure@company.com
- Confluence: https://confluence.company.com/vmware
- Ticket System: ServiceNow – Category: VMware/ESXi
Script Repository:
- GitHub: https://github.com/company/esxi-tools
- Version: 1.0
- Last Updated: March 30, 2026
Quick Start Checklist
- [ ] Python 3.6+ installed and verified
- [ ] Script downloaded and made executable
- [ ] ESXi log files collected
- [ ] Test run completed: `python3 esxi_log_analyzer.py vmkernel.log`
- [ ] Output format selected (console/JSON/CSV/HTML)
- [ ] Results reviewed and understood
- [ ] Integration configured (if needed)
- [ ] Documentation read and bookmarked
Document End
For questions or issues with the ESXi Log Analyzer, contact:
- IT Infrastructure Team: infrastructure@company.com
- Internal Confluence: https://confluence.company.com/esxi-tools
Script Source (esxi_log_analyzer.py)

```python
#!/usr/bin/env python3
"""
ESXi Universal Log Analyzer Suite
Parses vmkernel.log, hostd.log, vobd.log, and other ESXi logs
Extracts errors, warnings, SCSI issues, and generates comprehensive reports
"""

import re
import sys
import json
import argparse
from datetime import datetime
from collections import defaultdict, Counter
from typing import Dict, List, Tuple, Any


class ESXiLogAnalyzer:
    """Universal ESXi log parser for vmkernel, hostd, and vobd logs"""

    # SCSI Sense Code Mappings
    SCSI_SENSE = {
        '0x0': 'No Sense (OK)',
        '0x2': 'Not Ready',
        '0x3': 'Medium Error',
        '0x4': 'Hardware Error',
        '0x5': 'Illegal Request',
        '0x6': 'Unit Attention',
        '0x7': 'Data Protect',
        '0xb': 'Aborted Command',
        '0xe': 'Overlapped Commands Attempted'
    }

    ASC_QUAL = {
        '0x2800': 'LUN Not Ready, Format in Progress',
        '0x3f01': 'Removed Target',
        '0x3f07': 'Multiple LUN Reported',
        '0x4700': 'Reservation Conflict',
        '0x4c00': 'Snapshot Failed',
        '0x5506': 'Illegal Message',
        '0x0800': 'Logical Unit Communication Failure'
    }

    # Log type patterns
    LOG_PATTERNS = {
        'vmkernel': {
            'scsi_error': r'VMW_SCSIERR_([0-9a-fA-Fx]+)',
            'path_state': r'path\s+(dead|working|standby|active|disabled)',
            'lun_pattern': r'naa\.([0-9a-fA-F:]+)',
            'storage_timeout': r'(timeout|LUN.*timeout|NMP.*timeout)',
            'reservation': r'(reservation|RESERVATION|scsi_status.*0x18)',
            'path_down': r'path.*dead|path.*down|path.*failed',
            'hba_error': r'vmhba\d+.*error|vmhba\d+.*fail',
            'cpu_pattern': r'cpu(\d+):',
            'timestamp': r'\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\]'
        },
        'hostd': {
            'error': r'\[Originator@\d+\s+sub=(\w+).*?\]\s+(.+)',
            'warning': r'warning|WARN|Warning',
            'vim_fault': r'vim\.fault\.(\w+)',
            'task_error': r'Task.*failed|Task.*error',
            'datastore_error': r'Datastore.*not found|Datastore.*error',
            'vm_operation': r'(VirtualMachine|VM).*\[(.*?)\]',
            'timestamp': r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)'
        },
        'vobd': {
            'alert': r'\[(\w+)\]\s+\[(\w+)\].*?Alert:?\s+(.+)',
            'sensor': r'sensor.*?(warning|critical|alarm)',
            'hardware': r'hardware.*?(error|fail|fault)',
            'timestamp': r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)'
        }
    }

    def __init__(self):
        self.results = {
            'vmkernel': defaultdict(list),
            'hostd': defaultdict(list),
            'vobd': defaultdict(list),
            'summary': {},
            'critical_issues': []
        }
        self.stats = Counter()

    def detect_log_type(self, line: str) -> str:
        """Auto-detect log type from line content"""
        if 'vmkernel:' in line or 'cpu' in line.lower() or 'VMW_SCSI' in line:
            return 'vmkernel'
        elif 'Hostd:' in line or 'vim.' in line or '[Originator@' in line:
            return 'hostd'
        elif 'vobd[' in line or 'Alert:' in line or 'sensor' in line.lower():
            return 'vobd'
        return 'unknown'

    def parse_timestamp(self, line: str, log_type: str) -> str:
        """Extract timestamp from log line"""
        pattern = self.LOG_PATTERNS[log_type].get('timestamp')
        if pattern:
            match = re.search(pattern, line)
            if match:
                return match.group(1)
        return datetime.now().isoformat()

    def parse_vmkernel_line(self, line: str) -> Dict[str, Any]:
        """Parse vmkernel.log line for SCSI errors, paths, and storage issues"""
        result = {'type': 'vmkernel', 'raw': line.strip(), 'issues': []}
        patterns = self.LOG_PATTERNS['vmkernel']

        # Extract timestamp
        result['timestamp'] = self.parse_timestamp(line, 'vmkernel')

        # SCSI Error Detection
        scsi_match = re.search(patterns['scsi_error'], line)
        if scsi_match:
            sense_code = scsi_match.group(1)
            if not sense_code.startswith('0x'):
                sense_code = f"0x{sense_code}"
            result['scsi_sense'] = sense_code
            result['scsi_description'] = self.SCSI_SENSE.get(
                sense_code.lower(), f'Unknown ({sense_code})'
            )
            result['issues'].append('SCSI_ERROR')
            self.stats['scsi_errors'] += 1

        # LUN Identification
        lun_match = re.search(patterns['lun_pattern'], line)
        if lun_match:
            result['lun'] = f"naa.{lun_match.group(1)}"
            # Check if Tintri LUN
            if '60a980' in lun_match.group(1) or 'tintri' in line.lower():
                result['vendor'] = 'Tintri'

        # Path State
        path_match = re.search(patterns['path_state'], line, re.IGNORECASE)
        if path_match:
            result['path_state'] = path_match.group(1).lower()
            if result['path_state'] in ['dead', 'failed', 'disabled']:
                result['issues'].append('PATH_DOWN')
                self.stats['path_failures'] += 1

        # Storage Timeout
        if re.search(patterns['storage_timeout'], line, re.IGNORECASE):
            result['issues'].append('STORAGE_TIMEOUT')
            self.stats['timeouts'] += 1

        # Reservation Conflict
        if re.search(patterns['reservation'], line, re.IGNORECASE):
            result['issues'].append('RESERVATION_CONFLICT')
            self.stats['reservations'] += 1

        # HBA Error
        hba_match = re.search(patterns['hba_error'], line, re.IGNORECASE)
        if hba_match:
            result['issues'].append('HBA_ERROR')
            result['hba'] = re.search(r'vmhba\d+', line).group(0)
            self.stats['hba_errors'] += 1

        # CPU Context
        cpu_match = re.search(patterns['cpu_pattern'], line)
        if cpu_match:
            result['cpu'] = int(cpu_match.group(1))

        return result if result['issues'] else None

    def parse_hostd_line(self, line: str) -> Dict[str, Any]:
        """Parse hostd.log line for VM operations, tasks, and VIM faults"""
        result = {'type': 'hostd', 'raw': line.strip(), 'issues': []}
        patterns = self.LOG_PATTERNS['hostd']

        # Extract timestamp
        result['timestamp'] = self.parse_timestamp(line, 'hostd')

        # Error/Warning Level
        if re.search(patterns['warning'], line, re.IGNORECASE):
            result['severity'] = 'WARNING'
            self.stats['hostd_warnings'] += 1
        if 'error' in line.lower():
            result['severity'] = 'ERROR'
            result['issues'].append('HOSTD_ERROR')
            self.stats['hostd_errors'] += 1

        # VIM Fault Detection
        vim_fault = re.search(patterns['vim_fault'], line)
        if vim_fault:
            result['vim_fault'] = vim_fault.group(1)
            result['issues'].append('VIM_FAULT')
            self.stats['vim_faults'] += 1

        # Originator/Subsystem
        originator_match = re.search(patterns['error'], line)
        if originator_match:
            result['subsystem'] = originator_match.group(1)
            result['message'] = originator_match.group(2)

        # Task Failure
        if re.search(patterns['task_error'], line, re.IGNORECASE):
            result['issues'].append('TASK_FAILURE')
            self.stats['task_failures'] += 1

        # Datastore Issues
        if re.search(patterns['datastore_error'], line, re.IGNORECASE):
            result['issues'].append('DATASTORE_ERROR')
            self.stats['datastore_errors'] += 1

        # VM Operation
        vm_match = re.search(patterns['vm_operation'], line)
        if vm_match:
            result['vm_name'] = vm_match.group(2) if vm_match.group(2) else 'Unknown'

        return result if result['issues'] else None

    def parse_vobd_line(self, line: str) -> Dict[str, Any]:
        """Parse vobd.log line for hardware alerts and sensor warnings"""
        result = {'type': 'vobd', 'raw': line.strip(), 'issues': []}
        patterns = self.LOG_PATTERNS['vobd']

        # Extract timestamp
        result['timestamp'] = self.parse_timestamp(line, 'vobd')

        # Alert Detection
        alert_match = re.search(patterns['alert'], line)
        if alert_match:
            result['alert_type'] = alert_match.group(1)
            result['severity'] = alert_match.group(2)
            result['message'] = alert_match.group(3)
            result['issues'].append('HARDWARE_ALERT')
            self.stats['vobd_alerts'] += 1

        # Sensor Issues
        if re.search(patterns['sensor'], line, re.IGNORECASE):
            result['issues'].append('SENSOR_WARNING')
            self.stats['sensor_warnings'] += 1

        # Hardware Errors
        if re.search(patterns['hardware'], line, re.IGNORECASE):
            result['issues'].append('HARDWARE_ERROR')
            self.stats['hardware_errors'] += 1

        return result if result['issues'] else None

    def parse_file(self, filepath: str, log_type: str = None) -> None:
        """Parse entire log file"""
        print(f"[INFO] Parsing log file: {filepath}")
        line_count = 0
        parsed_count = 0
        try:
            with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                for line in f:
                    line_count += 1

                    # Auto-detect log type if not specified
                    if not log_type:
                        log_type = self.detect_log_type(line)

                    # Parse based on log type
                    if log_type == 'vmkernel':
                        result = self.parse_vmkernel_line(line)
                    elif log_type == 'hostd':
                        result = self.parse_hostd_line(line)
                    elif log_type == 'vobd':
                        result = self.parse_vobd_line(line)
                    else:
                        continue

                    # Store results
                    if result:
                        parsed_count += 1
                        for issue in result['issues']:
                            self.results[log_type][issue].append(result)

                        # Track critical issues
                        if self._is_critical(result):
                            self.results['critical_issues'].append(result)

            print(f"[SUCCESS] Parsed {line_count} lines, found {parsed_count} issues")
        except FileNotFoundError:
            print(f"[ERROR] File not found: {filepath}")
        except Exception as e:
            print(f"[ERROR] Failed to parse file: {e}")

    def _is_critical(self, result: Dict) -> bool:
        """Determine if issue is critical"""
        critical_issues = [
            'SCSI_ERROR', 'PATH_DOWN', 'STORAGE_TIMEOUT',
            'RESERVATION_CONFLICT', 'HARDWARE_ERROR', 'DATASTORE_ERROR'
        ]
        return any(issue in result['issues'] for issue in critical_issues)

    def generate_summary(self) -> Dict:
        """Generate summary statistics"""
        summary = {
            'total_scsi_errors': self.stats['scsi_errors'],
            'path_failures': self.stats['path_failures'],
            'storage_timeouts': self.stats['timeouts'],
            'reservation_conflicts': self.stats['reservations'],
            'hba_errors': self.stats['hba_errors'],
            'hostd_errors': self.stats['hostd_errors'],
            'hostd_warnings': self.stats['hostd_warnings'],
            'vim_faults': self.stats['vim_faults'],
            'task_failures': self.stats['task_failures'],
            'datastore_errors': self.stats['datastore_errors'],
            'vobd_alerts': self.stats['vobd_alerts'],
            'sensor_warnings': self.stats['sensor_warnings'],
            'hardware_errors': self.stats['hardware_errors'],
            'critical_issues': len(self.results['critical_issues'])
        }
        self.results['summary'] = summary
        return summary

    def print_summary(self) -> None:
        """Print analysis summary to console"""
        print("\n" + "=" * 80)
        print("ESXi LOG ANALYSIS SUMMARY")
        print("=" * 80)
        summary = self.results['summary']
        print("\n[VMKERNEL ISSUES]")
        print(f"  SCSI Errors: {summary['total_scsi_errors']}")
        print(f"  Path Failures: {summary['path_failures']}")
        print(f"  Storage Timeouts: {summary['storage_timeouts']}")
        print(f"  Reservation Conflicts: {summary['reservation_conflicts']}")
        print(f"  HBA Errors: {summary['hba_errors']}")
        print("\n[HOSTD ISSUES]")
        print(f"  Errors: {summary['hostd_errors']}")
        print(f"  Warnings: {summary['hostd_warnings']}")
        print(f"  VIM Faults: {summary['vim_faults']}")
        print(f"  Task Failures: {summary['task_failures']}")
        print(f"  Datastore Errors: {summary['datastore_errors']}")
        print("\n[VOBD ISSUES]")
        print(f"  Hardware Alerts: {summary['vobd_alerts']}")
        print(f"  Sensor Warnings: {summary['sensor_warnings']}")
        print(f"  Hardware Errors: {summary['hardware_errors']}")
        print("\n[CRITICAL]")
        print(f"  Total Critical Issues: {summary['critical_issues']}")
        print("=" * 80)

    def print_critical_issues(self, limit: int = 10) -> None:
        """Print top critical issues"""
        print(f"\n[TOP {limit} CRITICAL ISSUES]")
        print("-" * 80)
        for i, issue in enumerate(self.results['critical_issues'][:limit], 1):
            print(f"\n{i}. [{issue['type'].upper()}] {issue['timestamp']}")
            print(f"   Issues: {', '.join(issue['issues'])}")
            if 'lun' in issue:
                print(f"   LUN: {issue['lun']}")
            if 'scsi_description' in issue:
                print(f"   SCSI: {issue['scsi_description']}")
            if 'path_state' in issue:
                print(f"   Path State: {issue['path_state']}")
            if 'vim_fault' in issue:
                print(f"   VIM Fault: {issue['vim_fault']}")
            if 'message' in issue:
                print(f"   Message: {issue['message'][:100]}...")

    def export_json(self, output_file: str) -> None:
        """Export results to JSON file"""
        print(f"\n[INFO] Exporting results to: {output_file}")
        try:
            with open(output_file, 'w') as f:
                json.dump(self.results, f, indent=2, default=str)
            print(f"[SUCCESS] Results exported successfully")
        except Exception as e:
            print(f"[ERROR] Failed to export JSON: {e}")

    def export_csv(self, output_file: str) -> None:
        """Export critical issues to CSV"""
        print(f"\n[INFO] Exporting CSV to: {output_file}")
        try:
            with open(output_file, 'w') as f:
                # CSV Header
                f.write("Timestamp,Type,Issues,LUN,SCSI_Error,Path_State,Message\n")
                # Write critical issues
                for issue in self.results['critical_issues']:
                    timestamp = issue.get('timestamp', '')
                    log_type = issue.get('type', '')
                    issues = '|'.join(issue.get('issues', []))
                    lun = issue.get('lun', '')
                    scsi = issue.get('scsi_description', '')
                    path = issue.get('path_state', '')
                    message = issue.get('message', issue.get('raw', ''))[:200]
                    # Escape commas in message
                    message = message.replace(',', ';')
                    f.write(f"{timestamp},{log_type},{issues},{lun},{scsi},{path},{message}\n")
            print(f"[SUCCESS] CSV exported successfully")
        except Exception as e:
            print(f"[ERROR] Failed to export CSV: {e}")

    def export_html_report(self, output_file: str) -> None:
        """Generate HTML dashboard report"""
        print(f"\n[INFO] Generating HTML report: {output_file}")
        summary = self.results['summary']
        html = f"""<!DOCTYPE html>
<html>
<head>
    <title>ESXi Log Analysis Report</title>
    <style>
        body {{ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; margin: 20px; background: #f5f5f5; }}
        .header {{ background: #dc2626; color: white; padding: 20px; border-radius: 8px; margin-bottom: 20px; }}
        .summary {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px; margin-bottom: 30px; }}
        .card {{ background: white; padding: 20px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1); }}
        .card h3 {{ margin-top: 0; color: #333; font-size: 14px; text-transform: uppercase; }}
        .card .value {{ font-size: 32px; font-weight: bold; color: #dc2626; }}
        .critical {{ background: #fee; border-left: 4px solid #dc2626; }}
        table {{ width: 100%; background: white; border-collapse: collapse; border-radius: 8px; overflow: hidden; box-shadow: 0 2px 4px rgba(0,0,0,0.1); }}
        th {{ background: #333; color: white; padding: 12px; text-align: left; }}
        td {{ padding: 10px 12px; border-bottom: 1px solid #eee; }}
        tr:hover {{ background: #f9f9f9; }}
        .issue-badge {{ display: inline-block; padding: 4px 8px; background: #dc2626; color: white; border-radius: 4px; font-size: 11px; margin: 2px; }}
    </style>
</head>
<body>
    <div class="header">
        <h1>ESXi Log Analysis Report</h1>
        <p>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
    </div>
    <div class="summary">
        <div class="card critical">
            <h3>Critical Issues</h3>
            <div class="value">{summary['critical_issues']}</div>
        </div>
        <div class="card">
            <h3>SCSI Errors</h3>
            <div class="value">{summary['total_scsi_errors']}</div>
        </div>
        <div class="card">
            <h3>Path Failures</h3>
            <div class="value">{summary['path_failures']}</div>
        </div>
        <div class="card">
            <h3>Storage Timeouts</h3>
            <div class="value">{summary['storage_timeouts']}</div>
        </div>
        <div class="card">
            <h3>Hostd Errors</h3>
            <div class="value">{summary['hostd_errors']}</div>
        </div>
        <div class="card">
            <h3>Hardware Alerts</h3>
            <div class="value">{summary['vobd_alerts']}</div>
        </div>
    </div>
    <div class="card">
        <h2>Critical Issues</h2>
        <table>
            <thead>
                <tr>
                    <th>Timestamp</th>
                    <th>Type</th>
                    <th>Issues</th>
                    <th>Details</th>
                </tr>
            </thead>
            <tbody>"""

        # Add critical issues to table
        for issue in self.results['critical_issues'][:50]:
            timestamp = issue.get('timestamp', 'N/A')
            log_type = issue.get('type', 'N/A').upper()
            issues_html = ''.join([f'<span class="issue-badge">{i}</span>' for i in issue['issues']])
            details = []
            if 'lun' in issue:
                details.append(f"LUN: {issue['lun']}")
            if 'scsi_description' in issue:
                details.append(f"SCSI: {issue['scsi_description']}")
            if 'path_state' in issue:
                details.append(f"Path: {issue['path_state']}")
            if 'vim_fault' in issue:
                details.append(f"VIM: {issue['vim_fault']}")
            details_str = '<br>'.join(details) if details else 'N/A'
            html += f"""
                <tr>
                    <td>{timestamp}</td>
                    <td>{log_type}</td>
                    <td>{issues_html}</td>
                    <td>{details_str}</td>
                </tr>"""

        html += """
            </tbody>
        </table>
    </div>
</body>
</html>"""
        try:
            with open(output_file, 'w') as f:
                f.write(html)
            print(f"[SUCCESS] HTML report generated successfully")
        except Exception as e:
            print(f"[ERROR] Failed to generate HTML report: {e}")


def main():
    parser = argparse.ArgumentParser(
        description='ESXi Universal Log Analyzer - Parse vmkernel, hostd, and vobd logs',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""Examples:
  Parse vmkernel log:
    python3 esxi_log_analyzer.py vmkernel.log

  Parse multiple logs:
    python3 esxi_log_analyzer.py vmkernel.log hostd.log vobd.log

  Export to JSON
```
and CSV: python3 esxi_log_analyzer.py vmkernel.log --json results.json --csv results.csv Generate HTML report: python3 esxi_log_analyzer.py vmkernel.log --html report.html Parse with specific log type: python3 esxi_log_analyzer.py esxi.log --type vmkernel """ ) parser.add_argument('logfiles', nargs='+', help='ESXi log files to analyze') parser.add_argument('--type', choices=['vmkernel', 'hostd', 'vobd'], help='Force log type (auto-detect if not specified)') parser.add_argument('--json', help='Export results to JSON file') parser.add_argument('--csv', help='Export critical issues to CSV file') parser.add_argument('--html', help='Generate HTML report') parser.add_argument('--limit', type=int, default=10, help='Number of critical issues to display (default: 10)') parser.add_argument('--quiet', action='store_true', help='Suppress console output') args = parser.parse_args() # Create analyzer instance analyzer = ESXiLogAnalyzer() # Parse all log files for logfile in args.logfiles: analyzer.parse_file(logfile, log_type=args.type) # Generate summary analyzer.generate_summary() # Print results if not args.quiet: analyzer.print_summary() analyzer.print_critical_issues(limit=args.limit) # Export results if args.json: analyzer.export_json(args.json) if args.csv: analyzer.export_csv(args.csv) if args.html: analyzer.export_html_report(args.html) # Print completion message if not args.quiet: print("\n[INFO] Analysis complete!") if args.json or args.csv or args.html: print("[INFO] Export files created successfully")if __name__ == "__main__": main()
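One subtlety in the CSV export is that field values can themselves contain commas: SCSI sense descriptions such as "Logical unit not ready, cause not reportable" would split a row written with naive string joins. A minimal, standalone sketch (sample rows are hypothetical, not real analyzer output) of how the standard-library `csv` module handles this by quoting, using `io.StringIO` so nothing touches disk:

```python
import csv
import io

# Hypothetical critical-issue rows mirroring three of the export columns.
rows = [
    ("2024-05-01T02:13:44Z", "vmkernel",
     "Logical unit not ready, cause not reportable"),  # embedded comma
    ("2024-05-01T02:13:45Z", "vobd", "Sensor warning"),
]

buf = io.StringIO()
writer = csv.writer(buf)  # QUOTE_MINIMAL: quotes only fields that need it
writer.writerow(["Timestamp", "Type", "SCSI_Error"])
writer.writerows(rows)
text = buf.getvalue()

# Round-trip: a CSV reader recovers exactly three columns per row,
# commas inside the quoted description included.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1][2])  # -> Logical unit not ready, cause not reportable
```

The same round-trip property is why spreadsheet tools and ticketing systems can import the generated file without mangling the message column.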