VixDiskLib API

On ESXi hosts, virtual machine disk (VMDK) files are usually located under one of the /vmfs/volumes directories, perhaps on shared storage. Storage volumes are visible from the vSphere Client, in the inventory of hosts and clusters. Typical names are datastore1 and datastore2. To see a VMDK file, click Summary > Resources > Datastore, right-click and choose Browse Datastore, then select a virtual machine.
On Workstation, VMDK files are stored in the same directory with virtual machine configuration (VMX) files, for example, /path/to/disk on Linux or C:\My Documents\My Virtual Machines on Windows.
VMDK files store data representing a virtual machine’s hard disk drive. Almost all of a VMDK file is the virtual machine’s data, with a small portion allotted to overhead.

Initialize the Library:
+++++++++++++++++

VixDiskLib_Init() initializes the old virtual disk library. The arguments majorVersion and minorVersion represent the VDDK library’s release number and dot-release number. The optional third, fourth, and fifth arguments specify log, warning, and panic handlers. DLLs and shared objects may be located in libDir.
VixError vixError = VixDiskLib_Init(majorVer, minorVer, &logFunc, &warnFunc, &panicFunc, libDir);
You should call VixDiskLib_Init() only once per process, at the beginning of your program, because of internationalization restrictions. You should call VixDiskLib_Exit() at the end of your program for cleanup. For multithreaded programs you should write your own logFunc, because the default function is not thread safe.
In most cases you should replace VixDiskLib_Init() with VixDiskLib_InitEx(), which allows you to specify a configuration file.
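As an illustration, here is a minimal sketch of initializing the library with VixDiskLib_InitEx() and a custom log handler; the library directory and configuration file paths are placeholder assumptions, not required values:

#include <stdarg.h>
#include <stdio.h>
#include "vixDiskLib.h"

/* Custom handler used for log, warning, and panic messages (the default
   log handler is not thread safe, so multithreaded programs supply their own). */
static void logFunc(const char *fmt, va_list args)
{
    vfprintf(stderr, fmt, args);
}

int main(void)
{
    /* Paths below are placeholders; point them at your VDDK install and
       configuration file, or pass NULL to use the defaults. */
    VixError vixError = VixDiskLib_InitEx(VIXDISKLIB_VERSION_MAJOR,
                                          VIXDISKLIB_VERSION_MINOR,
                                          logFunc, logFunc, logFunc,
                                          "/usr/lib/vmware-vix-disklib",
                                          "/etc/vddk.conf");
    if (vixError != VIX_OK) {
        fprintf(stderr, "VixDiskLib_InitEx failed: %llu\n",
                (unsigned long long)vixError);
        return 1;
    }

    /* ... connect, open, read, and write virtual disks here ... */

    VixDiskLib_Exit();   /* cleanup at the end of the program */
    return 0;
}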

Virtual Disk Types:
++++++++++++++++

The following disk types are defined in the virtual disk library:
>>>
VIXDISKLIB_DISK_MONOLITHIC_SPARSE – Growable virtual disk contained in a single virtual disk file. This is the default type for hosted disk, and the only setting in the Virtual Disk API.
>>>
VIXDISKLIB_DISK_MONOLITHIC_FLAT – Preallocated virtual disk contained in a single virtual disk file. This takes time to create and occupies a lot of space, but might perform better than sparse.
>>>
VIXDISKLIB_DISK_SPLIT_SPARSE – Growable virtual disk split into 2GB extents (s sequence). These files can grow to 2GB each, then continue growing in a new extent. This type works on older file systems.
>>>
VIXDISKLIB_DISK_SPLIT_FLAT – Preallocated virtual disk split into 2GB extents (f sequence). These files start at 2GB, so they take a while to create, but available space can grow in 2GB increments.
>>>
VIXDISKLIB_DISK_VMFS_FLAT – Preallocated virtual disk compatible with ESX 3 and later. Also known as thick disk. This managed disk type is discussed in Managed Disk and Hosted Disk.
>>>
VIXDISKLIB_DISK_VMFS_SPARSE – Employs a copy-on-write (COW) mechanism to save storage space.
>>>
VIXDISKLIB_DISK_VMFS_THIN – A growable virtual disk that consumes only as much space as needed, compatible with ESX 3 or later, supported by VDDK 1.1 or later, and highly recommended.
>>>
VIXDISKLIB_DISK_STREAM_OPTIMIZED – Monolithic sparse format compressed for streaming. Stream optimized format does not support random reads or writes.
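
As an illustration of choosing one of these types (a sketch, not taken from the VDDK samples), the following function creates a hosted disk with VixDiskLib_Create(); the connection handle, path, capacity, and hardware-version constant are assumptions you would adapt:

#include <string.h>
#include "vixDiskLib.h"

/* Sketch: create a 20 GB growable (monolithic sparse) hosted disk.
   Assumes conn was obtained earlier with VixDiskLib_Connect(). */
VixError createSparseDisk(VixDiskLibConnection conn, const char *path)
{
    VixDiskLibCreateParams createParams;
    memset(&createParams, 0, sizeof createParams);
    createParams.diskType    = VIXDISKLIB_DISK_MONOLITHIC_SPARSE;   /* default hosted type */
    createParams.adapterType = VIXDISKLIB_ADAPTER_SCSI_LSILOGIC;
    createParams.hwVersion   = VIXDISKLIB_HWVERSION_WORKSTATION_5;  /* assumed; pick per target */
    createParams.capacity    = 20 * 2048 * 1024;                    /* capacity in 512-byte sectors (20 GB) */
    return VixDiskLib_Create(conn, path, &createParams, NULL, NULL);
}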

Check the sample programs at:

http://pubs.vmware.com/vsphere-65/index.jsp#com.vmware.vddk.pg.doc/vddkSample.7.2.html#995259

 

Confirming connectivity to a TCP port with telnet

While the ping command confirms connectivity, it does not necessarily mean that all TCP ports on the remote host can be reached. It is possible for a network firewall to allow or block access to certain ports on a host.

 

To check whether a specific TCP port is open on the remote host, you can use the telnet command to confirm that the port is reachable.

# telnet destination-ip destination-port

When you successfully establish a telnet connection to TCP port 80, you see output similar to:

 

# telnet 192.168.1.11 80

Trying 192.168.1.11…

Connected to 192.168.1.11.

Escape character is '^]'.

In this sample output, you can see that you are connected to port 80 (http) on the server with IP address 192.168.1.11.

 

If you choose a port number for a service that is not running on the host, you see output similar to:

 

# telnet 192.168.1.11 81

Trying 192.168.1.11…

telnet: Unable to connect to remote host: Connection timed out

In this case, you can see that there is no response when you attempt to connect to port 81 on the server 192.168.1.11.

Note: Telnet is an application that operates using the TCP protocol. UDP connectivity cannot be tested using Telnet.
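
If you prefer to script the same check rather than use telnet interactively, a small C program using connect() gives an equivalent yes/no answer for a TCP port (an illustrative sketch; the host and port come from the command line):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/* Usage: ./tcpcheck <host> <port>  -- exits 0 if the TCP port accepts a connection. */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s host port\n", argv[0]);
        return 2;
    }
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP only; UDP cannot be tested this way either */
    if (getaddrinfo(argv[1], argv[2], &hints, &res) != 0) {
        fprintf(stderr, "cannot resolve %s\n", argv[1]);
        return 2;
    }
    int sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (sock >= 0 && connect(sock, res->ai_addr, res->ai_addrlen) == 0) {
        printf("Connected to %s port %s\n", argv[1], argv[2]);
        close(sock);
        freeaddrinfo(res);
        return 0;
    }
    printf("Unable to connect to %s port %s\n", argv[1], argv[2]);
    if (sock >= 0) close(sock);
    freeaddrinfo(res);
    return 1;
}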

 

Viewing active TCP/UDP connections with netstat and esxcli network

When troubleshooting network connectivity issues, it may be helpful to see all the active incoming and outgoing TCP/UDP connections on an ESX/ESXi host. ESX hosts can use the netstat command and ESXi 4.1 and later hosts can use esxcli network to show the list of TCP/UDP connections. The commands are: 

ESX 3.5/4.x – # netstat -tnp

ESXi 4.1 – # esxcli network connection list

ESXi 5.0 – # esxcli network ip connection list

 

ESXi 5.1 – # esxcli network ip connection list 

ESXi 5.5 – # esxcli network ip connection list

ESXi 6.0 – # esxcli network ip connection list

ESXi 6.5 – # esxcli network ip connection list

Sample output from an ESXi 4.1 host:

 

# esxcli network connection list 

Proto  Recv-Q  Send-Q  Local Address       Foreign Address     State        World ID

tcp    0       52      192.168.1.11:22   192.168.25.1:55169  ESTABLISHED  0

tcp    0       0       127.0.0.1:62024     127.0.0.1:5988      TIME_WAIT    0

tcp    0       0       127.0.0.1:57867     127.0.0.1:5988      TIME_WAIT    0

tcp    0       0       127.0.0.1:62196     127.0.0.1:5988      TIME_WAIT    0

tcp    0       0       127.0.0.1:8307      127.0.0.1:52943     ESTABLISHED  5790

tcp    0       0       127.0.0.1:52943     127.0.0.1:8307      ESTABLISHED  5790

tcp    0       0       127.0.0.1:80        127.0.0.1:55629     ESTABLISHED  5785

tcp    0       0       127.0.0.1:55629     127.0.0.1:80        ESTABLISHED  6613

tcp    0       0       127.0.0.1:8307      127.0.0.1:56319     ESTABLISHED  5785

tcp    0       0       127.0.0.1:56319     127.0.0.1:8307      ESTABLISHED  5785

tcp    0       0       127.0.0.1:80        127.0.0.1:62782     ESTABLISHED  5166

tcp    0       0       127.0.0.1:62782     127.0.0.1:80        ESTABLISHED  6613

tcp    0       0       127.0.0.1:5988      127.0.0.1:53808     FIN_WAIT_2   0

tcp    0       0       127.0.0.1:53808     127.0.0.1:5988      CLOSE_WAIT   5166

tcp    0       0       127.0.0.1:8307      127.0.0.1:56963     CLOSE_WAIT   5788

tcp    0       0       127.0.0.1:56963     127.0.0.1:8307      FIN_WAIT_2   5785

tcp    0       0       127.0.0.1:8307      0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:8309      0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:5988      0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:5989        0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:80          0.0.0.0:0           LISTEN       5031

tcp    0       0       0.0.0.0:443         0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:12001     0.0.0.0:0           LISTEN       5031

tcp    0       0       127.0.0.1:8889      0.0.0.0:0           LISTEN       5331

tcp    0       0       192.168.1.11:427  0.0.0.0:0           LISTEN       0

tcp    0       0       127.0.0.1:427       0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:22          0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:902         0.0.0.0:0           LISTEN       0

tcp    0       0       0.0.0.0:8000        0.0.0.0:0           LISTEN       4801

tcp    0       0       0.0.0.0:8100        0.0.0.0:0           LISTEN       4795

udp    0       0       192.168.1.11:427  0.0.0.0:0                        0

udp    0       0       0.0.0.0:427         0.0.0.0:0                        0

udp    0       0       192.168.1.11:68   0.0.0.0:0                        4693

udp    0       0       0.0.0.0:8200        0.0.0.0:0                        4795

udp    0       0       0.0.0.0:8301        0.0.0.0:0                        4686

udp    0       0       0.0.0.0:8302        0.0.0.0:0                        4686

To retrieve errors and statistics for a network adapter, run this command:

# esxcli network nic stats get -n <vmnicX>

Where <vmnicX> is the name of a NIC in your ESXi host.

How Does vMotion Work?

If you need to take a host offline for maintenance, you can move the virtual machine to another host. Migration with vMotion™ allows virtual machine processes to continue working throughout a migration.

With vMotion, you can change the host on which a virtual machine is running, or you can change both the host and the datastore of the virtual machine.

When you migrate virtual machines with vMotion and choose to change only the host, the entire state of the virtual machine is moved to the new host. The associated virtual disk remains in the same location on storage that is shared between the two hosts.

When you choose to change both the host and the datastore, the virtual machine state is moved to a new host and the virtual disk is moved to another datastore. vMotion migration to another host and datastore is possible in vSphere environments without shared storage.

After the virtual machine state is migrated to the alternate host, the virtual machine runs on the new host. Migrations with vMotion are completely transparent to the running virtual machine.

The state information includes the current memory content and all the information that defines and identifies the virtual machine. The memory content includes transaction data and the bits of the operating system and applications that are in the memory. The defining and identification information stored in the state includes all the data that maps to the virtual machine hardware elements, such as BIOS, devices, CPU, MAC addresses for the Ethernet cards, chip set states, registers, and so forth.

When you migrate a virtual machine with vMotion, the new host for the virtual machine must meet compatibility requirements so that the migration can proceed.

Below are the steps:

  1. The first step is to ensure that the source VM can be operated on the chosen destination server (the hosts must have compatible CPU architectures, and you should configure a vMotion network; otherwise the traffic uses the host's management network).
  2. Then a second VM process is started on the target host and the resources are reserved.
  3. Next, a system memory checkpoint is created. This means all changes to the source VM are written to an extra memory area.
  4. The contents of the system memory recorded at the checkpoint are transferred to the target VM (on the destination host).
  5. The checkpoint/checkpoint-restore process is repeated until only the smallest change sets remain in the target VM’s memory.
  6. The CPU of the source VM is stopped.
  7. The last modifications to the main memory are transferred to the target VM in milliseconds.
  8. The vMotion process is ended and a reverse ARP packet is sent to the physical switch (important: Notify Switches must be activated in the properties of the virtual switch). Hard disk access is taken over by the target ESX.
  9. The source VM is shut down. This means the VM process on the source ESXi is deleted.
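
The heart of steps 3 through 7 is an iterative memory pre-copy. A simplified, purely illustrative sketch of that loop follows (not vMotion code; the threshold and re-dirtying rate are made up):

/* Simplified illustration of iterative memory pre-copy: keep copying dirtied
   pages until the remaining delta is small enough, then stop the source vCPUs
   and transfer the final set. */
#include <stdio.h>

int main(void)
{
    long dirty_pages = 4 * 1024 * 1024;   /* hypothetical working set, in pages */
    const long small_enough = 2048;       /* threshold for the final switchover */
    int round = 0;

    while (dirty_pages > small_enough) {
        /* transfer the current dirty set; the guest keeps running and re-dirties ~10% of it */
        dirty_pages = dirty_pages / 10;
        printf("pre-copy round %d done, %ld pages still dirty\n", ++round, dirty_pages);
    }
    printf("stopping source vCPUs, sending final %ld pages, resuming on target\n",
           dirty_pages);
    return 0;
}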

A very good article on troubleshooting vMotion issues is Understanding and troubleshooting vMotion (KB 1003734)

All about DRS

VMware vSphere Distributed Resource Scheduler (DRS) is the resource scheduling and load balancing solution for vSphere. DRS works on a cluster of ESXi hosts and provides resource management capabilities like load balancing and virtual machine (VM) placement. DRS also enforces user-defined resource allocation policies at the cluster level, while working with system-level constraints.
Although DRS is widely deployed and generally understood, questions about “how” DRS does what it does are not uncommon. Not knowing exactly how DRS works often leads to confusion and improper expectations about DRS behavior and its performance.

Let’s take a closer look at how DRS achieves its goal of ensuring VMs are happy, with effective placement and efficient load balancing.

Effective VM Placement:

++++++++++++++++++++
When a VM is being powered up in a DRS cluster, DRS runs its algorithm to determine the right ESXi host for it to be powered up on. This decision, also known as VM placement (or initial placement), is made based on the expected change in resource distribution (after ensuring that there are no constraint violations if the VM were placed on the host).

 

Efficient Load Balancing

+++++++++++++++++++
When the host resources in a cluster are more or less evenly utilized, then the cluster is well balanced. DRS uses a cluster-level balance metric to make load-balancing decisions. This balance metric is calculated from the standard deviation of resource utilization data from hosts in the cluster. DRS runs its algorithm once every 5 minutes (by default) to study imbalance in the cluster. In each round, if it needs to balance the load, DRS uses vMotion to migrate running VMs from one ESXi host to another.
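
The exact DRS algorithm is not public, but the idea of a standard-deviation based balance metric can be sketched as follows (an illustrative simplification, not DRS code; the host utilization values are hypothetical):

#include <math.h>
#include <stdio.h>

/* Simplified imbalance metric: each value is a host's normalized
   utilization (0.0 - 1.0); a lower result means a better balanced cluster. */
double cluster_imbalance(const double *host_load, int num_hosts)
{
    double sum = 0.0, var = 0.0;
    for (int i = 0; i < num_hosts; i++)
        sum += host_load[i];
    double mean = sum / num_hosts;
    for (int i = 0; i < num_hosts; i++)
        var += (host_load[i] - mean) * (host_load[i] - mean);
    return sqrt(var / num_hosts);   /* standard deviation of host load */
}

int main(void)
{
    double load[] = { 0.35, 0.40, 0.80 };   /* hypothetical per-host utilization */
    printf("current imbalance: %.3f\n", cluster_imbalance(load, 3));
    return 0;
}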

Calculating VM Resource Demand

+++++++++++++++++++++++++++
In calculating the resource utilization, DRS looks for the demand for every running VM in the cluster. VM demand is the amount of resources that the VM currently needs to run. For CPU, demand is calculated based on the amount of CPU the VM is currently consuming. For memory, demand is calculated based on the following formula.
VM memory demand = Function(Active memory used, Swapped, Shared) + 25% (idle consumed memory)
In other words, by default, DRS balances memory workloads based mostly on a VM’s active memory usage, while considering a small amount of its idle consumed memory as a cushion for any increase in workload. This behavior enables you to efficiently run memory workloads in your DRS clusters, even with over-commitment.
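
Read literally, and with the unpublished Function() replaced by a plain sum as a stand-in, the formula could be sketched like this (purely illustrative; DRS's real computation is internal):

#include <stdio.h>

/* Illustrative-only translation of the demand formula above; the real
   Function() used by DRS is not published, so a plain sum stands in for it. */
static double vm_memory_demand_mb(double active_mb, double swapped_mb,
                                  double shared_mb, double idle_consumed_mb)
{
    double base = active_mb + swapped_mb + shared_mb;   /* stand-in for Function(...) */
    return base + 0.25 * idle_consumed_mb;              /* plus 25% of idle consumed memory */
}

int main(void)
{
    /* hypothetical VM: 2048 MB active, 0 MB swapped, 256 MB shared, 4096 MB idle consumed */
    printf("estimated demand: %.0f MB\n", vm_memory_demand_mb(2048, 0, 256, 4096));
    return 0;
}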

Detecting VM Demand Changes

+++++++++++++++++++++++++
During each round, along with resource usage data, DRS also collects resource availability data from each and every VM and host in the cluster. Data like VM CPU average and VM CPU max over the last collection interval depict the resource usage trend for a given VM, while availability data like VM CPU Ready Time and VM Memory Swapped indicate resource shortage, if any, for the VM (availability data indicate if a VM is running short of resources). DRS then correlates the resource usage data with the availability data and runs its load-balancing algorithm before taking necessary vMotion actions in order to keep the cluster balanced and to ensure that VMs are always getting the resources they need to run.

Cost Benefit Analysis

+++++++++++++++++
vMotion of live VMs comes with a performance cost, which depends on the size of the VM being migrated. If the VM is large, it will use a lot of the current host’s and target host’s CPU and memory for vMotion. The benefit, however, is in terms of performance for VMs on the source host, the migrated VM on the destination host, and improved load balance across the cluster.

Factors That Affect DRS Behavior: 
While DRS constantly works to ensure that VMs are getting the resources they need, it also provides several useful customizations that work very well for a variety of cluster needs. By understanding these customizations, you can get the best performance out of DRS and have it meet your expectations. In this section, we discuss some of the customizations and factors that affect DRS and how to use them for best performance.

During initial placement and load balancing, DRS generates placement and vMotion recommendations, respectively. DRS can apply these recommendations automatically, or you can apply them manually.
DRS has three levels of automation:
Fully Automated – DRS applies both initial placement and load balancing recommendations automatically.
Partially Automated – DRS applies recommendations only for initial placement.
Manual – You must apply both initial placement and load balancing recommendations.

The DRS aggression level controls the amount of imbalance that will be tolerated in the cluster. DRS has five aggression levels ranging between 1 (most conservative) and 5 (most aggressive). The more aggressive the level, the less DRS tolerates imbalance in the cluster. The more conservative, the more DRS tolerates imbalance. As a result, you might see DRS initiate more migrations and generate a more even load distribution when you increase the aggression level. By default, the DRS aggression level is set to 3. However, if you do need DRS to be more active in load balancing at the cost of increased live migrations, you can increase the DRS aggression level. When DRS aggression is set to level 1, DRS will not load balance the VMs. DRS will only apply move recommendations that must be taken either to satisfy hard constraints, such as affinity or anti-affinity rules, or to evacuate VMs from a host entering maintenance or standby mode.

 

Reservation, Limit, and Shares:
DRS provides many tools for you to customize your VMs and workloads according to specific use cases. Reservation, limit, and shares are three such controls.

Reservation:
You might need to guarantee compute resources to some critical VMs in your clusters. This is often the case when running applications that cannot tolerate any type of resource shortage, or when running an application that is always expected to be up and serving requests from other parts of the infrastructure.
With the help of reservations, you can guarantee a specified amount of CPU or memory to your critical VMs.
Reservations can be made for an individual VM, or at the resource pool level. In a resource pool with several
VMs, a reservation guarantees resources collectively for all the VMs in the pool.

Limit:
In some cases, you might want to limit the resource usage of some VMs in their cluster, in order to prevent them from consuming resources from other VMs in the cluster. This can be useful, for example, when you want to ensure that when the load spikes in a non-critical VM, it does not end up consuming all the resources and thereby starving other critical VMs in the cluster.

Shares:
Shares provide you a way to prioritize resources for VMs when there is competition in the cluster. They can be set at a VM or a resource pool level.

 

Very good documentation on the workflow of DRS includes:

VMware Infrastructure: Resource Management with VMware DRS 

DRS Deepdive

 

 

What is the VMware .vSphere-HA folder for?

If you’re running a VMware vSphere High Availability (HA) cluster, you have certainly noticed a folder named “.vSphere-HA” on several of your shared datastores. What is the VMware .vSphere-HA folder for? This folder has something to do with HA, you think, but what exactly is stored there? You certainly do not want to delete it, do you?

That’s what we will look at today.

This folder resides on shared datastores, which are used as a secondary communication channel in the HA architecture. The folder contains several files, and each of them has a different role:

  • host-xxx-hb files – these files are used for datastore heartbeating. The heartbeat mechanism uses part of the VMFS volume for regular updates. Each host in the cluster has its own file like this in the .vSphere-HA folder.

vSphere HA uses datastore heartbeating to distinguish between hosts that have failed and hosts that reside on a network partition. Datastore heartbeating allows vSphere HA to monitor hosts when a management network partition occurs and to continue to respond to failures that occur.

  • protectedlist file – when you open this file, you’ll see a list of VMs protected by HA. The master host uses this file to store the inventory and the state of each VM.
  • host-xxx-poweron files – these files’ role is to track the running VMs on each host in the cluster. The files are read by the master host, which uses them to determine whether a slave host is isolated from the network. A slave host uses its poweron file to tell the master host “hey, I’m isolated”. The content of this file reveals that there can be two states: zero or one. Zero = not isolated and one = isolated. If a slave host is isolated, the master host informs vCenter.

The .vSphere-HA folder is created only on datastores that are used for datastore heartbeating. You shouldn’t delete or modify those files.

You may also want to read these two VMware Knowledge Base articles:

 

 

Split Brain Syndrome

Split brain syndrome, in a clustering context, is a state in which a cluster of nodes gets divided (or partitioned) into smaller clusters of equal numbers of nodes, each of which believes it is the only active cluster.

Since each cluster fragment assumes that the other clusters are dead, each may simultaneously access the same application data or disks, which can lead to data corruption. A split brain situation is created during cluster reformation. When one or more nodes fail in a cluster, the cluster reforms itself with the available nodes. During this reformation, instead of forming a single cluster, multiple fragments of the cluster with an equal number of nodes may be formed. Each cluster fragment assumes that it is the only active cluster — and that the other clusters are dead — and starts accessing the data or disk. Since more than one cluster is accessing the disk, the data gets corrupted.

 

Here’s how it works in more detail:

  • Let’s say there are 5 nodes A, B, C, D and E which form a cluster, TEST.
  • Now a node (say A) fails.
  • Cluster reformation takes place. Ideally, the remaining nodes B, C, D and E should re-form cluster TEST.
  • But a split brain situation may occur, which leads to the formation of two clusters: TEST1 (containing B and C) and TEST2 (containing D and E).
  • Both TEST1 and TEST2 clusters think that they are the only active cluster. Both clusters start accessing the data or disk, leading to data corruption.

HA clusters are all vulnerable to split brain syndrome and should use some mechanism to avoid it. Clustering tools, such as Pacemaker, HP ServiceGuard, CMAN and LinuxHA, generally include such mechanisms.
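
One common avoidance mechanism (a general clustering technique, not tied to any product named above) is a quorum rule: a partition stays active only if it holds a strict majority of the configured nodes, so two equal halves can never both win. A tiny sketch:

/* Quorum check: a partition stays active only with a strict majority of all
   configured nodes, so two equal-sized fragments can never both claim the cluster. */
#include <stdbool.h>
#include <stdio.h>

bool partition_has_quorum(int nodes_in_partition, int total_configured_nodes)
{
    return 2 * nodes_in_partition > total_configured_nodes;
}

int main(void)
{
    /* TEST1 (B,C) and TEST2 (D,E) from the example above, out of 5 configured nodes */
    printf("TEST1 quorum: %d\n", partition_has_quorum(2, 5));   /* 0 = no quorum, stays down */
    printf("TEST2 quorum: %d\n", partition_has_quorum(2, 5));   /* 0 = no quorum, stays down */
    return 0;
}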

 

SRM: How Does It Work?

Here is the process SRM follows when doing a Test Failover or Failover operation. Please read, understand, and mug this up (if you have to), since most of us are unaware of this.
Hope this clarifies your doubts. These steps are written by SRM-Guru CPD.

1. SRM issues the TestFailoverStart or Failover command to the SRA. In this command SRM provides the list of devices that are to be prepared for failover or test failover. It also provides a list of (ESXi) initiators to which these devices are to be presented.

2. Once the SRA completes the command, it informs SRM (through an output XML file that the SRA generates and that SRM listens and waits for) that the device in question is ready for access. At this point the device is supposed to be truly accessible and ready for access.

3. SRM now will issue a cluster-wide rescan to all ESXi hosts (that are designated as recovery hosts). In this step, SRM can wait for a certain period of time before issuing the rescan command. This is called the SRA rescan delay. We introduced this delay specifically to address latency problems in relation to the RecoverPoint SRA, which is known to signal completion of TestFailover/Failover prematurely – in other words, before the recovered device is truly ready for READ-WRITE access. This problem with that SRA was discussed close to two years ago and EMC said they cannot do anything about it because of the asynchronous way that SRA works.
By default the SRA rescan delay is 0. But it is configurable, as we have done in this case.

Note: Setting the SRA rescan delay to 5 minutes will be in line with the VMkernel's 5-minute path evaluation interval, so that guarantees at least one rescan before the next VMkernel path evaluation.

4. SRM can also issue multiple successive rescan commands followed by a single VMFS refresh command. The number of successive rescans that SRM can issue is also configurable. In this case, it is set to 2. The default value is 1.

5. After the rescan is completed, SRM will wait a certain amount of time for notification from the VC Server as to which devices on ESX are attached and which ones are not. SRM is only interested in the recovered device in question.

6. On any ESX host where the recovered device is in a detached state, SRM will issue the ATTACH_SCSI_LUN command to have that device attached.

Note: If on an ESX host the recovered device is in an APD state, the host will not report it as detached. As a result, SRM will not issue a command to that host to have that device attached. This is what is happening on some ESX hosts in this case, as you see in other comments in the PR.

7. After SRM issues the ATTACH_SCSI_LUN on the ESXi hosts that have reported the recovered device in DETACHED state (i.e. these are now the good hosts, or as SRM calls them ‘the requested hosts’), SRM then issues the QUERY_UNRESOLVED_VOLUMES command to all these hosts.

Note: all good hosts should report the same number of unresolved volumes as the number of recovered devices, or less. It would be less if, for example, the recovered devices contain a VMFS datastore and RDMs.

8. SRM then selects one of these good hosts to issue the resignature command to.

Note: we have configured SRM in this case to use the resignaturing API (and not the old LVM resignaturing method – i.e. LVM.EnableResignature = 1).

9. SRM then issues another Rescan (Refresh = true) to these hosts.

10. SRM then waits for notifications from VC as to which ESX hosts are now seeing the recovered VMFS datastore (that is, the one on the recovered device).

Note: the time SRM waits for these VC notifications of recovered VMFS datastores is configurable.

Note: SRM knows what VMFS datastore should be seen by ESX, and SRM is able to verify whether the good ESX hosts are now seeing the correct (i.e. expected) VMFS datastore.

11. At this stage, SRM proceeds further by removing the placeholder VMs and registering the actual recovered VMs that are on the recovered VMFS datastore.

Note: SRM will only use the good hosts to register these VMs on.

12. If IP customization is required for these VMs, SRM will complete that as well.

There are other minor details that I have skipped. But this is pretty much the process.

In the Testfailover cleanup process, SRM follows these steps:

Note: There is no cleanup in the case of an actual failover operation.

1. Unregister the recovered VMs
2. Re-register the placeholder VMs
3. Unmount the recovered VMFS datastore from all good hosts
4. Detach the recovered device from all good hosts (DETACH_SCSI_LUN)
5. Clear the unmount state of the VMFS volume
6. Clear the detach state of the SCSI LUN

7. SRM then issues the TestFailoverStop command to the SRA. The SRA now returns the device (on the storage array) to the state it was in before the TestFailoverStart command was issued and executed.

Some useful vmkfstools ‘hidden’ options

1. Display hosts which are actively using a volume

~ # vmkfstools --activehosts /vmfs/volumes/TEST-DATASTORE
Found 1 actively heartbeating hosts on volume ‘/vmfs/volumes/TEST-DATASTORE’
(1): MAC address 91:1a:b2:7a:14:d4

This option will show the management interface MAC address of any hosts which are actively using the datastore.

2. Display File Metadata, including Lock Information

~ # vmkfstools -D /vmfs/volumes/TEST-DATASTORE/TEST/TEST.vmdk
Lock [type 10c00001 offset 456325421 v 44, hb offset 3268682
gen 6155, mode 0, owner 00000000-00000000-0000-000000000000 mtime 12783 nHld 0 nOvf 0]
Addr <4, 558, 4>, gen 6, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 492, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 4305, bs 8192

3. Display VMFS File Block Details

~ # vmkfstools -P -v10 /vmfs/volumes/TEST-DATASTORE
VMFS-5.54 file system spanning 1 partitions.
File system label (if any): TEST-20
Mode: public
Capacity xxxxxxxxxxx (xxxxxx file blocks * xxxxxx), xxxxxxxxx(xxxxxxx blocks) avail
Volume Creation Time: Tue Feb 21 12:18:44 2012
Files (max/free): xxxxxx/xxxxx
Ptr Blocks (max/free): xxxxxx/xxxxxxx
Sub Blocks (max/free): xxxxxxx/xxxxxxx
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/115092/0
Ptr Blocks  (overcommit/used/overcommit %): 0/133/0
Sub Blocks  (overcommit/used/overcommit %): 0/6/0
UUID: xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxxxxxx
Partitions spanned (on “lvm”):
naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1
DISKLIB-LIB   : Getting VAAI support status for /vmfs/volumes/TEST-DATASTORE
Is Native Snapshot Capable: NO

4. VMDK Block Mappings

This command is very useful for displaying the mappings of VMFS file blocks to a VMDK. It can also be used to display the layout of a VMDK if you are concerned that a thin provisioned VMDK might be fragmented from a block allocation perspective.

~ # vmkfstools -t0 /vmfs/volumes/TEST-DATASTORE/TEST/TEST.vmdk

What is VMware Raw Device Mapping (RDM)?

In other contexts, such as the VirtualCenter client console, raw device mapping may be
described in terms such as “Mapping a VMFS volume into a datastore,” “Mapping a system LUN”
or “mapping a disk file to a physical disk volume.” These terms all refer to raw device mapping.
The following terms are used in this document or related documentation:
• Raw Disk — A disk volume accessed by a virtual machine as an alternative to a virtual disk
file; it may or may not be accessed via a mapping file.
• Raw Device — Any SCSI device accessed via a mapping file. For ESX Server 2.5, only disk
devices are supported.
• Raw LUN — A logical disk volume located in a SAN.
• LUN — Acronym for a logical unit number.
• Mapping File — A VMFS file containing metadata used to map and manage a raw device.
• Mapping — An abbreviated term for a raw device mapping.
• Mapped Device — A raw device managed by a mapping file.
• Metadata File — A mapping file.
• Compatibility Mode — The virtualization type used for SCSI device access (physical or
virtual).
• SAN — Acronym for a storage area network.
• VMFS — A high-performance file system used by VMware ESX Server

Benefits of Raw Device Mapping:

User-Friendly Persistent Names
Raw device mapping provides a user-friendly name for a mapped device — the name of its
mapping file. When you use a mapping, you don’t need to refer to the device by its device
name, as required with previous versions of ESX Server. You refer to it by the name of the
mapping file — for example, use:
/vmfs/myvmfsvolume/myrawdisk.vmdk.

Dynamic Name Resolution
Raw device mapping stores unique identification information for each mapped device. The
VMFS file system resolves each mapping to its current SCSI device, regardless of changes in the
physical configuration of the server due to adapter hardware changes, path changes, device
relocation, and so forth.

Distributed File Locking
Raw device mapping makes it possible to use VMFS distributed locking for raw SCSI devices.
Distributed locking on a raw device mapping makes it safe to use a shared raw LUN without
losing data when two virtual machines on different servers access the same LUN.

File Permissions
Raw device mapping makes file permissions possible. The permissions of the mapping file are
applied at file open time to protect the mapped volume. Previously, permissions for raw devices
could not be enforced by the file system.

File System Operations
Raw device mapping makes it possible to use file system utilities to work with a mapped
volume, using the mapping file as a proxy. Most operations that are valid for an ordinary file can
be applied to the mapping file, and are redirected to operate on the mapped device.

Redo Logs
Raw device mapping makes it possible to keep a redo log for a mapped volume. The redo log
has the name of the mapping file, with .REDO appended. Note that redo logs are not possible
when raw device mapping is used in physical compatibility mode.

VMotion
Raw device mapping allows you to migrate a virtual machine with VMotion. Previously, this was
only possible for virtual machines that used virtual disk files. When you use raw device mapping,
the mapping file acts as a proxy to allow VirtualCenter to migrate the virtual machine using the
same mechanism that exists for virtual disk files.

Note: You cannot migrate virtual machines with raw, clustered, or non-persistent mode disks
using VMotion. If you have clustered disks, you can store them on separate VMFS volumes from
the virtual machines prior to migrating them using VMotion.

SAN Management Agents
Raw device mapping makes it possible to run some SAN management agents inside a virtual
machine. Similarly, any software that needs to access a device using raw SCSI commands can be
run inside a virtual machine. This kind of software may be referred to as “SCSI target based
software”

 

Limitations of Raw Device Mapping:

+++++++++++++++++++++++++++++++++++

Not Available for Block Devices or RAID Devices
Raw device mapping (in the current implementation) uses a SCSI serial number to identify the
mapped volume. Since block devices and some direct-attach RAID devices do not export serial
numbers, they can’t be used in raw device mappings.

Not Available for Devices Attached to a Shared Adapter
If your SCSI adapter is configured as a shared adapter, you can’t use raw device mapping for any
of its devices. Only adapters dedicated to the VMkernel support raw device mapping.

Available with VMFS-2 Volumes Only
Raw device mapping requires the VMFS-2 format. If you choose not to convert your VMFS
volume from VMFS-1 format to VMFS-2 format, you cannot use raw device mapping.

No Redo Log in Physical Compatibility Mode
If you are using raw device mapping in physical compatibility mode, you can’t use a redo log
with the disk. Physical compatibility mode allows the virtual machine to manage its own
snapshot or mirroring operations. This conflicts with the SCSI virtualization objectives of physical
compatibility mode

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Virtual Compatibility Mode Versus Physical Compatibility Mode:

Virtual mode for a mapping specifies full virtualization of the mapped device. It appears to the
guest operating system exactly the same as a virtual disk file in a VMFS volume. The real
hardware characteristics are hidden. Virtual mode allows customers using raw disks to realize
the benefits of VMFS such as advanced file locking for data protection and redo logs for
streamlining development processes. Virtual mode is also more portable across storage
hardware, presenting the same behavior as a virtual disk file.
Physical mode for a raw device mapping specifies minimal SCSI virtualization of the mapped
device, allowing the greatest flexibility for SAN management software. In physical mode, the
VMkernel passes all SCSI commands to the device, with one exception: The REPORT LUNs
command is virtualized, so that the VMkernel can isolate the LUN for the owning virtual
machine. Otherwise, all physical characteristics of the underlying hardware are exposed. Physical
mode is useful to run SAN management agents or other SCSI target based software in the virtual
machine. Physical mode also allows virtual to physical clustering for cost-effective high
availability.

Raw Device Mapping with Virtual Machine Clusters
VMware recommends the use of raw device mapping with virtual machine clusters that need to
access the same raw LUN for failover scenarios. The setup is similar to a virtual machine cluster
that accesses the same virtual disk file, but a raw device mapping file replaces the virtual disk file.
The VMFS must be configured in shared access mode, to allow more than one virtual machine
to open the mapping file simultaneously.

 

Managing Raw Device Mappings

vmkfstools
The vmkfstools utility can be used in the service console to do many of the same operations available in the management interface. Typical operations applicable to raw device mappings are the commands to create a mapping, to query mapping information such as the name and identification of the mapped device, and to import or export a virtual disk. The form of these commands is shown here:

vmkfstools -r  ->  Create a Raw Device Mapping

vmkfstools -P  ->  Display VMFS File Block Details

vmkfstools -I  ->  Clone a disk

vmkfstools -e  ->  Rename a virtual machine disk

vmkfstools -q  ->  Print the name of the raw disk in an RDM. This option also prints other identification information, like the disk ID, for the raw disk.