FlexPod Datacenter with Cisco UCS Unified Software Release and VMware vSphere 6.0 U1

FlexPod has gained significant market traction as the converged infrastructure platform of choice for many customers over the last four years. This is due to the hardware technologies that underpin the solution: Cisco UCS compute, Cisco Nexus unified networking, and NetApp FAS storage running clustered Data ONTAP. Customers typically deploy FlexPod with VMware vSphere or Microsoft Hyper-V on top (other hypervisors are also supported), which together provide a complete, ready-to-deploy private and hybrid cloud platform that has been pre-validated to run most, if not all, typical enterprise data center workloads.

FlexPod Datacenter with Cisco UCS Unified Software release and VMware vSphere 6.0 U1 is designed to be fully redundant in the compute, network, and storage layers. There is no single point of failure from a device or traffic path perspective. Figure 2 illustrates a FlexPod topology using the Cisco UCS 6300 Fabric Interconnect top-of-rack model while Figure 3 shows the same network and storage elements paired with the Cisco UCS 6200 series Fabric Interconnects.

The Cisco UCS 6300 Fabric Interconnect FlexPod Datacenter model enables a high-performance, low-latency, and lossless fabric supporting applications with these elevated requirements. The 40GbE compute and network fabric with optional 4/8/16G FC support increases the overall capacity of the system while maintaining the uniform and resilient design of the FlexPod solution. The remainder of this section describes the network, compute, and storage connections and enabled features.

Network: Link aggregation technologies play an important role in this FlexPod design, providing improved aggregate bandwidth and link resiliency across the solution stack. The NetApp storage controllers, Cisco Unified Computing System, and Cisco Nexus 9000 platforms support active port channeling using 802.3ad standard Link Aggregation Control Protocol (LACP). Port channeling is a link aggregation technique offering link fault tolerance and traffic distribution (load balancing) for improved aggregate bandwidth across member ports. In addition, the Cisco Nexus 9000 series features virtual Port Channel (vPC) capabilities. vPC allows links that are physically connected to two different Cisco Nexus 9000 Series devices to appear as a single “logical” port channel to a third device, essentially offering device fault tolerance. The Cisco UCS Fabric Interconnects and NetApp FAS storage controllers benefit from the Cisco Nexus vPC abstraction, gaining link and device resiliency as well as full utilization of a non-blocking Ethernet fabric.
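The following is a minimal Cisco Nexus 9000 NX-OS sketch of the LACP and vPC building blocks described above; the domain ID, keepalive addresses, and interface numbers are illustrative assumptions rather than values from the validated design.

feature lacp
feature vpc

vpc domain 10
  peer-keepalive destination 192.168.10.2 source 192.168.10.1

interface port-channel10
  description vPC peer link between the two Nexus 9000 switches
  switchport mode trunk
  vpc peer-link

interface port-channel11
  description vPC member port channel facing UCS Fabric Interconnect A
  switchport mode trunk
  vpc 11

interface Ethernet1/1
  description Uplink to FI-A
  switchport mode trunk
  channel-group 11 mode active

The same port channel is defined on the second Nexus 9000 so that the Fabric Interconnect sees one logical LACP port channel spanning both switches.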

Compute: Each Cisco UCS Fabric Interconnect (FI) is connected to the Cisco Nexus 9000 switches. Figure 2 illustrates the use of vPC-enabled 40GbE uplinks between the Cisco Nexus 9000 switches and Cisco UCS 6300 Fabric Interconnects. Figure 3 shows vPCs configured with 10GbE uplinks to a pair of Cisco Nexus 9000 switches from a Cisco UCS 6200 FI. Note that additional ports can easily be added to the design for increased bandwidth, redundancy, and workload distribution. The Cisco UCS unified software release 3.1 provides a common policy feature set that can be readily applied to the appropriate Fabric Interconnect platform based on the organization's workload requirements.
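On the Cisco UCS side, the matching uplink port channel is defined in Cisco UCS Manager. Below is a minimal UCS Manager CLI sketch; the port-channel ID and member ports (slot 1, ports 31-32) are assumptions for illustration only.

UCS-A# scope eth-uplink
UCS-A /eth-uplink # scope fabric a
UCS-A /eth-uplink/fabric # create port-channel 11
UCS-A /eth-uplink/fabric/port-channel # create member-port 1 31
UCS-A /eth-uplink/fabric/port-channel # create member-port 1 32
UCS-A /eth-uplink/fabric/port-channel # enable
UCS-A /eth-uplink/fabric/port-channel # commit-buffer

A corresponding port channel is created under fabric b for the second Fabric Interconnect.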

Note:  For SAN environments, NetApp clustered Data ONTAP allows up to 4 HA pairs or 8 nodes. For NAS environments, it allows 12 HA pairs or 24 nodes to form a logical entity.

The HA interconnect allows each node in an HA pair to assume control of its partner’s storage (disks and shelves) directly. The local physical HA storage failover capability does not extend beyond the HA pair. Furthermore, a cluster of nodes does not have to include similar hardware. Rather, individual nodes in an HA pair are configured alike, allowing customers to scale as needed, as they bring additional HA pairs into the larger cluster.
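As a quick sanity check, cluster membership and storage failover readiness can be verified from the clustered Data ONTAP CLI; the cluster name below is a placeholder.

cluster1::> cluster show
cluster1::> storage failover show

The storage failover output should indicate that takeover is possible for both nodes of each HA pair before the pair is placed into production.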

For more information, see:

FlexPod Datacenter with Cisco UCS 6300 Fabric Interconnect and VMware vSphere 6.0 U1 Design Guide

UCS Blade Server vs Rack Server

The following table lists the differences between blade servers and rack servers:

Blade Servers | Rack Servers
Sits in a UCS blade chassis | Sits in a rack
Cannot operate independently; requires UCS infrastructure | Can operate independently
Connects only to a UCS Fabric Interconnect | Can connect to any Ethernet switch, including a UCS Fabric Interconnect
Managed from the UCS Fabric Interconnect by Cisco UCS Manager | Features the Cisco Integrated Management Controller (CIMC) and can therefore be managed independently, or from Cisco UCS Manager if connected to a UCS Fabric Interconnect
Supports RAID 0 and 1 only | Supports built-in RAID 0 and 1 for SATA drives; RAID 0, 1, 5, 6, and 10 through a PCIe LSI MegaRAID controller
Supports Gigabit Ethernet mezzanine cards and converged network adapter mezzanine cards | Supports Gigabit Ethernet PCIe cards, converged network adapter PCIe cards, and Fibre Channel HBA PCIe cards
Supports up to two front-accessible, hot-swappable SAS drives for local storage | Supports up to four front-accessible, hot-swappable, internal 3.5-inch SAS or SATA drives
No support for PCIe slots | Support for up to two PCIe 2.0 slots
No support for an optical drive | Optional front-panel optical drive for CD and DVD media

Virtualized I/O with Cisco Virtual Interface Cards

Cisco VICs are PCIe-compliant interfaces that support up to 256 PCIe devices with
dynamically configured type (NIC or HBA), identity (MAC address or worldwide
name [WWN]), fabric failover policy, bandwidth, and QoS policy settings. With
Cisco VICs, server configuration—including I/O configuration—becomes configurable
on demand, making servers stateless resources that can be deployed to meet
any workload need at any time, without any physical reconfiguration or recabling
required. Cisco VICs support up to 80 Gbps of connectivity and are available in
multiple form factors:

• mLOM: These Cisco VICs can be ordered preinstalled in Cisco UCS M3 and
M4 blade servers, occupying a dedicated slot for the device. If more
than 40 Gbps of bandwidth is needed, a port expander card can be installed in
the server’s mezzanine slot to give the card access to an additional 40 Gbps of
bandwidth. When Cisco UCS 2304XP Fabric Extenders are installed in the blade
chassis, the Cisco UCS VIC 1340 detects the availability of 40 Gigabit Ethernet
and disables the port channel for greater efficiency.

• Mezzanine: Standard Cisco VICs can be installed in any blade server’s mezzanine
slot: one for half-width blade servers, and up to two for full-width blade servers.
Each Cisco VIC supports up to 80 Gbps, for a total of up to 320 Gbps of
aggregate bandwidth for double-width, double-height servers such as the Cisco UCS B460 M4 Blade Server.

• PCIe: PCIe form-factor cards can be installed in Cisco rack servers. Cisco VICs
are required when you integrate these servers into Cisco UCS because they
have the circuitry to pass the unified fabric’s management traffic to the server’s
management network, enabling single-wire, unified management of rack servers.
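Because a VIC's identity (MAC address or WWN), fabric failover behavior, and QoS are applied through policy, a vNIC can be defined entirely within a service profile. The UCS Manager CLI sketch below is illustrative only; the service profile, vNIC, and MAC pool names are assumed.

UCS-A# scope org /
UCS-A /org # scope service-profile ESX-Host-01
UCS-A /org/service-profile # create vnic vnic0 fabric a-b
UCS-A /org/service-profile/vnic # set identity mac-pool MAC-Pool-A
UCS-A /org/service-profile/vnic # commit-buffer

Here "fabric a-b" places the vNIC on fabric A with failover to fabric B, so no NIC teaming is required in the operating system.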

Virtualized I/O with Converged Network Adapters

Virtual links can originate from converged network adapters that typically host a
dual 10 Gigabit Ethernet NIC and a dual HBA from either Emulex or QLogic, along
with circuitry to multiplex the four streams of traffic onto two 10-Gbps unified fabric
links. Cisco innovations first brought this concept to market, with the first generation
of converged network adapters (CNAs) supported by Cisco silicon that multiplexed
multiple traffic flows onto the unified fabric.

With servers connected to Cisco UCS through CNAs, the traffic from each of the
interface’s four devices is passed over four virtual links that terminate at virtual ports
within the fabric interconnects.

Virtualized I/O over the Unified Fabric

The unified fabric virtualizes I/O so that rather than requiring each server to be
equipped with a set of physical I/O interfaces to separate network functions, all I/O
in the system is carried over a single set of cables and sent to separate physical
networks at the system’s fabric interconnects as necessary. For example, storage
traffic destined for Fibre Channel storage systems is carried in the system using
FCoE. At the fabric interconnects, storage-access traffic can transition to physical
Fibre Channel networks through a Fibre Channel transceiver installed in one or more
of the fabric interconnect’s unified ports.

I/O is further virtualized through the use of separate virtual network links for
each class and each flow of traffic. For example, management, storage-access,
and IP network traffic emanating from a server is carried to the system’s fabric
interconnects with the same level of secure isolation as if it were carried over
separate physical cables. These virtual network links originate within the server’s
converged network adapters and terminate at virtual ports within the system’s fabric
interconnects.

These virtual links are managed exactly as if they were physical networks. The
only characteristic that distinguishes physical from virtual networks within the
fabric interconnects is the naming of the ports. This approach has a multitude of
benefits: changing the way that servers are configured makes servers flexible,
adaptable resources that can be configured through software to meet any workload
requirement at any time. Servers are no longer tied to a specific function for
their lifetime because of their physical configuration. Physical configurations are
adaptable through software settings. The concept of virtual network links brings
immense power and flexibility to support almost any workload requirement through
flexible network configurations that bring complete visibility and control for both
physical servers and virtual machines.

Cisco UCS Architecture

Unified Computing System Manager:



  • Embedded device manager for family of UCS components
  • Enables stateless computing via Service Profiles
  • Efficient scale: Same effort for 1 to N blades
  • APIs for integration with new and existing data center infrastructure
Management Protocols:

+++++++++++++++++++





UCS 6248UP Fabric Interconnect:

+++++++++++++++++++++++++++++
  • High Density 48 ports in 1RU
  • 1Tbps Switching capability
  • All ports can be used as uplinks or downlinks
  • All ports can be configured to support either 1Gb or 10Gb speeds
  • Unified Ports
  • 1 Expansion slot
  • 2us Latency
  • 80 PLUS Gold PSUs
  • Backward and forward Compatibility




UCS 6296UP Fabric Interconnect:

++++++++++++++++++++++++++
  • High Density 96 ports in 2RU
  • 2Tbps Switching capability
  • All ports can be used as uplinks or downlinks
  • All ports can be configured to support either 1Gb or 10Gb speeds
  • Unified Ports
  • 4 Expansion slots
  • 2us Latency
  • 80 PLUS Gold PSUs
  • Backward and forward Compatibility


UCS 6200 Expansion Module:

+++++++++++++++++++++++++
  • 16 “Unified Ports”
  • Ports can be configured as either Ethernet or Native FC Ports
  • Ethernet operations at 1/10 Gigabit Ethernet
  • Fibre Channel operations at 8/4/2/1G
  • Uses existing Ethernet SFP+ and Cisco 8/4/2G and 4/2/1G FC Optics




UCS 2204XP I/O Module:

+++++++++++++++++++++
  • Increased uplink bandwidth
  • 4 x 10 Gig network-facing ports
  • Double the server-facing bandwidth
  • 16 x 10 Gig = 2 per half width slot
  • Two I/O Modules per chassis
  • 40Gbps to a single half-width blade (20Gbps left and right)
  • 80Gbps to a full-width blade
  • Built in chassis management
  • Fully managed by UCSM




UCS 2208XP I/O Module:

+++++++++++++++++++++
  • Double the uplink bandwidth
  • 8 x 10 Gig network-facing ports
  • Quadruple the server-facing bandwidth
  • 32 x 10 Gig = 4 per half width slot
  • Two I/O Modules per chassis
  • 80Gbps to a single half-width blade (40Gbps left and right)
  • 160Gbps to a full-width blade
  • Built in Chassis Management
  • Fully Managed by UCSM




UCS 5108 Blade Chassis:
+++++++++++++++++++++

Chassis
  • Up to 8 half slot blades
  • Up to 4 full slot blades
  • 4x power supplies, N+N grid redundant
  • 8x fans included
  • 2x UCS 2104 Fabric Extender
  • All items hot-pluggable

UCS Rack Servers:
+++++++++++++++

C240 M3 Rack Server:
  • 2-socket Intel E5-2600
  • 24 DIMM slots, maximum memory speed 1600MHz
  • 24 SFF or 12 x 3.5” internal HDDs – SAS, SATA and SSD options
  • Battery-backed cache option
  • 650W and 1200W PSUs – Platinum rated
  • 5 PCIe slots – GPU ready
  • Height: 2RU
  • Integrated CIMC and KVM

 

UCS components

The basic Cisco components of the UCS are:

UCS Manager: Cisco UCS Manager implements policy-based management of server and network resources. Network, storage, and server administrators create service profiles, allowing the manager to configure the servers, adapters, and fabric extenders with the appropriate isolation, quality of service (QoS), and uplink connectivity. It also provides APIs for integration with existing data center systems management tools, and an XML interface allows the system to be monitored or configured by upper-level systems management tools.
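For example, the XML API can be exercised with a simple HTTPS POST; the snippet below is a minimal login sketch (the hostname and credentials are placeholders).

curl -k -X POST https://ucsm.example.com/nuova \
     -d '<aaaLogin inName="admin" inPassword="password" />'

The response returns an outCookie value, which is then passed in subsequent query and configuration requests.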

UCS fabric interconnect: Provides networking and management for attached blades and chassis over 10 GigE and FCoE. All attached blades are part of a single management domain. Deployed in redundant pairs, the 20-port and 40-port models offer centralized management with Cisco UCS Manager software and virtual machine-optimized services with support for VN-Link.

Cisco Fabric Manager: Manages storage networking across all Cisco SAN and unified fabrics, with control of FC and FCoE. It offers unified discovery of all Cisco Data Center 3.0 devices as well as task automation and reporting, and it enables IT to manage quality-of-service (QoS) levels, performance monitoring, federated reporting, troubleshooting tools, and discovery and configuration automation.

Fabric extenders: Connect the fabric to the blade server enclosure over 10 Gigabit Ethernet, simplifying diagnostics, cabling, and management. The fabric extender is similar to a distributed line card and also manages the chassis environment (the power supplies, fans, and blades), so separate chassis management modules are not required. Each UCS chassis can support up to two fabric extenders for redundancy.

SAN Booting to Allow Server Mobility

Booting over a network (LAN or SAN) is a mature technology and an important step in moving toward stateless computing, which eliminates the static binding between a physical server and the OS and applications it is supposed to run.

The OS and applications are decoupled from the physical hardware and reside on the network.

The mapping between the physical server and the OS on the network is performed on demand when the server is deployed. Some of the benefits of booting from a network are:

• Reduced server footprint because fewer components (no disk) and resources are needed

• Simplified disaster and server failure recovery

• Higher availability because of the absence of failure-prone local hard drives

• Centralized image management

• Rapid redeployment

With SAN booting, the image resides on the SAN, and the server communicates with the SAN through an HBA. The HBA's BIOS contains the instructions that enable the server to find the boot disk. A common practice is to have the boot disk exposed to the server as LUN ID 0.
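On a NetApp clustered Data ONTAP array, exposing the boot LUN to a host at LUN ID 0 is a single mapping step; the SVM, volume, LUN, and igroup names below are illustrative assumptions.

cluster1::> lun map -vserver Infra-SVM -path /vol/esxi_boot/ESX-Host-01 -igroup ESX-Host-01 -lun-id 0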

 

The Cisco UCS M71KR-E Emulex CNA, Cisco UCS M71KR-Q QLogic CNA, and Cisco UCS M81KR Virtual Interface Card (VIC) are all capable of booting from a SAN.

Management of Virtual Servers in the SAN

Typically, virtual servers do not have an identity in the SAN: they do not log in to the SAN like physical servers do. However, if controlling and monitoring of the virtual servers is required, N-Port ID Virtualization (NPIV) can be used.

This approach requires you to:

• Have a Fibre Channel adapter and SAN switch that support NPIV

• Enable NPIV on the virtual infrastructure, such as by using VMware ESX Raw Device Mode (RDM)

• Assign virtual port worldwide names (pWWNs) to the virtual servers

• Provision the SAN switches and storage to allow access

By zoning the virtual pWWNs in the SAN to permit access, you can control virtual server SAN access just as with physical servers. In addition, you can monitor virtual servers and provide service levels just as with any physical server.
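On a Cisco MDS or Nexus SAN switch, this amounts to enabling NPIV and then zoning the virtual pWWN like any physical initiator. The sketch below is illustrative; the VSAN number, zone names, and pWWNs are placeholders.

feature npiv

zone name VM01_to_NetApp vsan 101
  member pwwn 20:00:00:25:b5:01:0a:01
  member pwwn 50:0a:09:83:8d:53:43:54

zoneset name FabricA vsan 101
  member VM01_to_NetApp

zoneset activate name FabricA vsan 101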

Cisco UCS Box Upgrade Fails

 


Upgrade Issue on UCS Manager:

+++++++++++++++++++++++++++++

Versions Affected: https://www.cisco.com/c/en/us/support/docs/field-notices/640/fn64094.html

++ UCSM and FI B were upgraded successfully, but FI A is stuck at 65% with the following message:

UCS pre-upgrade check failed. Free space in the file system is below threshold

Note: Unlike the N5K and N3K, UCS has not typically been seen stuck with this message. This issue is similar to CSCun79792.

Load the debug plugin and perform the following (see "Loading Debug Plugin in UCS" at the end of this section):

Linux(debug)# cd /var/tmp

Linux(debug)# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/root             300M  255M   46M  85% /

none                  2.0M  4.0K  2.0M   1% /post

none                  300M  255M   46M  85% /var

none                  3.0G  1.1G  2.0G  35% /isan

none                  600M  180M  421M  30% /var/tmp   <————

Note: Check what percentage of /var/tmp is used. In this case, 30% is used.

Most of the time, it is smm.log and auto_file_deletion_log.txt that eat up the space.

Here is an example. After you have run "df -h", issue the following command to see which files are consuming the space (the output below shows smm.log using 1.3M):

 

Linux(debug)# ls -lSh

total 2.2M

-rw-rw-rw- 1 root   root   1.3M Apr 22 14:20 smm.log    <<<———

-rw-rw-rw- 1 root   root   245K Apr 21 03:28 afm_srv_15.log.0

-rw-r--r-- 1 root   root   196K Apr  8 11:47 sam_bridge_boot.log

-rw-rw-rw- 1 root   root    63K Apr 22 14:18 afm_srv_15.log

-rw-rw-rw- 1 root   root    60K Apr  8 11:50 syslogd_errlog.4886_4886

-rw-rw-rw- 1 root   root    21K Apr  8 11:47 fm_debug.log.vdc_1.4919

-rw-rw-rw- 1 root   root    18K Apr  8 11:49 fcoe_mgr_init.log

-rw-rw-rw- 1 root   root   5.2K Apr  8 11:49 first_setup.log

-rwxr-xr-x 1 root   root   4.4K Apr  8 12:25 iptab.sh

 

Since the files are in /var/tmp, they are assumed to be safe to delete. Rather than fully deleting the files, it is safer to echo 0 into each file to empty it and reduce its size; this ensures that any service that depends on being able to write to the log file still can.

Linux(debug)# echo 0 > /var/tmp/smm.log

Linux(debug)# echo 0 > /var/tmp/auto_root_file_deletion_log.txt

After this, you can run df -h again to confirm that /var/tmp is now at less than 10% utilization. The upgrade should now be able to proceed.

++ After this workaround, I waited 10 minutes and the FI was still stuck at 65%. I rebooted the FI and waited 10 more minutes after it came back up, with no luck. So I re-activated the FI firmware image to 2.2(6c) with the force option, and the upgrade completed successfully.

 

Loading Debug Plugin in UCS

=======================

First, get the debug plugin from: https://cspg-releng.cisco.com/

Note: You will need to select the appropriate version of the plugin.

Below is an example for version 2.2(6c):

Select 2.2.6 Elcap_MR5 under Release Builds on the left-hand side (do not select Commits).

Then select FCSc, click BUNDLES, and select DEBUG.

Note: You can upload this image to UCSM the same way you upload upgrade images. Once you have uploaded it to UCSM, log in to the CLI and perform the following steps.

Load the debug plugin:

FI-A(local-mgmt)# copy  debug_plugin/ucs-dplug.5.0.3.N2.2.26c.gbin   x

FI-A(local-mgmt)# load  x

###############################################################

Warning: debug-plugin is for engineering internal use only!

For security reason, plugin image has been deleted.

###############################################################

Successfully loaded debug-plugin!!!

Detailed List of Commands:

+++++++++++++++++++++

connect local-mgmt

FI-A(local-mgmt)# copy  debug_plugin/ucs-dplug.5.0.3.N2.2.26c.gbin   x

FI-A(local-mgmt)# load  x

Linux(debug)# cd /var/tmp

Linux(debug)# df -h

Linux(debug)# ls -lSh

Linux(debug)# echo 0 > /var/tmp/smm.log

Linux(debug)# echo 0 > /var/tmp/auto_root_file_deletion_log.txt

 

 

CHECK THE HARDWARE VERSION :

+++++++++++++++++++++++++++++++

5596# show sprom sup | inc H/W

If the hardware version is 1.1:

Copy the debug plugin and the update file, then load the plugin:

Copy the ucd-update.tar to the bootflash

Copy the debug plugin to the bootflash

Load the image via the load command

>>You might have to call support for the update.tar script

Linux(debug)# cp /bootflash/ucd-update.tar /tmp

Linux(debug)# cd /tmp/

Linux(debug)# tar -xvf ucd-update.tar

ucd.bin

Linux(debug)#

Step 2

———————————-

Run the ucd.bin file:

Linux(debug)# ./ucd.bin

You will see a prompt indicating the updated version, 1.1.

SAFESHUT:

++++++++++

>>Have to contact support for the safeshut.tar script

copy tftp://10.96.60.154/safeshut.tar workspace://safeshut.tar

Untar safeshut.tar (tar -xvf safeshut.tar)

Linux(debug)# ./safeshut.sh

md5sum for ./klm_sup_ctrl_mc.klm does not match – corrupted tarball?

Linux(debug)# ./safeshut.sh

md5sum for ./rebootsys.bin does not match – corrupted tarball?

cd /bootflash   ————> Checks whether safeshut is updated; look for the change in the reboot timestamp (it should be the current time)

cd /sbin

ls -la | grep "reboot"

cd /bootflash

ls -la | grep -i "klm_sup_ctrl_mc.klm"

Repeat for the peer FI

OPTIONAL:

++++++++++

echo 7 7 7 7 > /proc/sys/kernel/printk
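Note: This writes 7 to all four kernel printk thresholds, so kernel messages down to debug level are printed to the console; it is useful only while troubleshooting verbosely.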

 
