Step-by-Step Guide: Running Kubernetes Applications in a VMware Environment

This documentation provides a comprehensive walkthrough for deploying and managing modern, containerized applications with Kubernetes on a VMware vSphere foundation. By leveraging familiar VMware tools and infrastructure, organizations can accelerate their adoption of Kubernetes while maintaining enterprise-grade stability, security, and performance. This guide covers architecture, deployment options, networking design, and practical examples using solutions like VMware Tanzu and vSphere.

1. Understanding the Architecture

Running Kubernetes on VMware combines the power of cloud-native orchestration with the robustness of enterprise virtualization. This hybrid approach allows you to leverage existing investments in hardware, skills, and operational processes.

VMware Environment: The Foundation

The core infrastructure is your vSphere platform, which provides the compute, storage, and networking resources for the Kubernetes nodes. Key components include:

  • ESXi Hosts: The hypervisors that run the virtual machines (VMs) for Kubernetes control plane and worker nodes.
  • vCenter Server: The centralized management plane for your ESXi hosts and VMs. It’s essential for deploying, managing, and monitoring the cluster’s underlying infrastructure.
  • vSphere Storage: Datastores (vSAN, VMFS, NFS) that provide persistent storage for VMs and, through the vSphere CSI driver, for Kubernetes applications.

Kubernetes Installation: A Spectrum of Choices

VMware offers a range of options for deploying Kubernetes, from deeply integrated, turn-key solutions to flexible, do-it-yourself methods.

  • VMware vSphere with Tanzu (VKS): The premier, integrated solution that embeds Kubernetes directly into vSphere. It transforms a vSphere cluster into a platform for running both VMs and containers side by side, simplifies deployment, and provides seamless access to vSphere resources.
  • VMware Tanzu Kubernetes Grid (TKG): A standalone, multi-cloud Kubernetes runtime that you can deploy on vSphere (and other clouds). TKG is ideal for organizations that need a consistent Kubernetes distribution across different environments.
  • Kubeadm on VMs: The generic, open-source approach. You create Linux VMs on vSphere and use standard Kubernetes tools like kubeadm to bootstrap a cluster. This offers maximum flexibility but requires more manual configuration and lifecycle management.

Networking: The Critical Connector

Proper network design is crucial for security and performance. VMware provides powerful constructs for Kubernetes networking:

  • VMware NSX: An advanced network virtualization and security platform. When integrated with Kubernetes, NSX provides a full networking and security stack, including pod networking, load balancing, and micro-segmentation for “zero-trust” security between microservices.
  • vSphere Distributed Switch (vDS): Can be used to create isolated networks (VLANs) for different traffic types—such as management, pod, and service traffic—providing a solid and performant networking base.

2. Prerequisites

Before deploying a cluster, ensure your VMware environment is prepared and has sufficient resources.

  • Configured vSphere/vCenter: A healthy vSphere 7.0U2 or newer environment with available ESXi hosts in a cluster.
  • Sufficient Resources: Plan for your desired cluster size. A small test cluster (1 control plane node, 3 workers) needs at least 16 vCPUs, 64 GB of RAM, and 500 GB of storage. Production clusters will require significantly more.
  • Networking Infrastructure:
    • (For vDS) Pre-configured port groups and VLANs for management, workload, and external access.
    • (For NSX) NSX Manager deployed and configured with network segments and T0/T1 gateways.
    • A pool of available IP addresses for all required networks.
  • Tooling (Optional but Recommended): VMware Tanzu CLI, Rancher, or other management tools to simplify cluster lifecycle operations.

3. Cluster Deployment: Step by Step

Option 1: VMware Tanzu Kubernetes Grid (TKG) Standalone

TKG provides a streamlined CLI or UI experience for creating conformant Kubernetes clusters.

# Install prerequisites: Docker, Tanzu CLI, kubectl
# Start the UI-based installer to bootstrap a management cluster
tanzu management-cluster create --ui

# Alternatively, use a YAML configuration file for repeatable deployments
tanzu management-cluster create -f my-cluster-config.yaml

# With the management cluster running, create workload clusters on demand
tanzu cluster create my-workload-cluster -f my-workload-config.yaml

The wizard or YAML file allows you to specify the vCenter endpoint, the number of nodes, VM sizes (e.g., small, medium, large), and network settings.
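For reference, a minimal cluster configuration file (such as my-cluster-config.yaml above) might look like the sketch below. The variable names follow TKG's cluster-configuration conventions, but the exact keys and valid values vary by TKG version, so treat this as illustrative only:

# my-cluster-config.yaml (illustrative; all values are placeholders)
CLUSTER_NAME: my-tkg-cluster
CLUSTER_PLAN: dev                        # dev = single control plane node; prod = three
VSPHERE_SERVER: vcenter.example.com      # your vCenter endpoint
VSPHERE_DATACENTER: /Datacenter
VSPHERE_NETWORK: k8s-workload-vlan       # port group for the node VMs
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.10.10.50
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 3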

Option 2: vSphere with Tanzu (VKS)

This method is fully integrated into the vSphere Client.

  1. In the vSphere Client, navigate to Workload Management.
  2. Enable it on a vSphere cluster, which deploys a Supervisor Cluster.
  3. Configure control plane node sizes and worker node pools via VM Classes.
  4. Assign network segments for Pod and Service IP ranges.
  5. Once enabled, developers can provision their own “Tanzu Kubernetes Clusters” on demand, as sketched below.
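For example, a developer can request a cluster declaratively with a TanzuKubernetesCluster manifest. The sketch below is illustrative: the API version, VM class names, storage class, and Tanzu Kubernetes release (TKR) all depend on what your administrator has made available in your environment:

apiVersion: run.tanzu.vmware.com/v1alpha2    # API version varies by vSphere release
kind: TanzuKubernetesCluster
metadata:
  name: dev-cluster
  namespace: dev-namespace                   # a vSphere Namespace created by the admin
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-small             # a VM Class assigned to the namespace
      storageClass: vsan-default-storage-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2     # a TKR available in your environment
    nodePools:
    - name: workers
      replicas: 3
      vmClass: best-effort-medium
      storageClass: vsan-default-storage-policy

Applying this manifest with kubectl while logged in to the Supervisor Cluster triggers the cluster build.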

Option 3: Kubeadm on VMs (DIY)

This is the most manual but also most transparent method.

  1. Prepare Linux VMs on vSphere (e.g., Ubuntu 20.04). Best practice is to create a VM template and clone it for each node.
  2. Install a container runtime (e.g., containerd), kubeadm, kubelet, and kubectl on all VMs.
  3. Initialize the control plane node:
# Replace with your chosen Pod network range
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
  4. Install a CNI (Container Network Interface) plugin such as Calico or Antrea (a sketch of this step and the next follows below).
  5. Join worker nodes using the command printed by the kubeadm init output.
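A minimal sketch of steps 4 and 5, assuming Calico as the CNI. Pin the manifest version that matches your cluster; the API endpoint address is illustrative, and the token and hash are placeholders printed by kubeadm init:

# Step 4: install the Calico CNI from its published manifest
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

# Step 5: on each worker VM, run the join command printed by kubeadm init
sudo kubeadm join 10.10.10.10:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>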

4. Networking Design Example

A segmented network topology is a best practice for security and manageability. NSX or vDS with VLANs enables this isolation.

Reference Network Topology

| Component              | Network / VLAN    | Example Address Range | Purpose                                                                         |
|------------------------|-------------------|-----------------------|---------------------------------------------------------------------------------|
| vSphere Management     | mgmt-vlan         | 10.0.0.0/24           | Access to vCenter, ESXi management, and NSX Manager. Highly secured.            |
| Kubernetes API         | k8s-control-plane | 10.10.10.0/24         | For `kubectl` access and external automation tools to reach the cluster API.    |
| Pod Network (Overlay)  | k8s-pods-vxlan    | 192.168.0.0/16        | Internal, private network for all Pod-to-Pod communication. Managed by the CNI. |
| Service Network        | k8s-svc-vlan      | 10.20.20.0/24         | Virtual IP range for Kubernetes Services. Traffic is not routable externally.   |
| External LB / Ingress  | ext-lb-vlan       | 10.30.30.0/24         | Public-facing network where application IPs are exposed via LoadBalancers.      |

[External Users]
       |
[Firewall / Router]
       |
[Load Balancer / Ingress VIPs (10.30.30.x)]
       |
[K8s Service Network (10.20.20.x) – Internal]
       |
[Pods: Overlay Network (192.168.x.x)]
       |
[Worker Node VMs: Management Network on vSphere]
       |
[vSphere Mgmt (vCenter, NSX, ESXi)]

5. Deploying an Application Example

Once the cluster is running, you can deploy applications using standard Kubernetes manifest files.

Sample Deployment YAML (nginx-deployment.yaml)

This manifest creates a Deployment that ensures three replicas of an Nginx web server are always running.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3 # Desired number of pods
  selector:
    matchLabels:
      app: nginx # Connects the Deployment to the pods
  template: # Pod template
    metadata:
      labels:
        app: nginx # Label applied to each pod
    spec:
      containers:
      - name: nginx
        image: nginx:latest # The container image to use
        ports:
        - containerPort: 80 # The port the application listens on

Apply the configuration to your cluster:

kubectl apply -f nginx-deployment.yaml
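To confirm the rollout succeeded, check the Deployment and its pods:

kubectl rollout status deployment/nginx-deployment   # waits until all replicas are ready
kubectl get pods -l app=nginx                        # lists the three nginx pods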

6. Exposing the Application via a Service

A Deployment runs your pods, but a Service exposes them to the network. For production, a LoadBalancer service is recommended.

Sample LoadBalancer Service (nginx-service.yaml)

When deployed in an integrated environment like Tanzu with NSX, this automatically provisions an external IP from your load balancer pool.

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: LoadBalancer # Asks the cloud provider for a load balancer
  selector:
    app: nginx # Forwards traffic to pods with this label
  ports:
    - protocol: TCP
      port: 80 # The port the service will be exposed on
      targetPort: 80 # The port on the pod to send traffic to

Apply the service and find its external IP:

kubectl apply -f nginx-service.yaml
kubectl get service nginx-service
# The output will show an EXTERNAL-IP once provisioned

You can then access your application at http://<EXTERNAL-IP>.
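A quick smoke test from any machine with access to the external network (reusing the placeholder above):

curl -I http://<EXTERNAL-IP>   # an HTTP 200 response confirms Nginx is reachable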

7. Scaling, Monitoring, and Managing

  • Scaling: Easily adjust the number of replicas to handle changing loads.
kubectl scale deployment/nginx-deployment --replicas=5
  • Monitoring: Combine vSphere monitoring (for VM health) with in-cluster tools like Prometheus and Grafana (for application metrics). VMware vRealize Operations provides a holistic view from app to infrastructure.
  • Storage: Use the vSphere CSI driver to provide persistent storage. Developers request storage with a PersistentVolumeClaim (PVC), and vSphere automatically provisions a virtual disk on a datastore (vSAN, VMFS, etc.) to back it. A minimal PVC sketch follows this list.
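This sketch requests a 5Gi volume; the storage class name is hypothetical and must match a StorageClass in your cluster (list them with kubectl get storageclass):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce                     # mounted read-write by a single node
  resources:
    requests:
      storage: 5Gi
  storageClassName: vsphere-default    # hypothetical name, backed by a vSphere storage policy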

Best Practices & Further Reading

  • Use Resource Pools: In vSphere, use Resource Pools to guarantee CPU and memory for Kubernetes nodes, isolating them from other VM workloads.
  • Embrace NSX Security: Use NSX micro-segmentation to create firewall rules that control traffic between pods, enforcing a zero-trust security model (see the NetworkPolicy sketch after this list).
  • Automate Everything: Leverage Terraform, Ansible, or PowerCLI to automate the deployment and configuration of your vSphere infrastructure and Kubernetes clusters.
  • Follow Validated Designs: For production, consult VMware’s official reference architectures to ensure a supportable and scalable deployment.
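Micro-segmentation intents are commonly expressed as standard Kubernetes NetworkPolicy objects, which NSX (via its container plugin) can enforce. A minimal sketch, assuming a hypothetical role=frontend label on the client pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx                 # policy applies to the nginx pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend     # only pods labeled role=frontend may connect
      ports:
        - protocol: TCP
          port: 80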

Document Version 1.0 | A foundational framework for enterprise Kubernetes on VMware.

Docker Image Checks

Here are some essential Docker commands for working with Docker images, each with an example:

  1. Pull an Image: Command: docker pull <image_name>:<tag>

Example: To pull the latest version of the official Ubuntu image from Docker Hub:

docker pull ubuntu:latest
  2. List Images: Command: docker images

Example: To list all the Docker images available on your system:

docker images
  3. Build an Image: Command: docker build -t <image_name>:<tag> <build_context_path>

Example: Assuming you have a Dockerfile in the current directory, you can build an image named “my_app” with the tag “latest”:

docker build -t my_app:latest .
  4. Run a Container from an Image: Command: docker run [options] <image_name>:<tag>

Example: To run a container from the “nginx” image and map port 80 of the container to port 8080 on the host:

docker run -p 8080:80 nginx:latest
  5. Tag an Image: Command: docker tag <source_image>:<source_tag> <target_image>:<target_tag>

Example: To create a new tag “v2” for an existing image “my_app” with the tag “latest”:

docker tag my_app:latest my_app:v2
  6. Remove an Image: Command: docker rmi <image_name>:<tag>

Example: To remove the image “my_app” with the tag “latest”:

docker rmi my_app:latest
  7. Inspect an Image: Command: docker image inspect <image_name>:<tag>

Example: To get detailed information about the “ubuntu” image with the “latest” tag:

docker image inspect ubuntu:latest
  8. Prune Unused Images: Command: docker image prune

Example: To remove all unused Docker images from your system:

docker image prune
  9. Push an Image to Docker Hub: Command: docker push <username>/<image_name>:<tag>

Example: To push the “my_app” image with the “latest” tag to your Docker Hub repository:

docker push my_docker_username/my_app:latest
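Note that pushing to Docker Hub requires an authenticated session and an image tagged with your registry namespace:

docker login                                               # authenticate to Docker Hub
docker tag my_app:latest my_docker_username/my_app:latest  # prefix the image with your username
docker push my_docker_username/my_app:latest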

Docker images are the building blocks of containers, and sometimes you may encounter issues when building or running them. Let’s go through some common troubleshooting scenarios and their solutions:

  1. Docker Image Not Found: Problem: You are trying to run a container from an image, but Docker reports that the image does not exist.

Solution:

  • Check the image name and tag/version you are trying to use. Ensure that the image is correctly spelled.
  • Run docker images to see a list of available images on your system. If the image is missing, you may need to pull it from Docker Hub or build it locally using a Dockerfile.
  2. Build Failure: Problem: You encounter errors during the build process of a Docker image.

Solution:

  • Check the Dockerfile for syntax errors or missing dependencies.
  • Inspect the error messages and logs provided during the build process to identify the root cause.
  • Consider using the --no-cache flag while building to avoid using cached layers, which might be causing issues.
  3. Port Conflict: Problem: You start a container, but it fails with an error related to port conflicts.

Solution:

  • Ensure that the port you are trying to expose is not already in use by another process on the host machine.
  • Use the -p flag to map the container’s port to an available port on the host machine when running the container. For example: docker run -p 8080:80 my_image.
  4. Out of Memory or Resource Issues: Problem: Your Docker container crashes or becomes unresponsive due to resource constraints.

Solution:

  • Monitor your system’s resource usage to ensure you have enough memory, CPU, and disk space available.
  • Consider adjusting resource limits when running the container using the --memory and --cpus flags (a quick monitoring sketch follows this list).
  5. Networking Problems: Problem: Your container cannot reach external resources or cannot be reached from the host or other containers.

Solution:

  • Check if your container has the appropriate network settings. Use docker inspect <container_name> to view the container’s network configuration.
  • Ensure that firewalls or security groups are not blocking communication.
  • Check if there are any DNS resolution issues. You can try using Google’s public DNS (8.8.8.8) in your container’s network settings.
  6. Permission Issues: Problem: Your application running inside the container is unable to access or modify files due to permission problems.

Solution:

  • Ensure that the user inside the container has the necessary permissions to access the files or directories.
  • Be cautious about using the root user inside the container; consider using a non-root user instead.
  7. Base Image Compatibility: Problem: You are trying to run your application on a base image, but it’s not compatible with your software stack.

Solution:

  • Verify that the base image you are using is suitable for your application requirements.
  • Double-check that you have installed all necessary dependencies and packages required by your application.
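For resource problems like those in item 4, docker stats gives a live view of per-container consumption:

# Stream CPU, memory, and I/O usage for all running containers
docker stats

# Print a single snapshot instead of streaming
docker stats --no-stream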

If you encounter an issue where a Docker image is unable to load or run, it can be due to various reasons. Let’s explore some common scenarios and possible solutions using examples:

Scenario 1: Image Not Found

Problem: Docker cannot find the specified image.

Example: Let’s say you try to run a container from an image named “my_app” with the tag “latest,” but the image does not exist on your system.

docker run my_app:latest

Solution: Make sure you have pulled the image using docker pull or built the image using docker build -t my_app:latest . before trying to run it.

Scenario 2: Image Pull Failure

Problem: Docker fails to pull the image from a remote repository.

Example: You try to pull the official Ubuntu image, but it fails.

docker pull ubuntu:latest

Solution: Check your internet connection and ensure Docker can access the internet. If you are behind a proxy, configure Docker to use the proxy settings.

Scenario 3: Corrupt or Incomplete Image

Problem: The image you pulled is corrupt or incomplete.

Example: You pull an image, but it throws an error during the pull process.

docker pull some_image:latest

Solution: Try pulling the image again. If the issue persists, it might be a problem with the remote image repository. Verify that the image repository is healthy and try again later.

Scenario 4: Image Build Failure

Problem: You encounter errors while building a Docker image using a Dockerfile.

Example: Your Dockerfile has a syntax error or is missing some dependencies.

# Dockerfile
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install -y python3
CMD ["python3", "app.py"]

Solution: Inspect the Dockerfile for errors, typos, or missing dependencies. Use docker build with the -t flag to build the image and check the build logs for specific error messages.
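A rebuild that bypasses the cache and prints the full output of every step often surfaces the failing instruction (the --progress flag assumes BuildKit, the default builder in recent Docker releases):

docker build --no-cache --progress=plain -t my_app:latest .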

Scenario 5: Insufficient Resources

Problem: The host system doesn’t have enough resources to run the container.

Example: You try to run a memory-intensive application in a container, but it fails due to lack of memory.

docker run -it --memory=128m my_app:latest

Solution: Adjust the resource limits when running the container. In the example above, the container is capped at 128MB of memory; if the application needs more, raise the limit (for example, --memory=512m) to match its requirements.

Scenario 6: Network Issues

Problem: The container cannot access external resources, or the host cannot access the container.

Example: You run a web server in a container, but you cannot access it from your browser on the host.

docker run -d -p 8080:80 my_web_app:latest

Solution: Ensure the container’s ports are correctly mapped to the host using the -p flag. In this example, the container’s port 80 is mapped to the host’s port 8080. Also, check your firewall settings or any security groups that might be blocking the traffic.
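To confirm the mapping took effect, inspect the container’s published ports and test from the host:

docker port <container_name>    # should show 80/tcp -> 0.0.0.0:8080
curl -I http://localhost:8080   # expect a response from the web server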

If you encounter any other specific error messages, be sure to search for them and consult Docker’s documentation or community forums for more detailed solutions.

Managing a Kubernetes cluster 101

Managing a Kubernetes cluster involves various tasks, such as deploying applications, scaling resources, checking the cluster’s health, and more. Below are some examples of common operations to manage a Kubernetes cluster using the kubectl command-line tool:

Deploying an Application: To deploy an application on the Kubernetes cluster, you’ll need a YAML manifest file describing the deployment. Here’s an example manifest for a basic Nginx web server deployment (nginx-deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

To create the deployment, use the kubectl apply command:

kubectl apply -f nginx-deployment.yaml

Scaling a Deployment: You can scale the number of replicas in a deployment using the kubectl scale command:

# Scale the 'nginx-deployment' to 5 replicas
kubectl scale deployment nginx-deployment --replicas=5

Checking Cluster Nodes: To see the list of nodes in the cluster, use the kubectl get nodes command:

kubectl get nodes

Checking Cluster Pods: To list all the pods running in the cluster, use the kubectl get pods command:

kubectl get pods --all-namespaces

Viewing Pod Logs: To view the logs of a specific pod, use the kubectl logs command:

# Replace 'pod-name' and 'namespace' with the actual pod and namespace names
kubectl logs pod-name -n namespace
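Two related commands help when diagnosing a pod: kubectl describe shows status and recent events (scheduling failures, restarts), and kubectl logs -f streams output as it is written:

# Show pod details, including events and restart counts
kubectl describe pod pod-name -n namespace

# Stream (follow) the pod's logs
kubectl logs -f pod-name -n namespace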

Updating a Deployment: To update the image of a deployment, modify the YAML file with the new image tag and then use kubectl apply again:

# Edit the nginx-deployment.yaml file with the new image tag
vim nginx-deployment.yaml

# Apply the changes to the deployment
kubectl apply -f nginx-deployment.yaml
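Alternatively, update the image directly from the command line and track or undo the rollout (the image tag below is a placeholder; 'nginx' is the container name from the Deployment spec):

# Update the container image in place
kubectl set image deployment/nginx-deployment nginx=nginx:1.25

# Watch the rollout complete
kubectl rollout status deployment/nginx-deployment

# Roll back to the previous revision if something goes wrong
kubectl rollout undo deployment/nginx-deployment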

Deleting Resources: To delete resources like a deployment, service, or pod, use the kubectl delete command:

# Delete a deployment
kubectl delete deployment nginx-deployment

# Delete a service
kubectl delete service my-service

# Delete a pod
kubectl delete pod pod-name

These are just a few examples of common operations to manage a Kubernetes cluster using kubectl. There are many more features and functionalities available to manage and monitor Kubernetes clusters. Always refer to the official Kubernetes documentation and other resources for more in-depth knowledge and advanced management tasks.