Upgrading Tanzu Kubernetes Grid (TKG) from v1.1.x to v1.2.0

VMware announced the General Availability of VMware Tanzu Kubernetes Grid (TKG) 1.2 last week. The new version comes with many enhancements and features.


Some of the notable enhancements released in the newer version are:

Support for deployment to Microsoft Azure

TKG 1.1.x supported two platforms: vSphere and Amazon EC2. Starting with version 1.2, TKG also supports deployment to Microsoft Azure. This gives customers who are more invested in Microsoft Azure a standardized way to deploy Kubernetes clusters.


Support for newer versions of Kubernetes

TKG 1.2 supports the following Kubernetes versions:

  • 1.19.1
  • 1.18.8
  • 1.17.11

For previously supported versions, check out the 1.1.3 version release notes.

Support for Antrea as the default CNI provider

Previous versions of TKG included Calico as the default CNI, with provisions to swap it out for other CNI plugins such as Antrea. This version uses Antrea by default. Antrea is a Kubernetes networking solution intended to be Kubernetes native. It operates at Layers 3/4 to provide networking and security services for a Kubernetes cluster, leveraging Open vSwitch as the networking data plane.
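
On a cluster deployed with TKG 1.2, you can do a quick sanity check that the Antrea components are running. The namespace and label selector below follow the upstream Antrea defaults, so treat them as assumptions that may differ slightly in your deployment:

$ kubectl get pods -n kube-system -l app=antrea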

This blog post will cover the steps involved in upgrading Tanzu Kubernetes Grid from v1.1.x to v1.2.0. The process is fairly straightforward, thanks to all the effort that VMware engineers have put into making it so.

To start with, make sure you have access to the server that has the tkg CLI installed and that you are able to access the TKG management and guest clusters.
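
A quick way to confirm this before you begin is to check the CLI and list the clusters it knows about; for example:

$ tkg version
$ tkg get management-cluster
$ kubectl config get-contexts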

The TKG upgrade process involves four main steps:

1) Downloading and installing the v1.2.0 TKG CLI on the server from which you access the TKG clusters.

2) Downloading and uploading a new Kubernetes node OVA, if you are moving to a newer version of Kubernetes.

3) Upgrading the management clusters.

4) Upgrading the guest clusters.

To begin with, download the TKG v1.2.0 CLI onto the server where the previous version of the TKG CLI is installed. I will be doing this on an Ubuntu Linux server, so the commands that you will see below are all for Linux. You can refer to the documentation for the macOS and Windows commands.

Once you have downloaded the TKG CLI bundle, extract it and install the binary as described below.
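
The exact extraction step depends on how the bundle is packaged. Assuming the Linux download is a gzip archive named after the binary (adjust the filename to whatever you actually downloaded, and use tar -xzf if it ships as a .tar.gz instead):

$ gunzip tkg-linux-amd64-v1.2.0+vmware.1.gz

Then move the binary into your PATH and make it executable: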

mv ./tkg-linux-amd64-v1.2.0+vmware.1 /usr/local/bin/tkg
chmod +x /usr/local/bin/tkg

Running the tkg version command should give you output like the below:

Client:
        Version: v1.2.0
        Git commit: 05b233e75d6e40659247a67750b3e998c2d990a5

Here are some screenshots from my lab:

The next step is to download and upload the Kubernetes node OVA. Go to the TKG download page and log in with your My VMware credentials. The version of the template can be any of the newly supported versions listed above, or one of the older versions. If you decide to retain the version of Kubernetes supported by older TKG versions, you can skip this step. I will be using the 1.19.1 (Photon v3 Kubernetes v1.19.1 OVA) image here.

NOTE: The new version of TKG does not need the HA Proxy image that previous versions did. If you decide to retain the Kubernetes versions supported by previous versions, do not delete the previously uploaded OVAs.

Since the OVA upload process is a fairly simple one, I'm going to assume it has been carried out and move on to the next steps.
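
For reference, if you prefer the command line over the vSphere UI, a rough sketch of the upload using govc could look like the below. The OVA filename, datastore, resource pool, and folder paths are illustrative assumptions, and govc needs the usual GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD environment variables set:

$ govc import.ova -ds=datastore1 -pool=/Datacenter/host/Cluster/Resources -folder=/Datacenter/vm -name=photon-3-kube-v1.19.1 ./photon-3-kube-v1.19.1+vmware.2.ova
$ govc vm.markastemplate photon-3-kube-v1.19.1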

Once the OVA is uploaded, grant permissions on this template to the TKG user – tkguser (the name may be different in your case). The permissions required by the TKG role are highlighted here.
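
If you want to script this step as well, govc can set the permission too. The user, role name, and template path below are assumptions and should match whatever you have configured for TKG in your environment:

$ govc permissions.set -principal tkguser@vsphere.local -role TKG -propagate=false /Datacenter/vm/photon-3-kube-v1.19.1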

Next, we move on to upgrading the management clusters. There are two important tips to note before you proceed.

Tip 1: You should first upgrade the management cluster before you proceed to upgrading the guest clusters or Tanzu Kubernetes Clusters (TKC). You cannot upgrade Tanzu Kubernetes Clusters until you have upgraded the management cluster that manages them.

Tip 2: If you are using a single server with the TKG CLI for TKG cluster management, you must upgrade all of the clusters you manage from it to v1.2.0. Due to changes in the newer version, the TKG CLI v1.2.0 will not be able to manage clusters deployed by older TKG versions.

Now, connect to the management cluster and run the below commands to upgrade it.

Run the tkg get management-cluster command to see the list of management clusters:

$ tkg get management-cluster
Output:
MANAGEMENT-CLUSTER-NAME  CONTEXT-NAME                     STATUS  
 vsphere-mgmt *          vsphere-mgmt-admin@vsphere-mgmt  Success 

Run the tkg set management-cluster command to set the context of the Tanzu Kubernetes Grid CLI to the management cluster that you want to upgrade.

$ tkg set management-cluster vsphere-mgmt
Output:
The current management cluster context is switched to vsphere-mgmt

Run the tkg get cluster command with the --include-management-cluster option:

$ tkg get cluster --include-management-cluster
Output:
 NAME          NAMESPACE   STATUS         CONTROLPLANE  WORKERS  KUBERNETES        ROLES  
 abhilashb     default     running        3/3           3/3      v1.18.3+vmware.1  <none> 
 vsphere-mgmt  tkg-system  running        1/1           1/1      v1.18.3+vmware.1  <none> 

Set the context of kubectl to the management cluster:

$ kubectl config use-context vsphere-mgmt-admin@vsphere-mgmt
Output:
Switched to context "vsphere-mgmt-admin@vsphere-mgmt"

Add the cluster-role label to the management cluster. The labels management and tanzu-services are applied to clusters when you create them, so that you can easily distinguish between management clusters and the clusters that are created when you deploy the Tanzu Kubernetes Grid extensions. When you are upgrading management clusters from a previous version, you must apply the new role labels to existing clusters manually.

$ kubectl label -n tkg-system cluster.cluster.x-k8s.io/vsphere-mgmt cluster-role.tkg.tanzu.vmware.com/management="" --overwrite=true
Output:
cluster.cluster.x-k8s.io/vsphere-mgmt labeled

Run the tkg upgrade management-cluster command and enter y to confirm. Since this might take different amounts of time in different environments, you can also add the --timeout option to the command. I have used 35m as opposed to the default timeout of 30 minutes.

$ tkg upgrade management-cluster vsphere-mgmt --timeout 35m0s
Output:
Logs of the command execution can also be found at: /tmp/tkg-20201022T021646019157006.log
Upgrading management cluster 'vsphere-mgmt' to TKG version 'v1.2.0' with Kubernetes version 'v1.19.1+vmware.2'. Are you sure? [y/N]: y
Upgrading management cluster providers...
Checking cert-manager version...
Cert-manager is already up to date
Performing upgrade...
Deleting Provider="cluster-api" Version="" TargetNamespace="capi-system"
Installing Provider="cluster-api" Version="v0.3.10" TargetNamespace="capi-system"
Deleting Provider="bootstrap-kubeadm" Version="" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="infrastructure-vsphere" Version="" TargetNamespace="capv-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.1" TargetNamespace="capv-system"
Management cluster providers upgraded successfully...
Upgrading management cluster kubernetes version...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for worker nodes...
Management cluster 'vsphere-mgmt' successfully upgraded to TKG version 'v1.2.0' with kubernetes version 'v1.19.1+vmware.2'

This completes the upgrade process on the management cluster. If you watch the vSphere UI, you will see new nodes being created with the latest Kubernetes version and the older ones being deleted. This holds true for both the control plane and the worker node(s).
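
If you would rather watch this from the CLI than the vSphere UI, the Cluster API objects on the management cluster show the rolling replacement. Assuming kubectl is still pointed at the management cluster context:

$ kubectl get machines -A
$ kubectl get nodes -o wide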

Run the tkg get cluster command with the --include-management-cluster option again to check that the management cluster has been upgraded.

$ tkg get cluster --include-management-cluster
Output:
 NAME          NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES      
 abhilashb     default     running  3/3           3/3      v1.18.3+vmware.1  <none>     
 vsphere-mgmt  tkg-system  running  1/1           1/1      v1.19.1+vmware.2  management

Notice that the management cluster now shows the Kubernetes version 1.19.1 and that a new role called management has been added to the cluster.
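
You can also see the new role label directly on the underlying Cluster API object; assuming kubectl is still using the management cluster context:

$ kubectl get clusters -n tkg-system --show-labels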

Some screenshots from my lab:

Now that the management clusters are upgraded, we can start upgrading the workload/guest clusters.

Run the tkg get cluster command with the --include-management-cluster option again.

$ tkg get cluster --include-management-cluster
Output:
 NAME          NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES      
 abhilashb     default     running  3/3           3/3      v1.18.3+vmware.1  <none>     
 vsphere-mgmt  tkg-system  running  1/1           1/1      v1.19.1+vmware.2  management

Notice that we have a cluster named abhilashb which is still running an older Kubernetes version. This is the same cluster I used in my How-To Scale a Tanzu Kubernetes cluster using TKG CLI blog post.

To discover which versions of Kubernetes are made available by a management cluster, run the tkg get kubernetesversions command.

$ tkg get kubernetesversions
Output:
 VERSIONS          
 v1.17.11+vmware.1 
 v1.17.3+vmware.2  
 v1.17.6+vmware.1  
 v1.17.9+vmware.1  
 v1.18.2+vmware.1  
 v1.18.3+vmware.1  
 v1.18.6+vmware.1  
 v1.18.8+vmware.1  
 v1.19.1+vmware.2

Before running the next command, set the context of kubectl to the cluster that you want to upgrade.

$ kubectl config use-context abhilashb-admin@abhilashb
Output:
Switched to context "abhilashb-admin@abhilashb"

Run tkg upgrade cluster and specify the --yes option to skip the confirmation prompt and the --timeout option with a value greater than the default of 30 minutes.

$ tkg upgrade cluster abhilashb --yes --timeout 35m0s
Output:
Logs of the command execution can also be found at: /tmp/tkg-20201023T042958763765745.log
Validating configuration...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for worker nodes...
Cluster 'abhilashb' successfully upgraded to kubernetes version 'v1.19.1+vmware.2'

When the upgrade finishes, run the tkg get cluster command with the --include-management-cluster option again, to check that the Tanzu Kubernetes cluster has been upgraded.

$ tkg get cluster --include-management-cluster
Output:
 NAME          NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES      
 abhilashb     default     running  3/3           3/3      v1.19.1+vmware.2  <none>     
 vsphere-mgmt  tkg-system  running  1/1           1/1      v1.19.1+vmware.2  management

Notice that both the management cluster and the Tanzu Kubernetes cluster have now been upgraded to the latest Kubernetes version. This is all you need to do to upgrade your existing Tanzu Kubernetes Grid v1.1.x deployment to v1.2.0.
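
As a final sanity check, you can also confirm the node versions directly on the workload cluster; assuming kubectl is still using the abhilashb context:

$ kubectl get nodes -o wide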

Some screenshots from the lab, again, for your reference.
