
Minimizing business downtime during planned maintenance

 

Introduction

I was preparing a proposal for a customer recently and one of the pain points with their current SAP infrastructure architecture was that they were unable to patch their SAP environment without impacting their business with planned downtime.

 

Given the business criticality of SAP to this customer, minimizing planned downtime during patching was a key requirement and something that had to be addressed in our SAP on Azure proposal.

 

An SAP (on Azure) environment is made up of multiple software components that may need to be patched during the lifecycle of the system:

 

  • SAP Application (Enhancement Packages and Support Packages)
  • SAP Kernel
  • Database
  • Operating System

In this blog post I’d like to demonstrate how an SAP on Azure infrastructure, deployed following Microsoft’s reference architecture for SAP S/4HANA for Linux VMs on Azure and combined with the SAP native capability Rolling Kernel Switch (RKS), enables the SAP Kernel to be patched with minimal business downtime.

 

Note that while the detailed steps in this blog post are for Linux, the Rolling Kernel Switch (RKS) works equally well in Windows environments on Azure that follow the Microsoft reference architecture for SAP NetWeaver (Windows) for AnyDB on Azure.

 

Important SAP notes to review in relation to SAP Rolling Kernel Switch (RKS) are:

 

OSS Note Number | Description | URL
953653 | Rolling Kernel Switch | https://launchpad.support.sap.com/#/notes/953653
2254173 | Linux: Rolling Kernel Switch in Pacemaker based NetWeaver HA environments | https://launchpad.support.sap.com/#/notes/2254173
2199317 | Support of Rolling Kernel Switch on Windows Failover Clusters | https://launchpad.support.sap.com/#/notes/2199317

 

It is mandatory to read and understand the standard SAP documentation and notes before implementing RKS in your own environment.

 

Microsoft Reference Architectures for SAP on Azure

The Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure and reference architecture for SAP NetWeaver (Windows) for AnyDB on Azure show a set of proven practices for running S/4HANA and SAP NetWeaver in a high availability environment that supports disaster recovery on Azure. This architecture is deployed with specific virtual machine (VM) sizes that can be changed to accommodate your organization’s needs.

 

For the purposes of this blog/demo I’ve built an S/4HANA 1809 on Azure environment that follows the reference architecture with the following exceptions:

 

The SAP components of the reference architecture for SAP S/4HANA for Linux VMs on Azure that we will be patching without system downtime are outlined in green in the schematic below:

 

chrishough_0-1601535916494.png

Source: Reference architecture for SAP S/4HANA for Linux VMs on Azure

 

Component | Virtual Hostname | Operating System | HA Cluster
SAP ASCS/ERS Node 1 | anf-ascs | SLES for SAP 12 SP4 | Y
SAP ASCS/ERS Node 2 | anf-ers | SLES for SAP 12 SP4 | Y
SAP Primary Application Server (PAS) | sapapp1 | SLES for SAP 12 SP4 | n/a
SAP Additional Application Server (AAS) | sapapp2 | SLES for SAP 12 SP4 | n/a

 

The current kernel is release 773 at patch level 101; it will be updated to patch level 201 using RKS.

 

It is important to note that while the detailed steps in this blog are specific to SUSE SLES, the Rolling Kernel Switch (RKS) can be implemented in a similar fashion for SAP NetWeaver on RHEL HA environments on Azure.

 

Rolling Kernel Switch (RKS)

SAP provides detailed guidance on the RKS process in the SAP NetWeaver 7.4 Administration Guide.

 

In addition, SAP note 953653 – Rolling Kernel Switch contains important pre-requisites and restrictions. In particular, it should be noted that there is a manual RKS procedure for 7.2x kernel releases. As of 7.4x an automatic RKS procedure is provided by SAP.

 

It is mandatory to read and understand the standard SAP documentation and notes before implementing RKS in your own environment.

 

In summary, the rolling kernel switch (RKS) is an automated procedure that enables the SAP kernel in an SAP ABAP system to be exchanged without system downtime (for dual-stack and AS Java scenario limitations please refer to SAP note 953653).

 

RKS can also be used to make parameter changes while the system is running. Usually, RKS only causes minimal restrictions for users of the system.

 

In the SAP NetWeaver 7.5 Admin Guide, SAP states that the advantages of the rolling kernel switch are:

 

  • SAP kernel exchange without system downtime (Note: individual SAP application instances are re-started but there is no overall system downtime)
  • The procedure is automated
  • The procedure can be started and monitored using standard tools in SAP MMC and in the system overview (transaction SM51)
  • No or minimal impact on system users
  • Static parameters can be changed while the system is running

Note: Parameters that affect the whole system should be checked carefully. Parameters that affect the system landscape (e.g. with ASCS instance or database in their name) cannot be changed with RKS.

          

Source: Automated Rolling Kernel Switch (RKS) in the SAP NetWeaver 7.5 Admin Guide

 

The intent of this blog is not to repeat the SAP documentation. Instead, it outlines the key steps in the RKS process and calls out any steps that are specific to the Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure.

 

RKS Pre-Checks

Before the RKS process can be executed, a number of pre-requisite checks must be carried out. Some of these are manual and must be performed by the system administrator. Others are automatic checks that are executed by the RKS process itself.

 

RKS manual Pre-checks

SAP lists the manual checks and preparations as follows:

 

chrishough_1-1601535916513.png

 

Source: RKS – Manual and Automatic Checks of the System Configuration in the SAP NetWeaver 7.5 Admin Guide

 

Let’s consider each of these manual checks in turn:

 

  1. No component should form a single point of failure in the system

Because we have followed the Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure (or reference architecture for SAP NetWeaver for AnyDB on Azure) we know that no single SAP component forms a single point of failure in the system:

 

  • SAP Central Services – deployed as a 2 node HA Cluster
  • Minimum of 2 SAP Application Servers – Logon groups for ABAP application servers are managed with transaction SMLG. It uses the load balancing function within the message server of the Central Services to distribute SAP GUI and RFC workload across the pool of SAP application servers. The application servers connect to the highly available Central Services through the cluster virtual network name. This avoids the need to change the application server profile for Central Services connectivity after a local failover.

  2. The system should be configured so that the expected workload can also be handled if an AS ABAP instance is stopped

The expected workload on my SAP on Azure demo environment is minimal. In a real Production environment, however, this is a key consideration, which is why SAP recommends performing RKS activities during periods of low business activity if possible.

 

  3. Make any necessary parameter changes before starting the RKS procedure

One of the additional benefits of RKS is that it can be used to implement SAP profile parameter changes without planned business downtime.

 

Note: Parameters that affect the whole system should be checked carefully. Parameters that affect the system landscape (e.g. with ASCS instance or database in their name) cannot be changed with RKS.

 

  4. If you want to import a new kernel patch, create a backup of DIR_CT_RUN

We definitely want to import a new kernel patch so let’s quickly confirm the location of DIR_CT_RUN. I checked using transaction AL11:

 

chrishough_2-1601535916520.png

 

Creating a backup of DIR_CT_RUN is also easy. For the purposes of this demo I’ve simply created a copy using the operating system command cp as follows:

 

cp -avr /usr/sap/A4H/SYS/exe/uc/linuxx86_64 /usr/sap/A4H/SYS/exe/uc/linuxx86_64_backup_11192019
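
The copy above can be wrapped in a small function that refuses to overwrite an existing backup and verifies the result before you go on to extract the new kernel. This is a minimal sketch rather than an SAP-provided tool: the function name backup_kernel_dir and the default target naming are my own assumptions, modelled on the command above.

```shell
# Sketch only: backup_kernel_dir is a hypothetical helper, not an SAP tool.
# Usage: backup_kernel_dir <source-dir> [<backup-dir>]
backup_kernel_dir() {
    # Drop a trailing slash so the default target is created next to the source
    local src="${1%/}"
    # Default target mirrors the naming used above: <source>_backup_<MMDDYYYY>
    local dest="${2:-${src}_backup_$(date +%m%d%Y)}"

    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "backup target already exists: $dest" >&2
        return 1
    fi

    # -a preserves permissions, ownership, timestamps and symlinks
    cp -a "$src" "$dest" || return 1

    # diff -r exits non-zero if a file is missing or its content differs
    if ! diff -r "$src" "$dest" >/dev/null; then
        echo "backup does not match source" >&2
        return 1
    fi

    # Print the backup location so the caller can log it
    echo "$dest"
}

# Example call (path taken from the command above):
#   backup_kernel_dir /usr/sap/A4H/SYS/exe/uc/linuxx86_64
```

Failing when the target already exists is a deliberate choice here: a second run on the same day should not silently replace the pre-patch copy you may need for a rollback.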

 

  5. If you want to import a new kernel patch, download the relevant SAPEXE.SAR and SAPEXE<DB>.SAR from SAP Service Marketplace. Extract these with the SAPCAR command line tool to DIR_CT_RUN.

In my case the latest patch level for the complete 773 kernel is 201. Remember to download both the DATABASE INDEPENDENT archive and the DATABASE SPECIFIC archive, which in my case is the one for SAP HANA.

chrishough_3-1601535916547.png

 

chrishough_4-1601535916564.png

 

Extract both archives with SAPCAR into DIR_CT_RUN, for example:

cd /sapmnt/A4H/exe/uc/linuxx86_64

SAPCAR -xvf /sapsoftware/SAPKernel773_PatchLevel_201/SAPEXEDB_201-80003385.SAR

SAPCAR -xvf /sapsoftware/SAPKernel773_PatchLevel_201/SAPEXE_201-80003386.SAR
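
Before extracting, it can be worth confirming that the database independent and database specific archives are at the same patch level. The sketch below does this purely from the file names. It assumes the <NAME>_<patch>-<id>.SAR naming seen in the two archives above, and the helper names archive_patch_level and same_patch_level are hypothetical.

```shell
# Sketch only: both helpers are hypothetical and only inspect file names.
# Assumes archives are named <NAME>_<patch>-<id>.SAR as in the example above.

# Print the patch level encoded in an archive file name
archive_patch_level() {
    local name="${1##*/}"   # strip any leading directory
    name="${name#*_}"       # drop the "SAPEXE_" or "SAPEXEDB_" prefix
    echo "${name%%-*}"      # keep what precedes the first "-"
}

# Succeed only if both archives carry the same patch level
same_patch_level() {
    [ "$(archive_patch_level "$1")" = "$(archive_patch_level "$2")" ]
}

# Example use with the archives from this post:
#   dir=/sapsoftware/SAPKernel773_PatchLevel_201
#   if same_patch_level "$dir/SAPEXEDB_201-80003385.SAR" "$dir/SAPEXE_201-80003386.SAR"; then
#       cd /sapmnt/A4H/exe/uc/linuxx86_64 &&
#       SAPCAR -xvf "$dir/SAPEXEDB_201-80003385.SAR" &&
#       SAPCAR -xvf "$dir/SAPEXE_201-80003386.SAR"
#   fi
```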

 

  6. Use logon groups instead of a fixed logon to a specific server

Most SAP Production environments will be using SAP Logon Groups already, but if not it’s always good practice to do so, and they can be configured via transaction SMLG. In my demo case I have two SAP application servers configured in a single Logon Group:

chrishough_5-1601535916566.png

 

  7. Avoid long running processes such as batch jobs

In my demo environment this isn’t an issue; however, it very likely will be in a real Production system. This is also why SAP recommends performing RKS activities at periods of lower system activity. Each SAP application server will be stopped in turn, and any long running batch jobs still active on that application server will be terminated.

 

RKS Automatic Pre-Checks

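The list of RKS automatic checks is extensive and is documented under RKS – Manual and Automatic Checks of the System Configuration in the SAP NetWeaver 7.5 Admin Guide.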

 

The automated pre-checks are executed at the start of the RKS process itself, but an Administrator can also run them in advance.

 

To do this, in SAP MMC choose Check Prerequisites from the context menu (right-click) of System Update.

 

chrishough_6-1601535916570.png

 

 

RKS MMC Pre-requisites Check – Error

The first time I executed the pre-requisite checks in my SAP on Azure demo environment I experienced the following error:

 

FAIL: NIEHOST_UNKNOWN (Invalid argument), <errordetails xmlns="urn:SAPControl">NiRawConnect failed in plugin_fopen()</errordetails>

 

To resolve this, I uncommented the localhost entry in the hosts file on the VM running the SAP MMC:

 

# localhost name resolution is handled within DNS itself.

#127.0.0.1       localhost

127.0.0.1       localhost

 

RKS Pre-requisites Check – Warning

Once the localhost issue was resolved, the next message received was an RKS warning. RKS had detected – correctly – that my setup was HA using SUSE SLES for SAP Applications 12 SP4.

chrishough_7-1601535916574.png

 

The SAP note in the message and one other referenced within it were relevant to my setup:

 

chrishough_8-1601535916588.png

 

As my HA clusters are based on SLES 12 SP4 I put the ASCS cluster into maintenance mode:

chrishough_9-1601535916590.png

 

chrishough_10-1601535916601.png

 

Note, however, that for Pacemaker clusters running on SUSE Linux Enterprise Server for SAP Applications (SLES for SAP) 15 the procedure is different:

 

  • Check whether sap-suse-cluster-connector version 3.1.0 is already installed. If it is, RKS is supported even without setting the cluster to maintenance mode.
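
The connector version check can be scripted. The following is a rough sketch, not the procedure from the SAP notes: the helper names version_ge and needs_maintenance_mode are hypothetical, the comparison relies on GNU sort -V, and I am assuming the installed version is read with rpm and that maintenance mode is set with crmsh. Always take the authoritative steps for your release from SAP notes 953653 and 2254173.

```shell
# Sketch only: helper names are hypothetical; requires GNU sort (-V).

# version_ge A B: succeeds when version A is greater than or equal to B
version_ge() {
    # The smaller of the two versions sorts first; if that is B, then A >= B
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when the installed connector is older than 3.1.0, i.e. when the
# cluster should be put into maintenance mode before running RKS
needs_maintenance_mode() {
    ! version_ge "$1" "3.1.0"
}

# Example use on a cluster node (assumed commands, run as root):
#   installed=$(rpm -q --queryformat '%{VERSION}' sap-suse-cluster-connector)
#   if needs_maintenance_mode "$installed"; then
#       crm configure property maintenance-mode=true
#   fi
```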

RKS Execution

Now that we’ve completed the pre-checks, we’re just about ready to execute the RKS process itself.

SAP provides a detailed description of the RKS Process in the SAP NetWeaver 7.5 Admin Guide

 

Each component of the SAP system will be stopped in turn:

 

chrishough_11-1601535916615.png

  1. The enqueue replication server is the first instance restarted.
  2. The ASCS instance is the second instance restarted.
  3. Then the application server instances are restarted in the order specified beforehand. In the figure above, instance A is the first application server instance restarted.
  4. Instance B is the second application server instance restarted, and so on.
  5. The instance defined as the last one in the order is restarted together with its start service last of all with the new kernel version. The RKS procedure is completed with this final step.

Source: RKS Process in the SAP NetWeaver 7.5 Admin Guide

 

We are nearly ready to update the system. Before we do, let’s confirm the current kernel patch level of the SAP components that will be updated.

Kernel Patch levels prior to starting the RKS

ERS (dev_enq_replicator):

chrishough_12-1601535916621.png

 

Message Server (dev_ms):

chrishough_13-1601535916627.png

 

First SAP Application Server:

chrishough_14-1601535916636.png

 

Second SAP Application Server:

chrishough_15-1601535916646.png

 

RKS – Update System

To start the RKS process, I again right-click System Update and this time select Update System from the context menu.

 

Note: You can also start RKS from the command line
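
For the command line route, a cautious way to begin is a wrapper that only prints the sapcontrol calls until you explicitly switch it to run them. In this sketch the function names CheckUpdateSystem, UpdateSystem and GetSystemUpdateList are what I understand sapcontrol to offer for RKS, so please confirm them (and any optional timeout arguments) with sapcontrol -h on your release. The wrapper names rks_cmd and rks_from_cli, the DRY_RUN switch and the instance number 00 are purely illustrative.

```shell
# Sketch only: rks_cmd, rks_from_cli and DRY_RUN are hypothetical names.
# Verify the sapcontrol function names with "sapcontrol -h" before use.

# Print (DRY_RUN=1, the default) or execute a single sapcontrol call
rks_cmd() {
    local nr="$1" fn="$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sapcontrol -nr $nr -function $fn"
    else
        sapcontrol -nr "$nr" -function "$fn"
    fi
}

# Run the automated pre-checks, start the update, then list its status.
# Assumption: a non-zero exit code from a call means it did not succeed.
rks_from_cli() {
    local nr="$1"
    rks_cmd "$nr" CheckUpdateSystem || return 1
    rks_cmd "$nr" UpdateSystem || return 1
    rks_cmd "$nr" GetSystemUpdateList
}

# Example (instance number 00 is a placeholder), run as <sid>adm:
#   rks_from_cli 00              # prints the three commands
#   DRY_RUN=0 rks_from_cli 00    # actually runs them
```

Defaulting to a dry run keeps the sketch safe to paste into a shell on a Production host while you review the exact calls.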

 

In my demo case I reduced the soft shutdown timeout and start wait timeout to lower values than the defaults shown here. In Production environments this should be carefully considered following SAP’s guidance on RKS timeouts.

 

chrishough_16-1601535916658.png

 

The RKS process re-runs the automated pre-requisite checks we did earlier. As we’ve already seen this warning and have read and understood SAP note 2077934, we can simply click OK.

 

chrishough_17-1601535916661.png

 

The restart of the ERS and ASCS is very quick (see the RKS log below for detailed timings). Within a few seconds the first SAP application server is shut down:

 

chrishough_18-1601535916663.png

 

I’m still logged on via the second application server and can see that Kernel update is active in transaction SM51:

 

chrishough_19-1601535916669.png

 

Once the first SAP application server has restarted, the second SAP application server is shut down:

 

chrishough_20-1601535916672.png

 

After a few more minutes the process completes successfully.

 

We can view the complete RKS log via the SAP MMC or directly at the operating system level.

chrishough_21-1601535916691.png

 

Post RKS Checks

Now that the RKS process has completed we’ll quickly double-check that all the required components have been patched correctly:

 

The Enqueue replication server looks good – updated to 773 patch 201:

chrishough_22-1601535916693.png

 

So does the SAP Message Server – updated to 773 patch 201:

chrishough_23-1601535916698.png

 

Now let’s check both SAP application servers via transaction SM51 – Release Notes:

chrishough_24-1601535916706.png

 

chrishough_25-1601535916715.png

 

Yup, all good.
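
As a complement to the checks above, the kernel release and patch level of the executables on disk can also be read at operating system level. This sketch parses the text printed by disp+work -version. It assumes that output contains lines beginning with "kernel release" and "patch number", and the helper name kernel_patch_level is hypothetical. Note that it reports on the disp+work binary found on the PATH, so it complements rather than replaces the SM51 check of the running instances.

```shell
# Sketch only: kernel_patch_level is a hypothetical helper.
# Reads "disp+work -version" style text on stdin and prints "<release> <patch>".
kernel_patch_level() {
    awk '
        /^kernel release/ { rel = $3 }     # e.g. "kernel release   773"
        /^patch number/   { patch = $3 }   # e.g. "patch number     201"
        END               { print rel, patch }
    '
}

# Example use as <sid>adm on each host:
#   disp+work -version | kernel_patch_level
```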

 

We can now take the ASCS Cluster out of maintenance mode:

chrishough_26-1601535916716.png

 

chrishough_27-1601535916723.png

 

Remember that with SLES 15 and sap-suse-cluster-connector version 3.1.0 it is not necessary to take the cluster in and out of maintenance mode before executing RKS.

 

Conclusion

An SAP on Azure deployment that follows the Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure or the reference architecture for SAP NetWeaver (Windows) for AnyDB on Azure, combined with the SAP native capability Rolling Kernel Switch, is an excellent way to patch SAP kernels on Azure without system downtime.

 

Future blog posts will look at downtime minimized patching approaches to other software components of an SAP on Azure environment.

 

 

 

 

 

 
