This article is contributed.
Minimizing business downtime during planned maintenance
Introduction
I was recently preparing a proposal for a customer, and one of the pain points with their current SAP infrastructure architecture was that they could not patch their SAP environment without impacting the business with planned downtime.
Given the business criticality of SAP to this customer, minimizing planned downtime during patching was a key requirement and something that had to be addressed in our SAP on Azure proposal.
An SAP (on Azure) environment is made up of multiple software components that may need to be patched during the lifecycle of the system:
- SAP Application (Enhancement Packages and Support Packages)
- SAP Kernel
- Database
- Operating System
In this blog post I’d like to demonstrate how an SAP on Azure infrastructure deployed following Microsoft’s reference architecture for SAP S/4HANA for Linux VMs on Azure in combination with an SAP native capability called Rolling Kernel Switch (RKS) can enable the SAP Kernel to be patched with minimized business downtime.
It’s important to note that while the detailed steps outlined in this blog post are for Linux, the Rolling Kernel Switch (RKS) works equally well in Windows environments on Azure that follow the Microsoft reference architecture for SAP NetWeaver (Windows) for AnyDB on Azure.
Important SAP notes to review in relation to SAP Rolling Kernel Switch (RKS) are:
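OSS Note Number | Description | URL
953653 | Rolling Kernel Switch | https://launchpad.support.sap.com/#/notes/953653
2254173 | Linux: Rolling Kernel Switch in Pacemaker based NetWeaver HA environments | https://launchpad.support.sap.com/#/notes/2254173
2199317 | Support of Rolling Kernel Switch on Windows Failover Clusters | https://launchpad.support.sap.com/#/notes/2199317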
It is mandatory to read and understand the standard SAP documentation and notes before implementing RKS in your own environment.
Microsoft Reference Architectures for SAP on Azure
The Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure and the reference architecture for SAP NetWeaver (Windows) for AnyDB on Azure show a set of proven practices for running S/4HANA and SAP NetWeaver in a high availability environment that supports disaster recovery on Azure. These architectures are deployed with specific virtual machine (VM) sizes that can be changed to accommodate your organization’s needs.
For the purposes of this blog/demo I’ve built an S/4HANA 1809 on Azure environment that follows the reference architecture with the following exceptions:
- S2S VPN rather than ExpressRoute connectivity between my on-premises (in my case, home) network and Azure. For Production scenarios, an ExpressRoute connection is recommended.
- Embedded Fiori front-end server (FES), as per SAP’s default guidance
- Azure NetApp Files for the SAP shared file systems rather than an NFS cluster; however, the RKS process described here would work equally well in either case
The SAP components of the reference architecture for SAP S/4HANA for Linux VMs on Azure that we will be patching without system downtime are outlined in green in the schematic below:
Source: Reference architecture for SAP S/4HANA for Linux VMs on Azure
Component | Virtual Hostname | Operating System | HA Cluster
SAP ASCS/ERS Node 1 | anf-ascs | SLES for SAP 12 SP4 | Y
SAP ASCS/ERS Node 2 | anf-ers | SLES for SAP 12 SP4 | Y
SAP Primary Application Server (PAS) | sapapp1 | SLES for SAP 12 SP4 | n/a
SAP Additional Application Server (AAS) | sapapp2 | SLES for SAP 12 SP4 | n/a
The current kernel is release 773 at patch level 101; RKS will be used to update it to patch level 201.
It is important to note that while the detailed steps in this blog are specific to SUSE SLES, the Rolling Kernel Switch (RKS) can be implemented in a similar fashion for SAP NetWeaver on RHEL HA environments on Azure.
Rolling Kernel Switch (RKS)
SAP provides detailed guidance on the RKS process in the SAP NetWeaver 7.4 Administration Guide.
In addition, SAP note 953653 – Rolling Kernel Switch contains important prerequisites and restrictions. In particular, note that there is a manual RKS procedure for 7.2x kernel releases; as of 7.4x, SAP provides an automatic RKS procedure.
It is mandatory to read and understand the standard SAP documentation and notes before implementing RKS in your own environment.
In summary, the rolling kernel switch (RKS) is an automated procedure that enables the SAP kernel in an SAP ABAP system to be exchanged without system downtime (for limitations in dual-stack and AS Java scenarios, refer to SAP note 953653).
RKS can also be used to make parameter changes while the system is running. Usually, RKS only causes minimal restrictions for users of the system.
In the SAP NetWeaver 7.5 Admin Guide, SAP states that the advantages of the rolling kernel switch are:
- SAP kernel exchange without system downtime (Note: individual SAP application instances are restarted, but there is no overall system downtime)
- The procedure is automated
- The procedure can be started and monitored using standard tools in SAP MMC and in the system overview (transaction SM51)
- No or minimal impact on system users
- Static parameters can be changed while the system is running
Note: Parameters that affect the whole system should be checked carefully. Parameters that affect the system landscape (e.g. with ASCS instance or database in their name) cannot be changed with RKS.
Source: Automated Rolling Kernel Switch (RKS) in the SAP NetWeaver 7.5 Admin Guide
The intent of this blog is not to repeat the SAP documentation; rather, it outlines the key steps in the RKS process and calls out any steps that are specific to the Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure.
RKS Pre-Checks
Before the RKS process can be executed, a number of prerequisite checks must be carried out. Some of these are manual and must be performed by the system administrator; others are automatic checks executed by the RKS process itself.
RKS manual Pre-checks
SAP lists the manual checks and preparations as follows:
Source: RKS – Manual and Automatic Checks of the System Configuration in the SAP NetWeaver 7.5 Admin Guide
Let’s consider each of these manual checks in turn:
- No component should form a single point of failure in the system
Because we have followed the Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure (or reference architecture for SAP NetWeaver for AnyDB on Azure) we know that no single SAP component forms a single point of failure in the system:
- SAP Central Services – deployed as a 2 node HA Cluster
- Minimum of 2 SAP Application Servers – Logon groups for ABAP application servers are managed with transaction SMLG. Logon groups rely on the load balancing function of the message server in the Central Services to distribute SAP GUI and RFC workload across the pool of SAP application servers. The application servers connect to the highly available Central Services through the cluster virtual network name, which avoids having to change the application server profile for Central Services connectivity after a local failover.
- The system should be configured so that the expected workload can also be handled if an AS ABAP instance is stopped
The expected workload on my SAP on Azure demo environment is minimal; however, in a real Production environment this is a key consideration, and it is why SAP recommends performing RKS activities during periods of low business activity where possible.
- Make any necessary parameter changes before starting the RKS procedure
One of the additional benefits of RKS is that it can be used to implement SAP profile parameter changes without planned business downtime.
Note: Parameters that affect the whole system should be checked carefully. Parameters that affect the system landscape (e.g. with ASCS instance or database in their name) cannot be changed with RKS.
- If you want to import a new kernel patch, create a backup of DIR_CT_RUN
We definitely want to import a new kernel patch, so let’s quickly confirm the location of DIR_CT_RUN. I checked using transaction AL11:
Creating a backup of DIR_CT_RUN is also easy. For the purposes of this demo, I simply created a copy using the OS command cp as follows:
cp -avr /usr/sap/A4H/SYS/exe/uc/linuxx86_64 /usr/sap/A4H/SYS/exe/uc/linuxx86_64_backup_11192019
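If you want the backup step to be repeatable, it can be wrapped in a small shell function that also checks the copy is complete before any new archives are extracted. This is a minimal sketch, not part of SAP’s procedure: it assumes a Linux shell with GNU coreutils, run as <sid>adm (a4hadm here) or root, and the function name backup_dir_ct_run is hypothetical. The date-stamped suffix mimics the manual cp command above.

```shell
#!/bin/sh
# Hypothetical helper: copy DIR_CT_RUN to a date-stamped sibling directory
# and check that the copy contains the same number of entries.
backup_dir_ct_run() {
  src=${1%/}                              # e.g. /usr/sap/A4H/SYS/exe/uc/linuxx86_64
  dest="${src}_backup_$(date +%m%d%Y)"    # same naming style as the manual cp

  if [ ! -d "$src" ]; then
    echo "DIR_CT_RUN not found: $src" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "backup already exists: $dest" >&2
    return 1
  fi

  # -a copies recursively and preserves permissions and timestamps
  cp -a "$src" "$dest" || return 1

  # Cheap completeness check: compare the number of entries in both trees
  n_src=$(find "$src" | wc -l)
  n_dest=$(find "$dest" | wc -l)
  if [ "$n_src" -ne "$n_dest" ]; then
    echo "backup incomplete: $n_src vs $n_dest entries" >&2
    return 1
  fi
  echo "$dest"
}
```

For example, backup_dir_ct_run /usr/sap/A4H/SYS/exe/uc/linuxx86_64 prints the path of the new backup directory, and returns a non-zero status if DIR_CT_RUN is missing, the backup already exists, or the entry counts differ.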
- If you want to import a new kernel patch, download the relevant SAPEXE.SAR and SAPEXE<DB>.SAR from the SAP Service Marketplace. Extract these with the SAPCAR command line tool to DIR_CT_RUN.
In my case, the latest patch level for the complete 773 kernel is 201. Remember to download both the DATABASE INDEPENDENT archive AND the DATABASE SPECIFIC archive (in my case, for SAP HANA).
Extract both archives with SAPCAR into DIR_CT_RUN, for example as follows (in a standard installation, /usr/sap/A4H/SYS/exe/uc is a symbolic link to /sapmnt/A4H/exe/uc, so this is the same directory that was backed up above):
cd /sapmnt/A4H/exe/uc/linuxx86_64
SAPCAR -xvf /sapsoftware/SAPKernel773_PatchLevel_201/SAPEXEDB_201-80003385.SAR
SAPCAR -xvf /sapsoftware/SAPKernel773_PatchLevel_201/SAPEXE_201-80003386.SAR
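Because it is easy to forget one of the two archives, the extraction can be guarded by a check that exactly one database-independent (SAPEXE_*.SAR) and one database-specific (SAPEXEDB_*.SAR) archive are present. The sketch below uses hypothetical helper names (find_one, extract_kernel) and assumes SAPCAR is on the PATH and that the archive names follow the pattern of the files shown above; it extracts in the same order as the manual commands.

```shell
#!/bin/sh
# Hypothetical helpers: extract the DB-specific and DB-independent kernel
# archives into DIR_CT_RUN, failing early if either archive is missing.

# Print the single file in directory $1 matching glob $2, or fail.
find_one() {
  count=0
  match=
  for f in "$1"/$2; do
    [ -f "$f" ] || continue     # an unmatched glob stays literal; skip it
    match=$f
    count=$((count + 1))
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 '$2' in $1, found $count" >&2
    return 1
  fi
  echo "$match"
}

# $1: download directory, $2: DIR_CT_RUN (e.g. /sapmnt/A4H/exe/uc/linuxx86_64)
extract_kernel() {
  src=$(cd "$1" && pwd) || return 1     # absolute path, because we cd below
  exedb=$(find_one "$src" 'SAPEXEDB_*.SAR') || return 1
  exe=$(find_one "$src" 'SAPEXE_*.SAR') || return 1
  # Same order as the manual commands; the subshell keeps the caller's cwd
  (cd "$2" && SAPCAR -xvf "$exedb" && SAPCAR -xvf "$exe")
}
```

For example: extract_kernel /sapsoftware/SAPKernel773_PatchLevel_201 /sapmnt/A4H/exe/uc/linuxx86_64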
- Use logon groups instead of a fixed logon to a specific server
Most SAP Production environments will already be using SAP logon groups; if not, using them is good practice, and they can be configured via transaction SMLG. In my demo case, I have two SAP application servers configured in a single logon group:
- Avoid long running processes such as batch jobs
In my demo environment this isn’t an issue; however, it very likely will be in a real Production system, which is another reason SAP recommends performing RKS activities during periods of lower system activity. Each SAP application server is stopped in turn, and any long-running batch jobs still running on an application server when it is stopped will be terminated.
RKS Automatic Pre-Checks
The list of RKS automatic checks is extensive and is documented in the SAP NetWeaver 7.5 Admin Guide (RKS – Manual and Automatic Checks of the System Configuration).
The automatic checks are executed at the start of every RKS procedure, but an administrator can also run them in advance.
To do this, in SAP MMC choose Check Prerequisites from the context menu (right-click) of System Update.
RKS MMC Pre-requisites Check – Error
The first time I executed the pre-requisite checks in my SAP on Azure demo environment I experienced the following error:
FAIL: NIEHOST_UNKNOWN (Invalid argument), <errordetails xmlns="urn:SAPControl">NiRawConnect failed in plugin_fopen()</errordetails>
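To resolve this, I uncommented the localhost entry in the hosts file (C:\Windows\System32\drivers\etc\hosts) on the VM running SAP MMC. The default entries:
# localhost name resolution is handled within DNS itself.
#127.0.0.1 localhost
became:
# localhost name resolution is handled within DNS itself.
127.0.0.1 localhost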
RKS Pre-requisites Check – Warning
Once the localhost issue was resolved, the next message was an RKS warning. RKS had correctly detected that my setup was an HA configuration based on SUSE SLES for SAP Applications 12 SP4.
The SAP note in the message and one other referenced within it were relevant to my setup:
- https://launchpad.support.sap.com/#/notes/2077934 – Rolling kernel switch in HA environments
- https://launchpad.support.sap.com/#/notes/2254173 – Linux: Rolling Kernel Switch in Pacemaker based NetWeaver HA environments
As my HA clusters are based on SLES 12 SP4, I put the ASCS cluster into maintenance mode:
I also noted that, for Pacemaker clusters running on SUSE Linux Enterprise Server for SAP Applications (SLES for SAP) 15, the following procedure applies:
- Check whether sap-suse-cluster-connector version 3.1.0 is already installed. If it is, RKS is supported even without setting the cluster to maintenance mode.
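The decision above can be expressed as a small script on a cluster node. This is a minimal sketch under stated assumptions: SLES with crmsh (the crm command) and rpm, a readable /etc/os-release, and the rule exactly as described above (SLES 15 with sap-suse-cluster-connector 3.1.0 needs no maintenance mode; every other case does). Treating connector versions later than 3.1.0 the same way is an assumption, the helper names are hypothetical, and DRY_RUN=1 prints the cluster command instead of running it.

```shell
#!/bin/sh
# Sketch: decide whether the Pacemaker cluster must be put into maintenance
# mode before RKS, and do so if required. Helper names are hypothetical.

run() {
  # DRY_RUN=1 prints the command instead of executing it
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

version_ge() {
  # Succeeds if version $1 >= version $2 (relies on GNU sort -V)
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

needs_maintenance_mode() {
  # $1: SLES major version, $2: installed sap-suse-cluster-connector version
  # Assumption: versions newer than 3.1.0 behave like 3.1.0.
  if [ "$1" -ge 15 ] && version_ge "$2" 3.1.0; then
    return 1
  fi
  return 0
}

prepare_cluster_for_rks() {
  . /etc/os-release                     # sets VERSION_ID, e.g. "12.4" or "15.1"
  sles_major=${VERSION_ID%%.*}
  connector=$(rpm -q --qf '%{VERSION}' sap-suse-cluster-connector 2>/dev/null) \
    || connector=0                      # package not installed
  if needs_maintenance_mode "$sles_major" "$connector"; then
    run crm configure property maintenance-mode=true
  else
    echo "sap-suse-cluster-connector $connector on SLES $sles_major: maintenance mode not required"
  fi
}
```

Once RKS has finished, crm configure property maintenance-mode=false takes the cluster out of maintenance mode again.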
RKS Execution
Now that we’ve completed the pre-checks, we’re just about ready to execute the RKS process itself.
SAP provides a detailed description of the RKS process in the SAP NetWeaver 7.5 Admin Guide.
Each component of the SAP system will be stopped in turn:
- The enqueue replication server is the first instance restarted.
- The ASCS instance is the second instance restarted.
- Then the application server instances are restarted in the order specified beforehand. In the figure below, instance A is the first application server instance restarted.
- Instance B is the second application server instance restarted, and so on.
- The instance defined as the last one in the order is restarted together with its start service last of all with the new kernel version. The RKS procedure is completed with this final step.
Source: RKS Process in the SAP NetWeaver 7.5 Admin Guide
We’re nearly ready to update the system. Before we do, let’s confirm the current kernel patch level of the SAP components that will be updated.
Kernel Patch levels prior to starting the RKS
ERS (dev_enq_replicator):
Message Server (dev_ms):
First SAP Application Server:
Second SAP Application Server:
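Besides the developer traces (dev_enq_replicator, dev_ms) and transaction SM51, the kernel patch level can also be read at the operating system level, which makes the before and after comparison easy to record. A minimal sketch, assuming the output of disp+work -v contains lines starting with "kernel release" and "patch number" (check this against the output on your own system); kernel_level is a hypothetical helper name. sapcontrol -nr <instance number> -function GetVersionInfo is another way to query each instance.

```shell
#!/bin/sh
# Hypothetical helper: reduce version output (read from stdin) to
# "<release> patch <number>", e.g. "773 patch 101".
# Fails if either line is missing from the input.
kernel_level() {
  awk '
    /^kernel release/ { rel = $NF }
    /^patch number/   { pat = $NF }
    END {
      if (rel == "" || pat == "") exit 1
      print rel " patch " pat
    }'
}

# Example (run as <sid>adm on each application server, before and after RKS):
#   disp+work -v | kernel_level
```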
RKS – Update System
To start the RKS process, I right-click System Update again and select Update System from the context menu.
Note: You can also start RKS from the command line.
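A minimal sketch of the command-line route, assuming the sapcontrol web methods CheckUpdateSystem and UpdateSystem described for RKS in the SAP NetWeaver Admin Guide. The instance number 00 is a placeholder, UpdateSystem is deliberately called without its optional timeout arguments (on the assumption that the defaults then apply; take the exact argument list for your release from the Admin Guide), and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: run the RKS prerequisite check, then start RKS, via sapcontrol.
# Assumptions: run as <sid>adm; INSTANCE_NR is a placeholder; UpdateSystem
# is called without its optional timeout arguments.
INSTANCE_NR=${INSTANCE_NR:-00}

run() {
  # DRY_RUN=1 prints the command instead of executing it
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_rks() {
  # Abort if the sapcontrol call itself fails; the individual check results
  # it prints (e.g. the HA warning discussed above) still need to be reviewed.
  run sapcontrol -nr "$INSTANCE_NR" -function CheckUpdateSystem || return 1
  run sapcontrol -nr "$INSTANCE_NR" -function UpdateSystem
}
```

Progress can then be followed in SAP MMC or in transaction SM51, just as in the GUI-driven run.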
In my demo case, I reduced the soft shutdown timeout and the start wait timeout to values lower than the defaults shown here. In Production environments, these timeouts should be considered carefully, following SAP’s guidance on RKS timeouts.
The RKS process re-runs the automated prerequisite checks we ran earlier. As we’ve already seen this warning and have read and understood SAP note 2077934, we can simply click OK.
The restart of the ERS and ASCS is very quick (see the RKS log below for detailed timings). Within a few seconds, the first SAP application server is shut down:
I’m still logged on via the second application server and can see that Kernel update is active in transaction SM51:
After the first SAP application server has started, the second SAP application server is shut down:
After a few more minutes the process completes successfully.
We can view the complete RKS log via the SAP MMC or directly at the operating system level.
Post RKS Checks
Now that the RKS process has completed we’ll quickly double-check that all the required components have been patched correctly:
The Enqueue replication server looks good – updated to 773 patch 201:
So does the SAP Message Server – updated to 773 patch 201:
Now let’s check both SAP application servers via transaction SM51 – Release Notes:
Yup, all good.
We can now take the ASCS Cluster out of maintenance mode:
Remember that with SLES 15 and sap-suse-cluster-connector version 3.1.0, it is not necessary to put the cluster into maintenance mode before executing RKS, or to take it out afterwards.
Conclusion
An SAP on Azure deployment that follows the Microsoft reference architecture for SAP S/4HANA for Linux VMs on Azure or the reference architecture for SAP NetWeaver (Windows) for AnyDB on Azure, combined with SAP’s native Rolling Kernel Switch capability, is an excellent way to patch SAP kernels on Azure without system downtime.
Future blog posts will look at downtime-minimized patching approaches for the other software components of an SAP on Azure environment.
Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.