Customer business applications deployed on Azure depend on Azure cloud native services (IaaS, PaaS). Orange Business Services (OBS) provides the managed services needed to ensure service assurance and change management for these dependencies, as well as their configuration and deployment for build and recovery.
The cloud native services
Services typically fall into three categories:
- The user plane services: if a business application depends on a user plane service, a defect in that service is likely to affect the business application. The service has no persistent data, so its recovery does not require a data restore.
- The data services: if a business application depends on a data service, a defect in that service is likely to affect the business application. The service has persistent data, so its recovery may require a data restore. Data loss or data corruption may also affect the business application.
- The other services: the business application does not depend on them. Most of these services are used for automation, observability or migration, and the loss of such a service is unlikely to affect the business application. Some of them are used to manage the user plane and data services of the business application; others have specific uses, for which a scope of work shall be established should the customer require OBS to leverage them as part of the managed service.
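The classification above rests on two questions: does the business application depend on the service, and does the service hold persistent data? A minimal Python sketch of that decision follows. The `CloudService` type, its field names and the example Azure services are illustrative assumptions, not part of the service description.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    USER_PLANE = "user plane service"
    DATA = "data service"
    OTHER = "other service"


@dataclass(frozen=True)
class CloudService:
    # Hypothetical descriptor: only the two attributes the categories rely on.
    name: str
    business_app_depends_on_it: bool
    has_persistent_data: bool


def categorize(service: CloudService) -> Category:
    """Map a service onto one of the three categories."""
    if not service.business_app_depends_on_it:
        return Category.OTHER  # automation, observation, migration, ...
    if service.has_persistent_data:
        return Category.DATA  # recovery may require a data restore
    return Category.USER_PLANE  # recovery by redeployment, no data restore


if __name__ == "__main__":
    # Illustrative examples only; the real mapping depends on each architecture.
    for svc in (
        CloudService("Application Gateway", True, False),
        CloudService("Azure SQL Database", True, True),
        CloudService("Azure DevOps", False, False),
    ):
        print(f"{svc.name}: {categorize(svc).value}")
```

Keeping dependency and persistence as separate flags mirrors the text: dependency decides whether a defect reaches the business application, persistence decides whether recovery may involve a restore.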
Tasks involved in cloud native service management
The tasks involved in managing a cloud native service depend on the service. They consist of:
- Configuring and deploying the service: Infrastructure as Code (IaC) is used to configure the service, its observability and its backup. Level 3 expertise on the service ensures a proper implementation, as defined in the scope of work (refer to the detailed description of build and SRE services).
- Applying the security group and access control policy defined by the customer.
- Service recovery through Infrastructure as Code: in case of failure, most services must be recovered by redeployment. Reconfiguring a service manually from scratch is not efficient: it takes time and is error-prone. This is why recovery by redeployment from Infrastructure as Code is preferred.
- Supervision and remediation: this consists of watching for alarms raised on the service during the monitoring range (typically 8×5 or 24×7). When an alarm occurs, an incident ticket is opened, a priority is assigned and the customer is notified. Remedial action is then taken using the procedures that Level 3 makes available to Level 2 / Level 1. Remediation on a cloud native service may be necessary to restore the service of the business application. Should the procedure not resolve the incident, the incident is escalated to Level 3. Should the root cause lie with the CSP itself, Level 3 raises the incident to the CSP.
- Backup and restore: when the service has persistent data, that data must be backed up. The managed service consists of configuring the backup solution and monitoring that it runs properly. Note: the backup solution must be subscribed to separately (e.g. Azure Backup). Restoring the service after an incident may involve restoring data from a backup.
- OS patching and anti-virus: keeping the OS up to date and virus-free is covered by the Managed Virtual Machine / Managed OS service. Please refer to its detailed description.
- Specifics: some cloud native services may have specific configuration or management tasks.
- Business application specifics: by default, standard alerts are monitored. Configuring alerts and logs on a cloud native service that are specific to a business application is subject to a specific scope of work.
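The supervision and remediation flow described in the task list can be sketched as a single walk from alarm to resolution or escalation. This is a hypothetical Python illustration only: the function and field names are invented, the 8×5 range is assumed to mean Monday to Friday 09:00-17:00, and the severity-to-priority mapping is a placeholder for whatever the contract defines.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


def in_monitoring_window(ts: datetime, window: str) -> bool:
    """Return True if the alarm time falls inside the monitoring range."""
    if window == "24x7":
        return True
    if window == "8x5":
        # ASSUMPTION: 8x5 is taken to mean Monday-Friday, 09:00-17:00 local time.
        return ts.weekday() < 5 and 9 <= ts.hour < 17
    raise ValueError(f"unknown monitoring window: {window}")


@dataclass
class Incident:
    alarm: str
    priority: str
    history: List[str] = field(default_factory=list)


# HYPOTHETICAL severity-to-priority mapping; real priorities are set contractually.
PRIORITY_BY_SEVERITY = {"critical": "P1", "major": "P2", "minor": "P3"}


def handle_alarm(alarm: str, severity: str, ts: datetime, window: str,
                 procedure_resolves: bool, root_cause_is_csp: bool) -> Optional[Incident]:
    """Walk one alarm through ticketing, Level 2/1 remediation and escalation."""
    if not in_monitoring_window(ts, window):
        return None  # out-of-range handling is not modelled in this sketch
    incident = Incident(alarm, PRIORITY_BY_SEVERITY.get(severity, "P3"))
    incident.history += ["incident ticket opened",
                         f"priority {incident.priority} assigned",
                         "customer notified",
                         "Level 2/1 applies the Level 3 procedure"]
    if procedure_resolves:
        incident.history.append("resolved by Level 2/1")
        return incident
    incident.history.append("escalated to Level 3")
    if root_cause_is_csp:
        # Level 3 uses the customer's Azure support plan to reach the CSP.
        incident.history.append("Level 3 raises the incident to the CSP")
    return incident


if __name__ == "__main__":
    wednesday_10am = datetime(2024, 1, 3, 10, 0)
    result = handle_alarm("VM unreachable", "critical", wednesday_10am, "8x5",
                          procedure_resolves=False, root_cause_is_csp=True)
    for step in result.history:
        print(step)
```

The escalation order (Level 2/1, then Level 3, then the CSP) is the part taken from the text; everything about timing and priority values is a stand-in.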
Table of tasks involved in the management of a cloud native service
Tooling used for cloud native managed services
General pre-requisites for the run of managed services
The following pre-requisites apply to all managed services:
- The Customer shall have defined a valid architecture (OBS can optionally provide Professional Services for architecture definition).
- The Customer shall hold a valid Azure subscription, including an Azure Support plan, and shall procure the Azure resources. OBS can optionally supply this subscription including Azure support (refer to the Multi-Cloud Ready offer for Azure); however, the subscription, the IaaS resources and Azure support are not part of the Managed Services. The Managed Services will leverage this support contract to escalate incidents to Azure as the CSP.
- The Customer's Azure platform shall be structured in line with Azure landing zone best practices, or shall offer comparable services.
- OBS proposes a default RACI depending on the class of transition and the resources managed. As a pre-requisite to the project, OBS and the Customer shall agree on the RACI.
- OBS and the Customer shall agree on the tooling used for Git, the CI / CD chain, and the Monitoring, Logging and Alerting solution.
- Additional pre-requisites apply when the transition is not entirely the responsibility of OBS (e.g. for partial build models such as “Operations Build” or “Backend Build”; refer to chapter 8 of the Build Scope of Work document).
In the case of a Fully Managed service, OBS uses its own Git, CI / CD chain, and Monitoring, Logging and Alerting solution.
In the case of a Co-managed service, OBS and the Customer agree on the Git, CI / CD chain, and Monitoring, Logging and Alerting solution to be used. By default, the tooling is:
- Either based on Azure tools, e.g. Azure DevOps and Azure Monitor
- Or based on generic multi-cloud tooling proposed by OBS, e.g. CaasCad (Prometheus, Grafana, …)
This tooling is not included in the Managed Applications work units; it can be purchased separately as part of the Azure subscription or as a multi-cloud tooling proposal from OBS.
Criteria for the run of a managed cloud native service component
The following criteria shall be met, and approved by Level 2, before a cloud native component is turned into an active managed service (i.e. Run) operated by Level 2 / Level 1. The owner of the Build and of Level 3 support is responsible for ensuring that the criteria are met:
- The architecture and deployment of the service shall be defined.
- The service shall be deployed using Infrastructure as Code and tested before transitioning to the run team, typically through successful testing in a pre-production environment that is identical to production. Note: IaC is necessary to recover the services in case of major failure.
- The use of the service shall be explained to the operations team.
- The security policies and access control shall have been configured.
- Access shall have been configured to grant OBS Level 2 teams access to the service.
- The service shall export the necessary metrics towards Azure Monitor.
- The data backup shall be configured in Azure Backup when backup is applicable.
- The disaster recovery shall be configured when applicable.
- The troubleshooting and service restoration procedures shall be provided to Level 2.
- Where a procedure requires logs or dashboards, these shall have been developed and deployed before the transfer to the run phase.
- A remedial procedure for an incident shall not last more than 15 minutes. Beyond that time, the effort is charged on a time basis.
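The run criteria above act as a gate: every criterion must be met before Level 2 approves the handover. The Python sketch below models that gate as a checklist and also checks procedure durations against the 15-minute limit. Field and function names are hypothetical, and `time_based_minutes` assumes that only the minutes beyond the limit are charged, which the criteria do not state explicitly.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

REMEDIAL_LIMIT_MINUTES = 15  # maximum duration of a remedial procedure, per the criteria


@dataclass
class RunReadiness:
    # One flag per criterion; True means the criterion is met (or not applicable).
    architecture_and_deployment_defined: bool
    deployed_and_tested_with_iac: bool
    usage_explained_to_ops: bool
    security_and_access_control_configured: bool
    level2_access_configured: bool
    metrics_exported_to_azure_monitor: bool
    backup_configured_or_not_applicable: bool
    disaster_recovery_configured_or_not_applicable: bool
    procedures_provided_to_level2: bool
    required_logs_and_dashboards_deployed: bool


def unmet_criteria(readiness: RunReadiness, procedure_minutes: Dict[str, int]) -> List[str]:
    """List every unmet criterion, including procedures longer than the limit."""
    unmet = [f.name for f in fields(readiness) if not getattr(readiness, f.name)]
    unmet += [f"procedure '{name}' exceeds {REMEDIAL_LIMIT_MINUTES} min"
              for name, minutes in procedure_minutes.items()
              if minutes > REMEDIAL_LIMIT_MINUTES]
    return unmet


def time_based_minutes(actual_minutes: int) -> int:
    """ASSUMPTION: only the effort beyond the limit is charged on a time basis."""
    return max(0, actual_minutes - REMEDIAL_LIMIT_MINUTES)


if __name__ == "__main__":
    all_met = {f.name: True for f in fields(RunReadiness)}
    candidate = RunReadiness(**{**all_met, "level2_access_configured": False})
    print(unmet_criteria(candidate, {"restart app service": 5, "restore database": 40}))
    # → ['level2_access_configured', "procedure 'restore database' exceeds 15 min"]
    print(time_based_minutes(40))  # → 25
```

An empty list from `unmet_criteria` would correspond to a component that Level 2 can approve for Run; anything else names what the Build / Level 3 owner still has to deliver.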