The customer can delegate the supervision of its business application services to OBS for an always-on, 24/7 service, while the customer's development teams continue to evolve the software and architecture of the business application. In the co-managed mode, an OBS Service Reliability Engineer (SRE) participates in the customer's Scrum team meetings to contribute the enablers needed to manage the business application.
The customer and the SRE can concentrate on defining the observability and management procedures for the business functions and rely on standard managed services for the management of underlying dependencies.
The Scope of Work for co-management of the Business Application is defined between the Customer and OBS.
The customer explains the architecture of the Application to the OBS expert in order to identify which components need to be supervised and maintained, among:
- The business functions.
- Their dependencies on interfaces with other Application business functions and with external services.
- Their dependencies on operating systems, middleware, databases, micro-services, Kubernetes services, cloud services and big data services.
The main assumption for the co-management of the Business Application is that the software is developed by the customer's developers or by a third-party software supplier to the customer. The customer, directly or through its supplier, is accountable and responsible for maintaining the business application software and architecture, and for ensuring that the business application functions correctly and has been tested in the cloud environment prior to transition to the OBS managed service.
To manage the business functions properly, their dependencies must also be managed. The managed services catalogue includes pre-defined work units for known middleware, databases, micro-services, Kubernetes clusters, operating systems and native cloud services. The customer and OBS identify the dependencies to be managed and add them to the Scope of Work as per the service definition in the Managed Application service catalogue (refer to this document and to the Managed Application Service Description).
Dependencies may include interfaces towards external systems or towards other business functions. These interfaces shall also be supervised so that failures are rapidly detected and their root cause identified. Repairing the external system is not part of the scope of work.
For the business application functions, the scope of work shall be established based on the following inputs and deliverables from the customer:
- How is the business function supervised? What is the RACI between the customer and OBS?
- What is the criticality of the business function for the service?
- What are the known issues? What procedure shall be used to recover from them?
- How is the business function recovered in case of failure? Is it based on redeployment from Infrastructure as Code? Is it based on restore from backup? What is the procedure?
- Are there specific routines to be run?
- How is the business function created and deployed? What is the dev, pre-prod, prod chain? What is the RACI between the customer and OBS on the various environments?
- What security policies and firewall rules are to be applied?
- Is a disaster recovery plan needed? How is it achieved?
- Are other services required from OBS, such as advisory, health check, performance or capacity?
- What is the frequency of incidents and changes on the element?
- What is the frequency of release roll-out?
- What is the maturity of the element?
Since knowledge of the specifics of the Business Application lies with the customer and its third-party supplier, the customer is responsible for level 3 support of the business application (potentially through its third-party supplier).
OBS level 1 and level 2 tasks consist of:
- supervising the business functions agreed in the scope of work
- applying the remediation procedures when an incident occurs
- resolving incidents on managed dependencies
- notifying and escalating to the customer's level 3 support if the incident cannot be resolved by applying the procedure
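The level 1/level 2 handling of an incident on a supervised business function can be sketched as follows. This is a minimal illustration only: the function `handle_incident`, its parameters and the returned labels are hypothetical names, not part of any OBS tooling. Incidents on managed dependencies follow the procedures of the corresponding managed service and are not modelled here.

```python
from typing import Callable, Optional


def handle_incident(
    element: str,
    remediation: Optional[Callable[[], bool]],
    notify_customer_l3: Callable[[str], None],
) -> str:
    """Hypothetical level 1/level 2 flow for a supervised business function.

    `remediation` stands for the customer-provided procedure for this element
    (None if no procedure exists); it returns True when the service is restored.
    """
    if remediation is not None and remediation():
        # The agreed procedure restored the service: OBS closes the incident.
        return "resolved"
    # No procedure, or the procedure did not restore the service:
    # notify and escalate to the customer's level 3 support.
    notify_customer_l3(f"Incident on {element}: remediation unsuccessful, escalating to level 3")
    return "escalated"


if __name__ == "__main__":
    sent = []
    print(handle_incident("order-api", lambda: True, sent.append))   # → resolved
    print(handle_incident("order-api", lambda: False, sent.append))  # → escalated
```

The escalation callback is injected so that the same flow could be wired to whatever ticketing or notification channel is agreed in the scope of work.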
Not all metrics, alerts and remediation procedures needed to manage the business application may be known to the customer's development team at the start of the project. The Service Reliability Engineer therefore participates (remotely) in the customer's development team Scrum meetings to contribute to the specification and/or development of observability and automation. The SRE brings expertise and experience in running applications on the cloud and facilitates the transition to run.
Over time, the customer's development team and the SRE:
- Identify new relevant metrics to be monitored by the run team
- Develop new exporters so that these metrics are supervised
- Document and automate troubleshooting of known problems
- Implement dashboards used by the development team for trend and behavior analysis
- Implement log-based troubleshooting procedures
- Implement various test routines
The ultimate goal is the improvement of the reliability and availability of the business application.
The following table summarizes the service:
Service | Type | Configuration | Monitoring and alerts configured in AWS CloudWatch | Backup configured in AWS Backup | Recovery procedure | Patch management | Antivirus management | Specificities |
Business Function supervision | Managed | Deployment and redeployment of the Business Application are charged on a Time and Material basis. Pre-requisites: image or deployment script provided by the customer; the Business App has been successfully tested on the infrastructure prior to transition to run. | Metrics exporters and alerters to AWS CloudWatch or Prometheus provided by the customer as a pre-requisite. | Backup and restore is an option. The customer identifies the backup procedures necessary to protect business application data and confirms whether backup of the underlying components is sufficient. | Troubleshooting and recovery procedures provided by the customer. A procedure shall last less than 15 min; otherwise it is charged on a time basis. The customer performs level 3 support. | The customer is responsible for the software and software patching. | The customer is responsible for antivirus protection of the Business Application software. | Optional: scope of work to be defined with the customer. Pre-requisite: dependencies shall be managed. |
Supervision of an External Interface | Managed | Pre-requisite, out of scope of the Managed Service: the external interface is exposed and reachable. | A part of the software or a probe tests the availability of the external interface. | n/a | The customer is notified when the external interface is down. Support of the external interface is out of scope of the MA service. | n/a | n/a | n/a |
Cloud Services dependencies | Managed | Refer to each cloud managed service (as per catalogue) on which the Business App is dependent. |
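As an illustration of the probe mentioned for external interfaces, the sketch below checks whether an HTTP endpoint answers within a timeout. It assumes the external interface is exposed over HTTP(S); the function name `probe_external_interface`, the 5-second default timeout and the rule that any status below 500 counts as "up" are assumptions made for this example.

```python
import urllib.error
import urllib.request


def probe_external_interface(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical availability probe: True if the endpoint answers with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered with an error status; 4xx still means it is reachable.
        return err.code < 500
    except OSError:
        # URLError (DNS failure, connection refused) and timeouts are OSError subclasses:
        # treat the external interface as down.
        return False
```

A scheduler or the monitoring agent would call this periodically and raise an alert on `False`, which is the point at which the customer is notified; repairing the interface itself stays out of scope.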
The architecture of the application and its deployment on the cloud shall be defined by the customer. Architecture design is out of scope of the service.
The application shall be deployed and tested by the customer prior to transition to the run team. Typically, this means successful testing in a pre-production environment identical to production (iso-production).
The business application exports its metrics to AWS CloudWatch or to an agreed Prometheus instance.
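For the Prometheus option, an exporter is an HTTP endpoint returning metrics in the Prometheus text exposition format. The stdlib-only sketch below shows the idea; in practice the official `prometheus_client` library would usually be used instead. The metric names `business_orders_processed_total` and `business_queue_depth`, the `/metrics` path and port 8000 are hypothetical examples.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical business metrics; a real application would update these values as it runs.
METRICS: Dict[str, Dict[str, object]] = {
    "business_orders_processed_total": {"type": "counter", "help": "Orders processed since start.", "value": 0},
    "business_queue_depth": {"type": "gauge", "help": "Orders waiting to be processed.", "value": 0},
}


def render_metrics(metrics: Dict[str, Dict[str, object]]) -> str:
    """Render metrics in the Prometheus text exposition format (HELP, TYPE, then sample)."""
    lines = []
    for name, m in metrics.items():
        lines.append(f"# HELP {name} {m['help']}")
        lines.append(f"# TYPE {name} {m['type']}")
        lines.append(f"{name} {m['value']}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

The agreed Prometheus instance would then be configured to scrape this endpoint, and alert rules on these metrics would be defined as part of the scope of work.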
The data backup strategy and disaster recovery strategy shall be provided by the customer.
The troubleshooting and service restoration procedures shall be provided by the customer.
Where a procedure requires logs or dashboards, these shall have been developed by the customer before the start of the service.
A remediation procedure for an incident shall not last more than 15 minutes. Beyond that time, the effort is charged on a time basis.
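The 15-minute rule can be expressed as a simple calculation of billable overtime. This sketch assumes that only the minutes beyond the 15-minute threshold are charged (consistent with "over-time spent" in the change examples) and that partial minutes are rounded up; the rounding rule and the function name are assumptions, and the conversion of minutes into tokens is not specified in this document, so it is left out.

```python
import math
from datetime import timedelta

# Threshold stated in the service description.
INCLUDED_REMEDIATION = timedelta(minutes=15)


def billable_overtime_minutes(duration: timedelta) -> int:
    """Minutes of remediation beyond the included 15 minutes, rounded up (assumed rounding)."""
    overtime = duration - INCLUDED_REMEDIATION
    if overtime <= timedelta(0):
        return 0
    return math.ceil(overtime.total_seconds() / 60)
```

For example, a 40-minute remediation gives 25 billable minutes, while a 12-minute one gives none.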
Customer shall have subscribed to the managed applications service for the underlying components on which the business application is dependent.
The Cloud Service Provider (CSP) services needed for observability, logs, monitoring and backup are not included in the service and are therefore charged as part of the CSP subscription.
The business application software and third-party application maintenance are out of scope. Application patching, vulnerability management and keeping the application virus-free are the customer's responsibility.
The scope of work of the Managed Business Application service and OBS's responsibility with regard to security are limited to configuring the CSP firewalls and policy groups as specified by the customer. Should additional security services be required, they shall be part of an optional, mutually agreed scope of work.
Business Application end-users are neither managed nor supported.
Application Performance Management requires a specific Scope of Work and quote.
The build, pipeline and deployment of the application are out of scope of the standard work unit; a specific scope of work shall be established.
Service | Work Unit |
Service Reliability Engineer | Time and material |
Business function – supervision – low priority | Per supervision source |
Business function – supervision – standard | Per supervision source |
Business function – supervision – critical | Per supervision source |
External interface – supervision – low priority | Per supervision source |
External interface – supervision – standard | Per supervision source |
External interface – supervision – critical | Per supervision source |
Data backup | Scope of Work |
Disaster Recovery | Scope of Work |
Cloud Services dependencies: OS, middleware, database, Kubernetes, microservices, big data | Work Unit of the managed service catalogue |
Incident raised by customer | Per incident ticket |
Change examples | Effort |
Adding a new alarm | On quote or estimation in tokens based on time spent. Additional recurring work unit |
Deploying the business application | On quote or estimation in tokens based on time spent |
Adding a new troubleshooting procedure to the operational knowledge database | On quote or estimation in tokens based on time spent |
Troubleshooting beyond 15 mins due to lengthy procedure | Estimation in tokens based on overtime spent |