Cloud Load Balancing

Description

Google Cloud Load Balancing operates at layer 4 or layer 7 of the Open Systems Interconnection (OSI) model. It is a software-based managed service for distributing traffic across multiple application instances in a single region or in multiple regions, and it is the single point of contact for clients. The load balancer distributes inbound flows that arrive at its front end to backend instances, according to the configured load-balancing rules and health checks. The backend instances can be GCP virtual machines or instances in a managed instance group.

A public load balancer can provide outbound connections for virtual machines (VMs) inside your VPC network. These connections are accomplished by translating their private IP addresses to public IP addresses. Public load balancers are used to load balance internet traffic to your VMs.

An internal (or private) load balancer is used where private IP addresses are needed at the frontend only. Internal load balancers are used to load balance traffic inside a VPC network. A load balancer frontend can be accessed from an on-premises network in a hybrid scenario.

Build to run service included in the OTC

Build service pre-requisite
  • Refer to generic description.
 Build to run service
  • Refer to generic description.

RUN services included in the MRC

Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the load balancer.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
Co-manage option

Yes, if the CI/CD is shared with the customer.

KPI & alerts

Monitoring

Yes, Insights, Metrics, New metric possible with logs, Health probes

KPI monitored

L4 / TCP

  • l3/external/rtt_latencies >= xms
  • l3/internal/rtt_latencies >= xms

L7 / HTTP(S)

  • https/backend_latencies >= xms
  • https/internal/backend_latencies >= xms

HTTP codes ratio (on demand) – a calculation sketch follows these thresholds

  • https/backend_request_count (response_code_class = 500) / https/backend_request_count (response_code_class = 200) >= x%
  • https/backend_request_count (response_code_class = 400) / https/backend_request_count (response_code_class = 200) >= x%
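
Purely as an illustration of how this ratio KPI is evaluated, the minimal Python sketch below computes an error-to-success ratio from two request counts and compares it against a threshold expressed in percent. The function name and the sample values are illustrative placeholders, not part of the service definition; in practice the counts come from the Cloud Monitoring metrics listed above.

    def error_ratio_exceeded(error_count: int, success_count: int, threshold_pct: float) -> bool:
        """Return True when error_count / success_count breaches the threshold (in %)."""
        if success_count == 0:
            # No successful requests observed: treat any error as a breach.
            return error_count > 0
        ratio_pct = 100.0 * error_count / success_count
        return ratio_pct >= threshold_pct

    # Example with placeholder values: 42 errors vs 10 000 successes against a 1% threshold.
    print(error_ratio_exceeded(42, 10_000, threshold_pct=1.0))  # False (0.42% < 1%)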

Alerts observed

L4 / TCP

  • l3/external/rtt_latencies >= xms
  • l3/internal/rtt_latencies >= xms

L7 / HTTP(S)

  • https/backend_latencies >= xms
  • https/internal/backend_latencies >= xms

HTTP codes ratio (on demand)

  • https/backend_request_count (response_code_class = 500) / https/backend_request_count (response_code_class = 200) >= x%
  • https/backend_request_count (response_code_class = 400) / https/backend_request_count (response_code_class = 200) >= x%
Backup and restore

Data backup and restore

Not applicable. The load balancer does not store data persistently.

Service restore

GCP SLA High Availability and Disaster Recovery inter-region

GCP ensures high availability of the load balancer as a standard feature of the managed service.

 

Maintaining cross-region Disaster Recovery requires a specific design and is subject to specific additional charging.

Charging model

Work Unit
Per Load Balancer instance

Changes catalogue – in Tokens, per act

Changes examples Effort Impact on MRC
Setup / modify / delete URI 1 token
Change health probes / Add new backend 2 tokens
Other changes Estimation in tokens based on time spent

Cloud DNS

Description

Cloud DNS hosts your Domain Name System (DNS) domains in GCP. Cloud DNS offers both public and private managed DNS zones. A public zone is visible to the public internet, while a private zone is visible only from one or more Virtual Private Cloud (VPC) networks that you specify.

Build to run service included in the OTC

Build service pre-requisite
  • Refer to generic description.
Build to run service
  • Refer to generic description.

RUN services included in the MRC

Running a managed Cloud DNS service is optional. Depending on the Customer's interest, the Customer may request the service. By default, no recurring task is proposed for the Cloud DNS service, only on-demand changes and on-demand investigations.

Run service pre-requisite
  • A referential file exists in the Git used by OBS which includes the reference configuration of the DNS.
  • This file can be executed with a CI/CD used by OBS and the execution has been tested successfully.
Co-manage option

For the public part, OBS works with the customer on the public domain naming context.

For the private part, a RACI must be defined.

 

KPI & alerts

Monitoring

Yes, Metrics,

KPI monitored

Number of changes in the DNS database.

Alerts observed

Number of changes in the DNS rules

 

Backup and restore

Data backup and restore

Yes. Backup is proposed based on regular export.

Service restore

The CI/CD chain is used to redeploy the records into the native DNS service from a backup zone or from an export.
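
As a sketch of what the regular export mentioned above can look like, the snippet below lists the record sets of a managed zone with the google-cloud-dns Python client and writes them to a file that can be versioned in Git and replayed by the CI/CD. The project ID, zone name, and output file name are hypothetical placeholders.

    from google.cloud import dns  # pip install google-cloud-dns

    client = dns.Client(project="my-project")       # hypothetical project ID
    zone = client.zone("corp-private-zone")          # hypothetical managed zone name

    # Dump every record set so the file can be committed to Git and replayed later.
    with open("corp-private-zone.export.txt", "w") as handle:
        for record in zone.list_resource_record_sets():
            handle.write(f"{record.name} {record.record_type} {record.ttl} {record.rrdatas}\n")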

 

GCP SLA High Availability and Disaster Recovery inter-region

Cloud DNS is a high-performance, resilient, global Domain Name System (DNS) service that publishes your domain names to the global DNS.

In the case of public DNS, the customer remains responsible for hostmaster duties (domain registration).

Charging model

Work Unit
Per resource group

Changes catalogue – in Tokens, per act

Changes examples Effort
Create / update/ delete zone (one zone including reverse) 1 token
Create / update/ delete record (up to 10 records) 1 token
Zone delegation* 1 token
Configure Firewall DNS 2 tokens
Other changes Estimation in tokens based on time spent

 

Content Delivery Network (CDN)

Description

Google Cloud CDN is a fast, reliable, and secure content delivery network that delivers content to users with very low latency. Google Cloud CDN serves content securely from Google's edge network, optimizing the delivery of your static assets so that they are served quickly and efficiently, and gives you the option to keep your data public or private. Cloud CDN makes it simple to serve your organization's websites quickly and securely, for your customers as well as for your own teams.

Build to run service included in the OTC

Build service pre-requisite
  • Refer to generic description.
Build to run service
  • Refer to generic description.

RUN services included in the MRC

Running a managed Cloud CDN service is mandatory if the offer is Managed Applications and optional if the offer is Managed Infrastructure.

Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the CDN.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
Co-manage option

Yes, based on the RACI determined during pre-sales or build.

KPI & alerts

Monitoring

Yes: Metrics and diagnostic logs

KPI monitored

  • Byte Hit Ratio
  • Request Count
  • Response Size
  • Total Latency
  • Customized ping page per zone

Alerts observed

  • Customized ping page per zone
  • Latency per zone
  • Log analysis on metrics
Backup and restore

Data backup and restore

Can be exported from CI/CD Pipeline.

Service restore

The continuous deployment chain is used to redeploy the CDN from the reference configuration file for the production environment committed in the Git.

GCP SLA High Availability and Disaster Recovery inter-region

 

Based on design SOW, the service can be built in multiple regions.

Charging model

Work Unit
Per Endpoint

 

Changes catalogue – in Tokens, per act

Changes examples Effort
Purge CDN 1 Token
Add URL 1 Token
Other changes Estimation in tokens based on time spent

Cloud NAT

1.1.1     Description

Cloud NAT is a distributed, software-defined managed service. Cloud NAT configures the Andromeda software that powers your Virtual Private Cloud (VPC) network so that it provides source network address translation (source NAT or SNAT) for VMs without external IP addresses. Cloud NAT also provides destination network address translation (destination NAT or DNAT) for established inbound response packets.

1.1.2     Build to run service included in the OTC

1.1.2.1     Build service pre-requisite
  • Refer to generic description.
1.1.2.2     Build to run service
  • Refer to generic description.

1.1.3     RUN services included in the MRC

1.1.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud NAT.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.1.3.2     Co-manage option

No, OBS manages the Cloud NAT

1.1.3.3     KPI & alerts

Monitoring

Yes, Metrics,

KPI monitored

  • nat_allocation_failed = 1
  • dropped_sent_packets_count >= x%
  • dropped_received_packets_count >= x%

Alerts observed

  • nat_allocation_failed = 1
  • dropped_sent_packets_count >= x%
  • dropped_received_packets_count >= x%
1.1.3.4     Backup and restore

Data backup and restore

Can be exported from CI/CD Pipeline.

Service restore

The continuous deployment chain is used to redeploy the Cloud NAT configuration from the reference configuration file for the production environment committed in the Git.

1.1.3.5     GCP SLA High Availability and Disaster Recovery inter-region

HA by design. Based on the design SOW, the service can be built in multiple regions.

1.1.4     Charging model

Work Unit
Per Endpoint

 

1.1.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Create / update/ delete (including reverse) 1 token
Configure Firewall NAT 2 tokens
Other changes Estimation in tokens based on time spent

1.2     Cloud Router

1.2.1     Description

Cloud Router is a fully distributed and managed Google Cloud service that uses the Border Gateway Protocol (BGP) to advertise IP address ranges. It programs custom dynamic routes based on the BGP advertisements that it receives from a peer. Instead of a physical device or appliance, each Cloud Router is implemented by software tasks that act as BGP speakers and responders. A Cloud Router also serves as the control plane for Cloud NAT and provides BGP services for Google Cloud products such as Cloud VPN and Cloud Interconnect.

1.2.2     Build to run service included in the OTC

1.2.2.1     Build service pre-requisite
  • Refer to generic description.
1.2.2.2     Build to run service
  • Refer to generic description.

1.2.3     RUN services included in the MRC

1.2.3.1     Run service pre-requisite
  • A referential file exists in the Git used by OBS which includes the reference configuration of the Cloud Router.
  • This file can be executed with a CI/CD used by OBS and the execution has been tested successfully.
1.2.3.2     Co-manage option

No, Orange Business Services manages the Cloud Router service.

1.2.3.3     KPI & alerts

Monitoring

Yes, Metrics, Logs, Probes

Cloud Router can be monitored with Cloud Monitoring, using alerts and metrics. Real-time native reporting from GCP (Cloud Monitoring, Cloud Logging) can be used by OBS; specific reporting is available on quote.
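
As a minimal sketch of how such metrics can be pulled from Cloud Monitoring with the Python client library, the snippet below reads one hour of the BGP session status metric (corresponding to the gcp.router.bgp.session_up entry listed below). The project ID is a placeholder, and the metric type router.googleapis.com/bgp/session_up and the router_id resource label are assumptions to be checked against the GCP metrics documentation.

    import time
    from google.cloud import monitoring_v3  # pip install google-cloud-monitoring

    project_id = "my-project"  # hypothetical project ID
    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
    )

    # Pull one hour of the BGP session status metric for every Cloud Router in the project.
    series = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": 'metric.type = "router.googleapis.com/bgp/session_up"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for ts in series:
        # Points are returned newest first; print the latest value per router.
        latest = ts.points[0].value if ts.points else "no data"
        print(dict(ts.resource.labels).get("router_id", "?"), latest)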

KPI monitored

gcp.router.best_received_routes_count Current number of best routes received by router.
gcp.router.bgp.received_routes_count Current number of routes received on a bgp session.
gcp.router.bgp.sent_routes_count Current number of routes sent on a bgp session.
gcp.router.bgp.session_up Indicator for successful bgp session establishment.
gcp.router.bgp_sessions_down_count Number of BGP sessions on the router that are down.
gcp.router.bgp_sessions_up_count Number of BGP sessions on the router that are up.
gcp.router.nat.allocated_ports The number of ports allocated to all VMs by the NAT gateway
gcp.router.nat.closed_connections_count The number of connections to the NAT gateway that are closed
gcp.router.nat.dropped_received_packets_count The number of received packets dropped by the NAT gateway
gcp.router.nat.new_connections_count The number of new connections to the NAT gateway
gcp.router.nat.open_connections The number of connections open to the NAT gateway
gcp.router.nat.port_usage The highest port usage among all VMs connected to the NAT gateway
gcp.router.nat.received_bytes_count The number of bytes received by the NAT gateway
gcp.router.nat.received_packets_count The number of packets received by the NAT gateway
gcp.router.nat.sent_bytes_count The number of bytes sent by the NAT gateway
gcp.router.nat.sent_packets_count The number of packets sent by the NAT gateway
gcp.router.router_up Router status, up or down
gcp.router.sent_routes_count Current number of routes sent by router.

Alerts observed

Orange Business Services will set alerts depending on the SOW of the Customer.

1.2.3.4     Backup and restore

Data backup and restore

The backup is based on on-demand export of the IaC template.

Service restore

Recovery will be from Infra as Code or by Orange Business Services Operation Team actions.

The Continuous Deployment chain is used to redeploy the Cloud Router service from the configuration file of reference for production environment committed in the Git.

1.2.3.5     GCP SLA High Availability and Disaster Recovery inter-region

HA and non-HA configurations are provided by Google Cloud Platform depending on the design and service parameter configuration.

Recovery after region loss is based on the design SOW; the service can be built in multiple regions.

1.2.4     Charging model

Work Unit
Per router

1.2.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Modify/delete router / simple router modification 1 token
Create router / complex router modification 2 tokens
Other changes Estimation in tokens based on time spent

1.3     Cloud VPN

1.3.1     Description

Cloud VPN securely extends your peer network to Google’s network through an IPsec VPN tunnel. Traffic is encrypted and travels between the two networks over the public internet.

1.3.2     Build to run service included in the OTC

1.3.2.1     Build service pre-requisite
  • Refer to generic description.
1.3.2.2     Build to run service
  • Refer to generic description.

1.3.3     RUN services included in the MRC

1.3.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud VPN.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.3.3.2     Co-manage option

No, Orange Business Services manages the Cloud VPN service.

1.3.3.3     KPI & alerts

Monitoring

Yes, Metrics, Logs, Probes

Cloud VPN can be monitored with Cloud Monitoring, using alerts and metrics. Real-time native reporting from GCP (Cloud Monitoring, Cloud Logging) can be used by OBS; specific reporting is available on quote.

KPI monitored

gcp.vpn.network.dropped_received_packets_count Ingress packets dropped for tunnel.
gcp.vpn.network.dropped_sent_packets_count Egress packets dropped for tunnel.
gcp.vpn.network.received_bytes_count Ingress bytes for tunnel.
gcp.vpn.network.sent_bytes_count Egress bytes for tunnel.
gcp.vpn.tunnel_established Indicates successful tunnel establishment if greater than 0.
gcp.router.best_received_routes_count Number of best routes received by router.
gcp.router.bgp.received_routes_count Number of routes received on a bgp session.
gcp.router.bgp.sent_routes_count Number of routes sent on a bgp session.
gcp.router.bgp.session_up Indicator for successful bgp session establishment.
gcp.router.bgp_sessions_down_count Number of BGP sessions on the router that are down.
gcp.router.bgp_sessions_up_count Number of BGP sessions on the router that are up.
gcp.router.router_up Router status up or down
gcp.router.sent_routes_count Number of routes sent by router.

Alerts observed

Orange Business Services will set alerts depending on the SOW of the Customer.

1.3.3.4     Backup and restore

Data backup and restore

The backup is based on on-demand export of the IaC template.

Service restore

Recovery will be from Infra as Code or by Orange Business Services Operation Team actions.

The Continuous Deployment chain is used to redeploy the Cloud VPN service from the configuration file of reference for production environment committed in the Git.

1.3.3.5     GCP SLA High Availability and Disaster Recovery inter-region

HA is provided by Google Cloud Platform by default.

HA VPN is a high-availability (HA) Cloud VPN solution that lets you securely connect your on-premises network to your VPC network through an IPsec VPN connection in a single region. HA VPN provides an SLA of 99.99% service availability.

Recovery after region loss is based on the design SOW; the service can be built in multiple regions.

1.3.4     Charging model

Work Unit
Per Tunnel VPN

1.3.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Modify/delete tunnel 1 token
Create tunnel 2 tokens
Other changes Estimation in tokens based on time spent

1.4     Cloud SQL

1.4.1     Description

Cloud SQL is a fully-managed database service that helps you set up, maintain, manage, and administer your relational databases on Google Cloud Platform. You can use Cloud SQL with MySQL, PostgreSQL, or SQL Server. Cloud SQL provides a cloud-based alternative to local MySQL, PostgreSQL, and SQL Server databases. Many applications running on Compute Engine, App Engine and other services in Google Cloud use Cloud SQL for database storage.

Each Cloud SQL instance is powered by a virtual machine (VM) running on a host Google Cloud server. Each VM operates the database program, such as MySQL Server, PostgreSQL, or SQL Server, and service agents that provide supporting services, such as logging and monitoring.
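
As an illustration of how an application typically reaches such a managed instance, here is a minimal sketch using the Cloud SQL Python Connector with a MySQL driver; the instance connection name, credentials, and database name are hypothetical placeholders, and IAM or private-IP options would be added per the agreed design.

    # pip install "cloud-sql-python-connector[pymysql]"
    from google.cloud.sql.connector import Connector

    connector = Connector()

    # "my-project:europe-west1:my-instance" and the credentials are placeholders.
    conn = connector.connect(
        "my-project:europe-west1:my-instance",
        "pymysql",
        user="app_user",
        password="change-me",
        db="app_db",
    )
    with conn.cursor() as cursor:
        cursor.execute("SELECT NOW()")  # simple connectivity check
        print(cursor.fetchone())
    conn.close()
    connector.close()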

1.4.2     Build to run service included in the OTC

1.4.2.1     Build service pre-requisite
  • Refer to generic description.
1.4.2.2     Build to run service
  • Refer to generic description.

1.4.3     RUN services included in the MRC

1.4.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud SQL instance.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.4.3.2     Co-manage option

Yes, if the CI/CD is shared with the customer (IaC part).

1.4.3.3     KPI & alerts

Monitoring

Yes, Metrics, SlowQuery Log (MySQL)

KPI monitored

  • CPU utilization
  • Storage usage
  • Memory usage
  • Read/write operations
  • Ingress/Egress bytes
  • MySQL queries
  • MySQL questions
  • Read/write InnoDB pages
  • InnoDB data fsyncs
  • InnoDB log fsyncs
  • Active connections

Alerts observed

  • CPU and memory utilization
  • Disk utilization
  • MySQL connections
  • Auto-failover requests and replication lag
1.4.3.4     Backup and restore

Data backup and restore

The backup is based on regular export.

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.4.3.5     GCP SLA High Availability and Disaster Recovery inter-region

HA and non-HA configurations are provided by Google Cloud Platform depending on the design and service parameter configuration.

Recovery after region loss is based on the design SOW; the service can be built in multiple regions.

1.4.4     Charging model

Work Unit
Per Instance

1.4.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Create / update / delete instance; create / update / delete database (MySQL, PostgreSQL, or SQL Server); run SQL script 1 token
Clone database 2 tokens
Other changes Estimation in tokens based on time spent

1.5     Cloud Storage

1.5.1     Description

Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.

1.5.2     Build to run service included in the OTC

1.5.2.1     Build service pre-requisite
  • Refer to generic description.
1.5.2.2     Build to run service
  • Refer to generic description.
  • In addition, the build to run service for Cloud Storage includes lifecycle rules and IAM policies (a lifecycle-rule sketch follows below).
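
As an example of the kind of lifecycle rule that can be kept as reference configuration and applied through the CI/CD, the sketch below adds an age-based delete rule to a bucket with the google-cloud-storage Python client; the project ID, bucket name, and the 365-day age are illustrative assumptions.

    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client(project="my-project")    # hypothetical project ID
    bucket = client.get_bucket("my-managed-bucket")  # hypothetical bucket name

    # Delete objects older than 365 days (illustrative retention period).
    bucket.add_lifecycle_delete_rule(age=365)
    bucket.patch()
    print(list(bucket.lifecycle_rules))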

1.5.3     RUN services included in the MRC

Running a managed Cloud Storage service is optional. Depending on the Customer's interest in monitoring storage KPIs and alerting on them, the Customer may request the service. By default, no recurring task is proposed for the Cloud Storage service, only on-demand changes and on-demand investigations.

1.5.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud Storage service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.5.3.2     Co-manage option

Yes

1.5.3.3     KPI & alerts

Monitoring

Yes, Metrics

Cloud Storage service is monitored through Cloud Monitoring. Orange Business Services will examine Cloud Storage usage (e.g., how many bytes are stored, how many download requests are coming from your applications) and will set alerts according to your SOW.

Orange Business Services will collect metrics from Cloud Storage to:

  • Visualize the performance of your Storage services
  • Correlate the performance of your Storage services with your applications

Metrics

gcp.storage.api.request_count The number of API calls
gcp.storage.authn.authentication_count The number of HMAC/RSA signed requests
gcp.storage.authz.acl_based_object_access_count The number of requests that result in an object being granted access solely due to object ACLs.
gcp.storage.authz.acl_operations_count The usage of ACL operations
gcp.storage.authz.object_specific_acl_mutation_count The number of changes made to object specific ACLs
gcp.storage.network.received_bytes_count The number of bytes received over the network
gcp.storage.network.sent_bytes_count The number of bytes sent over the network
gcp.storage.storage.object_count The total number of objects per bucket
gcp.storage.storage.total_byte_seconds The total daily storage in byte seconds used
gcp.storage.storage.total_bytes The total size of all objects in the bucket
1.5.3.4     Backup and restore

Data backup and restore

No backup.

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.5.3.5     GCP SLA High Availability and Disaster Recovery inter-region

HA and non-HA are provided by Google Cloud Platform by default for the Cloud Storage service.

1.5.4     Charging model

Work Unit
Per Bucket

1.5.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Modify lifecycle rules / load data 1 token
Bucket synchronization 2 tokens
Other changes Estimation in tokens based on time spent

1.6     Storage Transfer Service

1.6.1     Description

Storage Transfer Service is a Google Cloud product that enables you to:

  • Move or back up data to a Cloud Storage bucket either from other cloud storage providers or from your on-premises environment.
  • Move data from one Cloud Storage bucket to another, so that it is available to different groups of users or applications.
  • Periodically move data as part of a data processing pipeline or analytical workflow.

With Storage Transfer Service, you can transfer data from other clouds, HTTP(S) and filesystems in private data centers, as well as transfer data between Google Cloud Storage buckets.
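
To make the bucket-to-bucket case concrete, here is a minimal sketch using the google-cloud-storage-transfer Python client to create a simple transfer job; the project ID and bucket names are placeholders, and the schedule, filters, and source types (S3, HTTP, on-premises agents) would be set according to the agreed design.

    from google.cloud import storage_transfer  # pip install google-cloud-storage-transfer

    client = storage_transfer.StorageTransferServiceClient()

    # Project ID and bucket names are placeholders.
    request = storage_transfer.CreateTransferJobRequest(
        {
            "transfer_job": {
                "project_id": "my-project",
                "description": "bucket-to-bucket copy",
                "status": storage_transfer.TransferJob.Status.ENABLED,
                "transfer_spec": {
                    "gcs_data_source": {"bucket_name": "source-bucket"},
                    "gcs_data_sink": {"bucket_name": "archive-bucket"},
                },
            }
        }
    )
    job = client.create_transfer_job(request)
    print(job.name)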

1.6.2     Build to run service included in the OTC

1.6.2.1     Build service pre-requisite
  • Refer to generic description.
1.6.2.2     Build to run service
  • Refer to generic description.

1.6.3     RUN services included in the MRC

Running a managed Storage Transfer Service is optional. Depending on the Customer's interest in monitoring the storage KPIs and alerting on them, the Customer may request the service. By default, no recurring task is proposed for the Storage Transfer Service, only on-demand changes and on-demand investigations.

1.6.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Storage Transfer Service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.6.3.2     Co-manage option

No, Orange Business Services fully manages the Storage Transfer Service.

1.6.3.3     KPI & alerts

Monitoring

Yes, Metrics, Logs, Probes

KPI monitored

  • CPU
  • Disk
  • HTTP request and response status
  • Memory
  • Network
  • Number of active instances

Alerts observed

  • CPU
  • Disk
  • HTTP request and response status
  • Memory
  • Network
  • Number of active instances
1.6.3.4     Backup and restore

Data backup and restore

The backup is based on on-demand export of the IaC template.

Using Google data transfer services, you can easily back up data from another cloud storage provider to Cloud Storage through the Storage Transfer Service.

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.6.3.5     GCP SLA High Availability and Disaster Recovery inter-region

HA and non HA are provided by Google Cloud Platform by default.

1.6.4     Charging model

Work Unit
Per Job

1.6.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Modify/delete Job 1 token
Create Job 2 tokens
Other changes Estimation in tokens based on time spent

1.7     Google Kubernetes Engine (Std)

1.7.1     Description

Google Kubernetes Engine (GKE) is a Google Cloud Platform (GCP) service. It is a hosted platform that allows you to run and orchestrate containerized applications. GKE manages Docker containers deployed on a cluster of machines.

GKE offers two modes of operation:

  • Standard: You manage the underlying infrastructure of the cluster, which provides greater flexibility in configuring nodes.
  • Autopilot: Google provisions and manages all of the underlying cluster infrastructure, including nodes and node pools. This gives you a cluster that is optimized for autonomous operation. (A brief cluster inspection sketch follows this list.)
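
As a minimal sketch of how OBS tooling can inspect a managed cluster through the GKE API, the snippet below reads basic cluster facts with the google-cloud-container Python client; the project ID, location, and cluster name are hypothetical placeholders.

    from google.cloud import container_v1  # pip install google-cloud-container

    client = container_v1.ClusterManagerClient()

    # Project ID, location, and cluster name are placeholders.
    cluster = client.get_cluster(
        request={"name": "projects/my-project/locations/europe-west1/clusters/my-cluster"}
    )
    print(cluster.status, cluster.current_master_version, cluster.current_node_count)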

1.7.2     Build to run service included in the OTC

1.7.2.1     Build service pre-requisite
  • Refer to generic description.
1.7.2.2     Build to run service
  • Refer to generic description.

1.7.3     RUN services included in the MRC

1.7.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the GKE cluster.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.7.3.2     Co-manage option

Yes, if the CI/CD is shared with the customer.

KPI & alerts

Monitoring

Yes, Insights, Metrics, logs, Health probes.

Orange Business Services will collect metrics from Docker, Kubernetes, and your containerized applications

KPI monitored

  • Disk I/O
  • CPU and memory usage
  • Container and pod events
  • Network throughput
  • Individual request traces

Alerts observed

  • Disk I/O
  • CPU and memory usage
  • Container and pod events
  • Network throughput
1.7.3.3     Backup and restore

Data backup and restore

The backup is based on a backup of the IaC, the Kubernetes resources, and the data.

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.7.3.4     GCP SLA High Availability and Disaster Recovery inter-region

HA and non HA are provided by Google Cloud Platform depending on the design and service parameter configuration.

Recovery is based on the design SOW and requires actions from Orange Business Services operations teams.

1.7.4     Charging model

Work Unit
Per Cluster

1.7.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Add/delete node 1 token
Update Cluster 2 tokens
Modify network ranges / modify autoscaling parameters 4 tokens
Other changes Estimation in tokens based on time spent

1.8     Google Kubernetes Engine (Autopilot)

1.8.1     Description

Google Kubernetes Engine (GKE) is a Google Cloud Platform (GCP) service. It is a hosted platform that allows you to run and orchestrate containerized applications. GKE manages Docker containers deployed on a cluster of machines.

GKE offers two modes of operation:

  • Standard: You manage the underlying infrastructure of the cluster, which provides greater flexibility in configuring nodes.
  • Autopilot: Google provisions and manages all of the underlying cluster infrastructure, including nodes and node pools. This gives you a cluster that is optimized for autonomous operation.

1.8.2     Build to run service included in the OTC

1.8.2.1     Build service pre-requisite
  • Refer to generic description.
1.8.2.2     Build to run service
  • Refer to generic description.

1.8.3     RUN services included in the MRC

1.8.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the GKE cluster.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.8.3.2     Co-manage option

Yes, if the CI/CD is shared with the customer.

KPI & alerts

Monitoring

Yes, Insights, Metrics, logs, Health probes.

Orange Business Services will collect metrics from Docker, Kubernetes, and your containerized applications

KPI monitored

  • Disk I/O
  • CPU and memory usage
  • Container and pod events
  • Network throughput
  • Individual request traces

Alerts observed

  • Disk I/O
  • CPU and memory usage
  • Container and pod events
  • Network throughput
1.8.3.3     Backup and restore

Data backup and restore

The backup is based on a backup of the IaC, the Kubernetes resources, and the data.

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.8.3.4     GCP SLA High Availability and Disaster Recovery inter-region

HA and non HA are provided by Google Cloud Platform depending on the design and service parameter configuration.

Recovery is based on the design SOW and requires actions from Orange Business Services operations teams.

1.8.4     Charging model

Work Unit
Per Cluster

1.8.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Force update cluster 1 token
Other changes Estimation in tokens based on time spent

1.9     Compute Engine

1.9.1     Description

The Managed Service for Compute Engine is called Managed OS. OBS manages both the OS and the Compute Engine.

Orange Business Services can manage service units such as the OS, middleware, and database in the Managed Compute Engine.

4 possible Managed services:

  • Managed OS only
  • Managed OS + Managed MW
  • Managed OS + Managed DB
  • Managed OS + Managed MW + Managed DB

Compute Engine is a computing and hosting service that lets you create and run virtual machines on Google infrastructure. Compute Engine offers scale, performance, and value that lets you easily launch large compute clusters on Google’s infrastructure.

1.9.2     Build to run service included in the OTC

1.9.2.1     Build service pre-requisite
  • Refer to generic description.
1.9.2.2     Build to run service
  • Refer to generic description.

1.9.3     RUN services included in the MRC

1.9.3.1     Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Compute Engine.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.9.3.2     Co-manage option

Yes, but attention must be paid to the RACI between OBS and the Customer.

1.9.3.3     KPI & alerts

Monitoring is performed through configuration and activation of Cloud Monitoring.

The OBS backend supervision system collects alerts from Cloud Monitoring and Cloud Logging.

Monitoring

Yes, Insights, Metrics, logs, Health probes.

Metrics do not require installation of the Monitoring or Logging agent, but you must enable the Container-Optimized OS Health Monitoring feature.

KPI monitored for Instances:

  • CPU Utilization
  • Count of disk read/write bytes
  • Count of disk read/write operations
  • Count of throttled read/write operations
  • Count of sent bytes/received bytes
  • Count of incoming bytes dropped due to firewall policy
  • Count of incoming packets dropped due to firewall policy

Alerts observed:

Alert on CPU, Memory Usage and Disk Usage.

Project metrics:

Like most cloud service providers, Google Compute Engine has limits on the number of resources a project may consume. If the customer is approaching (or has reached) its quota for a specific resource, OBS will tune the quota metrics for the customer if needed.

Activating Detailed Monitoring will be charged by GCP.

  • OS patching

GCP VM Manager

For managed OS, OBS leverages GCP VM Manager for the patching of the Operating System (OS).

Behavior: with GCP VM Manager, patches are determined by Google, and all mandatory patches are applied to Compute Engine instances for both Windows and Linux.

Additional reporting could be asked by the Customer and extra fees will be charged.

  • Antivirus

For managed OS, OBS leverages its central anti-virus system based on Sophos. This requires the installation of the anti-virus agent on the OS of each Compute Engine instance as well as VPN connectivity to the OBS Centralized Administration Zone. OBS systems allow for central reporting on malware from the backend console system.

 

Should the Customer wish to keep its own antivirus system, OBS shall not be held responsible for protection against viruses.

 

  • Backup and restore

Data backup and restore

By default, OBS leverages GCBDR on the Compute Engine for Managed OS. The GCBDR configuration pattern as well as the retention period shall be agreed with the Customer prior to the RUN. The first backup is full; the following backups are incremental. You can configure the frequency of the backup, for example one full backup per week and one incremental backup per day per Compute Engine instance. The retention period depends on the customer request. GCP charges will be calculated based on the change rate.
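
To illustrate the change-rate-driven charging mentioned above, the short Python sketch below estimates the weekly backed-up volume for one instance under the example pattern (one full backup plus daily incrementals); the disk size and daily change rate are illustrative assumptions, not contractual values.

    def weekly_backup_volume_gb(disk_gb: float, daily_change_rate: float) -> float:
        """Rough weekly backup volume: one full copy plus six daily incrementals."""
        full = disk_gb
        incrementals = 6 * disk_gb * daily_change_rate
        return full + incrementals

    # Example: a 200 GB disk with 3% of the data changing per day.
    print(round(weekly_backup_volume_gb(200, 0.03), 1))  # 236.0 GB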

Restores of Compute Engine instances are performed from the backup.

  • In case of incident, the latest version of the backup can be restored.
  • Upon change request, a previous version of the backup can be restored.

  • GCP SLA High Availability and Disaster Recovery inter-region

The service is highly available within a single zone. HA can be configured using instance groups.

A multi-zone design requires specific design work and is subject to specific additional charging.

This service is covered by GCBDR which enables the creation of backup copies across GCP Regions.

If this option is activated, traffic between regions and storage will be charged by GCP.

  • Administration tasks tracing

Actions performed by OBS managed teams on the managed OS are done from the OBS Administration Zone through access controlled by a CyberArk bastion. The OBS CyberArk bastion protects the access and keeps a trace of the actions performed by the maintenance team, allowing for audit.

The VPN connectivity to the OBS Administration Zone is necessary for the management.

  • Login on to the Virtual Machine

For Windows OS based Compute Engine, access shall be granted by the Customer to OBS managed application operations staff through a domain account configured with proper privilege groups.

For Linux OS based Compute Engine, an encrypted key is created and provided to OBS managed application operations staff to log onto the VM.

For applications, in the case of a managed application, a secret is stored in a safe.

  • Logs

Log management is not included in the managed OS / managed Compute Engine service.

Optionally it can be activated through GCP Cloud Logging through Change Request process.

  • Security

By default, the MRC includes the use of security policies and groups as per customer’s configuration request.

The MRC does not cover security recommendations. Security recommendations can be part of an optional security scope of work based on customer request.

 

  • Limitations

Managed Application services are provided only for OS versions supported by the CSP vendor.

1.9.4     Charging model

Work Unit
Per Virtual Machine instance

1.9.5     Changes catalogue – in Tokens, per act

Changes examples Effort
Create a Virtual Machine 2 Tokens
Attach a Disk to a Virtual Machine 2 Tokens
Restore a Virtual Machine from a snapshot 1 Token
Backup a Virtual Machine 1 Token
Create and deploy VMs in an Instance Group 2 Tokens
Start/Stop/Restart Virtual Machine 2 Tokens
Create/modify/delete Storage Accounts 2 Tokens
Other changes Estimation in tokens based on time spent

1.10 Virtual Private Cloud

1.10.1  Description

Virtual Private Cloud (VPC) provides networking functionality to Compute Engine virtual machine (VM) instances, Google Kubernetes Engine (GKE) clusters, and the App Engine flexible environment. VPC provides networking for your cloud-based resources and services that is global, scalable, and flexible.

A VPC network is a global resource which consists of a list of regional virtual subnetworks (subnets) in data centers, all connected by a global wide area network. VPC networks are logically isolated from each other in the Google Cloud Platform.

At the basic level, managing Virtual Private Cloud consists of building, deploying, and maintaining the Infra as Code for it and managing the changes.

OBS has 2 prices for Managed Virtual Private Cloud depending on the number of subnets of the customer projects:

  • VPC with 1 to 2 subnets
  • VPC with 3 or more subnets

The management of Virtual Private Cloud is included as part of a larger bundle of Network and Security Managed services which provides network and security design, maintenance, network monitoring, intrusion detection, and troubleshooting, depending on an agreed Scope of Work.

1.10.2  Build to run service included in the OTC

1.10.2.1  Build service pre-requisite
  • Refer to generic description.
1.10.2.2  Build to run service
  • Refer to generic description.

1.10.3  RUN services included in the MRC

1.10.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Virtual Private Cloud
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.10.3.2  Co-manage option

No, Orange Business Services manages the Virtual Private Cloud service.

1.10.3.3  KPI & alerts

Monitoring

Yes, Metrics, Logs (option)

Alerts observed:

Packet loss, up/down network

1.10.3.4  Backup and restore

Data backup and restore

Can be exported from Infra as Code.

Service restore

Recovery will be from Infra as Code + Backup

1.10.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA by design.

No recovery after region loss; the IaC needs to be run in another region (for the subnets only).

1.10.4  Charging model

Work Unit
Per Virtual Private Cloud instance

1.10.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Add subnet/add range IP on subnet/reservation of static address 1 token
Creation network peering 2 tokens
Other changes Estimation in tokens based on time spent

1.11 Persistent Disk

1.11.1  Description

Persistent disks are durable network storage devices that your instances can access like physical disks in a desktop or a server. The data on each persistent disk is distributed across several physical disks.

Persistent Disk is Google Cloud's block storage offering, used to store data from VM instances running in Compute Engine or GKE.

OBS proposes 4 types of Persistent Disk:

  1. Managed Standard Persistent Disk
  2. Managed Balanced Persistent Disk
  3. Managed SSD Persistent Disk
  4. Managed Extreme Persistent Disk

1.11.2  Build to run service included in the OTC

1.11.2.1  Build service pre-requisite
  • Refer to generic description.
1.11.2.2  Build to run service
  • Refer to generic description.

1.11.3  RUN services included in the MRC

Running a managed Persistent Disk service is optional, but it is mandatory if the Persistent Disk is attached to managed services; otherwise, the Customer may request the service.

1.11.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Persistent Disk.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.11.3.2  Co-manage option

No, Orange Business Services manages the Persistent Disk service.

1.11.3.3  KPI & alerts

Monitoring

Yes, Metrics

Persistent Disk service is monitored through Cloud Monitoring. Orange Business Services will examine Persistent Disk usage (e.g., how many bytes are stored, how many read/write requests are coming from your applications) and will set alerts according to your SOW.

Orange Business Services will collect metrics from Cloud Monitoring to:

  • Graph multiple persistent disk performance metrics with the Metrics Explorer page
  • Graph average IOPS by using the Disk read operations metric
  • Graph average throughput rates by using the Disk read bytes metric
  • Graph maximum per-second read operations by using the Peak disk read operations metric
  • Graph average throttled operations rates by using the Throttled read operations metric
  • Graph average throttled bytes rates by using the Throttled read bytes metric
1.11.3.4  Backup and restore

Data backup and restore

Backup of IaC + disk + data

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.11.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA by design, but not DR by design.

A regional Persistent Disk can be used depending on application needs; otherwise, the IaC needs to be run in another region and the data restored (option).

1.11.4  Charging model

Work Unit
Per Disk

1.11.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create Disk / attach Disk to a VM 1 token
Extend Disk / mount and format Disk 2 tokens
Enable Encryption 4 tokens
Other changes Estimation in tokens based on time spent

1.12 Cloud Interconnect

1.12.1  Description

Cloud Interconnect provides low latency, high availability connections that enable you to reliably transfer data between your on-premises and Google Cloud Virtual Private Cloud (VPC) networks. Also, Interconnect connections provide internal IP address communication, which means internal IP addresses are directly accessible from both networks.

Cloud Interconnect offers two options for extending your on-premises network:

  • Dedicated Interconnect provides a direct physical connection between your on-premises network and Google’s network.
  • Partner Interconnect provides connectivity between your on-premises and VPC networks through a supported service provider.

1.12.2  Build to run service included in the OTC

1.12.2.1  Build service pre-requisite
  • Refer to generic description.
1.12.2.2  Build to run service
  • Refer to generic description.

1.12.3  RUN services included in the MRC

1.12.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud Interconnect service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.12.3.2  Co-manage option

No, Orange Business Services manages the Cloud Interconnect service.

1.12.3.3  KPI & alerts

Monitoring

Yes, Insights, Metrics, Health probes

Metric

gcp.interconnect.network.attachment.capacity Network capacity of the attachment
gcp.interconnect.network.attachment.received_bytes_count Number of inbound bytes received.
gcp.interconnect.network.attachment.received_packets_count Number of inbound packets received.
gcp.interconnect.network.attachment.sent_bytes_count Number of outbound bytes sent.
gcp.interconnect.network.attachment.sent_packets_count Number of outbound packets sent.
gcp.interconnect.network.interconnect.capacity Active capacity of the interconnect.
gcp.interconnect.network.interconnect.dropped_packets_count Number of outbound packets dropped due to link congestion.
gcp.interconnect.network.interconnect.link.operational Whether the operational status of the circuit is up.
gcp.interconnect.network.interconnect.link.rx_power Light level received over physical circuit.
gcp.interconnect.network.interconnect.link.tx_power Light level transmitted over physical circuit.
gcp.interconnect.network.interconnect.operational Whether the operational status of the interconnect is up.
gcp.interconnect.network.interconnect.receive_errors_count Number of errors encountered while receiving packets.
gcp.interconnect.network.interconnect.received_bytes_count Number of inbound bytes received.
gcp.interconnect.network.interconnect.received_unicast_packets_count Number of inbound unicast packets received.
gcp.interconnect.network.interconnect.send_errors_count Number of errors encountered while sending packets.
gcp.interconnect.network.interconnect.sent_bytes_count Number of outbound bytes sent.
gcp.interconnect.network.interconnect.sent_unicast_packets_count Number of outbound unicast packets sent.
1.12.3.4  Backup and restore

Data backup and restore

Backup of IaC

Service restore

Recovery will be from Infra as Code and actions from Operation Team.

1.12.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA (SLA 99.9% or 99.99%) by design, depending on the chosen options.

Recovery after region loss is based on the customer's WAN architecture requirements.

1.12.4  Charging model

Work Unit
Per Cloud Interconnect

1.12.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Disable an interconnect connection 1 token
Restrict interconnect usage 2 tokens
Create an interconnect with customer configuration > 9 tokens
Other changes Estimation in tokens based on time spent

1.13 Big Query

1.13.1  Description

Google BigQuery is a data warehouse designed to allow companies to perform SQL queries very quickly thanks to the processing power of the Google Cloud infrastructure. It is a fully managed, serverless service. Designed for Big Data, this platform can analyze billions of rows of data.

Google BigQuery is the Big Data analysis platform offered by Google via the Cloud.
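
As a minimal illustration of this usage pattern, the sketch below runs an aggregation with the google-cloud-bigquery Python client against a well-known public dataset; the project ID is a placeholder, and only the small aggregated result set is returned to the client.

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    # The heavy lifting (scanning and grouping) happens server-side in BigQuery.
    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """
    for row in client.query(query).result():
        print(row.name, row.total)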

1.13.2  Build to run service included in the OTC

1.13.2.1  Build service pre-requisite
  • Refer to generic description.
  • Interaction loop necessary with the customer at each Build
1.13.2.2  Build to run service
  • Refer to generic description.

1.13.3  RUN services included in the MRC

1.13.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the BigQuery service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.13.3.2  Co-manage option

No by default; the IaC is fully managed by OBS, which is responsible for the CI/CD up to the dataset (the customer can be granted access to table modifications on a case-by-case basis). Requests for table changes are handled through tokens.

1.13.3.3  KPI & alerts

Monitoring

Yes, Metrics, Logs

Alerts observed:

 

Alerts on KPIs, defined per customer:

  • Slot usage
  • Job Concurrency
  • Job performance
  • Failed jobs
  • Bytes processed by default in BigQuery
1.13.3.4  Backup and restore

Data backup and restore

Yes, Template IaC, Backup Regional Tables

Service restore

Recovery from snapshot, logs, and ingestion code.

1.13.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA and non HA are provided by Google Cloud Platform by default for BigQuery service.

BigQuery does not automatically provide a backup or replica of your data in another geographic region. You can create cross-region dataset copies to enhance your disaster recovery strategy.
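
As a sketch of the regional table backup mentioned in the backup section, the snippet below copies a table into a backup dataset with the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical placeholders. Note that a plain copy job runs within a single location, while cross-region dataset copies go through the BigQuery Data Transfer Service.

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    # Copy a table into a backup dataset (names are placeholders).
    job = client.copy_table(
        "my-project.sales_eu.orders",
        "my-project.sales_backup.orders_snapshot_20240101",
    )
    job.result()  # wait for the copy job to finish
    print("backup copy finished")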

1.13.4  Charging model

Work Unit
Per Table

1.13.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete table / add/modify/update/delete user with policies / copy table 1 token
Load data from a bucket 2 tokens
Other changes Estimation in tokens based on time spent

1.14 Pub/Sub

1.14.1  Description

Create scalable messaging and ingestion for event-driven systems and streaming analytics. Ingest events for streaming into BigQuery, data lakes or operational databases.

Pub/Sub offers a broader range of features, per-message parallelism, global routing, and automatically scaling resource capacity.

Pub/Sub allows services to communicate asynchronously, with latencies on the order of 100 milliseconds. Pub/Sub is used for streaming analytics and data integration pipelines to ingest and distribute data. It is equally effective as a messaging-oriented middleware for service integration or as a queue to parallelize tasks.

1.14.2  Build to run service included in the OTC

1.14.2.1  Build service pre-requisite
  • Refer to generic description.
1.14.2.2  Build to run service
  • Refer to generic description.

1.14.3  RUN services included in the MRC

1.14.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Pub/Sub service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.14.3.2  Co-manage option

No by default; the IaC is fully managed by Orange Business Services.

1.14.3.3  KPI & alerts

Monitoring

Yes, Metrics

Monitoring/alarms on:

  • Publisher status
  • Throughput
  • Publish request size
  • Topic
  • Access rights

Alerts observed:

 

Alerts on KPIs, defined per customer:

  • pubsub_snapshot
  • pubsub_subscription
  • pubsub_topic
1.14.3.4  Backup and restore

Data backup and restore

Yes, from IaC and snapshot.

Service restore

Recovery will be from snapshot.
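
As a minimal sketch of the snapshot-based recovery described above, the snippet below creates a snapshot of a subscription and later seeks the subscription back to it using the google-cloud-pubsub Python client; the project ID, subscription name, and snapshot name are hypothetical placeholders.

    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    project = "my-project"  # hypothetical project ID
    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path(project, "orders-sub")
    snapshot = subscriber.snapshot_path(project, "orders-sub-snapshot")

    # Capture the current acknowledgement state of the subscription...
    subscriber.create_snapshot(request={"name": snapshot, "subscription": subscription})

    # ...and later replay messages from that point by seeking back to the snapshot.
    subscriber.seek(request={"subscription": subscription, "snapshot": snapshot})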

1.14.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA is provided by Google Cloud Platform by default for the Pub/Sub service. Pub/Sub is global/multi-regional with SLAs guaranteed by Google. For the highest degree of redundancy, OBS can create Pub/Sub publisher clients in different GCP regions. Pub/Sub keeps any given message in a single region, although it is replicated across zones within that region.

1.14.4  Charging model

Work Unit
Per instance

1.14.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete instance 1 token
Create snapshot msg 2 tokens
Other changes Estimation in tokens based on time spent

1.15 Pub/Sub Lite

1.15.1  Description

Pub/Sub and Pub/Sub Lite are both horizontally scalable and managed messaging services. Pub/Sub is usually the default solution for most application integration and analytics use cases. Pub/Sub Lite is only recommended for applications where achieving extremely low cost justifies some additional operational work.

Pub/Sub Lite is a cost-effective solution that trades off operational workload, availability, and features for cost efficiency. Pub/Sub Lite requires you to manually reserve and manage resource capacity. Within Pub/Sub Lite, you can choose either zonal or regional Lite topics. Regional Lite topics offer the same availability SLA as Pub/Sub topics. However, there are reliability differences between the two services in terms of message replication.

1.15.2  Build to run service included in the OTC

1.15.2.1  Build service pre-requisite
  • Refer to generic description.
1.15.2.2  Build to run service
  • Refer to generic description.

1.15.3  RUN services included in the MRC

1.15.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Pub/Sub Lite service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.15.3.2  Co-manage option

No by default; the IaC is fully managed by Orange Business Services.

1.15.3.3  KPI & alerts

Monitoring

Yes, Metrics

Monitoring/alarms on:

  • Publisher status
  • Throughput
  • Publish request size
  • Reservation

Alerts observed:

 

Alerts on KPIs, defined per customer:

  • pubsublite_reservation
  • pubsublite_subscription_partition
  • pubsublite_topic_partition
1.15.3.4  Backup and restore

Data backup and restore

Yes, from IaC and snapshot.

Service restore

Recovery will be from snapshot.

1.15.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA is provided by Google Cloud Platform by default for the Pub/Sub Lite service, with less resiliency and lower reliability than the Pub/Sub service.

1.15.4  Charging model

Work Unit
Per instance

1.15.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete instance / reservation management / throughput capacity 1 token
Create snapshot msg 2 tokens
Other changes Estimation in tokens based on time spent

1.16 Dataproc

1.16.1  Description

Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don’t need them. With less time and money spent on administration, you can focus on your jobs and your data.

1.16.2  Build to run service included in the OTC

1.16.2.1  Build service pre-requisite
  • Refer to generic description.
1.16.2.2  Build to run service
  • Refer to generic description.

1.16.3  RUN services included in the MRC

1.16.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Dataproc service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.16.3.2  Co-manage option

No by default; the IaC is fully managed by Orange Business Services.

1.16.3.3  KPI & alerts

Monitoring

Yes, Metrics

Metric

gcp.dataproc.cluster.hdfs.datanodes Indicates the number of HDFS DataNodes that are running inside a cluster.
gcp.dataproc.cluster.hdfs.storage_capacity Indicates capacity of HDFS system running on cluster in GB.
gcp.dataproc.cluster.hdfs.storage_utilization The percentage of HDFS storage currently used.
gcp.dataproc.cluster.hdfs.unhealthy_blocks Indicates the number of unhealthy blocks inside the cluster.
gcp.dataproc.cluster.job.completion_time.avg The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed.
gcp.dataproc.cluster.job.completion_time.samplecount Sample count for cluster job completion time
gcp.dataproc.cluster.job.completion_time.sumsqdev Sum of squared deviation for cluster job completion time
gcp.dataproc.cluster.job.duration.avg The time jobs have spent in a given state.
gcp.dataproc.cluster.job.duration.samplecount Sample count for cluster job duration
gcp.dataproc.cluster.job.duration.sumsqdev Sum of squared deviation for cluster job duration
gcp.dataproc.cluster.job.failed_count Indicates the number of jobs that have failed on a cluster.
gcp.dataproc.cluster.job.running_count Indicates the number of jobs that are running on a cluster.
gcp.dataproc.cluster.job.submitted_count Indicates the number of jobs that have been submitted to a cluster.
gcp.dataproc.cluster.operation.completion_time.avg The time operations took to complete from the time the user submits a operation to the time Dataproc reports it is completed.
gcp.dataproc.cluster.operation.completion_time.samplecount Sample count for cluster operation completion time
gcp.dataproc.cluster.operation.completion_time.sumsqdev Sum of squared deviation for cluster operation completion time
gcp.dataproc.cluster.operation.duration.avg The time operations have spent in a given state.
gcp.dataproc.cluster.operation.duration.samplecount Sample count for cluster operation duration
gcp.dataproc.cluster.operation.duration.sumsqdev Sum of squared deviation for cluster operation duration
gcp.dataproc.cluster.operation.failed_count Indicates the number of operations that have failed on a cluster.
gcp.dataproc.cluster.operation.running_count Indicates the number of operations that are running on a cluster.
gcp.dataproc.cluster.operation.submitted_count Indicates the number of operations that have been submitted to a cluster.
gcp.dataproc.cluster.yarn.allocated_memory_percentage The percentage of YARN memory is allocated.
gcp.dataproc.cluster.yarn.apps Indicates the number of active YARN applications.
gcp.dataproc.cluster.yarn.containers Indicates the number of YARN containers.
gcp.dataproc.cluster.yarn.memory_size Indicates the YARN memory size in GB.
gcp.dataproc.cluster.yarn.nodemanagers Indicates the number of YARN NodeManagers running inside cluster.
gcp.dataproc.cluster.yarn.pending_memory_size The current memory request, in GB, that is pending to be fulfilled by the scheduler.
gcp.dataproc.cluster.yarn.virtual_cores Indicates the number of virtual cores in YARN.
1.16.3.4  Backup and restore

Data backup and restore

Yes, from IaC.

Service restore

Recovery will be from Infra as Code

1.16.3.5  GCP SLA High Availability and Disaster Recovery inter-region

Standard, single-node, and HA cluster modes are provided by Google Cloud Platform for the Dataproc service.

1.16.4  Charging model

Work Unit
Per Cluster

1.16.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/delete cluster 1 token
Bench/config cluster 4 tokens
Other changes Estimation in tokens based on time spent

1.17 Dataflow

1.17.1  Description

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.
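
To make this concrete, here is a minimal Apache Beam pipeline in Python that could be submitted to Dataflow; the runner, project, region, and bucket paths are hypothetical placeholders, and the same pipeline runs locally with the DirectRunner for testing.

    # pip install "apache-beam[gcp]"
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Runner, project, region, and bucket are placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="europe-west1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.Create(["alpha", "beta", "gamma"])
            | "Upper" >> beam.Map(str.upper)
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output/result")
        )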

1.17.2  Build to run service included in the OTC

1.17.2.1  Build service pre-requisite
  • Refer to generic description.
1.17.2.2  Build to run service
  • Refer to generic description.

1.17.3  RUN services included in the MRC

1.17.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Dataflow service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.17.3.2  Co-manage option

No by default; IaC is fully managed by Orange Business Services.

1.17.3.3  KPI & alerts

Monitoring

Yes, Metrics, Logs

Overview metrics:

 

  • Autoscaling
  • Throughput
  • CPU utilization
  • Worker error log count

Streaming metrics (streaming pipelines only):

 

  • Data freshness (with and without Streaming Engine)
  • System latency (with and without Streaming Engine)
  • Backlog bytes (with and without Streaming Engine)
  • Parallelism (Streaming Engine only)
  • Duplicates (Streaming Engine only)

Input metrics:

  • Pub/Sub read, BigQuery read, etc.

Output metrics:

  • Pub/Sub write, BigQuery write, etc.
1.17.3.4  Backup and restore

Data backup and restore

Yes, from IaC + a backup pipeline managed by the customer.

Service restore

Recovery from IaC or through Operations Team actions (restoration); re-ingestion is performed by the customer or by OBS following the applicable procedure.

1.17.3.5  GCP SLA High Availability and Disaster Recovery inter-region

The Dataflow service is not HA by design.

Dataflow does not automatically provide a backup or replica of your data in another geographic region; recovery therefore requires actions from the Operations teams.

 

If there are no grouping or time-windowing operations, failing over to another Dataflow job in another zone or region that reuses the same subscription causes no loss of pipeline output data (a minimal failover sketch follows the list below).

  1. Job fails if the region fails: deploy two or more Dataflow jobs for streaming purposes.
  2. Streaming from Pub/Sub (no grouping / time-windowing): messages are acknowledged only once they are persisted in the destination.
  3. Streaming from Pub/Sub (windowing, not relying on data from before the outage): use the Pub/Sub Seek functionality.
  4. Streaming from Pub/Sub (grouping, relying on data from after the outage): use the Dataflow Snapshot functionality (in preview).
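 
A minimal sketch of cases 1 and 2 above, assuming a streaming pipeline with no grouping: the same Beam job can be relaunched in another region while reusing the existing Pub/Sub subscription, so unacknowledged messages are simply redelivered to the new job. The project, subscription, bucket and table names are hypothetical, and the destination table is assumed to already exist.

  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions

  SUBSCRIPTION = "projects/my-project/subscriptions/ingest-sub"  # hypothetical

  def launch(region: str, job_name: str):
      # Same pipeline parameterised by region; on a regional outage the
      # Operations team relaunches it elsewhere with the same subscription.
      options = PipelineOptions(
          runner="DataflowRunner",
          project="my-project",
          region=region,
          job_name=job_name,
          streaming=True,
          temp_location="gs://my-bucket/tmp",
      )
      pipeline = beam.Pipeline(options=options)
      (pipeline
       | "Read" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
       | "Decode" >> beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
       | "Write" >> beam.io.WriteToBigQuery(
             "my-project:analytics.events",
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
      return pipeline.run()  # returns immediately; the job keeps running on Dataflow

  launch("europe-west1", "ingest-primary")
  # After a regional failure, relaunch with the same subscription:
  # launch("europe-west4", "ingest-failover")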

1.17.4  Charging model

Work Unit
Per Job

1.17.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Delete Job 1 token
Deploy/Create Job 1 Business Hour day
Other changes Estimation in tokens based on time spent

1.18 Cloud Composer

1.18.1  Description

Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor and manage workflows. Cloud Composer automation helps you create Airflow environments quickly and use Airflow-native tools, such as the powerful Airflow web interface and command line tools, so you can focus on your workflows and not your infrastructure.
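 
As an illustration, a minimal Airflow DAG sketch (the DAG ID, task IDs and commands are hypothetical); in Cloud Composer the file is simply placed in the environment's dags/ folder in Cloud Storage, which the service synchronises automatically.

  from datetime import datetime

  from airflow import DAG
  from airflow.operators.bash import BashOperator

  # Hypothetical daily workflow: two shell tasks chained together.
  with DAG(
      dag_id="daily_export",
      start_date=datetime(2024, 1, 1),
      schedule_interval="@daily",
      catchup=False,
  ) as dag:
      extract = BashOperator(task_id="extract", bash_command="echo extracting")
      load = BashOperator(task_id="load", bash_command="echo loading")
      extract >> load  # "load" runs only after "extract" succeeds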

1.18.2  Build to run service included in the OTC

1.18.2.1  Build service pre-requisite
  • Refer to generic description.
1.18.2.2  Build to run service
  • Refer to generic description.

1.18.3  RUN services included in the MRC

1.18.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud Composer service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.18.3.2  Co-manage option

Yes only if CI/CD shared with the customer

1.18.3.3  KPI & alerts

Monitoring

Yes, Metrics, logs, Health probes

Metric

gcp.composer.environment.api.request_count Number of Composer API requests seen so far.
gcp.composer.environment.api.request_latencies.avg Distribution of Composer API call latencies.
gcp.composer.environment.api.request_latencies.samplecount Sample count for API request latencies
gcp.composer.environment.api.request_latencies.sumsqdev Sum of squared deviation for API request latencies
gcp.composer.environment.dagbag_size The current DAG bag size
gcp.composer.environment.dag_processing.parse_error_count Number of errors raised during parsing DAG files
gcp.composer.environment.dag_processing.processes Number of currently running DAG parsing processes
gcp.composer.environment.dag_processing.total_parse_time Number of seconds taken to scan and import all DAG files once
gcp.composer.environment.database_health Healthiness of Composer Airflow database
gcp.composer.environment.database.cpu.reserved_cores Number of cores reserved for the database instance
gcp.composer.environment.database.cpu.usage_time CPU usage time of the database instance, in seconds
gcp.composer.environment.database.cpu.utilization CPU utilization ratio (from 0.0 to 1.0) of the database instance
gcp.composer.environment.database.disk.bytes_used Used disk space on the database instance, in bytes
gcp.composer.environment.database.disk.quota Maximum data disk size of the database instance, in bytes
gcp.composer.environment.database.disk.utilization Disk quota usage ratio (from 0.0 to 1.0) of the database instance
gcp.composer.environment.database.memory.bytes_used Memory usage of the database instance in bytes
gcp.composer.environment.database.memory.quota Maximum RAM size of the database instance, in bytes
gcp.composer.environment.database.memory.utilization Memory utilization ratio (from 0.0 to 1.0) of the database instance
gcp.composer.environment.executor.open_slots Number of open slots on executor
gcp.composer.environment.executor.running_tasks Number of running tasks on executor
gcp.composer.environment.finished_task_instance_count Overall number of finished task instances
gcp.composer.environment.healthy Healthiness of Composer environment.
gcp.composer.environment.num_celery_workers Number of Celery workers.
gcp.composer.environment.num_workflows Number of workflows.
gcp.composer.environment.scheduler_heartbeat_count Scheduler heartbeats
gcp.composer.environment.task_queue_length Number of tasks in queue.
gcp.composer.environment.web_server.cpu.reserved_cores Number of cores reserved for the web server instance
gcp.composer.environment.web_server.cpu.usage_time CPU usage time of the web server instance, in seconds
gcp.composer.environment.web_server.memory.bytes_used Memory usage of the web server instance in bytes
gcp.composer.environment.web_server.memory.quota Maximum RAM size of the web server instance, in bytes
gcp.composer.environment.worker.pod_eviction_count Number of Airflow worker pods evictions
gcp.composer.workflow.run_count Number of workflow runs completed so far.
gcp.composer.workflow.run_duration Duration of workflow run completion.
gcp.composer.workflow.task.run_count Number of workflow tasks completed so far.
gcp.composer.workflow.task.run_duration Duration of task completion.
1.18.3.4  Backup and restore

Data backup and restore

From IaC + GitLab for the application part.

Service restore

Recovery from logs and actions by the Operations Team.

1.18.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA and non-HA configurations are provided by Google Cloud Platform, depending on the design and the service parameter configuration.

Recovery after a region loss is based on the design SOW and requires actions from the Operations teams.

1.18.4  Charging model

Work Unit
Per GKE instance

1.18.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete GKE instance 1 token
Add node 2 tokens
Other changes Estimation in tokens based on time spent

 

 

1.19 Cloud Bigtable

1.19.1  Description

Bigtable is a NoSQL database service, a concept that, by moving away from traditional relational databases, adapts to the needs of the modern web. Such databases can run on several machines simultaneously, which allows them to scale out and manage huge volumes of data. Bigtable is a horizontally scalable system.

Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java, and therefore integrates with the existing Apache ecosystem of open-source big data software.
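 
A minimal read/write sketch with the google-cloud-bigtable Python client; the instance, table, column family and row key are hypothetical and assumed to exist (the supported HBase client for Java offers the same operations).

  from google.cloud import bigtable
  from google.cloud.bigtable import row_filters

  # Hypothetical instance and table; the "metrics" column family must already exist.
  client = bigtable.Client(project="my-project")
  table = client.instance("my-instance").table("sensor-events")

  # Write a single cell.
  row = table.direct_row(b"device#001#20240101")
  row.set_cell("metrics", b"temperature", b"21.5")
  row.commit()

  # Read it back, keeping only the latest version of each cell.
  latest = table.read_row(
      b"device#001#20240101",
      filter_=row_filters.CellsColumnLimitFilter(1),
  )
  print(latest.cells["metrics"][b"temperature"][0].value)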

1.19.2  Build to run service included in the OTC

1.19.2.1  Build service pre-requisite
  • Refer to generic description.
1.19.2.2  Build to run service
  • Refer to generic description.

1.19.3  RUN services included in the MRC

1.19.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Bigtable service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.19.3.2  Co-manage option

No by default; IaC is fully managed by OBS, which owns the CI/CD up to the table level (the customer can be granted access to modify the column families on a case-by-case basis, via a request for change ticket).

1.19.3.3  KPI & alerts

Monitoring

Yes, Insights, Metrics, logs, Key Visualizer

Orange Business Services monitors Cloud Bigtable using the graphs available in the Google Cloud Console, or programmatically through the Cloud Monitoring API (see the sketch below).

Orange Business Services uses native tools for logs. Cloud Bigtable logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub topic via an HTTP push forwarder.
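 
A minimal sketch of such a programmatic check, assuming a hypothetical project ID: the Cloud Monitoring API is queried for the last hour of Bigtable cluster CPU load, the same signal exposed above as gcp.bigtable.cluster.cpu_load.

  import time

  from google.cloud import monitoring_v3

  client = monitoring_v3.MetricServiceClient()
  now = int(time.time())
  interval = monitoring_v3.TimeInterval(
      start_time={"seconds": now - 3600},
      end_time={"seconds": now},
  )

  # Hypothetical project; filters on the native Bigtable CPU load metric.
  results = client.list_time_series(
      request={
          "name": "projects/my-project",
          "filter": 'metric.type = "bigtable.googleapis.com/cluster/cpu_load"',
          "interval": interval,
          "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
      }
  )
  for series in results:
      for point in series.points:
          print(series.resource.labels["cluster"], point.value.double_value)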

KPI monitored

  • Average CPU usage
  • Storage usage
  • Memory usage
  • Read/write operations
  • Reading latency

Alerts observed

  • CPU and memory utilization
  • Disk utilization

Metric

gcp.bigtable.backup.bytes_used Backup storage used.
gcp.bigtable.autoscaling.max_node_count Maximum number of nodes in an autoscaled cluster.
gcp.bigtable.autoscaling.min_node_count Minimum number of nodes in an autoscaled cluster.
gcp.bigtable.autoscaling.recommended_node_count_for_cpu Recommended number of nodes in an autoscaled cluster based on CPU usage.
gcp.bigtable.autoscaling.recommended_node_count_for_storage Recommended number of nodes in an autoscaled cluster based on storage usage.
gcp.bigtable.cluster.cpu_load CPU load of a cluster.
gcp.bigtable.cluster.cpu_load_by_app_profile_by_method_by_table CPU load of a cluster split by app profile, method, and table.
gcp.bigtable.cluster.cpu_load_hottest_node CPU load of the busiest node in a cluster.
gcp.bigtable.cluster.disk_load Utilization of HDD disks in a cluster.
gcp.bigtable.cluster.node_count Number of nodes in a cluster.
gcp.bigtable.cluster.storage_utilization Storage used as a fraction of total storage capacity.
gcp.bigtable.disk.bytes_used Amount of compressed data for tables stored in a cluster.
gcp.bigtable.disk.storage_capacity Capacity of compressed data for tables that can be stored in a cluster.
gcp.bigtable.replication.latencies.avg Distribution of replication request latencies for a table.
gcp.bigtable.replication.latencies.samplecount Sample count for replication request latencies.
gcp.bigtable.replication.latencies.sumsqdev Sum of squared deviation for replication request latencies.
gcp.bigtable.replication.max_delay Upper bound for replication delay between clusters of a table.
gcp.bigtable.server.error_count Number of server requests for a table that failed with an error.
gcp.bigtable.server.latencies.avg Distribution of server request latencies for a table.
gcp.bigtable.server.latencies.samplecount Sample count for server request latencies.
gcp.bigtable.server.latencies.sumsqdev Sum of squared deviation for server request latencies.
gcp.bigtable.server.modified_rows_count Number of rows modified by server requests for a table.
gcp.bigtable.server.multi_cluster_failovers_count Number of failovers during multi-cluster requests.
gcp.bigtable.server.received_bytes_count Number of uncompressed bytes of request data received by servers for a table.
gcp.bigtable.server.request_count Number of server requests for a table.
gcp.bigtable.server.returned_rows_count Number of rows returned by server requests for a table.
gcp.bigtable.server.sent_bytes_count Number of uncompressed bytes of response data sent by servers for a table.
gcp.bigtable.table.bytes_used Amount of compressed data stored in a table.

 

1.19.3.4  Backup and restore

Data backup and restore

The backup is based on IaC + a snapshot of the table in the same zone and the same cluster.

Service restore

Recovery will be from the other table.

 

1.19.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA by design.

Replication of tables to other regions is necessary for recovery after a region loss.

1.19.4  Charging model

Work Unit
Per Instance

1.19.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete table 1 token
Add/modify/update/delete user with policies 1 token
Copy table 1 token
Strategy for making optimal insertion keys 3 tokens
Reclustering table More than 1 day
Other changes Estimation in tokens based on time spent

1.20 Cloud Datastore

1.20.1  Description

Datastore is a NoSQL database that offers great scalability for your applications. This database automatically manages sharding and replication so that you have a durable, highly available database that dynamically scales to handle your applications' load. Datastore offers a multitude of features such as ACID transactions, SQL-like queries, indexes and more.

  • Applications can use Datastore to execute SQL-like queries that support filtering and sorting.
  • Datastore replicates data across multiple data centers, providing a high level of read/write availability.
  • Datastore also provides automatic scalability, strong consistency for reads and ancestor queries, eventual consistency for all other queries, and atomic transactions. The service has no scheduled downtime (a minimal query sketch follows this list).
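 
A minimal sketch of a write followed by a filtered, sorted query with the google-cloud-datastore client; the project, kind and property names are hypothetical.

  from google.cloud import datastore

  client = datastore.Client(project="my-project")  # hypothetical project

  # Write one entity of a hypothetical "Task" kind.
  key = client.key("Task", "sample-task")
  task = datastore.Entity(key=key)
  task.update({"done": False, "priority": 4})
  client.put(task)

  # SQL-like query: filter on a property and sort the results.
  query = client.query(kind="Task")
  query.add_filter("done", "=", False)
  query.order = ["-priority"]
  for entity in query.fetch(limit=10):
      print(entity.key.name, dict(entity))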

1.20.2  Build to run service included in the OTC

1.20.2.1  Build service pre-requisite
  • Refer to generic description.
1.20.2.2  Build to run service
  • Refer to generic description.

1.20.3  RUN services included in the MRC

1.20.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud Datastore service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.20.3.2  Co-manage option

No by default; IaC is fully managed by OBS, which owns the CI/CD up to the table level (the customer can be granted access to modify the column families on a case-by-case basis, via a request for change ticket).

1.20.3.3  KPI & alerts

Monitoring

Yes, Insights, Metrics, logs

Orange Business Services collects metrics from Google Datastore to:

  • Visualize the performance of your datastores
  • Correlate the performance of your datastores with your applications

Orange Business Services uses native tools for logs. Cloud Datastore logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub topic via an HTTP push forwarder.

Metric

gcp.datastore.api.request_count Datastore API calls.
gcp.datastore.index.write_count Datastore index writes.
gcp.datastore.entity.read_sizes.avg Average of sizes of read entities.
gcp.datastore.entity.read_sizes.samplecount Sample Count for sizes of read entities.
gcp.datastore.entity.read_sizes.sumsqdev Sum of Squared Deviation for sizes of read entities.
gcp.datastore.entity.write_sizes.avg Average of sizes of written entities.
gcp.datastore.entity.write_sizes.samplecount Sample Count for sizes of written entities.
gcp.datastore.entity.write_sizes.sumsqdev Sum of Squared Deviation for sizes of written entities.

 

1.20.3.4  Backup and restore

Data backup and restore

The backup is based on IaC + a snapshot of the table in the same zone and the same cluster.

Service restore

Recovery will be from the other table.

 

1.20.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA by design.

Replication of tables to other regions is necessary for recovery after a region loss.

1.20.4  Charging model

Work Unit
Per Instance

1.20.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete table 1 token
Add/modify/update/delete user with policies 1 token
Copy table 1 token
Strategy for making optimal insertion keys 3 tokens
Reclustering table More than 1 day
Other changes Estimation in tokens based on time spent

1.21 Memorystore

1.21.1  Description

The Cloud Memorystore service is an in-memory (RAM) data storage service, fully managed by Google and compatible with Redis. Redis is a cache management system compatible with the main CMSs such as WordPress, Drupal, Magento or Prestashop; enabling a Redis cache for these applications dramatically speeds up your users’ browsing experience. With the Cloud Memorystore service you can easily achieve sub-millisecond latencies, and the service is sized to support the largest cache requirements.

The Cloud Memorystore service is completely isolated inside your VPC network, and only your virtual server instances have access to it. By using Cloud Memorystore you relieve your virtual server instances of redundant and unnecessary computations.
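 
A minimal cache-aside sketch with the standard redis-py client; the private IP address and the render_page() helper are hypothetical, and the code must run from an instance inside the same VPC network as the Memorystore instance.

  import redis

  # Hypothetical Memorystore private endpoint, reachable only inside the VPC.
  cache = redis.Redis(host="10.0.0.3", port=6379)

  def get_page(page_id: str) -> bytes:
      """Cache-aside: serve from Redis when possible, else compute and cache."""
      cached = cache.get(f"page:{page_id}")
      if cached is not None:
          return cached                          # sub-millisecond cache hit
      html = render_page(page_id)                # hypothetical expensive rendering
      cache.setex(f"page:{page_id}", 300, html)  # keep it for 5 minutes
      return html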

1.21.2  Build to run service included in the OTC

1.21.2.1  Build service pre-requisite
  • Refer to generic description.
1.21.2.2  Build to run service
  • Refer to generic description.

1.21.3  RUN services included in the MRC

1.21.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Memorystore service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.21.3.2  Co-manage option

No by default; IaC is fully managed by OBS, which owns the CI/CD up to the table level (the customer can be granted access to modify the column families on a case-by-case basis, via a request for change ticket).

1.21.3.3  KPI & alerts

Monitoring

Yes, Metrics, Hit Logs

Orange Business Services collects metrics from Cloud Memorystore to:

  • Visualize the performance of your datastores
  • Correlate the performance of your datastores with your applications

Orange Business Services uses native tools for logs. Cloud Memorystore logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub topic via an HTTP push forwarder.

Metric

gcp.redis.clients.blocked Number of blocked clients
gcp.redis.clients.connected Number of client connections
gcp.redis.commands.calls Total number of calls for this command
gcp.redis.commands.total_time The amount of time in microseconds that this command took in the last second
gcp.redis.commands.usec_per_call Average time per call over 1 minute by command
gcp.redis.keyspace.avg_ttl Average TTL for keys in this database
gcp.redis.keyspace.keys_with_expiration Number of keys with an expiration in this database
gcp.redis.keyspace.keys Number of keys stored in this database
gcp.redis.persistence.rdb.bgsave_in_progress Flag indicating a RDB save is on-going
gcp.redis.replication.master.slaves.lag The number of bytes that replica is behind.
gcp.redis.replication.master.slaves.offset The number of bytes that have been acknowledged by replicas.
gcp.redis.replication.master_repl_offset The number of bytes that master has produced and sent to replicas. To be compared with replication byte offset of replica.
gcp.redis.replication.offset_diff The number of bytes that have not been replicated to the replica. This is the difference between replication byte offset (master) and replication byte offset (replica).
gcp.redis.replication.role Returns a value indicating the node role. 1 indicates master and 0 indicates replica.
gcp.redis.server.uptime Uptime in seconds
gcp.redis.stats.cache_hit_ratio Cache Hit ratio as a fraction
gcp.redis.stats.connections.total Total number of connections accepted by the server
gcp.redis.stats.cpu_utilization CPU, in seconds of utilization, consumed by the Redis server broken down by System/User and Parent/Child relationship
gcp.redis.stats.evicted_keys Number of evicted keys due to max memory limit
gcp.redis.stats.expired_keys Total number of key expiration events
gcp.redis.stats.keyspace_hits Number of successful lookups of keys in the main dictionary
gcp.redis.stats.keyspace_misses Number of failed lookups of keys in the main dictionary
gcp.redis.stats.memory.maxmemory Maximum amount of memory Redis can consume
gcp.redis.stats.memory.system_memory_usage_ratio Memory usage as a ratio of maximum system memory
gcp.redis.stats.memory.usage_ratio Memory usage as a ratio of maximum memory
gcp.redis.stats.memory.usage Total number of bytes allocated by Redis
gcp.redis.stats.network_traffic Total number of bytes sent to/from redis (includes bytes from commands themselves, payload data, and delimiters)
gcp.redis.stats.pubsub.channels Global number of pub/sub channels with client subscriptions
gcp.redis.stats.pubsub.patterns Global number of pub/sub pattern with client subscriptions
gcp.redis.stats.reject_connections_count Number of connections rejected because of max clients limit
1.21.3.4  Backup and restore

Data backup and restore

The backup is based on IaC + a snapshot of the table in the same zone and the same cluster.

Service restore

Recovery will be from a dump of the database.

 

1.21.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA by design.

Replication of tables to other regions is necessary for recovery after a region loss.

1.21.4  Charging model

Work Unit
Per Instance

1.21.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/modify/delete table 1 token
Add/modify/update/delete user with policies 1 token
Copy table 1 token
Optimisation of index 4 tokens
Other changes Estimation in tokens based on time spent

1.22 Cloud Firestore

1.22.1  Description

Cloud Firestore is a document-oriented NoSQL database that automatically manages data partitioning and replication to ensure reliability, while scaling up automatically according to application needs.

Google Cloud Firestore is also a flexible and scalable database for mobile, web and server development from Firebase and Google Cloud Platform.
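 
A minimal document write and query sketch with the google-cloud-firestore client; the project, collection, document and field names are hypothetical.

  from google.cloud import firestore

  db = firestore.Client(project="my-project")  # hypothetical project

  # Write (or overwrite) one document in a hypothetical "users" collection.
  db.collection("users").document("alice").set({"name": "Alice", "visits": 1})

  # Query the collection on a field condition and stream the matches.
  for snapshot in db.collection("users").where("visits", ">=", 1).stream():
      print(snapshot.id, snapshot.to_dict())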

1.22.2  Build to run service included in the OTC

1.22.2.1  Build service pre-requisite
  • Refer to generic description.
1.22.2.2  Build to run service
  • Refer to generic description.

1.22.3  RUN services included in the MRC

1.22.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud Firestore service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.22.3.2  Co-manage option

Yes if CI/CD shared with the customer (IaC Part)

1.22.3.3  KPI & alerts

Monitoring

Yes, Metrics, Slow Query Log (Firestore)

Orange Business Services uses native tools for logs. Cloud Firestore logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub topic via an HTTP push forwarder.

Metric

gcp.firestore.document.delete_count The number of successful document deletes.
gcp.firestore.document.read_count The number of successful document reads from queries or lookups.
gcp.firestore.document.write_count The number of successful document writes.

 

1.22.3.4  Backup and restore

Data backup and restore

The backup is based on regular exports (see the export sketch below).
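 
A minimal sketch of such an export, assuming hypothetical project and bucket names: the Firestore admin client starts a managed export of all collections to Cloud Storage, and in practice this call is scheduled at a regular interval.

  from google.cloud import firestore_admin_v1

  client = firestore_admin_v1.FirestoreAdminClient()
  database = client.database_path("my-project", "(default)")  # hypothetical project

  # Launches a managed export of every collection to a GCS prefix.
  operation = client.export_documents(
      request={
          "name": database,
          "output_uri_prefix": "gs://my-backup-bucket/firestore/2024-01-01",
      }
  )
  response = operation.result()  # blocks until the long-running export finishes
  print("Exported to:", response.output_uri_prefix)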

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.22.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA and non-HA configurations are provided by Google Cloud Platform, depending on the design and the service parameter configuration.

Recovery after a region loss is based on the design SOW; the service can be built in multiple regions.

1.22.4  Charging model

Work Unit
Per Instance

1.22.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/update/delete instance 1 token
Create/update/delete DB 1 token
Run Firestore script 1 token
Index refactoring 4 tokens
Other changes Estimation in tokens based on time spent

1.23 Cloud Spanner

1.23.1  Description

Cloud Spanner is a fully managed relational database service that combines the structure of traditional relational databases (schemas, SQL queries, ACID transactions) with horizontal scalability and strong consistency. Cloud Spanner automatically shards the data and replicates it synchronously across zones, and multi-regional configurations are available for the highest levels of availability. Applications running on Compute Engine, GKE, App Engine and other Google Cloud services use Cloud Spanner when they need a relational database that scales beyond a single instance.
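 
A minimal sketch of a strongly consistent read and an atomic write with the google-cloud-spanner client; the project, instance, database, table and column names are hypothetical.

  from google.cloud import spanner

  client = spanner.Client(project="my-project")  # hypothetical project
  database = client.instance("prod-instance").database("orders-db")

  # Strongly consistent SQL read.
  with database.snapshot() as snapshot:
      for order_id, total in snapshot.execute_sql(
          "SELECT OrderId, Total FROM Orders LIMIT 10"
      ):
          print(order_id, total)

  # Atomic write inside a read-write transaction (retried automatically on abort).
  def insert_order(transaction):
      transaction.execute_update(
          "INSERT INTO Orders (OrderId, Total) VALUES (@id, @total)",
          params={"id": "o-1001", "total": 42},
          param_types={
              "id": spanner.param_types.STRING,
              "total": spanner.param_types.INT64,
          },
      )

  database.run_in_transaction(insert_order)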

1.23.2  Build to run service included in the OTC

1.23.2.1  Build service pre-requisite
  • Refer to generic description.
1.23.2.2  Build to run service
  • Refer to generic description.

1.23.3  RUN services included in the MRC

1.23.3.1  Run service pre-requisite
  • A referential file exists in the Git including the reference configuration of the Cloud Spanner service.
  • This file can be executed with a CI/CD and the execution has been tested successfully.
1.23.3.2  Co-manage option

Yes if CI/CD shared with the customer (IaC Part)

1.23.3.3  KPI & alerts

Monitoring

Yes, Metrics

Orange Business Services uses native tools for logs. Cloud Spanner logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub topic via an HTTP push forwarder.

Metrics

gcp.spanner.api.received_bytes_count Uncompressed request bytes received by Cloud Spanner.
gcp.spanner.api.sent_bytes_count Uncompressed response bytes sent by Cloud Spanner.
gcp.spanner.api.api_request_count Cloud Spanner API requests.
gcp.spanner.api.request_count Rate of Cloud Spanner API requests.
gcp.spanner.api.request_latencies.avg Average server request latencies for a database.
gcp.spanner.api.request_latencies.samplecount Sample count of server request latencies for a database.
gcp.spanner.api.request_latencies.sumsqdev Sum of Squared Deviation of server request latencies for a database.
gcp.spanner.api.request_latencies_by_transaction_type Distribution of server request latencies by transaction types.
gcp.spanner.instance.cpu.utilization Utilization of provisioned CPU, between 0 and 1.
gcp.spanner.instance.cpu.smoothed_utilization 24-hour smoothed utilization of provisioned CPU between 0.0 and 1.0.
gcp.spanner.instance.cpu.utilization_by_operation_type Percent utilization of provisioned CPU, by operation type between 0.0 and 1.0.
gcp.spanner.instance.cpu.utilization_by_priority Percent utilization of provisioned CPU, by priority between 0.0 and 1.0.
gcp.spanner.instance.node_count Total number of nodes.
gcp.spanner.instance.session_count Number of sessions in use.
gcp.spanner.instance.storage.used_bytes Storage used in bytes.
gcp.spanner.instance.storage.limit_bytes Storage limit for instance in bytes
gcp.spanner.instance.storage.limit_bytes_per_processing_unit Storage limit per processing unit in bytes.
gcp.spanner.instance.storage.utilization Storage used as a fraction of storage limit.
gcp.spanner.instance.backup.used_bytes Backup storage used in bytes.
gcp.spanner.instance.leader_percentage_by_region Percentage of leaders by cloud region between 0.0 and 1.0.
gcp.spanner.instance.processing_units Total number of processing units.
gcp.spanner.lock_stat.total.lock_wait_time Total lock wait time for lock conflicts recorded for the entire database.
gcp.spanner.query_count Count of queries by database name, status, query type, and used optimizer version.
gcp.spanner.query_stat.total.bytes_returned_count Number of data bytes that the queries returned
gcp.spanner.query_stat.total.cpu_time Number of seconds of CPU time Cloud Spanner spent on operations to execute the queries.
gcp.spanner.query_stat.total.execution_count Number of times Cloud Spanner saw queries during the interval.
gcp.spanner.query_stat.total.failed_execution_count Number of times queries failed during the interval.
gcp.spanner.query_stat.total.query_latencies Distribution of total length of time, in seconds, for query executions within the database.
gcp.spanner.query_stat.total.returned_rows_count Number of rows that the queries returned.
gcp.spanner.query_stat.total.scanned_rows_count Number of rows that the queries scanned excluding deleted values.
gcp.spanner.read_stat.total.bytes_returned_count Total number of data bytes that the reads returned excluding transmission encoding overhead.
gcp.spanner.read_stat.total.client_wait_time Number of seconds spent waiting due to throttling.
gcp.spanner.read_stat.total.cpu_time Number of seconds of CPU time Cloud Spanner spent to execute the reads, excluding prefetch CPU and other overhead.
gcp.spanner.read_stat.total.execution_count Number of times Cloud Spanner executed the read shapes during the interval.
gcp.spanner.read_stat.total.leader_refresh_delay Number of seconds spent coordinating reads across instances in multi-region configurations.
gcp.spanner.read_stat.total.locking_delays Distribution of total time in seconds spent waiting due to locking.
gcp.spanner.read_stat.total.returned_rows_count Number of rows that the read returned.
gcp.spanner.row_deletion_policy.deleted_rows_count Count of rows deleted by the policy since the last sample.
gcp.spanner.row_deletion_policy.processed_watermark_age Time between now and the read timestamp of the last successful execution.
gcp.spanner.row_deletion_policy.undeletable_rows Number of rows in all tables in the database that can’t be deleted.
gcp.spanner.transaction_stat.total.bytes_written_count Number of bytes written by transactions.
gcp.spanner.transaction_stat.total.commit_attempt_count Number of commit attempts for transactions.
gcp.spanner.transaction_stat.total.commit_retry_count Number of commit attempts that are retries from previously aborted transaction attempts.
gcp.spanner.transaction_stat.total.participants Distribution of total number of participants in each commit attempt.
gcp.spanner.transaction_stat.total.transaction_latencies Distribution of total seconds taken from the first operation of the transaction to commit or abort.
1.23.3.4  Backup and restore

Data backup and restore

Yes, automatic backup is included.

Service restore

Recovery will be from Infra as Code + Backup of the data.

1.23.3.5  GCP SLA High Availability and Disaster Recovery inter-region

HA and multi-regional.

Recovery after a region loss: managed, serverless service; everything is managed by Google.

1.23.4  Charging model

Work Unit
Per Instance

1.23.5  Changes catalogue – in Tokens, per act

Changes examples Effort
Create/update/delete DB 1 token
Modification of the DB schema 4 tokens
Other changes Estimation in tokens based on time spent