Public Cloud – Flexible Engine

Cloud Stream – analyse your data in real-time

A real-time big data stream analysis service running on the public cloud

logo cloud stream

Cloud Stream Service is a real-time big data stream analysis service running on the public cloud. Computing clusters are fully managed by Cloud Stream, enabling you to focus on Stream SQL services. Cloud Stream is compatible with Apache Flink APIs, and Cloud Stream jobs run in real time.

Cloud Stream is a distributed real-time stream computing system featuring low latency (millisecond-level), high throughput, and high reliability. Powered by Flink and Spark Streaming, Cloud Stream integrates enhanced features and security, and supports both stream processing and batch processing. It provides the core Stream SQL features needed for data processing, and machine learning and graph computing algorithms will be added to Stream SQL in the future.

Product architecture


Cloud Stream Service is based on the Flink and Spark engines and supports rich real-time analysis capabilities, including Stream SQL and the open-source native FlinkML and Gelly libraries. It supports two ecosystems: HUAWEI cloud services, including DIS and SMN, and open-source clusters such as Kafka and HBase, which an exclusive cluster can reach through the VPC peering function.

Cloud Stream supports connecting to the following sources and sinks:

Source:

  • DIS: Read streaming data from DIS.
  • OBS: Read file once from OBS object.
  • Kafka: Consume data from Kafka. This feature is supported only on exclusive clusters: because the Kafka cluster belongs to the user, VPC peering must be created between the exclusive cluster and the user’s Kafka cluster so that the exclusive cluster can access the user’s data. Kafka versions above 0.10 are supported.

Sink:

  • DIS: Send streaming data to DIS.
  • Kafka: Produce data to Kafka. This feature is supported only on exclusive clusters: because the Kafka cluster belongs to the user, VPC peering must be created between the exclusive cluster and the user’s Kafka cluster so that the exclusive cluster can access the user’s data.
  • SMN: Send message or mail through SMN.

These features are supported in Stream SQL, so users can simply write SQL to connect to these services. In Flink jar or Spark Streaming jar jobs, users can also access these features through user-defined code based on the Flink or Spark API.
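As an illustrative sketch of such a Stream SQL job, the following reads from a DIS channel and writes filtered results back to DIS. The WITH parameter names (type, channel, encode) and all stream/column names are assumptions for illustration, not confirmed Cloud Stream syntax:

```sql
-- Hypothetical sketch: parameter and stream names are illustrative
-- assumptions, not confirmed Cloud Stream syntax.
CREATE SOURCE STREAM car_speeds (
  car_id STRING,
  speed  INT
) WITH (
  type    = "dis",        -- read streaming data from DIS
  channel = "cs_input",   -- hypothetical input channel name
  encode  = "csv"
);

CREATE SINK STREAM speeding_cars (
  car_id STRING,
  speed  INT
) WITH (
  type    = "dis",        -- write results back to DIS
  channel = "cs_output",  -- hypothetical output channel name
  encode  = "csv"
);

INSERT INTO speeding_cars
SELECT car_id, speed
FROM car_speeds
WHERE speed > 120;
```

Submitting such a statement is all that is required; the service manages the underlying cluster and connectors.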

Cloud Stream has the following advantages

Easy to Use

You only need to write Stream SQL statements in the editor to implement your business logic.

Fully Managed

Cloud Stream provides visualized information on running jobs.

Pay-as-you-go

Users only pay for the SPUs they use. One SPU includes one CPU core and 4 GB of memory.

Just Use It

You only need to write Stream SQL statements to submit and run jobs, without having to manage big data frameworks such as Hadoop, Flink, and ZooKeeper.

Isolation

Tenants’ exclusive clusters are physically isolated from each other, so tenants can submit user-defined jobs on their own clusters.

High throughput and low latency

The Apache Flink Dataflow model is used, and natural back pressure is supported. Real-time data analysis and transfer minimize data latency.

Cloud Stream provides the following features

  • Stream SQL online analysis
    Aggregation functions and operations such as windows and joins are supported. SQL is used to express business logic, facilitating service implementation.
  • Distributed real-time computing
    Large-scale cluster computing and auto scaling of clusters reduce costs greatly.
  • Fully hosted clusters
    Cloud Stream provides visualized information on running jobs.
  • Pay-as-you-go
    The pricing unit is stream processing unit (SPU), and an SPU contains one core and 4 GB memory. You are charged based on the running duration of specified SPUs, accurate to seconds.
  • High throughput and low latency
    Cloud Stream enables real-time computing services with millisecond-level latency.
  • Interconnection with SMN
    Cloud Stream can connect to Simple Message Notification (SMN), enabling real-time transmission of data analysis results and alarm information to users’ mobile phones in IoT scenarios.
  • Online SQL job debug
    Job debugging helps you check whether the SQL statement logic is correct. After sample data is input manually or using Object Storage Service (OBS) buckets, correct SQL statement logic will produce the expected results.
  • Spark streaming and structured streaming
    You can submit customized Spark Streaming jobs in exclusive clusters.
  • Exclusive cluster creation and resource quota allocation for jobs
    Tenants can create exclusive clusters, which are physically isolated from shared clusters and other tenants’ clusters and are not affected by other jobs. Tenants can also configure the maximum SPU quota for their exclusive clusters and allocate available clusters and SPU quotas to their own users.
  • Customized Flink job
    You can submit customized Flink jobs in exclusive clusters.

Function List

Cloud Stream enables a user to create a real-time analysis job using Stream SQL or a user-defined jar. It also enables a tenant to create an exclusive cluster that is physically isolated from the computing resources of other tenants.

Stream SQL

Window

The window is used to calculate aggregate values over a period of time or over a number of records, for example, a website’s 5-minute click-through rate.

  • Group Window (tumble/hop/session)

Aggregates once per time period (or per record count).

  • Over Window

Aggregates for each incoming record.
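For example, the 5-minute click-through aggregation mentioned above can be sketched with a tumbling group window. The table and column names are hypothetical; the TUMBLE functions follow open-source Flink 1.4 SQL syntax:

```sql
-- Hypothetical table/column names; TUMBLE follows Flink 1.4 SQL syntax.
SELECT
  page_url,
  TUMBLE_START(proctime, INTERVAL '5' MINUTE) AS window_start,
  COUNT(*) AS clicks
FROM page_clicks
GROUP BY
  TUMBLE(proctime, INTERVAL '5' MINUTE),
  page_url;
```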

Join
Stream-to-stream joins (proctime/rowtime) are now supported to help with tasks such as information completion. For example, correlate user phone data and location data in real time to correct the location of a user’s phone call.
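A sketch of such a stream-to-stream join over a processing-time window, in the spirit of the phone-location example (all table and column names are hypothetical, and the time-bounded join condition follows open-source Flink SQL conventions):

```sql
-- Hypothetical names; joins the call stream with recent location records.
SELECT c.user_id, c.call_id, l.latitude, l.longitude
FROM calls AS c
JOIN locations AS l
  ON c.user_id = l.user_id
  AND l.proctime BETWEEN c.proctime - INTERVAL '1' MINUTE AND c.proctime;
```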

Geospatial Function
Functions to define geographical areas and evaluate incoming geospatial data for containment, proximity, overlap, and intersection, and to generate alerts or easily kick off the necessary workflows.

ST_DISTANCE/ST_OVERLAPS/ST_INTERSECTS/ST_WITHIN
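As a hedged sketch of a geofencing alert using one of the functions listed above: only the function names ST_DISTANCE/ST_OVERLAPS/ST_INTERSECTS/ST_WITHIN come from this document; the argument forms, the ST_POINT constructor, the WKT polygon encoding, and all stream/column names are assumptions for illustration:

```sql
-- Hypothetical stream/column names; ST_POINT and the WKT argument
-- encoding are assumptions, not confirmed Cloud Stream syntax.
SELECT device_id, latitude, longitude
FROM sensor_positions
WHERE ST_WITHIN(
  ST_POINT(longitude, latitude),
  'POLYGON((30.0 10.0, 40.0 40.0, 20.0 40.0, 10.0 20.0, 30.0 10.0))'
);
```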

Machine Learning Function
Supports a streaming Random Forest algorithm in SQL to help users detect anomalies and score them according to severity.

CEP API

Supports the standard Oracle pattern matching grammar, MATCH_RECOGNIZE.

The ability to recognize patterns found across multiple rows is important for many kinds of work. Examples include all kinds of business processes driven by sequences of events, such as security applications, where unusual behavior must be detected, and financial applications, where you seek patterns of pricing, trading volume, and other behavior. Other common uses are fraud detection applications and sensor data analysis. One term that describes this general area is complex event processing, and pattern matching is a powerful aid to this activity.
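A minimal MATCH_RECOGNIZE sketch in the style of the financial example above, detecting a price drop followed by a recovery. The stream and column names are hypothetical; the clause structure follows the standard pattern matching grammar:

```sql
-- Hypothetical stream/columns; standard MATCH_RECOGNIZE syntax.
SELECT *
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY rowtime
  MEASURES
    FIRST(DOWN.price) AS drop_start,
    LAST(UP.price)    AS recovery_price
  ONE ROW PER MATCH
  PATTERN (DOWN+ UP)
  DEFINE
    DOWN AS DOWN.price < PREV(DOWN.price),
    UP   AS UP.price   > PREV(UP.price)
);
```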

Multi-type Mode

Shared mode

Serverless and fully managed

Stream SQL job
All users can submit Stream SQL jobs on the shared cluster.

Exclusive mode

Isolated
Physically isolated from other tenants, including the shared cluster; the VPCs and ECSs are isolated from each other to ensure access security.

Multi-type job
In addition to Stream SQL jobs, Flink jar and Spark Streaming jar jobs can be submitted to an exclusive cluster.

Open Source Ecology
Users can connect to their own clusters, including Kafka and HBase, and thus access their own data.

Multi Stream Engine

Flink

Compatible with open source Flink 1.4.

Spark Streaming

Spark Structured Streaming

Compatible with open source Spark 2.2.
Source And Sink

Source:
  • DIS: Read streaming data from DIS.
  • OBS: Read file once from OBS object.
  • Kafka: Consume data from Kafka. This feature is supported only on exclusive clusters: because the Kafka cluster belongs to the user, VPC peering must be created between the exclusive cluster and the user’s Kafka cluster so that the exclusive cluster can access the user’s data.

Sink:

  • DIS: Send streaming data to DIS.
  • Kafka: Produce data to Kafka. This feature is supported only on exclusive clusters: because the Kafka cluster belongs to the user, VPC peering must be created between the exclusive cluster and the user’s Kafka cluster so that the exclusive cluster can access the user’s data.
  • SMN: Send message or mail through SMN.
Job Management

Create Stream SQL Job
  • The user can create Stream SQL jobs in a shared or exclusive cluster.
  • Through SQL, the user can connect to Data Ingestion Service (DIS) and Object Storage Service (OBS), and can send analysis results to DIS and Simple Message Notification (SMN).
Create User Defined Jar Job
  • The user can create jar jobs only in an exclusive cluster.
  • The user-defined jar can be a Flink jar or a Spark Streaming jar.
Starting a job The user can start created jobs to run them.
Stopping a job The user can stop running jobs.
Deleting a job The user can delete a job in any status.
Monitoring job
  • The user can check the operation history through audit logs.
  • The user can check job details, the execution plan, and data inputs/outputs through the dashboard.
Template Management

Create SQL Template To quickly create stream jobs, Cloud Stream provides customizable job templates. The user can create templates for common business scenarios.
Modify SQL Template The user can modify created templates.
Delete SQL Template The user can delete created templates.
Using SQL Template The user can use created templates when creating a new Stream SQL job.
Cluster Management

Create exclusive cluster The domain user can create an exclusive cluster and set the maximum resources the cluster can use. The exclusive cluster is physically isolated from the computing resources of other tenants.
Modify exclusive cluster quota The domain user can modify the maximum resources the cluster can use.
Delete exclusive cluster The domain user can delete created clusters. Once a cluster is deleted, all jobs in the cluster are deleted as well.
Allocate user quota The domain user can allocate the maximum resources that a sub-user can use.
Allocate user cluster The domain user can choose which clusters a sub-user can use.
Manage tenant jobs The domain user can start/stop/delete the jobs of his sub-users.

Specifications

The user can submit jobs to clusters created by Cloud Stream; the only resource unit the user sees is the SPU.

Parameter Description
SPU An SPU is the minimum computing unit. An SPU consists of one CPU core and 4 GB of memory.
Parallelism Indicates the number of parallel tasks a job runs.

Cloud Stream Job Specification:

Item Specifications
Maximum number of draft jobs 100
Maximum job name length 57 Bytes
Maximum job description length 512 Bytes
Maximum SQL length 10000 Bytes
Maximum job SPUs 400
Maximum job parallelism 50

Cloud Stream Template Specification: A template is a SQL sample that users can easily modify to create a new SQL job.

Item Specifications
Maximum template count 100
Maximum template sql length 10000 Bytes
Maximum template name length 64 Bytes
Maximum template desc length 512 Bytes

Tenant exclusive cluster specification:

Item Specifications Description
Maximum SPU quota of a single cluster 400 The maximum quota a single cluster can use.
Maximum number of domain clusters 10 The maximum number of exclusive clusters a user can create.
Maximum SPU quota a user can use 1000 The maximum SPU quota the user can use. For example, if one user is allocated 3 exclusive clusters, the total SPUs of these 3 clusters cannot exceed 1000.
Maximum cluster name length 100 Bytes  
Maximum cluster desc length 512 Bytes  

Application Scenarios

Cloud Stream focuses on Internet and Internet of Things (IoT) scenarios that require timeliness and high throughput. Cloud Stream provides Internet of Vehicles (IoV) services, online log analysis, online machine learning, online graph computing, and online algorithm-based recommendation for multiple industries, such as small- and medium-sized Internet enterprises, IoT, IoV, and anti-financial-fraud.

Real-time stream analysis

Purpose: to analyze big data in real time

Feature: Complex stream analysis methods, such as Window, CEP, and Join, can be performed on stream data with millisecond-level latency.

Application scenarios: real-time log analysis, network traffic monitoring, real-time risk control, real-time data statistics, and real-time data Extract-Transform-Load (ETL)

IoT

Purpose: to analyze online IoT data

Feature: IoT services call the APIs of Cloud Stream. Cloud Stream then reads sensor data in real time and executes users’ analysis logic. Analysis results are sent to services, such as Data Ingestion Service (DIS), for data persistency, alarm or report display, or visual display of results.

Application scenarios: elevator IoT, industrial IoT, shared bicycles, IoV, and smart home.


Related Services

Cloud Stream works with the following services:

  • DIS

By default, DIS serves as a data source of Cloud Stream and stores outputs of Cloud Stream jobs. For more information about DIS, see Data Ingestion Service User Guide.

  • Data source: DIS accesses user data and Cloud Stream reads data from the channel used by DIS as input data for jobs.
  • Data output: Cloud Stream writes output of jobs into DIS.
  • Object Storage Service (OBS)

OBS serves as a data source and backs up checkpoint data for Cloud Stream. For more information about OBS, see Object Storage Service User Guide.

  • Data source: Cloud Stream reads user-stored data from OBS as input data for jobs.
  • Checkpoint data backup or job log saving: If the checkpoint function or job log saving function is enabled, Cloud Stream stores job snapshots or logs to OBS. In the event of exceptions, Cloud Stream can recover the job based on checkpoint data backup or queries job logs to locate the fault.
  • Identity and Access Management (IAM)

IAM authenticates access to Cloud Stream. For more information about IAM, see the Identity and Access Management User Guide.

  • Cloud Trace Service (CTS)

CTS provides users with records of operations on Cloud Stream resources, facilitating query, audit, and backtracking. For more information about CTS, see the Cloud Trace Service User Guide.

  • Elastic Cloud Server (ECS)

ECS provides Cloud Stream with a computing server that consists of CPUs, memory, images, and Elastic Volume Service (EVS) disks and allows on-demand allocation and elastic scaling. For more information about ECS, see the Elastic Cloud Server User Guide.

  • Simple Message Notification (SMN)

SMN provides reliable and flexible large-scale message notification services to Cloud Stream. It significantly simplifies system coupling and pushes messages to subscription endpoints based on requirements. For more information about SMN, see the Simple Message Notification User Guide.

Usage Restrictions

Cloud Stream Job Specification:

Item Specifications
Maximum number of draft jobs 100
Maximum job name length 57 Bytes
Maximum job description length 512 Bytes
Maximum SQL length 10000 Bytes
Maximum job SPUs 400
Maximum job parallelism 50

Cloud Stream Template Specification: A template is a SQL sample that users can easily modify to create a new SQL job.

Item Specifications
Maximum template count 100
Maximum template sql length 10000 Bytes
Maximum template name length 64 Bytes
Maximum template desc length 512 Bytes

Tenant exclusive cluster specification:

Item Specifications Description
Maximum SPU quota of a single cluster 400 The maximum quota a single cluster can use.
Maximum number of domain clusters 10 The maximum number of exclusive clusters a user can create.
Maximum SPU quota a user can use 1000 The maximum SPU quota the user can use. For example, if one user is allocated 3 exclusive clusters, the total SPUs of these 3 clusters cannot exceed 1000.
Maximum cluster name length 100 Bytes  
Maximum cluster desc length 512 Bytes  

Version restriction

Item Version
Flink 1.4
Kafka Above 0.10
HBase 1.3.1
Spark 2.2