Public Cloud – Flexible Engine

Data Warehouse Service – Massive secure data storage available in just a few clicks

Cloud-based online MPP (massively parallel processing) database

Data Warehouse Service (DWS) is a cloud-based online MPP (massively parallel processing) database. It is fast, stable, reliable, secure, scalable, easy to administer, and cost-effective. DWS installs and deploys the database automatically in minutes. It also provides tools for database operation and maintenance (O&M), including backup and restoration, monitoring, and database connection. DWS significantly reduces the complexity and cost of O&M, freeing tenants to focus on their applications and business built on the MPP database.

DWS is essentially a database O&M platform with an integrated massively parallel processing (MPP) architecture, intended to ease the work of database administrators compared with legacy database scenarios.

DWS provides the following functions:

  • Professional DWS cluster management

DWS Console provides comprehensive function menus that allow easy and secure database management and maintenance from a browser, such as:

  • DWS cluster deployment
  • DWS cluster management and access
  • Cluster scale-out
  • Cluster security parameter configurations
  • MRS cluster connections

  • Simple online data backup and restoration

On DWS Console, a few clicks are enough to create a snapshot that backs up cluster configurations and data to secure and stable Object Storage Service (OBS) space. Tenants can restore a DWS cluster from a cluster snapshot.

  • Comprehensive monitoring system

On DWS Console, tenants can use the dashboard and DWS cluster list to directly view the operating status of created DWS clusters. There are more than 20 performance monitoring metrics, through which tenants can view the current and historical status of DWS clusters.

DWS can be used in the following scenarios

Data analysis & reporting scenarios

This scenario covers the following concrete applications:

  • Data exploration and data modeling 
  • Low-latency, near-real-time query and decision making
  • Recurring BI reporting and dashboards

These application scenarios share the following features:

  • Heavy, standard-SQL-oriented data analysis
  • Well-structured, relational data
  • Low-latency user interaction

Using DWS as ETL to import data into DWS

Raw data is stored in or uploaded to OBS. It is then extracted, transformed, and loaded (ETL) through DWS / Spark, and eventually arrives in DWS for further analysis.

DWS can load data directly from OBS or from an HDFS file system. It also supports importing data in ORC format, a popular data file format in the Hadoop ecosystem.

This scenario has the following features:

  • Raw data is stored on OBS.
  • DWS either loads raw data from OBS or computes directly on the data in OBS.
  • DWS produces result data on its HDFS.
  • DWS loads the result data produced in the previous step for further analysis.

In this scenario, the DWS user needs to create a separate DWS on their own.
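
The data flow described above (raw data in object storage, a transform step that stages its result, then a load into the warehouse for SQL analysis) can be sketched in a few lines of Python. This is a minimal illustration only: the dictionaries stand in for OBS and the HDFS staging area, in-memory SQLite stands in for the warehouse, and every name here (`object_store`, `staging`, `extract`, `transform`, `load`, the sample file) is hypothetical rather than part of any DWS or OBS API.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical stand-ins: one dict plays the object store, another plays the
# staging file system, and in-memory SQLite plays the data warehouse.
object_store = {"raw/sales.csv": "region,amount\nnorth,10\nsouth,5\nnorth,7\n"}
staging = {}


def extract(key):
    """Read the raw CSV rows from the object store."""
    return list(csv.DictReader(io.StringIO(object_store[key])))


def transform(rows):
    """Aggregate amounts per region and stage the result; return its key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += int(row["amount"])
    staging["result/sales_by_region"] = sorted(totals.items())
    return "result/sales_by_region"


def load(conn, staged_key):
    """Load the staged result into a warehouse table."""
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO sales_by_region VALUES (?, ?)", staging[staged_key]
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract("raw/sales.csv")))

# Further analysis is plain SQL against the loaded table.
top = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY total DESC"
).fetchall()
print(top)  # [('north', 17), ('south', 5)]
```

Keeping the three stages as separate functions mirrors the separation in the scenario: the raw data never changes, and only the staged result is loaded for analysis.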

Benefits

Immediate use after provisioning (short Time to Market)

DWS makes it easy to go from project conception to deployment. By using DWS Console, tenants can obtain the capabilities of a production-ready data warehouse in minutes. Deploying a dedicated DWS cluster, or even a server, is no longer required.

Stable and reliable

DWS runs on highly reliable infrastructure. DWS synchronizes data from the primary data node to the standby data node. If the primary data node becomes unavailable, DWS switches to the standby data node in seconds. DWS also has other features that enhance database reliability, including snapshots and restoration.
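
The primary/standby behavior described above can be illustrated with a toy model in Python. It is a sketch under simplifying assumptions, not a description of how DWS implements replication or failover: the `DataNode` and `ReplicatedPair` classes are hypothetical, data lives in memory, and "unavailable" is just a flag.

```python
from typing import List


class DataNode:
    """Hypothetical data node holding an in-memory copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[str] = []
        self.available = True


class ReplicatedPair:
    """Toy primary/standby pair: writes are copied to the standby."""

    def __init__(self) -> None:
        self.primary = DataNode("primary")
        self.standby = DataNode("standby")

    def _failover_if_needed(self) -> None:
        if self.primary.available:
            return
        if not self.standby.available:
            raise RuntimeError("no data node available")
        # Promote the standby; the failed node takes the standby role.
        self.primary, self.standby = self.standby, self.primary

    def write(self, row: str) -> None:
        self._failover_if_needed()
        self.primary.rows.append(row)
        # Synchronize to the standby only while it is reachable.
        if self.standby.available:
            self.standby.rows.append(row)

    def read_all(self) -> List[str]:
        self._failover_if_needed()
        return list(self.primary.rows)


pair = ReplicatedPair()
pair.write("a")
pair.write("b")
pair.primary.available = False  # simulate a primary failure
print(pair.read_all())          # ['a', 'b'] served by the promoted standby
print(pair.primary.name)        # standby
```

Because every write reaches the standby before the failure, the promoted node can serve the same data without any restore step.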

Secure

DWS makes it easy to control network access to tenants’ databases. DWS also lets tenants run their DWS clusters in a virtual private cloud (VPC), which enables tenants to isolate their DWS clusters and to connect their existing IT infrastructure to the DWS clusters through an Elastic IP (EIP). In addition, DWS supports the use of SSL to make data transmission more secure.

Scalable

By using DWS Console, tenants can scale out a cluster by adding DWS nodes to it, which extends the cluster's computing power and storage capacity.

Easy to administer

By using DWS Console, tenants can set up, operate, and scale out a cluster easily. In addition, tenants can easily perform database O&M, including connecting applications to DWS clusters, taking and restoring snapshots, and monitoring DWS clusters. Tenants can use Cloud Eye (CES) Console to view key cluster and host metrics, including CPU and memory usage, database size, session count, and shared buffer hit ratio.

Cost-effective

Tenants pay very low fees and only for the resources they actually consume. In addition, tenants can start from DWS clusters with lower specifications and scale out the cluster at any time based on their requirements.

Specifications

Table 1.1: Specifications of DWS cluster classes

Node Type   Number of vCPUs   Memory   Storage Per Node   Node Range   Total Capacity
d1.xlarge   6                          1800*2 GB          32           115 TB
m1.xlarge   4                 32 GB    512 GB             32           16 TB
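
The Total Capacity figures are consistent with multiplying the storage per node by the maximum node count of 32. The short Python check below reproduces that arithmetic. It rests on assumptions the table does not state: that "1800*2 GB" means two 1800 GB disks per node, that 1 TB is counted as 1000 GB, and that the result is rounded down. The function name and parameters are illustrative only.

```python
MAX_NODES = 32  # upper bound of the node range listed in Table 1.1


def total_capacity_tb(disk_gb: int, disks_per_node: int, nodes: int = MAX_NODES) -> int:
    """Return cluster storage in whole TB (assumes 1 TB = 1000 GB, rounded down)."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between 1 and {MAX_NODES}")
    return (disk_gb * disks_per_node * nodes) // 1000


print(total_capacity_tb(1800, 2))  # d1.xlarge: 115
print(total_capacity_tb(512, 1))   # m1.xlarge: 16
```

Passing a smaller `nodes` value gives the capacity of a cluster that has not yet been scaled out to the limit, for example `total_capacity_tb(512, 1, nodes=8)`.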

Table 1.2: Specifications of backup storage

Backup Type Snapshot Storage
Refer to Object Storage Service (OBS). Backup Storage is the storage associated with Automated Backups and tenant-initiated DWS cluster snapshots. Increasing the backup retention period or taking additional DWS cluster snapshots increases the backup storage consumed by tenants’ databases.

DWS only provides instance quota configuration.

Usage restrictions

  • DWS clusters must be created in a subnet of a VPC.
  • A cluster can be scaled out to a maximum of 32 nodes.
  • The nodes of a DWS cluster are invisible to tenants. Tenants' applications in the same VPC can connect only to the intranet IP address / public IP address and port of the DWS cluster.
  • The DWS node hosting the database and the backup files are both invisible to tenants.