Public Cloud – Flexible Engine
Machine Learning Service – give meaning to your data
Identify patterns in data to construct machine learning models
MLS provides a visualized operation interface for users to orchestrate the training, evaluation, and prediction processes of machine learning models. It seamlessly integrates data analysis with prediction applications, simplifying life cycle management of machine learning models, and offers an easy-to-use, high-performance platform for your data mining and analysis services.
Figure 1.1: Process for using MLS
You can log in to the MLS console to create and manage MLS instances. On the visualized MLS instance management interface, you can create and manage projects and, within projects, create and edit workflows for data analysis.
- An MLS instance must be created before you can use the machine learning interface. You can create multiple MLS instances at the same time, and manage and access each of them.
- MLS Instance Operation Interface
Each MLS instance provides a visualized interface for you to perform MLS operations. MLS also provides RESTful APIs for you to run predictive analysis jobs automatically.
- Service Address
Once an MLS instance is created, you can access the machine learning operation interface through the corresponding instance address.
The internal access address is accessible only to clients in the same subnet as the MLS instance.
The external access address exists only after you bind the MLS instance to an elastic IP address during instance creation and is accessible to any client on the Internet.
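Whether a client can use the internal address therefore reduces to whether it sits in the same subnet as the MLS instance. A minimal sketch with Python's standard `ipaddress` module (the subnet and client addresses below are hypothetical placeholders, not values assigned by MLS):

```python
import ipaddress

# Hypothetical VPC subnet of the MLS instance (replace with your own subnet)
subnet = ipaddress.ip_network("192.168.1.0/24")

def can_use_internal_address(client_ip: str) -> bool:
    """True if the client is in the same subnet as the MLS instance."""
    return ipaddress.ip_address(client_ip) in subnet

print(can_use_internal_address("192.168.1.42"))  # client inside the subnet  -> True
print(can_use_internal_address("10.0.0.5"))      # client outside the subnet -> False
```

A client outside the subnet must instead use the external address, which exists only if an elastic IP was bound at instance creation.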
- MLS Processing Node
A node is a logical running unit that represents a single data processing sub-step.
MLS encapsulates various data processing steps (data loading, data pre-processing, and machine learning algorithms) into different nodes, masking programming details. You can drag node icons, connect nodes, and modify node properties to flexibly import, export, convert, and analyze data.
A workflow is a process set that connects multiple nodes to construct a predictive analysis model.
You can combine multiple nodes by dragging and connecting them on the MLS instance operation interface to form a logical running definition (workflow) that is more complex than a single node.
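Conceptually, the connected nodes form a directed acyclic graph, and a valid execution order follows from the connections. A dependency-free sketch of that idea using Python's standard `graphlib` (the node names are illustrative, not MLS identifiers):

```python
from graphlib import TopologicalSorter

# Illustrative workflow: each node maps to the set of nodes it depends on
workflow = {
    "load_data": set(),
    "preprocess": {"load_data"},
    "train_model": {"preprocess"},
    "evaluate": {"train_model"},
}

# Resolve a run order that respects every connection
order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['load_data', 'preprocess', 'train_model', 'evaluate']
```

On the MLS interface this ordering is implicit in how you wire the node icons together; no code is required.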
- Interactive Notebook
MLS integrates Jupyter Notebook to provide users with a notebook as an integrated development environment of machine learning applications.
Notebooks enable you to edit Python scripts and use MLlib (Spark native algorithm) for data analysis, modeling, and model application.
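In a notebook, MLlib classes such as `pyspark.ml.regression.LinearRegression` would be the natural modeling tools. As a dependency-free illustration of the same least-squares idea a notebook might prototype (the data values are made up):

```python
# Ordinary least squares for y = a*x + b, in plain Python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance of x and y over variance of x
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)  # roughly 1.97 and 0.15
```

In practice the notebook would hand the heavy lifting to Spark, which distributes the computation across the MRS cluster.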
Notebooks also provide users with a large number of third-party Python packages developed for scientific data computing.
- Run Configuration Template
A run configuration template is used to configure computing resources.
You can create run configuration templates on the MLS instance operation interface and assign them to projects.
- Project
A project organizes user-defined information resources.
You can create projects on the MLS instance operation interface to manage your workflows and run configuration templates.
- MLS offers users an easy way to use machine learning and to create flexible predictive analytics solutions in the cloud. In addition, MLS significantly reduces O&M complexity and costs, helping you focus on the application and business levels.
- Permanently online service with on-demand release of resources, reducing IT investment and accelerating business
- Automated operation and maintenance plus a simple and intuitive interface
- Interactive predictive analysis: the Notebook interactive programming tool
- Compatibility with a large number of algorithms from the open-source community
- Ease of use: a drag-and-drop interface for creating models without requiring programming skills
- Model application: clients can use a trained model to make bulk predictions on new data
- Visualization: a rich set of built-in chart types, WYSIWYG, to enhance the efficiency of data exploration
- Rich library of algorithms: MLS optimizes commonly used machine learning algorithms for better performance and a near-linear speedup ratio, and supports distributed analysis of massive data. Through an open integration interface, users can extend the service with custom algorithms, which remain visible only to their owner.
- Shopping malls mine customer consumption records to find groups of customers with common features (interests, income levels, consumption habits, etc.) and analyze which kinds of customers buy which products, so as to adjust market strategy.
- Banks analyze customers' personal financial information to recommend appropriate products (loans, financial products), obtaining large gains at a small cost.
- Insurance companies analyze the historical behavior data of insured persons, build fraud models, and identify insured persons who file false insurance claims.
MLS supports run configuration templates of Spark tasks. You can create run configuration templates on the MLS instance operation interface on demand so that they can be used in modeling projects on the platform.
Table 1.1: Default specifications of MLS instances
| Item | Specification |
|---|---|
| Maximum number of instances per tenant | 5 |
| Number of VMs per instance | 2 |
| Number of CPUs per instance VM | 8 |
| Memory size per instance VM | 16 GB |
| EVS storage space allocated per instance VM | 100 GB |
| Maximum OBS backup space required per instance | 100 GB |
| Number of EIPs bound per instance | 1 |
| Bandwidth of the EIP bound to an instance | 5 Mbit/s |
| Spark driver memory size | 512 MB to 100 GB |
| Number of Spark executors | 1 to 10000 |
| Number of CPU cores per executor | 1 to 100 |
| Memory size per executor | 512 MB to 100 GB |
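The Spark-related ranges above can be checked on the client side before a run is submitted. A hedged sketch of such a validation (the function and the way the limits are encoded are our own, not an MLS API):

```python
# Limits taken from Table 1.1 as (min, max); memory sizes in MB
LIMITS = {
    "driver_memory_mb": (512, 100 * 1024),
    "executors": (1, 10000),
    "executor_cores": (1, 100),
    "executor_memory_mb": (512, 100 * 1024),
}

def validate_run_config(config: dict) -> list:
    """Return a list of human-readable violations (empty if the config is valid)."""
    errors = []
    for key, (lo, hi) in LIMITS.items():
        value = config.get(key)
        if value is None or not lo <= value <= hi:
            errors.append(f"{key}={value} outside [{lo}, {hi}]")
    return errors

ok = validate_run_config({"driver_memory_mb": 512, "executors": 1,
                          "executor_cores": 1, "executor_memory_mb": 512})
bad = validate_run_config({"driver_memory_mb": 256, "executors": 20000,
                           "executor_cores": 1, "executor_memory_mb": 512})
print(ok)   # []
print(bad)  # two violations reported
```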
Table 1.2: Default specifications of run configuration templates of MLS instances
| Parameter | Default Value | Description |
|---|---|---|
| Driver Memory Size (MB) | 512 | Driver memory used to run Spark programs and create the SparkContext |
| Executors | 1 | Number of executors |
| Executor CPU Cores | 1 | Number of CPU cores per executor |
| Executor Memory Size (MB) | 512 | Memory per executor |
| Queue Name | default | Queue name of the current MRS cluster |
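These defaults map onto standard Spark submission parameters. If the same job were launched by hand on YARN, the equivalent spark-submit flags would look roughly like this (MLS sets these for you; the script name is a placeholder):

```shell
spark-submit \
  --driver-memory 512M \
  --num-executors 1 \
  --executor-cores 1 \
  --executor-memory 512M \
  --queue default \
  your_job.py
```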
- Only one AZ is supported.
- MLS relies on MRS for computing and storage resources. Before creating an MLS instance you must have an MRS cluster; if you do not, request one first. MLS supports only MRS 1.3 clusters.
- User-created MLS instances must be in the same VPC subnet and security group as the MRS cluster.
- Once an MLS instance is created, its specifications cannot be modified. Currently, an MLS instance's specifications are an 8-core CPU, 16 GB of memory, and a 100 GB storage disk. If you need a higher-specification node, re-create the MLS instance.
Restrictions of the Notebook function:
- Only Python is supported.
- Magic commands are not supported.
- The function to download the IPython notebook file is not available.
- A maximum of five notebook files can be opened at a time.
- Graphical visualization is not supported, that is, Python packages of data and model visualization cannot be invoked.