Azure ML Thursday 3: Tuning hyperparameters

On this third Azure ML Thursday we continue the series by testing different models and tuning hyperparameters. Before playing with new algorithms or tuning parameters, be sure you know how to train and test a model on your data!

Machine Learning Models

In Azure ML Studio, the (very high-level) workflow is:

  1. throw a Machine Learning model onto the canvas
  2. train it
  3. verify if the results are robust (for example via cross-validation)
  4. deploy as a web service

... and your ML application is in place!

The success of your model depends mainly on how accurately and robustly it predicts new cases. Increasing the model's accuracy can be done in a few ways. You can:

  • Change the data (preprocessing, cleansing, filling NAs)
  • Change the model (exchange the Multiclass Logistic Regression for a Multiclass Decision Forest)
  • Change the way of training (adjust train / test set size, do cross-validations)
  • Tune the model's properties (how fast the model should draw conclusions, error margins)

Today, we'll focus on tuning the model's properties. We won't discuss the details of all properties (you can easily look those up in the docs). Instead, we'll look at how to test different parameter combinations inside Azure ML Studio.

As soon as you click on an untrained model inside your experiment, you'll be presented with some parameters - or, in ML parlance, hyperparameters - you can tweak.

Parameters pane of the Multiclass Logistic Regression model


There are basically two ways to tweak these parameters: either you fill them in by hand, or you do a so-called "parameter sweep".

Tuning parameters by hand

Tuning parameters by hand might be tedious, but it gives you direct feedback and some idea of how the algorithms work. For example, we could try to adjust the memory size for L-BFGS on the default Iris Flower competition workflow:

Memory size for L-BFGS   Overall accuracy (test)   Average accuracy (test)
20                       0.916667                  0.944444
200                      0.916667                  0.944444
2                        0.916667                  0.944444

... as you can see, the memory size for L-BFGS doesn't matter much in this example. Too bad.
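The two accuracy columns in the table are related by simple arithmetic. Here is a minimal Python sketch of that relation (this is not something you run inside Azure ML Studio). It assumes that "average accuracy" means the mean of the per-class one-vs-rest accuracies, which is consistent with the numbers above. The 12-row test set with a single mistake is hypothetical, chosen only because it reproduces those numbers.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Mean of the one-vs-rest accuracies, one per class (assumed definition)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        # A row counts as correct for class c when truth and prediction
        # agree on the question "is this row of class c?".
        agree = sum((t == c) == (p == c) for t, p in zip(y_true, y_pred))
        per_class.append(agree / len(y_true))
    return sum(per_class) / len(per_class)


# Hypothetical test set: 12 rows, 3 classes, exactly one misclassification.
y_true = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
y_pred = list(y_true)
y_pred[5] = "virginica"  # one versicolor row predicted as virginica

print(round(overall_accuracy(y_true, y_pred), 6))  # 0.916667
print(round(average_accuracy(y_true, y_pred), 6))  # 0.944444
```

Under this definition, one wrong row lowers the one-vs-rest accuracy of exactly two classes (the true one and the predicted one), which is why the average accuracy sits above the overall accuracy.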

Doing a parameter sweep

The other way of tuning parameters is doing a parameter sweep. This simply means you won't set "hard" values for the ML model, but provide ranges by selecting "parameter range" as trainer mode in the properties pane:

mlr parameter range

Having selected "parameter range" as trainer mode, you set ranges for the parameters, and connect the model to Tune Model Hyperparameters:

manual to tuning

After you've connected Tune Model Hyperparameters correctly, it still shows a red warning sign. The reason is that it doesn't know what to predict yet (the column we're predicting is often called the "label"). Therefore, you need to select Tune Model Hyperparameters and choose a label column. Select the 'class' column here:

select label

After that, you select the parameter sweeping mode:

  • Random sweep: Tries x random guesses out of the possible parameter values you provided along with the model.
  • Entire grid: Evaluates every parameter combination. Perfect for testing a limited number of parameter sets, but it can take a lot of time!
  • Random grid: Creates a grid of all possibilities, then samples a limited number of random tries out of that grid. Great for getting insight into how combinations of parameters perform.

The Entire Grid sweep sounds quite thorough, but really takes a lot of time, and research shows it doesn't always lead to better models. By default, I'd choose the Random Grid for a parameter sweep.

Down below you can select metrics for measuring the performance. Make sure that the metric you select fits the problem and model you're testing! In our example, we'll use the Accuracy metric.

After running the experiment, the left port of Tune Model Hyperparameters contains the results of the tuning run. Because I chose a random grid, the results may differ from run to run, but here is one possible output:

OptimizationTolerance   L1Weight   L2Weight   MemorySize   Accuracy
0                       0.1        0.01       20           0.944444
0.00001                 0.1        0.01       5            0.944444
0                       1          0.01       50           0.925926
0                       0.01       1          20           0.888889
0                       0.1        0.1        50           0.888889
0                       0          0.1        50           0.888889
0                       0          1          5            0.888889
0.00001                 1          1          50           0.888889
0.00001                 0.1        1          50           0.888889
0                       1          1          20           0.87038

Notice that I've tuned the parameters using only the training set data here! This lets me test the best tuned model (which comes out of the right output port of Tune Model Hyperparameters) on the test data, resulting in the following outcome:

Outcome Azure ML hypertune

More precision and reliability

If you've read last week's post closely, a question should arise by now: doesn't this way of parameter sweeping carry a huge risk of overfitting? After all, we feed the model training data, tune it until the score is maximized, and then assume that is the best model! Indeed, overfitting is a real danger here. We could work around it in two ways:

  1. Use the test set as leading measurement instead of the training set
  2. Cross-validate the results

Test set as leading measurement for parameter sweep

To use the test set as the leading measurement, you simply connect your test set to the right input of Tune Model Hyperparameters. The sweep then uses this set, rather than the training data, as the leading indicator of success for the model.

tmh traintest
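Why the choice of measurement matters can be shown with a toy example outside Azure ML Studio. The sketch below is plain Python under loud assumptions: the data is synthetic, the model is a hand-written k-nearest-neighbours classifier (not the Multiclass Logistic Regression used in this post), and `k` stands in for "a hyperparameter". Scored on its own training data, `k = 1` is always perfect, because every point is its own nearest neighbour. A held-out set gives a more honest picture.

```python
import random


def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - point))[:k]
    votes = [label for _, label in neighbours]
    return max(sorted(set(votes)), key=votes.count)


def accuracy(train, rows, k):
    """Share of `rows` classified correctly by a k-NN model built on `train`."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)
# Synthetic 1-D data: class 0 centred on 0.0, class 1 centred on 2.0 (overlapping).
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(60)]
rng.shuffle(data)
train, held_out = data[:80], data[80:]

candidates = [1, 3, 5, 9, 15]  # odd values avoid tied votes with two classes
train_scores = {k: accuracy(train, train, k) for k in candidates}
held_out_scores = {k: accuracy(train, held_out, k) for k in candidates}

best_by_train = max(train_scores, key=train_scores.get)
best_by_held_out = max(held_out_scores, key=held_out_scores.get)

print("scored on training data:", train_scores, "-> picks k =", best_by_train)
print("scored on held-out data:", held_out_scores, "-> picks k =", best_by_held_out)
```

In this sketch the held-out rows play the role of the dataset connected to the right input of Tune Model Hyperparameters.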

Cross-validate parameter sweep

To cross-validate the parameter sweep, first divide the dataset into folds using the Partition and Sample block, selecting Assign to Folds as the operation. This element comes before Tune Model Hyperparameters:

partition folds 1

Notice that you don't need to split the data before creating folds, as cross-validation already creates its own test and training sets.

In the properties of the Partition and Sample element I choose "assign to folds", set the number of folds and indicate whether it should be a stratified split:

partition folds 2
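What "assign to folds" with a stratified split boils down to can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the actual implementation of the Partition and Sample block: the function names are made up, and the labels simply mimic the Iris dataset's 150 rows with 50 per class.

```python
import random
from collections import Counter, defaultdict


def assign_folds(labels, n_folds, stratified=True, seed=0):
    """Return a fold index (0 .. n_folds - 1) for every row."""
    rng = random.Random(seed)
    if stratified:
        # Group row indices per class so every fold gets its share of each class.
        by_class = defaultdict(list)
        for i, label in enumerate(labels):
            by_class[label].append(i)
        groups = list(by_class.values())
    else:
        groups = [list(range(len(labels)))]

    folds = [None] * len(labels)
    offset = 0
    for rows in groups:
        rng.shuffle(rows)
        # Deal the shuffled rows out round-robin; the offset carries on
        # where the previous group stopped, which keeps fold sizes balanced.
        for position, i in enumerate(rows):
            folds[i] = (position + offset) % n_folds
        offset += len(rows)
    return folds


def cross_validation_splits(folds, n_folds):
    """Yield (train_rows, test_rows) index lists, one pair per fold."""
    for k in range(n_folds):
        test_rows = [i for i, f in enumerate(folds) if f == k]
        train_rows = [i for i, f in enumerate(folds) if f != k]
        yield train_rows, test_rows


labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
folds = assign_folds(labels, n_folds=5)
for train_rows, test_rows in cross_validation_splits(folds, 5):
    per_class = Counter(labels[i] for i in test_rows)
    print(len(train_rows), len(test_rows), dict(per_class))
```

Each fold serves as the test set exactly once while the other folds form the training set, which is why no separate split is needed beforehand.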

Even more precision

To achieve an even higher level of precision, we could try other ML models. Doing so is extremely easy: you just remove the Multiclass Logistic Regression, then drop in another model at the same place. If you want to learn more about which models to use, take a look at Brandon Rohrer's post "How to choose algorithms for Microsoft Azure Machine Learning", which covers exactly that.

Wrapping up - achieving 100%?

In this post we've explored how to tune your Machine Learning model inside Azure ML Studio. Some datasets are easier to predict than others - the Iris Flower dataset used here is very easy to predict. Not only is the starter experiment's default prediction rate of 93.3% already extremely high compared with many real-world situations, but you can drive it up even further - by tuning the model's hyperparameters, but also by training the model on a larger portion of the data than the default 60%. 100% scores are possible using ML, but don't be fooled: the Iris Flower dataset is an open dataset which can be downloaded and used to fit the model for exactly this use case. I'm not saying that the 100% scores you see have been reached by cheating, but you could definitely do so!

Founder of this blog. Business Intelligence consultant, developer, coach, trainer and speaker at events. Currently working at Dura Vermeer. Loves to explain things, providing insight in complex issues. Watches the ongoing development of the Microsoft Business Intelligence stack closely. Keeping an eye on Big Data, Data Science and IoT.


1 Comment

  1. Yeefang Xiao

    Hi Koos

    Nice post and thank you for sharing your experiments and knowledge!
    I set up a cross-validated parameter sweep using neural network regression. As shown in the Azure tutorial "How to perform cross-validation with a parameter sweep", I "Add the Cross-Validate Model module. Connect the output of Partition and Sample to the Dataset input, and connect the output of Tune Model Hyperparameters to the Untrained model input." What puzzles me is that the "mean" evaluation results of the Cross-Validate Model module do not match the "sweep results" output from "Tune Model Hyperparameters". In my case, the former is much worse than the latter. Could you provide some insight into the issue? Perhaps my understanding of how the sweep results are calculated is wrong. I would think they are the mean absolute error, the mean coefficient of determination, etc., calculated using the k-fold cross-validation defined in the "Partition and Sample" module. Thank you.
