Protopipe

Wine classification with neural networks

In this introductory tutorial we will train and test a single-layer neural network for classifying the Wine Data Set from the UCI Machine Learning Repository.

Since this is an introductory tutorial, we will use the validation loss as an indicator of the model’s performance instead of evaluating the model against an unseen test dataset. We will use that value to fine-tune some hyperparameters, such as the batch size or the learning rate of the optimizer.

The final project is available for download here.

1. Creating a new project

Empty projects screen

In the projects screen, press the Create new project button.

"Create new project" button

Write a name for the project.

"Create new project" dialog

Then press Start. The work screen will appear.

Empty work screen

2. Uploading the data

Download the file wine.data from here.

Now we need to upload the file to the project. Press the Create card button.

"Create card" button

Press Upload file and select the file you just downloaded. A new entry, wine.data, will appear in the menu.

Data tab

Press the wine.data entry to create an Open file card.

"Open file" card

3. Preparing the data

According to the dataset description, wine.data is a comma-separated values file with the following columns:

Class (1, 2 or 3)
Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline

Our model will take columns 2 to 14 as input and predict the value of the first column (the class).

First of all we need to convert the file stream into a 2D tensor (also known as a table).

Press the Create card button.

"Create card" button

In the Modules tab, navigate to Files and formats and press Read as CSV.

"Create card" menu

A new Read as CSV card will appear in the blueprint.

"Read as CSV" card

Connect the Stream output from Open file to the Stream input of Read as CSV.

Current pipeline

Now we just have to configure the parameters of the CSV reader (delimiter and header row(s)). A glimpse of the file contents can help us with this task.

Select the Open file card by pressing on it and press the Preview output button on the top bar:

"Preview output" button

A dialog will appear showing a preview of the contents of wine.data.

"Preview output" dialog

As we can see, this file has no header row and its values are delimited by commas (,). This matches the current configuration of our CSV reader, so it is ready to process the file.
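For readers who want to inspect the data outside Protopipe, the following is a rough sketch of what the Read as CSV card produces with this configuration (comma delimiter, no header row), written with Python's standard `csv` module. It is not Protopipe's implementation: the function name `read_csv_table` is hypothetical, and the two sample rows are made-up values in wine.data's 14-column layout.

```python
import csv
import io


def read_csv_table(text: str, delimiter: str = ",") -> list[list[float]]:
    """Parse header-less CSV text into a 2D table (a list of rows) of floats."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip empty rows, e.g. the one produced by a blank line at the end of a file.
    return [[float(value) for value in row] for row in reader if row]


# Two made-up rows in the same 14-column layout as wine.data (class first).
sample = (
    "1,13.0,2.0,2.4,18.0,100,2.5,2.6,0.3,1.6,5.0,1.0,3.0,1000\n"
    "3,12.5,3.5,2.4,21.0,95,1.7,0.8,0.4,1.2,7.5,0.7,1.7,600\n"
)
table = read_csv_table(sample)
print(len(table), len(table[0]))  # 2 14
```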

Select the Read as CSV card and press the Preview output button on the top bar. The preview dialog will show a table this time.

"Preview output" dialog

Now we need to split the input and output of the training table. Press the Create card button, navigate to Tables and press Split into X and Y by columns. A new Split into X and Y by columns card will appear in the blueprint.

"Split into X and Y by columns" card

Connect the Table output from Read as CSV to the Data input of Split into X and Y by columns.

Current pipeline

We want to predict the value of the 1st column of the table (index 0) using columns 2 to 14 (indexes 1 to 13) as input for our model. Set X column(s) to “1:13” and Y column(s) to “0” in Split into X and Y by columns.
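The split step selects columns by index. The sketch below reads “1:13” as an inclusive range, matching the wording “indexes 1 to 13” above (unlike a Python slice, whose end is exclusive). The helpers `parse_columns` and `split_x_y` are hypothetical illustrations, not Protopipe functions, and the range grammar is an assumption.

```python
def parse_columns(spec: str) -> list[int]:
    """Turn a spec like "1:13" (inclusive range), "0", or "0, 2:3" into column indexes.

    Hypothetical helper: the inclusive reading of "a:b" follows this tutorial's
    wording, not a documented Protopipe grammar.
    """
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            start, end = (int(p) for p in part.split(":"))
            indexes.extend(range(start, end + 1))  # include the end index
        else:
            indexes.append(int(part))
    return indexes


def split_x_y(table, x_spec: str, y_spec: str):
    """Return (X, Y) tables holding the selected columns of every row."""
    x_cols, y_cols = parse_columns(x_spec), parse_columns(y_spec)
    x = [[row[c] for c in x_cols] for row in table]
    y = [[row[c] for c in y_cols] for row in table]
    return x, y


row = list(range(14))  # stand-in row: column i holds the value i
x, y = split_x_y([row], "1:13", "0")
print(len(x[0]), y[0])  # 13 [0]
```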

"Split into X and Y by columns" card configured

To obtain better results, it is recommended to normalize the data before feeding it to a neural network. In our case, we will standard scale all the inputs, so that each input column has zero mean and unit variance.

Press the Create card button, navigate to Normalization and press Standard scale columns. A new Standard scale columns card will appear in the blueprint.

"Standard scale columns" card

Connect the X output from Split into X and Y by columns to the Data input of Standard scale columns.

Current pipeline

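Set Column(s) to “0:12” in Standard scale columns. After the split, the X table has its own 13 columns, indexed 0 to 12, so this range covers every input.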
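Standard scaling maps each value x of a column to (x − μ)/σ, where μ and σ are that column's mean and standard deviation. The following plain-Python sketch assumes the population standard deviation (dividing by n, as scikit-learn's `StandardScaler` does). Protopipe's exact choice, and how it handles constant columns, are not stated in this tutorial, and the function name is hypothetical.

```python
import math


def standard_scale_columns(table, columns):
    """Rescale the given columns to zero mean and unit (population) variance."""
    scaled = [list(row) for row in table]  # copy so the input table is untouched
    n = len(table)
    for c in columns:
        values = [row[c] for row in table]
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        for row in scaled:
            # A constant column has std == 0: map it to 0.0 instead of dividing by zero.
            row[c] = (row[c] - mean) / std if std > 0 else 0.0
    return scaled


x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(standard_scale_columns(x, range(2)))
```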

"Standard scale columns" card configured

Since the dataset assigns one of three classes to each wine, our neural network will have 3 outputs. For a given input, they will represent the degree of membership in each class.

At this point the Y output of Split into X and Y by columns is a table with a single column of integer values between 1 and 3 (inclusive). We need to one-hot encode it, so that the result is a table with three columns containing only 0s and 1s.
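One-hot encoding turns each class label into a row containing a single 1, in the position belonging to that label. This sketch assumes the three output columns follow ascending label order (1, 2, 3). The function name is hypothetical, and Protopipe's column order is not stated here.

```python
def one_hot_encode(column_values):
    """Map each class label to a row with a single 1 in that label's position."""
    categories = sorted(set(column_values))  # e.g. [1, 2, 3] for wine.data
    position = {label: i for i, label in enumerate(categories)}
    encoded = []
    for label in column_values:
        row = [0] * len(categories)
        row[position[label]] = 1
        encoded.append(row)
    return encoded


print(one_hot_encode([1, 3, 2, 1]))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```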

Press the Create card button, navigate to Tables and press One-hot encode columns. A new One-hot encode columns card will appear in the blueprint.

"One-hot encode columns" card

Connect the Y output from Split into X and Y by columns to the Data input of One-hot encode columns.

Current pipeline

Set Column(s) to “0” in One-hot encode columns.

"One-hot encode columns" card configured

Our data is ready. Now we can start working on the training part of the pipeline.

4. Training the model

First of all we need to create our neural network. Press the Create card button, navigate to Models ≫ Neural networks and press Create neural network. A new Create neural network card will appear in the blueprint.

"Create neural network" card

Press the Edit button in the card to open the neural network editor.

Neural network editor

Press the Create card button in the neural network editor, navigate to Layers and press Dense layer. A new Dense layer card will appear in the neural network editor.

"Dense layer" card

Connect the Data output from Input to the Input input of Dense layer.

Current pipeline

Our neural network has 13 inputs, so set Shape to “13” in Input.

"Input" card configured

Our neural network has 3 outputs, so set Units to “3” and Activation to “Sigmoid” in Dense layer.
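The configured layer computes y = sigmoid(Wx + b), where W is a 3×13 weight matrix and b is a vector of 3 biases. It therefore has 13 · 3 + 3 = 42 trainable parameters. Sigmoid maps each output into (0, 1) independently, which fits the degree-of-membership reading. Unlike softmax, it does not force the three outputs to sum to 1. The sketch below is a plain-Python forward pass. The small random initial weights are an assumption, since Protopipe's initializer is not stated here.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def dense_forward(weights, biases, x):
    """Compute sigmoid(W·x + b) for one input vector x.

    `weights` holds one row of len(x) coefficients per output unit.
    """
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


n_inputs, n_units = 13, 3
rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_units)]
biases = [0.0] * n_units
print(sum(len(r) for r in weights) + len(biases))  # 42 trainable parameters
print(dense_forward(weights, biases, [0.0] * n_inputs))  # [0.5, 0.5, 0.5]
```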

"Dense layer" card configured

Finally, connect the Output output from Dense layer to the Data input of Output.

Current pipeline

Our neural network is complete. Press Save to close the dialog.

Press the Create card button, navigate to Models ≫ Neural networks and press Train neural network. A new Train neural network card will appear in the blueprint.

"Train neural network" card

We will train for 10 epochs, holding out 15% of the dataset for validation. Set Validation split to “0.15” and Epochs to “10” in Train neural network.
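wine.data has 178 rows, so a 0.15 split holds out about 26 of them. The file lists its rows grouped by class (class 1 first, then 2, then 3). A split that simply took the last 15% without shuffling would therefore validate only on class 3 wines. This tutorial does not say whether Protopipe shuffles before splitting. The hypothetical sketch below shuffles with a fixed seed, then holds out `int(n * fraction)` rows while keeping each X row paired with its Y row.

```python
import random


def validation_split(x, y, fraction: float, seed: int = 0):
    """Shuffle row indexes, then hold out int(len(x) * fraction) rows for validation."""
    indexes = list(range(len(x)))
    random.Random(seed).shuffle(indexes)
    n_val = int(len(x) * fraction)

    def take(rows, idx):
        return [rows[i] for i in idx]

    val_idx, train_idx = indexes[:n_val], indexes[n_val:]
    return (take(x, train_idx), take(y, train_idx)), (take(x, val_idx), take(y, val_idx))


# Stand-in data with the same number of rows as wine.data (178).
x = [[float(i)] for i in range(178)]
y = [[i % 3] for i in range(178)]
(x_train, y_train), (x_val, y_val) = validation_split(x, y, 0.15)
print(len(x_train), len(x_val))  # 152 26
```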

"Train neural network" card configured

As we can see, we have to provide an optimizer and a loss function for our training.

Press the Create card button, navigate to Models ≫ Neural networks ≫ Optimizers and press Create SGD neural network optimizer. A new Create SGD neural network optimizer card will appear in the blueprint.

"Create SGD neural network optimizer" card

Press the Create card button, navigate to Models ≫ Neural networks ≫ Losses and press Create neural network loss function. A new Create neural network loss function card will appear in the blueprint.

"Create neural network loss function" card

Now let’s connect all the inputs:

Connect the Neural network output from Create neural network to the Neural network input of Train neural network.
Connect the Scaled data output from Standard scale columns to the Training X input of Train neural network.
Connect the Encoded data output from One-hot encode columns to the Training Y input of Train neural network.
Connect the Optimizer output from Create SGD neural network optimizer to the Optimizer input of Train neural network.
Connect the Function output from Create neural network loss function to the Loss input of Train neural network.
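Train neural network combines these inputs into a training loop. The sketch below shows one plausible version for our single sigmoid layer: mini-batch SGD that returns the validation loss. This tutorial does not state which loss the Create neural network loss function card uses by default. The sketch assumes binary cross-entropy (summed over the outputs, averaged over samples), which pairs naturally with sigmoid outputs and one-hot targets. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w, b, x):
    """Single dense layer with sigmoid activation."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bj) for row, bj in zip(w, b)]


def bce_loss(w, b, xs, ys, eps=1e-12):
    """Binary cross-entropy summed over outputs, averaged over samples."""
    total = 0.0
    for x, y in zip(xs, ys):
        for p, t in zip(forward(w, b, x), y):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(xs)


def train_sgd(x_train, y_train, x_val, y_val, batch_size, learning_rate, epochs, seed=0):
    """Mini-batch SGD for one sigmoid dense layer; returns the validation loss."""
    rng = random.Random(seed)
    n_in, n_out = len(x_train[0]), len(y_train[0])
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    order = list(range(len(x_train)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = [[0.0] * n_in for _ in range(n_out)]
            grad_b = [0.0] * n_out
            for i in batch:
                p = forward(w, b, x_train[i])
                for j in range(n_out):
                    # For sigmoid + binary cross-entropy, dLoss/dz_j simplifies to p_j - y_j.
                    delta = p[j] - y_train[i][j]
                    grad_b[j] += delta
                    for k in range(n_in):
                        grad_w[j][k] += delta * x_train[i][k]
            # Plain SGD step using the gradient averaged over the mini-batch.
            for j in range(n_out):
                b[j] -= learning_rate * grad_b[j] / len(batch)
                for k in range(n_in):
                    w[j][k] -= learning_rate * grad_w[j][k] / len(batch)
    return bce_loss(w, b, x_val, y_val)
```

With this tutorial's settings, a call would look like `train_sgd(x_train, y_train, x_val, y_val, batch_size=16, learning_rate=0.08, epochs=10)`. Batch size and learning rate are exactly the two arguments we are about to turn into parameters.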

Current pipeline

As we can see, the training process of a neural network depends on several parameters that can be tuned (e.g., batch size, epochs, learning rate, number of layers, number of units in each layer). We do not know which combination of values leads to the best accuracy, or what effect each parameter has on the overall performance, but that is not a problem: Protopipe has a way of answering this kind of question.

In this tutorial we will analyze the effect of the batch size and the learning rate on the training process.

Press the Create card button, navigate to Parameters and press Integer parameter. A new dialog will appear asking for the name of this parameter.

Dialog asking for the name of the parameter

Write “Batch size” and press Set. A new Integer parameter card will appear in the blueprint.

Connect the Value output from Integer parameter to the Batch size input of Train neural network.

Current pipeline

We must specify a domain of possible values for this parameter. In this tutorial we will try 8, 16 and 32, so set Domain to “8, 16, 32” in Integer parameter.

"Integer parameter" card configured

Now we will do the same for learning rate. Press the Create card button, navigate to Parameters and press Float parameter. Name this parameter “Learning rate”.

Connect the Value output from Float parameter to the Learning rate input of Create SGD neural network optimizer.

Current pipeline

The domain of possible values for this parameter will be between 0.01 and 0.15, so set Domain to “0.01:0.15” in Float parameter.

"Float parameter" card configured

In order to store the validation loss in our final report, we need to return it. Press the Create card button, navigate to Returns and press Return float. A new dialog will appear asking for the name of this return value.

Write “Validation loss” and press Set. A new Return float card will appear in the blueprint.

Finally, connect the Validation loss output from Train neural network to the Value input of Return float.

Current pipeline

We are ready to run the pipeline.

5. Running the pipeline

Press the Fine tune settings button on the top bar.

"Fine tune settings" button

A new panel will appear on the right side of the screen.

"Fine tune settings" panel

In Assignation of the values, choose “Brute-force”, then set “5 samples” for the Learning rate float parameter.
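Brute-force assignation presumably runs every combination of parameter values: 3 batch sizes × 5 learning rates = 15 experiments. The sketch below enumerates that grid with `itertools.product`. It assumes the 5 samples are evenly spaced over 0.01:0.15 with both ends included (0.01, 0.045, 0.08, 0.115, 0.15). Protopipe's actual sampling rule is not stated in this tutorial, and `linspace` here is a hypothetical helper.

```python
from itertools import product


def linspace(start: float, stop: float, samples: int) -> list[float]:
    """Evenly spaced values including both ends (assumed sampling rule)."""
    if samples == 1:
        return [start]
    step = (stop - start) / (samples - 1)
    return [start + i * step for i in range(samples)]


batch_sizes = [8, 16, 32]                 # Domain "8, 16, 32"
learning_rates = linspace(0.01, 0.15, 5)  # Domain "0.01:0.15", 5 samples
experiments = [
    {"Batch size": bs, "Learning rate": lr}
    for bs, lr in product(batch_sizes, learning_rates)
]
print(len(experiments))  # 15
```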

"Fine tune settings" panel

Press the Start processing button to run all the experiments.

"Start processing" button

A new panel will appear on the right side of the screen, showing real-time information about the state of the processing.

"Processing" panel

When processing finishes successfully, a new dialog will appear.

"Processing finished" dialog

Press See report to open the Reports screen, which contains a table summarizing all the experiments performed.

Table of performed experiments

Sort the table by “Validation loss” in ascending order to see which model performed best: a lower validation loss is better, so the best model will appear first.
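The same ranking can be reproduced on an exported copy of the report table. Because lower loss is better, the sort is ascending. In the sketch below, the rows are placeholders for illustration only, not results from this tutorial, and `rank_by` is a hypothetical helper.

```python
def rank_by(rows, key, lower_is_better=True):
    """Sort experiment rows so that the best value of `key` comes first."""
    return sorted(rows, key=lambda row: row[key], reverse=not lower_is_better)


# Placeholder rows for illustration only; they are not results from this tutorial.
rows = [
    {"Batch size": 8, "Learning rate": 0.01, "Validation loss": 0.9},
    {"Batch size": 32, "Learning rate": 0.15, "Validation loss": 0.2},
    {"Batch size": 16, "Learning rate": 0.08, "Validation loss": 0.5},
]
best = rank_by(rows, "Validation loss")[0]
print(best["Batch size"], best["Learning rate"])  # 32 0.15
```

For a metric where higher is better, such as accuracy, pass `lower_is_better=False`.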

Table of performed experiments sorted by validation loss

6. Analysis

On the left side panel, under the most recent report node, press Cross-sectional analysis.

Cross-sectional analysis screen

In this screen you can compare the effect of a parameter (X axis) on a return value (Y axis).

For example, here we can see how the Learning rate affects Validation loss:
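Each learning rate was tried with three batch sizes. One way to reduce the report table to a single value per learning rate is therefore to average over the other parameter, as in the sketch below. This tutorial does not say whether Protopipe's chart averages like this or plots every experiment as its own point. The rows are placeholders, not real results, and `cross_section` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def cross_section(rows, parameter, value):
    """Average `value` over all experiments that share each value of `parameter`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[parameter]].append(row[value])
    return {p: mean(vals) for p, vals in sorted(groups.items())}


# Placeholder rows for illustration only.
rows = [
    {"Batch size": 8, "Learning rate": 0.01, "Validation loss": 0.8},
    {"Batch size": 16, "Learning rate": 0.01, "Validation loss": 0.6},
    {"Batch size": 8, "Learning rate": 0.15, "Validation loss": 0.3},
    {"Batch size": 16, "Learning rate": 0.15, "Validation loss": 0.5},
]
print(cross_section(rows, "Learning rate", "Validation loss"))
```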

Learning rate vs. Validation loss

7. Conclusion

In this tutorial we designed a pipeline for training and testing a classifier and analyzed the obtained results after performing multiple experiments with different combinations of hyperparameter values.

This tutorial can be extended by fine-tuning other parameters or by generating reports with the accuracy obtained for each class separately.