Machine Learning Workflows and Tools – II
  • Jordi Llobet
  • 2025 Mar 05

3.- Machine Learning Workflows

A Machine Learning workflow consists of three phases:

  • Classifier Definition: Implement the feature representation of the training data and select the most appropriate type of classifier for the problem we want to solve.
  • Evaluation: Select the criterion that defines whether the classifier is optimal, for example the percentage of correct predictions in the testing phase.
  • Optimization: Find the parameter values that give the best classifier according to the evaluation criterion chosen in the previous phase.

Figure 3 - Machine Learning Workflows
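
As a minimal sketch of the three phases, the following example uses scikit-learn. The data set (iris), the classifier type (k-nearest neighbors) and the candidate parameter values are assumptions chosen only for illustration; they are not taken from the workflow described in this article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data set bundled with scikit-learn (an assumption, not from the article).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 1. Classifier definition: choose the type of classifier.
classifier = KNeighborsClassifier()

# 2. Evaluation: the criterion is accuracy, the fraction of correct predictions.
# 3. Optimization: search for the parameter value that maximizes that criterion,
#    using cross-validation on the training data only.
search = GridSearchCV(
    classifier,
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
# Accuracy of the best classifier on data it has never seen.
test_accuracy = search.score(X_test, y_test)
print(f"best n_neighbors: {best_k}, test accuracy: {test_accuracy:.2f}")
```

The test set is kept out of the parameter search so that the final accuracy is measured on instances the classifier did not see during optimization.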

4.- Feature Representation in Machine Learning

Features are the attributes that describe each instance we use as training data. Once a model has been trained with Machine Learning, we pass it the features of a new instance to obtain the label predicted for that instance.

Feature Representation Examples:

  • If we want to classify which emails may be spam, a Feature Representation could be a list of words with an attribute indicating how frequently those words appear in the email.
  • For image classification the Feature Representation could be a matrix with the color of each pixel.
  • If we want to classify types of fish (or types of fruit as in the diagram in figure 1) a Feature Representation could be a set of attributes with their values.

Figure 4 - Example of feature representation

5.- Python Tools for Machine Learning

Python is one of the most widely used languages for implementing Machine Learning models. The main Python libraries used for Machine Learning are the following:

scikit-learn:

An open source library that unifies the main Machine Learning algorithms and functions under a single framework. This greatly simplifies every stage of creating, evaluating and optimizing predictive models.

SciPy Library:

Provides a variety of useful scientific computing tools, including statistical distributions, function optimization, linear algebra, and a range of specialized mathematical functions. For scikit-learn, it also provides support for sparse matrices, a way of storing large tables consisting mostly of zeros.
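
As a small illustration of sparse storage, the sketch below converts a mostly-zero table to SciPy's CSR format. The values are invented, and CSR is just one of the sparse formats SciPy offers.

```python
import numpy as np
from scipy import sparse

# A small table that is mostly zeros (values invented for illustration).
dense = np.array([
    [0, 0, 3],
    [0, 0, 0],
    [1, 0, 0],
])

# The sparse matrix stores only the non-zero entries and their positions.
matrix = sparse.csr_matrix(dense)
print(matrix.shape)  # same logical shape as the dense table
print(matrix.nnz)    # number of values actually stored

# Converting back recovers the original table.
restored = matrix.toarray()
```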

NumPy Library:

Provides fundamental data structures used by scikit-learn, particularly multidimensional arrays. In general, data passed to scikit-learn is in the form of a NumPy array.
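
A minimal sketch of NumPy arrays as scikit-learn input: a two-dimensional array with one row per instance and one column per feature, plus a one-dimensional array of labels. The fruit measurements and the choice of a decision tree are assumptions made only for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical fruit data: columns are mass in grams and width in centimeters.
X = np.array([
    [150.0, 7.0],
    [170.0, 7.5],
    [110.0, 6.0],
    [120.0, 6.2],
])
# One label per row of X (0 and 1 stand for two invented fruit types).
y = np.array([0, 0, 1, 1])

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# A new instance must have the same number of features as the training rows.
new_fruit = np.array([[160.0, 7.2]])
prediction = model.predict(new_fruit)
print(prediction)
```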

Pandas Library:

Provides key data structures such as the DataFrame. Additionally, it supports reading and writing data in different formats.
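
The sketch below builds a DataFrame, writes it as CSV and reads it back. The column names and values are invented, and an in-memory buffer stands in for a file on disk so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical table: one row per instance, one column per attribute.
fruits = pd.DataFrame({
    "mass_g": [150, 110],
    "width_cm": [7.0, 6.0],
    "label": ["apple", "mandarin"],
})

# Write the table as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
fruits.to_csv(buffer, index=False)

# Read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

# Numeric columns can be handed to scikit-learn as a NumPy array.
features = restored[["mass_g", "width_cm"]].to_numpy()
print(restored)
```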

Finally, the following libraries for graphical data representation:
