Artificial neural network (ANN) |
Technique built from a
network of interconnected nodes (neurons) that process signals over
weighted arcs. Central to deep learning, with a wide range of complex
variations |
Convolutional neural network (CNN) |
Class of neural networks that is
characterized by convolution filters that slide over input data to
extract relevant patterns. Frequently used in deep learning-based
image analysis |
Decision tree |
One of the most popular ML algorithms; it learns to
split data on conditions of variables to classify or predict an
outcome variable, creating a hierarchical tree shape with predictions
in leaf nodes |
Generative adversarial network (GAN) |
Class of deep neural networks for
the generation of new data samples. GANs have formed a rapidly advancing
field since their introduction in 2014, used in applications such as
deepfakes |
Gradient boosting |
Machine learning model type that uses an ensemble of
weak prediction models (often decision trees), optimized over a
differentiable loss function. XGBoost and LightGBM are popular
algorithms in this family |
k-means |
Unsupervised clustering method that aims to partition
observations into k clusters, where each observation is mapped to the
closest cluster centroid |
LightGBM |
Popular ML algorithm of relatively recent origin (2016),
with performance similar to XGBoost but more efficient training due to
an improved decision tree splitting strategy |
Natural language processing (NLP) |
The discipline in AI concerned with the
understanding of written and spoken human language |
Overfitting |
Situation in which a model captures the training data too closely,
thereby hindering generalization and prediction on future
data |
Principal component analysis (PCA) |
Dimensionality reduction technique
that uses a linear transformation to map data to fewer dimensions than
the initial data |
Random forest |
Popular ML algorithm that builds an ensemble of decision
trees, improving on the performance and generalizability of single
decision trees |
Support vector machine (SVM) |
Supervised model that aims to find the
hyperplane that best separates different categories of
observations |
Tabular data |
Data that is organized in a table with rows and
columns |
Transfer learning |
Technique to improve model learning by leveraging
knowledge gained on a related problem. Often used for recalibrating
large-scale pre-trained deep learning models |
Unstructured data |
Data that has an internal structure but one that is
not represented in a row-column table, such as images, text and
audio |
XGBoost |
Popular ML algorithm that uses gradient boosting and builds
decision trees iteratively, often delivering best-in-class performance
and fast model training |
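
As an illustration of the k-means entry above, the idea of mapping each observation to its closest cluster centroid might be sketched as follows. This is a minimal sketch in plain Python, not a production implementation: the function names (`assign_to_centroids`, `update_centroids`, `kmeans`), the toy data, and the fixed initial centroids are all hypothetical choices made for illustration, and Euclidean distance is assumed.

```python
import math


def assign_to_centroids(points, centroids):
    """For each point, return the index of the closest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its assigned points.

    A centroid with no assigned points is left where it is.
    """
    updated = []
    for j, centroid in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            updated.append(tuple(centroid))
    return updated


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign_to_centroids(points, centroids)
    for _ in range(max_iter):
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
        labels = assign_to_centroids(points, centroids)
    return labels, centroids


# Hypothetical toy data: two visually obvious groups in 2D, k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]

labels, centroids = kmeans(points, initial_centroids)
print(labels)
print(centroids)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the result can depend on that choice; the fixed starting points here only keep the sketch deterministic.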