For a complete guide to Feature Selection & Feature Engineering in one book, you can check this link.
Simple models are easier to interpret: It’s much easier to understand the output of a model that uses 10 variables than one that uses 100.
Shorter training time: Reducing the number of variables lowers the computational cost and speeds up model training; perhaps most importantly, simpler models also tend to make faster predictions.
Enhanced generalization by reducing overfitting: Oftentimes, many of the variables are just noise with little predictive value. The ML model nevertheless learns from this noise, which causes overfitting and hurts generalization. By eliminating these irrelevant, noisy features, we can substantially improve the generalization of ML models.
Variable redundancy: Features in a dataset are frequently highly correlated, and highly correlated features provide essentially the same information, which makes them redundant. In cases like these, we can keep just one of them and remove the rest without losing any information; a simple correlation screen, like the one sketched below, does exactly that. Less redundant data also means less opportunity for the model to base its predictions on noise.
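To make the redundancy point concrete, here is a minimal sketch of such a correlation screen. It assumes a pandas DataFrame of numeric features; the 0.9 threshold and the toy data are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop features whose absolute correlation with an earlier feature exceeds the threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each feature pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Toy example: 'b' is a scaled copy of 'a', so it is redundant and gets dropped
X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [5, 1, 4, 2]})
print(drop_correlated(X).columns.tolist())  # ['a', 'c']
```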
For the rest of the articles in this series, we’ll assume that you’ve worked through all the necessary feature cleaning and engineering steps on your dataset.
Feature selection is the process of selecting a subset of the existing features and excluding the rest, without modifying the features themselves.
Dimensionality reduction, on the other hand, transforms the features into a lower-dimensional space. In essence, it creates an entirely new feature space that approximates the original one, but with fewer dimensions.
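The difference is easy to see in code. This short sketch, assuming scikit-learn and using its built-in iris data purely for illustration, keeps two of the original columns via feature selection, while PCA-based dimensionality reduction builds two entirely new columns from all of the originals.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keeps 2 of the 4 original columns, unmodified
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Dimensionality reduction: creates 2 brand-new features as combinations of all 4
X_reduced = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_reduced.shape)  # (150, 2) (150, 2)
```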
Any feature selection method can be seen as the combination of a search technique that proposes candidate feature subsets and an evaluation measure that scores how well each of those subsets performs, as sketched below.
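As a minimal illustration of those two components, the sketch below uses an exhaustive search to propose subsets and cross-validated accuracy as the evaluation measure. The logistic-regression model and the iris data are arbitrary choices, and exhaustive search is only feasible for a handful of features.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_subset, best_score = None, 0.0
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):   # search: propose a subset
        score = cross_val_score(                         # evaluation: score it
            LogisticRegression(max_iter=1000), X[:, list(subset)], y, cv=5
        ).mean()
        if score > best_score:
            best_subset, best_score = subset, score

print(best_subset, round(best_score, 3))
```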
Filter methods: Rely on the characteristics of the features themselves, without using any machine learning algorithm. They are well suited for a quick screening and removal of irrelevant features.
Wrapper methods: Treat the selection of a set of features as a search problem and use a predictive machine learning algorithm to select the best feature subset. In essence, these methods train a new model on each candidate subset, which makes them very computationally expensive. In return, they find the best-performing feature subset for the given machine learning algorithm.
Embedded methods: Like wrapper methods, embedded methods take the interaction between features and model into consideration, but they perform feature selection as part of the model training process itself, which makes them less computationally expensive. Minimal sketches of each of the three approaches follow.
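First, a filter-method sketch, assuming scikit-learn: each feature is scored against the target on its own (here with mutual information, an arbitrary choice) and only the top-scoring ones are kept; no model is trained at any point.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature independently against the target and keep the best 2
selector = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)
print(selector.scores_)                     # per-feature relevance scores
print(selector.get_support(indices=True))   # indices of the retained features
```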
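Next, a wrapper-method sketch: sequential forward selection trains the chosen estimator on each candidate subset and keeps the subset that scores best under cross-validation. The logistic-regression estimator and the subset size of two are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Greedily add one feature at a time, refitting and re-scoring the model each step
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",
    cv=5,
).fit(X, y)
print(sfs.get_support(indices=True))  # indices of the selected features
```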
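Finally, an embedded-method sketch: an L1-regularized logistic regression performs selection while it trains, driving the coefficients of unhelpful features to exactly zero, and SelectFromModel keeps the survivors. The penalty strength C=0.1 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # L1 penalties are sensitive to feature scale

# The L1 penalty zeroes out weak coefficients during training itself
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X, y)
print(selector.get_support(indices=True))  # features with non-zero coefficients
```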