Despite its name, Logistic Regression is not a regression algorithm; the name does contain the word "Regression", but it is a classification algorithm.
Note: X0 is always 1; we will see later why.
(Figure: the sigmoid function)
Apart from the sigmoid wrapper, the hypothesis function is very similar to that of Linear Regression. Let's now move ahead and define the cost function for Logistic Regression. There's a negative sign in the cost function because during training we want the probabilities to be large, but we express the objective as a cost to be minimized: minimizing the loss is the same as maximizing the log probability.
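For reference, here are the hypothesis and cost function described above written out explicitly (standard logistic-regression notation, matching the code further down; x0 = 1 carries the bias term, as noted earlier):

% Sigmoid and hypothesis
g(z) = \frac{1}{1 + e^{-z}}, \qquad
h_\theta(x) = g(\theta^T x) = g(\theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n), \quad x_0 = 1

% Cross-entropy cost over m training examples (the negative sign turns
% "maximize the log probability" into "minimize the cost")
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]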
The calculation of the gradients from this cost function is demonstrated in the 2nd article.
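For completeness, the gradient of this cost with respect to each parameter, which is exactly what the calculate_cost method below computes:

\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}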
With the sigmoid function, the cost, and the gradients in place, let's now implement the LogisticRegression class using Python. Note: you can find all the code for this article here. It's highly recommended to follow the Jupyter notebook while going through this section.
Before the class definition, recall the linear form y = wx + b. The bias b gets handled by an extra column of 1s in the feature matrix: once that column is in place, the matrix multiplication of the feature and parameter arrays includes the bias automatically (you can see this in the fit method below, where the column of 1s is prepended).

import numpy as np

class LogisticRegression:
def __init__(self) -> None:
self.X = None
self.Y = None
self.parameters = None
self.cost_history = []
self.mu = None
self.sigma = None
def sigmoid(self, x):
z = np.array(x)
g = np.zeros(z.shape)
g = 1/(1 + np.exp(-z) )
return g
def calculate_cost(self):
"""
Returns the cost and gradients.
parameters: None
Returns:
            cost : Calculated loss (scalar).
gradients: array containing the gradients w.r.t each parameter
"""
m = self.X.shape[0]
z = np.dot(self.X, self.parameters)
z = z.reshape(-1)
z = z.astype(np.float128, copy=False)
y_hat = self.sigmoid(z)
cost = -1 * np.mean(self.Y*(np.log(y_hat)) + (1-self.Y)*(np.log(1-y_hat)))
gradients = np.zeros(self.X.shape[1])
for n in range(len(self.parameters)):
temp = np.mean((y_hat-self.Y)*self.X[:,n])
gradients[n] = temp
# Vectorized form
        # gradients = np.dot(self.X.T, (y_hat - self.Y)) / m
return cost, gradients
def init_parameters(self):
"""
Initialize the parameters as array of 0s
parameters: None
Returns: None
"""
self.parameters = np.zeros((self.X.shape[1],1))
def feature_normalize(self, X):
"""
Normalize the samples.
parameters:
X : input/feature matrix
Returns:
X_norm : Normalized X.
"""
X_norm = X.copy()
mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)
self.mu = mu
self.sigma = sigma
for n in range(X.shape[1]):
X_norm[:,n] = (X_norm[:,n] - mu[n]) / sigma[n]
return X_norm
def fit(self, x, y, learning_rate=0.01, epochs=500, is_normalize=True, verbose=0):
"""
        Iterates and finds the optimal parameters for the input dataset.
parameters:
x : input/feature matrix
y : target matrix
learning_rate: between 0 and 1 (default is 0.01)
epochs: number of iterations (default is 500)
is_normalize: boolean, for normalizing features (default is True)
            verbose: print the cost after every 'verbose' epochs (0 disables printing)
Returns:
parameters : Array of optimal value of weights.
"""
self.X = x
self.Y = y
self.cost_history = []
if self.X.ndim == 1: # adding extra dimension, if X is a 1-D array
self.X = self.X.reshape(-1,1)
is_normalize = False
if is_normalize:
self.X = self.feature_normalize(self.X)
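        # prepend a column of 1s so that the bias term b (from y = wx + b) is learned through the matrix multiplication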
self.X = np.concatenate([np.ones((self.X.shape[0],1)), self.X], axis=1)
self.init_parameters()
for i in range(epochs):
cost, gradients = self.calculate_cost()
self.cost_history.append(cost)
self.parameters -= learning_rate * gradients.reshape(-1,1)
if verbose:
if not (i % verbose):
print(f"Cost after {i} epochs: {cost}")
return self.parameters
def predict(self,x, is_normalize=True, threshold=0.5):
"""
Returns the predictions after fitting.
parameters:
x : input/feature matrix
Returns:
predictions: Array of predicted target values.
"""
x = np.array(x, dtype=np.float64) # converting list to numpy array
if x.ndim == 1:
x = x.reshape(1,-1)
if is_normalize:
for n in range(x.shape[1]):
x[:,n] = (x[:,n] - self.mu[n]) / self.sigma[n]
x = np.concatenate([np.ones((x.shape[0],1)), x], axis=1)
return [1 if i > threshold else 0 for i in self.sigmoid(np.dot(x,self.parameters))]
sigmoid: We added this new method to compute the sigmoid of the continuous values generated by the linear hypothesis, i.e. θᵀX, in order to get probabilities.
calculate_cost: The definition of this method changes because our cost function has changed too; if you are familiar with the formulas above and the numpy library, it won't be difficult to understand.
predict: This method takes the input and returns an array of predicted values, 0 or 1. There's an extra parameter threshold with a default value of 0.5: if the predicted probability is greater than the threshold it predicts 1, otherwise 0. You can change this threshold according to your confidence level.
All the code and implementations are provided in this Jupyter notebook; follow it for a better understanding of this section.
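As a quick illustration (the real run lives in the notebook), here is a minimal sketch of how the class above might be used; the toy data and hyperparameters below are made up purely for demonstration:

import numpy as np

# hypothetical toy data: two exam scores per student, label 1 = admitted
X = np.array([[34.0, 78.0],
              [60.0, 86.0],
              [79.0, 75.0],
              [45.0, 56.0],
              [61.0, 96.0],
              [35.0, 44.0]])
y = np.array([0, 1, 1, 0, 1, 0])

model = LogisticRegression()          # the class defined above
parameters = model.fit(X, y, learning_rate=0.1, epochs=1000, verbose=200)
print(parameters.ravel())             # learned weights (bias first)
print(model.predict([[50.0, 80.0]]))  # e.g. [1] or [0], depending on the fit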
Now let's create an instance of our LogisticRegression class and try it out (the full run is in the notebook). You can also compute the optimal parameters with the scipy.optimize.minimize function by passing the cost function into it. I passed costFunction, initial_theta (initially all 0s) and my X and Y as arguments. It easily calculated the optimal parameters in 0.3 seconds, much faster than gradient descent, which took about 6.5 seconds. Note: costFunction is similar to what we have in our class method calculate_cost; I just put it outside the class to show you how the scipy.optimize.minimize function works.
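The standalone costFunction itself only appears in the notebook, so the snippet below is just a sketch of what such a call could look like: the function body mirrors calculate_cost, and the jac=True and method="TNC" choices are my own assumptions rather than what the article necessarily used.

import numpy as np
from scipy.optimize import minimize

def costFunction(theta, X, Y):
    # assumes X already contains the leading column of 1s
    y_hat = 1.0 / (1.0 + np.exp(-X.dot(theta)))
    cost = -np.mean(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat))
    gradients = X.T.dot(y_hat - Y) / X.shape[0]
    return cost, gradients

# initial_theta = np.zeros(X.shape[1])
# result = minimize(costFunction, initial_theta, args=(X, Y), jac=True, method="TNC")
# optimal_theta = result.x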
That's how we used our LogisticRegression class on the student's dataset. Let's move ahead and understand the problem of overfitting in the next section. Till then, take a short 5-minute break.

For this problem we use a helper function mapFeature that takes the individual features as input and returns new transformed features. If you want to know how it works, refer to the notebook; it's recommended to follow it while reading this article.
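The actual mapFeature lives in the notebook; as an illustration only, here is one common way such a mapping is written, assuming two input features expanded into polynomial terms up to a chosen degree (the degree and the exact ordering of terms are assumptions on my part):

import numpy as np

def map_feature(x1, x2, degree=6):
    # Illustrative polynomial feature mapping for two input features.
    # Returns x1, x2, x1^2, x1*x2, x2^2, ..., x2^degree (no column of 1s,
    # because fit() above already prepends the intercept column).
    x1 = np.atleast_1d(x1).astype(float)
    x2 = np.atleast_1d(x2).astype(float)
    columns = []
    for i in range(1, degree + 1):
        for j in range(i + 1):
            columns.append((x1 ** (i - j)) * (x2 ** j))
    return np.stack(columns, axis=1)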
The calculation of the gradients from the regularized cost function is demonstrated in the 2nd article.
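For reference, the regularized cost and gradients that the modified calculate_cost below implements; note that the bias parameter θ0 is not regularized:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
           + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_0^{(i)}

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)} + \frac{\lambda}{m}\,\theta_j \qquad (j \ge 1)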
To add regularization, we only need to change the calculate_cost method, because only this method is responsible for calculating both the cost and the gradients. The modified version is shown below:

class RegLogisticRegression:
def __init__(self) -> None:
self.X = None
self.Y = None
self.parameters = None
self.cost_history = []
self.mu = None
self.sigma = None
def sigmoid(self, x):
z = np.array(x)
g = np.zeros(z.shape)
g = 1/(1 + np.exp(-z) )
return g
def sigmoid_derivative(self, x):
derivative = self.sigmoid(x) * (1 - self.sigmoid(x))
return derivative
def calculate_cost(self, lambda_):
"""
Returns the cost and gradients.
        parameters:
            lambda_ : regularization parameter (scalar)
Returns:
            cost : Calculated loss (scalar).
gradients: array containing the gradients w.r.t each parameter
"""
m = self.X.shape[0]
z = np.dot(self.X, self.parameters)
z = z.reshape(-1)
z = z.astype(np.float128, copy=False)
y_hat = self.sigmoid(z)
cost = -1 * np.mean(self.Y*(np.log(y_hat)) + (1-self.Y)*(np.log(1-y_hat))) + lambda_ * (np.sum(self.parameters[1:]**2))/(2*m)
gradients = np.zeros(self.X.shape[1])
for n in range(len(self.parameters)):
if n == 0:
temp = np.mean((y_hat-self.Y)*self.X[:,n])
else:
temp = np.mean((y_hat-self.Y)*self.X[:,n]) + lambda_*self.parameters[n]/m
gradients[n] = temp
        # Vectorized (unregularized) form: gradients = np.dot(self.X.T, (y_hat - self.Y)) / m
return cost, gradients
def init_parameters(self):
"""
Initialize the parameters as array of 0s
parameters: None
Returns:None
"""
self.parameters = np.zeros((self.X.shape[1],1))
def feature_normalize(self, X):
"""
Normalize the samples.
parameters:
X : input/feature matrix
Returns:
X_norm : Normalized X.
"""
X_norm = X.copy()
mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)
self.mu = mu
self.sigma = sigma
for n in range(X.shape[1]):
X_norm[:,n] = (X_norm[:,n] - mu[n]) / sigma[n]
return X_norm
def fit(self, x, y, learning_rate=0.01, epochs=500, lambda_=0,is_normalize=True, verbose=0):
"""
        Iterates and finds the optimal parameters for the input dataset.
parameters:
x : input/feature matrix
y : target matrix
learning_rate: between 0 and 1 (default is 0.01)
epochs: number of iterations (default is 500)
            lambda_: regularization parameter (default is 0)
            is_normalize: boolean, for normalizing features (default is True)
            verbose: print the cost after every 'verbose' epochs (0 disables printing)
Returns:
parameters : Array of optimal value of weights.
"""
self.X = x
self.Y = y
self.cost_history = []
if self.X.ndim == 1: # adding extra dimension, if X is a 1-D array
self.X = self.X.reshape(-1,1)
is_normalize = False
if is_normalize:
self.X = self.feature_normalize(self.X)
self.X = np.concatenate([np.ones((self.X.shape[0],1)), self.X], axis=1)
self.init_parameters()
for i in range(epochs):
cost, gradients = self.calculate_cost(lambda_=lambda_)
self.cost_history.append(cost)
self.parameters -= learning_rate * gradients.reshape(-1,1)
if verbose:
if not (i % verbose):
print(f"Cost after {i} epochs: {cost}")
return self.parameters
def predict(self,x, is_normalize=True, threshold=0.5):
"""
Returns the predictions after fitting.
parameters:
x : input/feature matrix
Returns:
predictions : Array of predicted target values.
"""
x = np.array(x, dtype=np.float64) # converting list to numpy array
if x.ndim == 1:
x = x.reshape(1,-1)
if is_normalize:
for n in range(x.shape[1]):
x[:,n] = (x[:,n] - self.mu[n]) / self.sigma[n]
x = np.concatenate([np.ones((x.shape[0],1)), x], axis=1)
return [1 if i > threshold else 0 for i in self.sigmoid(np.dot(x,self.parameters))]
That's our RegLogisticRegression class. Let's address the previous problem of overfitting with the polynomial features by trying a set of values for λ and picking the right one.
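As a rough sketch of that process (the data, the λ grid, and the hyperparameters below are all placeholders; the actual experiment is in the notebook):

import numpy as np

# synthetic two-feature data with a roughly circular class boundary,
# mapped to polynomial terms using the map_feature sketch from earlier
rng = np.random.default_rng(0)
raw = rng.uniform(-1, 1, size=(100, 2))
y = (raw[:, 0] ** 2 + raw[:, 1] ** 2 < 0.5).astype(int)
X_mapped = map_feature(raw[:, 0], raw[:, 1], degree=6)

for lambda_ in [0, 0.01, 0.1, 1, 10, 100]:
    model = RegLogisticRegression()
    model.fit(X_mapped, y, learning_rate=0.1, epochs=2000, lambda_=lambda_)
    train_acc = np.mean(np.array(model.predict(X_mapped)) == y)
    print(f"lambda = {lambda_}: training accuracy = {train_acc:.3f}")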