MLOps is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.
Requirements Engineering (RE) refers to the process of defining, documenting, and maintaining requirements in the engineering design process.
# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# reading the dataset with pandas
data = pd.read_csv("train.csv")
# Exploring the first five rows
data.head()
# printing the number of columns
print(f"There are {len(data.columns)} columns in the dataset")
We will keep only six columns: MSSubClass, LotFrontage, YrSold, SaleType, SaleCondition, and SalePrice.
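# .copy() gives an independent DataFrame, so modifying data_new later does not trigger pandas' SettingWithCopyWarning
data_new = data[['MSSubClass','LotFrontage','YrSold','SaleType','SaleCondition','SalePrice']].copy()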
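The double brackets in the selection above matter: a list of column labels returns a DataFrame, while a single label returns a Series. Here is a small sketch on a made-up DataFrame (the values and the extra Street column are hypothetical, not taken from train.csv):

```python
import pandas as pd

# A tiny stand-in for train.csv; all values are invented for illustration
data = pd.DataFrame({
    "MSSubClass": [60, 20, 60],
    "LotFrontage": [65.0, None, 68.0],
    "Street": ["Pave", "Pave", "Grvl"],
    "SalePrice": [208500, 181500, 223500],
})

# Single brackets with one label return a Series
price = data["SalePrice"]

# Double brackets (a list of labels) return a DataFrame with only those columns;
# .copy() makes the subset independent of `data`
subset = data[["MSSubClass", "LotFrontage", "SalePrice"]].copy()

print(type(price).__name__)    # Series
print(type(subset).__name__)   # DataFrame
print(list(subset.columns))    # ['MSSubClass', 'LotFrontage', 'SalePrice']
```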
# checking for missing values
data_new.info()
# filling missing LotFrontage values with a constant placeholder (-9999)
data_new['LotFrontage'] = data_new['LotFrontage'].fillna(-9999)
# check for missing values again
data_new.info()
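Besides reading the non-null counts from .info(), you can count missing values directly with isna().sum(). A minimal sketch on a hypothetical four-row column shows the count before and after filling with the same -9999 placeholder:

```python
import numpy as np
import pandas as pd

# Hypothetical LotFrontage column with two gaps
df = pd.DataFrame({"LotFrontage": [65.0, np.nan, 80.0, np.nan]})

# Count missing values before filling
print(df["LotFrontage"].isna().sum())   # 2

# Replace every NaN with the same constant placeholder
df["LotFrontage"] = df["LotFrontage"].fillna(-9999)

print(df["LotFrontage"].isna().sum())   # 0
print(df["LotFrontage"].tolist())       # [65.0, -9999.0, 80.0, -9999.0]
```

Keep in mind that a linear model treats -9999 as an ordinary numeric value, so the placeholder does influence the fitted coefficients.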
In the data_new.info() output, you will notice that two columns, SaleType and SaleCondition, have the object data type, so we need to encode them as numbers.
# using label encoders to encode the categorical columns
from sklearn.preprocessing import LabelEncoder, StandardScaler
# saletype label encoder
lb_st = LabelEncoder()
# salecondition label encoder
lb_sc = LabelEncoder()
lb_st.fit(data_new['SaleType'])
lb_sc.fit(data_new['SaleCondition'])
data_new['SaleType'] = lb_st.transform(data_new['SaleType'])
data_new["SaleCondition"] = lb_sc.transform(data_new['SaleCondition'])
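To see what the two encoders above actually store, here is a small sketch with a few hypothetical SaleType values. LabelEncoder sorts the unique labels it sees during fit and assigns each one its index in that sorted list:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical SaleType values
sale_types = ["WD", "New", "COD", "WD"]

le = LabelEncoder()
le.fit(sale_types)                     # learns the sorted set of unique labels
print(le.classes_.tolist())            # ['COD', 'New', 'WD']

encoded = le.transform(sale_types)     # each label -> its index in classes_
print(encoded.tolist())                # [2, 1, 0, 2]

# inverse_transform maps integers back to the original labels
print(le.inverse_transform([0, 2]).tolist())   # ['COD', 'WD']

# Labels not seen during fit raise a ValueError
try:
    le.transform(["Oth"])
except ValueError as err:
    print("unseen label:", err)
```

The last case is why the fitted encoders have to be saved alongside the model: new data must be encoded with exactly the same mapping. `le.fit_transform(sale_types)` performs both steps in one call.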
The .fit() method learns a mapping from each category to an integer, and the .transform() method applies that mapping to replace each category with its number (you need to fit before you can transform).
# Separating the dataset into features (X) and target (y)
X = data_new.drop("SalePrice",axis=1)
y = data_new['SalePrice']
# splitting the dataset into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X,y)
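When neither test_size nor train_size is given, train_test_split puts 25% of the rows in the test set and shuffles them randomly, so each run produces a different split. A minimal sketch on hypothetical arrays, using random_state to make the split reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 5 features
X = np.arange(500).reshape(100, 5)
y = np.arange(100)

# Default split: 75% train, 25% test; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, X_test.shape)   # (75, 5) (25, 5)

# The same random_state reproduces exactly the same split
_, X_test_again, _, _ = train_test_split(X, y, random_state=42)
print(np.array_equal(X_test, X_test_again))   # True
```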
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# training the model
lr.fit(X_train,y_train)
from sklearn.metrics import mean_squared_error
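# root mean squared error (RMSE) on the test set; the signature is mean_squared_error(y_true, y_pred)
np.sqrt(mean_squared_error(y_test, lr.predict(X_test)))
# this gives approximately 73038; the exact value varies because train_test_split shuffles the rows randomly on each run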
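The RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same units as SalePrice. A small sketch with hypothetical prices shows that writing it out by hand matches the scikit-learn version:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true prices and predictions
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

# RMSE by hand: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

# The same quantity via scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(round(rmse_manual, 2))   # 14142.14
```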
# joblib is for saving objects for later use
import joblib
joblib.dump(lr,'lr_model')
joblib.dump(lb_sc,'lb_sc')
joblib.dump(lb_st,'lb_st')
joblib.dump() takes two required arguments: the object you want to save and the filename to save it under.
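The saved files are read back with joblib.load(), which returns a reconstructed copy of the object. A minimal round-trip sketch, using a toy model and encoder as hypothetical stand-ins for lr and lb_st and a temporary directory instead of the working directory:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy model and encoder standing in for lr and lb_st
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X, y)
encoder = LabelEncoder().fit(["WD", "New", "COD"])

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "lr_model")
    encoder_path = os.path.join(tmp, "lb_st")

    # dump(value, filename) serializes each object to disk
    joblib.dump(model, model_path)
    joblib.dump(encoder, encoder_path)

    # load(filename) rebuilds the objects from the files
    model_loaded = joblib.load(model_path)
    encoder_loaded = joblib.load(encoder_path)

# The loaded objects behave like the originals
preds_before = model.predict(np.array([[4.0]]))
preds_after = model_loaded.predict(np.array([[4.0]]))
print(np.allclose(preds_before, preds_after))       # True
print(encoder_loaded.transform(["New"]).tolist())   # [1]
```

Because these files are pickles, loading them with a different scikit-learn version than the one used to save them is not guaranteed to work, so it helps to pin the version in the deployment environment.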