Linear Regression Raw Python Implementation
Raw Python Implementation of Simple Linear Regression (Univariate).

This blog is not about teaching Linear Regression or explaining how it works. It is simply about implementing Linear Regression in raw Python without using any library functions (except in a few cases).
There are a few steps we have to follow to reach our goal:
- Read the dataset
- Split into train and test dataset
- Fit the model
- Predict for the test dataset
- Plot training and testing results
- Model Evaluation
We will be using library functions for the first, second, and fifth steps.
Reading the dataset
In this step, we will read the dataset and split its columns into the independent and dependent variables.
The dataset can be found here: https://www.kaggle.com/rohankayan/years-of-experience-and-salary-dataset
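Here is a minimal sketch of this step using pandas. The file name Salary_Data.csv and the column names YearsExperience and Salary are assumptions based on the linked Kaggle dataset, so adjust them to match your copy.

```python
import pandas as pd

# Assumed file and column names from the Kaggle dataset
dataset = pd.read_csv("Salary_Data.csv")
X = dataset["YearsExperience"].values  # independent variable
y = dataset["Salary"].values           # dependent variable
```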
Splitting into train and test dataset
This part is pretty straightforward. We will use Scikit-learn’s train_test_split function for this step.
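Something along these lines should work, assuming X and y from the previous step; the test size and random_state are arbitrary choices for this sketch.

```python
from sklearn.model_selection import train_test_split

# Hold out one third of the data for testing (arbitrary split for this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)
```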
Model fitting
This is the most important part of any regression analysis. We will fit our model using the training dataset.
This is the function we need to fit our model.
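A possible version of it is sketched below. It simply combines the slope and y-intercept helpers that are defined in the next steps, so the helper names slope and intercept are my own naming choices rather than anything fixed.

```python
def fit(x, y):
    # Fit y = a + b*x by computing the slope (b) and the y-intercept (a)
    # using the helper functions defined in the following steps.
    b = slope(x, y)
    a = intercept(x, y, b)
    return a, b
```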
We need a few additional functions to calculate the slope and the y-intercept, since we are (mostly) not using any library or built-in functions.
The following function calculates the slope of the predicted line.
To calculate the slope, we need the covariance of the independent and dependent variables and the variance of the independent variable.
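One way to write it, assuming the covariance and variance helpers shown in the next step:

```python
def slope(x, y):
    # b = covariance(x, y) / variance(x)
    return covariance(x, y) / variance(x)
```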
The following functions calculate the covariance and the variance.
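A rough sketch of both helpers, relying on the mean and sum_of helpers defined a little further down:

```python
def covariance(x, y):
    # Sum of (x_i - mean(x)) * (y_i - mean(y)), divided by n
    x_mean, y_mean = mean(x), mean(y)
    return sum_of([(xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)]) / len(x)


def variance(x):
    # Average squared deviation of x from its mean
    x_mean = mean(x)
    return sum_of([(xi - x_mean) ** 2 for xi in x]) / len(x)
```

Note that dividing both by n (instead of n − 1) is fine here, because the factor cancels out when we take their ratio for the slope.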
The function below calculates the y-intercept of the predicted line.
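It could look like this; it uses the fact that the fitted line passes through the point of means:

```python
def intercept(x, y, b):
    # a = mean(y) - b * mean(x)
    return mean(y) - b * mean(x)
```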
For this, we need to calculate the mean, and for the mean, we of course need to calculate the sum. The following functions will do the work.
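A simple version of both, avoiding the built-in sum():

```python
def sum_of(values):
    # Manual replacement for the built-in sum()
    total = 0.0
    for v in values:
        total += v
    return total


def mean(values):
    # Arithmetic mean using the manual sum
    return sum_of(values) / len(values)
```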
Predicting for the test dataset
This is the part where we will make predictions for our test dataset. We will first calculate the slope and the y-intercept, then find the predictions for the test dataset using the formula of a straight line.
The formula of a straight line: y = a + bx
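A sketch of the prediction step, reusing the fit function and the train/test splits from the earlier steps:

```python
def predict(x, a, b):
    # Apply y = a + b*x to every point
    return [a + b * xi for xi in x]


a, b = fit(X_train, y_train)
y_pred = predict(X_test, a, b)
```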
Plotting the training and testing results
We are mostly finished, as we have trained our model on the training dataset and made predictions for the testing dataset. In this part, we will plot the training and testing data points. We will use matplotlib for the plotting.
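Roughly like this; the colors and labels are just my choices for the sketch:

```python
import matplotlib.pyplot as plt

# Training points with the fitted line
plt.scatter(X_train, y_train, color="red", label="Training data")
plt.plot(X_train, predict(X_train, a, b), color="blue", label="Fitted line")
plt.xlabel("Years of experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

# Test points against the same fitted line
plt.scatter(X_test, y_test, color="green", label="Test data")
plt.plot(X_train, predict(X_train, a, b), color="blue", label="Fitted line")
plt.xlabel("Years of experience")
plt.ylabel("Salary")
plt.legend()
plt.show()
```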


Model Evaluation
Now the only part remaining is model evaluation. We will use three metrics to evaluate the model: MAE (Mean Absolute Error), MSE (Mean Squared Error), and the R-Squared score.
MAE and MSE are two of the most commonly used regression loss functions. MAE is the average of the absolute differences between the target and predicted values, and MSE is the average of the squared differences between them. Both measures range from 0 to ∞.
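Both can be written with the helpers we already have; here is one possible version:

```python
def mae(y_true, y_pred):
    # Mean of the absolute differences
    return sum_of([abs(yt - yp) for yt, yp in zip(y_true, y_pred)]) / len(y_true)


def mse(y_true, y_pred):
    # Mean of the squared differences
    return sum_of([(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]) / len(y_true)
```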
R-Squared, also known as the Coefficient of Determination, is a value between 0 and 1 that measures how well our regression line fits our data. The closer R-Squared is to 1 (or 100%), the better our model is at predicting the dependent variable.

R-Squared = 1 − (SSres / SStot)
The numerator of the fraction is the residual sum of squares, SSres = Σ(yᵢ − ŷᵢ)².
The denominator is the total sum of squares, SStot = Σ(yᵢ − ȳ)².
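Translated into code, again using the mean and sum_of helpers from above:

```python
def r_squared(y_true, y_pred):
    # R-Squared = 1 - SSres / SStot
    y_mean = mean(y_true)
    ss_res = sum_of([(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)])
    ss_tot = sum_of([(yt - y_mean) ** 2 for yt in y_true])
    return 1 - ss_res / ss_tot
```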
And those are all the formulas we need. To sum it all up:
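A rough sketch of how the pieces could be wired together, assuming all the helper functions sketched above:

```python
a, b = fit(X_train, y_train)          # train on the training split
y_pred = predict(X_test, a, b)        # predict the test salaries

print("MAE:", mae(y_test, y_pred))
print("MSE:", mse(y_test, y_pred))
print("R-Squared:", r_squared(y_test, y_pred))
```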
And that’s all you need to implement Simple Linear Regression using raw Python.
Thank you.