Linear Regression Raw Python Implementation
Raw Python Implementation of Simple Linear Regression (Univariate).

This blog is not about teaching Linear Regression or explaining how it works. It is simply about implementing Linear Regression in raw Python without using any library functions (except in a few cases).
There are a few steps we have to follow to reach our goal:
- Read the dataset
- Split into train and test dataset
- Fit the model
- Predict for the test dataset
- Plot training and testing results
- Model Evaluation
We will be using library functions for the first, second, and fifth steps.
Reading the dataset
In this step, we will read the dataset and split its columns into the independent and dependent variables.
The dataset can be found here: https://www.kaggle.com/rohankayan/years-of-experience-and-salary-dataset
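Here is a minimal sketch of this step using pandas. The file name Salary_Data.csv and the column names YearsExperience and Salary are assumptions based on the linked Kaggle dataset, so adjust them to match your copy.

```python
import pandas as pd

# Assumed file and column names from the Kaggle dataset
dataset = pd.read_csv("Salary_Data.csv")
X = dataset["YearsExperience"].values  # independent variable
y = dataset["Salary"].values           # dependent variable
```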
Splitting into train and test dataset
This part is pretty straightforward. We will use Scikit-learn’s train_test_split function for this step.
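Something along these lines should work, assuming X and y from the previous step; the test size and random_state are arbitrary choices for this sketch.

```python
from sklearn.model_selection import train_test_split

# Hold out one third of the data for testing (arbitrary split for this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)
```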
Model fitting
This is the most important part of any regression analysis. We will fit our model using the training dataset.
This is the function we need to fit our model.
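A possible version of it is sketched below. It simply combines the slope and y-intercept helpers that are defined in the next steps, so the helper names slope and intercept are my own naming choices rather than anything fixed.

```python
def fit(x, y):
    # Fit y = a + b*x by computing the slope (b) and the y-intercept (a)
    # using the helper functions defined in the following steps.
    b = slope(x, y)
    a = intercept(x, y, b)
    return a, b
```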
We need a few additional functions to calculate the slope and the y-intercept, since we are (mostly) not using any library or built-in functions.
The following function calculates the slope of the predicted line.
To calculate the slope, we need the covariance of the independent and dependent variables and the variance of the independent variable.
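One way to write it, assuming the covariance and variance helpers shown in the next step:

```python
def slope(x, y):
    # b = covariance(x, y) / variance(x)
    return covariance(x, y) / variance(x)
```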
The following functions calculate the covariance and the variance.
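A rough sketch of both helpers, relying on the mean and sum_of helpers defined a little further down:

```python
def covariance(x, y):
    # Sum of (x_i - mean(x)) * (y_i - mean(y)), divided by n
    x_mean, y_mean = mean(x), mean(y)
    return sum_of([(xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)]) / len(x)


def variance(x):
    # Average squared deviation of x from its mean
    x_mean = mean(x)
    return sum_of([(xi - x_mean) ** 2 for xi in x]) / len(x)
```

Note that dividing both by n (instead of n − 1) is fine here, because the factor cancels out when we take their ratio for the slope.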
The function below calculates the y-intercept of the predicted line.
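It could look like this; it uses the fact that the fitted line passes through the point of means:

```python
def intercept(x, y, b):
    # a = mean(y) - b * mean(x)
    return mean(y) - b * mean(x)
```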
For this, we need to calculate the mean, and for the mean, we of course need to calculate the sum. The following functions will do the work.
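A simple version of both, avoiding the built-in sum():

```python
def sum_of(values):
    # Manual replacement for the built-in sum()
    total = 0.0
    for v in values:
        total += v
    return total


def mean(values):
    # Arithmetic mean using the manual sum
    return sum_of(values) / len(values)
```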
Predicting for the test dataset
This is the part where we will make predictions for our test dataset. We will first calculate the slope and the y-intercept, then find the predictions for the test dataset using the formula of a straight line.
The formula of a straight line: y = a + bx
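A sketch of the prediction step, reusing the fit function and the train/test splits from the earlier steps:

```python
def predict(x, a, b):
    # Apply y = a + b*x to every point
    return [a + b * xi for xi in x]


a, b = fit(X_train, y_train)
y_pred = predict(X_test, a, b)
```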
Plotting the training and testing results
We are mostly finished, as we have trained our model on the training dataset and made predictions for the testing dataset. In this part, we will plot the training and testing data points. We will use matplotlib for the plotting.
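Roughly like this; the colors and labels are just my choices for the sketch:

```python
import matplotlib.pyplot as plt

# Training points with the fitted line
plt.scatter(X_train, y_train, color="red", label="Training data")
plt.plot(X_train, predict(X_train, a, b), color="blue", label="Fitted line")
plt.xlabel("Years of experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

# Test points against the same fitted line
plt.scatter(X_test, y_test, color="green", label="Test data")
plt.plot(X_train, predict(X_train, a, b), color="blue", label="Fitted line")
plt.xlabel("Years of experience")
plt.ylabel("Salary")
plt.legend()
plt.show()
```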


Model Evaluation
Now the only part remaining is model evaluation. We will use three metrics to evaluate the model: MAE (Mean Absolute Error), MSE (Mean Squared Error), and the R-Squared score.
MAE and MSE are two of the most commonly used regression loss functions. MAE is the average of the absolute differences between the target and predicted values, and MSE is the average of the squared differences between them. Both measures range from 0 to ∞.
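Both can be written with the helpers we already have; here is one possible version:

```python
def mae(y_true, y_pred):
    # Mean of the absolute differences
    return sum_of([abs(yt - yp) for yt, yp in zip(y_true, y_pred)]) / len(y_true)


def mse(y_true, y_pred):
    # Mean of the squared differences
    return sum_of([(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]) / len(y_true)
```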
R-Squared, also known as the Coefficient of Determination, is a value between 0 and 1 that measures how well our regression line fits our data. The closer R-Squared is to 1 (or 100%), the better our model is at predicting the dependent variable.

R-Squared = 1 − (SSres / SStot)
The numerator of the fraction is the residual sum of squares, SSres = Σ(yᵢ − ŷᵢ)².
The denominator is the total sum of squares, SStot = Σ(yᵢ − ȳ)².
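Translated into code, again using the mean and sum_of helpers from above:

```python
def r_squared(y_true, y_pred):
    # R-Squared = 1 - SSres / SStot
    y_mean = mean(y_true)
    ss_res = sum_of([(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)])
    ss_tot = sum_of([(yt - y_mean) ** 2 for yt in y_true])
    return 1 - ss_res / ss_tot
```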
And those are all the formulas we need. To sum it all up:
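A rough sketch of how the pieces could be wired together, assuming all the helper functions sketched above:

```python
a, b = fit(X_train, y_train)          # train on the training split
y_pred = predict(X_test, a, b)        # predict the test salaries

print("MAE:", mae(y_test, y_pred))
print("MSE:", mse(y_test, y_pred))
print("R-Squared:", r_squared(y_test, y_pred))
```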
And that’s all you need to implement Simple Linear Regression using raw Python.
Thank you.