How to Perform Linear Regression in Google Sheets

Video google sheet linear regression

Do you want to unlock the power of data analysis in Google Sheets? Look no further! In this article, we’ll explore the fascinating world of linear regression using Google Sheets. This powerful method allows us to quantify the relationship between explanatory variables and a response variable. Whether you’re a data enthusiast, a student, or a business professional, linear regression can provide valuable insights into your data.

Understanding Linear Regression

Before we dive into the details, let’s understand the basics of linear regression. When we have a single explanatory variable, we use simple linear regression. On the other hand, when we have two or more explanatory variables, we use multiple linear regression.

To perform linear regression in Google Sheets, we can utilize the LINEST() function. This function has the following syntax:

LINEST(known_y's, known_x's, calculate_b, detailed)

Here’s what each parameter represents:

  • known_y's: The range of response variable values.
  • known_x's: The range of explanatory variable values.
  • calculate_b: A boolean value indicating whether to calculate the y-intercept. This is true by default and we’ll leave it as is for our linear regression.
  • detailed: A boolean value indicating whether to provide additional regression statistics in addition to slope and intercept. This is false by default, but we’ll specify it as true in our examples.

Now that we know the basics, let’s explore how to use linear regression in practice.

Simple Linear Regression in Google Sheets

Suppose we want to understand the relationship between the number of study hours and exam scores. To study this relationship, we can perform simple linear regression using the number of study hours as the explanatory variable and the exam score as the response variable.

Take a look at the screenshot below to see how to perform simple linear regression using a dataset of 20 students. The formula used in cell D2 is:

=LINEST(B2:B21, A2:A21, TRUE, TRUE)

Linear Regression in Google Sheets

The annotated screenshot below provides explanations for the results:

Regression Results in Google Sheets

Let’s interpret the key numbers in the results:

  • R-squared: 0.72725 – This is known as the coefficient of determination. It represents the proportion of the response variable’s variance explained by the explanatory variable. In this example, approximately 72.73% of the variation in exam scores can be explained by the number of study hours.
  • Standard Error: 5.2805 – This is the average distance at which observed values deviate from the regression line. In this example, observed values deviate on average by 5.2805 units from the regression line.
  • Coefficients: The coefficients provide the necessary numbers to write the estimated regression equation. In this example, the estimated regression equation is:
Exam Score = 67.16 + 5.2503*(Hours)

We interpret the coefficient for hours as meaning that for each additional study hour, the exam score is expected to increase by an average of 5.2503. We interpret the intercept coefficient as meaning that the expected exam score for a student who doesn’t study is 67.16.

We can use this estimated regression equation to calculate the expected exam score for a student based on the number of study hours. For example, a student who studies for three hours should obtain a score of 82.91:

Exam Score = 67.16 + 5.2503*(3) = 82.91

Multiple Linear Regression in Google Sheets

Now let’s explore another scenario. Suppose we want to investigate whether the number of study hours and the number of preparatory exams passed affect a student’s score on a university entrance exam.

To study this relationship, we can perform multiple linear regression using the number of study hours and the number of preparatory exams passed as explanatory variables, and the score on the entrance exam as the response variable. Take a look at the screenshot below to see how to perform multiple linear regression using a dataset of 20 students. The formula used in cell E2 is:

=LINEST(C2:C21, A2:B21, TRUE, TRUE)

Multiple Linear Regression in Google Sheets

Here’s how to interpret the key numbers in the results:

  • R-squared: 0.734 – This is the coefficient of determination, representing the proportion of the response variable’s variance explained by the explanatory variables. In this example, 73.4% of the variation in exam scores can be explained by the number of study hours and the number of preparatory exams passed.
  • Standard Error: 5.3657 – This is the average distance at which observed values deviate from the regression line. In this example, observed values deviate on average by 5.3657 units from the regression line.
  • Estimated Regression Equation: We can use the coefficients from the model results to create the following estimated regression equation:
Exam Score = 67.67 + 5.56*(Hours) - 0.60*(Preparatory Exams)

We can use this estimated regression equation to calculate the expected exam score for a student based on the number of study hours and the number of preparatory exams passed. For example, a student who studies for three hours and passes one preparatory exam should obtain a score of 83.75:

Exam Score = 67.67 + 5.56*(3) - 0.60*(1) = 83.75

Take Your Data Analysis to the Next Level

Now that you’ve learned how to perform linear regression in Google Sheets, you’re well on your way to becoming an expert in data analysis. But wait, there’s more! To further enhance your data analytics skills, check out these additional tutorials:

Ready to unlock the true potential of data analysis with Google Sheets? Start using these methods now and take your data analysis to the next level! And remember, if you ever need more data analysis tips and tricks, visit Crawlan.com. Happy analyzing, my friend!

Related posts