Finding Trends in Data – GoLang Linear Regression

Whether you are trying to pick the best stocks or evaluate the relationship between climate and CO2 emissions, it is always helpful to see a mathematical trend. This article walks through building an algorithm that finds linear trends in data sets. If you are more interested in non-linear trends, you can view my other article, Finding Trends in Data – Non-Linear.

What is a trend? It is a “best-fit” linear equation for a data set.

y = mx + b

In this equation, each x value is multiplied by a slope (m) and added to a constant (b), the y-intercept, to give the corresponding y value. That lets you build a table or plot of x against y.
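As a small illustration of how the equation is used once m and b are known, the sketch below evaluates y for a few x values. The line type and its at method are hypothetical names chosen for this example only; they are not part of the code built later in this article.

```go
package main

import "fmt"

// line is a hypothetical stand-in for y = mx + b (illustration only).
type line struct {
	m, b float64 // slope and y-intercept
}

// at returns the y value the line gives for a particular x.
func (l line) at(x float64) float64 {
	return l.m*x + l.b
}

func main() {
	l := line{m: 2, b: 1} // y = 2x + 1
	for _, x := range []float64{0, 1, 2, 3} {
		fmt.Println(x, l.at(x))
	}
}
```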

What we Seek

What we want in the end is the slope and y-intercept of the line that best fits our data.

Let's build a structure to hold our equation:

type Equation struct {
	M, B float64
}

As you can see, the Equation structure holds our two constants: the slope (M) and the y-intercept (B). Note that the fields inside the Equation structure are capitalized and thus exported (public).

We know what we seek, but how can we find these two values? Given a data set, they are easy to compute with the aid of an algorithm. Let's see what variables will be needed.

Our Equations

M = (n*(total sum x * y) - (total sum x)*(total sum y)) / (n*(total sum x^2) - (total sum x)^2)

B = ((total sum y)*(total sum x^2) - (total sum x)*(total sum x * y)) / (n*(total sum x^2) - (total sum x)^2)

To translate this into code, we can create a helper structure with private (unexported) fields to keep track of everything.

All the variables we Need

Looking at the linear regression formulas, we will need:

  1. (total sum x) = x1 + x2 + … + xn = xTotal
  2. (total sum y) = y1 + y2 + … + yn = yTotal
  3. (total sum x^2) = x1^2 + x2^2 + … + xn^2 = xSqTotal
  4. (total sum xy) = x1y1 + x2y2 + … + xnyn = xYTotal
  5. n = the number of values in the array x
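The five quantities in the list can all be collected in a single pass over the data. The sketch below shows that idea in isolation; the sums type and the total function are hypothetical names used only here, and the code assumes both slices have the same length.

```go
package main

import "fmt"

// sums is a hypothetical helper holding the five quantities from the list.
type sums struct {
	n, xTotal, yTotal, xSqTotal, xYTotal float64
}

// total walks x and y once and accumulates every sum.
// It assumes len(x) == len(y); that is not checked here.
func total(x, y []float64) sums {
	s := sums{n: float64(len(x))}
	for i := range x {
		s.xTotal += x[i]
		s.yTotal += y[i]
		s.xSqTotal += x[i] * x[i]
		s.xYTotal += x[i] * y[i]
	}
	return s
}

func main() {
	x := []float64{0, 1, 2, 3, 4, 5, 6}
	y := []float64{0, 1, 2, 3, 4, 5, 6}
	fmt.Printf("%+v\n", total(x, y))
}
```

For this seven-point data set the totals work out to n = 7, xTotal = 21, yTotal = 21, xSqTotal = 91 and xYTotal = 91.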

Then we build a structure to hold everything we need.

type LinearRegression struct {
	x, y []float64 // input data

	xTotal, yTotal, xSqTotal, xYTotal float64 // running totals
	numerator, denominator            float64 // parts of the slope fraction
	slope, yInt                       float64 // results
}

x and y hold on to our input data (the slices are not copied; they refer to the caller's data).

xTotal, yTotal, xSqTotal and xYTotal hold the totals.

numerator, denominator, slope and yInt hold the intermediate and final results of our calculations.

Return What we Want

Between the two structures we have created (Equation and LinearRegression), we now have the desired output and a place to hold our calculations.

We are only interested in getting an equation, so we need a function that takes the x and y slices of data and fills in the corresponding Equation structure.

// ReturnEquationParts fits a line to (x, y) and stores the result in e.
func (e *Equation) ReturnEquationParts(x, y []float64) {
	var l LinearRegression
	l.LinearRegressionInit(x, y)
	e.M = l.slope
	e.B = l.yInt
}

Efficiency could obviously be improved by removing the non-essentials. But this is a lesson: splitting what we want (Equation) from how we get there (LinearRegression) helps in visualizing the process.
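To make the efficiency remark concrete, here is a minimal sketch of what a stripped-down version might look like: one function and no helper structure. The name fitLine is hypothetical, and the sketch assumes both slices have the same length and that the x values are not all identical.

```go
package main

import "fmt"

// fitLine is a hypothetical compact variant: it returns the slope m and
// y-intercept b of the least-squares line through the points (x[i], y[i]).
// Assumes len(x) == len(y) and at least two distinct x values.
func fitLine(x, y []float64) (m, b float64) {
	var sx, sy, sxx, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	n := float64(len(x))
	d := n*sxx - sx*sx // denominator shared by both formulas
	m = (n*sxy - sx*sy) / d
	b = (sy*sxx - sx*sxy) / d
	return m, b
}

func main() {
	// These four points lie exactly on y = 2x + 1.
	m, b := fitLine([]float64{0, 1, 2, 3}, []float64{1, 3, 5, 7})
	fmt.Println(m, b)
}
```

The trade-off is that the intermediate totals are no longer available for inspection, which is exactly what the longer version in this article keeps around for teaching purposes.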

Linear Regression

When you look at the calculation requirements from the equations above, you see that the hardest part is finding the totals. So let's build a function that finds all of these totals and then calculates everything.

// LinearRegressionInit stores the data, accumulates the totals in a single
// pass, and then computes the slope and y-intercept.
func (l *LinearRegression) LinearRegressionInit(x, y []float64) {
	l.x = x
	l.y = y
	for i := range l.x {
		l.xTotal += l.x[i]
		l.yTotal += l.y[i]
		l.xSqTotal += math.Pow(l.x[i], 2)
		l.xYTotal += l.x[i] * l.y[i]
	}

	n := float64(len(l.x))
	l.numerator = n*l.xYTotal - l.xTotal*l.yTotal
	// Both formulas share the same denominator.
	l.denominator = n*l.xSqTotal - math.Pow(l.xTotal, 2)
	yIntNumerator := l.yTotal*l.xSqTotal - l.xTotal*l.xYTotal
	l.yInt = yIntNumerator / l.denominator
	l.slope = l.numerator / l.denominator
}

As you can see, we loop through the slices once and build up all of our sums in that single loop. We then plug the totals into the equations: compute the denominator (it is the same for both equations), and finally divide each numerator by it to get the slope and y-intercept.
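One thing the function above does not handle is bad input. If y is shorter than x, the loop panics with an index out of range, and if all x values are identical the denominator is zero, so the results come out as NaN or infinity. A hedged sketch of how such checks might look follows; checkedFit and its error messages are hypothetical and not part of this article's code.

```go
package main

import (
	"errors"
	"fmt"
)

// checkedFit is a hypothetical variant of the same formulas that validates
// its input before dividing.
func checkedFit(x, y []float64) (m, b float64, err error) {
	if len(x) != len(y) {
		return 0, 0, errors.New("x and y must have the same length")
	}
	if len(x) < 2 {
		return 0, 0, errors.New("need at least two points")
	}
	var sx, sy, sxx, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	n := float64(len(x))
	d := n*sxx - sx*sx
	// An exact zero test is a simplification; with rounding error a
	// near-zero denominator may deserve the same treatment.
	if d == 0 {
		return 0, 0, errors.New("all x values are identical; slope is undefined")
	}
	return (n*sxy - sx*sy) / d, (sy*sxx - sx*sxy) / d, nil
}

func main() {
	if _, _, err := checkedFit([]float64{1, 1, 1}, []float64{2, 3, 4}); err != nil {
		fmt.Println("error:", err)
	}
	m, b, err := checkedFit([]float64{0, 1, 2}, []float64{0, 1, 2})
	fmt.Println(m, b, err)
}
```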

Build main()

That’s it. To actually use this code, we can create a simple main function. The same calls can also be added to any existing function.

func main() {
	x := []float64{0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0}
	y := []float64{0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0}

	var equation Equation
	equation.ReturnEquationParts(x, y)
	fmt.Println(`y = `, equation.M, `x + `, equation.B)
}

This particular set of x and y arrays should print:

y =  1 x +  0
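That result can be checked by hand. For this data set n = 7, and since y equals x at every point, the totals are 21 for both the x and y sums and 91 for both the x^2 and xy sums. Plugging them into the same expressions the code uses gives:

```latex
\begin{aligned}
\text{denominator} &= n\sum x^2 - \Bigl(\sum x\Bigr)^2 = 7 \cdot 91 - 21^2 = 637 - 441 = 196 \\
M &= \frac{n\sum xy - \sum x \sum y}{196} = \frac{7 \cdot 91 - 21 \cdot 21}{196} = \frac{196}{196} = 1 \\
B &= \frac{\sum y \sum x^2 - \sum x \sum xy}{196} = \frac{21 \cdot 91 - 21 \cdot 91}{196} = \frac{0}{196} = 0
\end{aligned}
```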

Putting it all together

package main

import (
	"fmt"
	"math"
)

type LinearRegression struct {
	x, y []float64 // input data

	xTotal, yTotal, xSqTotal, xYTotal float64 // running totals
	numerator, denominator            float64 // parts of the slope fraction
	slope, yInt                       float64 // results
}

type Equation struct {
	M, B float64
}

func (e *Equation) ReturnEquationParts(x, y []float64) {
	var l LinearRegression
	l.LinearRegressionInit(x, y)
	e.M = l.slope
	e.B = l.yInt
}

func (l *LinearRegression) LinearRegressionInit(x, y []float64) {
	l.x = x
	l.y = y
	// Accumulate every total in a single pass over the data.
	for i := range l.x {
		l.xTotal += l.x[i]
		l.yTotal += l.y[i]
		l.xSqTotal += math.Pow(l.x[i], 2)
		l.xYTotal += l.x[i] * l.y[i]
	}

	n := float64(len(l.x))
	l.numerator = n*l.xYTotal - l.xTotal*l.yTotal
	// Both formulas share the same denominator.
	l.denominator = n*l.xSqTotal - math.Pow(l.xTotal, 2)
	yIntNumerator := l.yTotal*l.xSqTotal - l.xTotal*l.xYTotal
	l.yInt = yIntNumerator / l.denominator
	l.slope = l.numerator / l.denominator
}

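func main() {
	x := []float64{0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0}
	// A noisier data set than the earlier example, so the fitted line
	// is no longer y = 1x + 0 (expect roughly y = 0.286x + 2.857).
	y := []float64{0.0, 2.0, 7.0, 8.0, 3.0, 2.0, 4.0}

	var equation Equation
	equation.ReturnEquationParts(x, y)
	fmt.Println(`y = `, equation.M, `x + `, equation.B)
}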
