Karthikeyan K
4 min read · Sep 22, 2019


Gradient Descent Clearly Explained

Gradient descent is an algorithm that searches for the minimum (or maximum) of a function, updating a value repeatedly until it converges to that minimum (or maximum). We are going to see how it performs those updates.

Before getting into gradient descent for a regression model, check out the post below. Even if you already know how a regression model works, it may help:

https://medium.com/@kkarthikeyanvk/guide-to-simply-explained-linear-974da3d3c4f

We have to find the minimum of the cost function.

We minimize the cost function using gradient descent.

By mathematical formula,

theta = theta - alpha * slope, where slope is the first derivative of the cost function at theta.

The equation above is the gradient descent equation. We will try to decode it, so that by the end we are clear about how gradient descent works.
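Before decoding it piece by piece, here is what a single update looks like in code. This is a minimal sketch; all the numbers are made up for illustration:

```python
# One gradient descent update step: theta = theta - alpha * slope
theta = 4.0    # current guess for the intercept (arbitrary starting point)
slope = 6.0    # derivative of the cost at theta (illustrative value)
alpha = 0.01   # learning rate

theta = theta - alpha * slope   # 4.0 - 0.01 * 6.0 = 3.94
print(theta)                    # theta moves a small step toward the minimum
```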

We will start from the first derivative of the cost function, taken with respect to theta, the intercept parameter.

What does the first derivative do?

Simple: it gives the slope of the function at the point theta where you take the derivative.

We get the slope of the convex cost function at that value of theta. Far from the minimum, the slope value may be quite high.
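As a sketch with made-up data, here is how that slope could be computed for the least-squares cost of a line, with respect to the intercept c (the line's slope m is held fixed for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
m = 1.5   # the line's slope, held fixed; we only study the intercept here

def cost(c):
    """Least-squares cost of the line y = m*x + c."""
    return np.sum((y - (m * x + c)) ** 2)

def d_cost(c):
    """First derivative of the cost with respect to the intercept c."""
    return -2 * np.sum(y - (m * x + c))

print(d_cost(0.0))   # large magnitude: we are far from the minimum
print(d_cost(1.25))  # zero: the cost is minimized at this intercept
```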

The slope takes different values at different points along the curve:

Remember: x-axis = intercept of the line (c), y-axis = least-squares cost.

From the above image we can see that the dark blue line has a higher slope than the green and violet ones. As you go down the curve, the light blue line has a slope of zero, and then the slope increases again at the yellow line. Hence, the slope helps us find the minimum: the solution is where the slope equals zero.

How do we get to that global minimum?

We have to subtract some value from the intercept parameter (theta) until it moves toward the global minimum.

To fix that value, we will use the slope:

Z = alpha*slope

We subtract Z from the starting theta. That reduces both the theta value and the slope. The process repeats until the slope equals zero, so that theta converges to the global minimum.
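Here is a minimal sketch of that loop, using the same made-up data as before. In practice we stop when the slope is close to zero rather than exactly zero, since floating-point slopes rarely hit zero exactly:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
m = 1.5   # line slope held fixed; we only learn the intercept here

def d_cost(c):
    """Slope of the least-squares cost with respect to the intercept c."""
    return -2 * np.sum(y - (m * x + c))

theta = 0.0   # starting intercept
alpha = 0.1   # learning rate
for step in range(1000):
    slope = d_cost(theta)          # slope of the cost at the current theta
    if abs(slope) < 1e-6:          # converged: slope is (near) zero
        break
    theta = theta - alpha * slope  # subtract Z = alpha * slope
print(theta)   # ~1.25, the intercept that minimizes the cost
```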

We multiply the slope by alpha, known as the learning rate, so that each update takes a well-sized step toward the global minimum.

Now, what about alpha, the learning rate?

Take alpha = 0.001

It takes the smallest step every time we iterate the equation. This may lead to convergence, but it may take a long time to converge.

Take alpha = 10

It takes the biggest step every time we iterate the equation. This may lead to divergence, meaning it overshoots the global minimum and never settles.

The learning rate is usually kept around 0.01, but it depends on the problem.
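A quick sketch of how the two extremes behave on the same toy cost (the data and step counts are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
m = 1.5

def d_cost(c):
    return -2 * np.sum(y - (m * x + c))

def run(alpha, steps=20):
    theta = 0.0
    for _ in range(steps):
        theta = theta - alpha * d_cost(theta)
    return theta

print(run(0.001))  # tiny steps: barely moves toward the minimum at 1.25
print(run(0.1))    # moderate steps: converges to ~1.25
print(run(10))     # huge steps: overshoots every time and blows up
```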

So we subtract alpha*slope from the current theta value until it converges to the global minimum, where the cost function is at its minimum.

Gradient descent is applied in parallel to both the slope of the line and the intercept.

In multiple linear regression there is more than one slope. We no longer fit a line; we fit a plane.
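As a rough sketch of that idea, here is gradient descent fitting a plane with two features, updating all parameters in parallel. The data, learning rate, and iteration count are all made up for illustration:

```python
import numpy as np

# Toy data: two features, so we fit a plane y = w1*x1 + w2*x2 + b
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0],
              [4.0, 5.0]])
y = np.array([5.0, 4.0, 9.0, 14.0])   # generated from w = [1, 2], b = 0

w = np.zeros(2)   # both slopes start at zero
b = 0.0           # intercept starts at zero
alpha = 0.01

for _ in range(10_000):
    error = X @ w + b - y              # residuals of the current plane
    grad_w = 2 * X.T @ error / len(y)  # slope of the cost for each weight
    grad_b = 2 * error.mean()          # slope of the cost for the intercept
    w = w - alpha * grad_w             # all parameters updated in parallel
    b = b - alpha * grad_b

print(w, b)   # approaches w ≈ [1, 2], b ≈ 0
```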

That’s how gradient descent works.

CHEERS !!!
