Calculating cost with gradient descent and learning rate

Task


Results and Observations

The model below was run as per the instructions above, with alterations to the learning rate value and the number of iterations.

100 iterations and learning rate = 0.08; cost = 0.004

50 iterations and learning rate = 0.08; cost = 0.065

50 iterations and learning rate = 0.04; cost = 0.247

100 iterations and learning rate = 0.04; cost = 0.063

200 iterations and learning rate = 0.04; cost = 0.004

600 iterations and learning rate = 0.02; cost = 0.0002

600 iterations and learning rate = 0.02; cost = 7.15 × 10^-8

100 iterations and learning rate = 0.09; cost is huge!

It was observed that when the learning rate was increased too far (0.09), ‘m’ (slope) and ‘b’ (intercept) fluctuated massively between iterations, indicating that the model was taking steps that were far too large and overshooting the minimum, in this case by a large margin.
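The overshoot can be checked by recording the cost at every iteration and comparing the first and last values. The sketch below is a minimal illustration using the same data and update rule as the code further down; the helper name `run_gradient_descent` and its parameters are assumptions made for this example, not part of the credited code.

```python
import numpy as np

def run_gradient_descent(x, y, learning_rate, iterations):
    """Return the cost recorded at each iteration (hypothetical helper)."""
    m = b = 0.0
    n = len(x)
    costs = []
    for _ in range(iterations):
        residual = y - (m * x + b)
        costs.append(np.sum(residual ** 2) / n)  # mean squared error before this update
        m -= learning_rate * (-(2 / n) * np.sum(x * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
    return costs

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 7, 9, 11, 13], dtype=float)

for lr in (0.08, 0.09):
    costs = run_gradient_descent(x, y, lr, 100)
    trend = "shrinking" if costs[-1] < costs[0] else "growing"
    print(f"learning rate {lr}: first cost {costs[0]:.3g}, last cost {costs[-1]:.3g} ({trend})")
```

On this data set the cost at 0.08 should shrink while the cost at 0.09 grows with every pair of steps, which is what "cost is huge" reflects.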

Decreasing the number of iterations increases the cost, as expected, because the model has fewer opportunities to learn.

Decreasing the learning rate whilst keeping the number of iterations constant increases the cost. The steps taken are smaller, so it takes longer to reach the minimum.
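Why a smaller learning rate means smaller steps can be read off the update rule that the code implements. In the notation below, $J$ for the cost and $\alpha$ for the learning rate are symbols assumed for this note; $m$, $b$ and $n$ match the variables in the code.

$$
J(m, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2
$$

$$
\frac{\partial J}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial J}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)
$$

$$
m \leftarrow m - \alpha\,\frac{\partial J}{\partial m},
\qquad
b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}
$$

Each change to $m$ and $b$ is the gradient multiplied by $\alpha$, so halving $\alpha$ halves the step taken from any given point.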

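Doubling the iterations and halving the learning rate results in approximately the same overall cost (for example, 0.065 at 50 iterations with learning rate 0.08 versus 0.063 at 100 iterations with learning rate 0.04). Each step is half the size, so twice as many ‘steps’ are needed to reach the same point.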
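This trade-off can be reproduced by running the same update rule for each pairing and comparing the final costs. The sketch below is one way to do that; `final_cost` is a hypothetical helper written for this comparison, and it reports the cost from the last iteration in the same way the code further down prints it.

```python
import numpy as np

def final_cost(x, y, learning_rate, iterations):
    """Cost reported on the last iteration (hypothetical helper)."""
    m = b = 0.0
    n = len(x)
    cost = None
    for _ in range(iterations):
        residual = y - (m * x + b)
        cost = np.sum(residual ** 2) / n  # mean squared error before this update
        m -= learning_rate * (-(2 / n) * np.sum(x * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
    return cost

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 7, 9, 11, 13], dtype=float)

# Each pair doubles the iterations and halves the learning rate of the one before it.
for iterations, lr in [(50, 0.08), (100, 0.04), (100, 0.08), (200, 0.04)]:
    print(f"{iterations} iterations, learning rate {lr}: cost {final_cost(x, y, lr, iterations):.3f}")
```

The costs within each pair should agree closely but not exactly, since two half-sized steps do not land on precisely the same point as one full-sized step.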

If the learning rate is too high, the model takes inappropriately large steps and does not descend towards the minimum but instead ‘jumps’ around, as was seen at 100 iterations with learning rate 0.09. Many iterations with a small learning rate may be very accurate, as at 600 iterations with learning rate 0.02, but in real-life situations with larger data sets this would be very slow and computationally expensive. Balance is required.

# code credit:codebasics https://codebasics.io/coming-soon

import numpy as np

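def gradient_descent(x,y):
    # start with slope (m) and intercept (b) at zero
    m_curr = b_curr = 0
    iterations = 20    #change value
    n = len(x)
    learning_rate = 0.08 #change value

    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        # cost is the mean squared error of the current predictions
        cost = (1/n) * sum([val**2 for val in (y-y_predicted)])
        # partial derivatives of the cost with respect to m and b
        md = -(2/n)*sum(x*(y-y_predicted))
        bd = -(2/n)*sum(y-y_predicted)
        # step against the gradient, scaled by the learning rate
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        # the printed cost was computed before this iteration's update
        print("m {}, b {}, cost {} iteration {}".format(m_curr,b_curr,cost, i))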

x = np.array([1,2,3,4,5])
y = np.array([5,7,9,11,13])

gradient_descent(x,y)
m 4.96, b 1.44, cost 89.0 iteration 0
m 0.4991999999999983, b 0.26879999999999993, cost 71.10560000000002 iteration 1
m 4.451584000000002, b 1.426176000000001, cost 56.8297702400001 iteration 2
m 0.892231679999997, b 0.5012275199999995, cost 45.43965675929613 iteration 3
m 4.041314713600002, b 1.432759910400001, cost 36.35088701894832 iteration 4
m 1.2008760606719973, b 0.7036872622079998, cost 29.097483330142282 iteration 5
m 3.7095643080294423, b 1.4546767911321612, cost 23.307872849944438 iteration 6
m 1.4424862661541864, b 0.881337636696883, cost 18.685758762535738 iteration 7
m 3.4406683721083144, b 1.4879302070713722, cost 14.994867596913156 iteration 8
m 1.6308855378034224, b 1.0383405553279617, cost 12.046787238456794 iteration 9
m 3.2221235247119777, b 1.5293810083298451, cost 9.691269350698109 iteration 10
m 1.7770832372205707, b 1.1780607551353204, cost 7.8084968312098315 iteration 11
m 3.0439475772474127, b 1.5765710804477953, cost 6.302918117062937 iteration 12
m 1.8898457226770244, b 1.3032248704973899, cost 5.098330841763168 iteration 13
m 2.898169312926714, b 1.6275829443328358, cost 4.133961682056365 iteration 14
m 1.9761515088959358, b 1.4160484030347593, cost 3.361340532576948 iteration 15
m 2.7784216197824048, b 1.6809279342791488, cost 2.741808050753047 iteration 16
m 2.0415541605113807, b 1.5183370872989306, cost 2.244528230107478 iteration 17
m 2.6796170361078637, b 1.735457156285639, cost 1.8449036666988363 iteration 18
m 2.090471617540917, b 1.611567833948162, cost 1.5233119201782324 iteration 19