Class03 Answer:

Calculate RMSE for that line.

RMSE is an acronym for Root Mean Square Error.
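
As a minimal sketch of that definition, the snippet below computes RMSE for two short lists. The function name rmse and the toy numbers are made up for illustration; they are not from the class data.

```python
import numpy as np

def rmse(actual, predicted):
    # Read the acronym right to left: Error, Square, Mean, Root.
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Toy numbers, not real prices:
actual_l    = [3.0, 5.0, 7.0]
predicted_l = [3.0, 5.0, 10.0]
toy_rmse_f  = rmse(actual_l, predicted_l)
print(toy_rmse_f)
```

Only the last point is off (by 3), so the squared errors are 0, 0, and 9; their mean is 3, and the RMSE is the square root of 3.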


This script calculates the RMSE between the 2016 closing prices and a straight line drawn from the first price of 2016 to the last price of 2016.


import pandas as pd
import numpy  as np

csvfile_s = ''
cp_df     = pd.read_csv(csvfile_s).sort_values(['cdate'])
cp2016_sr = (cp_df.cdate > '2016') & (cp_df.cdate < '2017')
# Copy the slice so I can add a column later without a SettingWithCopyWarning:
cp2016_df = cp_df[['cdate','cp']][cp2016_sr].copy()
x1x0_i    = cp2016_df.index.size
y1y0_f    = cp2016_df.iloc[-1].cp-cp2016_df.iloc[0].cp
# The x values run from 0 to x1x0_i-1, so the run of the line is x1x0_i-1:
m_f       = y1y0_f / (x1x0_i - 1)
b_f       = cp2016_df.iloc[0].cp
# My equation for straight line:
def yval(x_in):
    return m_f * x_in + b_f
# I should collect points to plot straight line:
yvals_l = [ yval(x_i) for x_i in range(x1x0_i) ]
# Add the points to the DataFrame:
cp2016_df['sl'] = yvals_l

# Goog: In Pandas how to combine columns?
# I should square the difference of each error:
sqdiffe = (cp2016_df.cp - cp2016_df.sl)**2

# I should find mean and then sqrt:
rmse_f = np.sqrt(np.mean(sqdiffe))
print('RMSE between straight line and closing price:')
print(rmse_f)


I ran the above script and saw this:

dan@h79:~/ml4/public/class03demos $ python
RMSE between straight line and closing price:
dan@h79:~/ml4/public/class03demos $
dan@h79:~/ml4/public/class03demos $
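
To check the logic without the CSV file, here is a sketch on made-up data. The helper name line_rmse and the synthetic prices are assumptions for illustration; only the column names cdate and cp come from the script. It builds the straight line with np.linspace, which is one way to guarantee that the endpoints match the first and last prices exactly.

```python
import numpy as np
import pandas as pd

def line_rmse(prices_sr):
    # Straight line from the first price to the last price, one point per row.
    line_a = np.linspace(prices_sr.iloc[0], prices_sr.iloc[-1], len(prices_sr))
    # Square the errors, take the mean, then the square root.
    return float(np.sqrt(np.mean((prices_sr.to_numpy() - line_a) ** 2)))

# Made-up prices, not real market data:
demo_df = pd.DataFrame({
    'cdate': ['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08'],
    'cp':    [10.0, 12.0, 11.0, 15.0, 14.0],
})
demo_rmse_f = line_rmse(demo_df.cp)
print(demo_rmse_f)
```

Here the line runs 10, 11, 12, 13, 14, so the errors are 0, 1, -1, 2, 0. The mean of the squared errors is 1.2, and the RMSE is the square root of 1.2.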
