Class03 Answer:

Calculate the RMSE (root-mean-square error) of that fitted line.

I use NumPy and Pandas below.


"""
class03p15.py

This script should use Linear Algebra to find RMSE of a fitted line.
ref:
http://www.ml4.us/class03/pdf1.png
http://www.stat.purdue.edu/~jennings/stat514/stat512notes/topic3.pdf
"""

import pandas as pd
import numpy  as np

csvfile   = 'http://spy611.com/csv/allpredictions.csv'
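cp_df     = pd.read_csv(csvfile).sort_values(['cdate'])
# Boolean mask: rows whose cdate falls inside 2016 (string comparison).
cp2016_sr = (cp_df.cdate > '2016') & (cp_df.cdate < '2017')
# Copy the slice so the yhat column can be added later without a SettingWithCopyWarning:
cp2016_df = cp_df.loc[cp2016_sr, ['cdate','cp']].copy()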

def colvec(arylst):
    # This should help me create column vectors from arrays or lists:
    return np.array(arylst).reshape((len(arylst),1))

x_a      = colvec(range(len(cp2016_df)))
ones_l   = [1]*len(cp2016_df)
ones_a   = colvec(ones_l)
xvals_a  = np.hstack((ones_a,x_a))
yvals_a  = colvec(cp2016_df.cp)
middle_a = np.linalg.pinv(np.matmul(xvals_a.T,xvals_a))
rhs_a    = np.matmul(xvals_a.T,yvals_a)
beta_a   = np.matmul(middle_a,rhs_a)

x_in_a   = xvals_a

yhat_a   = np.matmul(x_in_a,beta_a)

# Flatten the (n,1) column vector to 1-D before storing it as a column:
cp2016_df['yhat'] = yhat_a.ravel()

sqdiffe = (cp2016_df.cp - cp2016_df.yhat)**2
print('RMSE between fitted line and closing price:')
rmse_f = np.sqrt(np.mean(sqdiffe))
print(rmse_f)

'bye'
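
For reference, the linear algebra the script carries out can be written as follows. This is a sketch in my own notation, not taken from the referenced notes: X stands for xvals_a (a column of ones next to the day index), y for yvals_a, n for the number of rows, and the superscript + for the pseudo-inverse computed by np.linalg.pinv.

```latex
\hat{\beta} = \left(X^{\mathsf{T}} X\right)^{+} X^{\mathsf{T}} y,
\qquad
\hat{y} = X \hat{\beta},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
```

The three expressions correspond to beta_a, yhat_a, and rmse_f in the script.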

I ran the above script and saw this:


dan@h79:~/ml4/public/class03demos $ python class03p15.py
RMSE between fitted line and closing price:
45.6495298796
dan@h79:~/ml4/public/class03demos $ 
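
One way to sanity-check the normal-equation approach is to compare it with NumPy's built-in least-squares solver. The sketch below is not part of class03p15.py; it uses synthetic data (a made-up trend plus noise) instead of the CSV, and the names x, y, X, and rmse are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for a year of closing prices.
# This is hypothetical data, not the values in allpredictions.csv.
rng = np.random.default_rng(0)
n = 250
x = np.arange(n, dtype=float)
y = 2000.0 + 0.5 * x + rng.normal(scale=10.0, size=n)

# Design matrix: a column of ones (intercept) next to the day index,
# the same layout as xvals_a in the script.
X = np.column_stack((np.ones(n), x))

# Normal-equation solution, the same algebra the script uses.
beta_ne = np.linalg.pinv(X.T @ X) @ (X.T @ y)

# The same fit from NumPy's least-squares solver.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

def rmse(actual, predicted):
    """Root-mean-square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

rmse_ne = rmse(y, X @ beta_ne)
rmse_ls = rmse(y, X @ beta_ls)
print('RMSE via normal equation:', rmse_ne)
print('RMSE via lstsq:          ', rmse_ls)
```

If the two RMSE values agree closely, the normal-equation code is probably doing what it should. On real data, np.linalg.lstsq is usually the safer choice because it avoids forming the product of the design matrix with its transpose explicitly.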

Class03 Lab

