Class06 Answer:

Write scripts which learn from GSPC prices.

I think the best time to learn from GSPC prices is at night, right after prices are updated.

So I wrote a script called night.bash which is listed below:


# ~/mljs/night.bash

# I should run this script at 8pm Calif-time.
# Demo:
# ./night.bash

# This script should generate some static content each night
# after the market has closed and the most recent GSPC-closing-price
# is available from Yahoo.

# The static content should help me judge feasibility of keras-js for
# predicting GSPC daily direction.

# If you have questions, e-me:

# I should cd to the right place:

cd "$(dirname "$0")"

# I should get prices:

# I should compute features from the prices (and dates):
python SLOPES='[2,3,4,5,6,7,8,9]'
# The above call should give me feat.csv

# I should learn, test, and report:
./keras_tensorflow.bash TRAINSIZE=25 TESTYEAR=2017


night.bash depends on this syntax to generate features:

# I should compute features from the prices (and dates):
python SLOPES='[2,3,4,5,6,7,8,9]'
# The above call should give me feat.csv

The Python script which generates features and then writes them to feat.csv is listed below:


This script should generate a CSV file full of feature data
from GSPC prices from Yahoo.

SLOPES should specify moving-avg durations, in days, which I compute slopes from.
I should have at least two SLOPE values and they should be between 2 and 32.

~/anaconda3/bin/python SLOPES='[2,3,4,5,6,7,8,9]'
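The SLOPES rules above (at least two values, each between 2 and 32) are not enforced by the script itself. A minimal sketch of how I might check them is shown here; the helper name parse_slopes is hypothetical and is not part of the original script.

```python
import ast

def parse_slopes(arg_s):
    """Parse a string like "SLOPES=[2,3,4]" into a validated list of ints."""
    name_s, _, value_s = arg_s.partition('=')
    if name_s != 'SLOPES':
        raise ValueError('expected an argument of the form SLOPES=[...]')
    # literal_eval safely turns the text "[2,3,4]" into a Python list.
    slopes_l = [int(slope) for slope in ast.literal_eval(value_s)]
    if len(slopes_l) < 2:
        raise ValueError('need at least two SLOPES values')
    if any(slope < 2 or slope > 32 for slope in slopes_l):
        raise ValueError('each SLOPES value should be between 2 and 32')
    return slopes_l

print(parse_slopes('SLOPES=[2,3,4,5,6,7,8,9]'))
```

A bad argument such as SLOPES=[2] or SLOPES=[1,40] raises ValueError instead of silently producing an odd feat.csv.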

import numpy  as np
import pandas as pd
import pdb

# I should check cmd line arg
import sys
if (len(sys.argv) != 2):
  print('You typed something wrong:')
  print("~/anaconda3/bin/python SLOPES='[2,3,4,5,6,7,8,9]'")
  sys.exit(1)
arg1_l = sys.argv[1].split('=')
if (arg1_l[0] != 'SLOPES'):
  print('I cannot determine SLOPES from your command line.')
  print("~/anaconda3/bin/python SLOPES='[2,3,4,5,6,7,8,9]'")
  sys.exit(1)

# I should get integers from arg1_l:
slopes_s = arg1_l[1]
slopes_a = []
for slope_s in slopes_s.split(','):
  slope_i = int(slope_s.replace('[','').replace(']',''))
  slopes_a.append(slope_i)

gspc_df = pd.read_csv('gspc2.csv')

# I should compute pctlead:
gspc_df['pctlead'] = (100.0 * (gspc_df.cp.shift(-1) - gspc_df.cp) / gspc_df.cp).fillna(0)

# I should compute mvgavg-slope for each slope_i

# ref:

for slope_i in slopes_a:
  # I should roll over cp only; cdate is text and has no mean.
  mvgavg_sr      = gspc_df.cp.rolling(window=slope_i).mean()
  col_s          = 'slope'+str(slope_i)
  slope_sr       = 100.0 * (mvgavg_sr - mvgavg_sr.shift(1))/mvgavg_sr
  gspc_df[col_s] = slope_sr

# I should generate Date features:
dt_sr = pd.to_datetime(gspc_df.cdate)
dow_l = [float(dt.strftime('%w' ))/100.0 for dt in dt_sr]
moy_l = [float(dt.strftime('%-m'))/100.0 for dt in dt_sr]
dom_l = [float(dt.strftime('%-d'))       for dt in dt_sr]
wom_l = [round(dom/5)/100.0             for dom in dom_l]
gspc_df['dow'] = dow_l
gspc_df['moy'] = moy_l
# FAIL: gspc_df['wom'] = wom_l

# I should write to CSV file to be used later:
gspc_df.to_csv('feat.csv', float_format='%4.4f', index=False)
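To make the two main features concrete, here is a small sketch which computes pctlead and one moving-average slope on a toy DataFrame. The five prices are made up for illustration; the real script reads cp from gspc2.csv.

```python
import pandas as pd

# Hypothetical closing prices; the real script reads them from gspc2.csv.
toy_df = pd.DataFrame({'cp': [100.0, 102.0, 104.0, 103.0, 105.0]})

# pctlead: percent change from today's close to the next close (0 on the last row).
toy_df['pctlead'] = (100.0 * (toy_df.cp.shift(-1) - toy_df.cp) / toy_df.cp).fillna(0)

# slope2: percent change of the 2-day moving average from one day to the next.
mvgavg_sr = toy_df.cp.rolling(window=2).mean()
toy_df['slope2'] = 100.0 * (mvgavg_sr - mvgavg_sr.shift(1)) / mvgavg_sr

print(toy_df)
```

The first two slope2 values are NaN because a 2-day average needs two prices and its slope needs two averages. Longer windows leave more NaN rows at the top of feat.csv.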

After I create the features, I should learn from them.

The bash syntax which gets me started on that task is listed below:

# I should learn, test, and report:
./keras_tensorflow.bash TRAINSIZE=25 TESTYEAR=2016

The Python script which learns from features is listed below:


This script should learn from observations in feat.csv

Then it should test its learned models on observations later than the training observations.

Next it should report effectiveness of the models.

./keras_tensorflow.bash TRAINSIZE=25 TESTYEAR=2016

Above demo will train from 25 years of observations and predict each day of 2016
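The way TRAINSIZE and TESTYEAR become non-overlapping train and test windows can be sketched as follows. The helper name year_windows and the toy dates are hypothetical; the comparison relies on cdate being text in YYYY-MM-DD form.

```python
import pandas as pd

def year_windows(trainsize, testyear):
    """Return (train_start, train_end, test_start, test_end) as year strings."""
    train_end = testyear
    train_start = train_end - trainsize
    return str(train_start), str(train_end), str(testyear), str(testyear + 1)

# Hypothetical dates standing in for the cdate column of feat.csv.
toy_df = pd.DataFrame({'cdate': ['1990-12-31', '1991-01-02', '2015-12-31',
                                 '2016-01-04', '2016-12-30', '2017-01-03']})

train_start_s, train_end_s, test_start_s, test_end_s = year_windows(25, 2016)
# ISO dates compare correctly as text: '2016-01-04' is > '2016' and < '2017'.
train_sr = (toy_df.cdate > train_start_s) & (toy_df.cdate < train_end_s)
test_sr = (toy_df.cdate > test_start_s) & (toy_df.cdate < test_end_s)

print(toy_df[train_sr].cdate.tolist())
print(toy_df[test_sr].cdate.tolist())
```

With TRAINSIZE=25 and TESTYEAR=2016, training covers 1991 through 2015 and testing covers 2016 only, so no test day leaks into training.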

import numpy  as np
import pandas as pd
import pdb

# I should specify params for fit().
# I should use epochs_i to push model training harder.
# A large epochs_i gives a model which is accurate on the training data.
# A small epochs_i gives me a model quicker.
epochs_i     = 128
batch_size_i = 256 # Smaller is better but slower

# I should check cmd line args
import sys
if (len(sys.argv) != 3):
  print('You typed something wrong:')
  print("python TRAINSIZE=25 TESTYEAR=2016")
  sys.exit(1)

# I should get cmd line args:
trainsize     = int(sys.argv[1].split('=')[1])
testyear_s    =     sys.argv[2].split('=')[1]
train_end_i   = int(testyear_s)
train_end_s   =     testyear_s
train_start_i = train_end_i - trainsize
train_start_s = str(train_start_i)
# train and test observations should not overlap:
test_start_i  = train_end_i
test_start_s  = str(test_start_i)
test_end_i    = test_start_i+1
test_end_s    = str(test_end_i)

feat_df  = pd.read_csv('feat.csv')
train_sr = (feat_df.cdate > train_start_s) & (feat_df.cdate < train_end_s)
test_sr  = (feat_df.cdate > test_start_s)  & (feat_df.cdate < test_end_s)
train_df = feat_df[train_sr]
test_df  = feat_df[test_sr]

# I should get training data:
# Columns 0-2 are cdate, cp, pctlead; the rest are features which I cast to float:
xtrain_a = np.array(train_df)[:,3:].astype(float)
ytrain_a = np.array(train_df.pctlead)

# I should get classification from ytrain_a:
class_train_a   = (ytrain_a > np.mean(ytrain_a))
class_train1h_l = [[0,1] if cl else [1,0] for cl in class_train_a]
# [0,1] means up-observation
# [1,0] means down-observation
ytrain1h_a = np.array(class_train1h_l)

# I should build a Keras model:
from keras.models      import Sequential
from keras.layers      import Dense, Dropout, Activation
# I should use Keras API to create a neural network model.
# Ref:

# I should look at the last observation to see number of inputs
input_i = len(xtrain_a[-1])

# I should look at the last observation to see number of outputs:
output_i = len(ytrain1h_a[-1])
# These are classification models.
# The number of outputs should be the number of classes I want to predict.
# Usually for stockmarket, the number of classes is 2 (below-mean, above-mean).

# I should collect predictions in a DF:
predictions_df = test_df.copy()

# I should get test data:
xtest_a = np.array(test_df)[:,3:].astype(float)
ytest_a = np.array(test_df.pctlead)

keras1_model = Sequential()
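keras1_model.add(Dense(input_i, input_shape=(input_i,)))
# Assumed activations: relu for the dense layer, softmax for the class outputs.
keras1_model.add(Activation('relu'))
keras1_model.add(Dense(output_i))
keras1_model.add(Activation('softmax'))
keras1_model.compile(loss='categorical_crossentropy', optimizer='adam')
keras1_model.fit(xtrain_a, ytrain1h_a, batch_size=batch_size_i, epochs=epochs_i)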
# It should be able to predict now:
keras1_a = keras1_model.predict(xtest_a)[:,1]

keras2_model = Sequential()
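keras2_model.add(Dense(input_i, input_shape=(input_i,)))
# Assumed activations: relu for the dense layers, softmax for the class outputs.
keras2_model.add(Activation('relu'))
# I should enhance by inserting a hidden layer of input_i neurons.
keras2_model.add(Dense(input_i))
keras2_model.add(Activation('relu'))
# I should enhance by adding 20% Dropout.
keras2_model.add(Dropout(0.2))
# Enhancement finished.
keras2_model.add(Dense(output_i))
keras2_model.add(Activation('softmax'))
keras2_model.compile(loss='categorical_crossentropy', optimizer='adam')
keras2_model.fit(xtrain_a, ytrain1h_a, batch_size=batch_size_i, epochs=epochs_i)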
# It should be able to predict now:
keras2_a = keras2_model.predict(xtest_a)[:,1]

# I should collect the predictions:
predictions_df['keras1'] = keras1_a.tolist()
predictions_df['keras2'] = keras2_a.tolist()

# I should create a CSV to report from:
predictions_df.to_csv('gspc_predictions'+testyear_s+'.csv', float_format='%4.5f', index=False)

# I should report long-only-effectiveness:
eff_lo_f = np.sum(predictions_df.pctlead)
print('Long-only-effectiveness: '+str(np.round(eff_lo_f,3)))

# I should report keras1-model-effectiveness:
eff_sr     = predictions_df.pctlead * np.sign(predictions_df.keras1 - 0.5)
predictions_df['eff_keras1'] = eff_sr
eff_keras1_f                 = np.sum(eff_sr)
print('keras1-effectiveness: '+str(np.round(eff_keras1_f,3)))

# I should report keras2-model-effectiveness:
eff_sr     = predictions_df.pctlead * np.sign(predictions_df.keras2 - 0.5)
predictions_df['eff_keras2'] = eff_sr
eff_keras2_f                 = np.sum(eff_sr)
print('keras2-effectiveness: '+str(np.round(eff_keras2_f,3)))

# I should plot rgb vis:

import matplotlib
# Order is important here.
# Assumption: a non-interactive backend such as Agg suits a nightly job.
matplotlib.use('Agg')
# Do not move the next import:
import matplotlib.pyplot as plt

rgb0_df          = predictions_df[:-1][['cdate','cp']].copy()
rgb0_df['cdate'] = pd.to_datetime(rgb0_df['cdate'], format='%Y-%m-%d')
rgb0_df.columns  = ['cdate','Long Only']

# I should create effectiveness-line for keras1 predictions.

# I have two simple rules:
# 1. If blue line moves 1%, then model-line moves 1%.
# 2. If model is True, model-line goes up.
len_i       = len(rgb0_df)
blue_l      = [cp for cp in predictions_df.cp]

pred_keras1_l = [pred_keras1 for pred_keras1 in predictions_df.keras1]
keras1_l      = [blue_l[0]]
for row_i in range(len_i):
  blue_delt = blue_l[row_i+1]-blue_l[row_i]
  keras1_delt = np.sign(pred_keras1_l[row_i]-0.5) * blue_delt
  keras1_l.append(keras1_l[row_i]+keras1_delt)
rgb0_df['keras1'] = keras1_l[:-1]

# keras2 now:
pred_keras2_l = [pred_keras2 for pred_keras2 in predictions_df.keras2]
keras2_l      = [blue_l[0]]
for row_i in range(len_i):
  blue_delt = blue_l[row_i+1]-blue_l[row_i]
  keras2_delt = np.sign(pred_keras2_l[row_i]-0.5) * blue_delt
  keras2_l.append(keras2_l[row_i]+keras2_delt)
rgb0_df['keras2'] = keras2_l[:-1]

rgb1_df = rgb0_df.set_index(['cdate'])
rgb1_df.plot.line(title="RGB Effectiveness Visualization "+testyear_s, figsize=(11,7))

# I should save models.
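# Assumption: encoder is the keras-js encoder.py, whose Encoder has serialize() and save().
import encoder
# ref:

# I should write the architecture as JSON and the weights as HDF5:
with open('keras1_model'+testyear_s+'.json', 'w') as f:
  f.write(keras1_model.to_json())
keras1_model.save_weights('keras1_model'+testyear_s+'.hdf5')
enc = encoder.Encoder('keras1_model'+testyear_s+'.hdf5')
enc.serialize()
enc.save()
print('keras1_model saved as: keras1_model.hdf5 and keras1_model.json')

with open('keras2_model'+testyear_s+'.json', 'w') as f:
  f.write(keras2_model.to_json())
keras2_model.save_weights('keras2_model'+testyear_s+'.hdf5')
enc = encoder.Encoder('keras2_model'+testyear_s+'.hdf5')
enc.serialize()
enc.save()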
print('keras2_model saved as: keras2_model.hdf5 and keras2_model.json')
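The labels and the effectiveness numbers in the script above come down to a few lines of arithmetic. Here is a minimal sketch with made-up values; pctlead_a and prob_up_a are hypothetical stand-ins for the pctlead column and for a model's second output column.

```python
import numpy as np

# Hypothetical next-day percent moves and model outputs (probability of "up").
pctlead_a = np.array([0.5, -1.0, 0.2, -0.3])
prob_up_a = np.array([0.7, 0.4, 0.3, 0.9])

# One-hot labels as in the script: [0,1] is above-mean, [1,0] is below-mean.
class_a = pctlead_a > np.mean(pctlead_a)
onehot_a = np.array([[0, 1] if cl else [1, 0] for cl in class_a])

# Long-only collects every move as-is.
eff_long_only_f = np.sum(pctlead_a)
# The model collects a move when it picks the right side and loses it otherwise.
eff_model_f = np.sum(pctlead_a * np.sign(prob_up_a - 0.5))

print(onehot_a.tolist())
print(eff_long_only_f, eff_model_f)
```

In this toy case the model is right on days one and two and wrong on days three and four, yet it still beats long-only because it avoids the largest drop. A prediction of exactly 0.5 gives a sign of 0, so that day contributes nothing.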

