Class07 Answer:

- Visualize model effectiveness.
- Visualize model accuracy.

I started this lab by writing a script that reports the numbers I want to visualize:


# rpt_predictions.r

# This script should read predictions.csv and report 
# -Long-Only Effectiveness
# -Model Effectiveness
# -Long-Only Accuracy
# -Model Accuracy

# Demo:
# R -f rpt_predictions.r

# I should read predictions.csv
predictions_df = read.csv('predictions.csv')

# I should report Long-Only Effectiveness:
long_only_eff_f = sum(predictions_df$pctlead)
print('long_only_eff:')
print(long_only_eff_f)

# I should calculate Model Effectiveness.
# Logic should be:
# If prediction has same sign as pctlead, that prediction is both effective and accurate:
predictions_df$effectiveness = sign(predictions_df$prediction) * predictions_df$pctlead

# I should report Model Effectiveness:
model_eff_f = sum(predictions_df$effectiveness)
print('model_eff:')
print(model_eff_f)

# I should calculate Long-Only Accuracy.
pctlead_up_v = sign(predictions_df$pctlead)
# I should count the 1 values
lo_count_up_i = sum(pctlead_up_v == 1)
# Accuracy should be count of up days divided by all days:
lo_accuracy_f = 100.0 * lo_count_up_i / length(predictions_df$pctlead)
print('Long-Only accuracy:')
print(lo_accuracy_f)

# I should calculate model Accuracy.
# If the signs match, that should be accurate:
predictions_df$accuracy = sign(predictions_df$prediction) * sign(predictions_df$pctlead)

# I should count the 1 values
tf_v = (predictions_df$accuracy == 1)
# I should use sum() to sum the 1 values which should be same as count of TRUE values.
true_count_i = sum(tf_v)
all_count_i  = length(predictions_df$accuracy)
accuracy_f   = 100.0 * true_count_i / all_count_i
print('model accuracy:')
print(accuracy_f)

'bye'
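
As an aside, the sign-matching idea in the script is easy to see on a tiny, entirely hypothetical data frame. The column names below mirror the prediction and pctlead columns that rpt_predictions.r expects in predictions.csv; the numbers are made up for illustration:


# toy_sign_demo.r (hypothetical example, not part of the lab files)

# Build a small data frame shaped like predictions.csv:
toy_df = data.frame(
  prediction = c( 0.4, -0.2,  0.1, -0.3),
  pctlead    = c( 1.0,  0.5, -0.6, -0.2)
)

# Effectiveness: a correct sign keeps that day's pctlead, a wrong sign turns it into a loss:
toy_df$effectiveness = sign(toy_df$prediction) * toy_df$pctlead
print(sum(toy_df$effectiveness))   # 0.1 for the model vs. 0.7 for long-only (sum of pctlead)

# Accuracy: a prediction counts as accurate when its sign matches the sign of pctlead:
toy_df$accuracy = sign(toy_df$prediction) * sign(toy_df$pctlead)
print(100.0 * sum(toy_df$accuracy == 1) / nrow(toy_df))   # rows 1 and 4 match, so 50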

I ran rpt_predictions.r and saw this output:


dan@h80:~/ml4/public/class07 $ R -f rpt_predictions.r

R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # rpt_predictions.r
> 
> # This script should read predictions.csv and report 
> # -Long-Only Effectiveness
> # -Model Effectiveness
> # -Long-Only Accuracy
> # -Model Accuracy
> 
> # Demo:
> # R -f rpt_predictions.r
> 
> # I should read predictions.csv
> predictions_df = read.csv('predictions.csv')
> 
> # I should report Long-Only Effectiveness:
> long_only_eff_f = sum(predictions_df$pctlead)
> print('long_only_eff:')
[1] "long_only_eff:"
> print(long_only_eff_f)
[1] 97.48878
> 
> # I should calculate Model Effectiveness.
> # Logic should be:
> # If prediction has same sign as pctlead, that prediction is both effective and accurate:
> predictions_df$effectiveness = sign(predictions_df$prediction) * predictions_df$pctlead
> 
> # I should report Model Effectiveness:
> model_eff_f = sum(predictions_df$effectiveness)
> print('model_eff:')
[1] "model_eff:"
> print(model_eff_f)
[1] 41.61526
> 
> # I should calculate Long-Only Accuracy.
> pctlead_up_v = sign(predictions_df$pctlead)
> # I should count the 1 values
> lo_count_up_i = sum(pctlead_up_v == 1)
> # Accuracy should be count of up days divided by all days:
> lo_accuracy_f = 100.0 * lo_count_up_i / length(predictions_df$pctlead)
> print('Long-Only accuracy:')
[1] "Long-Only accuracy:"
> print(lo_accuracy_f)
[1] 53.30697
> 
> # I should calculate model Accuracy.
> # If the signs match, that should be accurate:
> predictions_df$accuracy = sign(predictions_df$prediction) * sign(predictions_df$pctlead)
> 
> # I should count the 1 values
> tf_v = (predictions_df$accuracy == 1)
> # I should use sum() to sum the 1 values which should be same as count of TRUE values.
> true_count_i = sum(tf_v)
> all_count_i  = length(predictions_df$accuracy)
> accuracy_f   = 100.0 * true_count_i / all_count_i
> print('model accuracy:')
[1] "model accuracy:"
> print(accuracy_f)
[1] 51.15359
> 
> 'bye'
[1] "bye"
> 
dan@h80:~/ml4/public/class07 $

The above output gives me the information listed below:

- Long-Only Effectiveness: 97.5
- Model Effectiveness: 41.6
- Long-Only Accuracy: 53.3%
- Model Accuracy: 51.2%

The model captures less than half of the gain that a simple long-only strategy captures, and its accuracy is lower than the long-only baseline, so this analysis tells me that I should avoid using the Heatmap Model to predict the S&P 500.
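
The lab prompt also asks me to visualize these results. Below is a minimal sketch of one way to do that with base R's barplot(); the four numbers are copied by hand from the output above, and the file name class07_metrics.png is just a placeholder:


# plot_metrics.r

# Minimal sketch: bar charts comparing Long-Only vs. Model.
# Values are hard-coded from the output of rpt_predictions.r above.
eff_v = c(long_only = 97.5, model = 41.6)
acc_v = c(long_only = 53.3, model = 51.2)

png('class07_metrics.png', width = 800, height = 400)
par(mfrow = c(1, 2))
barplot(eff_v, main = 'Effectiveness (sum of captured pctlead)', col = c('blue', 'red'))
barplot(acc_v, main = 'Accuracy (%)', ylim = c(0, 100), col = c('blue', 'red'))
dev.off()


In both panels the long-only bar is taller than the model bar, which is consistent with the conclusion above.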

Class07 Lab

