I started this lab by writing a script which reports data:

- Long-Only Effectiveness
- Model Effectiveness
- Long-Only Accuracy
- Model Accuracy

```
# rpt_predictions.r
# This script should read predictions.csv and report
# -Long-Only Effectiveness
# -Model Effectiveness
# -Long-Only Accuracy
# -Model Accuracy
# Demo:
# R -f rpt_predictions.r
# I should read predictions.csv
predictions_df = read.csv('predictions.csv')
# I should report Long-Only Effectiveness:
long_only_eff_f = sum(predictions_df$pctlead)
print('long_only_eff:')
print(long_only_eff_f)
# I should calculate Model Effectiveness.
# Logic should be:
# If prediction has same sign as pctlead, that prediction is both effective and accurate:
predictions_df$effectiveness = sign(predictions_df$prediction) * predictions_df$pctlead
# I should report Model Effectiveness:
model_eff_f = sum(predictions_df$effectiveness)
print('model_eff:')
print(model_eff_f)
# I should calculate Long-Only Accuracy.
pctlead_up_v = sign(predictions_df$pctlead)
# I should count the 1 values
lo_count_up_i = sum(pctlead_up_v == 1)
# Accuracy should be count of up days divided by all days:
lo_accuracy_f = 100.0 * lo_count_up_i / length(predictions_df$pctlead)
print('Long-Only accuracy:')
print(lo_accuracy_f)
# I should calculate model Accuracy.
# If the signs match, that should be accurate:
predictions_df$accuracy = sign(predictions_df$prediction) * sign(predictions_df$pctlead)
# I should count the 1 values
tf_v = (predictions_df$accuracy == 1)
# I should use sum() to sum the 1 values which should be same as count of TRUE values.
true_count_i = sum(tf_v)
all_count_i = length(predictions_df$accuracy)
accuracy_f = 100.0 * true_count_i / all_count_i
print('model accuracy:')
print(accuracy_f)
'bye'
```

I ran the above script and saw this:

```
dan@h80:~/ml4/public/class07 $ R -f rpt_predictions.r
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> # rpt_predictions.r
>
> # This script should read predictions.csv and report
> # -Long-Only Effectiveness
> # -Model Effectiveness
> # -Long-Only Accuracy
> # -Model Accuracy
>
> # Demo:
> # R -f rpt_predictions.r
>
> # I should read predictions.csv
> predictions_df = read.csv('predictions.csv')
>
> # I should report Long-Only Effectiveness:
> long_only_eff_f = sum(predictions_df$pctlead)
> print('long_only_eff:')
[1] "long_only_eff:"
> print(long_only_eff_f)
[1] 97.48878
>
> # I should calculate Model Effectiveness.
> # Logic should be:
> # If prediction has same sign as pctlead, that prediction is both effective and accurate:
> predictions_df$effectiveness = sign(predictions_df$prediction) * predictions_df$pctlead
>
> # I should report Model Effectiveness:
> model_eff_f = sum(predictions_df$effectiveness)
> print('model_eff:')
[1] "model_eff:"
> print(model_eff_f)
[1] 41.61526
>
> # I should calculate Long-Only Accuracy.
> pctlead_up_v = sign(predictions_df$pctlead)
> # I should count the 1 values
> lo_count_up_i = sum(pctlead_up_v == 1)
> # Accuracy should be count of up days divided by all days:
> lo_accuracy_f = 100.0 * lo_count_up_i / length(predictions_df$pctlead)
> print('Long-Only accuracy:')
[1] "Long-Only accuracy:"
> print(lo_accuracy_f)
[1] 53.30697
>
> # I should calculate model Accuracy.
> # If the signs match, that should be accurate:
> predictions_df$accuracy = sign(predictions_df$prediction) * sign(predictions_df$pctlead)
>
> # I should count the 1 values
> tf_v = (predictions_df$accuracy == 1)
> # I should use sum() to sum the 1 values which should be same as count of TRUE values.
> true_count_i = sum(tf_v)
> all_count_i = length(predictions_df$accuracy)
> accuracy_f = 100.0 * true_count_i / all_count_i
> print('model accuracy:')
[1] "model accuracy:"
> print(accuracy_f)
[1] 51.15359
>
> 'bye'
[1] "bye"
>
dan@h80:~/ml4/public/class07 $
dan@h80:~/ml4/public/class07 $
dan@h80:~/ml4/public/class07 $
```

The above output gives me the information listed below:

- For years 2000-2017 (+2018-01), Long-Only Effectiveness: 97.49%
- Heatmap Model Effectiveness: 41.62%
- Long-Only Accuracy: 53.3%
- Heatmap Model Accuracy: 51.2%

The above analysis tells me that I should avoid using the Heatmap Model to predict S&P 500.

ml4.us
About
Blog
Contact
Class01
Class02
Class03
Class04
Class05
Class06
Class07
Class08
Class09
Class10
dan101 Forum
Google Hangout
Vboxen