## Report Predictions Effectiveness Sum, Grouped by Pair and Training Size

I started this lab by asking two questions:

• Of the five pairs, which is the most predictable?
• For each pair, what is an optimal amount of training data?

To help me answer the above questions, I wrote the script listed below:

``````
# rpt.py

# This script should help me find the most predictable pair.
# This script should help me find optimal training sizes.

# The predictions are in files which look like this:
#   -rw-rw-r--  1 fx411 fx411   274782 Nov  7 08:28 predictions93000EURUSD.csv

# The columns are listed below:
# ts, price, piplead, eff, acc

import pandas as pd
import numpy  as np
import glob

fn_l = glob.glob('../csv/predictions*.csv')

all_sum = 0
pair_trainsize_eff_acc_l = []
for fn in sorted(fn_l):
    r0_df       = pd.read_csv(fn)
    eff_sum     = np.sum(r0_df.eff)
    all_sum     = all_sum+eff_sum
    acc_pct     = np.round(100*np.sum(r0_df.acc) / len(r0_df),1)
    # '../csv/predictions' is 18 characters; the pair code plus '.csv' is the last 10.
    trainsize_i = fn[18:-10]
    pair_s      = fn[-10:-4]
    pair_trainsize_eff_acc_l.append([pair_s,trainsize_i,eff_sum,acc_pct])
# I should convert pair_trainsize_eff_acc_l into a DataFrame.
pair_trainsize_eff_acc_a  = np.array(pair_trainsize_eff_acc_l)
pair_trainsize_eff_acc_df = pd.DataFrame({'pair':      pair_trainsize_eff_acc_a[:,0]
                                         ,'trainsize': pair_trainsize_eff_acc_a[:,1]
                                         ,'eff':       pair_trainsize_eff_acc_a[:,2]
                                         ,'acc':       pair_trainsize_eff_acc_a[:,3]
                                         })

# I should change eff and acc to floats:
eff_l    = [float(my_s) for my_s in pair_trainsize_eff_acc_df.eff]
acc_l    = [float(my_s) for my_s in pair_trainsize_eff_acc_df.acc]
ptea1_df = pair_trainsize_eff_acc_df.copy()[['pair','trainsize']]
ptea1_df['eff'] = eff_l
ptea1_df['acc'] = acc_l

# To see the most effective combinations of pair and trainsize, I should sort:
ptea2_df = ptea1_df.sort_values('eff', ascending=False)
print(ptea2_df)

'bye'
``````
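The two filename slices (`fn[18:-10]` and `fn[-10:-4]`) do the field extraction, so they are worth a quick sanity check in isolation. A minimal sketch, using a hypothetical filename that follows the `predictions<trainsize><pair>.csv` pattern:

```python
# Hypothetical filename following the predictions<trainsize><pair>.csv pattern.
fn = '../csv/predictions107000EURUSD.csv'

# The prefix '../csv/predictions' is 18 characters; the pair code (6 chars)
# plus '.csv' (4 chars) make up the last 10, so slicing from both ends
# isolates the two fields regardless of how many digits the trainsize has.
trainsize_s = fn[18:-10]
pair_s      = fn[-10:-4]
print(trainsize_s, pair_s)   # → 107000 EURUSD
```

Because the slice is anchored from both ends, it also handles the five-digit example filename (`predictions93000EURUSD.csv`) shown in the comments.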

I ran rpt.py and saw this output:

``````
fx411@h79:~/fx411/script $ python rpt.py
pair trainsize        eff   acc
15  AUDUSD    107000  8038.4917  52.0
1   EURUSD    101000  4621.5164  51.4
11  EURUSD    105000  4155.3167  51.4
10  AUDUSD    105000  3929.9220  51.1
17  GBPUSD    107000  3691.8401  52.0
6   EURUSD    103000  2991.2085  51.5
7   GBPUSD    103000   593.0259  51.7
16  EURUSD    107000   365.4563  51.7
19  USDJPY    107000   300.0485  52.4
5   AUDUSD    103000  -278.6046  50.8
12  GBPUSD    105000  -374.0834  51.0
14  USDJPY    105000 -2638.9093  51.6
0   AUDUSD    101000 -2876.6974  50.6
9   USDJPY    103000 -2922.7771  51.6
4   USDJPY    101000 -3691.3764  51.9
2   GBPUSD    101000 -7196.7381  50.3
fx411@h79:~/fx411/script $
``````

Above, I can see that the most effective combination of pair and trainsize is: AUDUSD, 107000.

When I study the pairs near the top, I see that EURUSD appears to be the most predictable; it holds three of the top six rows.

The least predictable pair is USDJPY.

For each pair, the optimal number of training observations is listed below:

• AUDUSD: 107000
• EURUSD: 101000
• GBPUSD: 107000
• USDJPY: 107000
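The per-pair optima above can also be read off programmatically: group the summary table by pair and keep the row with the largest eff. A minimal sketch, where `df` is a hand-built stand-in holding rows transcribed from the rpt.py output (two training sizes per pair, not all four), rather than the `ptea2_df` the script constructs:

```python
import pandas as pd

# A few rows transcribed from the rpt.py output above.
df = pd.DataFrame({
    'pair':      ['AUDUSD','AUDUSD','EURUSD','EURUSD','GBPUSD','GBPUSD','USDJPY','USDJPY'],
    'trainsize': [ 107000 ,  101000,  101000,  107000,  107000,  101000,  107000,  101000],
    'eff':       [8038.4917,-2876.6974,4621.5164,365.4563,3691.8401,-7196.7381,300.0485,-3691.3764],
})

# For each pair, keep the row with the largest eff; its trainsize is the optimum.
best = df.loc[df.groupby('pair')['eff'].idxmax()]
print(best[['pair','trainsize']])
```

`GroupBy.idxmax()` returns the index label of the maximum eff within each pair, and `df.loc[...]` pulls those rows back out, which scales to any number of pairs and training sizes.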

I wrote and ran a script to plot the combinations:

``````
# plotbg12.py

# This script should transform csv files into blue-green visualizations.

import pandas as pd
import numpy  as np
import glob
import matplotlib
matplotlib.use('Agg')
# Order is important here.
# The Agg backend must be selected before pyplot is imported:
import matplotlib.pyplot as plt

fn_l = glob.glob('../csv/predictions*.csv')
pair_trainsize_eff_acc_l = []
for fn in sorted(fn_l):
    r0_df       = pd.read_csv(fn)
    eff_sum     = np.sum(r0_df.eff)
    acc_pct     = np.round(100*np.sum(r0_df.acc) / len(r0_df),1)
    trainsize_i = fn[18:-10]
    pair_s      = fn[-10:-4]
    pair_trainsize_eff_acc_l.append([pair_s,trainsize_i,eff_sum,acc_pct])
# I should convert pair_trainsize_eff_acc_l into a DataFrame.
pair_trainsize_eff_acc_a  = np.array(pair_trainsize_eff_acc_l)
pair_trainsize_eff_acc_df = pd.DataFrame({'pair':      pair_trainsize_eff_acc_a[:,0]
                                         ,'trainsize': pair_trainsize_eff_acc_a[:,1]
                                         ,'eff':       pair_trainsize_eff_acc_a[:,2]
                                         ,'acc':       pair_trainsize_eff_acc_a[:,3]
                                         })

# I should change eff and acc to floats:
eff_l    = [float(my_s) for my_s in pair_trainsize_eff_acc_df.eff]
acc_l    = [float(my_s) for my_s in pair_trainsize_eff_acc_df.acc]
ptea1_df = pair_trainsize_eff_acc_df.copy()[['pair','trainsize']]
ptea1_df['eff'] = eff_l
ptea1_df['acc'] = acc_l

# To see the most effective combinations of pair and trainsize, I should sort:
ptea2_df = ptea1_df.sort_values('eff', ascending=False)

# I should use DF.index and DF.loc[] to iterate the DF:
for idx_i in ptea2_df.index:
    row         = ptea2_df.loc[idx_i]
    trainsize_s = row.trainsize
    pair_s      = row.pair
    csv_s       = '../csv/predictions'+trainsize_s+pair_s+'.csv'
    p0_df       = pd.read_csv(csv_s)

    p1_df         = p0_df.copy()[['ts','price','eff']]
    p1_df['Date'] = pd.to_datetime(p1_df.ts, unit='s')

    # I should build the green data; it should start at the same place as the blue data:
    green_l = [p1_df.price[0]]
    len_i   = len(p1_df)
    # Integer navigator.
    row_i = 0
    while row_i < len_i-1:
        # I should track where I am.
        row_i += 1
        # I should track the blue-line delta:
        blue_delt_f = abs(p1_df.price[row_i] - p1_df.price[row_i-1])
        # I should add to (or subtract from) the green line:
        if p1_df.eff[row_i-1] > 0:
            green_l.append(green_l[row_i-1] + blue_delt_f)
        else:
            green_l.append(green_l[row_i-1] - blue_delt_f)
    p1_df['Logistic_Regression'] = green_l
    # In pandas, set_index() creates an index from a column:
    p2_df         = p1_df.set_index(['Date'])
    p3_df         = p2_df[['price','Logistic_Regression']]
    p3_df.columns = ['Price','Logistic_Regression']
    # Now I should plot the visualization:
    title_s = pair_s+" Price and Logistic Regression Predictions From "+str(trainsize_s)+" Row Training Set"
    p3_df.plot.line(title=title_s, figsize=(11,7))
    png_s = '../public/plots/img'+str(trainsize_s)+pair_s+'.png'
    plt.savefig(png_s)
    plt.close()
    print('We should have a new plot now: '+png_s)

'bye'
``````
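The while-loop that builds the green line walks the frame one row at a time. The same rule (start at the first price, then add the absolute price change when eff is positive and subtract it otherwise) can be expressed without explicit iteration. A vectorized sketch with toy stand-in values, not real prediction data:

```python
import numpy as np

# Toy stand-ins for the real columns: price is the blue line,
# and eff[i] is the prediction effectiveness deciding step i -> i+1
# (in plotbg12.py, eff comes from the predictions CSV itself).
price = np.array([1.10, 1.12, 1.11, 1.15, 1.14])
eff   = np.array([ 0.5, -0.2,  0.3,  0.1])

# Absolute price change at each step (the blue-line delta).
delta  = np.abs(np.diff(price))
# Add the delta when eff > 0, subtract it otherwise.
signed = np.where(eff > 0, delta, -delta)
# The green line starts at the first price and accumulates the signed deltas.
green  = np.concatenate(([price[0]], price[0] + np.cumsum(signed)))
print(green)
```

For long prediction files this replaces the Python-level loop with three NumPy operations, and it makes the rule itself easier to see: the green line is a cumulative sum of signed blue-line deltas.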