Class09 Answer:

Report Sum of Prediction Effectiveness, Grouped by Pair and Training Size

I started this lab by asking two questions:

Which pair is the most predictable?
For each pair, which training size is optimal?

To help me answer the above questions, I wrote the script listed below:


# rpt.py

# This script should help me find the most predictable pair.
# This script should help me find the optimal training sizes.


# The predictions are in files which look like this:
#   -rw-rw-r--  1 fx411 fx411   274782 Nov  7 08:28 predictions93000EURUSD.csv

# The columns are listed below:
# ts, price, piplead, prediction, eff, acc

import pandas as pd
import numpy  as np
import glob

fn_l = glob.glob('../csv/predictions*.csv')

all_sum = 0
pair_trainsize_eff_acc_l = []
for fn in sorted(fn_l):
    r0_df = pd.read_csv(fn,names=['ts','price','piplead','prediction','eff','acc'])
    eff_sum     = np.sum(r0_df.eff)
    all_sum     = all_sum+eff_sum
    acc_pct     = np.round(100*np.sum(r0_df.acc) / len(r0_df),1)
    trainsize_i = fn[18:-10]
    pair_s      = fn[-10:-4]
    pair_trainsize_eff_acc_l.append([pair_s,trainsize_i,eff_sum,acc_pct])
# I should convert pair_trainsize_eff_acc_l into a DataFrame.
# Building it straight from the list keeps eff and acc as floats;
# going through np.array would turn every value into a string first.
ptea1_df = pd.DataFrame(pair_trainsize_eff_acc_l
                       ,columns=['pair','trainsize','eff','acc'])

# To see the most effective combinations of pair and trainsize, I should sort:
print(ptea1_df.sort_values(by='eff', ascending=False).head(30))

'bye'
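In rpt.py, fn[18:-10] and fn[-10:-4] work only because '../csv/predictions' is exactly 18 characters long and every pair name is six letters; running the script from a different directory would break the slices. A minimal sketch of a sturdier parse, assuming every file follows the predictions<trainsize><PAIR>.csv pattern shown in the ls line (parse_prediction_fn is a hypothetical helper, not part of the original script):

```python
import re

# Assumed pattern: predictions<trainsize><PAIR>.csv, e.g. predictions93000EURUSD.csv.
# Matching on the file name instead of fixed character offsets means the
# directory prefix no longer matters.
FN_RE = re.compile(r'predictions(\d+)([A-Z]{6})\.csv$')


def parse_prediction_fn(fn):
    """Hypothetical helper: return (pair, trainsize) from a predictions CSV path."""
    m = FN_RE.search(fn)
    if m is None:
        raise ValueError('unexpected file name: ' + fn)
    return m.group(2), int(m.group(1))


print(parse_prediction_fn('../csv/predictions93000EURUSD.csv'))           # ('EURUSD', 93000)
print(parse_prediction_fn('/tmp/other/dir/predictions107000AUDUSD.csv'))  # ('AUDUSD', 107000)
```

Inside the loop, pair_s, trainsize_i = parse_prediction_fn(fn) could replace the two slicing lines. Note that trainsize then becomes an int, so a script that glues it into a path (as plotbg12.py does) would need str().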

I ran the above script and saw this output:


fx411@h79:~/fx411/script $ python rpt.py
      pair trainsize        eff   acc
15  AUDUSD    107000  8038.4917  52.0
8   USDCAD    103000  5271.0216  52.1
1   EURUSD    101000  4621.5164  51.4
11  EURUSD    105000  4155.3167  51.4
10  AUDUSD    105000  3929.9220  51.1
17  GBPUSD    107000  3691.8401  52.0
6   EURUSD    103000  2991.2085  51.5
13  USDCAD    105000  1737.2926  50.0
3   USDCAD    101000  1272.2763  50.0
7   GBPUSD    103000   593.0259  51.7
16  EURUSD    107000   365.4563  51.7
19  USDJPY    107000   300.0485  52.4
5   AUDUSD    103000  -278.6046  50.8
12  GBPUSD    105000  -374.0834  51.0
18  USDCAD    107000 -2203.8009  49.7
14  USDJPY    105000 -2638.9093  51.6
0   AUDUSD    101000 -2876.6974  50.6
9   USDJPY    103000 -2922.7771  51.6
4   USDJPY    101000 -3691.3764  51.9
2   GBPUSD    101000 -7196.7381  50.3
fx411@h79:~/fx411/script $

Above, I can see that the most effective combination of pair and trainsize is AUDUSD with 107000 training rows, at an eff sum of 8038.4917.

When I study the pairs near the top, I see that EURUSD appears to be the most predictable: all four of its training sizes have a positive eff sum, and three of them rank in the top seven.

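The least predictable pair is USDJPY: three of its four training sizes have a negative eff sum, and its only positive sum (300.0485, at 107000) is small.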
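These two judgments can be checked by summing eff per pair. A minimal sketch, assuming the eff sums are copied by hand from the rpt.py output (inside rpt.py itself, ptea1_df could be grouped directly); eff_by_pair, ptea_df and summary_df are illustrative names, and total eff is only one possible measure of "predictable":

```python
import pandas as pd

# eff sums copied from the rpt.py output above, one list per pair,
# in trainsize order 101000, 103000, 105000, 107000.
trainsizes = [101000, 103000, 105000, 107000]
eff_by_pair = {
    'AUDUSD': [-2876.6974, -278.6046, 3929.9220, 8038.4917],
    'EURUSD': [4621.5164, 2991.2085, 4155.3167, 365.4563],
    'GBPUSD': [-7196.7381, 593.0259, -374.0834, 3691.8401],
    'USDCAD': [1272.2763, 5271.0216, 1737.2926, -2203.8009],
    'USDJPY': [-3691.3764, -2922.7771, -2638.9093, 300.0485],
}
ptea_df = pd.DataFrame(
    [(pair, ts, eff)
     for pair, effs in eff_by_pair.items()
     for ts, eff in zip(trainsizes, effs)],
    columns=['pair', 'trainsize', 'eff'],
)

# Per pair: total eff over all training sizes, and how many sizes had eff > 0.
summary_df = (
    ptea_df.groupby('pair')['eff']
    .agg(eff_total='sum', positive_sizes=lambda s: int((s > 0).sum()))
    .sort_values('eff_total', ascending=False)
)
print(summary_df)
```

With these numbers EURUSD has the largest total and is the only pair with four positive training sizes, while USDJPY has the smallest total.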

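For each pair, the optimal number of training observations (the trainsize with the largest eff sum in the output above) is listed below:

AUDUSD: 107000
EURUSD: 101000
GBPUSD: 107000
USDCAD: 103000
USDJPY: 107000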
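The per-pair optimum can be read off the rpt.py table with groupby and idxmax instead of by eye. A minimal sketch, assuming the eff sums are copied from the output above; the names eff_by_pair, ptea_df and best_df are illustrative, not from the original scripts:

```python
import pandas as pd

# eff sums copied from the rpt.py output above, one list per pair,
# in trainsize order 101000, 103000, 105000, 107000.
trainsizes = [101000, 103000, 105000, 107000]
eff_by_pair = {
    'AUDUSD': [-2876.6974, -278.6046, 3929.9220, 8038.4917],
    'EURUSD': [4621.5164, 2991.2085, 4155.3167, 365.4563],
    'GBPUSD': [-7196.7381, 593.0259, -374.0834, 3691.8401],
    'USDCAD': [1272.2763, 5271.0216, 1737.2926, -2203.8009],
    'USDJPY': [-3691.3764, -2922.7771, -2638.9093, 300.0485],
}
ptea_df = pd.DataFrame(
    [(pair, ts, eff)
     for pair, effs in eff_by_pair.items()
     for ts, eff in zip(trainsizes, effs)],
    columns=['pair', 'trainsize', 'eff'],
)

# For each pair, idxmax returns the row label holding that pair's largest eff;
# .loc then pulls those rows out of the table (one row per pair).
best_df = ptea_df.loc[ptea_df.groupby('pair')['eff'].idxmax()]
print(best_df[['pair', 'trainsize', 'eff']].to_string(index=False))
```

In rpt.py the trainsize column holds strings sliced from the file name, but the same line would still work there, since idxmax compares only the eff values.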

I wrote and ran a script to plot the combinations:


# plotbg12.py

# This script should transform csv files into blue-green visualizations.

import pandas as pd
import numpy  as np
import glob

fn_l = glob.glob('../csv/predictions*.csv')
pair_trainsize_eff_acc_l = []
for fn in sorted(fn_l):
    r0_df = pd.read_csv(fn,names=['ts','price','piplead','prediction','eff','acc'])
    eff_sum     = np.sum(r0_df.eff)
    acc_pct     = np.round(100*np.sum(r0_df.acc) / len(r0_df),1)
    trainsize_i = fn[18:-10]
    pair_s      = fn[-10:-4]
    pair_trainsize_eff_acc_l.append([pair_s,trainsize_i,eff_sum,acc_pct])
# I should convert pair_trainsize_eff_acc_l into a DataFrame.
# Building it straight from the list keeps eff and acc as floats;
# going through np.array would turn every value into a string first.
ptea1_df = pd.DataFrame(pair_trainsize_eff_acc_l
                       ,columns=['pair','trainsize','eff','acc'])

# To see the most effective combinations of pair and trainsize, I should sort:
print(ptea1_df.sort_values(by='eff', ascending=False).head(30))

ptea2_df = ptea1_df.sort_values(by='eff', ascending=False).head(30)[['trainsize','pair']]

# I should set up matplotlib once, before the loop, so it can save plots without a display.
import matplotlib
matplotlib.use('Agg')
# Order is important here.
# Do not move the next import above matplotlib.use():
import matplotlib.pyplot as plt

# I should use DF.index and DF.loc[] to iterate the DF:
for idx_i in ptea2_df.index:
    row         = ptea2_df.loc[idx_i]
    trainsize_s = row.trainsize
    pair_s      = row.pair
    csv_s       = '../csv/predictions'+trainsize_s+pair_s+'.csv'

    p0_df         = pd.read_csv(csv_s, names=['ts','cp','piplead','problr','eff','acc'])
    p1_df         = p0_df.copy()[['ts','cp','eff']]
    p1_df['Date'] = pd.to_datetime(p1_df.ts, unit='s')

    # I should build the green data; it should start at same place as blue data:
    green_l = [p1_df.cp[0]]
    len_i   = len(p1_df)
    # Integer navigator.
    row_i = 0
    while row_i < len_i-1:
        # I should track where I am.
        row_i += 1
        # I should track blue_line delta:
        blue_delt_f = abs(p1_df.cp[row_i] - p1_df.cp[row_i-1])
        # I should add to the green line: up if the previous row's
        # prediction was effective (eff > 0), down otherwise.
        if p1_df.eff[row_i-1] > 0:
            green_l.append( green_l[row_i-1] + blue_delt_f )
        else:
            green_l.append( green_l[row_i-1] - blue_delt_f )
    p1_df['Logistic_Regression'] = green_l
    # I should make the Date column the index:
    p2_df         = p1_df.set_index(['Date'])
    p3_df         = p2_df[['cp','Logistic_Regression']]
    # A flat list (not a list of lists) keeps the column labels plain strings,
    # so the plot legend reads Price and Logistic_Regression:
    p3_df.columns = ['Price','Logistic_Regression']
    trainsize_i   = csv_s[18:-10]
    pair_s        = csv_s[-10:-4]
    # Now I should plot the visualization:
    title_s = pair_s+" Price and Logistic Regression Predictions From "+str(trainsize_i)+" Row Training Set"
    p3_df.plot.line(title=title_s, figsize=(11,7))
    png_s = '../public/plots/img'+str(trainsize_i)+pair_s+'.png'
    plt.savefig(png_s)
    plt.close()
    print('We should have a new plot now: '+png_s)

'bye'
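The while loop in plotbg12.py builds the green line one row at a time: starting from the first price, it steps up by the size of each price move when the previous row's eff is positive and down by the same amount otherwise. Because each step depends only on a price delta and the previous eff, the same line can be written as a cumulative sum. A minimal NumPy sketch, assuming cp and eff are aligned 1-D arrays such as p1_df.cp.values and p1_df.eff.values (green_line is a hypothetical name):

```python
import numpy as np


def green_line(cp, eff):
    """Hypothetical vectorized version of the green-line loop in plotbg12.py."""
    cp = np.asarray(cp, dtype=float)
    eff = np.asarray(eff, dtype=float)
    # Size of each blue (price) move, for rows 1..n-1.
    step = np.abs(np.diff(cp))
    # Direction of each green move: +1 if the previous row's eff > 0, else -1.
    sign = np.where(eff[:-1] > 0, 1.0, -1.0)
    # The green line starts at the first price and accumulates the signed moves.
    return np.concatenate(([cp[0]], cp[0] + np.cumsum(sign * step)))


print(green_line([1.0, 1.5, 1.2, 1.4], [-1.0, 1.0, -1.0, 0.0]))
```

In plotbg12.py, p1_df['Logistic_Regression'] = green_line(p1_df.cp.values, p1_df.eff.values) would stand in for the while loop; the result should match the loop up to floating-point rounding.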


Class09 Lab

