Class03 Answer:

Write Pandas syntax which mimics this SQL:


SELECT max(closep)     FROM
  (SELECT cdate,closep FROM prices WHERE cdate > (SELECT MAX(cdate)-10 FROM prices)) iv;

DROP   TABLE IF EXISTS prices2;
CREATE TABLE prices2 AS SELECT cdate,openp,closep FROM prices WHERE cdate > '2016-08-08';

DROP   TABLE IF EXISTS prices3;
CREATE TABLE prices3 AS SELECT cdate,openp,closep, closep - openp AS diff FROM prices2;

I ran the above SQL and saw this:


ann=# 
ann=# SELECT max(closep)     FROM
ann-#   (SELECT cdate,closep FROM prices WHERE cdate > (SELECT MAX(cdate)-10 FROM prices)) iv;
    max     
------------
 155.179993
(1 row)

ann=# DROP   TABLE IF EXISTS prices2;
NOTICE:  table "prices2" does not exist, skipping
DROP TABLE
ann=# CREATE TABLE prices2 AS SELECT cdate,openp,closep FROM prices WHERE cdate > '2016-08-08';
SELECT 215
ann=# DROP   TABLE IF EXISTS prices3;
NOTICE:  table "prices3" does not exist, skipping
DROP TABLE
ann=# CREATE TABLE prices3 AS SELECT cdate,openp,closep, closep - openp AS diff FROM prices2;
SELECT 215
ann=# 
ann=#

I did this lab by writing the script listed below:


"""
class03pd21.py

This script should mimic the SQL listed below:

SELECT max(closep)     FROM
  (SELECT cdate,closep FROM prices WHERE cdate > (SELECT MAX(cdate)-10 FROM prices)) iv;

DROP   TABLE IF EXISTS prices2;
CREATE TABLE prices2 AS SELECT cdate,openp,closep FROM prices WHERE cdate > '2016-08-08';

DROP   TABLE IF EXISTS prices3;
CREATE TABLE prices3 AS SELECT cdate,openp,closep, closep - openp AS diff FROM prices2;
"""

# SQL lacks the ability to get data from the web.
# Pandas can get data from the web:

import pandas as pd
from datetime import datetime, timedelta

prices_df         = pd.read_csv('http://ml4.us/csv/ibm.csv')
prices_df.columns = ['cdate_s', 'openp', 'highp', 'lowp', 'closep', 'adjp', 'volume']

# Mimic
# SELECT max(closep)     FROM
#   (SELECT cdate,closep FROM prices WHERE cdate > (SELECT MAX(cdate)-10 FROM prices)) iv;

# I should subtract 10days from max-cdate:
max_10    = pd.to_datetime(prices_df.cdate_s).max() - timedelta(days=10)
# I should create a predicate from max-cdate:
max_10_sr = (pd.to_datetime(prices_df.cdate_s) > max_10)
# I should apply the predicate, get closep from resulting df, get max from closep series:
max_cp    = prices_df[['cdate_s','closep']][max_10_sr].closep.max()
print(max_cp)

# Mimic
# CREATE TABLE prices2 AS SELECT cdate,openp,closep FROM prices WHERE cdate > '2016-08-08';

# I should create a predicate:
pred1_sr   = prices_df.cdate_s > '2016-08-08'
# I should create another DF:
prices2_df = prices_df.copy()[['cdate_s','openp','closep']][pred1_sr]

# Mimic
# CREATE TABLE prices3 AS SELECT cdate,openp,closep, closep - openp AS diff FROM prices2;

prices3_df         = prices2_df.copy()
prices3_df['diff'] = prices2_df.closep - prices2_df.openp

'bye'

I saw this:


dan@h79:~/ml4/public/class03demos $ python class03pd21.py 
155.179993
dan@h79:~/ml4/public/class03demos $ 
dan@h79:~/ml4/public/class03demos $ 

Class03 Lab


ml4.us About Blog Contact Class01 Class02 Class03 Class04 Class05 Class06 Class07 Class08 Class09 Class10 dan101 Forum Google Hangout Vboxen