Class04 Answer:

Use a window function to create a pctlead column.

This lab requires some knowledge of SQL window functions such as LEAD().


{
/* ~/sparkapps/logr10/logr12f.scala
This script should download prices and predict daily direction of GSPC.
It should generate a label which I assume to be dependent on price calculations.
A label should classify an observation as down or up. Down is 0.0, up is 1.0.
It should generate independent features from slopes of moving averages of prices.
It should create a Logistic Regression model from many years of features.
Demo:
spark-shell -i logr12f.scala
*/

import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Row
import sys.process._

// I should get prices:
"/usr/bin/curl -L ml4.herokuapp.com/csv/GSPC.csv -o /tmp/gspc.csv"!

val sqlContext = new SQLContext(sc)
  
val dp10df = sqlContext
  .read
  .format("com.databricks.spark.csv")
  .option("header","true")
  .option("inferSchema","true")
  .load("/tmp/gspc.csv")

dp10df.createOrReplaceTempView("tab")

spark.sql("SELECT COUNT(Date), MIN(Date), MAX(Date), MIN(Close), MAX(Close) FROM tab").show

// I should compute a label I can use to classify observations.
// LEAD(Close,1) pairs each row with the next day's Close as leadp.

var sqls = "SELECT Date, Close, LEAD(Close,1) OVER (ORDER BY Date) leadp FROM tab ORDER BY Date"

val dp11df = spark.sql(sqls)
dp11df.createOrReplaceTempView("tab")

// pctlead is the percent change from Close to the next day's Close.
sqls = "SELECT Date, Close, 100*(leadp-Close)/Close pctlead FROM tab ORDER BY Date"

val dp12df = spark.sql(sqls)

dp12df.show

// UNDER CONSTRUCTION
}

I saw something like this:


dan@h80:~/ml4/public/class04/logr10 $ spark-shell -i logr12f.scala
Spark context Web UI available at http://192.168.1.80:4042
Spark context available as 'sc' (master = local[*], app id = local-1515734919551).
Spark session available as 'spark'.
Loading logr12f.scala...
warning: there was one deprecation warning; re-run with -deprecation for details
warning: there was one feature warning; re-run with -feature for details
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 1252k  100 1252k    0     0  1325k      0 --:--:-- --:--:-- --:--:-- 7958k
+-----------+-------------------+-------------------+----------+----------+
|count(Date)|          min(Date)|          max(Date)|min(Close)|max(Close)|
+-----------+-------------------+-------------------+----------+----------+
|      17116|1950-01-03 00:00:00|2018-01-09 00:00:00|     16.66|2753.52002|
+-----------+-------------------+-------------------+----------+----------+

+-------------------+---------+--------------------+
|               Date|    Close|             pctlead|
+-------------------+---------+--------------------+
|1950-01-03 00:00:00|    16.66|  1.1404561824729968|
|1950-01-04 00:00:00|    16.85|  0.4747774480712065|
|1950-01-05 00:00:00|    16.93| 0.29533372711164035|
|1950-01-06 00:00:00|    16.98|   0.588928150765594|
|1950-01-09 00:00:00|    17.08| -0.2927341920374689|
|1950-01-10 00:00:00|17.030001|  0.3523135436104863|
|1950-01-11 00:00:00|    17.09| -1.9309537741369123|
|1950-01-12 00:00:00|    16.76| -0.5369928400954644|
|1950-01-13 00:00:00|    16.67|  0.2999340131973586|
|1950-01-16 00:00:00|16.719999|  0.8373325859648619|
|1950-01-17 00:00:00|16.860001|-0.05931790869999...|
|1950-01-18 00:00:00|    16.85| 0.11870029673588751|
|1950-01-19 00:00:00|16.870001|  0.1778245300637511|
|1950-01-20 00:00:00|     16.9|  0.1183431952662907|
|1950-01-23 00:00:00|    16.92| -0.3546040189125369|
|1950-01-24 00:00:00|16.860001| -0.7117496612248245|
|1950-01-25 00:00:00|    16.74|-0.05973715651133818|
|1950-01-26 00:00:00|    16.73|  0.5379557680812902|
|1950-01-27 00:00:00|    16.82|  1.1890606420927425|
|1950-01-30 00:00:00|    17.02|  0.1762573443008232|
+-------------------+---------+--------------------+
only showing top 20 rows


Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
scala> 
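
The pctlead arithmetic can be checked without Spark. The sketch below is plain Scala that mimics what LEAD(Close,1) and the pctlead expression do to a short list of closing prices. The object name PctLeadSketch and both function names are hypothetical, and the label rule (up when pctlead is greater than 0) is my assumption; the script above is still under construction and does not say where the threshold sits.

```scala
object PctLeadSketch {
  // Mimics LEAD(Close,1) OVER (ORDER BY Date) followed by
  // 100*(leadp-Close)/Close. Each close is paired with the next close.
  // The final row has no next close, so its result is None
  // (Spark SQL would return NULL there).
  def pctLead(closes: Vector[Double]): Vector[Option[Double]] = {
    val leads: Vector[Option[Double]] = closes.drop(1).map(Option(_)) :+ None
    closes.zip(leads).map { case (close, lead) =>
      lead.map(leadp => 100 * (leadp - close) / close)
    }
  }

  // ASSUMPTION: up (1.0) when pctlead > 0, otherwise down (0.0).
  // Rows without a pctlead get no label.
  def label(pctleads: Vector[Option[Double]]): Vector[Option[Double]] =
    pctleads.map(_.map(p => if (p > 0) 1.0 else 0.0))
}
```

Calling PctLeadSketch.pctLead(Vector(16.66, 16.85, 16.93)) with the first three Close values from the table above should give a first element of about 1.14, which agrees with the first pctlead row that Spark printed. The last element is None because the last date has no following close.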


