Class05 Answer:

Build a visualization of the behavior of optimizer.minimize(loss)

I started this lab by finding an example of optimizer.minimize(loss).

I found one at the "GET STARTED" URL listed below:

https://www.tensorflow.org/get_started/get_started#complete_program

I copied the above example into a file:


"""
tensorflow_test.py


I found a simple demo by loading this URL:

https://www.tensorflow.org

On the above page I found a link: "GET STARTED"

It sent me to the URL listed below:

https://www.tensorflow.org/get_started

There I found a simple page of example syntax,
which is what this file contains.

Demo:
~/anaconda3/bin/python tensorflow_test.py
"""

import tensorflow as tf
import numpy as np

# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# Minimize the mean squared errors.
loss      = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
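# 0.5 is the learning rate: the size of each update step.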
train     = optimizer.minimize(loss)

# Before starting, initialize the variables.  We will 'run' this first.
#init = tf.initialize_all_variables()
init  = tf.global_variables_initializer() # replaces the deprecated line above.

# Launch the graph.
sess = tf.Session()
sess.run(init)

# Fit the line.
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# Learns best fit is W: [0.1], b: [0.3]
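
Before enhancing it, I wanted to be clear about what each
sess.run(train) call does. Here is one gradient-descent step for
this loss written out by hand in NumPy (my own sketch for
intuition, not part of the TensorFlow example):

import numpy as np

def gradient_descent_step(W, b, x, y, lr=0.5):
    # loss  = mean((W*x + b - y)^2)
    # dL/dW = mean(2 * (W*x + b - y) * x)
    # dL/db = mean(2 * (W*x + b - y))
    err = W * x + b - y
    dW  = np.mean(2.0 * err * x)
    db  = np.mean(2.0 * err)
    return W - lr * dW, b - lr * db

x = np.random.rand(100).astype(np.float32)
y = x * 0.1 + 0.3
# One step from W=0, b=0 already moves both parameters off zero:
print(gradient_descent_step(0.0, 0.0, x, y))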

I enhanced the above file:


"""
demo13.py

This script should help me visualize the behavior of optimizer.minimize(loss)
Ref:
http://ml4.us/cclasses/class05tf18

Demo:
~/anaconda3/bin/python demo13.py
"""

import tensorflow as tf
import numpy      as np
import pandas     as pd

# Create phony x, y data points in NumPy, y = x * 0.1 + 0.3 + noise
pts_i   = 20
noise_a = 0.05*np.random.rand(pts_i).astype(np.float32) # Notice this: uniform noise in [0, 0.05).
x_data  = np.random.rand(pts_i).astype(np.float32)
y_data  = x_data * 0.1 + 0.3 + noise_a

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.zeros([1]))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# Minimize the mean squared errors.
loss      = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train     = optimizer.minimize(loss)
# I should visualize the behavior of the above call.
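# Note: in TF1, optimizer.minimize(loss) combines two calls:
#   grads_and_vars = optimizer.compute_gradients(loss)
#   train          = optimizer.apply_gradients(grads_and_vars)
# so each sess.run(train) applies one gradient-descent update to W,b.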

# Before starting, initialize the variables.  We will 'run' this first.
#init = tf.initialize_all_variables()
init  = tf.global_variables_initializer() # replaces the deprecated line above.

# Launch the graph.
sess = tf.Session()
sess.run(init)

# I should create lists to collect artifacts of the optimizer:
w_l = []
b_l = []
l_l = []

# Fit the line.
for step in range(9):
    tf_W = sess.run(W)
    tf_b = sess.run(b)
    tf_y = sess.run(y)
    tf_loss = sess.run(loss)
    w_l.append(tf_W[0])
    b_l.append(tf_b[0])
    l_l.append(tf_loss)
    # Now I should change W,b:
    sess.run(train)
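# Note: tf_W, tf_b, tf_loss hold the values recorded just before
# the final sess.run(train), so they lag the variables by one step.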
print('I calculate W to be: '+str(tf_W))
print('I calculate b to be: '+str(tf_b))
print('yhat is '+str(tf_W[0])+' * x_data +'+str(tf_b[0]))

# I should compute the step-to-step changes in those artifacts:
dw_l = []
db_l = []
dl_l = []
for i_i in range(len(w_l)-1):
  dw_l.append(w_l[i_i+1]-w_l[i_i])
  db_l.append(b_l[i_i+1]-b_l[i_i])
  dl_l.append(l_l[i_i+1]-l_l[i_i])
# I should pad dw_l, db_l, dl_l to the same length as w_l:
dw_l.append(0.0)
db_l.append(0.0)
dl_l.append(0.0)
# I should estimate dL/dW and dL/db as finite differences;
# the 0.00001 guards against division by zero:
gw_a = np.array(dl_l)/(0.00001+np.array(dw_l))
gb_a = np.array(dl_l)/(0.00001+np.array(db_l))
gw_l = gw_a.tolist()
gb_l = gb_a.tolist()
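# Aside: gw_a and gb_a are finite-difference estimates.  TF1 can
# also report the exact gradients, e.g.:
#   exact_gw, exact_gb = sess.run(tf.gradients(loss, [W, b]))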

opt_d           = {'W':w_l}
opt_df          = pd.DataFrame(opt_d)
opt_df['b']     = b_l
opt_df['loss']  = l_l
opt_df['dw']    = dw_l
opt_df['db']    = db_l
opt_df['dL']    = dl_l
opt_df['dL/dw'] = gw_l
opt_df['dL/db'] = gb_l

# I should plot artifacts of the optimizer:
import matplotlib
matplotlib.use('Agg') # non-interactive backend; writes PNGs, needs no display.
# Order is important here: the backend must be set before
# pyplot is imported.  Do not move the next import:
import matplotlib.pyplot as plt
plt.figure(figsize=(9,6))

opt_df[['W','b']].plot.line(title="W,b vs calls to optimizer")
plt.grid(True)
plt.savefig('w.png')
plt.close()

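# Skip row 0; the untrained loss is much larger than the rest
# and would flatten the curve: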
opt_df[1:].loss.plot.line(title="loss vs calls to optimizer")
plt.savefig('loss.png')
plt.close()

opt_df[['dw','db']].plot.line(title="dW,db vs calls to optimizer")
plt.grid(True)
plt.savefig('dwdb.png')
plt.close()

opt_df[['dL/dw','dL/db']].plot.line(title="dL/dW,dL/db vs calls to optimizer")
plt.grid(True)
plt.savefig('dldwdb.png')
plt.close()

# I should plot y_data vs x_data
plt.scatter(x_data,y_data,c='b')
# I should plot yhat too:
yhat_a = tf_W * x_data + tf_b
plt.plot(x_data,yhat_a,c='g')
plt.title('y_data vs x_data (blue) and yhat vs x_data (green)')
plt.grid(True)
plt.savefig('yvsx.png')
plt.close()

'bye'

I ran the above script:


ml4@ub100:~/ml4/public/class05tf $ ~/anaconda3/bin/python demo13.py
I calculate W to be: [ 0.13923767]
I calculate b to be: [ 0.29939163]
yhat is 0.139238 * x_data +0.299392
/home/ml4/anaconda3/lib/python3.5/site-packages/matplotlib/font_manager.py:273:
UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
ml4@ub100:~/ml4/public/class05tf $ 

The above script built some visualizations of the behavior of optimizer.minimize(loss):

Q: What is 'loss'?

A: I see 'loss' as loss of information.

The above green line contains information about the blue dots in the scatter plot.

When I move the green line away from the dots, the line loses its predictive power.

Or to describe it differently, the line loses information.

When I move the line back into the dots, so that it fits them as well as possible, the loss of information is at a minimum.

To say it more concisely, "A fitted line gives minimum loss".
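
A quick numeric check of that idea, as a sketch reusing the
y = x * 0.1 + 0.3 setup from above (without the noise):

import numpy as np

x = np.random.rand(100).astype(np.float32)
y = x * 0.1 + 0.3

def mse(W, b):
    return float(np.mean((W * x + b - y) ** 2))

print(mse(0.1, 0.3))   # the fitted line: loss is 0.0
print(mse(0.1, 0.8))   # line moved away from the dots: loss is 0.25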


