TensorBoard Visualization Usage

This article introduces the basic usage of TensorBoard, including: (1) visualizing the computation graph, which makes it easier to analyze the network structure; (2) analyzing the run-time compute time and memory consumption of each node in the graph; (3) dynamically displaying how the training metrics change over the iterations; (4) displaying the distribution of the output layer after dimensionality reduction, to speed up error-case analysis.

Computation Graph Visualization

Namespaces

# A simple example of writing a log file

import tensorflow as tf

input1 = tf.constant([1.0, 2.0, 3.0], name="input1")
input2 = tf.Variable(tf.random_uniform([3]), name='input2')

output = tf.add_n([input1, input2], name="add")

# Create a writer for the log file and write the current computation graph to it.
writer = tf.summary.FileWriter('/home/seisinv/data/ai/test/log', tf.get_default_graph())
writer.close()
Computation graph without namespaces


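Once the log has been written, TensorBoard is started from the command line by pointing it at the log directory; by default the web UI is then served at http://localhost:6006. A minimal invocation, assuming the log path used in this example:

tensorboard --logdir=/home/seisinv/data/ai/test/log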
As the figure above shows, many system initialization operations are also displayed, which makes the layout cluttered. TensorFlow provides two functions, tf.variable_scope and tf.name_scope, to manage variable namespaces; by default, only the nodes in the top-level namespaces are then displayed. Note that the two functions behave differently when used together with tf.get_variable.

tf.reset_default_graph()

with tf.variable_scope("foo"):
    a = tf.get_variable("a", [1])
    print(a.name)
with tf.name_scope("bar"):
    a = tf.Variable([1])
    print(a.name)
    b = tf.get_variable("b", [1])  # not affected by name_scope
    print(b.name)
foo/a:0
bar/Variable:0
b:0
# Improved display using namespaces

tf.reset_default_graph()

with tf.name_scope("input1"):
    input1 = tf.constant([1.0, 2.0, 3.0], name="input1")
with tf.name_scope("input2"):
    input2 = tf.Variable(tf.random_uniform([3]), name="input2")
output = tf.add_n([input1, input2], name="add")

writer = tf.summary.FileWriter('/home/seisinv/data/ai/test/log', tf.get_default_graph())
writer.close()
Simplified computation graph after using namespaces


A Neural Network Example

import os, time
import matplotlib.pyplot as plt
# Parameters describing the network structure
INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500
# Parameters for training the network
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 3000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = 'model/'
MODEL_NAME = 'model_nn_mnist.ckpt'
def get_weight_variable(shape, regularizer):
    weights = tf.get_variable(
        "weights", shape,
        initializer=tf.truncated_normal_initializer(stddev=0.1))

    # When regularization is requested, add the regularization loss of this
    # variable to the custom 'losses' collection.
    if regularizer != None:
        tf.add_to_collection('losses', regularizer(weights))
    return weights

def inference(input_tensor, regularizer):
    with tf.variable_scope('layer1'):
        weights = get_weight_variable(
            [INPUT_NODE, LAYER1_NODE], regularizer)
        biases = tf.get_variable(
            "biases", [LAYER1_NODE],
            initializer=tf.constant_initializer(0.0))
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights) + biases)

    with tf.variable_scope('layer2'):
        weights = get_weight_variable(
            [LAYER1_NODE, OUTPUT_NODE], regularizer)
        biases = tf.get_variable(
            "biases", [OUTPUT_NODE],
            initializer=tf.constant_initializer(0.0))
        layer2 = tf.matmul(layer1, weights) + biases

    return layer2
from tensorflow.examples.tutorials.mnist import input_data

def train(mnist):
    # Put all computations that handle the input data under one namespace
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')

    # Build the inference process
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    y = inference(x, regularizer)
    global_step = tf.Variable(0, trainable=False)

    # Put all computations related to the moving averages under one namespace
    with tf.name_scope('moving_average'):
        variable_average = tf.train.ExponentialMovingAverage(
            MOVING_AVERAGE_DECAY, global_step)
        variable_average_op = variable_average.apply(
            tf.trainable_variables())

    # Put all computations related to the loss function under one namespace
    with tf.name_scope('loss_function'):
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=y, labels=tf.argmax(y_, 1))
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        # Add the regularization terms
        loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    # Put the learning rate, the optimizer and the per-step training
    # operations under one namespace
    with tf.name_scope('train_step'):
        learning_rate = tf.train.exponential_decay(
            LEARNING_RATE_BASE,
            global_step,
            mnist.train.num_examples / BATCH_SIZE,
            LEARNING_RATE_DECAY)

        train_step = tf.train.GradientDescentOptimizer(learning_rate)\
            .minimize(loss, global_step=global_step)

        # Control dependency: the listed operations must run before anything
        # defined inside the block; train_op itself is only a marker
        with tf.control_dependencies([train_step, variable_average_op]):
            train_op = tf.no_op(name='train')

    # Saver for persisting the model
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={x: xs, y_: ys})

            # Save the model every 1000 steps
            if i % 1000 == 0:
                print("After %s training step(s), loss on training "
                      "batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME),
                           global_step=global_step)

    # Write the current computation graph to the log file
    writer = tf.summary.FileWriter('/media/seisinv/Data/04_data/ai/test/log',
                                   tf.get_default_graph())
    writer.close()
tf.reset_default_graph()

mnist = input_data.read_data_sets('/home/seisinv/data/mnist/', one_hot=True)
train(mnist)
Extracting /home/seisinv/data/mnist/train-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/train-labels-idx1-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-labels-idx1-ubyte.gz
After 1 training step(s), loss on training batch is 3.25722.
After 1001 training step(s), loss on training batch is 0.255171.
After 2001 training step(s), loss on training batch is 0.15627.
After 3001 training step(s), loss on training batch is 0.184929.
After 4001 training step(s), loss on training batch is 0.12473.
After 5001 training step(s), loss on training batch is 0.105338.
After 6001 training step(s), loss on training batch is 0.100877.
After 7001 training step(s), loss on training batch is 0.0858754.
After 8001 training step(s), loss on training batch is 0.0805109.
After 9001 training step(s), loss on training batch is 0.0716059.
After 10001 training step(s), loss on training batch is 0.0677174.
After 11001 training step(s), loss on training batch is 0.0638117.
After 12001 training step(s), loss on training batch is 0.0587943.
After 13001 training step(s), loss on training batch is 0.0571106.
After 14001 training step(s), loss on training batch is 0.0534435.
After 15001 training step(s), loss on training batch is 0.0495253.
After 16001 training step(s), loss on training batch is 0.0465476.
After 17001 training step(s), loss on training batch is 0.0477468.
After 18001 training step(s), loss on training batch is 0.0421561.
After 19001 training step(s), loss on training batch is 0.0456449.
After 20001 training step(s), loss on training batch is 0.0444757.
After 21001 training step(s), loss on training batch is 0.0374509.
After 22001 training step(s), loss on training batch is 0.0379014.
After 23001 training step(s), loss on training batch is 0.0426304.
After 24001 training step(s), loss on training batch is 0.0433503.
After 25001 training step(s), loss on training batch is 0.0399665.
After 26001 training step(s), loss on training batch is 0.0384408.
After 27001 training step(s), loss on training batch is 0.035355.
After 28001 training step(s), loss on training batch is 0.0338901.
After 29001 training step(s), loss on training batch is 0.0374805.

The figure below is the computation graph that TensorBoard generates after running the code above. Solid arrows show the direction of data flow, and each edge is annotated with the dimensions of the tensor it carries; when more than one tensor is transferred between two nodes, only the number of tensors is shown. The thickness of an edge represents the total size of the tensors transferred between the two nodes, and dashed edges indicate dependencies between computations.

Computation graph generated during training


Node Information

Besides displaying the computation graph, TensorBoard can also show the run time and memory consumption of each node, which provides important information for optimizing the code.

from tensorflow.examples.tutorials.mnist import input_data

def train(mnist):
    # Put all computations that handle the input data under one namespace
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')

    # Build the inference process
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    y = inference(x, regularizer)
    global_step = tf.Variable(0, trainable=False)

    # Put all computations related to the moving averages under one namespace
    with tf.name_scope('moving_average'):
        variable_average = tf.train.ExponentialMovingAverage(
            MOVING_AVERAGE_DECAY, global_step)
        variable_average_op = variable_average.apply(
            tf.trainable_variables())

    # Put all computations related to the loss function under one namespace
    with tf.name_scope('loss_function'):
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=y, labels=tf.argmax(y_, 1))
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        # Add the regularization terms
        loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    # Put the learning rate, the optimizer and the per-step training
    # operations under one namespace
    with tf.name_scope('train_step'):
        learning_rate = tf.train.exponential_decay(
            LEARNING_RATE_BASE,
            global_step,
            mnist.train.num_examples / BATCH_SIZE,
            LEARNING_RATE_DECAY)

        train_step = tf.train.GradientDescentOptimizer(learning_rate)\
            .minimize(loss, global_step=global_step)

        # Control dependency: the listed operations must run before anything
        # defined inside the block; train_op itself is only a marker
        with tf.control_dependencies([train_step, variable_average_op]):
            train_op = tf.no_op(name='train')

    train_writer = tf.summary.FileWriter(
        "/media/seisinv/Data/04_data/ai/test/log2", tf.get_default_graph())

    # Saver for persisting the model
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)

            # Every 1000 steps, save the model and record the run statistics
            if i % 1000 == 0:
                # Configure which information should be traced at run time
                run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)

                # Proto that will hold the recorded run-time information
                run_metadata = tf.RunMetadata()

                # Pass the options and the metadata proto to the run call so the
                # time and memory cost of every node is recorded
                _, loss_value, step = sess.run([train_op, loss, global_step],
                                               feed_dict={x: xs, y_: ys},
                                               options=run_options,
                                               run_metadata=run_metadata)

                # Write the per-node run-time information to the log file
                train_writer.add_run_metadata(run_metadata, 'step%03d' % i)

                print("After %s training step(s), loss on training "
                      "batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME),
                           global_step=global_step)
            else:
                _, loss_value, step = sess.run([train_op, loss, global_step],
                                               feed_dict={x: xs, y_: ys})

    # Close the log writer
    train_writer.close()
tf.reset_default_graph()

mnist = input_data.read_data_sets('/home/seisinv/data/mnist/', one_hot=True)
train(mnist)
Extracting /home/seisinv/data/mnist/train-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/train-labels-idx1-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-labels-idx1-ubyte.gz
After 1 training step(s), loss on training batch is 2.96149.
After 1001 training step(s), loss on training batch is 0.237839.
After 2001 training step(s), loss on training batch is 0.149025.

The figures below visualize the time and memory consumption of the different compute nodes; the darker the color, the longer the compute time. When tuning performance, the data recorded at a later iteration is usually taken as the time/memory baseline for the compute nodes, because this reduces the influence of initialization on the measurements.

Computation graph colored by compute time

Time and memory consumption of a training step


Monitoring Metrics Visualization

Besides visualizing the computation graph and inspecting the compute time and memory consumption of each node, TensorBoard can also monitor metrics about the state of the running program. In addition to GRAPHS, TensorBoard provides six more tabs, SCALARS, IMAGES, AUDIO, DISTRIBUTIONS, HISTOGRAMS and TEXT, for visualizing these other monitoring metrics.

SUMMARY_DIR = '/media/seisinv/Data/04_data/ai/test/log2'
BATCH_SIZE = 100
TRAIN_STEPS = 3000

def variable_summaries(var, name):
    """
    Generate monitoring information for a variable and define the ops that
    write it to the log; the ops are executed inside sess.run.
    """
    # Put the summary-generating ops under the same namespace
    with tf.name_scope('summaries'):
        # Record the distribution of the values in the tensor; a chart appears
        # in both the HISTOGRAMS and the DISTRIBUTIONS tab
        tf.summary.histogram(name, var)

        # Compute the mean and define the op that logs it
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean/' + name, mean)

        # Compute the standard deviation and define the op that logs it
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev/' + name, stddev)

def nn_layer(input_tensor, input_dim, output_dim,
             layer_name, act=tf.nn.relu):
    """
    Create one fully connected layer.
    """
    with tf.name_scope(layer_name):
        with tf.name_scope("weights"):
            weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1))
            variable_summaries(weights, layer_name + '/weights')
        with tf.name_scope("biases"):
            biases = tf.Variable(tf.constant(0.0, shape=[output_dim]))
            variable_summaries(biases, layer_name + '/biases')
        with tf.name_scope('Wx_plus_b'):
            preactivate = tf.matmul(input_tensor, weights) + biases

            tf.summary.histogram(layer_name + '/pre_activations', preactivate)

        activations = act(preactivate, name='activation')

        tf.summary.histogram(layer_name + '/activations', activations)

        return activations

def main():
    mnist = input_data.read_data_sets('/home/seisinv/data/mnist/', one_hot=True)

    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, 784], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, 10], name='y-input')

    hidden1 = nn_layer(x, 784, 500, 'layer1')
    y = nn_layer(hidden1, 500, 10, 'layer2', act=tf.identity)

    with tf.name_scope('cross_entropy'):
        cross_entropy = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
        tf.summary.scalar('cross_entropy', cross_entropy)

    with tf.name_scope('train'):
        train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

    with tf.name_scope('accuracy'):
        with tf.name_scope('correct_prediction'):
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        with tf.name_scope('accuracy'):
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('accuracy', accuracy)

    # Merge all the summary-generating ops into one
    merged = tf.summary.merge_all()

    with tf.Session() as sess:
        summary_writer = tf.summary.FileWriter(SUMMARY_DIR, sess.graph)

        tf.global_variables_initializer().run()

        for i in range(TRAIN_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)

            summary, _ = sess.run([merged, train_step],
                                  feed_dict={x: xs, y_: ys})

            summary_writer.add_summary(summary, i)

    summary_writer.close()
tf.reset_default_graph()
main()
Extracting /home/seisinv/data/mnist/train-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/train-labels-idx1-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-labels-idx1-ubyte.gz

The figures below show the TensorBoard pages generated from the log files.

SCALARS tab

SCALARS tab

DISTRIBUTIONS tab

HISTOGRAMS tab


The following log-generating functions are available:

  1. tf.summary.scalar, SCALARS tab: shows how a scalar metric changes over the iterations
  2. tf.summary.image, IMAGES tab: visualizes the training/test images currently in use
  3. tf.summary.audio, AUDIO tab: the audio data in use
  4. tf.summary.text, TEXT tab: the text data in use
  5. tf.summary.histogram, HISTOGRAMS and DISTRIBUTIONS tabs: shows how the distribution of tensor values changes over the iterations

By monitoring how the values of the network variables, the loss on the training batches, and the learning rate evolve, it becomes much easier to keep track of how the training is going.
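As a small illustration of the IMAGES tab (not part of the original code; it reuses the x placeholder defined under the 'input' namespace above, and the scope name 'input_reshape' is only an example), the following sketch records a few of the input images per step with tf.summary.image. The reshape is needed because the MNIST pixels are fed in as flat 784-element vectors, and the resulting op is picked up by tf.summary.merge_all() like the other summaries.

with tf.name_scope('input_reshape'):
    # Reshape the flat 784-dimensional vectors into 28x28 single-channel images
    image_shaped_input = tf.reshape(x, [-1, 28, 28, 1])
    # Record at most 10 images per summary record; they appear in the IMAGES tab
    tf.summary.image('input', image_shaped_input, 10)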

High-Dimensional Vector Visualization

TensorBoard provides PROJECTOR, a tool for visualizing high-dimensional vectors. It requires the user to prepare a sprite image and a tsv file that gives the label of each image.

The following code shows how to generate the files PROJECTOR needs from the MNIST test data.

import matplotlib.pyplot as plt
import numpy as np
import os
%matplotlib inline

LOG_DIR = '/media/seisinv/Data/04_data/ai/test/log3'
SPRITE_FILE = 'mnist_sprite.jpg'
META_FILE = 'mnist_meta.tsv'

def create_sprite_image(images):
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]

    # The sprite image is one large, roughly square image tiled from all the
    # small images; its side length is therefore ceil(sqrt(n)) small images,
    # where n is the number of small images.
    m = int(np.ceil(np.sqrt(images.shape[0])))

    # Initialize the sprite canvas
    sprite_image = np.ones((img_h * m, img_w * m))

    for i in range(m):
        for j in range(m):
            cur = i * m + j

            if cur < images.shape[0]:
                sprite_image[i*img_h:(i+1)*img_h,
                             j*img_w:(j+1)*img_w] = images[cur]
    return sprite_image

mnist = input_data.read_data_sets('/home/seisinv/data/mnist/', one_hot=False)

# Generate the sprite image
to_visualise = 1 - np.reshape(mnist.test.images, (-1, 28, 28))
sprite_image = create_sprite_image(to_visualise)

# Save the image
path_for_sprites = os.path.join(LOG_DIR, SPRITE_FILE)
plt.imsave(path_for_sprites, sprite_image, cmap='gray')
plt.imshow(sprite_image, cmap='gray')

# Generate the label of every image and write it to the log directory
path_for_meta = os.path.join(LOG_DIR, META_FILE)
with open(path_for_meta, 'w') as f:
    f.write("Index\tLabel\n")
    for index, label in enumerate(mnist.test.labels):
        f.write("%d\t%d\n" % (index, label))
Extracting /home/seisinv/data/mnist/train-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/train-labels-idx1-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-labels-idx1-ubyte.gz

Once these auxiliary files have been generated, the following code shows how to use TensorFlow to generate the log files that PROJECTOR needs, in order to visualize the output-layer vectors of the MNIST test data.

from tensorflow.contrib.tensorboard.plugins import projector
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 10000
MOVING_AVERAGE_DECAY = 0.99

LOG_DIR = '/media/seisinv/Data/04_data/ai/test/log3'
SPRITE_FILE = '/media/seisinv/Data/04_data/ai/test/log3/mnist_sprite.jpg'
META_FILE = "/media/seisinv/Data/04_data/ai/test/log3/mnist_meta.tsv"
TENSOR_NAME = "FINAL_LOGITS"
INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500

def get_weight_variable(shape, regularizer):
    weights = tf.get_variable("weights", shape, initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer != None: tf.add_to_collection('losses', regularizer(weights))
    return weights


def inference(input_tensor, regularizer):
    with tf.variable_scope('layer1'):
        weights = get_weight_variable([INPUT_NODE, LAYER1_NODE], regularizer)
        biases = tf.get_variable("biases", [LAYER1_NODE], initializer=tf.constant_initializer(0.0))
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights) + biases)

    with tf.variable_scope('layer2'):
        weights = get_weight_variable([LAYER1_NODE, OUTPUT_NODE], regularizer)
        biases = tf.get_variable("biases", [OUTPUT_NODE], initializer=tf.constant_initializer(0.0))
        layer2 = tf.matmul(layer1, weights) + biases

    return layer2
def train(mnist):
    # Namespace for the input data.
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    y = inference(x, regularizer)
    global_step = tf.Variable(0, trainable=False)

    # Namespace for the moving averages.
    with tf.name_scope("moving_average"):
        variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
        variables_averages_op = variable_averages.apply(tf.trainable_variables())

    # Namespace for computing the loss function.
    with tf.name_scope("loss_function"):
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=y, labels=tf.argmax(y_, 1))
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    # Namespace for the learning rate, the optimizer and the per-step training ops.
    with tf.name_scope("train_step"):
        learning_rate = tf.train.exponential_decay(
            LEARNING_RATE_BASE,
            global_step,
            mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY,
            staircase=True)

        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

        with tf.control_dependencies([train_step, variables_averages_op]):
            train_op = tf.no_op(name='train')

    # Train the model.
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: xs, y_: ys})

            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (i, loss_value))
        # Compute the output-layer matrix for the test data
        final_result = sess.run(y, feed_dict={x: mnist.test.images})

    return final_result
def visualisation(final_result):
    """
    Generate the log files needed to visualize the final output-layer vectors.
    """
    # PROJECTOR visualizes TensorFlow variables, so a new variable is needed
    # to hold the values of the output-layer vectors
    y = tf.Variable(final_result, name=TENSOR_NAME)
    summary_writer = tf.summary.FileWriter(LOG_DIR)

    # The projector.ProjectorConfig class helps generate the log file
    config = projector.ProjectorConfig()
    # Add one embedding result to visualize
    embedding = config.embeddings.add()
    # Specify the TensorFlow variable that this embedding corresponds to
    embedding.tensor_name = y.name

    # Specify the metadata of the original data, here the true class of every
    # test image
    embedding.metadata_path = META_FILE

    # Specify the sprite image. This is optional; without it every point is
    # drawn as a small dot instead of an image
    embedding.sprite.image_path = SPRITE_FILE
    # Specify the size of a single image, which is needed to crop the correct
    # original image out of the sprite image
    embedding.sprite.single_image_dim.extend([28, 28])

    # Write everything PROJECTOR needs to the log file
    projector.visualize_embeddings(summary_writer, config)

    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    saver.save(sess, os.path.join(LOG_DIR, "model"), TRAINING_STEPS)

    summary_writer.close()
def main(argv=None):
    mnist = input_data.read_data_sets("/home/seisinv/data/mnist/", one_hot=True)
    final_result = train(mnist)
    visualisation(final_result)

if __name__ == '__main__':
    main()
Extracting /home/seisinv/data/mnist/train-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/train-labels-idx1-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-images-idx3-ubyte.gz
Extracting /home/seisinv/data/mnist/t10k-labels-idx1-ubyte.gz
After 0 training step(s), loss on training batch is 3.13751.
After 1000 training step(s), loss on training batch is 0.272537.
After 2000 training step(s), loss on training batch is 0.206967.
After 3000 training step(s), loss on training batch is 0.135616.
After 4000 training step(s), loss on training batch is 0.120108.
After 5000 training step(s), loss on training batch is 0.1051.
After 6000 training step(s), loss on training batch is 0.0955253.
After 7000 training step(s), loss on training batch is 0.0826855.
After 8000 training step(s), loss on training batch is 0.0760914.
After 9000 training step(s), loss on training batch is 0.0718116.

The figure below shows the PROJECTOR result for the output vectors after PCA dimensionality reduction. After 10000 training iterations, the different classes of images are separated fairly well. The panel on the right allows searching for a specific label, which makes it easy to find the hard-to-classify images within a class and speeds up error-case analysis.

Besides PCA, TensorFlow also supports the t-SNE dimensionality-reduction method.

PROJECTOR

Searching the PCA projection for label 5 quickly reveals the hard-to-classify images


Conclusion

This article introduced TensorBoard, an interactive tool for monitoring the running state of a TensorFlow program. By writing log files, it can be used to:

  1. Understand the structure of the computation graph
  2. Analyze the run time and memory consumption of each compute node, providing important information for optimizing the program
  3. Visualize the various metrics during model training, giving a direct view of how training is going and important information for improving the model
  4. Analyze the distribution of the output layer after dimensionality reduction, quickly find hard-to-classify images, and speed up error-case analysis

References

  • Zheng Zeyu, Liang Bowen, and Gu Siyu, TensorFlow: 实战Google深度学习框架 (2nd edition)