Since many machine learning algorithms rely on matrix multiplication (or can at least be implemented using matrix multiplication), I plan to test my GPU by creating two matrices a and b, multiplying them, and recording how long the computation takes to complete.
The code below generates two matrices of dimensions 300000 x 20000 and multiplies them:
import tensorflow as tf
import numpy as np

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

#a = np.array([[1, 2, 3], [4, 5, 6]])
#b = np.array([1, 2, 3])
a = np.random.rand(300000, 20000)
b = np.random.rand(300000, 20000)
print("Init complete")

# element-wise product, not a matrix product (tf.mul was renamed tf.multiply in TF 1.0)
result = tf.multiply(a, b)
v = sess.run(result)
print(v)
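As written, this never actually records the elapsed time; a minimal sketch of the timing step, reusing the sess and result defined above, might look like this:

import time

start = time.time()
v = sess.run(result)  # execute the element-wise multiply on the device
elapsed = time.time() - start
print("multiply took %.3f sec" % elapsed)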
Is this an adequate test for comparing GPU performance? What other factors should I take into account?
Here is an example of a matmul benchmark that avoids the common pitfalls and matches the official 11 TFLOPS figure on a Titan X Pascal.
import os
import sys
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import tensorflow as tf
import time

n = 8192
dtype = tf.float32
with tf.device("/gpu:0"):
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)

# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(
    optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())

iters = 10

# pre-warming
sess.run(product.op)

start = time.time()
for i in range(iters):
    sess.run(product.op)
end = time.time()

ops = n**3 + (n-1)*n**2  # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n, elapsed/iters, rate,))
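To relate the printed rate to the 11 TFLOPS figure quoted above, you can express the measured throughput as a fraction of the card's advertised peak. A small sketch (the peak value is an assumption taken from this answer, not queried from the device):

PEAK_TFLOPS = 11.0  # assumed FP32 peak for a Titan X Pascal

def fraction_of_peak(rate_gops):
    # convert G ops/sec to TFLOPS and divide by the advertised peak
    return (rate_gops / 1000.0) / PEAK_TFLOPS

# reuse `rate` from the benchmark above
print('%.1f%% of peak' % (100.0 * fraction_of_peak(rate)))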