I created a sliding-window algorithm using numpy that slides over a wav audio file and feeds slices of it into my NN in tensorflow, which detects features in the audio slices. Once tensorflow has produced its output, it hands it back to numpy land, where I reassemble the slices into an array of predictions that matches each sample position of the original file:
    import tensorflow as tf
    import numpy as np
    import nn

    def slide_predict(layers, X, modelPath):
        output = None

        graph = tf.Graph()
        with graph.as_default():
            input_layer_size, hidden_layer_size, num_labels = layers

            X_placeholder = tf.placeholder(tf.float32, shape=(None, input_layer_size), name='X')
            Theta1 = tf.Variable(nn.randInitializeWeights(input_layer_size, hidden_layer_size), name='Theta1')
            bias1 = tf.Variable(nn.randInitializeWeights(hidden_layer_size, 1), name='bias1')
            Theta2 = tf.Variable(nn.randInitializeWeights(hidden_layer_size, num_labels), name='Theta2')
            bias2 = tf.Variable(nn.randInitializeWeights(num_labels, 1), name='bias2')
            hypothesis = nn.forward_prop(X_placeholder, Theta1, bias1, Theta2, bias2)

            sess = tf.Session(graph=graph)
            saver = tf.train.Saver()
            init = tf.global_variables_initializer()
            sess.run(init)

            saver.restore(sess, modelPath)

            window_size = layers[0]

            pad_amount = (window_size * 2) - (X.shape[0] % window_size)
            X = np.pad(X, (pad_amount, 0), 'constant')

            for w in range(window_size):
                start = w
                end = -window_size + w
                X_shifted = X[start:end]
                X_matrix = X_shifted.reshape((-1, window_size))

                prediction = sess.run(hypothesis, feed_dict={X_placeholder: X_matrix})

                output = prediction if (output is None) else np.hstack((output, prediction))

            sess.close()

            output.shape = (X.size, -1)

            return output
Unfortunately, this algorithm is quite slow. I placed some logging along the way, and by far the slowest portion is the part where I actually run the tensorflow graph. This could be because the tensorflow computation itself is slow (if so, I'm probably just SOL), but I'm wondering whether a large part of the slowness isn't instead caused by repeatedly transferring large chunks of the audio file back and forth in and out of tensorflow. So my questions are:
1) Is feeding a placeholder repeatedly like this going to be noticeably slower than feeding it once and computing the values for X inside tensorflow?

2) If so, what is the best way to implement a sliding window algorithm inside tensorflow to do this computation?
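To see how the time splits between the numpy slicing and the graph execution, a rough timing sketch (reusing the names from slide_predict above) could look like this:

    import time

    slice_time = 0.0
    run_time = 0.0
    for w in range(window_size):
        t0 = time.perf_counter()
        X_matrix = X[w:w - window_size].reshape((-1, window_size))   # numpy slicing/reshaping
        t1 = time.perf_counter()
        prediction = sess.run(hypothesis, feed_dict={X_placeholder: X_matrix})   # graph execution + feed/fetch
        t2 = time.perf_counter()
        slice_time += t1 - t0
        run_time += t2 - t1
    print('numpy slicing: %.3fs, sess.run(): %.3fs' % (slice_time, run_time))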
The first problem is that your algorithm has quadratic time complexity in window_size, because calling np.hstack() in each iteration to build the output array copies both the current value of output and prediction into a new array:
    for w in range(window_size):
        # ...
        output = prediction if (output is None) else np.hstack((output, prediction))
Rather than calling np.hstack() in every iteration, it would be more efficient to build a Python list of the prediction arrays and call np.hstack() on them once, after the loop terminates:
    output_list = []

    for w in range(window_size):
        # ...
        prediction = sess.run(...)
        output_list.append(prediction)

    output = np.hstack(output_list)
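For a rough sense of the difference, a toy timing comparison of the two accumulation patterns, with random arrays standing in for the predictions, might look like this:

    import time
    import numpy as np

    chunks = [np.random.rand(1000, 4) for _ in range(500)]

    start = time.perf_counter()
    out = None
    for c in chunks:
        # copies everything accumulated so far on every iteration
        out = c if out is None else np.hstack((out, c))
    print('repeated np.hstack(): %.3fs' % (time.perf_counter() - start))

    start = time.perf_counter()
    out = np.hstack(chunks)  # a single copy at the end
    print('single np.hstack():   %.3fs' % (time.perf_counter() - start))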
The second problem is that feeding large values into TensorFlow can be inefficient if the amount of computation in each sess.run() call is small, because those values are (currently) copied into C++ (and the results are copied out). One useful strategy is to move the sliding-window loop into the TensorFlow graph, using the tf.map_fn() construct. For example, you could restructure your program as follows:
    # NOTE: If you call this function often, you may want to (i) move the `np.pad()`
    # into the graph as `tf.pad()`, and (ii) replace `X_t` with a placeholder.
    X = np.pad(X, (pad_amount, 0), 'constant')
    X_t = tf.convert_to_tensor(X)

    def window_func(w):
        start = w
        end = w - window_size
        X_matrix = tf.reshape(X_t[start:end], (-1, window_size))
        return nn.forward_prop(X_matrix, Theta1, bias1, Theta2, bias2)

    # `dtype` must be given explicitly here, because window_func returns float32
    # predictions while the elems (tf.range(window_size)) are int32.
    output_t = tf.map_fn(window_func, tf.range(window_size), dtype=tf.float32)

    # ...
    output = sess.run(output_t)
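One caveat: tf.map_fn() stacks its per-window results along a new leading axis, so output here has shape (window_size, num_windows, num_labels) rather than the (num_windows, window_size * num_labels) layout that the np.hstack() loop produced. A short reordering sketch to recover the previous layout:

    output = sess.run(output_t)                   # shape: (window_size, num_windows, num_labels)
    output = np.transpose(output, (1, 0, 2))      # shape: (num_windows, window_size, num_labels)
    output = output.reshape(output.shape[0], -1)  # shape: (num_windows, window_size * num_labels)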