When I run the CIFAR-10 model as described at https://www.tensorflow.org/tutorials/deep_cnn, a single GPU reaches about 86% accuracy after roughly 4 hours of training. When I use 2 GPUs, the accuracy drops to 84%, although the 2-GPU run reaches that 84% faster than the single-GPU run reaches 86%.
My intuition is that the average_gradients function defined in https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py returns a less precise gradient, because an average of gradients will be less accurate than the actual gradient value.
If the gradients are less accurate, then the parameters of the function learned during training are less accurate as well. Looking at the code (https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py), why would averaging the gradients over multiple GPUs be less accurate than computing the gradient on a single GPU?
Is my intuition correct that averaging the gradients produces a less accurate value?
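For reference, a condensed sketch of what average_gradients in that file does (simplified from the tutorial code, TF 1.x style):

```python
import tensorflow as tf

def average_gradients(tower_grads):
    """Condensed sketch of the tutorial's average_gradients (TF 1.x).

    tower_grads is a list over GPU towers; each entry is the list of
    (gradient, variable) pairs that optimizer.compute_gradients returned
    for that tower's mini-batch.
    """
    average_grads = []
    # zip(*tower_grads) groups the entries for the same variable across towers.
    for grad_and_vars in zip(*tower_grads):
        # Stack the per-tower gradients along a new axis and average them.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        mean_grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # The variable itself is shared across towers, so take it from the first one.
        shared_var = grad_and_vars[0][1]
        average_grads.append((mean_grad, shared_var))
    return average_grads
```

Each element of tower_grads comes from running optimizer.compute_gradients on a different GPU tower (each tower sees its own mini-batch), and the averaged result is then applied once to the shared variables.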
The randomness in the model is described as:
The images are processed as follows: They are cropped to 24 x 24 pixels, centrally for evaluation or randomly for training. They are approximately whitened to make the model insensitive to dynamic range. For training, we additionally apply a series of random distortions to artificially increase the data set size: Randomly flip the image from left to right. Randomly distort the image brightness. Randomly distort the image contrast.
Source: https://www.tensorflow.org/tutorials/deep_cnn
Does this have an effect on the training accuracy?
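For context, the distortions described in that quote correspond roughly to the following TF 1.x image ops (my own sketch of what cifar10_input does for training inputs; the exact constants are assumptions):

```python
import tensorflow as tf

def distort_for_training(image):
    """Sketch of the random training-time distortions quoted above,
    roughly following cifar10_input (TF 1.x); exact constants are assumptions."""
    # Randomly crop a 24 x 24 patch from the 32 x 32 CIFAR-10 image.
    distorted = tf.random_crop(image, [24, 24, 3])
    # Randomly flip the image from left to right.
    distorted = tf.image.random_flip_left_right(distorted)
    # Randomly distort brightness and contrast.
    distorted = tf.image.random_brightness(distorted, max_delta=63)
    distorted = tf.image.random_contrast(distorted, lower=0.2, upper=1.8)
    # "Approximate whitening": per-image mean subtraction and scaling.
    return tf.image.per_image_standardization(distorted)
```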
Update:
To investigate this further, I compared the loss values when training with different numbers of GPUs:
Training with 1 GPU: loss value 0.7, accuracy 86%
Training with 2 GPUs: loss value 0.5, accuracy 84%
Shouldn't the lower loss value go with the higher accuracy, rather than the other way around?
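To make the question concrete, here is a toy, purely illustrative example (unrelated to the CIFAR-10 numbers above) showing that mean cross-entropy loss and accuracy need not move together:

```python
import numpy as np

# Probability each model assigns to the true class, for four binary examples.
# An example counts as "correct" when that probability is above 0.5.
probs_a = [0.9, 0.9, 0.9, 0.05]    # Model A: 3/4 correct, one confident mistake
probs_b = [0.6, 0.6, 0.45, 0.45]   # Model B: 2/4 correct, never far off

for name, probs in [("A", probs_a), ("B", probs_b)]:
    accuracy = np.mean([p > 0.5 for p in probs])
    # Per-example cross-entropy: -log of the probability given to the true class.
    mean_loss = np.mean([-np.log(p) for p in probs])
    print(f"Model {name}: accuracy {accuracy:.0%}, mean loss {mean_loss:.2f}")

# Model A: accuracy 75%, mean loss 0.83
# Model B: accuracy 50%, mean loss 0.65  -> lower loss, but also lower accuracy
```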