I've written a 3-layer neural network in Python, based on this tutorial, with Rock, Paper, Scissors sample data, using -1 for rock, 0 for paper, and 1 for scissors, and arrays similar to the ones in the tutorial. My function seems to get stuck in a relative minimum on every run, and I'm looking for a way to fix this. The program is below.
```python
# math module
import numpy as np

# sigmoid function squashes numbers into percentages (between 0 and 1)
def nonlin(x, deriv=False):
    if deriv:
        # sigmoid derivative is just output * (1 - output)
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# input data: using MOCK RPS DATA, -1: ROCK, 0: PAPER, 1: SCISSORS
input_data = np.array([[1, 1, 1],
                       [0, 0, 0],
                       [-1, -1, -1],
                       [-1, 1, -1]])

# target outputs for training
output_data = np.array([[1], [0], [-1], [1]])

# seed the RNG so runs are reproducible
np.random.seed(1)

# create random weights to be trained in the loop
firstLayer_weights = 2 * np.random.random((3, 4)) - 1   # 3x4 matrix
secondLayer_weights = 2 * np.random.random((4, 1)) - 1  # 4x1 matrix

for value in range(60000):  # loops through training
    # pass input through weights to output: three layers
    layer0 = input_data
    # layer1: dot product of the input and first weight matrix, mapped through the sigmoid
    layer1 = nonlin(np.dot(layer0, firstLayer_weights))
    # layer2: dot product of layer1 and second weight matrix, mapped through the sigmoid
    layer2 = nonlin(np.dot(layer1, secondLayer_weights))

    # check the computer-predicted result against the actual data
    layer2_error = output_data - layer2

    # every 10,000 iterations (six times out of 60,000),
    # print how far off the predicted values were from the data
    if value % 10000 == 0:
        print("Error:" + str(np.mean(np.abs(layer2_error))))  # average error

    # find out how much to re-adjust the weights, based on how far off
    # and how confident the estimate is
    layer2_change = layer2_error * nonlin(layer2, deriv=True)

    # find out how layer1 led to the error in layer2, to attack the root of the problem:
    # send layer2's error backwards across the weights (BACKPROPAGATION)
    layer1_error = layer2_change.dot(secondLayer_weights.T)

    # same as layer2_change: adjust based on accuracy and confidence
    layer1_change = layer1_error * nonlin(layer1, deriv=True)

    # modify the weights based on the error at each layer
    secondLayer_weights = secondLayer_weights + layer1.T.dot(layer2_change)
    firstLayer_weights = firstLayer_weights + layer0.T.dot(layer1_change)
```
As you can see, the data involved here is the following:
```python
input_data = np.array([[1, 1, 1],
                       [0, 0, 0],
                       [-1, -1, -1],
                       [-1, 1, -1]])

# target outputs for training
output_data = np.array([[1], [0], [-1], [1]])
```
And the weights are here:
```python
firstLayer_weights = 2 * np.random.random((3, 4)) - 1   # 3x4 matrix
secondLayer_weights = 2 * np.random.random((4, 1)) - 1  # 4x1 matrix
```
It seems that after the first generation, the weights are corrected with only minimal effect for the rest of the run, which leads me to believe they have reached a relative minimum.
What would be a quick and effective way to correct this?
One problem with your network is that the output (the value of the elements of `layer2`) can only vary between 0 and 1, because you are using a sigmoid nonlinearity. Since one of your four target values is -1 and the closest possible prediction is 0, there will always be at least 25% error. Here are some suggestions:
- Use a one-hot encoding for the output: i.e., have three output nodes, one each for ROCK, PAPER, and SCISSORS, and train the network to compute a probability distribution over these outputs (typically using softmax and a cross-entropy loss); see the first sketch after this list.
- Make the output layer of your network a linear layer (one that applies weights and biases but no nonlinearity). Either add another layer, or remove the nonlinearity from your current output layer; the second sketch below shows this variant.
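A minimal sketch of the first suggestion, reusing the question's training samples; the class-to-column mapping, helper names, and the 0.1 learning rate are my own choices here, not from the tutorial. The useful property is that with cross-entropy loss, the gradient at the softmax input reduces to `probs - targets`:

```python
import numpy as np

np.random.seed(1)

# same four training samples as in the question
input_data = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1], [-1, 1, -1]])

# one-hot targets, columns = [ROCK, PAPER, SCISSORS]
# (mapping the question's -1, 0, 1 outputs to classes 0, 1, 2)
targets = np.array([[0, 0, 1],   # 1  -> SCISSORS
                    [0, 1, 0],   # 0  -> PAPER
                    [1, 0, 0],   # -1 -> ROCK
                    [0, 0, 1]])  # 1  -> SCISSORS

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

W1 = 2 * np.random.random((3, 4)) - 1
W2 = 2 * np.random.random((4, 3)) - 1  # three output nodes now

for step in range(60000):
    hidden = sigmoid(input_data.dot(W1))
    probs = softmax(hidden.dot(W2))  # probability distribution over R/P/S

    # with cross-entropy loss, the gradient at the softmax input is probs - targets
    delta2 = probs - targets
    delta1 = delta2.dot(W2.T) * hidden * (1 - hidden)  # sigmoid derivative

    W2 -= 0.1 * hidden.T.dot(delta2)       # 0.1 learning rate is an assumption
    W1 -= 0.1 * input_data.T.dot(delta1)

    if step % 10000 == 0:
        print("Cross-entropy:", -np.mean(np.sum(targets * np.log(probs), axis=1)))
```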
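And a sketch of the second suggestion, applied directly to the question's own training loop: the hidden layer keeps its sigmoid, but `layer2` becomes linear, so its backpropagated delta is just the raw error (the derivative of the identity is 1). The small learning rate is an assumption, to keep the now-unbounded output from diverging:

```python
import numpy as np

np.random.seed(1)

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

input_data = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1], [-1, 1, -1]])
output_data = np.array([[1], [0], [-1], [1]])

firstLayer_weights = 2 * np.random.random((3, 4)) - 1
secondLayer_weights = 2 * np.random.random((4, 1)) - 1

lr = 0.01  # assumed learning rate

for value in range(60000):
    layer0 = input_data
    layer1 = nonlin(np.dot(layer0, firstLayer_weights))
    layer2 = np.dot(layer1, secondLayer_weights)  # linear output: can now reach -1

    layer2_error = output_data - layer2
    if value % 10000 == 0:
        print("Error:", np.mean(np.abs(layer2_error)))

    layer2_change = layer2_error  # identity derivative is 1, so delta == error
    layer1_error = layer2_change.dot(secondLayer_weights.T)
    layer1_change = layer1_error * nonlin(layer1, deriv=True)

    secondLayer_weights += lr * layer1.T.dot(layer2_change)
    firstLayer_weights += lr * layer0.T.dot(layer1_change)
```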
Other things you could try, but that are less likely to work reliably, since you are really dealing with categorical data rather than a continuous output:
- Scale your data so that all the outputs in the training data are between 0 and 1.
- Use a nonlinearity that produces values between -1 and 1 (e.g. tanh); see the sketch below.
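A sketch of that last variant: a drop-in replacement for the question's `nonlin` whose output spans -1 to 1, so the -1 target becomes reachable. Like the original code, the `deriv=True` branch assumes it is passed the activation itself (tanh' = 1 - tanh²):

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        # x is already tanh(...) here, mirroring the question's convention
        return 1 - x * x
    return np.tanh(x)
```

With this in place, the rest of the question's loop can stay as it is, though as noted above, treating the problem as classification is the more reliable fix.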