I have trained a simple long short-term memory (LSTM) model following this Lasagne recipe: https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py
Here is the architecture:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))

# We now build the LSTM layer which takes l_in as the input layer.
# We clip the gradients at GRAD_CLIP to prevent the problem of exploding gradients.
l_forward_1 = lasagne.layers.LSTMLayer(
    l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
    nonlinearity=lasagne.nonlinearities.tanh)
l_forward_2 = lasagne.layers.LSTMLayer(
    l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
    nonlinearity=lasagne.nonlinearities.tanh)

# The l_forward layer creates an output of dimension (batch_size, SEQ_LENGTH, N_HIDDEN).
# Since we are only interested in the final prediction, we isolate that quantity and feed it
# to the next layer. The output of the sliced layer will then be of size (batch_size, N_HIDDEN).
l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)

# The sliced output is then passed through the softmax nonlinearity to create a probability
# distribution over the prediction. The output of this stage is (batch_size, vocab_size).
l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size,
                                  W=lasagne.init.Normal(),
                                  nonlinearity=lasagne.nonlinearities.softmax)

# Theano tensor for the targets
target_values = T.ivector('target_output')

# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(l_out)

# The loss function is calculated as the mean of the (categorical) cross-entropy
# between the prediction and target.
cost = T.nnet.categorical_crossentropy(network_output, target_values).mean()

# Retrieve all parameters from the network
all_params = lasagne.layers.get_all_params(l_out)

# Compute AdaGrad updates for training
print("Computing updates ...")
updates = lasagne.updates.adagrad(cost, all_params, LEARNING_RATE)

# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([l_in.input_var, target_values], cost,
                        updates=updates, allow_input_downcast=True)
compute_cost = theano.function([l_in.input_var, target_values], cost,
                               allow_input_downcast=True)

# In order to generate text from the network, we need the probability distribution of the
# next character given the state of the network and the input (a seed).
# To produce the probability distribution of the prediction, we compile a function called probs.
probs = theano.function([l_in.input_var], network_output, allow_input_downcast=True)
and I train the model via:
for it in xrange(data_size * num_epochs / BATCH_SIZE):
    try_it_out()  # Generate text using the p^th character as the start.

    avg_cost = 0
    for _ in range(PRINT_FREQ):
        x, y = gen_data(p)

        # print(p)
        p += SEQ_LENGTH + BATCH_SIZE - 1
        if p + BATCH_SIZE + SEQ_LENGTH >= data_size:
            print('Carriage Return')
            p = 0

        avg_cost += train(x, y)
    print("Epoch {} average loss = {}".format(it * 1.0 * PRINT_FREQ / data_size * BATCH_SIZE,
                                              avg_cost / PRINT_FREQ))
How do I save the model so that I don't need to train it again? With scikit-learn I would normally just pickle the model object. However, the analogous process for Theano/Lasagne is not clear to me.
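For context, the scikit-learn workflow I mean is simply the following (clf stands for any fitted estimator):

import pickle

# Save the fitted estimator to disk...
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)

# ...and restore it later without retraining.
with open('model.pkl', 'rb') as f:
    clf = pickle.load(f)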
You can save the weights with numpy. Note that lasagne.layers.get_all_param_values expects a layer, so pass the output layer l_out rather than the network_output expression:
np.savez('model.npz', *lasagne.layers.get_all_param_values(l_out))
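As a quick sanity check (not part of the linked example), you can list what np.savez wrote; positional arguments are stored under the names arr_0, arr_1, and so on:

import numpy as np

with np.load('model.npz') as f:
    # Each entry is one parameter array (weights/biases) in network order.
    for i in range(len(f.files)):
        print('arr_%d' % i, f['arr_%d' % i].shape)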
To load them again later, do:
with np.load('model.npz') as f:
    param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(l_out, param_values)
Source: https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py
As for the model definition itself: one option is certainly to keep the code that builds the network and regenerate the network before setting the pretrained weights, as sketched below.
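A minimal sketch of that approach, assuming a hypothetical build_network helper that wraps the exact layer definitions from your question (with vocab_size, N_HIDDEN and GRAD_CLIP defined as there):

import numpy as np
import theano
import lasagne

def build_network(vocab_size, N_HIDDEN, GRAD_CLIP):
    # Same architecture as in the training script.
    l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
    l_forward_1 = lasagne.layers.LSTMLayer(
        l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
        nonlinearity=lasagne.nonlinearities.tanh)
    l_forward_2 = lasagne.layers.LSTMLayer(
        l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
        nonlinearity=lasagne.nonlinearities.tanh)
    l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)
    l_out = lasagne.layers.DenseLayer(
        l_forward_slice, num_units=vocab_size,
        W=lasagne.init.Normal(),
        nonlinearity=lasagne.nonlinearities.softmax)
    return l_in, l_out

# Rebuild the graph with freshly initialised weights, then overwrite them
# with the values stored in model.npz.
l_in, l_out = build_network(vocab_size, N_HIDDEN, GRAD_CLIP)
with np.load('model.npz') as f:
    param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(l_out, param_values)

# Recompile only what is needed for generation.
network_output = lasagne.layers.get_output(l_out)
probs = theano.function([l_in.input_var], network_output,
                        allow_input_downcast=True)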