The TensorFlow r0.12 documentation for tf.nn.rnn_cell.LSTMCell describes its __call__ method as:
tf.nn.rnn_cell.LSTMCell.__call__(inputs, state, scope=None)
where state is described as follows:
state: if state_is_tuple is False, this must be a state Tensor, 2-D, batch x state_size. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.
What are c_state and m_state, and how do they fit into the LSTM? I cannot find a reference to them anywhere else in the documentation. Here is a link to that page in the documentation.
I agree that the documentation is unclear. Looking at the implementation of tf.nn.rnn_cell.LSTMCell.__call__ clarifies things (I took the code from TensorFlow 1.0.0):
def __call__(self, inputs, state, scope=None):
  """Run one step of LSTM.

  Args:
    inputs: input Tensor, 2D, batch x num_units.
    state: if `state_is_tuple` is False, this must be a state Tensor,
      `2-D, batch x state_size`.  If `state_is_tuple` is True, this must be a
      tuple of state Tensors, both `2-D`, with column sizes `c_state` and
      `m_state`.
    scope: VariableScope for the created subgraph; defaults to "lstm_cell".

  Returns:
    A tuple containing:

    - A `2-D, [batch x output_dim]`, Tensor representing the output of the
      LSTM after reading `inputs` when previous state was `state`.
      Here output_dim is:
         num_proj if num_proj was set,
         num_units otherwise.
    - Tensor(s) representing the new state of LSTM after reading `inputs` when
      the previous state was `state`.  Same type and shape(s) as `state`.

  Raises:
    ValueError: If input size cannot be inferred from inputs via
      static shape inference.
  """
  num_proj = self._num_units if self._num_proj is None else self._num_proj

  if self._state_is_tuple:
    (c_prev, m_prev) = state
  else:
    c_prev = array_ops.slice(state, [0, 0], [-1, self._num_units])
    m_prev = array_ops.slice(state, [0, self._num_units], [-1, num_proj])

  dtype = inputs.dtype
  input_size = inputs.get_shape().with_rank(2)[1]
  if input_size.value is None:
    raise ValueError("Could not infer input size from inputs.get_shape()[-1]")
  with vs.variable_scope(scope or "lstm_cell",
                         initializer=self._initializer) as unit_scope:
    if self._num_unit_shards is not None:
      unit_scope.set_partitioner(
          partitioned_variables.fixed_size_partitioner(
              self._num_unit_shards))
    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=True,
                          scope=scope)
    i, j, f, o = array_ops.split(
        value=lstm_matrix, num_or_size_splits=4, axis=1)

    # Diagonal connections
    if self._use_peepholes:
      with vs.variable_scope(unit_scope) as projection_scope:
        if self._num_unit_shards is not None:
          projection_scope.set_partitioner(None)
        w_f_diag = vs.get_variable(
            "w_f_diag", shape=[self._num_units], dtype=dtype)
        w_i_diag = vs.get_variable(
            "w_i_diag", shape=[self._num_units], dtype=dtype)
        w_o_diag = vs.get_variable(
            "w_o_diag", shape=[self._num_units], dtype=dtype)

    if self._use_peepholes:
      c = (sigmoid(f + self._forget_bias + w_f_diag * c_prev) * c_prev +
           sigmoid(i + w_i_diag * c_prev) * self._activation(j))
    else:
      c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) *
           self._activation(j))

    if self._cell_clip is not None:
      # pylint: disable=invalid-unary-operand-type
      c = clip_ops.clip_by_value(c, -self._cell_clip, self._cell_clip)
      # pylint: enable=invalid-unary-operand-type
    if self._use_peepholes:
      m = sigmoid(o + w_o_diag * c) * self._activation(c)
    else:
      m = sigmoid(o) * self._activation(c)

    if self._num_proj is not None:
      with vs.variable_scope("projection") as proj_scope:
        if self._num_proj_shards is not None:
          proj_scope.set_partitioner(
              partitioned_variables.fixed_size_partitioner(
                  self._num_proj_shards))
        m = _linear(m, self._num_proj, bias=False, scope=scope)

        if self._proj_clip is not None:
          # pylint: disable=invalid-unary-operand-type
          m = clip_ops.clip_by_value(m, -self._proj_clip, self._proj_clip)
          # pylint: enable=invalid-unary-operand-type

  new_state = (LSTMStateTuple(c, m) if self._state_is_tuple else
               array_ops.concat([c, m], 1))
  return m, new_state
The key lines are:
c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) * self._activation(j))
and
m = sigmoid(o) * self._activation(c)
and
new_state = (LSTMStateTuple(c, m)
If you compare the code that computes c and m with the LSTM equations (see below), you can see that they correspond to the cell state (usually denoted c) and the hidden state (usually denoted h), respectively:
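For reference, these are the textbook LSTM update equations (as presented, for example, in Colah's "Understanding LSTMs" post; they are not taken from the TensorFlow docs). Here \sigma is the logistic sigmoid and \odot is element-wise multiplication, while W and b are the learned weights and biases:

\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}

In terms of the code above, C_t corresponds to c, h_t to m, and the split i, j, f, o corresponds to i_t, \tilde{C}_t (before its activation), f_t and o_t.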
new_state = (LSTMStateTuple(c, m)

indicates that the first element of the returned state tuple is c (the cell state, a.k.a. c_state), and the second element of the returned state tuple is m (the hidden state, a.k.a. m_state).
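As a small, self-contained sketch of how to get at these two pieces in user code (assuming a TF 1.x-era graph API where tf.nn.rnn_cell is available; in some releases these cells live under tf.contrib.rnn, and the variable names below are just for illustration): LSTMStateTuple is a namedtuple whose fields are named c and h, so the returned state can be unpacked by name.

import tensorflow as tf

num_units = 2
cell = tf.nn.rnn_cell.LSTMCell(num_units, state_is_tuple=True)

inputs = tf.zeros([1, 3])                                     # batch of 1, input size 3
zero_state = cell.zero_state(batch_size=1, dtype=tf.float32)  # an LSTMStateTuple(c=..., h=...)

output, new_state = cell(inputs, zero_state)                  # runs the __call__ shown above

c_state = new_state.c   # cell state: the `c` computed in __call__
m_state = new_state.h   # hidden state: the `m` computed in __call__ (the field is named `h`)
# Equivalently, by position: c_state, m_state = new_state
# Note: `output` is the same tensor as new_state.h (cf. `return m, new_state`).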
I stumbled upon the same question; this is how I understand it! A minimalistic LSTM example:
import tensorflow as tf

sample_input = tf.constant([[1, 2, 3]], dtype=tf.float32)

LSTM_CELL_SIZE = 2

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_CELL_SIZE, state_is_tuple=True)
state = (tf.zeros([1, LSTM_CELL_SIZE]),) * 2

output, state_new = lstm_cell(sample_input, state)

init_op = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init_op)
print(sess.run(output))
Note that with state_is_tuple=True, the state passed to the cell needs to be in tuple form. c_state and m_state are probably the "memory state" and the "cell state", though I'm honestly not sure, since these terms are only mentioned in the docs. In the code and in papers about LSTMs, the letters h and c are commonly used to denote the "output value" and the "cell state".
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Those tensors represent the combined internal state of the cell and should be passed together. The old way to do it was to simply concatenate them; the new way is to use a tuple.
Old way:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_CELL_SIZE, state_is_tuple=False)
state = tf.zeros([1, LSTM_CELL_SIZE * 2])
output, state_new = lstm_cell(sample_input, state)
New way:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_CELL_SIZE, state_is_tuple=True)
state = (tf.zeros([1, LSTM_CELL_SIZE]),) * 2
output, state_new = lstm_cell(sample_input, state)
So, basically, all we did was change state from being a single tensor of length 4 into two tensors of length 2. The content stays the same: [0,0,0,0] becomes ([0,0],[0,0]). (This is supposed to make it faster.)
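A minimal NumPy sketch of that equivalence (the variable names are just for illustration), mirroring the slicing LSTMCell itself performs when state_is_tuple=False:

import numpy as np

LSTM_CELL_SIZE = 2

# Old way: one flat state tensor of length 2 * LSTM_CELL_SIZE, i.e. [[0, 0, 0, 0]]
flat_state = np.zeros([1, 2 * LSTM_CELL_SIZE], dtype=np.float32)

# New way: the same numbers, split into (c_state, m_state), i.e. ([[0, 0]], [[0, 0]])
c_state = flat_state[:, :LSTM_CELL_SIZE]    # first half:  cell state c
m_state = flat_state[:, LSTM_CELL_SIZE:]    # second half: hidden state m
tuple_state = (c_state, m_state)

print(tuple_state)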