The TensorFlow r0.12 documentation for tf.nn.rnn_cell.LSTMCell describes its __call__ method as:
tf.nn.rnn_cell.LSTMCell.__call__(inputs, state, scope=None)
where state is described as follows:
state: if state_is_tuple is False, this must be a state Tensor, 2-D, batch x state_size. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.
What are c_state and m_state, and how do they fit into the LSTM? I cannot find a reference to them anywhere else in the documentation. Here is a link to that page in the documentation.
I agree that the documentation is unclear. Looking at the implementation of tf.nn.rnn_cell.LSTMCell.__call__ clarifies things (I took the code from TensorFlow 1.0.0):
def __call__(self, inputs, state, scope=None):
  """Run one step of LSTM.

  Args:
    inputs: input Tensor, 2D, batch x num_units.
    state: if `state_is_tuple` is False, this must be a state Tensor,
      `2-D, batch x state_size`.  If `state_is_tuple` is True, this must be a
      tuple of state Tensors, both `2-D`, with column sizes `c_state` and
      `m_state`.
    scope: VariableScope for the created subgraph; defaults to "lstm_cell".

  Returns:
    A tuple containing:

    - A `2-D, [batch x output_dim]`, Tensor representing the output of the
      LSTM after reading `inputs` when previous state was `state`.
      Here output_dim is:
         num_proj if num_proj was set,
         num_units otherwise.
    - Tensor(s) representing the new state of LSTM after reading `inputs` when
      the previous state was `state`.  Same type and shape(s) as `state`.

  Raises:
    ValueError: If input size cannot be inferred from inputs via
      static shape inference.
  """
  num_proj = self._num_units if self._num_proj is None else self._num_proj

  if self._state_is_tuple:
    (c_prev, m_prev) = state
  else:
    c_prev = array_ops.slice(state, [0, 0], [-1, self._num_units])
    m_prev = array_ops.slice(state, [0, self._num_units], [-1, num_proj])

  dtype = inputs.dtype
  input_size = inputs.get_shape().with_rank(2)[1]
  if input_size.value is None:
    raise ValueError("Could not infer input size from inputs.get_shape()[-1]")
  with vs.variable_scope(scope or "lstm_cell",
                         initializer=self._initializer) as unit_scope:
    if self._num_unit_shards is not None:
      unit_scope.set_partitioner(
          partitioned_variables.fixed_size_partitioner(
              self._num_unit_shards))
    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=True,
                          scope=scope)
    i, j, f, o = array_ops.split(
        value=lstm_matrix, num_or_size_splits=4, axis=1)

    # Diagonal connections
    if self._use_peepholes:
      with vs.variable_scope(unit_scope) as projection_scope:
        if self._num_unit_shards is not None:
          projection_scope.set_partitioner(None)
        w_f_diag = vs.get_variable(
            "w_f_diag", shape=[self._num_units], dtype=dtype)
        w_i_diag = vs.get_variable(
            "w_i_diag", shape=[self._num_units], dtype=dtype)
        w_o_diag = vs.get_variable(
            "w_o_diag", shape=[self._num_units], dtype=dtype)

    if self._use_peepholes:
      c = (sigmoid(f + self._forget_bias + w_f_diag * c_prev) * c_prev +
           sigmoid(i + w_i_diag * c_prev) * self._activation(j))
    else:
      c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) *
           self._activation(j))

    if self._cell_clip is not None:
      # pylint: disable=invalid-unary-operand-type
      c = clip_ops.clip_by_value(c, -self._cell_clip, self._cell_clip)
      # pylint: enable=invalid-unary-operand-type
    if self._use_peepholes:
      m = sigmoid(o + w_o_diag * c) * self._activation(c)
    else:
      m = sigmoid(o) * self._activation(c)

    if self._num_proj is not None:
      with vs.variable_scope("projection") as proj_scope:
        if self._num_proj_shards is not None:
          proj_scope.set_partitioner(
              partitioned_variables.fixed_size_partitioner(
                  self._num_proj_shards))
        m = _linear(m, self._num_proj, bias=False, scope=scope)

        if self._proj_clip is not None:
          # pylint: disable=invalid-unary-operand-type
          m = clip_ops.clip_by_value(m, -self._proj_clip, self._proj_clip)
          # pylint: enable=invalid-unary-operand-type

  new_state = (LSTMStateTuple(c, m) if self._state_is_tuple else
               array_ops.concat([c, m], 1))
  return m, new_state
The key lines are:
c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) * self._activation(j))
and
m = sigmoid(o) * self._activation(c)
and
new_state = (LSTMStateTuple(c, m)
If you compare the code that computes c and m with the LSTM equations (see below), you can see that they correspond to the cell state (usually denoted c) and the hidden state (usually denoted h), respectively:
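For reference, these are the textbook LSTM update equations (as presented, for example, in Colah's "Understanding LSTMs" post; they are not taken from the TensorFlow docs). Here \sigma is the logistic sigmoid and \odot is element-wise multiplication, while W and b are the learned weights and biases:

\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}

In terms of the code above, C_t corresponds to c, h_t to m, and the split i, j, f, o corresponds to i_t, \tilde{C}_t (before its activation), f_t and o_t.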
new_state = (LSTMStateTuple(c, m)

indicates that the first element of the returned state tuple is c (the cell state, a.k.a. c_state), and the second element of the returned state tuple is m (the hidden state, a.k.a. m_state).
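As a small, self-contained sketch of how to get at these two pieces in user code (assuming a TF 1.x-era graph API where tf.nn.rnn_cell is available; in some releases these cells live under tf.contrib.rnn, and the variable names below are just for illustration): LSTMStateTuple is a namedtuple whose fields are named c and h, so the returned state can be unpacked by name.

import tensorflow as tf

num_units = 2
cell = tf.nn.rnn_cell.LSTMCell(num_units, state_is_tuple=True)

inputs = tf.zeros([1, 3])                                     # batch of 1, input size 3
zero_state = cell.zero_state(batch_size=1, dtype=tf.float32)  # an LSTMStateTuple(c=..., h=...)

output, new_state = cell(inputs, zero_state)                  # runs the __call__ shown above

c_state = new_state.c   # cell state: the `c` computed in __call__
m_state = new_state.h   # hidden state: the `m` computed in __call__ (the field is named `h`)
# Equivalently, by position: c_state, m_state = new_state
# Note: `output` is the same tensor as new_state.h (cf. `return m, new_state`).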
I stumbled upon the same question; this is how I understand it! A minimalistic LSTM example:
import tensorflow as tf

sample_input = tf.constant([[1, 2, 3]], dtype=tf.float32)

LSTM_CELL_SIZE = 2

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_CELL_SIZE, state_is_tuple=True)
state = (tf.zeros([1, LSTM_CELL_SIZE]),) * 2

output, state_new = lstm_cell(sample_input, state)

init_op = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init_op)
print(sess.run(output))
Note that with state_is_tuple=True, the state passed to the cell needs to be in tuple form. c_state and m_state are probably the "memory state" and the "cell state", though I'm honestly not sure, since these terms are only mentioned in the docs. In the code and in papers about LSTMs, the letters h and c are commonly used to denote the "output value" and the "cell state".
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Those tensors represent the combined internal state of the cell and should be passed together. The old way to do it was to simply concatenate them; the new way is to use a tuple.
Old way:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_CELL_SIZE, state_is_tuple=False)
state = tf.zeros([1, LSTM_CELL_SIZE * 2])
output, state_new = lstm_cell(sample_input, state)
New way:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_CELL_SIZE, state_is_tuple=True)
state = (tf.zeros([1, LSTM_CELL_SIZE]),) * 2
output, state_new = lstm_cell(sample_input, state)
So, basically, all we did was change state from being a single tensor of length 4 into two tensors of length 2. The content stays the same: [0,0,0,0] becomes ([0,0],[0,0]). (This is supposed to make it faster.)
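A minimal NumPy sketch of that equivalence (the variable names are just for illustration), mirroring the slicing LSTMCell itself performs when state_is_tuple=False:

import numpy as np

LSTM_CELL_SIZE = 2

# Old way: one flat state tensor of length 2 * LSTM_CELL_SIZE, i.e. [[0, 0, 0, 0]]
flat_state = np.zeros([1, 2 * LSTM_CELL_SIZE], dtype=np.float32)

# New way: the same numbers, split into (c_state, m_state), i.e. ([[0, 0]], [[0, 0]])
c_state = flat_state[:, :LSTM_CELL_SIZE]    # first half:  cell state c
m_state = flat_state[:, LSTM_CELL_SIZE:]    # second half: hidden state m
tuple_state = (c_state, m_state)

print(tuple_state)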