I want to include my custom pre-processing logic in my exported Keras model, for use with TensorFlow Serving.

My pre-processing performs string tokenization and uses an external dictionary to convert each token to an index for input to the Embedding layer:
```python
from keras.preprocessing import sequence

token_to_idx_dict = ...  # read from file

# Custom Pythonic pre-processing steps on input_data
tokens = [tokenize(s) for s in input_data]
token_idxs = [[token_to_idx_dict[t] for t in ts] for ts in tokens]
tokens_padded = sequence.pad_sequences(token_idxs, maxlen=maxlen)
```
Model architecture and training:
```python
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128, activation='sigmoid'))
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(x_train, y_train)
```
Since the model will be served with TensorFlow Serving, I want to incorporate all of my pre-processing logic into the model itself (encoded in the exported model file).

Q: How can I do this using only the Keras library?

I found this guide, which explains how to combine Keras and TensorFlow, but I'm still not sure how to export everything as one model.

I know TensorFlow has built-in ops for string splitting, file I/O, and dictionary lookup.

Pre-processing logic using TensorFlow ops:
```python
# Get input text
input_string_tensor = tf.placeholder(tf.string, shape=(1,))
# Split input text by whitespace
splitted_string = tf.string_split(input_string_tensor, " ")
# Read index lookup dictionary from a comma-delimited file
token_to_idx_dict = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.TextFileInitializer("vocab.txt", tf.string, 0, tf.int64, 1, delimiter=","),
    -1)  # -1 for out-of-vocabulary tokens
# Convert tokens to indexes
token_idxs = token_to_idx_dict.lookup(splitted_string)
# Pad zeros to fixed length
token_idxs_padded = tf.pad(token_idxs, ...)
```
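For reference, the TextFileInitializer above reads one comma-delimited token,index pair per line, and the HashTable maps any missing token to the default value. A small runnable sketch of that lookup behavior (the file contents and the sentence are made up for illustration):

```python
import tensorflow as tf

# Hypothetical vocab.txt contents: one "token,index" pair per line
with open("vocab.txt", "w") as f:
    f.write("the,1\ncat,2\nsat,3\n")

table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.TextFileInitializer("vocab.txt", tf.string, 0, tf.int64, 1, delimiter=","),
    -1)  # unknown tokens map to -1

tokens = tf.string_split(["the cat sat on the mat"], " ")  # SparseTensor of tokens
idxs = table.lookup(tokens)                                # SparseTensor of indexes

with tf.Session() as sess:
    table.init.run()
    print(sess.run(tf.sparse_tensor_to_dense(idxs, default_value=-1)))
    # -> [[ 1  2  3 -1  1 -1]]  ("on" and "mat" are out of vocabulary)
```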
Q: How can I use these TensorFlow pre-defined pre-processing ops together with my Keras layers to train, and then export the model as a "black box" for TensorFlow Serving?
I figured it out, so I'm going to answer my own question here.

Here's the gist:

First, (in a separate code file) I trained the model with Keras using only my own pre-processing functions, and exported the Keras model weights file and my token-to-index dictionary (sketched below).
Then, I copied just the Keras model architecture, set its input to be the pre-processing tensor output, loaded the weights file from the previously trained Keras model, and sandwiched it between the TensorFlow pre-processing ops and the TensorFlow exporter.
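The first, training-phase step isn't shown in the final product below, so here is a minimal sketch of it; the file names are hypothetical, and token_to_idx_dict is the plain Python dict from the question, written tab-delimited to match the TextFileInitializer used at serving time:

```python
# After model.fit(...) from the question's training code:
model.save_weights('model_weights.h5')  # hypothetical file name

# Persist the token-to-index dictionary, one tab-delimited pair per line
with open('token_to_idx.tsv', 'w') as f:
    for token, idx in token_to_idx_dict.items():
        f.write('{}\t{}\n'.format(token, idx))
```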
Final product:
```python
import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from tensorflow.contrib.session_bundle import exporter
from tensorflow.contrib.lookup import HashTable, TextFileInitializer

# Initialize Keras with the TensorFlow session
sess = tf.Session()
K.set_session(sess)

# Token-to-index lookup dictionary (tab-delimited file)
token_to_idx_path = '...'
token_to_idx_dict = HashTable(
    TextFileInitializer(token_to_idx_path, tf.string, 0, tf.int64, 1, delimiter='\t'), 0)

maxlen = ...

# Pre-processing sub-graph using TensorFlow operations
input = tf.placeholder(tf.string, name='input')
sparse_tokenized_input = tf.string_split(input)
tokenized_input = tf.sparse_tensor_to_dense(sparse_tokenized_input, default_value='')
token_idxs = token_to_idx_dict.lookup(tokenized_input)
token_idxs_padded = tf.pad(token_idxs, [[0, 0], [0, maxlen]])
token_idxs_embedding = tf.slice(token_idxs_padded, [0, 0], [-1, maxlen])

# Initialize Keras model, feeding it the pre-processing output
model = Sequential()
e = Embedding(max_features, 128, input_length=maxlen)
e.set_input(token_idxs_embedding)  # set_input() is Keras 1.x API (see the update below)
model.add(e)
model.add(LSTM(128, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))

# Load weights from the previously trained Keras model
weights_path = '...'
model.load_weights(weights_path)

K.set_learning_phase(0)

# Export model in TensorFlow format
# (Official tutorial: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/serving_basic.md)
saver = tf.train.Saver(sharded=True)
model_exporter = exporter.Exporter(saver)
signature = exporter.classification_signature(input_tensor=model.input, scores_tensor=model.output)
model_exporter.init(sess.graph.as_graph_def(), default_graph_signature=signature)
model_dir = '...'
model_version = 1
model_exporter.export(model_dir, tf.constant(model_version), sess)

# Input example
with sess.as_default():
    token_to_idx_dict.init.run()
    sess.run(model.output, feed_dict={input: ["this is a raw input example"]})
```
The accepted answer is very helpful, but it uses an outdated Keras API, as @Qululu mentioned, and an outdated TF Serving API (Exporter), and it does not show how to export the model so that its input is the original tf placeholder (as opposed to the Keras model.input, which comes after pre-processing). The following version works with TF v1.4 and Keras 2.1.2:
```python
import tensorflow as tf
from tensorflow.contrib.lookup import HashTable, TextFileInitializer, TextFileIndex
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import tag_constants, signature_constants
from keras import backend as K
from keras.models import Sequential
from keras.layers import InputLayer

sess = tf.Session()
K.set_session(sess)
K._LEARNING_PHASE = tf.constant(0)
K.set_learning_phase(0)

max_features = 5000
max_lens = 500

# Vocabulary lookup table: token -> line number in vocab.txt
dict_table = HashTable(
    TextFileInitializer("vocab.txt", tf.string, 0, tf.int64, TextFileIndex.LINE_NUMBER,
                        vocab_size=max_features, delimiter=" "), 0)

# Raw-string input placeholder -- this is the serving input, pre pre-processing
x_input = tf.placeholder(tf.string, name='x_input', shape=(None,))
sparse_tokenized_input = tf.string_split(x_input)
tokenized_input = tf.sparse_tensor_to_dense(sparse_tokenized_input, default_value='')
token_idxs = dict_table.lookup(tokenized_input)
token_idxs_padded = tf.pad(token_idxs, [[0, 0], [0, max_lens]])
token_idxs_embedding = tf.slice(token_idxs_padded, [0, 0], [-1, max_lens])

model = Sequential()
model.add(InputLayer(input_tensor=token_idxs_embedding, input_shape=(None, max_lens)))
...REST OF MODEL...
model.load_weights("model.h5")

# Build the serving signature around the raw placeholder, not model.input
x_info = tf.saved_model.utils.build_tensor_info(x_input)
y_info = tf.saved_model.utils.build_tensor_info(model.output)
prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={"text": x_info},
    outputs={"prediction": y_info},
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

builder = saved_model_builder.SavedModelBuilder("/path/to/model")
legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess.run(init_op)

# Add the meta_graph and the variables to the builder
builder.add_meta_graph_and_variables(
    sess, [tag_constants.SERVING],
    signature_def_map={
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature,
    },
    legacy_init_op=legacy_init_op)
builder.save()
```
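To sanity-check the export, the SavedModel can be loaded back into a fresh session and fed a raw string; the loader runs legacy_init_op, so the vocabulary table gets initialized automatically. A minimal sketch, reusing the path and signature keys from above (the test sentence is hypothetical):

```python
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants, signature_constants

with tf.Session(graph=tf.Graph()) as sess:
    # Load the exported model; this also runs legacy_init_op (table init)
    meta_graph = tf.saved_model.loader.load(sess, [tag_constants.SERVING], "/path/to/model")
    signature = meta_graph.signature_def[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    x_name = signature.inputs["text"].name         # raw-string input tensor
    y_name = signature.outputs["prediction"].name  # class-probability output tensor
    preds = sess.run(y_name, feed_dict={x_name: ["this is a raw input example"]})
    print(preds)
```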
UPDATE: Doing the pre-processing for inference with TensorFlow ops is CPU-bound, and it is not executed efficiently when the model is deployed on a GPU server. The GPU stalls are really bad and throughput is very low, so we gave up on this approach in favor of doing the pre-processing efficiently in the client process.
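For completeness, here is the shape of the client-side pre-processing we moved to. This is only a sketch; the helper name and the OOV/padding conventions are assumptions, not the exact production code. It produces the padded index matrix that the server-side graph above would otherwise have to compute:

```python
import numpy as np

def preprocess(texts, token_to_idx, max_lens, oov_idx=0):
    """Tokenize, look up indexes, and right-pad with zeros on the client."""
    batch = []
    for s in texts:
        idxs = [token_to_idx.get(t, oov_idx) for t in s.split()][:max_lens]
        batch.append(idxs + [0] * (max_lens - len(idxs)))
    return np.array(batch, dtype=np.int64)

# The resulting index matrix is sent to the model server instead of raw strings.
batch = preprocess(["this is a raw input example"], {"this": 1, "is": 2}, max_lens=500)
```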