
python - Tensorflow feature columns in Dataset map: Table already initialized issue

Reposted. Author: 太空狗. Updated: 2023-10-30 00:18:41

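I'm running into a problem using Tensorflow's feature_column mappings inside a function passed to the Dataset map method. It happens when I try to one-hot encode a Dataset's categorical string features as part of the input pipeline using Dataset.map. The error message I get is: tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.

The following code is a basic example that reproduces the problem: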

import numpy as np
import tensorflow as tf
from tensorflow.contrib.lookup import index_table_from_tensor

# generate tfrecords with two string categorical features and write to file
vlists = dict(season=['Spring', 'Summer', 'Fall', 'Winter'],
              day=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

writer = tf.python_io.TFRecordWriter('test.tfr')
for s, d in zip(np.random.choice(vlists['season'], 50),
                np.random.choice(vlists['day'], 50)):
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'season': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[s.encode()])),
                'day': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[d.encode()]))
            }
        )
    )
    serialized = example.SerializeToString()
    writer.write(serialized)
writer.close()

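There is now a tfrecord file named test.tfr in the cwd with 50 records, each consisting of two string features, 'season' and 'day'. The following creates a Dataset that parses the tfrecords and produces batches of size 4: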

def parse_record(element):
    # both features are scalar strings
    feats = {
        'season': tf.FixedLenFeature((), tf.string),
        'day': tf.FixedLenFeature((), tf.string)
    }
    return tf.parse_example(element, feats)

fname = tf.placeholder(tf.string, [])
ds = tf.data.TFRecordDataset(fname)
ds = ds.batch(4).map(parse_record)

At this point, if you create an iterator and call get_next on it several times, it works as expected, and each run produces output like the following:

sess = tf.Session()  # session used by all the snippets in this question
iterator = ds.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)
# output of run(nxt) would look like
# {'day': array([b'Sat', b'Thu', b'Fri', b'Thu'], dtype=object), 'season': array([b'Winter', b'Winter', b'Fall', b'Summer'], dtype=object)}
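
As a quick sanity check on the batching described above: 50 records in batches of 4 do not divide evenly, and Dataset.batch keeps the final partial batch by default. The plain-Python sketch below only reproduces that arithmetic; `batch_sizes` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def batch_sizes(num_records, batch_size):
    """Return the size of each batch when the final partial batch is kept."""
    full, remainder = divmod(num_records, batch_size)
    sizes = [batch_size] * full
    if remainder:
        # the leftover records form one last, smaller batch
        sizes.append(remainder)
    return sizes


sizes = batch_sizes(50, 4)
print(len(sizes), sizes[-1])
```

So repeated calls to run(nxt) on test.tfr should yield twelve batches of 4 and a final batch of 2 before the iterator is exhausted.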

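However, if I use feature_columns to one-hot encode these categoricals as a Dataset transformation using map, it runs once and produces correct output, but every subsequent call to run(nxt) gives the Tables already initialized error, e.g.: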

# using the same Dataset ds from above
season_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='season', vocabulary_list=vlists['season'])
season_col = tf.feature_column.indicator_column(season_enc)
day_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='day', vocabulary_list=vlists['day'])
day_col = tf.feature_column.indicator_column(day_enc)
cols = [season_col, day_col]

def _encode(element, feat_cols=cols):
    return tf.feature_column.input_layer(element, feat_cols)

ds1 = ds.map(_encode)
iterator = ds1.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)
# first run will produce correct one hot encoded output
sess.run(nxt)
# second run will generate

W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
2018-01-25 19:29:55.802358: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
2018-01-25 19:29:55.802612: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.

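However, if I do the one-hot encoding manually without feature_columns, as below, it only works if the tables are created before the map function; otherwise it gives the same error as above.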

# using same original Dataset ds
# the lookup tables are created here, before (and outside) the mapped function
tables = dict(season=index_table_from_tensor(vlists['season']),
              day=index_table_from_tensor(vlists['day']))

def to_dummy(element):
    s = tables['season'].lookup(element['season'])
    d = tables['day'].lookup(element['day'])
    return (tf.one_hot(s, depth=len(vlists['season']), axis=-1),
            tf.one_hot(d, depth=len(vlists['day']), axis=-1))

ds2 = ds.map(to_dummy)
iterator = ds2.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)
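
To make the expected result of the manual encoding concrete, here is a plain-Python sketch of the two steps in to_dummy: a vocabulary lookup followed by a one-hot expansion. It assumes the defaults of index_table_from_tensor (out-of-vocabulary strings map to -1) and the behaviour of tf.one_hot for an index outside [0, depth), which yields an all-zero row. The helpers `build_index`, `lookup` and `one_hot` are hypothetical names for this illustration, and plain str values stand in for the bytes that TensorFlow returns.

```python
SEASONS = ['Spring', 'Summer', 'Fall', 'Winter']


def build_index(vocab):
    """Map each vocabulary string to its position, like an index table."""
    return {token: i for i, token in enumerate(vocab)}


def lookup(index, values, default=-1):
    """Return the id of each value, or `default` when it is not in the vocabulary."""
    return [index.get(v, default) for v in values]


def one_hot(ids, depth):
    """Expand ids into rows of length `depth`; an id outside [0, depth) gives all zeros."""
    return [[1.0 if i == j else 0.0 for j in range(depth)] for i in ids]


season_index = build_index(SEASONS)           # built once, outside any per-batch function
ids = lookup(season_index, ['Winter', 'Fall', 'Monsoon'])
encoded = one_hot(ids, len(SEASONS))
print(ids)
print(encoded)
```

The index is built a single time and then reused for every batch, which mirrors the working variant above where the tables are created before the map function.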

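It seems to be related to the scope or namespace of the index lookup tables created by feature_columns, but I'm not sure how to figure out what is happening here. I've tried changing where and when the feature_column objects are defined, but it made no difference.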

Best answer

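I just came here through another recent question and would like to propose a potential solution. Since it is rather late for this question, I'm not sure whether the problem here has already been solved. Please correct me if a good solution already exists.

I really don't know exactly how this error happens. But from studying the canned estimator, I realized there may be an alternative way to do the job: iterate the dataset before parsing the examples. One benefit of this approach is that it separates the feature column mapping from the function mapped onto the dataset. This may be related to the unknown cause of the error here, since it is known that: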

when using hash_table in "tensorflow.python.ops.gen_lookup_ops" in tf.data.Dataset.map function because tf.data.Dataset.map do not use the default graph, the hash_table can not be initialized.

I'm not sure if this fits what you really want, but a potential example using the "test.tfr" generated in your code could be:

import tensorflow as tf

# using the same Dataset ds from above
vlists = dict(season=['Spring', 'Summer', 'Fall', 'Winter'],
              day=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])
season_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='season', vocabulary_list=vlists['season'])
season_col = tf.feature_column.indicator_column(season_enc)
day_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='day', vocabulary_list=vlists['day'])
day_col = tf.feature_column.indicator_column(day_enc)
cols = [season_col, day_col]

def _encode(element, feat_cols=cols):
    # parse the serialized examples, then apply the feature columns
    element = tf.parse_example(
        element, features=tf.feature_column.make_parse_example_spec(feat_cols))
    return tf.feature_column.input_layer(element, feat_cols)

fname = tf.placeholder(tf.string, [])
ds = tf.data.TFRecordDataset(fname)
ds = ds.batch(4)
ds1 = ds  # .map(_encode) is deliberately not applied here
iterator = ds1.make_initializable_iterator()
nxt = iterator.get_next()
nxt = _encode(nxt)  # encode the iterator output instead of mapping over the dataset

with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
    print(sess.run(nxt))
    # first run will produce correct one hot encoded output
    print(sess.run(nxt))
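
Stripped of TensorFlow, the restructuring above amounts to letting the pipeline yield raw batches and applying the encoder to whatever the iterator returns, rather than encoding inside the pipeline. The following plain-Python analogy is only a sketch of that shape: `raw_batches` and `encode_batch` are hypothetical names, the three records are made up for the illustration, and concatenating the columns in sorted key order is a choice made for this sketch.

```python
VOCABS = {
    'season': ['Spring', 'Summer', 'Fall', 'Winter'],
    'day': ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'],
}


def raw_batches(records, batch_size):
    """The 'pipeline': yields unencoded batches and knows nothing about vocabularies."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def encode_batch(batch, vocabs=VOCABS):
    """The 'encoder': applied to the iterator output, one concatenated one-hot row per record."""
    rows = []
    for record in batch:
        row = []
        for key in sorted(vocabs):  # 'day' (7 slots) first, then 'season' (4 slots)
            row.extend(1.0 if record[key] == token else 0.0 for token in vocabs[key])
        rows.append(row)
    return rows


records = [
    {'season': 'Fall', 'day': 'Mon'},
    {'season': 'Winter', 'day': 'Sat'},
    {'season': 'Spring', 'day': 'Sun'},
]
encoded = [encode_batch(batch) for batch in raw_batches(records, 2)]
print(encoded[0][0])
```

Because the vocabularies live outside the batch generator, the generator can be re-created or re-run without touching them, which is the separation the answer is aiming for.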

Regarding python - Tensorflow feature columns in Dataset map: Table already initialized issue, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48450785/
