I see that ImageDataGenerator lets me specify different kinds of data normalization, e.g. featurewise_center, samplewise_center, etc.
I see from the examples that if I specify one of these options, then I need to call the fit method on the generator so it can compute the statistics, such as the mean image over the training set:
```python
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)

# fits the model on batches with real-time data augmentation:
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=32),
                    samples_per_epoch=len(X_train),
                    nb_epoch=nb_epoch)
```
My question is: if I specify data normalization during training, how does prediction work? I can't see how, within the framework, I would even pass the knowledge of the training-set mean/std to predict so that I could normalize my test data myself, and I also don't see anywhere in the training code where this information is stored.
Are the image statistics needed for normalization stored in the model so they can be used during prediction?
Yes, this is a really big downside of Keras's ImageDataGenerator: you cannot supply the normalization statistics yourself. But there is an easy way to work around it.

Assume you have a function normalize(x) that normalizes a batch of images (remember that the generator does not yield single images but arrays of images, i.e. batches with shape (nr_of_examples_in_batch, image_dims...)). You can build your own generator with normalization like this:

```python
def gen_with_norm(gen, normalize):
    for x, y in gen:
        yield normalize(x), y
```

Then you can simply use gen_with_norm(datagen.flow(X_train, Y_train, batch_size=32), normalize) instead of datagen.flow(X_train, Y_train, batch_size=32).
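For concreteness, here is a minimal sketch of how the wrapper could be wired up. The normalize() function and the train_mean / train_std names are mine, not part of Keras, and the sketch assumes channels-last arrays and that the generator's built-in featurewise options are switched off so normalization happens only in normalize():

```python
import numpy as np

# Hypothetical per-channel statistics computed once on the training set,
# so the exact same transform can later be applied to test data.
train_mean = X_train.mean(axis=(0, 1, 2))
train_std = X_train.std(axis=(0, 1, 2)) + 1e-7   # epsilon avoids division by zero

def normalize(x):
    # x is a batch of shape (batch_size, rows, cols, channels)
    return (x - train_mean) / train_std

model.fit_generator(
    gen_with_norm(datagen.flow(X_train, Y_train, batch_size=32), normalize),
    samples_per_epoch=len(X_train),
    nb_epoch=nb_epoch)
```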
Moreover, you can recover the mean and std computed by the fit method by reading them from the appropriate fields on datagen, namely datagen.mean and datagen.std.
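If you stick with the built-in featurewise options, a short sketch of how those recovered statistics could be applied to the test set by hand before prediction (the epsilon is mine, added only to avoid division by zero):

```python
# After datagen.fit(X_train) with featurewise_center / featurewise_std_normalization,
# the fitted statistics are available as datagen.mean and datagen.std.
X_test_norm = (X_test - datagen.mean) / (datagen.std + 1e-7)
predictions = model.predict(X_test_norm)
```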
Use the generator's standardize method on each element. Here is a complete example for CIFAR-10:
```python
#!/usr/bin/env python
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# input image dimensions
img_rows, img_cols, img_channels = 32, 32, 3
num_classes = 10

batch_size = 32
epochs = 1

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=x_train.shape[1:]))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

datagen = ImageDataGenerator(zca_whitening=True)

# Compute principal components required for ZCA
datagen.fit(x_train)

# Apply normalization (ZCA and others)
print(x_test.shape)
for i in range(len(x_test)):
    # this is what you are looking for
    x_test[i] = datagen.standardize(x_test[i])
print(x_test.shape)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))
```
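The same standardize call works at prediction time as well. A small sketch, where new_images is just random placeholder data standing in for whatever unseen images you want to classify:

```python
import numpy as np

# Any new image must go through the same preprocessing as the training data
# (scale to [0, 1], then datagen.standardize) before being passed to model.predict.
new_images = np.random.randint(
    0, 256, size=(5, img_rows, img_cols, img_channels)).astype('float32')
new_images /= 255
for i in range(len(new_images)):
    new_images[i] = datagen.standardize(new_images[i])
print(model.predict(new_images))
```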