Technique | Benefits | Hardware |
---|---|---|
Dynamic range quantization | 4x smaller, 2x-3x speedup | CPU |
Full integer quantization | 4x smaller, 3x+ speedup | CPU, Edge TPU, microcontrollers |
Float16 quantization | 2x smaller, GPU acceleration | CPU, GPU |
You can quantize a trained TensorFlow model while converting it to the TensorFlow Lite format with the TensorFlow Lite Converter. PyTorch also ships its own quantization implementation.
The example below can be run in Colab; note that it requires TensorFlow >= 1.15.
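A quick way to confirm the runtime meets that requirement (assuming a Colab-style Python environment):

```python
import tensorflow as tf

# The tf.lite converter APIs used below require TensorFlow >= 1.15.
major, minor = (int(x) for x in tf.__version__.split('.')[:2])
assert (major, minor) >= (1, 15), tf.__version__
print(tf.__version__)
```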
Load the MNIST dataset from `tf.keras.datasets.mnist` and build a CNN on it. Save the pre-quantization weights to `baseline_weights.h5` and the model to `non_quantized.h5`, and record the model size and accuracy as the baseline for comparing against post-training quantization. TensorFlow Lite uses the `*.tflite` format; convert the previously built baseline_model with `tf.lite.TFLiteConverter.from_keras_model`:
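The CNN itself is not shown in this excerpt; a minimal sketch of a comparable MNIST baseline follows (the layer sizes and the one-epoch, subset-only training here are my assumptions to keep the sketch fast, not necessarily the article's exact setup):

```python
import tensorflow as tf

# Load and normalize MNIST to [0, 1].
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# A small CNN; the exact layers are illustrative assumptions.
baseline_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Reshape((28, 28, 1)),
    tf.keras.layers.Conv2D(12, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
baseline_model.compile(optimizer='adam',
                       loss='sparse_categorical_crossentropy',
                       metrics=['accuracy'])
# Train briefly on a subset; the article trains the full dataset.
baseline_model.fit(train_images[:1000], train_labels[:1000], epochs=1)
baseline_model.save('non_quantized.h5')
```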
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
tflite_model = converter.convert()
with open('non_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
Build a function to evaluate the TF Lite model's accuracy. After converting to `tflite`, a dedicated evaluation function is needed; the one below is adapted from the official example.
# A helper function to evaluate the TF Lite model using "test" dataset.
# from: https://www.tensorflow.org/lite/performance/post_training_integer_quant_16x8#evaluate_the_models
def evaluate_model(filename):
    # Load the model into the interpreter.
    interpreter = tf.lite.Interpreter(model_path=str(filename))
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    # Run predictions on every image in the "test" dataset.
    prediction_digits = []
    for test_image in test_images:
        # Pre-processing: add batch dimension and convert to float32 to match
        # the model's input data format.
        test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
        interpreter.set_tensor(input_index, test_image)

        # Run inference.
        interpreter.invoke()

        # Post-processing: remove batch dimension and find the digit
        # with highest probability.
        output = interpreter.tensor(output_index)
        digit = np.argmax(output()[0])
        prediction_digits.append(digit)

    # Compare prediction results with ground truth labels to calculate accuracy.
    accurate_count = 0
    for index in range(len(prediction_digits)):
        if prediction_digits[index] == test_labels[index]:
            accurate_count += 1
    accuracy = accurate_count * 1.0 / len(prediction_digits)
    return accuracy
At this point the two models evaluate to nearly identical accuracy, while the tflite file is smaller.
ACCURACY:
{'baseline Keras model': 0.9581000208854675,
'non quantized tflite': 0.9581}
MODEL_SIZE:
{'baseline h5': 98136,
'non quantized tflite': 84688}
# Dynamic range quantization
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # add this setting
tflite_model = converter.convert()
with open('post_training_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
ACCURACY:
{'baseline Keras model': 0.9581000208854675,
'non quantized tflite': 0.9581,
'post training quantized tflite': 0.9582}
MODEL_SIZE:
{'baseline h5': 98136,
'non quantized tflite': 84688,
'post training quantized tflite': 24096}
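The comparison table at the top also lists full-integer quantization, which this walkthrough does not demonstrate; a hedged sketch of the extra converter settings it needs follows. The tiny model here is a stand-in for the trained baseline_model, and a `representative_dataset` generator must be supplied so the converter can calibrate activation ranges (real code would yield training images rather than random data):

```python
import numpy as np
import tensorflow as tf

# Stand-in model (assumption: the article would reuse its trained
# baseline_model here instead).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # Yield a few float32 samples for calibration; real code would
    # iterate over actual training images.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops so the model can target Edge TPU / microcontrollers.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_int8_model = converter.convert()
```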
Quantization-aware training uses the `tensorflow_model_optimization` module, which provides `quantize_model()` to do the job. Here the quantization-aware model is trained for only `epochs = 1`.
ACCURACY:
{'baseline Keras model': 0.9581000208854675,
'non quantized tflite': 0.9581,
'post training quantized tflite': 0.9582,
'quantization aware non-quantized': 0.1005999967455864}
MODEL_SIZE:
{'baseline h5': 98136,
'non quantized tflite': 84688,
'post training quantized tflite': 24096,
'quantization aware non-quantized': 115680}