Invalid loss when training a mobilenetv2 model in TLT

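In my tests, training with Resnet18 runs normally and the loss is computed correctly.
With the mobilenetv2 model, however, an invalid loss appears and training is terminated.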

==================================================================================
Total params: 1,222,588
Trainable params: 1,205,436
Non-trainable params: 17,152


2020-10-15 16:26:03,654 [INFO] iva.ssd.scripts.train: Number of images in the training dataset: 1527
2020-10-15 16:26:03,655 [INFO] iva.ssd.scripts.train: Number of images in the validation dataset: 248
Epoch 12/100
2020-10-15 16:26:32.618167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-15 16:26:36.564399: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x7445790
2020-10-15 16:26:36.565127: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-15 16:26:36.969943: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-15 16:26:36.971407: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
6/24 [======>…] - ETA: 2:05 - loss: nan Batch 5: Invalid loss, terminating training

Epoch 00012: saving model to /workspace/mydata/ssd/experiment_dir_unpruned/weights/ssd_mobilenet_v2_epoch_012.tlt

The above is the error output.

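This is usually caused by a learning rate that is too high, or by having too few training samples.
You can try adding more training samples or lowering the learning rate to fix it.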
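
To illustrate why a learning rate that is too high can end in a nan/inf loss, here is a minimal toy sketch in plain Python. It is not TLT code: the function name `train`, the quadratic loss, and the values 1.5 and 0.1 are all hypothetical and chosen only for demonstration. It runs gradient descent on f(w) = w*w and stops as soon as the loss is no longer finite, which is roughly what the "Invalid loss, terminating training" message is reporting.

```python
import math


def train(learning_rate, steps=2000, start=1.0):
    """Toy gradient descent on f(w) = w * w (hypothetical example, not TLT).

    Returns (steps_completed, last_loss). Stops early when the loss
    becomes nan or inf, similar in spirit to terminating on invalid loss.
    """
    w = start
    loss = w * w
    for step in range(1, steps + 1):
        grad = 2.0 * w               # derivative of w * w
        w = w - learning_rate * grad  # plain gradient descent update
        loss = w * w
        if not math.isfinite(loss):   # nan or inf: give up
            return step, loss
    return steps, loss


# Too-high learning rate: each update overshoots and |w| grows until overflow.
bad_steps, bad_loss = train(learning_rate=1.5)
print("lr=1.5 ->", bad_steps, bad_loss)

# Lower learning rate: |w| shrinks every step and the loss stays finite.
ok_steps, ok_loss = train(learning_rate=0.1)
print("lr=0.1 ->", ok_steps, ok_loss)
```

In a real network the same overshoot happens in many dimensions at once, so if the loss turns to nan within a few batches, reducing the learning rate in the training spec is a reasonable first thing to try.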