『TensorFlow × MXNet』 Notes on Reimplementing the SSD Project

Last updated: 2018-08-31. Source: Internet

Uploaded by: User



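To deepen my understanding, I reimplemented the SSD project. The code is based on the original, with modifications that follow my own understanding.

The projects are on GitHub: SSD_Realization_TensorFlow and SSD_Realization_MXNet.

The build follows the order of steps in the training main function, which is posted at the end of this article. Below we go through the key points of each stage in that order. For details, I suggest reading the corresponding posts in my earlier source-code walkthrough (TF), or watching Dr. Mu Li's SSD introduction video (it is not very detailed, but together with the lecture notes the reasoning is very clear; see 『MXNet』 Part 10: Object Detection with SSD).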

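Key Points

The SSD architecture has four main parts: network design, anchor box design, learning target processing, and loss function implementation.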

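Network design

The key point is that each feature layer selected from the ordinary forward network gets two additional convolutional outputs: a classification head and a regression head. These produce, for every anchor box on that layer, the per-class scores and the 4 coordinate values.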
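As a rough illustration of how the two heads are sized, the sketch below computes the output channel counts for one feature layer and the number of anchors it predicts. The function names and the example numbers (4 anchors per position, 20 object classes, a 38x38 feature map) are assumptions chosen for illustration, not values read from the project.

```python
def head_channels(anchors_per_position, num_classes):
    """Return (cls_channels, box_channels) for the two heads of one layer."""
    # One score per object class plus one for background, for every anchor.
    cls_channels = anchors_per_position * (num_classes + 1)
    # Four coordinate values for every anchor.
    box_channels = anchors_per_position * 4
    return cls_channels, box_channels


def num_anchors(feature_size, anchors_per_position):
    """Total number of anchors predicted on a square feature map."""
    return feature_size * feature_size * anchors_per_position


cls_c, box_c = head_channels(anchors_per_position=4, num_classes=20)
print(cls_c, box_c)        # 84 16
print(num_anchors(38, 4))  # 5776
```

Summing `num_anchors` over all selected feature layers gives the n that appears in the tensor shapes of the training code.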

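Anchor box design

Anchor boxes correspond to the network's feature layers: each layer has a number of anchor boxes, and we need their position and shape. The TF version stores each box's center point plus its height and width (HW), while the MX version stores four coordinate values, namely the top-left and bottom-right corners. The MX form is more intuitive, but the TF form saves space, because a group of boxes shares one center point. The anchor data is small either way, so the difference hardly matters.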
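The two storage schemes can be compared with a small sketch. It assumes normalized coordinates in [0, 1] and a made-up list of (h, w) shapes; the helper names are hypothetical and do not come from either project.

```python
def cell_centers(feature_size):
    """Normalized (cx, cy) for every cell of a square feature map."""
    step = 1.0 / feature_size
    return [((col + 0.5) * step, (row + 0.5) * step)
            for row in range(feature_size)
            for col in range(feature_size)]


def center_to_corner(cx, cy, h, w):
    """(cx, cy, h, w) -> (xmin, ymin, xmax, ymax)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corner_to_center(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, h, w)."""
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, ymax - ymin, xmax - xmin)


# Compact form: shared centers plus one (h, w) pair per anchor shape.
centers = cell_centers(2)
shapes = [(0.2, 0.2), (0.1, 0.4)]  # hypothetical (h, w) pairs

# Expanded form: one corner tuple for every (center, shape) combination.
corners = [center_to_corner(cx, cy, h, w)
           for (cx, cy) in centers
           for (h, w) in shapes]

compact_count = 2 * len(centers) + 2 * len(shapes)
expanded_count = 4 * len(corners)
print(compact_count, expanded_count)  # 12 32
```

The saving grows with the number of shapes per position, but for the few thousand anchors SSD uses, both forms are tiny compared with the feature maps themselves.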

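Learning target processing

I find this the most tedious part. At this point we already have: the set of anchor boxes (in practice, the n*4 coordinate values of all anchors), the image's labels, and the image's ground-truth box coordinates (number of labels * 4). What we need is to relate the anchor boxes to the image's ground-truth annotations and obtain:
The class of each anchor box (an anchor takes the class of the ground-truth box with which it has the largest IoU, which means a large number of anchors end up as class 0).
The regression target for each anchor's coordinates (found the same way; unmatched entries are also 0).
A negative-class mask. Each image usually has only a few annotated boxes, but SSD generates a large number of anchor boxes. Many of them will not cover any object of interest, that is, their IoU with every ground-truth box is below some threshold. This produces a large number of negative anchors, in other words anchors labeled 0. Two things need to be considered for these anchors:
1. The box regression loss should not include negative anchors, because they have no corresponding ground-truth box.
2. Because negative anchors may far outnumber the others, we can keep only some of them, namely those that the current prediction is least confident are negative: sort by the predicted score for class 0 and keep the hard negatives with the smallest values.
So a mask is needed to suppress part of the computed loss.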
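The matching and masking steps described above can be sketched in plain Python as follows. This is a simplified illustration, not the projects' code: boxes are corner tuples, the regression target is the matched box itself rather than encoded offsets, and the function names are hypothetical. The default threshold of 0.5 and negative ratio of 3 mirror the `match_threshold` and `negative_ratio` arguments in the TensorFlow training function.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_anchors(anchors, gt_boxes, gt_labels, threshold=0.5):
    """Per-anchor class and box targets; class 0 means background."""
    cls_target, box_target = [], []
    for anchor in anchors:
        overlaps = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] >= threshold:
            cls_target.append(gt_labels[best])
            box_target.append(gt_boxes[best])
        else:
            cls_target.append(0)
            box_target.append((0.0, 0.0, 0.0, 0.0))
    return cls_target, box_target


def hard_negative_mask(cls_target, background_scores, negative_ratio=3):
    """Keep every positive plus the negatives with the lowest class-0 score."""
    positives = [i for i, c in enumerate(cls_target) if c > 0]
    negatives = [i for i, c in enumerate(cls_target) if c == 0]
    keep = negative_ratio * len(positives)
    hardest = sorted(negatives, key=lambda i: background_scores[i])[:keep]
    mask = [0] * len(cls_target)
    for i in positives + hardest:
        mask[i] = 1
    return mask


anchors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0),
           (0.0, 0.5, 0.5, 1.0), (0.5, 0.0, 1.0, 0.5)]
cls_target, box_target = match_anchors(
    anchors, gt_boxes=[(0.05, 0.05, 0.5, 0.5)], gt_labels=[7])
print(cls_target)  # [7, 0, 0, 0]

# Predicted probability of class 0 for each anchor (made-up values).
background_scores = [0.1, 0.9, 0.2, 0.6]
print(hard_negative_mask(cls_target, background_scores, negative_ratio=1))
# [1, 0, 1, 0]
```

In the example only the first anchor overlaps the ground-truth box enough to become positive, and with a ratio of 1 the single negative that is kept is the one whose background score is lowest.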

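Loss function

There is not much to say here: implement it as the formula specifies. The key point, again, is applying the mask computed in the previous step to the loss values.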
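A minimal sketch of how the masks enter the loss is shown below. It assumes the form used in the SSD paper, (L_conf + alpha * L_loc) / N with N the number of positive anchors, a smooth L1 box loss and a cross-entropy class loss; the function names and the flat list layout are illustrative only, and the projects' actual implementations may differ.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def masked_box_loss(box_preds, box_targets, box_mask):
    """Sum smooth L1 over the coordinates whose mask is 1 (positive anchors)."""
    return sum(m * smooth_l1(p - t)
               for p, t, m in zip(box_preds, box_targets, box_mask))


def masked_cls_loss(class_probs, cls_target, cls_mask):
    """Cross-entropy summed over the anchors kept by the mask."""
    return sum(-m * math.log(probs[c])
               for probs, c, m in zip(class_probs, cls_target, cls_mask))


def ssd_loss(class_probs, cls_target, cls_mask,
             box_preds, box_targets, box_mask, alpha=1.0):
    """(L_conf + alpha * L_loc) / N, N being the number of positive anchors."""
    num_pos = max(sum(1 for c in cls_target if c > 0), 1)
    l_conf = masked_cls_loss(class_probs, cls_target, cls_mask)
    l_loc = masked_box_loss(box_preds, box_targets, box_mask)
    return (l_conf + alpha * l_loc) / num_pos


# Two anchors, background plus one object class (made-up numbers).
class_probs = [[0.2, 0.8], [0.9, 0.1]]
cls_target = [1, 0]
cls_mask = [1, 1]
box_preds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.5, 0.5]
box_targets = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
box_mask = [1, 1, 1, 1, 0, 0, 0, 0]  # second anchor is background
print(ssd_loss(class_probs, cls_target, cls_mask,
               box_preds, box_targets, box_mask))
```

Because the box mask is zero for the second anchor, its coordinate predictions contribute nothing to the loss, whatever their values.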

MXNet training main function

import time

import mxnet as mx
from mxnet import nd

# Project modules from SSD_Realization_MXNet (import paths assumed).
import ssd_mx
import util_mx
# get_iterators, the data-loading helper, is also defined in the project.


if __name__ == '__main__':
    batch_size = 4
    ctx = mx.cpu(0)
    # ctx = mx.gpu(0)
    # box_metric = mx.MAE()
    cls_metric = mx.metric.Accuracy()
    ssd = ssd_mx.SSDNet()
    ssd.initialize(ctx=ctx)  # mx.init.Xavier(magnitude=2)

    cls_loss = util_mx.FocalLoss()
    box_loss = util_mx.SmoothL1Loss()

    trainer = mx.gluon.Trainer(ssd.collect_params(),
                               'sgd', {'learning_rate': 0.01, 'wd': 5e-4})

    data = get_iterators(data_shape=304, batch_size=batch_size)
    for epoch in range(30):
        # reset data iterators and metrics
        data.reset()
        cls_metric.reset()
        # box_metric.reset()
        tic = time.time()
        for i, batch in enumerate(data):
            start_time = time.time()
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)
            # Replace the -1 placeholders with background label 0;
            # the corresponding box is recorded as [0, 0, 0, 0].
            y = nd.where(y < 0, nd.zeros_like(y), y)
            with mx.autograd.record():
                # anchors: anchor box coordinates, [1, n, 4]
                # class_preds: class predictions per image and anchor, [bs, n, num_cls + 1]
                # box_preds: box coordinate predictions per image, [bs, n * 4]
                anchors, class_preds, box_preds = ssd(x, True)
                # box_target: regression targets for the boxes, [bs, n * 4]
                # box_mask: masks out the unwanted background boxes, [bs, n * 4]
                # cls_target: ground-truth class of every anchor, [bs, n]
                box_target, box_mask, cls_target = ssd_mx.training_targets(anchors, class_preds, y)

                loss1 = cls_loss(class_preds, cls_target)
                loss2 = box_loss(box_preds, box_target, box_mask)
                loss = loss1 + loss2
            loss.backward()
            trainer.step(batch_size)
            if i % 1 == 0:
                duration = time.time() - start_time
                examples_per_sec = batch_size / duration
                sec_per_batch = float(duration)
                format_str = "[*] step %d,  loss=%.2f (%.1f examples/sec; %.3f sec/batch)"
                print(format_str % (i, nd.sum(loss).asscalar(),
                                    examples_per_sec, sec_per_batch))
            if i % 500 == 0:
                ssd.model.save_parameters('model_mx_{}.params'.format(epoch))
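The MXNet script uses `util_mx.FocalLoss` for classification and passes a mask only to the box loss. The implementation of that class is not shown here, so the following is only a sketch of the standard focal loss formula, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), with the commonly used defaults alpha = 0.25 and gamma = 2; the project's version may use different parameters or details.

```python
import math


def cross_entropy(p_t):
    """Plain cross-entropy for the probability assigned to the true class."""
    return -math.log(p_t)


def focal_loss(p_t, alpha=0.25, gamma=2.0):
    """Focal loss: cross-entropy scaled down for well-classified examples."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy example (p_t = 0.9) is down-weighted far more than a hard one (p_t = 0.1).
for p_t in (0.9, 0.5, 0.1):
    print(p_t, cross_entropy(p_t), focal_loss(p_t))
```

Since the factor (1 - p_t)^gamma shrinks the contribution of the many easy background anchors, focal loss can play a role similar to the hard negative mask for the classification term.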

TensorFlow training main function

import tensorflow as tf  # TensorFlow 1.x API

slim = tf.contrib.slim
# SSDNet, tfr_data_process, preprocess_img_tf and util_tf come from the
# SSD_Realization_TensorFlow project.


def main():
    max_steps = 1500
    batch_size = 32
    adam_beta1 = 0.9
    adam_beta2 = 0.999
    opt_epsilon = 1.0
    num_epochs_per_decay = 2.0
    num_samples_per_epoch = 17125
    moving_average_decay = None

    tf.logging.set_verbosity(tf.logging.DEBUG)
    with tf.Graph().as_default():
        # Create global_step.
        with tf.device("/device:CPU:0"):
            global_step = tf.train.create_global_step()

        ssd = SSDNet()
        ssd_anchors = ssd.anchors

        # Parsing the TFRecords on the GPU gives a speed-up, but the effect is unstable
        dataset = \
            tfr_data_process.get_split('./TFR_Data',
                                       'voc2012_*.tfrecord',
                                       num_classes=21,
                                       num_samples=num_samples_per_epoch)

        with tf.device("/device:CPU:0"):  # queue ops are only supported on the CPU
            image, glabels, gbboxes = \
                tfr_data_process.tfr_read(dataset)

            image, glabels, gbboxes = \
                preprocess_img_tf.preprocess_image(image, glabels, gbboxes, out_shape=(300, 300))

            gclasses, glocalisations, gscores = \
                ssd.bboxes_encode(glabels, gbboxes, ssd_anchors)

            batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, n_layers, n_layers, n_layers)

            # Training batches and queue.
            r = tf.train.batch(  # image, anchor classes, ground-truth box coordinates, scores
                util_tf.reshape_list([image, gclasses, glocalisations, gscores]),
                batch_size=batch_size,
                num_threads=4,
                capacity=5 * batch_size)
            batch_queue = slim.prefetch_queue.prefetch_queue(
                r,  # <----- the input format does not actually need adjusting
                capacity=2 * 1)

        # Dequeue batch.
        b_image, b_gclasses, b_glocalisations, b_gscores = \
            util_tf.reshape_list(batch_queue.dequeue(), batch_shape)  # regroup the flat list

        predictions, localisations, logits, end_points = \
            ssd.net(b_image, is_training=True, weight_decay=0.00004)

        ssd.losses(logits, localisations,
                   b_gclasses, b_glocalisations, b_gscores,
                   match_threshold=.5,
                   negative_ratio=3,
                   alpha=1,
                   label_smoothing=.0)

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        # =================================================================== #
        # Configure the moving averages.
        # =================================================================== #
        if moving_average_decay:
            moving_average_variables = slim.get_model_variables()
            variable_averages = tf.train.ExponentialMovingAverage(
                moving_average_decay, global_step)
        else:
            moving_average_variables, variable_averages = None, None

        # =================================================================== #
        # Configure the optimization procedure.
        # =================================================================== #
        with tf.device("/device:CPU:0"):  # the learning_rate node is placed on the CPU (reason unclear)
            decay_steps = int(num_samples_per_epoch / batch_size * num_epochs_per_decay)
            learning_rate = tf.train.exponential_decay(0.01,
                                                       global_step,
                                                       decay_steps,
                                                       0.94,  # learning_rate_decay_factor,
                                                       staircase=True,
                                                       name='exponential_decay_learning_rate')
            optimizer = tf.train.AdamOptimizer(
                learning_rate,
                beta1=adam_beta1,
                beta2=adam_beta2,
                epsilon=opt_epsilon)
            tf.summary.scalar('learning_rate', learning_rate)

        if moving_average_decay:
            # Update ops executed locally by trainer.
            update_ops.append(variable_averages.apply(moving_average_variables))

        # Variables to train.
        trainable_scopes = None
        if trainable_scopes is None:
            variables_to_train = tf.trainable_variables()
        else:
            scopes = [scope.strip() for scope in trainable_scopes.split(',')]
            variables_to_train = []
            for scope in scopes:
                variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
                variables_to_train.extend(variables)

        losses = tf.get_collection(tf.GraphKeys.LOSSES)
        regularization_losses = tf.get_collection(
            tf.GraphKeys.REGULARIZATION_LOSSES)
        regularization_loss = tf.add_n(regularization_losses)
        loss = tf.add_n(losses)
        tf.summary.scalar("loss", loss)
        tf.summary.scalar("regularization_loss", regularization_loss)

        grad = optimizer.compute_gradients(loss, var_list=variables_to_train)
        grad_updates = optimizer.apply_gradients(grad,
                                                 global_step=global_step)
        update_ops.append(grad_updates)
        # update_op = tf.group(*update_ops)

        with tf.control_dependencies(update_ops):
            total_loss = tf.add_n([loss, regularization_loss])
        tf.summary.scalar("total_loss", total_loss)

        # =================================================================== #
        # Kicks off the training.
        # =================================================================== #
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
        config = tf.ConfigProto(log_device_placement=False,
                                gpu_options=gpu_options)
        saver = tf.train.Saver(max_to_keep=5,
                               keep_checkpoint_every_n_hours=1.0,
                               write_version=2,
                               pad_step_number=False)

        if True:
            import os
            import time

            print('start......')
            model_path = './logs'
            batch_size = batch_size
            with tf.Session(config=config) as sess:
                summary = tf.summary.merge_all()
                coord = tf.train.Coordinator()
                threads = tf.train.start_queue_runners(sess=sess, coord=coord)
                writer = tf.summary.FileWriter(model_path, sess.graph)

                init_op = tf.group(tf.global_variables_initializer(),
                                   tf.local_variables_initializer())
                init_op.run()
                for step in range(max_steps):
                    start_time = time.time()
                    loss_value = sess.run(total_loss)
                    # loss_value, summary_str = sess.run([train_tensor, summary_op])
                    # writer.add_summary(summary_str, step)

                    duration = time.time() - start_time
                    if step % 10 == 0:
                        summary_str = sess.run(summary)
                        writer.add_summary(summary_str, step)

                        examples_per_sec = batch_size / duration
                        sec_per_batch = float(duration)
                        format_str = "[*] step %d,  loss=%.2f (%.1f examples/sec; %.3f sec/batch)"
                        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))
                    # if step % 100 == 0:
                    #     accuracy_step = test_cifar10(sess, training=False)
                    #     acc.append('{:.3f}'.format(accuracy_step))
                    #     print(acc)
                    if step % 500 == 0 and step != 0:
                        saver.save(sess, os.path.join(model_path, "ssd_tf.model"), global_step=step)

                coord.request_stop()
                coord.join(threads)


if __name__ == '__main__':
    main()
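To see what the learning-rate settings in the TensorFlow function amount to, the sketch below reproduces the staircase form of exponential decay, lr = base_lr * decay_rate ^ floor(step / decay_steps), in plain Python using the constants from `main()`. The helper names are hypothetical, and the code only mirrors the formula; it does not call TensorFlow.

```python
def decay_steps(num_samples_per_epoch=17125, batch_size=32,
                num_epochs_per_decay=2.0):
    """Number of training steps between two learning-rate drops."""
    return int(num_samples_per_epoch / batch_size * num_epochs_per_decay)


def staircase_lr(step, base_lr=0.01, decay_rate=0.94, steps_per_decay=None):
    """Learning rate at a given global step under staircase exponential decay."""
    if steps_per_decay is None:
        steps_per_decay = decay_steps()
    return base_lr * decay_rate ** (step // steps_per_decay)


print(decay_steps())  # 1070
for step in (0, 1069, 1070, 1499):
    print(step, staircase_lr(step))
```

With `max_steps = 1500` and a drop every 1070 steps, the rate is reduced only once during this run, at step 1070.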
