OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), and Video Instance Segmentation (VIS) within a unified framework.

Overview

English | 简体中文

Documentation: https://mmtracking.readthedocs.io/

Introduction

MMTracking is an open source video perception toolbox based on PyTorch. It is a part of the OpenMMLab project.

The master branch works with PyTorch 1.3+.

Major features

  • The First Unified Video Perception Platform

    We are the first open source toolbox that unifies versatile video perception tasks, including video object detection, multiple object tracking, single object tracking, and video instance segmentation.

  • Modular Design

    We decompose the video perception framework into different components, so one can easily construct a customized method by combining different modules.

  • Simple, Fast and Strong

    Simple: MMTracking interacts with other OpenMMLab projects. It is built upon MMDetection, so any detector can be leveraged simply by modifying the configs (see the config sketch after this list).

    Fast: All operations run on GPUs. The training and inference speeds are faster than or comparable to other implementations.

    Strong: We reproduce state-of-the-art models and some of them even outperform the official implementations.
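
To make the config-driven workflow concrete, here is a minimal sketch of how the underlying detector can be swapped. It assumes a SELSA VID config similar to those shipped with MMTracking; the base file name below is illustrative, and only the detector's backbone is overridden while everything else is inherited.

# Hypothetical MMTracking-style config: inherit an assumed ResNet-50 SELSA
# config and override only the backbone of the underlying MMDetection detector.
_base_ = ['./selsa_faster_rcnn_r50_dc5_1x_imagenetvid.py']

model = dict(
    detector=dict(
        backbone=dict(
            depth=101,  # swap ResNet-50 for ResNet-101
            init_cfg=dict(
                type='Pretrained', checkpoint='torchvision://resnet101'))))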

License

This project is released under the Apache 2.0 license.

Changelog

v0.8.0 was released on 03/10/2021. Please refer to changelog.md for details and release history.

Benchmark and model zoo

Results and models are available in the model zoo.

Supported methods of video object detection:

Supported methods of multiple object tracking:

Supported methods of single object tracking:

Supported methods of video instance segmentation:

Installation

Please refer to install.md for install instructions.

Getting Started

Please see dataset.md and quick_run.md for the basic usage of MMTracking. We also provide usage tutorials, such as learning about configs, detailed descriptions of the VID, MOT, and SOT configs, customizing datasets, customizing data pipelines, customizing VID, MOT, and SOT models, customizing runtime settings, and useful tools.
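
Before diving into the tutorials, the following is a minimal MOT inference sketch. It assumes the high-level helpers init_model and inference_mot from mmtrack.apis, as used by the demo scripts; the config, checkpoint, and video paths are placeholders and may differ across releases.

import mmcv
from mmtrack.apis import inference_mot, init_model

# Placeholder config; any MOT config from the model zoo can be used.
# checkpoint=None is only for brevity; pass a downloaded checkpoint for
# meaningful results.
config_file = 'configs/mot/deepsort/sort_faster-rcnn_fpn_4e_mot17-private.py'
model = init_model(config_file, checkpoint=None, device='cuda:0')

video = mmcv.VideoReader('demo/demo.mp4')
for frame_id, img in enumerate(video):
    # result holds the tracked boxes (with identities) for this frame.
    result = inference_mot(model, img, frame_id=frame_id)
    model.show_result(img, result, out_file=f'outputs/{frame_id:06d}.jpg')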

Contributing

We appreciate all contributions to improve MMTracking. Please refer to CONTRIBUTING.md for the contributing guideline.

Acknowledgement

MMTracking is an open source project that welcomes any contribution and feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible and standardized toolkit to reimplement existing methods and develop new video perception methods.

Citation

If you find this project useful in your research, please consider citing:

@misc{mmtrack2020,
    title={{MMTracking: OpenMMLab} video perception toolbox and benchmark},
    author={MMTracking Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmtracking}},
    year={2020}
}

Projects in OpenMMLab

  • MMCV: OpenMMLab foundational library for computer vision.
  • MIM: MIM Installs OpenMMLab Packages.
  • MMClassification: OpenMMLab image classification toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMEditing: OpenMMLab image and video editing toolbox.
  • MMOCR: OpenMMLab text detection, recognition and understanding toolbox.
  • MMGeneration: OpenMMLab Generative Model toolbox and benchmark.
  • MMFlow: OpenMMLab optical flow toolbox and benchmark.
Comments
  • Parameter and variable settings in the DFF model

    In "/mmtracking/mmtrack/models/motion/flownet_simple.py," the init parameters "flow_img_norm_std=[255.0, 255.0, 255.0]" and "flow_img_norm_mean=[0.411, 0.432, 0.450]" . What's the meaning of these parameters? I'm using a type of data with 10 channels, how should I set these parameters?

    Also, in "prepare_imgs" method, "img_metas[0]['img_norm_cfg']['mean']" and "img_metas[0]['img_norm_cfg']['std']" are both initialized with 0.Is it necessary to reassign the value while training or testing? If necessary, how and what value should I assign to these variables?

    opened by yan811 13
  • Training on the MOT dataset

    Hello everyone, and thank you in advance for your answers. I am new here, so forgive me if I cannot explain myself well. I am trying to train on the MOT dataset, but I have a problem with PyTorch: torch.distributed.launch is giving me an error, and I need to switch to torchrun (transitioning from torch.distributed.launch to torchrun), but I could not figure out how to modify the train.py script. Can you please help me with that? Thanks again.
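
    For reference, a minimal sketch of the usual adjustment when moving from torch.distributed.launch to torchrun: torchrun passes the local rank through the LOCAL_RANK environment variable instead of injecting a --local_rank argument, so the script can fall back to the environment. This shows the general pattern, not a guaranteed copy of tools/train.py.

    import argparse
    import os

    parser = argparse.ArgumentParser(description='Train a model')
    # torch.distributed.launch used to inject --local_rank; keep it for
    # backwards compatibility, but prefer the LOCAL_RANK env var set by torchrun.
    parser.add_argument('--local_rank', type=int, default=0)
    args, _ = parser.parse_known_args()

    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)

    local_rank = int(os.environ['LOCAL_RANK'])
    print(f'local rank: {local_rank}')

    The launch command would then look something like: torchrun --nproc_per_node=2 tools/train.py <CONFIG> --launcher pytorch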

    opened by mehmetcanmitil 9
  • When training a VID model with multiple GPUs, validation hangs on the last few images (no progress even after waiting an hour).

    My log is as follows:

    2021-12-26 16:04:19,060 - mmtrack - INFO - Environment info:

    sys.platform: linux Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] CUDA available: True GPU 0,1,2: GeForce RTX 2080 Ti CUDA_HOME: /usr/local/cuda NVCC: Cuda compilation tools, release 10.0, V10.0.130 GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 PyTorch: 1.5.0 PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 10.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
    • CuDNN 7.6.3
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

    TorchVision: 0.6.0a0+82fd1c8 OpenCV: 4.5.4 MMCV: 1.4.1 MMCV Compiler: GCC 7.3 MMCV CUDA Compiler: 10.1 MMTracking: 0.8.0+

    2021-12-26 16:04:19,061 - mmtrack - INFO - Distributed training: True 2021-12-26 16:04:19,761 - mmtrack - INFO - Config: model = dict( detector=dict( type='FasterRCNN', backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(3, ), strides=(1, 2, 2, 1), dilations=(1, 1, 1, 2), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, style='pytorch'), neck=dict( type='ChannelMapper', in_channels=[2048], out_channels=512, kernel_size=3), rpn_head=dict( type='RPNHead', in_channels=512, feat_channels=512, anchor_generator=dict( type='AnchorGenerator', scales=[4, 8, 16, 32], ratios=[0.5, 1.0, 2.0], strides=[16]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict( type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)), roi_head=dict( type='SelsaRoIHead', bbox_roi_extractor=dict( type='TemporalRoIAlign', roi_layer=dict( type='RoIAlign', output_size=7, sampling_ratio=2), out_channels=512, featmap_strides=[16], num_most_similar_points=2, num_temporal_attention_blocks=4), bbox_head=dict( type='SelsaBBoxHead', in_channels=512, fc_out_channels=1024, roi_feat_size=7, num_classes=30, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.2, 0.2, 0.2, 0.2]), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0), num_shared_fcs=3, aggregator=dict( type='SelsaAggregator', in_channels=1024, num_attention_blocks=16))), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, debug=False), rpn_proposal=dict( nms_pre=6000, max_per_img=600, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, debug=False)), test_cfg=dict( rpn=dict( nms_pre=6000, max_per_img=300, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.0001, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100))), type='SELSA') dataset_type = 'ImagenetVIDDataset' data_root = 'data/FALD_VID/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadMultiImagesFromFile'), dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True), dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True), dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5), dict( type='SeqNormalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='SeqPad', size_divisor=16), dict( type='VideoCollect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_instance_ids']), dict(type='ConcatVideoReferences'), dict(type='SeqDefaultFormatBundle', ref_prefix='ref') ] test_pipeline = [ dict(type='LoadMultiImagesFromFile'), dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True), dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.0), dict( type='SeqNormalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], 
to_rgb=True), dict(type='SeqPad', size_divisor=16), dict( type='VideoCollect', keys=['img'], meta_keys=('num_left_ref_imgs', 'frame_stride')), dict(type='ConcatVideoReferences'), dict(type='MultiImagesToTensor', ref_prefix='ref'), dict(type='ToList') ] data = dict( samples_per_gpu=1, workers_per_gpu=2, train=dict( type='ImagenetVIDDataset', ann_file= 'data/FALD_VID/COCOVIDannotations/imagenet_vid_train_every10frames.json', img_prefix='data/FALD_VID/Data/VID', ref_img_sampler=dict( num_ref_imgs=2, frame_range=9, filter_key_img=False, method='bilateral_uniform'), pipeline=[ dict(type='LoadMultiImagesFromFile'), dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True), dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True), dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5), dict( type='SeqNormalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='SeqPad', size_divisor=16), dict( type='VideoCollect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_instance_ids']), dict(type='ConcatVideoReferences'), dict(type='SeqDefaultFormatBundle', ref_prefix='ref') ]), val=dict( type='ImagenetVIDDataset', ann_file='data/FALD_VID/annotations/imagenet_vid_val.json', img_prefix='data/FALD_VID/Data/VID', ref_img_sampler=dict( num_ref_imgs=14, frame_range=[-7, 7], method='test_with_adaptive_stride'), pipeline=[ dict(type='LoadMultiImagesFromFile'), dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True), dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.0), dict( type='SeqNormalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='SeqPad', size_divisor=16), dict( type='VideoCollect', keys=['img'], meta_keys=('num_left_ref_imgs', 'frame_stride')), dict(type='ConcatVideoReferences'), dict(type='MultiImagesToTensor', ref_prefix='ref'), dict(type='ToList') ], test_mode=True), test=dict( type='ImagenetVIDDataset', ann_file='data/FALD_VID/annotations/imagenet_vid_val.json', img_prefix='data/FALD_VID/Data/VID', ref_img_sampler=dict( num_ref_imgs=14, frame_range=[-7, 7], method='test_with_adaptive_stride'), pipeline=[ dict(type='LoadMultiImagesFromFile'), dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True), dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.0), dict( type='SeqNormalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='SeqPad', size_divisor=16), dict( type='VideoCollect', keys=['img'], meta_keys=('num_left_ref_imgs', 'frame_stride')), dict(type='ConcatVideoReferences'), dict(type='MultiImagesToTensor', ref_prefix='ref'), dict(type='ToList') ], test_mode=True)) optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.3333333333333333, step=[2, 5]) total_epochs = 4 evaluation = dict(metric=['bbox'], interval=4) work_dir = './work_dirs/20211226_001_try3/' gpu_ids = range(0, 1)

    2021-12-26 16:04:24,438 - mmtrack - INFO - Set random seed to 2034425034, deterministic: False 2021-12-26 16:04:25,201 - mmtrack - INFO - initialize ResNet with init_cfg [{'type': 'Kaiming', 'layer': 'Conv2d'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}] 2021-12-26 16:04:25,466 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,467 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,468 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,470 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,471 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,472 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,473 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,475 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,477 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,479 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,481 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,482 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,484 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,490 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,496 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,500 - mmtrack - INFO - initialize Bottleneck with init_cfg {'type': 'Constant', 'val': 0, 'override': {'name': 'norm3'}} 2021-12-26 16:04:25,523 - mmtrack - INFO - initialize ChannelMapper with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'} 2021-12-26 16:04:25,583 - mmtrack - INFO - initialize RPNHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01} 2021-12-26 16:04:25,637 - mmtrack - INFO - initialize SelsaBBoxHead with init_cfg [{'type': 'Normal', 'std': 0.01, 'override': {'name': 'fc_cls'}}, {'type': 'Normal', 'std': 0.001, 'override': {'name': 'fc_reg'}}, {'type': 'Xavier', 'distribution': 'uniform', 'override': [{'name': 'shared_fcs'}, {'name': 'cls_fcs'}, {'name': 'reg_fcs'}]}] Name of parameter - Initialization information

    detector.backbone.conv1.weight - torch.Size([64, 3, 7, 7]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.bn1.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.bn1.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.conv1.weight - torch.Size([64, 64, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.0.bn1.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.bn1.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.conv2.weight - torch.Size([64, 64, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.0.bn2.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.bn2.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.conv3.weight - torch.Size([256, 64, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.0.bn3.weight - torch.Size([256]): ConstantInit: val=0, bias=0

    detector.backbone.layer1.0.bn3.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.downsample.0.weight - torch.Size([256, 64, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.0.downsample.1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.0.downsample.1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.1.conv1.weight - torch.Size([64, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.1.bn1.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.1.bn1.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.1.conv2.weight - torch.Size([64, 64, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.1.bn2.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.1.bn2.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.1.conv3.weight - torch.Size([256, 64, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.1.bn3.weight - torch.Size([256]): ConstantInit: val=0, bias=0

    detector.backbone.layer1.1.bn3.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.2.conv1.weight - torch.Size([64, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.2.bn1.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.2.bn1.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.2.conv2.weight - torch.Size([64, 64, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.2.bn2.weight - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.2.bn2.bias - torch.Size([64]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer1.2.conv3.weight - torch.Size([256, 64, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer1.2.bn3.weight - torch.Size([256]): ConstantInit: val=0, bias=0

    detector.backbone.layer1.2.bn3.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.conv1.weight - torch.Size([128, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.0.bn1.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.bn1.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.conv2.weight - torch.Size([128, 128, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.0.bn2.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.bn2.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.conv3.weight - torch.Size([512, 128, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.0.bn3.weight - torch.Size([512]): ConstantInit: val=0, bias=0

    detector.backbone.layer2.0.bn3.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.downsample.0.weight - torch.Size([512, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.0.downsample.1.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.0.downsample.1.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.1.conv1.weight - torch.Size([128, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.1.bn1.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.1.bn1.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.1.conv2.weight - torch.Size([128, 128, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.1.bn2.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.1.bn2.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.1.conv3.weight - torch.Size([512, 128, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.1.bn3.weight - torch.Size([512]): ConstantInit: val=0, bias=0

    detector.backbone.layer2.1.bn3.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.2.conv1.weight - torch.Size([128, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.2.bn1.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.2.bn1.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.2.conv2.weight - torch.Size([128, 128, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.2.bn2.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.2.bn2.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.2.conv3.weight - torch.Size([512, 128, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.2.bn3.weight - torch.Size([512]): ConstantInit: val=0, bias=0

    detector.backbone.layer2.2.bn3.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.3.conv1.weight - torch.Size([128, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.3.bn1.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.3.bn1.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.3.conv2.weight - torch.Size([128, 128, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.3.bn2.weight - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.3.bn2.bias - torch.Size([128]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer2.3.conv3.weight - torch.Size([512, 128, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer2.3.bn3.weight - torch.Size([512]): ConstantInit: val=0, bias=0

    detector.backbone.layer2.3.bn3.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.conv1.weight - torch.Size([256, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.0.bn1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.bn1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.conv2.weight - torch.Size([256, 256, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.0.bn2.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.bn2.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.conv3.weight - torch.Size([1024, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.0.bn3.weight - torch.Size([1024]): ConstantInit: val=0, bias=0

    detector.backbone.layer3.0.bn3.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.downsample.0.weight - torch.Size([1024, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.0.downsample.1.weight - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.0.downsample.1.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.1.conv1.weight - torch.Size([256, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.1.bn1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.1.bn1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.1.conv2.weight - torch.Size([256, 256, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.1.bn2.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.1.bn2.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.1.conv3.weight - torch.Size([1024, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.1.bn3.weight - torch.Size([1024]): ConstantInit: val=0, bias=0

    detector.backbone.layer3.1.bn3.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.2.conv1.weight - torch.Size([256, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.2.bn1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.2.bn1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.2.conv2.weight - torch.Size([256, 256, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.2.bn2.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.2.bn2.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.2.conv3.weight - torch.Size([1024, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.2.bn3.weight - torch.Size([1024]): ConstantInit: val=0, bias=0

    detector.backbone.layer3.2.bn3.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.3.conv1.weight - torch.Size([256, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.3.bn1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.3.bn1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.3.conv2.weight - torch.Size([256, 256, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.3.bn2.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.3.bn2.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.3.conv3.weight - torch.Size([1024, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.3.bn3.weight - torch.Size([1024]): ConstantInit: val=0, bias=0

    detector.backbone.layer3.3.bn3.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.4.conv1.weight - torch.Size([256, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.4.bn1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.4.bn1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.4.conv2.weight - torch.Size([256, 256, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.4.bn2.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.4.bn2.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.4.conv3.weight - torch.Size([1024, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.4.bn3.weight - torch.Size([1024]): ConstantInit: val=0, bias=0

    detector.backbone.layer3.4.bn3.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.5.conv1.weight - torch.Size([256, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.5.bn1.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.5.bn1.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.5.conv2.weight - torch.Size([256, 256, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.5.bn2.weight - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.5.bn2.bias - torch.Size([256]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer3.5.conv3.weight - torch.Size([1024, 256, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer3.5.bn3.weight - torch.Size([1024]): ConstantInit: val=0, bias=0

    detector.backbone.layer3.5.bn3.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.conv1.weight - torch.Size([512, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.0.bn1.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.bn1.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.conv2.weight - torch.Size([512, 512, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.0.bn2.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.bn2.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.conv3.weight - torch.Size([2048, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.0.bn3.weight - torch.Size([2048]): ConstantInit: val=0, bias=0

    detector.backbone.layer4.0.bn3.bias - torch.Size([2048]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.downsample.0.weight - torch.Size([2048, 1024, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.0.downsample.1.weight - torch.Size([2048]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.0.downsample.1.bias - torch.Size([2048]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.1.conv1.weight - torch.Size([512, 2048, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.1.bn1.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.1.bn1.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.1.conv2.weight - torch.Size([512, 512, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.1.bn2.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.1.bn2.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.1.conv3.weight - torch.Size([2048, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.1.bn3.weight - torch.Size([2048]): ConstantInit: val=0, bias=0

    detector.backbone.layer4.1.bn3.bias - torch.Size([2048]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.2.conv1.weight - torch.Size([512, 2048, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.2.bn1.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.2.bn1.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.2.conv2.weight - torch.Size([512, 512, 3, 3]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.2.bn2.weight - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.2.bn2.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.backbone.layer4.2.conv3.weight - torch.Size([2048, 512, 1, 1]): KaimingInit: a=0, mode=fan_out, nonlinearity=relu, distribution =normal, bias=0

    detector.backbone.layer4.2.bn3.weight - torch.Size([2048]): ConstantInit: val=0, bias=0

    detector.backbone.layer4.2.bn3.bias - torch.Size([2048]): The value is the same before and after calling init_weights of SELSA

    detector.neck.convs.0.conv.weight - torch.Size([512, 2048, 3, 3]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.neck.convs.0.conv.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.rpn_head.rpn_conv.weight - torch.Size([512, 512, 3, 3]): NormalInit: mean=0, std=0.01, bias=0

    detector.rpn_head.rpn_conv.bias - torch.Size([512]): NormalInit: mean=0, std=0.01, bias=0

    detector.rpn_head.rpn_cls.weight - torch.Size([12, 512, 1, 1]): NormalInit: mean=0, std=0.01, bias=0

    detector.rpn_head.rpn_cls.bias - torch.Size([12]): NormalInit: mean=0, std=0.01, bias=0

    detector.rpn_head.rpn_reg.weight - torch.Size([48, 512, 1, 1]): NormalInit: mean=0, std=0.01, bias=0

    detector.rpn_head.rpn_reg.bias - torch.Size([48]): NormalInit: mean=0, std=0.01, bias=0

    detector.roi_head.bbox_roi_extractor.embed_network.conv.weight - torch.Size([512, 512, 3, 3]): Initialized by user-defined init_weights in ConvModule

    detector.roi_head.bbox_roi_extractor.embed_network.conv.bias - torch.Size([512]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.fc_cls.weight - torch.Size([31, 1024]): NormalInit: mean=0, std=0.01, bias=0

    detector.roi_head.bbox_head.fc_cls.bias - torch.Size([31]): NormalInit: mean=0, std=0.01, bias=0

    detector.roi_head.bbox_head.fc_reg.weight - torch.Size([120, 1024]): NormalInit: mean=0, std=0.001, bias=0

    detector.roi_head.bbox_head.fc_reg.bias - torch.Size([120]): NormalInit: mean=0, std=0.001, bias=0

    detector.roi_head.bbox_head.shared_fcs.0.weight - torch.Size([1024, 25088]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.roi_head.bbox_head.shared_fcs.0.bias - torch.Size([1024]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.roi_head.bbox_head.shared_fcs.1.weight - torch.Size([1024, 1024]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.roi_head.bbox_head.shared_fcs.1.bias - torch.Size([1024]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.roi_head.bbox_head.shared_fcs.2.weight - torch.Size([1024, 1024]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.roi_head.bbox_head.shared_fcs.2.bias - torch.Size([1024]): XavierInit: gain=1, distribution=uniform, bias=0

    detector.roi_head.bbox_head.aggregator.0.fc_embed.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.fc_embed.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.ref_fc_embed.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.ref_fc_embed.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.fc.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.fc.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.ref_fc.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.0.ref_fc.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.fc_embed.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.fc_embed.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.ref_fc_embed.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.ref_fc_embed.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.fc.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.fc.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.ref_fc.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.1.ref_fc.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.fc_embed.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.fc_embed.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.ref_fc_embed.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.ref_fc_embed.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.fc.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.fc.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.ref_fc.weight - torch.Size([1024, 1024]): The value is the same before and after calling init_weights of SELSA

    detector.roi_head.bbox_head.aggregator.2.ref_fc.bias - torch.Size([1024]): The value is the same before and after calling init_weights of SELSA
    2021-12-26 16:04:28,460 - mmtrack - INFO - Start running, host: [email protected], work_dir: /data/yangjiahui/VIDProject/mmtracking/work_dirs/20211226_001_try3 2021-12-26 16:04:28,461 - mmtrack - INFO - Hooks will be executed in the following order: before_run: (VERY_HIGH ) StepLrUpdaterHook
    (NORMAL ) CheckpointHook
    (NORMAL ) DistEvalHook
    (VERY_LOW ) TextLoggerHook

    before_train_epoch: (VERY_HIGH ) StepLrUpdaterHook
    (NORMAL ) DistSamplerSeedHook
    (NORMAL ) DistEvalHook
    (LOW ) IterTimerHook
    (VERY_LOW ) TextLoggerHook

    before_train_iter: (VERY_HIGH ) StepLrUpdaterHook
    (NORMAL ) DistEvalHook
    (LOW ) IterTimerHook

    after_train_iter: (ABOVE_NORMAL) OptimizerHook
    (NORMAL ) CheckpointHook
    (NORMAL ) DistEvalHook
    (LOW ) IterTimerHook
    (VERY_LOW ) TextLoggerHook

    after_train_epoch: (NORMAL ) CheckpointHook
    (NORMAL ) DistEvalHook
    (VERY_LOW ) TextLoggerHook

    before_val_epoch: (NORMAL ) DistSamplerSeedHook
    (LOW ) IterTimerHook
    (VERY_LOW ) TextLoggerHook

    before_val_iter: (LOW ) IterTimerHook

    after_val_iter: (LOW ) IterTimerHook

    after_val_epoch: (VERY_LOW ) TextLoggerHook

    after_run: (VERY_LOW ) TextLoggerHook

    2021-12-26 16:04:28,461 - mmtrack - INFO - workflow: [('train', 1)], max: 4 epochs 2021-12-26 16:04:28,461 - mmtrack - INFO - Checkpoints will be saved to /data/yangjiahui/VIDProject/mmtracking/work_dirs/20211226_001_try3 by HardDiskBackend. 2021-12-26 16:05:00,501 - mmtrack - INFO - Saving checkpoint at 1 epochs 2021-12-26 16:05:32,658 - mmtrack - INFO - Saving checkpoint at 2 epochs 2021-12-26 16:06:04,769 - mmtrack - INFO - Saving checkpoint at 3 epochs 2021-12-26 16:06:37,068 - mmtrack - INFO - Saving checkpoint at 4 epochs

    opened by FarranYang 9
  • Many errors when training the ReID model of Tracktor on MOT17.

    I successfully ran the official Tracktor repo, but I cannot run this one. I use the same command (the default ReID training command), yet I get different errors on different days.

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. The bug has not been fixed in the latest version.

    Describe the bug: A clear and concise description of what the bug is.

    Reproduction

    1. What command or script did you run?
    A placeholder for the command.
    
    2. Did you make any modifications to the code or config? Do you understand what you have modified?
    3. What dataset did you use and what task did you run?

    Environment

    1. Please run python mmtrack/utils/collect_env.py to collect necessary environment information and paste it here.
    2. You may add additional information that may be helpful for locating the problem, such as
      • How you installed PyTorch [e.g., pip, conda, source]
      • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

    Error traceback: If applicable, paste the error traceback here.

    A placeholder for the traceback.
    

    Bug fix: If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

    opened by sjtuytc 9
  • Person ID changes when viewed from a different camera using Tracktor

    I was testing the Tracktor model on some videos and found that when the camera changes, the ID assigned to the tracked object also changes. For example, in the last few seconds of the inference video below (obtained by running the Tracktor code), the runners are assigned new IDs. What could be the possible reason? Is it because the same person is being viewed from a different camera angle, or do I need to retrain the re-ID model? The dataset used was MOT20, and the configuration was tracktor_faster-rcnn_r50_fpn_8e_mot20-public-half.

    Input video: https://drive.google.com/file/d/1IVxcL3a5jUH3huJuyVzgDepIpBE62H3F/view?usp=sharing
    Inference video: https://drive.google.com/file/d/1Rcl3nrdTQznyPO4GQLm7_juYzsSZtLK4/view?usp=sharing

    opened by sparshgarg23 8
  • ReID training

    Thanks for your error report and we appreciate it a lot.

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. The bug has not been fixed in the latest version.

    Describe the bug: A clear and concise description of what the bug is.

    Reproduction

    1. What command or script did you run?
    python3 ./tools/train.py configs/reid/resnet50_b32x8_MOT17.py --work-dir work_dirs/resnet50_b32x8_MOT17
    
    2. I did not make any modifications to the code except the dataset path
    3. I am running ReID training on the MOT dataset

    Environment

    1. Please run python mmtrack/utils/collect_env.py to collect necessary environment information and paste it here. sys.platform: linux Python: 3.8.11 (default, Jul 3 2021, 17:53:42) [GCC 7.5.0] CUDA available: True GPU 0: TITAN Xp CUDA_HOME: None GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 PyTorch: 1.7.1+cu101 PyTorch compiling details: PyTorch built with:
    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 10.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
    • CuDNN 7.6.3
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.8.2+cu101 OpenCV: 4.5.3 MMCV: 1.3.11 MMCV Compiler: GCC 7.3 MMCV CUDA Compiler: 10.1 MMTracking: 0.6.0+4d78b77

    2. You may add additional information that may be helpful for locating the problem, such as
      • How you installed PyTorch [e.g., pip, conda, source]
      • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

    Error traceback: If applicable, paste the error traceback here.

    sys.platform: linux
    Python: 3.8.11 (default, Jul  3 2021, 17:53:42) [GCC 7.5.0]
    CUDA available: True
    GPU 0: TITAN Xp
    CUDA_HOME: None
    GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    PyTorch: 1.7.1+cu101
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
      - CuDNN 7.6.3
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.8.2+cu101
    OpenCV: 4.5.3
    MMCV: 1.3.11
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 10.1
    MMTracking: 0.6.0+4d78b77
    ------------------------------------------------------------
    
    2021-08-17 11:24:25,348 - mmtrack - INFO - Distributed training: False
    2021-08-17 11:24:26,303 - mmtrack - INFO - Config:
    dataset_type = 'ReIDDataset'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadMultiImagesFromFile', to_float32=True),
        dict(
            type='SeqResize',
            img_scale=(128, 256),
            share_params=False,
            keep_ratio=False,
            bbox_clip_border=False,
            override=False),
        dict(
            type='SeqRandomFlip',
            share_params=False,
            flip_ratio=0.5,
            direction='horizontal'),
        dict(
            type='SeqNormalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='VideoCollect', keys=['img', 'gt_label']),
        dict(type='ReIDFormatBundle')
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='Resize', img_scale=(128, 256), keep_ratio=False),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='Collect', keys=['img'], meta_keys=[])
    ]
    data_root = '/projects/datasets/MOT/MOT17/'
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='ReIDDataset',
            triplet_sampler=dict(num_ids=8, ins_per_id=4),
            data_prefix='/projects/datasets/MOT/MOT17/reid/imgs',
            ann_file='/projects/datasets/MOT/MOT17/reid/meta/train_80.txt',
            pipeline=[
                dict(type='LoadMultiImagesFromFile', to_float32=True),
                dict(
                    type='SeqResize',
                    img_scale=(128, 256),
                    share_params=False,
                    keep_ratio=False,
                    bbox_clip_border=False,
                    override=False),
                dict(
                    type='SeqRandomFlip',
                    share_params=False,
                    flip_ratio=0.5,
                    direction='horizontal'),
                dict(
                    type='SeqNormalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='VideoCollect', keys=['img', 'gt_label']),
                dict(type='ReIDFormatBundle')
            ]),
        val=dict(
            type='ReIDDataset',
            triplet_sampler=None,
            data_prefix='/projects/datasets/MOT/MOT17/reid/imgs',
            ann_file='/projects/datasets/MOT/MOT17/reid/meta/val_20.txt',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='Resize', img_scale=(128, 256), keep_ratio=False),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'], meta_keys=[])
            ]),
        test=dict(
            type='ReIDDataset',
            triplet_sampler=None,
            data_prefix='/projects/datasets/MOT/MOT17/reid/imgs',
            ann_file='/projects/datasets/MOT/MOT17/reid/meta/val_20.txt',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='Resize', img_scale=(128, 256), keep_ratio=False),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'], meta_keys=[])
            ]))
    evaluation = dict(interval=1, metric='mAP')
    optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
    optimizer_config = dict(grad_clip=None)
    checkpoint_config = dict(interval=1)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    USE_MMCLS = True
    model = dict(
        type='BaseReID',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(3, ),
            style='pytorch'),
        neck=dict(type='GlobalAveragePooling', kernel_size=(8, 4), stride=1),
        head=dict(
            type='LinearReIDHead',
            num_fcs=1,
            in_channels=2048,
            fc_channels=1024,
            out_channels=128,
            num_classes=378,
            loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
            loss_pairwise=dict(type='TripletLoss', margin=0.3, loss_weight=1.0),
            norm_cfg=dict(type='BN1d'),
            act_cfg=dict(type='ReLU')),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=
            'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth'
        ))
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=1000,
        warmup_ratio=0.001,
        step=[5])
    total_epochs = 6
    work_dir = 'work_dirs/resnet50_b32x8_MOT17'
    gpu_ids = range(0, 1)
    
    2021-08-17 11:24:26,638 - mmtrack - INFO - initialize BaseReID with init_cfg {'type': 'Pretrained', 'checkpoint': 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth'}
    2021-08-17 11:24:26,638 - mmcv - INFO - load model from: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth
    2021-08-17 11:24:26,638 - mmcv - INFO - Use load_from_http loader
    2021-08-17 11:24:26,844 - mmcv - WARNING - The model and loaded state dict do not match exactly
    
    unexpected key in source state_dict: head.fc.weight, head.fc.bias
    
    missing keys in source state_dict: head.fcs.0.fc.weight, head.fcs.0.fc.bias, head.fcs.0.bn.weight, head.fcs.0.bn.bias, head.fcs.0.bn.running_mean, head.fcs.0.bn.running_var, head.fc_out.weight, head.fc_out.bias, head.bn.weight, head.bn.bias, head.bn.running_mean, head.bn.running_var, head.classifier.weight, head.classifier.bias
    
    2021-08-17 11:24:33,803 - mmtrack - INFO - Start running, host: [email protected], work_dir: /home2/qljx17/Open-MMLab/mmtracking/work_dirs/resnet50_b32x8_MOT17
    2021-08-17 11:24:33,803 - mmtrack - INFO - Hooks will be executed in the following order:
    before_run:
    (VERY_HIGH   ) StepLrUpdaterHook                  
    (NORMAL      ) CheckpointHook                     
    (NORMAL      ) EvalHook                           
    (VERY_LOW    ) TextLoggerHook                     
     -------------------- 
    before_train_epoch:
    (VERY_HIGH   ) StepLrUpdaterHook                  
    (NORMAL      ) EvalHook                           
    (LOW         ) IterTimerHook                      
    (VERY_LOW    ) TextLoggerHook                     
     -------------------- 
    before_train_iter:
    (VERY_HIGH   ) StepLrUpdaterHook                  
    (NORMAL      ) EvalHook                           
    (LOW         ) IterTimerHook                      
     -------------------- 
    after_train_iter:
    (ABOVE_NORMAL) OptimizerHook                      
    (NORMAL      ) CheckpointHook                     
    (NORMAL      ) EvalHook                           
    (LOW         ) IterTimerHook                      
    (VERY_LOW    ) TextLoggerHook                     
     -------------------- 
    after_train_epoch:
    (NORMAL      ) CheckpointHook                     
    (NORMAL      ) EvalHook                           
    (VERY_LOW    ) TextLoggerHook                     
     -------------------- 
    before_val_epoch:
    (LOW         ) IterTimerHook                      
    (VERY_LOW    ) TextLoggerHook                     
     -------------------- 
    before_val_iter:
    (LOW         ) IterTimerHook                      
     -------------------- 
    after_val_iter:
    (LOW         ) IterTimerHook                      
     -------------------- 
    after_val_epoch:
    (VERY_LOW    ) TextLoggerHook                     
     -------------------- 
    2021-08-17 11:24:33,803 - mmtrack - INFO - workflow: [('train', 1)], max: 6 epochs
    /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [44,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
    /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [45,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
    /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [46,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
    /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [47,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
    Traceback (most recent call last):
      File "./tools/train.py", line 174, in <module>
        main()
      File "./tools/train.py", line 163, in main
        train_model(
      File "/home2/qljx17/Open-MMLab/mmtracking/mmtrack/apis/train.py", line 136, in train_model
        runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
      File "/home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
        self.run_iter(data_batch, train_mode=True, **kwargs)
      File "/home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
        outputs = self.model.train_step(data_batch, self.optimizer,
      File "/home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "/home2/qljx17/Open-MMLab/mmclassification/mmcls/models/classifiers/base.py", line 146, in train_step
        loss, log_vars = self._parse_losses(losses)
      File "/home2/qljx17/Open-MMLab/mmclassification/mmcls/models/classifiers/base.py", line 97, in _parse_losses
        log_vars[loss_name] = loss_value.mean()
    RuntimeError: CUDA error: device-side assert triggered
    terminate called after throwing an instance of 'c10::Error'
      what():  CUDA error: device-side assert triggered
    Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fc1479138b2 in /home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fc147b65952 in /home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
    frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fc1478feb7d in /home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #3: <unknown function> + 0x5fd7a2 (0x7fc1920fb7a2 in /home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #4: <unknown function> + 0x5fd856 (0x7fc1920fb856 in /home2/qljx17/Open-MMLab/evenv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #5: python3() [0x534ce6]
    frame #6: python3() [0x51c5d9]
    frame #7: python3() [0x52cb15]
    frame #8: python3() [0x52cb15]
    frame #9: python3() [0x500a2e]
    frame #10: python3() [0x57d905]
    frame #11: python3() [0x57d8bb]
    frame #12: python3() [0x57d8bb]
    frame #13: python3() [0x57d8bb]
    frame #14: python3() [0x57d8bb]
    frame #15: python3() [0x57d8bb]
    frame #16: python3() [0x57d8bb]
    frame #17: python3() [0x5f25e6]
    <omitting python frames>
    frame #23: __libc_start_main + 0xf3 (0x7fc1a2ef10b3 in /lib/x86_64-linux-gnu/libc.so.6)
    
    /var/spool/slurmd/job128755/slurm_script: line 21: 3941330 Aborted                 (core dumped) python3 ./tools/train.py configs/reid/resnet50_b32x8_MOT17.py --work-dir work_dirs/resnet50_b32x8_MOT17
    ^Z
    

    Bug fix: From the error above, I assume the cause is the number of classes. In the default config, num_classes is set to 378, which is taken from train_80.txt, hence the error appears. However, when I set num_classes to 512, which is the number of samples in the imgs folder, I am able to run the training without any error. Is there something I missed, or could the number of classes be the main problem here?
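    One way to sanity-check num_classes against the annotation file is to count the distinct identity labels it actually references. A minimal sketch, assuming each line of train_80.txt is '<relative image path> <identity label>':

    # Minimal sketch: count the distinct identity labels referenced by the ReID
    # annotation file, assuming each line is "<relative image path> <identity label>".
    ann_file = '/projects/datasets/MOT/MOT17/reid/meta/train_80.txt'  # path from the config above

    labels = []
    with open(ann_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                labels.append(int(parts[-1]))

    print('distinct identities:', len(set(labels)))
    print('max label id       :', max(labels))
    # num_classes in LinearReIDHead must be at least max(labels) + 1, otherwise the
    # CrossEntropyLoss target check (`cur_target < n_classes`) fails on the GPU.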

    opened by yonafalinie 8
  • What is the difference between load_from and pretrain?

    What is the difference between load_from and pretrain?

    Hello~ Thanks a lot for your awesome job, and I appreciate your effort! However, I have a problem that I hope you can help me solve. When I use the default config at configs/det/faster-rcnn_r50_fpn_4e_mot17-half.py to train a Faster R-CNN detector with MMTracking, I get NaN losses. But when I move the downloaded state dict, a Faster R-CNN pretrained on the COCO dataset, from the 'load_from' entry to the 'pretrain' entry of the detector, the NaN losses disappear. I wonder how this happens? What is the difference between 'load_from' and 'pretrain', since neither of them seems to strictly load the parameters? Thanks a lot again!

    I checked again and found that the 'pretrain' entry for the detector does NOT load the pretrained state dict as I expected; training starts from randomly initialized parameters instead. So how can I use the pretrained Faster R-CNN weights anyway?
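    For reference, a hedged sketch of the two ways a checkpoint is usually injected (the exact keys depend on the MMTracking version in use; init_cfg=dict(type='Pretrained', ...) appears in the config dumps on this page, while load_from is handled by the runner):

    # Sketch only; exact keys depend on the MMTracking/MMDetection version in use.

    # 1) load_from: the runner loads the whole checkpoint into the full wrapped model
    #    (detector + tracking components) before training, so the checkpoint's key
    #    names typically need the wrapper prefix, e.g. 'detector.backbone.conv1.weight'.
    load_from = 'checkpoints/faster_rcnn_r50_fpn_coco.pth'  # hypothetical path

    # 2) Pretrained init_cfg on a sub-module: only that sub-module is initialized from
    #    the checkpoint, and its keys are matched against the sub-module directly,
    #    e.g. 'backbone.conv1.weight'.
    model = dict(
        detector=dict(
            init_cfg=dict(
                type='Pretrained',
                checkpoint='checkpoints/faster_rcnn_r50_fpn_coco.pth')))  # hypothetical path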

    opened by gsygsy96 8
  • Problem met when running MOT demo

    Problem met when running MOT demo

    Hi, I met a problem when running the MOT demo. It said "IndexError: tensors used as indices must be long, byte or bool tensors". Here's my error log.

    Error Log

    Traceback (most recent call last):
      File "demo/demo_mot.py", line 94, in <module>
        main()
      File "demo/demo_mot.py", line 70, in main
        result = inference_mot(model, img, frame_id=i)
      File "/cluster/home/it_stu12/main/gjj/mmtracking/mmtrack/apis/inference.py", line 81, in inference_mot
        data = collate([data], samples_per_gpu=1)
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 81, in collate
        for key in batch[0]
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 81, in <dictcomp>
        for key in batch[0]
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 77, in collate
        return [collate(samples, samples_per_gpu) for samples in transposed]
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 77, in <listcomp>
        return [collate(samples, samples_per_gpu) for samples in transposed]
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 81, in collate
        for key in batch[0]
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 81, in <dictcomp>
        for key in batch[0]
      File "/cluster/home/it_stu12/main/gjj/mmcv/mmcv/parallel/collate.py", line 80, in <dictcomp>
        key: collate([d[key] for d in batch], samples_per_gpu)
    IndexError: tensors used as indices must be long, byte or bool tensors
    /cluster/home/it_stu12/.conda/envs/gjj/lib/python3.7/tempfile.py:798: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpd1jtqm1n'>
      _warnings.warn(warn_message, ResourceWarning)

    Environment

    No CUDA runtime is found, using CUDA_HOME='/cluster/apps/cuda/10.1'
    sys.platform: linux
    Python: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0]
    CUDA available: False
    GCC: gcc (GCC) 5.4.0
    PyTorch: 1.6.0
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

    TorchVision: 0.7.0
    OpenCV: 4.1.0
    MMCV: 1.4.2
    MMCV Compiler: GCC 5.4
    MMCV CUDA Compiler: not available
    MMTracking: 0.8.0+603d6fe

    Some Other Problems

    • The doc of MMTracking 0.8 says that the MMCV version should be mmcv-full>=1.3.8, <1.4.0. But when I installed mmcv-full 1.3.9, it told me that my "mmcv-full is too old, please install mmcv-full >=1.3.16, <=1.5.0". Which one should I believe?
    • The Chinese version of the MMTracking 0.8 doc gives a demo script python demo/demo_mot.py configs/mot/deepsort/sort_faster-rcnn_fpn_4e_mot17-private.py --input demo/demo.mp4 --output mot.mp4. But I didn't find demo_mot.py in the demo folder; I found demo_mot_vis.py instead. Maybe the Chinese doc should be updated?

    Thank you so much!

    opened by AndrewGuo0930 7
  • time estimation log export

    time estimation log export

    I found this library very helpful. Great work! I have to ask: is it possible to keep exporting a log (time + weight of the tracked object) while the video is running, either from a live camera or a recorded video? Please guide me on how to export such a log using this library.
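    One hedged way to log per-frame information is to drive the inference API yourself instead of the demo script and write a CSV as frames are processed. The sketch below assumes the mmtrack.apis init_model/inference_mot entry points that appear elsewhere in these logs, and the 'track_bboxes' result key name may differ across versions:

    import csv
    import time

    import mmcv
    from mmtrack.apis import inference_mot, init_model

    # Hypothetical config/checkpoint paths; adjust to your setup.
    config = 'configs/mot/deepsort/sort_faster-rcnn_fpn_4e_mot17-private.py'
    model = init_model(config, checkpoint=None, device='cuda:0')

    video = mmcv.VideoReader('demo/demo.mp4')  # a live camera can be fed frame by frame instead
    with open('tracking_log.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['frame_id', 'seconds', 'num_tracks'])
        for frame_id, img in enumerate(video):
            start = time.time()
            result = inference_mot(model, img, frame_id=frame_id)
            # 'track_bboxes' is assumed to be a per-class list of [id, x1, y1, x2, y2, score]
            # arrays; the key name may differ across MMTracking versions.
            num_tracks = sum(len(b) for b in result.get('track_bboxes', []))
            writer.writerow([frame_id, time.time() - start, num_tracks])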

    opened by Tortoise17 7
  • pickle file has only det_bboxes

    pickle file has only det_bboxes

    Hello, I have tested on my custom dataset for VID and saved the results to a .pkl file. However, the pickle file seems to have only the det_bboxes and not the det_labels. Is there any way to add det_labels too? Any tips would be helpful!
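    For what it's worth, in the MMDetection-style result format the class labels are implicit: detections are stored as a per-class list of (N, 5) arrays, so det_labels can be reconstructed from the list indices. A hedged sketch, assuming the pickle holds a dict with a 'det_bboxes' list over frames:

    import pickle

    import numpy as np

    with open('results.pkl', 'rb') as f:  # hypothetical path passed to --out during testing
        results = pickle.load(f)

    # Assumed layout: results['det_bboxes'] is a list over frames, each entry a per-class
    # list of (N, 5) arrays [x1, y1, x2, y2, score]; the class label is the list index.
    for frame_idx, per_class in enumerate(results['det_bboxes']):
        bboxes = np.vstack(per_class) if per_class else np.zeros((0, 5))
        labels = (np.concatenate([np.full(len(b), cls_id, dtype=np.int64)
                                  for cls_id, b in enumerate(per_class)])
                  if per_class else np.zeros((0,), dtype=np.int64))
        # bboxes: (N, 5), labels: (N,) -- the 'det_labels' the pickle does not store explicitly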

    opened by godwinrayanc 6
  • How to select which classes of the detector outputs are fed into the reid model?

    How to select which classes of the detector outputs are fed into the reid model?

    I have trained a detector model in MMDetection with multiple classes. If I want to feed only the "person" class outputs from the detector model to the reid model during inference, can I do that via the config or any other method?

    Also, if I have to feed an MMDetection pretrained model into the tracker, what config changes have to be made?

    Thank you in advance
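    I am not certain the release in question exposes a config switch for this, but as a hedged post-processing sketch one can zero out every class except 'person' in an MMDetection-style per-class detection result before anything downstream consumes it (the class index is an assumption; check the detector's CLASSES order):

    # Hypothetical post-processing sketch: keep only one class from an
    # MMDetection-style per-class detection result.
    import numpy as np

    def keep_single_class(per_class_dets, keep_idx):
        """per_class_dets: list of (N, 5) arrays, one entry per class."""
        return [
            dets if cls_id == keep_idx else np.empty((0, 5), dtype=dets.dtype)
            for cls_id, dets in enumerate(per_class_dets)
        ]

    # e.g. if 'person' is class index 0 in the detector's CLASSES tuple:
    # result['det_bboxes'] = keep_single_class(result['det_bboxes'], keep_idx=0)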

    opened by Balakumaran-kandula 6
  • I want to train the masktrackrcnn, but it occurs: KeyError: "YouTubeVISDataset: 'image_id'"

    I want to train the masktrackrcnn, but it occurs: KeyError: "YouTubeVISDataset: 'image_id'"

    Hello! I want to train masktrackrcnn with the official youtube_vis dataset, but it raises KeyError: "YouTubeVISDataset: 'image_id'". My data tree is attached as a screenshot.

    Traceback (most recent call last):
      File "/home/music/miniconda3/envs/mmtrack/lib/python3.8/site-packages/mmcv/utils/registry.py", line 69, in build_from_cfg
        return obj_cls(**args)
      File "/home/music/Downloads/mmtracking-master/mmtrack/datasets/youtube_vis_dataset.py", line 44, in __init__
        super().__init__(*args, **kwargs)
      File "/home/music/Downloads/mmtracking-master/mmtrack/datasets/coco_video_dataset.py", line 46, in __init__
        super().__init__(*args, **kwargs)
      File "/home/music/miniconda3/envs/mmtrack/lib/python3.8/site-packages/mmdet/datasets/custom.py", line 97, in __init__
        self.data_infos = self.load_annotations(local_path)
      File "/home/music/Downloads/mmtracking-master/mmtrack/datasets/coco_video_dataset.py", line 61, in load_annotations
        data_infos = self.load_video_anns(ann_file)
      File "/home/music/Downloads/mmtracking-master/mmtrack/datasets/coco_video_dataset.py", line 73, in load_video_anns
        self.coco = CocoVID(ann_file)
      File "/home/music/Downloads/mmtracking-master/mmtrack/datasets/parsers/coco_video_parser.py", line 22, in __init__
        super(CocoVID, self).__init__(annotation_file=annotation_file)
      File "/home/music/miniconda3/envs/mmtrack/lib/python3.8/site-packages/mmdet/datasets/api_wrappers/coco_api.py", line 23, in __init__
        super().__init__(annotation_file=annotation_file)
      File "/home/music/miniconda3/envs/mmtrack/lib/python3.8/site-packages/pycocotools/coco.py", line 86, in __init__
        self.createIndex()
      File "/home/music/Downloads/mmtracking-master/mmtrack/datasets/parsers/coco_video_parser.py", line 57, in createIndex
        imgToAnns[ann['image_id']].append(ann)
    KeyError: 'image_id'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tools/train.py", line 213, in <module>
        main()
      File "tools/train.py", line 188, in main
        datasets = [build_dataset(cfg.data.train)]
      File "/home/music/miniconda3/envs/mmtrack/lib/python3.8/site-packages/mmdet/datasets/builder.py", line 82, in build_dataset
        dataset = build_from_cfg(cfg, DATASETS, default_args)
      File "/home/music/miniconda3/envs/mmtrack/lib/python3.8/site-packages/mmcv/utils/registry.py", line 72, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "YouTubeVISDataset: 'image_id'"
    

    Thank you!
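    Judging from the traceback, the annotation JSON looks like a plain COCO instance file rather than the CocoVID-style file that the video dataset expects, where every annotation carries an image_id (and normally video_id/instance_id) field. A minimal, hypothetical sanity check (the path is a placeholder):

    # Hedged sketch: check whether the annotation file is in the CocoVID-style format
    # that the video dataset expects (field names taken from the traceback above).
    import json

    ann_file = 'data/youtube_vis_2019/annotations/train.json'  # hypothetical path

    with open(ann_file) as f:
        ann = json.load(f)

    print('top-level keys:', sorted(ann.keys()))  # a CocoVID file also has a 'videos' list
    missing = [i for i, a in enumerate(ann.get('annotations', [])) if 'image_id' not in a]
    print(f'{len(missing)} annotations without an image_id field')
    # If annotations lack per-frame image_id entries, the raw YouTube-VIS json has
    # probably not been converted to the CocoVID format yet.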

    opened by eatbreakfast111 2
  • IndexError: boolean index did not match indexed array along dimension 0; dimension is 10 but corresponding boolean dimension is 9

    IndexError: boolean index did not match indexed array along dimension 0; dimension is 10 but corresponding boolean dimension is 9

    Thanks for your error report and we appreciate it a lot.

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. The bug has not been fixed in the latest version.

    Describe the bug: I was trying to run the qdtrack model on MOT17 in dev-1.x, but I always get this error.

    Reproduction

    1. What command or script did you run?
    srun -p bigdata_s2 --quotatype=auto --gres=gpu:1 python tools/train.py configs/mot/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py
    
    2. Did you make any modifications on the code or config? Did you understand what you have modified? No
    3. What dataset did you use and what task did you run? MOT17

    Environment

    1. Please run python mmtrack/utils/collect_env.py to collect necessary environment information and paste it here. Got error:
    Traceback (most recent call last):
      File "mmtrack/utils/collect_env.py", line 2, in <module>
        from mmcv.utils import collect_env as collect_base_env
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/__init__.py", line 3, in <module>
        from .arraymisc import *
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/arraymisc/__init__.py", line 2, in <module>
        from .quantization import dequantize, quantize
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/arraymisc/quantization.py", line 2, in <module>
        from typing import Union
      File "/mnt/petrelfs/ouyanglinke/mmtracking/mmtrack/utils/typing.py", line 3, in <module>
        from typing import Dict, List, Optional, Tuple, Union
    ImportError: cannot import name 'Dict' from partially initialized module 'typing' (most likely due to a circular import) (/mnt/petrelfs/ouyanglinke/mmtracking/mmtrack/utils/typing.py)
    

    But I can successfully run the SOT model. My Python version is 3.8 and PyTorch is 1.7.1.
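    The ImportError looks like the repository's own mmtrack/utils/typing.py shadowing the standard-library typing module: running the file directly puts mmtrack/utils first on sys.path. A hypothetical workaround sketch, assuming collect_env is importable from the package:

    # Hypothetical workaround: import collect_env through the package from the repo root
    # instead of executing mmtrack/utils/collect_env.py directly, so the script's own
    # directory (which contains typing.py) is not prepended to sys.path.
    from mmtrack.utils import collect_env  # assumed export; otherwise import the module path

    for name, value in collect_env().items():
        print(f'{name}: {value}')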

    2. You may add additional information that may be helpful for locating the problem, such as
      • How you installed PyTorch [e.g., pip, conda, source]
      • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

    Error traceback: If applicable, paste the error traceback here.

    Traceback (most recent call last):
      File "tools/train.py", line 119, in <module>
        main()
      File "tools/train.py", line 115, in main
        runner.train()
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1684, in train
        model = self.train_loop.run()  # type: ignore
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 90, in run
        self.run_epoch()
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 105, in run_epoch
        for idx, data_batch in enumerate(self.dataloader):
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
        return self._process_data(data)
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
        data.reraise()
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
        raise self.exc_type(msg)
    IndexError: Caught IndexError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 378, in __getitem__
        data = self.prepare_data(idx)
      File "/mnt/petrelfs/ouyanglinke/mmtracking/mmtrack/datasets/base_video_dataset.py", line 387, in prepare_data
        return self.pipeline(final_data_info)
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 55, in __call__
        data = t(data)
      File "/mnt/lustre/ouyanglinke/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/transforms/base.py", line 12, in __call__
        return self.transform(results)
      File "/mnt/petrelfs/ouyanglinke/mmtracking/mmtrack/datasets/transforms/formatting.py", line 237, in transform
        key_anns[key_valid_idx])
    IndexError: boolean index did not match indexed array along dimension 0; dimension is 10 but corresponding boolean dimension is 9
    

    Bug fix: If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated! This might help: in mmtracking/configs/_base_/datasets/mot_challenge.py, simply commenting out the "TransformBroadcaster" block that wraps mmdet.RandomCrop makes training work, as in the pipeline below.

    # data pipeline
    train_pipeline = [
        dict(
            type='TransformBroadcaster',
            share_random_params=True,
            transforms=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadTrackAnnotations', with_instance_id=True),
                dict(
                    type='mmdet.RandomResize',
                    scale=(1088, 1088),
                    ratio_range=(0.8, 1.2),
                    keep_ratio=True,
                    clip_object_border=False),
                dict(type='mmdet.PhotoMetricDistortion')
            ]),
        # dict(
        #     type='TransformBroadcaster',
        #     share_random_params=False,
        #     transforms=[
        #         dict(
        #             type='mmdet.RandomCrop',
        #             crop_size=(1088, 1088),
        #             bbox_clip_border=False)
        #     ]),
        dict(
            type='TransformBroadcaster',
            share_random_params=True,
            transforms=[
                dict(type='mmdet.RandomFlip', prob=0.5),
            ]),
        dict(type='PackTrackInputs', ref_prefix='ref', num_key_frames=1)
    ]
    
    opened by ouyanglinke 1
  • TypeError: forward_train() missing 4 required positional arguments: 'ref_img', 'ref_img_metas', 'ref_gt_bboxes', and 'ref_gt_labels'

    TypeError: forward_train() missing 4 required positional arguments: 'ref_img', 'ref_img_metas', 'ref_gt_bboxes', and 'ref_gt_labels'

    Hello, I want to train masktrack_rcnn on a COCO-style dataset, so I changed the dataset setting of masktrack_rcnn_r50_fpn_12e_youtubevis2019.py to '../../_base_/datasets/coco_instance.py' and set num_classes=6.

    By the way, I also changed CLASSES in /home/music/Downloads/mmtracking-master/mmtrack/datasets/coco_video_dataset.py to ('aircraft', 'buildings', 'electrical', 'person', 'tree', 'wire') and set load_as_video=False.
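    The TypeError in the title usually means the training data does not provide the reference-frame inputs (ref_img, ref_img_metas, ref_gt_bboxes, ref_gt_labels) that MaskTrackRCNN's forward_train expects; a plain CocoDataset pipeline ending in Collect never packs them. Below is a rough, hedged sketch of the video-style train dict those keys normally come from. Transform names such as SeqLoadAnnotations, SeqPad, SeqDefaultFormatBundle and the ref_img_sampler field are assumptions based on other MMTracking 0.x configs, so the real pipeline should be copied from the original youtubevis config rather than from this sketch.

    # Rough sketch only -- names assumed from MMTracking 0.x video configs; copy the
    # real pipeline from the original masktrack_rcnn youtubevis config.
    data = dict(
        train=dict(
            type='CocoVideoDataset',          # video-aware dataset, not plain CocoDataset
            classes=('aircraft', 'buildings', 'electrical', 'person', 'tree', 'wire'),
            ann_file='data/my_vis/annotations/train.json',   # hypothetical CocoVID-style json
            img_prefix='data/my_vis/train/',
            ref_img_sampler=dict(num_ref_imgs=1, frame_range=2, method='uniform'),
            pipeline=[
                dict(type='LoadMultiImagesFromFile'),
                dict(type='SeqLoadAnnotations', with_bbox=True, with_mask=True, with_track=True),
                dict(type='SeqResize', img_scale=(1333, 800), keep_ratio=True),
                dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
                dict(type='SeqNormalize',
                     mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
                dict(type='SeqPad', size_divisor=32),
                dict(type='VideoCollect',
                     keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_instance_ids']),
                dict(type='SeqDefaultFormatBundle', ref_prefix='ref'),
            ]))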

    And my env:

    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
    CUDA available: True
    GPU 0: NVIDIA GeForce RTX 3090 Ti
    CUDA_HOME: None
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 1.12.1
    PyTorch compiling details: PyTorch built with:
      - GCC 9.3
      - C++ Version: 201402
      - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - LAPACK is enabled (usually provided by MKL)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 11.3
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
      - CuDNN 8.3.2  (built against CUDA 11.5)
      - Magma 2.5.2
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 
    
    TorchVision: 0.13.1
    OpenCV: 4.6.0
    MMCV: 1.7.0
    MMCV Compiler: GCC 9.3
    MMCV CUDA Compiler: 11.3
    MMTracking: 0.14.0+
    

    This is my config:

    2022-12-16 10:09:31,585 - mmtrack - INFO - Distributed training: False
    2022-12-16 10:09:32,054 - mmtrack - INFO - Config:
    model = dict(
        detector=dict(
            type='MaskRCNN',
            backbone=dict(
                type='ResNet',
                depth=50,
                num_stages=4,
                out_indices=(0, 1, 2, 3),
                frozen_stages=1,
                norm_cfg=dict(type='BN', requires_grad=True),
                norm_eval=True,
                style='pytorch',
                init_cfg=dict(
                    type='Pretrained', checkpoint='torchvision://resnet50')),
            neck=dict(
                type='FPN',
                in_channels=[256, 512, 1024, 2048],
                out_channels=256,
                num_outs=5),
            rpn_head=dict(
                type='RPNHead',
                in_channels=256,
                feat_channels=256,
                anchor_generator=dict(
                    type='AnchorGenerator',
                    scales=[8],
                    ratios=[0.5, 1.0, 2.0],
                    strides=[4, 8, 16, 32, 64]),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[1.0, 1.0, 1.0, 1.0]),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
                loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
            roi_head=dict(
                type='StandardRoIHead',
                bbox_roi_extractor=dict(
                    type='SingleRoIExtractor',
                    roi_layer=dict(
                        type='RoIAlign', output_size=7, sampling_ratio=0),
                    out_channels=256,
                    featmap_strides=[4, 8, 16, 32]),
                bbox_head=dict(
                    type='Shared2FCBBoxHead',
                    in_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=6,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.1, 0.1, 0.2, 0.2]),
                    reg_class_agnostic=False,
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
                mask_roi_extractor=dict(
                    type='SingleRoIExtractor',
                    roi_layer=dict(
                        type='RoIAlign', output_size=14, sampling_ratio=0),
                    out_channels=256,
                    featmap_strides=[4, 8, 16, 32]),
                mask_head=dict(
                    type='FCNMaskHead',
                    num_convs=4,
                    in_channels=256,
                    conv_out_channels=256,
                    num_classes=6,
                    loss_mask=dict(
                        type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
            train_cfg=dict(
                rpn=dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.7,
                        neg_iou_thr=0.3,
                        min_pos_iou=0.3,
                        match_low_quality=True,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=64,
                        pos_fraction=0.5,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=False),
                    allowed_border=-1,
                    pos_weight=-1,
                    debug=False),
                rpn_proposal=dict(
                    nms_pre=200,
                    max_per_img=200,
                    nms=dict(type='nms', iou_threshold=0.7),
                    min_bbox_size=0),
                rcnn=dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.5,
                        neg_iou_thr=0.5,
                        min_pos_iou=0.5,
                        match_low_quality=True,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=128,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False)),
            test_cfg=dict(
                rpn=dict(
                    nms_pre=200,
                    max_per_img=200,
                    nms=dict(type='nms', iou_threshold=0.7),
                    min_bbox_size=0),
                rcnn=dict(
                    score_thr=0.01,
                    nms=dict(type='nms', iou_threshold=0.5),
                    max_per_img=100,
                    mask_thr_binary=0.5)),
            init_cfg=dict(
                type='Pretrained',
                checkpoint=
                'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth'
            )),
        type='MaskTrackRCNN',
        track_head=dict(
            type='RoITrackHead',
            roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            embed_head=dict(
                type='RoIEmbedHead',
                num_fcs=2,
                roi_feat_size=7,
                in_channels=256,
                fc_out_channels=1024),
            train_cfg=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=128,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)),
        tracker=dict(
            type='MaskTrackRCNNTracker',
            match_weights=dict(det_score=1.0, iou=2.0, det_label=10.0),
            num_frames_retain=20))
    dataset_type = 'CocoDataset'
    data_root = '/home/music/Downloads/mmtracking-master/data/coco/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=6,
        workers_per_gpu=2,
        train=dict(
            type='CocoDataset',
            ann_file=
            '/home/music/Downloads/mmtracking-master/data/coco/annotations/train.json',
            img_prefix=
            '/home/music/Downloads/mmtracking-master/data/coco/train2023/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ]),
        val=dict(
            type='CocoDataset',
            ann_file=
            '/home/music/Downloads/mmtracking-master/data/coco/annotations/val.json',
            img_prefix='/home/music/Downloads/mmtracking-master/data/coco/val2023/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='CocoDataset',
            ann_file=
            '/home/music/Downloads/mmtracking-master/data/coco/annotations/val.json',
            img_prefix='/home/music/Downloads/mmtracking-master/data/coco/val2023/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    evaluation = dict(metric=['bbox', 'segm'], classwise=True)
    optimizer = dict(type='SGD', lr=0.00125, momentum=0.9, weight_decay=0.0001)
    optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
    checkpoint_config = dict(interval=1)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    opencv_num_threads = 0
    mp_start_method = 'fork'
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.3333333333333333,
        step=[8, 11])
    total_epochs = 12
    work_dir = 'work_dir/masktrack_coco'
    gpu_ids = [0]
    

    Best wishes! Thank you!

    opened by lijoe123 7
Releases(v1.0.0rc1)
  • v1.0.0rc1(Oct 11, 2022)

    MMTracking 1.0.0rc1 is the second version of MMTracking 1.x, a part of the OpenMMLab 2.0 projects.

    Built upon the new training engine, MMTracking 1.x unifies the interfaces of datasets, models, evaluation, and visualization.

    And there are some BC-breaking changes. Please check the migration tutorial for more details.

    We also support more methods in MMTracking 1.x, such as StrongSORT for MOT, Mask2Former for VIS, PrDiMP for SOT.

    Source code(tar.gz)
    Source code(zip)
  • v0.14.0(Sep 19, 2022)

    Highlights

    • Introduce the 1.0.0rc0 version of MMTracking (#725)

    New Features

    • Support OC-SORT method for MOT (#545)

    • Support multi-class tracking in ByteTrack (#548)

    • Support DanceTrack dataset for MOT (#543)

    • Support TAO dataset for QDTrack (#585)

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0rc0(Aug 31, 2022)

    We recommend using MMTracking v1.0.0rc1, since v1.0.0rc0 has some bugs related to the minimum required version of mmdet.

    Source code(tar.gz)
    Source code(zip)
  • v0.13.0(Apr 29, 2022)

    Highlights

    • Support tracking colab tutorial (#511)

    New Features

    • Refactor the training datasets of SiamRPN++ (#496), (#518)

    • Support loading data from ceph for SOT datasets (#494)

    • Support loading data from ceph for MOT challenge dataset (#517)

    • Support evaluation metric for VIS task (#501)

    Bug Fixes

    • Fix a bug in the LaSOT datasets and update the pretrained models of STARK (#483), (#503)

    • Fix a bug in the format_results function of VIS task (#504)

    Source code(tar.gz)
    Source code(zip)
  • v0.12.0(Apr 1, 2022)

  • v0.11.0(Mar 4, 2022)

  • v0.10.0(Feb 10, 2022)

  • v0.9.0(Jan 6, 2022)

    Highlights

    • Support arXiv 2021 manuscript 'ByteTrack: Multi-Object Tracking by Associating Every Detection Box' (#385), (#383), (#372)
    • Support ICCV 2019 paper 'Video Instance Segmentation' (#304), (#303), (#298), (#292)

    New Features

    • Support CrowdHuman dataset for MOT (#366)
    • Support VOT2018 dataset for SOT (#305)
    • Support YouTube-VIS dataset for VIS (#290)

    Bug Fixes

    • Fix two significant bugs in SOT and provide new SOT pretrained models (#349)

    Improvements

    • Refactor LaSOT, TrackingNet dataset and support GOT-10K datasets (#296)
    • Support persistent workers (#348)
    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Oct 3, 2021)

    New Features

    • Support OTB100 dataset in SOT (#271)
    • Support TrackingNet dataset in SOT (#268)
    • Support UAV123 dataset in SOT (#260)

    Bug Fixes

    • Fix a bug in mot_param_search.py (#270)

    Improvements

    • Use PyTorch sphinx theme (#274)
    • Use pycocotools instead of mmpycocotools (#263)
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Sep 3, 2021)

    Highlights

    • Release code of AAAI 2021 paper 'Temporal ROI Align for Video Object Recognition' (#247)
    • Refactor English documentations (#243)
    • Add Chinese documentations (#248), (#250)

    New Features

    • Support fp16 training and testing (#230)
    • Release model using ResNeXt-101 as backbone for all VID methods (#254)
    • Support the results of Tracktor on MOT15, MOT16 and MOT20 datasets (#217)
    • Support visualization for single gpu test (#216)

    Bug Fixes

    • Fix a bug in MOTP evaluation (#235)
    • Fix two bugs in reid training and testing (#249)

    Improvements

    • Refactor anchor in SiameseRPN++ (#229)
    • Unify model initialization (#235)
    • Refactor unittest (#231)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Jul 30, 2021)

    Highlights

    • Fix training bugs of all three tasks (#219), (#221)

    New Features

    • Support error visualization for mot task (#212)

    Bug Fixes

    • Fix a bug in SOT demo (#213)

    Improvements

    • Use MMCV registry (#220)
    • Add README.md for reid training (#210)
    • Modify dict keys of the outputs of SOT (#223)
    • Add Chinese docs including install.md, quick_run.md, model_zoo.md, dataset.md (#205), (#214)
    Source code(tar.gz)
    Source code(zip)
  • v0.5.3(Jul 2, 2021)

  • v0.5.2(Jun 3, 2021)

  • v0.5.1(Feb 1, 2021)

  • v0.5.0(Jan 5, 2021)

    Highlights

    • MMTracking is released! It is the first open source toolbox that unifies versatile video perception tasks, including single object tracking, multiple object tracking, and video object detection.

    New Features

    Source code(tar.gz)
    Source code(zip)