Time-Frequency Diagram Classification Challenge of Intelligent Hardware Voice Control 2.0 (ideas and results, currently top 5)

2022-07-19 04:35:00 Hyacinth's cat redamancy

Time-Frequency Diagram Classification Challenge of Intelligent Hardware Voice Control 2.0

This is a record of some of my ideas and process from one of the 2022 iFLYTEK Developer Competition events, along with some results.

Competition page: http://challenge.xfyun.cn/topic/info?type=time-frequency-2022&option=ssgy

1. Background of the Event

In November 2014, Amazon launched the Echo, a smart speaker built around a new concept: controlling hardware devices interactively through voice commands. By the end of April 2016, cumulative Echo sales had exceeded 3 million units, and by December 2017 sales totaled tens of millions. The launch of the Amazon Echo marked the arrival of practical, deployed products built on voice interaction.

Voice-controlled smart hardware, typified by smart speakers, has already been commercialized at scale in China. In 2020 China accounted for 51% of the global market, ranking first, while over the same period the US share fell from 44% to 24%.

2. The Task

The competition provides a spectrogram dataset covering 24 spoken voice-interaction commands. Contestants must build a network model that combines basic structures such as dense (fully connected) layers, convolutional networks, and recurrent networks to make effective predictions.

3. Evaluation Rules

1. Data description

The competition provides contestants with voice signals and their corresponding sentence labels. For data security, all data have been desensitized.

2. Evaluation metric

Submitted result files are scored with Macro-F1.
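Macro-F1 is the unweighted mean of the per-class F1 scores, so every command class counts equally regardless of how often it occurs. In practice I would compute it with `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`; the minimal pure-Python sketch below just shows what that metric does:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (Macro-F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        # per-class true positives, false positives, false negatives
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Because the mean is unweighted, a model that ignores rare classes is penalized more heavily than it would be under plain accuracy.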

3. Evaluation and ranking

1. Downloadable data is provided for the preliminary and semi-final rounds; contestants debug their algorithms locally and submit results on the competition page.

2. Each team may submit at most 3 times per day.

3. Teams are ranked by score from high to low; the leaderboard uses each team's best historical result.

4. Submission Requirements

1. File format: submit test results as a csv file

2. File size: no restrictions

3. File details:

  1. Encoded as UTF-8

  2. Follow the format shown in the submission example

5. Schedule

The competition is run as a single round.

Competition period: July 1 – August 1

1. July 1, 10:00: datasets released (leaderboard opens)

2. Submission deadline: August 1, 17:00

On-site defense

1. The top three teams will be invited to iFLYTEK's global 1024 Developers' Day for an on-site defense

2. Defense format: 10-minute presentation + 5-minute Q&A

3. Final scores combine the work score and the defense score (work 70%, on-site defense 30%)

6. Awards

  • Finalists
    • iFLYTEK 1024 Developers' Day pass
    • Finalist qualification certificate
    • Fast-track entry to the iFLYTEK incubator base
    • A.I. service market entry privileges
  • Finals winners
    • Track prizes: the top 3 contestants on each track win prizes of 5000 RMB (first place), 3000 RMB (second), and 2000 RMB (third).
    • Attendance at the 1024 Global Developers Festival award ceremony, with the bonus, certificate, and custom trophy awarded on site
    • A.I. full-chain entrepreneurship support
    • Fast-track hiring channel & an iFLYTEK offer

7. Tricks and Ideas Tried

  • Try more kinds of data augmentation

  • Try transfer learning from existing pretrained weights

  • Try a label-smoothing loss

  • Try multi-model ensembling, model fusion, and similar methods

  • Try changing the image resolution; it is currently 450x750

    450x750 is actually a curious size. The raw images are roughly 500x800, and 450x750 is what remains after the noisy edges are removed, i.e. after cropping away the edge noise. This approach seems more reliable.

  • Try increasing the batch size from 5 to 8 and rerunning

  • Try training with larger models
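On the label-smoothing idea: in PyTorch 1.10+ this is available directly as `nn.CrossEntropyLoss(label_smoothing=0.1)`. As an illustration of the underlying math (not the competition code), the smoothed target gives the true class weight `1 - eps + eps/K` and every other class `eps/K`:

```python
import math

def label_smoothing_ce(logits, target, num_classes, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    # numerically stable log-softmax
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    smooth = eps / num_classes
    loss = 0.0
    for c, lp in enumerate(log_probs):
        # true class gets 1 - eps + eps/K, all others eps/K
        weight = (1.0 - eps) + smooth if c == target else smooth
        loss -= weight * lp
    return loss
```

With `eps=0` this reduces to ordinary cross-entropy; a larger `eps` softens overconfident predictions, which often helps on small or noisy datasets like this one.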

8. Detailed Parameters and Runs

Data augmentation

transform_train = A.Compose([
    A.RandomCrop(450, 750),
])

More augmentations were added later. From the results, because the brightness variation in our spectrograms carries real signal, changing the brightness makes the augmentation nearly useless; I suspect the same holds for contrast. The added augmentations are therefore mainly image translation and masking (cutout). If results stay good, brightness and contrast augmentation may still be tested later.

After adding A.CoarseDropout(p=0.5), the score improved by about 1%.

transform_train = A.Compose([
    A.RandomCrop(450, 750),
    A.CoarseDropout(p=0.5),
    # A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.05, rotate_limit=0, p=0.5),
    # A.RandomBrightnessContrast(p=0.5),
])

ResNet18

I first trained with the ResNet18 from the baseline, then retrained it in my own framework with small modifications; the first training run reached a score of 91.5%.

CUDA_VISIBLE_DEVICES=3 python train.py -f --cuda --net ResNet18 --epochs 50 -bs 5 -lr 0.001

Training command:

CUDA_VISIBLE_DEVICES=0 python train.py -f --cuda --net Model --epochs 50 -bs 5 -lr 0.001 -fe 5

It turns out that small models often train to good results; the EfficientNetV2 series in particular reaches relatively high validation accuracy.

All of these runs use pretrained weights, since a model that already has some knowledge reaches better results. For the models below, training starts with the backbone frozen for the first 5 epochs.

In addition, an early-stopping strategy was added to prevent overfitting.
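The post doesn't show the early-stopping code; a minimal sketch of one common patience-based variant (hypothetical class name, tracking validation accuracy) might look like:

```python
class EarlyStopping:
    """Stop training when the validation metric hasn't improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Call once per epoch with the validation metric; returns True to stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop, `if stopper.step(val_acc): break` ends training once the validation accuracy has stalled, which is what keeps the 50-epoch runs from overfitting.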

Here are the best results for each model:

| Model | Epochs | Training parameters | Train ACC | Val ACC |
|---|---|---|---|---|
| ResNet18 | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 97.12 |
| ConvNeXt-T | 50 | AdamW, lr = 0.0005, batch size = 8 | | |
| EfficientNetV2-T | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 91.12 |
| EfficientNetV2-b0 | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 96.63 |
| EfficientNetV2-b1 | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 95.67 |

All the models trained so far are small; later it may be worth trying a large model to see whether it gives better results.
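The best submissions below come from ensembling several models. The fusion method isn't spelled out in the post; a common baseline, and a plausible reading of "model fusion" here, is to average the per-class probabilities from each model and take the argmax. A minimal sketch for one sample, with a hypothetical function name:

```python
def ensemble_predict(prob_lists):
    """Average class probabilities from several models and return the argmax class.

    `prob_lists` is a list of per-model probability vectors for one sample,
    e.g. [[0.6, 0.4], [0.2, 0.8]] for two models over two classes.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    # mean probability per class across all models
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])
```

Averaging probabilities (rather than hard votes) lets a confident model outweigh uncertain ones, which is usually why ensembles of EfficientNetV2 variants plus ResNet18 beat any single member.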

9. Submission Results

2022-07-15: ranked 7th, score 0.93121


2022-07-15: ranked 5th, score 0.94377. This time, adding just one data augmentation produced a good improvement.


| ID | Status | Score | Submitted file | Notes | Submitter | Submission time |
|---|---|---|---|---|---|---|
| 1 | Scored | 0.94377 | submit_ensemble_07-15-16-56-00.csv | Ensemble of several EfficientNetV2-series models plus the small ResNet18, with random-masking augmentation | Good at shooters pikachu | 2022-07-15 17:14:56 |
| 2 | Scored | 0.93121 | submit_ensemble_07-15-01-03-09.csv | Ensemble of several EfficientNetV2-series models plus the small ResNet18, without data augmentation | Good at shooters pikachu | 2022-07-15 09:53:24 |
| 3 | Scored | 0.93121 | submit_EfficientNetv2-S_07-15-01-03-09.csv | Three models: ConvNeXt-T, ResNet18, EfficientNetV2-S; no data augmentation | Good at shooters pikachu | 2022-07-15 01:04:40 |
| 4 | Scored | 0.90679 | sub_convnext-T.csv | ConvNeXt-T model, trained with improvements; no data augmentation | Good at shooters pikachu | 2022-07-14 22:20:30 |
| 5 | Scored | 0.9145 | sub.csv | Baseline ResNet18 model, trained with improvements; final test result | Good at shooters pikachu | 2022-07-14 16:54:44 |
Copyright notice: this article was written by Hyacinth's cat redamancy; please include the original link when reposting: https://yzsam.com/2022/200/202207170411165303.html