Time-Frequency Spectrogram Classification Challenge for Intelligent Hardware Voice Control 2.0 (ideas and results, currently top 5)
2022-07-19 04:35:00 【Hyacinth's cat redamancy】
This post records some of my ideas and progress in the Time-Frequency Spectrogram Classification Challenge for Intelligent Hardware Voice Control 2.0, one of the tracks in the 2022 iFLYTEK A.I. Developer Competition, together with the results achieved so far.
Competition page: http://challenge.xfyun.cn/topic/info?type=time-frequency-2022&option=ssgy
I. Background of the Event
In November 2014, Amazon launched the Echo, a smart speaker built around a new concept: controlling hardware devices interactively through voice commands. By April 2016, cumulative Echo sales had exceeded 3 million units, and by December 2017 they had reached the tens of millions. The launch of the Amazon Echo marked a practical, real-world landing of voice-based interaction.
Voice-controlled smart hardware, represented by smart speakers, has been commercialized on a large scale in China. In 2020, China accounted for 51% of the global market, ranking first worldwide, while the United States' share fell from 44% to 24% over the same period.
II. Task
The competition provides a spectrogram dataset of 24 spoken voice-interaction commands. Contestants need to build a network model that combines basic structures such as dense (fully connected) layers, convolutional networks, and recurrent networks to make effective predictions.
III. Evaluation Rules
1. Data description
This competition provides contestants with voice signals and their corresponding sentence labels. For data security, all data have been desensitized.
2. Evaluation metric
Submitted result files are evaluated using Macro-F1.
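For reference, Macro-F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its frequency. A minimal sketch with scikit-learn (my own illustration; the labels below are made up and not from the competition data):

```python
# Macro-F1: compute F1 per class, then take the unweighted mean.
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]  # made-up ground-truth class indices
y_pred = [0, 2, 2, 2, 1, 0]  # made-up predictions
print(f1_score(y_true, y_pred, average="macro"))
```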
3. Evaluation and ranking
1. Data downloads are provided for the preliminary and semi-final rounds; contestants debug their algorithms locally and submit results on the competition page.
2. Each team may submit at most 3 times per day.
3. The leaderboard is ranked from high to low, using each team's best historical score.
IV. Submission Requirements
1. File format: submit test results in csv format
2. File size: no requirement
3. File details:
Encoded as UTF-8
The submission format is shown in the submission example
V. Schedule
The competition is run as a single round.
Competition period: July 1 to August 1
1. July 1, 10:00: the datasets are released (and the leaderboard opens)
2. The submission deadline is 17:00 on August 1
On-site defense
1. The top three teams will be invited to iFLYTEK's 1024 Global Developer Festival for an on-site defense
2. The defense consists of a 10-minute presentation plus a 5-minute Q&A
3. The final score combines the work score and the defense score (work 70%, on-site defense 30%)
VI. Awards
- Finalist awards
  - iFLYTEK 1024 Developer Festival pass
  - Finalist certificate
  - Fast-track entry to the iFLYTEK incubator base
  - A.I. service marketplace entry privileges
- Finals winners
  - Finals prizes: the TOP3 contestants on each track win the track prize, with 5,000 yuan for first place, 3,000 yuan for the runner-up, and 2,000 yuan for third place
  - Attend the 1024 Global Developer Festival award ceremony, where prizes, certificates, and customized trophies are presented on site
  - A.I. full-chain entrepreneurship support
  - Fast-track hiring channel & iFLYTEK offer
VII. Tricks and Ideas Tried
Try more data augmentation
Try transfer learning from existing pretrained weights
Try a label-smoothing loss (see the sketch after this list)
Try multi-model ensembling, model fusion, and similar methods
Try changing the image resolution; it was 450x750 before
450x750 is actually an interesting size: the raw images are roughly 500x800, and 450x750 is what remains after the edges are cropped away, i.e. after the edge noise is removed, so this choice is fairly reliable
Try increasing the batch size from 5 to 8 and compare the results
Try training with larger models
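For the label-smoothing idea above, here is a minimal sketch using the built-in `label_smoothing` argument of PyTorch's `CrossEntropyLoss` (available since PyTorch 1.10). The smoothing factor 0.1 and the dummy tensors are assumptions for illustration, not values from the competition code:

```python
import torch
import torch.nn as nn

# Label-smoothing sketch: soften the one-hot targets to reduce overconfidence.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # assumed smoothing factor

logits = torch.randn(8, 24, requires_grad=True)  # batch of 8, 24 command classes
targets = torch.randint(0, 24, (8,))             # ground-truth class indices
loss = criterion(logits, targets)
loss.backward()
```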
VIII. Detailed Parameters and Runs
Data augmentation
```python
import albumentations as A

transform_train = A.Compose([
    A.RandomCrop(450, 750),
])
```
More augmentations were added later. Judging from the results, because brightness variation in the spectrograms carries real information, changing the brightness makes the augmentation nearly useless, and I suspect the same holds for contrast. So the extra augmentations mainly translate the image or mask parts of it. If the results look good, brightness and contrast augmentation may still be tested later.
After adding A.CoarseDropout(p=0.5), the score improved by about 1%.
```python
import albumentations as A

transform_train = A.Compose([
    A.RandomCrop(450, 750),
    A.CoarseDropout(p=0.5),  # random rectangular masking
    # A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.05, rotate_limit=0, p=0.5),
    # A.RandomBrightnessContrast(p=0.5),
])
```
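For context, here is a hypothetical sketch of how an Albumentations pipeline like the one above is typically wired into a PyTorch `Dataset`; the image reading and label handling are my assumptions, not the loader actually used here:

```python
import cv2
import torch
from torch.utils.data import Dataset

class SpectrogramDataset(Dataset):
    """Hypothetical dataset wrapper: reads an image, applies the Albumentations
    pipeline, and returns a CHW float tensor plus its label."""

    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = cv2.imread(self.image_paths[idx])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.transform is not None:
            image = self.transform(image=image)["image"]  # Albumentations returns a dict
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0  # HWC -> CHW
        return image, self.labels[idx]
```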
ResNet18
First, the ResNet18 from the baseline was used for training; then my own framework, with a few modifications, was added. The first training run reached a score of about 91.5%.

```bash
CUDA_VISIBLE_DEVICES=3 python train.py -f --cuda --net ResNet18 --epochs 50 -bs 5 -lr 0.001
```

Training command for the other models:

```bash
CUDA_VISIBLE_DEVICES=0 python train.py -f --cuda --net Model --epochs 50 -bs 5 -lr 0.001 -fe 5
```
It turns out that small models often already give good results; the EfficientNetv2 series in particular reaches relatively high accuracy on the validation set.
All of these runs use pretrained weights, since a model that already carries some knowledge tends to produce better results; in the runs below, the backbone is frozen for the first 5 epochs.
In addition, an early-stopping strategy was added to prevent overfitting (a minimal sketch follows).
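The early-stopping logic is not spelled out in the post; a minimal sketch based on validation accuracy, with an assumed patience value and hypothetical `train_one_epoch` / `evaluate` helpers, might look like this:

```python
import torch

best_val_acc = 0.0
patience, bad_epochs = 5, 0  # assumed patience

for epoch in range(epochs):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    val_acc = evaluate(model, val_loader)            # hypothetical helper
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        bad_epochs = 0
        torch.save(model.state_dict(), "best_model.pth")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stopping at epoch {epoch}")
            break
```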
Here are the best results for each model:
| Model | Epochs | Training settings | Train ACC (%) | Val ACC (%) |
|---|---|---|---|---|
| ResNet18 | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 97.12 |
| ConvNeXt-T | 50 | AdamW, lr = 0.0005, batch size = 8 | | |
| EfficientNetv2-T | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 91.12 |
| EfficientNetv2-b0 | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 96.63 |
| EfficientNetv2-b1 | 50 | AdamW, lr = 0.0005, batch size = 8 | 99.90 | 95.67 |
In fact, all the models used so far are small ones; larger models can be tried later to see whether they give better results.
IX. Submission Results
2022-07-15: currently ranked 7th, with a score of 0.93121.

2022-07-15: currently ranked 5th, with a score of 0.94377. This time only one extra data augmentation was added, and it produced a noticeably better result.

| ID | Status | Score | Submitted file | Notes | Submitter | Submission time |
|---|---|---|---|---|---|---|
| 1 | Score returned | 0.94377 | submit_ensemble_07-15-16-56-00.csv | Ensemble of several EfficientNetv2-series models plus the small ResNet18 model, with random-masking data augmentation | Good at shooters pikachu | 2022-07-15 17:14:56 |
| 2 | Score returned | 0.93121 | submit_ensemble_07-15-01-03-09.csv | Ensemble of several EfficientNetv2-series models plus the small ResNet18 model, without data augmentation | Good at shooters pikachu | 2022-07-15 09:53:24 |
| 3 | Score returned | 0.93121 | submit_EfficientNetv2-S_07-15-01-03-09.csv | Three models (ConvNeXt-T, ResNet18, EfficientNetv2-S), without data augmentation | Good at shooters pikachu | 2022-07-15 01:04:40 |
| 4 | Score returned | 0.90679 | sub_convnext-T.csv | ConvNeXt-T model, trained with improvements, without data augmentation | Good at shooters pikachu | 2022-07-14 22:20:30 |
| 5 | Score returned | 0.9145 | sub.csv | ResNet18 model from the baseline, trained with improvements; final test result | Good at shooters pikachu | 2022-07-14 16:54:44 |
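The ensembling mentioned in the submission notes is not detailed in the post; a common approach, sketched here under the assumption that the softmax probabilities of the trained models are simply averaged, is:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(models, images, num_classes=24):
    """Average per-model class probabilities and return the predicted class."""
    probs = torch.zeros(images.size(0), num_classes, device=images.device)
    for model in models:
        model.eval()
        probs += F.softmax(model(images), dim=1)
    probs /= len(models)
    return probs.argmax(dim=1)
```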