Notes on the main techniques involved in voice conversion
2022-07-18 22:23:00 【Wsyoneself】
- Speech analysis and synthesis. The main analysis-synthesis methods at present are:
  - Harmonic plus noise model (HNM): divides the signal into a harmonic component and a noise component. The harmonic component represents the low-frequency part of the signal and can be described by three parameters: fundamental frequency, amplitude, and phase. The noise component represents the high-frequency part of the signal and can be modeled as Gaussian white noise passed through a high-pass filter. The amplitude and phase values are then computed with a specific algorithm.
  - STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum): a model built on a pitch-adaptive time-frequency spectral smoothing algorithm, which reduces the interference between signal periodicity and the spectrum.
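The HNM decomposition described above can be illustrated with a small synthesis sketch. It is not the full HNM analysis procedure: the function name `synthesize_hnm_frame`, the first-order difference filter used as the high-pass stage, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def synthesize_hnm_frame(f0, amplitudes, phases, noise_gain,
                         sample_rate=16000, num_samples=400, seed=0):
    """Build one frame as a harmonic part plus high-passed Gaussian noise.

    f0, amplitudes and phases are the three harmonic parameters named in
    the text; everything else is an illustrative assumption.
    """
    rng = random.Random(seed)

    # Harmonic part: sinusoids at integer multiples of the fundamental.
    harmonic = []
    for n in range(num_samples):
        t = n / sample_rate
        value = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        harmonic.append(value)

    # Noise part: Gaussian white noise through a crude high-pass filter
    # (first-order difference, y[n] = x[n] - x[n-1]).
    white = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    noise = [
        noise_gain * (white[n] - (white[n - 1] if n > 0 else 0.0))
        for n in range(num_samples)
    ]

    signal = [h + z for h, z in zip(harmonic, noise)]
    return signal, harmonic, noise


if __name__ == "__main__":
    frame, _, _ = synthesize_hnm_frame(
        f0=100.0, amplitudes=[1.0, 0.5], phases=[0.0, 0.0], noise_gain=0.05
    )
    print(len(frame))
```

A real system would estimate these parameters from recorded speech frame by frame; here they are simply passed in so the two components stay visible.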
- Speech feature parameter extraction. The mapping features currently extracted in voice conversion are mainly local features that carry segmental information and context features that carry suprasegmental information. Local features are mainly parameters such as the spectral envelope, cepstrum, and formants; the most commonly used are line spectral frequencies (LSF) and Mel-frequency cepstral coefficients (MFCC), which take human auditory characteristics into account. Context features mainly refer to the dynamic information between speech frames.
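One common way to capture dynamic information between frames is to append delta (first-order regression) coefficients to each frame's static features. The notes do not name a specific method, so the sketch below is an assumption; `delta_features` and its `window` parameter are hypothetical names.

```python
def delta_features(frames, window=2):
    """Regression-based delta coefficients for a sequence of feature frames.

    frames: list of equal-length lists of floats (e.g. cepstral vectors).
    Frames beyond the sequence edges are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


if __name__ == "__main__":
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(delta_features(ramp, window=2))
```

For a feature that rises by one unit per frame, the delta in the interior of the sequence is 1.0, which matches the intuition of a per-frame slope.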
- Speech time alignment:
  - For parallel corpora: the most commonly used method is dynamic time warping (DTW), which computes the best time alignment between each utterance pair or each phoneme pair. After DTW, the result is a pair of source and target feature sequences of equal length.
  - For non-parallel corpora: a vocoder-free voice conversion method based on WaveNet can be used. This method does not need to process intermediate features; instead it uses WaveNet to map phonetic posteriorgrams directly to waveform samples, which avoids the estimation errors introduced by a vocoder and by feature conversion.
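The DTW step for parallel corpora can be sketched as follows. This is a minimal textbook version with a Euclidean frame distance and the basic three-direction step pattern, both of which are assumptions; the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(source, target):
    """Align two feature sequences with dynamic time warping.

    Returns the total cost, the warping path as (source_idx, target_idx)
    pairs, and the two sequences expanded along the path so that they
    have equal length.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Accumulated cost matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat target frame
                                 cost[i][j - 1],      # repeat source frame
                                 cost[i - 1][j - 1])  # advance both

    # Backtrack from the end to recover the best path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()

    aligned_source = [source[a] for a, _ in path]
    aligned_target = [target[b] for _, b in path]
    return cost[n][m], path, aligned_source, aligned_target


if __name__ == "__main__":
    src = [[0.0], [1.0], [2.0]]
    tgt = [[0.0], [1.0], [1.0], [2.0]]
    total, path, a_src, a_tgt = dtw_align(src, tgt)
    print(total, path, len(a_src) == len(a_tgt))
```

The two returned sequences are the equal-length source and target pairs mentioned above, ready to be used as training pairs for a conversion function.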
- Evaluation of conversion quality:
  - Objective evaluation: based on measuring the distortion of the speech data. A distance criterion is used to measure the similarity between the converted speech and the original target speech, which gives a basis for judging how good a conversion method is. The main objective metrics are mean squared error (MSE), spectral distortion (SD), and mel-cepstral distortion (MCD). The smaller the MSE, SD, and MCD values, the smaller the distortion and the higher the conversion accuracy.
  - Subjective evaluation: relies on human listeners, who judge the speech by their subjective impression. Compared with objective evaluation, subjective evaluation results are more reliable. Subjective methods generally assess two aspects, speech quality and speaker similarity, and mainly use the mean opinion score (MOS) and the ABX test:
    - MOS test: evaluators rate their subjective impression of the test speech on a five-level scale. It can be used for subjective evaluation of speech quality and also for evaluating speaker similarity. The MOS is the average over all test utterances and all evaluators.
    - ABX test: evaluates the conversion mainly by the speaker similarity of the converted speech, borrowing from the principle of speaker recognition. During the test, evaluators listen to three utterances, A, B, and X, and judge whether A or B is closer to X in terms of speaker individuality (X is the converted speech; A and B are the source speech and the target speech, respectively). Finally, the judgments of all evaluators are tallied to compute the percentage of cases in which X sounds like the target speech.
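The metrics above can be sketched in a few lines. The MCD function uses the commonly cited form, (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²) averaged over time-aligned frames with coefficient 0 skipped; the notes give no formula, so treat that choice, the function names, and the convention that B is the target in ABX trials as assumptions.

```python
import math


def mel_cepstral_distortion(converted, target):
    """Mean MCD in dB over time-aligned mel-cepstral frames.

    Coefficient 0 (usually energy) is skipped, a common convention.
    """
    if not converted or len(converted) != len(target):
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for c, t in zip(converted, target):
        squared = sum((x - y) ** 2 for x, y in zip(c[1:], t[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(converted)


def mean_opinion_score(ratings):
    """Average of 1-5 ratings pooled over all utterances and evaluators."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return sum(ratings) / len(ratings)


def abx_target_preference(choices):
    """Percentage of ABX trials in which X was judged closer to B.

    Assumes A is the source speech and B is the target speech.
    """
    if not choices:
        raise ValueError("no judgments given")
    return 100.0 * sum(1 for c in choices if c == "B") / len(choices)


if __name__ == "__main__":
    conv = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
    targ = [[0.0, 0.9, 0.1], [0.0, 0.5, 0.4]]
    print(mel_cepstral_distortion(conv, targ))
    print(mean_opinion_score([5, 4, 3, 4]))
    print(abx_target_preference(["B", "B", "A", "B"]))
```

Lower MCD means less distortion, while higher MOS and a higher ABX target percentage indicate better perceived quality and closer speaker similarity.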