Research notes on the history of voice conversion
2022-07-18 22:23:00 【Wsyoneself】
- Vocal tract spectrum conversion: based on statistical analysis of the source and target speakers' speech, the spectrum is converted through a parameter mapping.
- Codebook mapping: vector quantization reduces the source and target speech features to a small set of codewords (codebooks). A clustering method then maps each source frame to its nearest source codeword and outputs the corresponding target codeword. Drawbacks: quantization makes the feature space discontinuous and ignores inter-frame information, so the conversion quality is poor. A series of improved codebook mapping methods was later proposed to solve the discontinuity problem, but they introduced over-smoothing.
- Gaussian mixture model (GMM) based conversion: the probability distribution of the observed data is represented as a weighted sum of Gaussian functions. In the joint-density variant, one GMM is fitted to the joint distribution of input and output features, and the output features are inferred from the input features and that GMM. Drawbacks: the original formulation estimates the model on source feature vectors only rather than on joint feature vectors, and inter-frame information is not adequately considered, so it is prone to overfitting and over-smoothing. Many mathematical refinements were later combined with GMMs, but because the GMM mapping itself is not one-to-one, the resulting over-smoothing has not been fundamentally solved.
- Hidden Markov model (HMM) based conversion: the dynamics of the speech signal are modeled by the HMM's hidden states and its state transition probability matrix. Drawback: because the number of hidden states is limited, the dynamic range of the modeled speech signal is limited, which restricts conversion accuracy.
- Frequency warping: the spectrum is stretched or compressed along the frequency axis to adjust formant positions and bandwidths, and the energy at each frequency is adjusted by amplitude scaling, giving a spectral mapping from the source to the target speaker. Characteristics: it preserves the naturalness of speech as far as possible and the converted speech quality is high, but speaker similarity is somewhat lacking, so it must be combined with other methods for further improvement.
- Neural network based conversion: fully convolutional networks, generative adversarial networks (GANs), and bidirectional long short-term memory (BLSTM) networks are used for high-accuracy sequence-to-sequence conversion of spectral features. Many other networks combine different speech features with different conversion models. Drawback: today's best deep learning models depend on very many parameters, so in non-cooperative scenarios with insufficient training data they overfit and performance drops sharply.
- Waveform generation based conversion: audio waveform samples are generated directly. The typical example is WaveNet, a deep autoregressive model built mainly on conditional probability modeling: various speech features serve as conditions, and a suitable autoregressive model is learned through training. Characteristics: the generated speech has high clarity and naturalness, good quality, and no over-smoothing, but generation is slow. How to avoid the speech collapse that sample-by-sample generation easily causes, and how to further improve the naturalness of converted speech, remain open research questions.
- Prosody conversion: mainly conversion of the pitch period, duration (timing), and energy. Vocal tract spectrum conversion, by contrast, is expressed as conversion of formant frequencies, formant bandwidths, spectral tilt, and similar parameters.
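The codebook mapping item above can be sketched as follows. This is a minimal illustration, not the original authors' method: it assumes the source and target frames are already time-aligned pairs (for example by DTW), builds the source codebook with plain k-means, and takes each target codeword as the mean of the target frames paired with that source cluster. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def nearest(codebook: List[Vector], v: Sequence[float]) -> int:
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def train_codebooks(src: List[Vector], tgt: List[Vector], k: int,
                    iters: int = 20, seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """Vector-quantize source frames with k-means, then pair each source
    codeword with the mean of the aligned target frames in its cluster."""
    rng = random.Random(seed)
    src_cb = rng.sample(src, k)  # initial centroids drawn from the data
    for _ in range(iters):
        labels = [nearest(src_cb, v) for v in src]
        for i in range(k):
            members = [v for v, lab in zip(src, labels) if lab == i]
            if members:  # keep the old centroid if a cluster empties
                src_cb[i] = mean(members)
    labels = [nearest(src_cb, v) for v in src]
    tgt_cb = []
    for i in range(k):
        paired = [t for t, lab in zip(tgt, labels) if lab == i]
        tgt_cb.append(mean(paired) if paired else src_cb[i])
    return src_cb, tgt_cb


def convert(frames: List[Vector], src_cb: List[Vector], tgt_cb: List[Vector]) -> List[Vector]:
    """Replace each input frame by the target codeword of its nearest source codeword."""
    return [tgt_cb[nearest(src_cb, f)] for f in frames]


if __name__ == "__main__":
    src = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    tgt = [(2 * s[0] + 1,) for s in src]  # toy "target speaker" features
    src_cb, tgt_cb = train_codebooks(src, tgt, k=2)
    print(convert([(0.05,), (0.15,), (9.9,)], src_cb, tgt_cb))
```

The inputs 0.05 and 0.15 come out as exactly the same codeword: the output is piecewise constant, which is the feature-space discontinuity the text attributes to quantization.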
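The inference step of the joint-density GMM approach in the list above is commonly written as the conditional expectation of the target feature given the source feature (the minimum mean-square-error mapping): ŷ = Σ_m P(m|x) [μ_y,m + (σ_yx,m / σ_xx,m)(x − μ_x,m)]. The sketch below assumes scalar features and a joint GMM whose parameters are already known (the EM fitting step is omitted); the parameter values in the example are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointComponent:
    """One component of a GMM over the joint vector z = [x, y] (scalar x, y)."""
    weight: float   # mixture weight w_m
    mu_x: float     # mean of the source feature
    mu_y: float     # mean of the target feature
    var_xx: float   # variance of x
    cov_yx: float   # covariance between y and x


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x: float, comps: List[JointComponent]) -> List[float]:
    """P(m | x), computed from the source-side marginal of each component."""
    likes = [c.weight * gaussian_pdf(x, c.mu_x, c.var_xx) for c in comps]
    total = sum(likes)
    return [lk / total for lk in likes]


def convert(x: float, comps: List[JointComponent]) -> float:
    """E[y | x]: posterior-weighted sum of per-component linear regressions."""
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(posteriors(x, comps), comps)
    )


if __name__ == "__main__":
    comps = [
        JointComponent(weight=0.5, mu_x=-5.0, mu_y=-10.0, var_xx=1.0, cov_yx=0.0),
        JointComponent(weight=0.5, mu_x=5.0, mu_y=10.0, var_xx=1.0, cov_yx=0.0),
    ]
    for x in (-5.0, 0.0, 5.0):
        print(x, round(convert(x, comps), 3))
```

At x = 0, halfway between the two components, the output is 0.0: an average of the two target means rather than either of them. This averaging is one way to see where the over-smoothing mentioned above comes from.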
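To make the HMM item above more concrete, the sketch below runs Viterbi decoding over a tiny HMM. The transition matrix encodes how hidden states follow one another, and every frame is explained by exactly one of a fixed number of states, which is the limitation the text points out. The two-state model and its probabilities are hypothetical toy values, and observation likelihoods are supplied directly per frame rather than computed by a real acoustic model.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(init: Sequence[float], trans: Sequence[Sequence[float]],
            obs_lik: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path.

    init[i]       : P(state_0 = i)
    trans[i][j]   : P(state_t = j | state_{t-1} = i)
    obs_lik[t][j] : p(frame_t | state_t = j)
    Works in the log domain to avoid underflow on long sequences.
    """
    n = len(init)
    score = [_log(init[j]) + _log(obs_lik[0][j]) for j in range(n)]
    back: List[List[int]] = []
    for t in range(1, len(obs_lik)):
        prev_best, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + _log(trans[i][j]))
            prev_best.append(i_best)
            new_score.append(score[i_best] + _log(trans[i_best][j]) + _log(obs_lik[t][j]))
        back.append(prev_best)
        score = new_score
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for prev_best in reversed(back):  # follow back-pointers from the last frame
        state = prev_best[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" states
    print(viterbi(init, trans, [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]))
```

In the example, the middle frame on its own favors state 1, but the sticky transition matrix keeps the whole path in state 0: the state dynamics, not the individual frame, decide the result.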
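The frequency warping item above can be illustrated on a single magnitude spectrum. The sketch assumes the simplest warping function, a linear one (output bin k reads the source spectrum at k / alpha), with linear interpolation between bins and an optional per-bin gain for amplitude scaling. Practical systems typically use piecewise-linear or bilinear warping functions estimated from data; the names and parameters here are hypothetical.

```python
from typing import List, Optional, Sequence


def interp(spectrum: Sequence[float], pos: float) -> float:
    """Linearly interpolate the spectrum at fractional bin pos (clamped at the edges)."""
    last = len(spectrum) - 1
    if pos <= 0:
        return spectrum[0]
    if pos >= last:
        return spectrum[last]
    i = int(pos)
    frac = pos - i
    return (1.0 - frac) * spectrum[i] + frac * spectrum[i + 1]


def warp_spectrum(spectrum: Sequence[float], alpha: float,
                  gains: Optional[Sequence[float]] = None) -> List[float]:
    """Linear frequency warping plus optional amplitude scaling.

    alpha > 1 stretches the spectrum (formants move up and widen),
    alpha < 1 compresses it (formants move down and narrow).
    gains, if given, scales the energy in each output bin.
    """
    out = [interp(spectrum, k / alpha) for k in range(len(spectrum))]
    if gains is not None:
        out = [g * v for g, v in zip(gains, out)]
    return out


if __name__ == "__main__":
    spec = [0.0] * 32
    spec[10] = 1.0  # a single "formant" peak at bin 10
    up = warp_spectrum(spec, 1.2)
    print(max(range(len(up)), key=up.__getitem__))
```

With alpha = 1.2 the peak moves from bin 10 to bin 12 and spreads over more bins, i.e. both formant position and bandwidth change, as the text describes.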
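Two ingredients of the WaveNet-style generation described above can be sketched briefly: 8-bit mu-law companding, which the WaveNet paper uses to turn each sample into one of 256 classes, and a sample-by-sample autoregressive loop following p(x) = ∏_t p(x_t | x_1, …, x_{t-1}, h), where h is the conditioning features. The loop shows why generation is slow: every new sample needs another model evaluation on the samples produced so far. `next_sample_probs` is a hypothetical stand-in for the trained network, not a real API.

```python
import math
import random
from typing import Callable, List, Sequence

MU = 255  # 8-bit mu-law: 256 quantization levels


def mu_law_encode(x: float) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, 255]."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int((y + 1.0) / 2.0 * MU + 0.5)


def mu_law_decode(q: int) -> float:
    """Inverse of mu_law_encode, up to quantization error."""
    y = 2.0 * q / MU - 1.0
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


# Hypothetical model interface: (recent past classes, conditioning) -> 256 probabilities.
Model = Callable[[Sequence[int], Sequence[float]], Sequence[float]]


def generate(next_sample_probs: Model, conditioning: Sequence[Sequence[float]],
             receptive_field: int, seed: int = 0) -> List[float]:
    """Autoregressive generation: one model call per output sample."""
    rng = random.Random(seed)
    history: List[int] = []
    for h_t in conditioning:  # one conditioning vector per output sample
        context = history[max(0, len(history) - receptive_field):]
        probs = next_sample_probs(context, h_t)
        history.append(rng.choices(range(MU + 1), weights=probs)[0])
    return [mu_law_decode(q) for q in history]
```

Mu-law spends more of the 256 levels on small amplitudes, where most speech samples lie, so the reconstruction error is small near zero and larger near full scale.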
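For the pitch part of prosody conversion, a widely used baseline (not described in the list above, so treat it as an illustrative substitute) is a linear transformation of log F0: normalize the source speaker's log F0 by its mean and standard deviation over voiced frames, then rescale it with the target speaker's statistics. The sketch assumes F0 contours in Hz with unvoiced frames marked as 0; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_f0_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of log F0 over voiced frames (f0 > 0)."""
    logs = [math.log(v) for v in f0 if v > 0]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(var)


def convert_f0(f0: Sequence[float], src_stats: Tuple[float, float],
               tgt_stats: Tuple[float, float]) -> List[float]:
    """log f0_y = (log f0_x - mu_x) * sigma_y / sigma_x + mu_y; unvoiced frames stay 0."""
    (mu_x, sd_x), (mu_y, sd_y) = src_stats, tgt_stats
    return [
        math.exp((math.log(v) - mu_x) * sd_y / sd_x + mu_y) if v > 0 else 0.0
        for v in f0
    ]


if __name__ == "__main__":
    src_train = [0, 100, 110, 90, 0, 120, 80]   # source contour (Hz), 0 = unvoiced
    tgt_train = [0, 200, 220, 180, 0, 240, 160]  # target contour (Hz)
    src_stats, tgt_stats = log_f0_stats(src_train), log_f0_stats(tgt_train)
    print([round(v, 1) for v in convert_f0([0, 100, 120], src_stats, tgt_stats)])
```

Because the toy target contour is exactly twice the source, the voiced frames come out doubled (100 Hz becomes about 200 Hz) while the unvoiced frame stays 0. Duration and energy conversion would need separate models.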