3 对照实验·实验记录
RVC-Boss edited this page 2023-07-26 16:47:16 +08:00
This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

一、对照实验数据集

训练集target speaker约8min。

采样试听(用于展示音色和训练集质量)来自 米津玄師《ピースサイン》

https://user-images.githubusercontent.com/129054828/236691565-591a0238-b260-46a5-a9fe-83fd4f8065ee.mp4

测试音频:夏真浔 翻唱 《冬之花》 第一段

https://user-images.githubusercontent.com/129054828/236691637-ab48a326-2016-4960-8d50-cf255e34fe65.mp4

before-baseline-version(史前版本)混音结果完整版(《冬の花》coverd by AI米津玄師)

https://www.bilibili.com/video/BV1Kb411d7zC

二、faiss索引对照updated20230428

结论:

1、nprobe增大对效果影响不大因此更新后从7降至1检索速度7倍

2、fastscanPQ128质量有损注意 wa ta shi no i no "ch"i暂时不采纳

3、top8进行加权混合代替top1显著削弱高频刺耳的现象提升了音频质量采纳。

https://user-images.githubusercontent.com/129054828/236691899-80900082-0fd8-4562-a936-c572688e7afe.mp4

https://user-images.githubusercontent.com/129054828/236691905-509d591a-52fb-4d10-9440-40c7e6ffb8c1.mp4

https://user-images.githubusercontent.com/129054828/236691908-f88fb16f-3623-4a86-9044-273a842af519.mp4

https://user-images.githubusercontent.com/129054828/236691910-912327cd-6604-46bb-b7bc-4e886489a883.mp4

三、backbone结构对照底模+小训练集fine tune

version:hubert_baseContentVec+add 3 period discriminators

harvest+邻域3的中值滤波+index_rate=1

hubert_base结构下中间hidden size为768结尾linear至256

C768不使用final_proj

C256使用final_proj

L9/L12hubert的特征层数

baseline当前版本C256L9

结论C768L12默秒全呼吸+辅音齿音电流声)。

https://user-images.githubusercontent.com/129054828/236691753-86c1242e-0fd5-4da0-b6e6-d14cbc5a3712.mp4

https://user-images.githubusercontent.com/129054828/236691760-4de3f6ee-6ff1-4475-b8d7-1b679729fca8.mp4

https://user-images.githubusercontent.com/129054828/236691763-41c3335e-7f32-471c-9532-bf2f95866148.mp4

https://user-images.githubusercontent.com/129054828/236691767-5ea7d837-87f2-40ed-a2f8-79f1daba8e7e.mp4

四、RVC_v3偷跑

大就是好!

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/e8b21d2a-d868-47ae-8a35-2108570914a6