diff --git a/FAQ-(Frequently-Asked-Questions).md b/FAQ-(Frequently-Asked-Questions).md
index 3415f8e..569dfef 100644
--- a/FAQ-(Frequently-Asked-Questions).md
+++ b/FAQ-(Frequently-Asked-Questions).md
@@ -18,30 +18,30 @@ In the future, weights/exp_name.pth and logs/exp_name/added_xxx.index will be me
Copying/sharing the several-hundred-MB pth files from the logs folder to the weights folder for forced inference may result in errors such as missing f0, tgt_sr, or other keys. Instead, use the ckpt tab at the bottom to extract the smaller model, choosing whether to include pitch information and the target audio sampling rate either manually or automatically (if the information can be found in logs/exp_name). After extraction, a 60+ MB pth file will appear in the weights folder, and you can refresh the voices to use it.
-## Q4:Connection Error.
+## Q5:Connection Error.
You may have closed the console (black command line window).
-## Q5:WebUI popup 'Expecting value: line 1 column 1 (char 0)'.
+## Q6:WebUI popup 'Expecting value: line 1 column 1 (char 0)'.
Please disable system LAN proxy/global proxy and then refresh.
-## Q6:How to train and infer without the WebUI?
+## Q7:How to train and infer without the WebUI?
Training script:
You can run training in the WebUI first; the command-line versions of dataset preprocessing and training will be displayed in the message window.
Inference script:
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/myinfer.py (to be updated)
-## Q7:Cuda memory error/Cuda out of memory.
+## Q8:CUDA error/CUDA out of memory.
Occasionally this is caused by a problem with the CUDA configuration or an unsupported device; more often, the graphics card has simply run out of memory.
For training, reduce the batch size (if even a batch size of 1 does not fit, you may need a different graphics card); for inference, adjust the x_pad, x_query, x_center, and x_max settings in config.py as needed. Cards with less than 4 GB of memory (e.g. the 1060 3 GB and various 2 GB cards) are not worth trying, while 4 GB cards still have a chance.
-## Q8:How many total_epoch are optimal?
+## Q9:How many total_epoch are optimal?
If the training dataset's audio quality is poor and the noise floor is high, 20-30 epochs are sufficient. Setting it higher won't improve the audio quality of a low-quality training set.
If the training set's audio quality is high, the noise floor is low, and the total duration is sufficient, you can increase it. 200 is acceptable (training is fast, and if you can prepare a high-quality training set, your GPU can likely handle a longer training run without issue).
-## Q9:How much training set duration is needed?
+## Q10:How much training set duration is needed?
A dataset of around 10 to 50 minutes is recommended.
@@ -53,19 +53,19 @@ There are some people who have trained successfully with 1min to 2min data, but
Training with less than 1 minute of data has not succeeded so far and is not recommended.
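To check where a dataset falls relative to these recommendations, a small helper can total the duration of its audio. This is a hypothetical sketch, not part of the WebUI: it assumes the training set is a folder of uncompressed PCM .wav files (Python's standard wave module cannot read MP3 or FLAC), and the thresholds simply restate the figures above.

```python
import wave
from pathlib import Path


def dataset_minutes(folder):
    """Return the total duration, in minutes, of all .wav files under folder."""
    total_seconds = 0.0
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60.0


def check_dataset(folder):
    """Compare the dataset length with the durations suggested in this FAQ."""
    minutes = dataset_minutes(folder)
    if minutes < 1:
        verdict = "under 1 min: not recommended"
    elif minutes < 10:
        verdict = "below the recommended 10-50 min range"
    elif minutes <= 50:
        verdict = "within the recommended 10-50 min range"
    else:
        verdict = "above the recommended 10-50 min range"
    return minutes, verdict
```

For example, `check_dataset("dataset/my_voice")` (a hypothetical path) returns the total minutes and a short verdict string.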
-## Q10:What is the index rate for and how to adjust it?
+## Q11:What is the index rate for and how to adjust it?
If the audio quality of the pretrained base model and the inference source is higher than that of the training set, they can raise the audio quality of the inference result, but at the cost of the timbre possibly drifting toward that of the base model/inference source rather than that of the training set. This is generally referred to as "timbre leakage".
The index rate is used to reduce or resolve timbre leakage. If the index rate is set to 1, there is theoretically no timbre leakage from the inference source, and the timbre leans fully toward the training set. If the training set has lower audio quality than the inference source, a higher index rate may reduce the audio quality. Setting it to 0 turns off retrieval blending, so it no longer protects the training set's timbre.
If the training set has good audio quality and a long duration, increase total_epoch. The model then relies less on the inference source and the pretrained base model, so there is little timbre leakage; in that case the index_rate matters little, and you may not even need to create/share the index file.
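To make the index rate concrete, here is a simplified sketch of retrieval blending: each frame of the source features is replaced by a weighted average of its k nearest training-set features, and the result is mixed with the original frame in proportion to index_rate. It assumes features are plain NumPy arrays and uses brute-force search in place of a real index file; the function name, the default k=8, and the inverse-square distance weighting are assumptions loosely modelled on RVC's inference pipeline, not its exact code.

```python
import numpy as np


def blend_with_index(source_feats, train_feats, index_rate, k=8, eps=1e-8):
    """Hypothetical retrieval-blending sketch.

    source_feats: (T, D) features extracted from the inference source.
    train_feats:  (N, D) features stored in the index (from the training set).
    index_rate:   0.0 keeps the source features; 1.0 replaces them entirely
                  with features retrieved from the training set.
    """
    if index_rate == 0:
        return source_feats  # retrieval is skipped entirely
    k = min(k, len(train_feats))
    # Squared Euclidean distance between every source frame and every stored frame.
    dists = ((source_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]              # (T, k) nearest neighbours
    nearest = np.take_along_axis(dists, idx, axis=1)    # (T, k) their distances
    # Closer neighbours get larger weights; weights sum to 1 for each frame.
    weights = 1.0 / (nearest + eps) ** 2
    weights /= weights.sum(axis=1, keepdims=True)
    retrieved = (train_feats[idx] * weights[:, :, None]).sum(axis=1)  # (T, D)
    # Linear mix: index_rate controls how much training-set timbre is used.
    return index_rate * retrieved + (1.0 - index_rate) * source_feats
```

With index_rate=1 the output consists only of retrieved training-set features (hence, in theory, no leakage from the source); with index_rate=0 the source features pass through unchanged.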
-## Q11:How to choose the gpu when inferring?
+## Q12:How to choose the GPU when inferring?
In config.py, set the card number after "device cuda:".
The mapping between card numbers and graphics cards can be seen in the graphics card information section of the training tab.
-## Q12:How to use the model saved in the middle of training?
+## Q13:How to use the model saved in the middle of training?
Extract it using the model extraction section at the bottom of the ckpt processing tab.