---------------------------------------------------------------------
CSTR VCTK Corpus
English Multi-speaker Corpus for CSTR Voice Cloning Toolkit
(Version 0.92)
RELEASE September 2019
The Centre for Speech Technology Research
University of Edinburgh
Copyright (c) 2019
Junichi Yamagishi
jyamagis@inf.ed.ac.uk
---------------------------------------------------------------------

Overview

This CSTR VCTK Corpus includes speech data uttered by 110 English
speakers with various accents. Each speaker reads out about 400
sentences, which were selected from a newspaper, the rainbow passage
and an elicitation paragraph used for the speech accent archive.

The newspaper texts were taken from Herald Glasgow, with permission
from Herald & Times Group. Each speaker has a different set of the
newspaper texts, selected based on a greedy algorithm that increases the
contextual and phonetic coverage. The details of the text selection
algorithms are described in the following paper:

C. Veaux, J. Yamagishi and S. King,
"The voice bank corpus: Design, collection and data analysis of
a large regional accent speech database,"
https://doi.org/10.1109/ICSDA.2013.6709856

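As a rough illustration of greedy coverage-driven text selection, the
sketch below repeatedly picks the sentence that adds the most units not
yet covered. It is not the procedure from the paper cited above: the
function names (units, greedy_select) are hypothetical, and character
bigrams are used only as a stand-in for the contextual and phonetic units
a real system would count.

```python
def units(sentence):
    """Stand-in coverage units: character bigrams of the lowercased text.

    Assumption: a real selection tool would use phones in context,
    not letters. This keeps the example self-contained.
    """
    text = "".join(ch for ch in sentence.lower() if ch.isalpha() or ch == " ")
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(sentences, k):
    """Pick up to k sentences, each time taking the one that adds the most
    not-yet-covered units. Stops early once no sentence adds anything new."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    for _ in range(k):
        # Ties are broken by position: max() returns the first best candidate.
        best = max(remaining, key=lambda s: len(units(s) - covered), default=None)
        if best is None or not (units(best) - covered):
            break
        chosen.append(best)
        covered |= units(best)
        remaining.remove(best)
    return chosen, covered
```

Calling greedy_select(candidate_sentences, 400) on a pool of newspaper
sentences would return a list of at most 400 sentences together with the
set of units they cover. Running it per speaker on different candidate
pools is one way to give each speaker a different sentence set.
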
The rainbow passage and elicitation paragraph are the same for all
speakers. The rainbow passage can be found at the International Dialects
of English Archive:
(http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation
paragraph is identical to the one used for the speech accent archive
(http://accent.gmu.edu). The details of the speech accent archive
can be found at
http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf

All speech data was recorded using an identical recording setup: an
omni-directional microphone (DPA 4035) and a small diaphragm condenser
microphone with very wide bandwidth (Sennheiser MKH 800), at a 96 kHz
sampling frequency and 24 bits, in a hemi-anechoic chamber at
the University of Edinburgh. (However, the MKH 800 recordings of two
speakers, p280 and p315, had technical issues.)
All recordings were converted to 16 bits, downsampled to
48 kHz, and manually end-pointed.

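The conversion from 96 kHz / 24 bits to 48 kHz / 16 bits can be pictured
with the minimal sketch below. It is an assumption-laden illustration, not
the tool chain used to produce the corpus: the filter design (a 63-tap
windowed-sinc low-pass), the rounding without dither, and all function
names are hypothetical choices made here for clarity.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.225):
    """Windowed-sinc FIR low-pass filter (Hamming window).

    cutoff is a fraction of the input sampling rate:
    0.225 * 96 kHz = 21.6 kHz, below the new 24 kHz Nyquist frequency.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hamming)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC


def downsample_by_two(samples, taps):
    """Low-pass filter, then keep every second sample (96 kHz -> 48 kHz).

    Samples outside the signal are treated as zero.
    """
    half = len(taps) // 2
    out = []
    for centre in range(0, len(samples), 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = centre + k - half
            if 0 <= idx < len(samples):
                acc += tap * samples[idx]
        out.append(acc)
    return out


def to_int16(samples_24bit):
    """Requantise 24-bit-scaled samples to 16 bits: divide by 2**8, round,
    and clip to the int16 range. No dither is applied in this sketch."""
    out = []
    for s in samples_24bit:
        v = int(round(s / 256))
        out.append(max(-32768, min(32767, v)))
    return out
```

A typical call order would be to_int16(downsample_by_two(x, lowpass_taps()))
for a list x of 24-bit sample values. Filtering before discarding samples
matters: content above 24 kHz in the 96 kHz recording would otherwise alias
into the audible band of the 48 kHz signal.
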
This corpus was originally intended for HMM-based text-to-speech synthesis
systems, especially for speaker-adaptive HMM-based speech synthesis
that uses average voice models trained on multiple speakers and speaker
adaptation technologies. This corpus is also suitable for DNN-based
multi-speaker text-to-speech synthesis systems and waveform modeling.

COPYING

This corpus is licensed under the Creative Commons License: Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/legalcode

VCTK VARIANTS

There are several variants of the VCTK corpus:

Speech enhancement
- Noisy speech database for training speech enhancement algorithms and TTS models, where we artificially added various types of noise to VCTK: http://dx.doi.org/10.7488/ds/2117
- Reverberant speech database for training speech dereverberation algorithms and TTS models, where we artificially added various types of reverberation to VCTK: http://dx.doi.org/10.7488/ds/1425
- Noisy reverberant speech database for training speech enhancement algorithms and TTS models: http://dx.doi.org/10.7488/ds/2139
- Device Recorded VCTK, where speech signals of the VCTK corpus were played back and re-recorded in office environments using relatively inexpensive consumer devices: http://dx.doi.org/10.7488/ds/2316
- The Microsoft Scalable Noisy Speech Dataset (MS-SNSD): https://github.com/microsoft/MS-SNSD

ASV and anti-spoofing
- Spoofing and Anti-Spoofing (SAS) corpus, a collection of synthetic speech signals produced by nine techniques, two of which are speech synthesis and seven of which are voice conversion. All of them were built using the VCTK corpus: http://dx.doi.org/10.7488/ds/252
- Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015) Database. This database consists of synthetic speech signals produced by ten techniques and was used in the first Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015): http://dx.doi.org/10.7488/ds/298
- ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database. This database was used in the 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2019): https://doi.org/10.7488/ds/2555

ACKNOWLEDGEMENTS

The CSTR VCTK Corpus was constructed by:

Christophe Veaux (University of Edinburgh)
Junichi Yamagishi (University of Edinburgh)
Kirsten MacDonald

The research leading to these results was partly funded by EPSRC
grants EP/I031022/1 (NST) and EP/J002526/1 (CAF), by the RSE-NSFC
grant (61111130120), and by the JST CREST (uDialogue).

Please cite this corpus as follows:

Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald,
"CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit",
The Centre for Speech Technology Research (CSTR),
University of Edinburgh