Lojban diphone speech synthesizer

Contact Xavier if you'd like to contribute to this project.

We're making progress! For now, you can check out its first words here:
http://staticfree.info/projects/lojban_festvox/mi_nelci_la_lojban.ogg

Other TTS samples:
TTS of first paragraph of la nicte cadzu
TTS of second paragraph of la nicte cadzu. Contains at least one stress error: cabycte -> cabYcte
Slowed down TTS sample

Now, the rest of this probably only makes sense to those of you who are familiar with phonetics. I apologize in advance for the technical jargon.

What we need to do now is listen through the corpus and decide where the diphone boundaries go. We also have to find the "middle" of each diphone. I don't know exactly what the TTS system expects, but a preliminary rule of thumb that should at least yield consistent, if not correct, results is to put the middle mark at the boundary between the two phones. If there are two consecutive diphones, the part between the two middle marks should sound like a single phone.

Diphthongs (two vowels together, e.g. ai, oi, au) should be split where the sound changes. So when you see "a" turning into "i", split it right there.

For plosives (diphones like "a-p" and "k-u", where there is a burst of air coming from the mouth), the diphone split should be made before the opening phase of the plosive. For example, given the two diphones "a-p" and "p-a", half of the "a" and the silent part should end up in "a-p". The explosion and half of the next "a" should end up in "p-a".
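The two rules above can be sketched in a few lines of Python. This is only an illustration, not part of the project's tooling: the names `Phone`, `Diphone`, `cut_point` and `cut_diphones` are made up for the example, the times are invented, and cutting non-plosive phones at their exact midpoint is an assumption (in practice you decide by listening).

```python
from typing import List, NamedTuple, Optional


class Phone(NamedTuple):
    label: str
    start: float                   # seconds
    end: float                     # seconds
    burst: Optional[float] = None  # burst onset for plosives, None otherwise


class Diphone(NamedTuple):
    label: str
    start: float
    mid: float
    end: float


def cut_point(phone: Phone) -> float:
    """Where the diphone boundary falls inside a phone."""
    if phone.burst is not None:
        # Plosive: split just before the opening phase (the burst).
        return phone.burst
    # Assumption: any other phone is simply cut in half.
    return (phone.start + phone.end) / 2


def cut_diphones(phones: List[Phone]) -> List[Diphone]:
    """Turn a sequence of adjacent phones into diphones."""
    result = []
    for left, right in zip(phones, phones[1:]):
        result.append(Diphone(
            label=f"{left.label}-{right.label}",
            start=cut_point(left),
            mid=left.end,          # middle mark = boundary between the phones
            end=cut_point(right),
        ))
    return result


# Invented timings for the sequence "a p a".
phones = [
    Phone("a", 0.00, 0.20),
    Phone("p", 0.20, 0.30, burst=0.27),
    Phone("a", 0.30, 0.50),
]
for diphone in cut_diphones(phones):
    print(diphone)
```

With these numbers, "a-p" gets the second half of the first "a" plus the silent part of the "p", and "p-a" gets the explosion plus the first half of the next "a". The stretch between the two middle marks is exactly the "p".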

See here for more information on diphone tagging conventions:
http://www-2.cs.cmu.edu/~awb/papers/festvox/festvox_5.html#SEC26

The format of the index file is described at http://www.cstr.ed.ac.uk/projects/festival/manual/festival_20.html#SEC80.

A practical way of doing this is with Praat, http://www.fon.hum.uva.nl/praat/. Here is a short howto:

Praat objects window -> Read -> Read from file...
Select the file, and click Label and Segment -> To TextGrid...
Tier names: Diphone Middle. Leave Point tiers blank. Click OK.
Select both the sound file and the TextGrid. Click Edit.
Click anywhere in the waveform or spectrogram to move the cursor there. Then choose Boundary -> Add on selected tier (or Add on tier 2, etc.). You can always move the boundary later.
Click between two boundaries to select that segment. You can play it, and you will see the location of its start and end points in seconds.
Do the same with the middles, but this time click on the boundaries instead of between them. The exact location will be shown.
To use Xavier's conversion script, simply label the segments with their diphone (a-t, #-d) in the text box at the top of the edit window. Make sure you label the segment between two midpoints with the appropriate phoneme, not a diphone, since that segment is a single phone. See http://staticfree.info/projects/lojban_festvox/praat_timing.png for an example.
Back in the main window, select all the TextGrids that you've been working on and choose Write -> Write to text file. Note: if you save one TextGrid at a time, make sure to retain the original filename; otherwise the script won't have the sample name at all. Praat doesn't put the sample name in single TextGrid files for some reason.
Finally, run:

./util/TextGrid2index.pl -l ljb_diphone/ljb_diphone_hand_timed.index \
praat/praat.Collection > ljb_diphone/ljb_diphone.index

(with the appropriate filenames, of course)
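To make the relationship between the two tiers and the index concrete, here is a rough Python sketch. It is not Xavier's script and makes no claim about how TextGrid2index.pl works internally. It assumes that each index entry is one line holding the diphone name, the sample name, and the start, middle and end times in seconds (my reading of the Festival manual; the file header is left out), and that each diphone segment contains exactly one boundary from the Middle tier. The function name `index_lines`, the sample name `ak_0001` and all times are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def index_lines(sample: str,
                diphones: List[Tuple[str, float, float]],
                middles: List[float]) -> List[str]:
    """Pair each labelled Diphone-tier segment with its Middle-tier boundary.

    diphones: (label, start, end) segments from the Diphone tier.
    middles:  sorted boundary times from the Middle tier.
    """
    lines = []
    for label, start, end in diphones:
        if not label:
            continue  # unlabelled segment: nothing to index
        # First middle boundary strictly after the segment start.
        i = bisect_right(middles, start)
        if i >= len(middles) or middles[i] >= end:
            raise ValueError(f"no middle mark inside {label!r} in {sample}")
        mid = middles[i]
        lines.append(f"{label} {sample} {start:.3f} {mid:.3f} {end:.3f}")
    return lines


# Hypothetical sample with two hand-timed diphones.
diphones = [("#-d", 0.10, 0.30), ("d-a", 0.30, 0.52)]
middles = [0.22, 0.41, 0.60]
for line in index_lines("ak_0001", diphones, middles):
    print(line)
```

Raising an error when a diphone has no middle mark is a deliberate choice in this sketch: a forgotten boundary is easier to fix in Praat than to track down later in the synthesized voice.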

See the documentation included in the distribution for more instructions on using Praat to time diphones.

ID	Name	Comment	Uploaded	Size	Downloads
134	toi_ljb_phones.scm	Lojban phoneme files for FestVox	arj Sat 06 of Aug, 2005 22:44 GMT	4.88 Kb	3447
133	ljb_schema_pseudocode.doc	How the Lojban FestVox diphone list was generated.	arj Sat 06 of Aug, 2005 22:43 GMT	16.50 Kb	2690
101	since_masno.ogg	Slowed down TTS sample	arj Wed 27 of Apr, 2005 20:12 GMT	776.44 Kb	3340
100	kalifornias.ogg	TTS of second paragraph of la nicte cadzu	arj Wed 27 of Apr, 2005 20:11 GMT	127.19 Kb	3105
99	lnc-tts.ogg	TTS of first paragraph of la nicte cadzu	arj Wed 27 of Apr, 2005 20:11 GMT	100.38 Kb	3378
53	akflacs.zip	FestVox diphones as a zipped set of flac files.	rlpowell Sat 12 of Feb, 2005 22:42 GMT	14.35 Mb	10109
50	akwavs.zip	FestVox diphones as a zipped set of wav files.	rlpowell Sat 12 of Feb, 2005 22:40 GMT	25.80 Mb	12256
35	ljbdiph.list	Diphone List for FestVox	arj Sat 12 of Feb, 2005 20:34 GMT	51.38 Kb	3233
