Contact Xavier if you'd like to contribute to this project.
We're making progress! For now, you can check out its first words here:
http://staticfree.info/projects/lojban_festvox/mi_nelci_la_lojban.ogg
Now, the rest of this probably only makes sense to those of you who are familiar with phonetics. I apologize in advance for the technical jargon.
What we need to do now is listen through the corpus and decide where the diphone boundaries go. We also have to find the "middle" of each diphone. I don't know what the TTS system expects, but a preliminary rule of thumb that should at least yield consistent, if not correct, results is to put the middle mark at the boundary between the two phones. If there are two consecutive diphones, the part between the two middle marks should sound like one phone.
Diphthongs (two vowels together, e.g. ai, oi, au) should be split where the sound changes. So when you see "a" turning into "i", split it right there.
For plosives (diphones like "a-p" and "k-u", where there is a burst of air coming from the mouth), the diphone split should be placed before the opening phase of the plosive. E.g., for the two diphones "a-p" and "p-a", half of the "a" and the silent part should end up in "a-p". The explosion and half of the next "a" should end up in "p-a".
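To make the rules above concrete, here is a small Python sketch that turns a list of hand-marked phones into diphone start/middle/end times. It is only an illustration of the rule of thumb, not part of the project's tools: the names Phone, cut_point and diphone_marks are made up for this example, and cutting non-plosive phones at their midpoint is an assumption ("half of the a"), not something the TTS system is known to require.

```python
from typing import List, NamedTuple, Optional, Tuple


class Phone(NamedTuple):
    """A hand-marked phone (hypothetical structure, times in seconds)."""
    label: str
    start: float
    end: float
    # Where the phone is split between two diphones. For a plosive this
    # would be set just before the burst; if None, the midpoint is used.
    cut: Optional[float] = None


def cut_point(phone: Phone) -> float:
    """Return the time at which this phone is split between diphones."""
    if phone.cut is not None:
        return phone.cut
    return (phone.start + phone.end) / 2.0  # assumption: split in the middle


def diphone_marks(phones: List[Phone]) -> List[Tuple[str, float, float, float]]:
    """Return (name, start, middle, end) for each pair of adjacent phones.

    The middle mark is the boundary between the two phones, so the stretch
    between the middle marks of two consecutive diphones is one whole phone.
    Assumes phones are contiguous (left.end == right.start).
    """
    marks = []
    for left, right in zip(phones, phones[1:]):
        name = f"{left.label}-{right.label}"
        marks.append((name, cut_point(left), right.start, cut_point(right)))
    return marks


# Example with made-up times: "a p a", where the closure (silent part) of
# the "p" runs from 0.30 to 0.38 and the burst starts at 0.38.
example = [
    Phone("a", 0.10, 0.30),
    Phone("p", 0.30, 0.42, cut=0.38),
    Phone("a", 0.42, 0.60),
]
for name, start, mid, end in diphone_marks(example):
    print(f"{name} {start:.3f} {mid:.3f} {end:.3f}")
```

With these example times, "a-p" gets the second half of the first "a" plus the silent closure, and "p-a" gets the burst plus the first half of the next "a", as described above.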
See here for more information on diphone tagging conventions:
http://www-2.cs.cmu.edu/~awb/papers/festvox/festvox_5.html#SEC26
The format of the diphone index file is described at http://www.cstr.ed.ac.uk/projects/festival/manual/festival_20.html#SEC80.
A practical way of doing this is with Praat, http://www.fon.hum.uva.nl/praat/. Here is a short howto:
./util/TextGrid2index.pl -l ljb_diphone/ljb_diphone_hand_timed.index \
praat/praat.Collection > ljb_diphone/ljb_diphone.index
(with the appropriate filenames, of course)
See the documentation included in the distribution for more instructions on using Praat to time diphones.