How to generate lists yourself
- See discussion for details
- The Lojbanic corpus in a .tar.gz archive.
- Older word frequencies can be found here
- Utterance templates by frequency: a sorted list of "sentence templates" excerpted from IRC, showing which sequences of selma'o/word types are most common.
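The actual procedure is left to the discussion page. As a rough illustration only, a plain word-frequency list can be produced by tokenizing a corpus and counting tokens. This sketch assumes the corpus has been unpacked into a directory of UTF-8 `.txt` files (a hypothetical layout, not necessarily that of the archive above). It treats a word as a run of letters with internal apostrophes (so `do'u` stays one token and the pause in `.i` is dropped). It does not weed out English text or metalinguistic discussion.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a word is a run of letters, optionally joined by internal
# apostrophes (e.g. "do'u", "ma'oi"); periods and other punctuation are dropped.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def count_words(texts):
    """Count word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def count_corpus(directory):
    """Count words in every .txt file under `directory` (hypothetical layout)."""
    texts = (
        path.read_text(encoding="utf-8", errors="replace")
        for path in Path(directory).rglob("*.txt")
    )
    return count_words(texts)


if __name__ == "__main__":
    counts = count_words(["mi klama le zarci", "mi nelci do'u .i"])
    # Frequency-ordered list, most common first.
    for word, n in counts.most_common():
        print(n, word)
```

`Counter.most_common()` already returns the frequency-ordered list, so writing it out one word per line is all that remains.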
Robin Lee Powell's lists
Frequency-ordered gismu and cmavo word list, based on Lojban IRC, Alice, and a few other large texts. There is also a large selection of intermediary files, including pure frequency lists.
Rob Speer's lists
The following is about Rob Speer's frequency lists, which have fallen off the 'net. Some of them have been recovered and attached.
The word frequency lists as of 2003/4/30, stored on a separate server.
These frequency lists are drawn from a corpus containing the contents of the lojban.org/texts directory, most of this Wiki's texts in Lojban, as many IRC logs as I could find, the texts on CVS, and a large portion of the jbosnu archives. I spent some time weeding out most of the English text, and tried to avoid picking up metalinguistic discussion (a word frequency list based on the main mailing list showed that lujvo is one of the most commonly used words).
- Rob Speer's gismu frequencies
- Rob Speer's cmavo frequencies
- BROKEN LINK: cmavo compounds
- BROKEN LINK: lujvo (updated 2003/7/12; non-lujvo removed; malformed almost-lujvo marked with *)
- BROKEN LINK: fu'ivla
- BROKEN LINK: cmene
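The lists above are split by word type. As a rough sketch of how a combined frequency count could be divided up, a word can be classified by its shape. Gismu have the form CVCCV or CCVCV, simple cmavo are short forms such as V, CV, CVV and CV'V, and cmene end in a consonant. This is a heuristic assumed for illustration, not the method used for these lists. It does not check permissible consonant pairs, and it lumps lujvo, fu'ivla, cmavo compounds and stray non-Lojban tokens together as "other". The input is any word-to-count mapping, such as a `collections.Counter`.

```python
import re
from collections import Counter

VOWELS = "aeiou"
CONSONANTS = "bcdfgjklmnprstvxz"

# Gismu shape: CVCCV or CCVCV (consonant-pair validity is NOT checked).
GISMU_RE = re.compile(
    rf"^(?:[{CONSONANTS}][{VOWELS}][{CONSONANTS}][{CONSONANTS}][{VOWELS}]"
    rf"|[{CONSONANTS}][{CONSONANTS}][{VOWELS}][{CONSONANTS}][{VOWELS}])$"
)
# Simple cmavo shape: optional consonant, a vowel (y allowed),
# then optionally a second vowel, possibly after an apostrophe.
CMAVO_RE = re.compile(rf"^[{CONSONANTS}]?[{VOWELS}y](?:'?[{VOWELS}y])?$")
# Cmene (names) end in a consonant.
CMENE_RE = re.compile(rf"[{CONSONANTS}]$")


def classify(word):
    """Return a rough word class based only on the word's shape."""
    if GISMU_RE.match(word):
        return "gismu"
    if CMAVO_RE.match(word):
        return "cmavo"
    if CMENE_RE.search(word):
        return "cmene"
    return "other"  # lujvo, fu'ivla, cmavo compounds, non-Lojban text


def split_by_class(counts):
    """Split a word -> count mapping into one Counter per word class."""
    lists = {name: Counter() for name in ("gismu", "cmavo", "cmene", "other")}
    for word, n in counts.items():
        lists[classify(word)][word] = n
    return lists


if __name__ == "__main__":
    counts = Counter({"mi": 5, "klama": 3, "do'u": 2, "djan": 1, "selbri": 1})
    for name, sublist in split_by_class(counts).items():
        print(name, sublist.most_common())
```

Checking shape in this order (gismu, then cmavo, then cmene) keeps the rules simple. Anything finer, such as separating lujvo from fu'ivla, needs a real morphology parser rather than regular expressions.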