History: Corpora

Preview of version: 4

Here is some information that we hope will be useful to BPFK commissioners and to other people doing research on the Lojban language.

  • All the text on lojban.org can be searched from http://www.lojban.org/search.html
  • It can also be searched by doing a Google search with "site:.lojban.org". Try, for instance, http://www.google.com/search?hl=no&q=nurma+site%3A.lojban.org
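
For anyone who wants to build such site-restricted search links for many words at once, here is a minimal sketch. The function name `lojban_site_search_url` and the default `hl="no"` are only illustrative choices taken from the example link above; nothing here is an official tool.

```python
from urllib.parse import urlencode

def lojban_site_search_url(word: str, hl: str = "no") -> str:
    """Build a Google search URL restricted to lojban.org (hypothetical helper).

    The query string mirrors the example link on this page:
    the search word followed by "site:.lojban.org".
    """
    params = {"hl": hl, "q": f"{word} site:.lojban.org"}
    # urlencode turns the space into "+" and the colon into "%3A".
    return "http://www.google.com/search?" + urlencode(params)

print(lojban_site_search_url("nurma"))
```

Called with "nurma", this reproduces the example link given in the list.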


The two search options above will often return so many false positives in English that they are useless. The main source we have for Lojban usage is the IRC logs:

These logs are filtered line by line to exclude lines that contain too many words that are not possible Lojban word-forms, so the result is a very high-quality corpus of more than 360,000 words (as of February 12th, 2006).
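
The line-by-line filtering idea can be sketched roughly as follows. This is not the actual filter used to build the corpus: the word test here is a crude stand-in that only checks whether a word uses Lojban letters (no h, q or w), which is necessary but far from sufficient for a valid word-form, and the threshold `max_bad_fraction` is an assumed value, not the real one.

```python
import re

# Crude assumption: a "possible" word uses only Lojban letters
# (the Latin alphabet without h, q, w) plus apostrophe, comma and period.
# Real Lojban morphology is much stricter than this.
LOJBAN_CHARS = re.compile(r"^[abcdefgijklmnoprstuvxyz',.]+$")

def looks_like_lojban(word: str) -> bool:
    """Return True if the word passes the rough character-set test."""
    return bool(LOJBAN_CHARS.match(word.lower()))

def keep_line(line: str, max_bad_fraction: float = 0.2) -> bool:
    """Keep a line unless too many of its words fail the test.

    max_bad_fraction is a hypothetical threshold chosen for illustration.
    """
    words = line.split()
    if not words:
        return False  # drop empty lines
    bad = sum(1 for w in words if not looks_like_lojban(w))
    return bad / len(words) <= max_bad_fraction

def filter_corpus(lines):
    """Return only the lines that pass keep_line."""
    return [line for line in lines if keep_line(line)]

sample = ["coi ro do", "hello what is this", "mi klama le zarci"]
print(filter_corpus(sample))
```

A stricter version would replace `looks_like_lojban` with a real morphology check; the per-line thresholding logic would stay the same.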

Lojbab's old archives are at http://www.lojban.org/files/texts/archives/

We also have some contributed texts that were uploaded to the old Twiki and are (to my knowledge) not available elsewhere:

History

Date                           User     IP address       Comment                          Version
Sun 08 of Jun, 2014 19:22 GMT  mukti    216.194.27.154   -                                6
Fri 16 of Apr, 2010 06:21 GMT  rlpowell 64.81.66.169     -                                5
Sat 11 of Feb, 2006 23:48 GMT  arj      129.241.210.193  Updated word count               4
Sat 06 of Aug, 2005 21:38 GMT  arj      129.241.222.58   Lojbab's old archives            3
Tue 15 of Feb, 2005 00:44 GMT  arj      129.241.222.42   gram                             2
Mon 14 of Feb, 2005 18:22 GMT  arj      129.241.222.139  Added files, reversed pipe-link  1