gismu etymology
Posted by mublin on Thu 13 of Mar, 2008 20:05 GMT posts: 20
Use this thread to discuss the gismu etymology page.
Posted by mublin on Thu 13 of Mar, 2008 20:05 GMT posts: 20

coi rodo,

Does anybody know the format of the gismu etymology file 1?

Also, this file lists source language words in a Lojbanised form, in ASCII, without inflectional endings and with affricates reduced to simple spirants. There is mention somewhere (but I can't remember where) of a hardcopy with the source words in their original form. Would it be possible to scan this hardcopy and upload it to

It would be great to make the natural language origins of Lojban vocabulary more visible. For example, having access to the original words would open the way for an etymological section in a gismu dictionary, with source words in Unicode and with IPA transcription (and possibly Lojban/TLI Loglan correspondence as well, as this is already documented in 2).

1 2

--
mu'o mi'e mublin.

To unsubscribe from this list, send mail to with the subject unsubscribe, or go to, or if you're really stuck, send mail to for help.
Posted by lojbab on Thu 13 of Mar, 2008 23:52 GMT posts: 162

mublin wrote:
> Does anybody know the format of the gismu etymology file 1?

Yes. I wrote this entire message and then found that there is a file on the website that may have more or better explanations:

Those that weren't made by the word-making algorithm have no etymology, except as indicated by notes, and only a single line. Those made by the algorithm have several lines. I chose one as an example:

> 619a catni 56.40 authority 1/3o >4.0
> cuan atorati cakti autoriz vlastn sulta
> cuan atorati cakti autoriz vlastn tafuid
> (authority )
> 3/7 catni 56.40 3 3 4 3 3 0

Line one:

619a is a run number, which tells me where to find the actual run amongst several thousand pages of output.

catni is the word chosen.

56.40 is its calculated recognition score, with 100 being perfect but scores in the 30s and 40s more common.

authority is the English keyword.

1/3 means that the algorithm gave three acceptable possible words, of which catni was the first. The other two may have been eliminated by conflict with other gismu, or may not have offered as many options for rafsi.

The o immediately following is a fixed column marker that allowed me to quickly select these first lines in a text editor. All of the working files were created by hand using a text editor, and I used lots of shortcuts to save time and reduce errors (but I still made errors).

>4.0 means that the scores for this word (and the other two that were considered) were significantly better (4 points) than other candidates. This was noted in case conflicts existed for all candidates, to allow me to recognize the tradeoffs in choosing. Some other words were chosen with lower scores due to conflicts with the higher-scoring word. This is reflected in the notes to the right, sometimes indicating just how good or bad the score was.
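The fields of line one described above can be pulled apart mechanically. What follows is only a minimal sketch: the field names (`run`, `gismu`, `score`, `keyword`, `rank`, `candidates`, `notes`) are hypothetical labels, and the whitespace-separated layout is inferred from the single catni example rather than from the file's actual fixed-column format.

```python
import re
from typing import NamedTuple, Optional


class EntryHeader(NamedTuple):
    """First line of an etymology entry (field names are hypothetical)."""
    run: str          # run number, e.g. "619a"
    gismu: str        # the word chosen
    score: float      # recognition score, 100 being perfect
    keyword: str      # English keyword
    rank: int         # position of the chosen word among the candidates
    candidates: int   # number of acceptable candidate words
    notes: str        # anything after the "o" marker, e.g. ">4.0"


# Assumed layout: run, word, score, keyword, rank/candidates + "o", notes.
HEADER_RE = re.compile(
    r"^(?P<run>\S+)\s+(?P<gismu>[a-z']+)\s+(?P<score>\d+\.\d+)\s+"
    r"(?P<keyword>.+?)\s+(?P<rank>\d+)/(?P<candidates>\d+)o"
    r"(?:\s+(?P<notes>\S.*))?$"
)


def parse_header(line: str) -> Optional[EntryHeader]:
    """Return the parsed header, or None if the line has another shape."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    return EntryHeader(
        run=m.group("run"),
        gismu=m.group("gismu"),
        score=float(m.group("score")),
        keyword=m.group("keyword"),
        rank=int(m.group("rank")),
        candidates=int(m.group("candidates")),
        notes=m.group("notes") or "",
    )


header = parse_header("619a catni 56.40 authority 1/3o >4.0")
```

Lines that lack the `o` marker (word lists, the parenthesised keyword, the final score line) make `parse_header` return `None`, which mirrors how the marker was used to pick out first lines in a text editor.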
Lines 2 and 3 indicate the two sets of words that were run, which both gave this result (in this case because neither Arabic word contributed to the chosen word). There were actually many more sets of words run, and this only indicates the winning sets. 626e purci is an example of a word that had many tied winning sets, and either of two Russian words had a score of 3 letter matches in order out of 6 letters.

For example, following is the complete set of runs made for English keyword authority, as part of the 619a data runs (perhaps a dozen words, with 50-odd total combinations tried, which probably took around 4 hours on the original 8086 computer that did these runs - nowadays the whole run would be done in a minute or so). I've labeled the 6 languages - the English keyword is shown at the end of the line:

  Chinese English Hindi  Spanish   Russian    Arabic
> cuan    atorati cakti  autoriz   vlastn     sulta  authority
> cuan    atorati cakti  autoriz   palnamoci  sulta  authority
> cuan    atorati cakti  autoriz   vlastn     tafuid authority
> cuan    atorati cakti  autoriz   palnamoci  tafuid authority
> cuan    atorati adikar autoriz   vlastn     sulta  authority
> cuan    atorati adikar autoriz   palnamoci  sulta  authority
> cuan    atorati adikar autoriz   vlastn     tafuid authority
> cuan    atorati adikar autoriz   palnamoci  tafuid authority
> cuan    atorati cakti  autoridad aftaritiet sulta  authority
> cuan    atorati cakti  autoridad aftaritiet tafuid authority
> cuan    atorati adikar autoridad aftaritiet sulta  authority
> cuan    atorati adikar autoridad aftaritiet tafuid authority
> cuan    atorati cakti  autoriz   aftaritiet sulta  authority
> cuan    atorati cakti  autoriz   aftaritiet tafuid authority
> cuan    atorati adikar autoriz   aftaritiet sulta  authority
> cuan    atorati adikar autoriz   aftaritiet tafuid authority
> cuan    atorati cakti  mand      aftaritiet sulta  authority
> cuan    atorati cakti  mand      aftaritiet tafuid authority
> cuan    atorati adikar mand      aftaritiet sulta  authority
> cuan    atorati adikar mand      aftaritiet tafuid authority

In the final line, 3/7 is the English etymology score - 3 letters matching in order among the 7 in the Lojbanized form "atorati". The 56.40 is the score again, followed by 6 numbers giving the score in each of the six languages (divided by the number of letters in the Lojbanized wordform for that language, for the winning data set).

The etymology file does not contain notes on the errors that were made (like the fact that gismu was actually generated as gicmu). I know I prepared a list of known errors at one point; I cannot find it now, however.

-----------------------

> Also, this file lists source language words in a Lojbanised form, in
> ASCII, without inflectional endings and with affricates reduced to
> simple spirants.

and a few other rules, some of them source-language specific, but you have the most significant of them.

> There is mention somewhere (but I can't remember
> where) of a hardcopy with the source words in their original
> form. Would it be possible to scan this hardcopy and upload it to
>

Not reasonably. It's a big thick binder of one page per word, usually only one side but sometimes with notes on the back, all handwritten.

lojbab
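lojbab's account of the final line ("3 letters matching in order among the 7") can be illustrated with a short sketch. It makes two assumptions that the thread does not confirm: that "letters matching in order" can be approximated by a longest common subsequence, and that the final line is whitespace-separated as in the catni example. The names `letters_in_order` and `parse_final_line` are hypothetical.

```python
def letters_in_order(candidate: str, source: str) -> int:
    """Longest common subsequence length of two words.

    Assumption: this stands in for "letters matching in order"; the
    actual matching rules of the word-making algorithm are not given
    in the thread.
    """
    prev = [0] * (len(source) + 1)
    for ch in candidate:
        curr = [0]
        for j, sch in enumerate(source, start=1):
            if ch == sch:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def parse_final_line(line: str) -> dict:
    """Split a final line such as '3/7 catni 56.40 3 3 4 3 3 0'."""
    parts = line.split()
    matched, length = parts[0].split("/")
    return {
        # letters matched / letters in the Lojbanized English form
        "english_match": (int(matched), int(length)),
        "gismu": parts[1],
        "score": float(parts[2]),
        # six per-language figures, kept in the order they appear
        "per_language": [int(n) for n in parts[3:9]],
    }


final = parse_final_line("3/7 catni 56.40 3 3 4 3 3 0")
english = letters_in_order("catni", "atorati")
```

For catni against atorati the sketch gives 3, in line with the 3/7 in the example. For the Arabic form sulta it finds one shared letter where the file records 0, so the original matching rules likely differ from a plain longest common subsequence, at least for short matches.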