Lojban In General


gismu etymology

posts: 20
Use this thread to discuss the gismu etymology page.

coi rodo,

Does anybody know the format of the gismu etymology file [1]?

Also, this file lists source language words in a Lojbanised form, in
ASCII, without inflectional endings and with affricates reduced to
simple spirants. There is mention somewhere (but I can't remember
where) of a hardcopy with the source words in their original
form. Would it be possible to scan this hardcopy and upload it to
Lojban.org?

It would be great to make the natural language origins of Lojban
vocabulary more visible. For example, having access to the original
words would open the way for an etymological section in a gismu
dictionary, with source words in Unicode and with IPA transcription
(and possibly Lojban/TLI Loglan correspondence as well, as this is
already documented in [2]).

[1] http://www.lojban.org/tiki/tiki-index.php?page=gismu%20Etymology
[2] http://www.lojban.org/files/wordlists/oldlog.txt

--
mu'o mi'e mublin.


To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

posts: 162

mublin wrote:
> Does anybody know the format of the gismu etymology file 1?

Yes.

I wrote this entire message and then found that there is a file on the
website that may have more or better explanations:
http://www.lojban.org/publications/etymology/etysample.txt

Gismu that weren't made by the word-making algorithm have no etymology,
except as indicated by notes, and take up only a single line.

Those made by algorithm have several lines. I choose one for an example:

> 619a catni 56.40 authority 1/3o >4.0
> cuan atorati cakti autoriz vlastn sulta
> cuan atorati cakti autoriz vlastn tafuid
> (authority )
> 3/7 catni 56.40 3 3 4 3 3 0

Line one
619a is a run number, which tells me where to find the actual run
amongst several thousand pages of output.

catni is the word chosen. 56.40 is its calculated recognition score
with 100 being perfect but 30s and 40s more common.

authority is the English keyword.

1/3 means that the algorithm gave three acceptable possible words, of
which catni was the first. The other two may have been eliminated by
conflict with other gismu, or because they offered fewer options for rafsi.

The o immediately following is a fixed column marker that allowed me to
quickly select these first lines in a text editor. All of the working
files were created by hand using a text editor, and I used lots of
shortcuts to save time and reduce errors (but I still made errors).

The >4.0 means that the scores for this word (and the other two that were
considered) were significantly better (4 points) than those of other
candidates. This was noted in case conflicts existed for all candidates,
to allow me to recognize the tradeoffs in choosing. Some other words were
chosen with lower scores due to conflicts with the higher-scoring word.
This is reflected in the notes to the right, sometimes indicating just
how good or bad the score was.
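The first-line layout described above can be picked apart mechanically. What follows is only a minimal Python sketch, not the tooling used to build the file: the names FirstLine and parse_first_line are made up for illustration, and it assumes that the fields are whitespace-separated, that the n/m pair is always followed directly by the o marker, and that anything after the marker is a free-form note.

```python
import re
from typing import NamedTuple, Optional


class FirstLine(NamedTuple):
    """Fields of the first line of an entry (names are illustrative)."""
    run: str               # run number, e.g. "619a"
    gismu: str             # the word chosen
    score: float           # calculated recognition score
    keyword: str           # English keyword
    rank: int              # position of the chosen word among candidates
    candidates: int        # number of acceptable candidate words
    note: Optional[str]    # trailing note such as ">4.0", if any


# Assumed layout: run, gismu, score, keyword, rank/candidates + "o", note.
FIRST_LINE = re.compile(
    r"^(?P<run>\S+)\s+(?P<gismu>[a-z']+)\s+(?P<score>\d+\.\d+)\s+"
    r"(?P<keyword>.+?)\s+(?P<rank>\d+)/(?P<candidates>\d+)o"
    r"(?:\s+(?P<note>.*))?$"
)


def parse_first_line(line: str) -> Optional[FirstLine]:
    """Return the parsed fields, or None if the line does not fit the layout."""
    m = FIRST_LINE.match(line.strip())
    if m is None:
        return None
    return FirstLine(
        run=m["run"],
        gismu=m["gismu"],
        score=float(m["score"]),
        keyword=m["keyword"],
        rank=int(m["rank"]),
        candidates=int(m["candidates"]),
        note=m["note"],
    )


entry = parse_first_line("619a catni 56.40 authority 1/3o >4.0")
```

Lines for gismu that were not made by the algorithm would probably not match this pattern, so the function returns None for them rather than guessing.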

Lines 2 and 3 indicate the two sets of words that were run, which both
gave this result (in this case because neither Arabic word contributed
to the chosen word). There were actually many more sets of words run,
and this only indicates the winning sets.
626e purci is an example of a word that had many tied winning sets;
there, either of two Russian words had a score of 3 letters matching
in order out of 6 letters.

For example, following is the complete set of runs made for the English
keyword authority, as part of the 619a data runs (perhaps a dozen words,
with 50-odd total combinations tried, which probably took around 4 hours
on the original 8086 computer that did these runs - nowadays the whole
run would be done in a minute or so). I've labeled the 6 languages -
the English keyword is shown at the end of each line:
Chinese English Hindi Spanish Russian Arabic
> cuan atorati cakti autoriz vlastn sulta authority
> cuan atorati cakti autoriz palnamoci sulta authority
> cuan atorati cakti autoriz vlastn tafuid authority
> cuan atorati cakti autoriz palnamoci tafuid authority
> cuan atorati adikar autoriz vlastn sulta authority
> cuan atorati adikar autoriz palnamoci sulta authority
> cuan atorati adikar autoriz vlastn tafuid authority
> cuan atorati adikar autoriz palnamoci tafuid authority
> cuan atorati cakti autoridad aftaritiet sulta authority
> cuan atorati cakti autoridad aftaritiet tafuid authority
> cuan atorati adikar autoridad aftaritiet sulta authority
> cuan atorati adikar autoridad aftaritiet tafuid authority
> cuan atorati cakti autoriz aftaritiet sulta authority
> cuan atorati cakti autoriz aftaritiet tafuid authority
> cuan atorati adikar autoriz aftaritiet sulta authority
> cuan atorati adikar autoriz aftaritiet tafuid authority
> cuan atorati cakti mand aftaritiet sulta authority
> cuan atorati cakti mand aftaritiet tafuid authority
> cuan atorati adikar mand aftaritiet sulta authority
> cuan atorati adikar mand aftaritiet tafuid authority
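For anyone wanting to work with run lines like these, here is a small Python sketch that labels each column with its language and lists which source words were varied between runs. It assumes six whitespace-separated source words followed by the keyword, with an optional leading "> " quote marker; the function names parse_run_line and variants are invented for this example.

```python
LANGUAGES = ("Chinese", "English", "Hindi", "Spanish", "Russian", "Arabic")


def parse_run_line(line):
    """Split one run line into ({language: source word}, English keyword)."""
    fields = line.lstrip("> ").split()
    if len(fields) <= len(LANGUAGES):
        raise ValueError("expected six source words and a keyword: %r" % line)
    sources = dict(zip(LANGUAGES, fields[:len(LANGUAGES)]))
    keyword = " ".join(fields[len(LANGUAGES):])
    return sources, keyword


def variants(lines):
    """Collect the distinct source words tried for each language."""
    seen = {lang: set() for lang in LANGUAGES}
    for line in lines:
        sources, _keyword = parse_run_line(line)
        for lang, word in sources.items():
            seen[lang].add(word)
    return {lang: sorted(words) for lang, words in seen.items()}


runs = [
    "> cuan atorati cakti autoriz vlastn sulta authority",
    "> cuan atorati cakti autoriz vlastn tafuid authority",
    "> cuan atorati adikar mand aftaritiet tafuid authority",
]
tried = variants(runs)
```

With the three sample lines, tried["Arabic"] holds both sulta and tafuid while tried["Chinese"] holds only cuan, which makes it easy to see which columns had alternatives.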

In the final line, 3/7 is the English etymology score - 3 letters
matching in order among the 7 in the Lojbanized form "atorati". The
56.40 is the score again, and then follow 6 numbers with the score in
each of the six languages (divided by the number of letters in the
Lojbanized wordform for that language, for the winning data set).
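As an illustration of "letters matching in order", the count can be approximated as the length of the longest common subsequence of the gismu and the Lojbanized source word. This Python sketch is not the original program. It makes one loud assumption: counts below 3 are treated as 0, which reproduces the 0 in the Arabic column of the example; how the original program handled shorter matches is not stated in this thread. The function names are made up for the example.

```python
def letters_in_order(gismu, source):
    """Length of the longest common subsequence of the two words."""
    prev = [0] * (len(source) + 1)
    for g in gismu:
        cur = [0]
        for j, s in enumerate(source, 1):
            if g == s:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def column_score(gismu, source, minimum=3):
    """Match count, with counts under `minimum` treated as 0 (an assumption)."""
    count = letters_in_order(gismu, source)
    return count if count >= minimum else 0


# Winning set from the catni example, in the order
# Chinese, English, Hindi, Spanish, Russian, Arabic.
winning_set = ["cuan", "atorati", "cakti", "autoriz", "vlastn", "sulta"]
scores = [column_score("catni", word) for word in winning_set]
english = "%d/%d" % (scores[1], len(winning_set[1]))
```

Under that assumption the sketch gives 3 3 4 3 3 0 for the winning set and 3/7 for English, in line with the final line of the entry; swapping sulta for tafuid also yields 0, consistent with both sets producing the same result.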

The etymology file does not contain notes on the errors that were made
(like the fact that gismu was actually generated as gicmu). I know I
prepared a list of known errors at one point; I am not finding it however.

-----------------------


> Also, this file lists source language words in a Lojbanised form, in
> ASCII, without inflectional endings and with affricates reduced to
> simple spirants.

and a few other rules, some of them source-language specific, but you
have the most significant of them.

> There is mention somewhere (but I can't remember
> where) of a hardcopy with the source words in their original
> form. Would it be possible to scan this hardcopy and upload it to
> Lojban.org?

Not reasonably. It's a big thick binder of one page per word, usually
only one side but sometimes with notes on the back, all handwritten.

lojbab


