Lojban Ideography

Wouldn't it be neat if Lojban had its own ideographic system, maybe in parallel with the phonetic orthography? mi'e cein

Well, here are the beginnings of a proposal of mine. — tk1@

  • I like it! Here's a list of gismu, sorted according to the Lojban thesaurus, with proposed hanzi (well, kanji actually) for gismu. I've got nearly 900 kanji, which covers two-thirds of the 1342 gismu in that list. (I mostly went through and took the kanji used in a salient adjective or verb; since nouns often use more than one kanji, I generally picked one which I felt either conveyed the concept or which I used as an abbreviation for the multi-character compound, even if it may be semantically ambiguous (for example, see zgike, where the character usually means something like "enjoyable, fun", but is also the second character in ongaku "music").) --pne
    • :-) Wow, that's a cool list... I'll try to incorporate as much of its contents as I can into my proposal pronto. — tk1@
    • :( Only, I cannot read (display) the Japanese encoding — Aolung
      • Hmm... does your system do GB or Big5? If so I can try to convert them over to one of these. (Speaking of which, what encoding are you using on your home page? I tried GB, Big5, EUC-TW, SJIS, EUC-JP, UTF-8, UTF-7, even Korean, but all gave bogus results.) — tk1@
      • Usually I can read BIG5, GB and UTF-8. On some forums (e.g. CTB forum) I can only read UTF-8, and cannot write at all! Even copy-paste from my Wenlin software doesn't work there - and, obviously, here either! (My platform is MacOSX 10.2.6). — Aolung
      • Here is the list in Big5, with my own character proposals added in some places. (Now I just need to get it into GIF or PNG form to avoid this charset madness altogether...) There are 4 characters in the original list which are Japanese-specific and aren't in Big5: http://angelfire.com/folk/sm0p/cindu-gunka-greku-rupnu.gif (cindu gunka greku rupnu). From the list, it's apparent that some words have acquired very different semantics in Mandarin and Japanese... — tk1@
      • Great! Hen duoxie! May I suggest declaring the encoding in the header so the correct BIG5 text appears immediately, without needing to switch the browser's encoding by hand? I'll try to generate a gif-file from it - which might be huge, though. — Aolung
      • Done (I changed the file extension so that encoding tags can work). — tk1@
      • (update) I've started on a list of gismu-hanzi proposed mappings for my orthography proposal, based on pne's list. It's spread across a number of GIF and HTML files (they're generated automatically though, so updating them isn't too hard). I'm currently trying to determine the `best' hanzi assignments, but it seems doing a decent mapping will require huge doses of CJK-ology research... — tk1@
      • (another update) A medium-length sample of my proposed orthography is available on my wiki page.
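The charset juggling in this thread (a Japanese-encoded list converted to Big5, with a few characters that Big5 lacks) can be sketched roughly as follows. This is only an illustration, not the procedure tk1 actually used: the function names are made up here, and the codec names `euc_jp` and `big5` are Python's standard ones, chosen on the assumption that the original list was in EUC-JP.

```python
def transcode(data, source, target):
    """Re-encode bytes from `source` to `target`.

    Characters the target charset lacks are replaced with '?'.
    """
    return data.decode(source).encode(target, errors="replace")


def unencodable(text, target):
    """Return the distinct characters of `text` that `target` cannot
    represent, in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    # Hypothetical one-entry "list": the character for "night" in EUC-JP.
    japanese_bytes = "夜".encode("euc_jp")
    big5_bytes = transcode(japanese_bytes, "euc_jp", "big5")
    print(big5_bytes.decode("big5"))
    # Characters that would have to be supplied as images instead:
    print(unencodable(japanese_bytes.decode("euc_jp"), "big5"))
```

Running `unencodable` over the whole list before converting would show up front which entries need an image or a substitute character.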

The nice thing about ideographies is that they are (supposed to be) language-independent, so any existing one could be used for lojban. Of course, they are not so independent. And they need non-ideographic characters to handle grammatical devices (conversion, say): consider the Japanese use of Chinese characters. The sample seems to make some attempts in that direction, but needs discussion. pycyn

Re discussion: I'm all ears. :-) I chose Chinese ideograms because they're something I actually know (also, it seems to be the most widely used ideographic script at the moment). I find that certain words such as "gismu" have no corresponding Chinese character, even though "gismu" is a gismu (primitive concept) in Lojban; so indeed in a way ideographies aren't really that language independent. I'm in the process of writing out the details of my proposed script... but in the meantime, the sample I put up corresponds to pemcrxaiku #5, in case it helps the discussion. — tk1@despammed.com

(P.S. actually my main concern right now is whether there's a way to _typeset_ this sort of thing in an automatic fashion...)

  • What would most help discussion — for us non-Chinese (etc.?) readers — would be an analysis of the meaning — in Chinese and Lojban (and whatever else is relevant — some of that looks Mongolian or so) of the characters.
  • I don't remember exactly how it works in Chinese, but my vague memory is that Chinese has a word (hence a character?) for "predicate;" but maybe it is a phrase, "living word." And another, maybe "dead word," for grammatical tags, =? cmavo. pycyn
    • There are hardly any one-character words in Putonghua: "predicate" e.g. is _biao3yu3_ (表語) - can you read this? It's two characters.
      • No; I never remember to get a Chinese script decoder when the opportunity comes along. I remember something in Chao about the problems with the concept "word" for Chinese — I suppose I meant syllable-character confluence. pc
      • I can't read it, but I presume you mean http://shavian.org/lojban/biao3yu3.png. --pne.
      • Yes, that's it! (Yet you'd better convert the png to gif format ;-) to shrink file size) — .aulun.
        • Whatever it is, these are the first Chinese characters I could read directly. pc

BTW, I don't think it's a good idea to mix up Hanzi, Hiragana _and_ Manchu script! I once gave it a try with Hiragana and Hanzi, i.e. the pleasant Japanese way, yet this was just for ;-) — .aulun.

  • Re meaning analysis: OK, I'll field that. I'm using single Chinese characters to stand for certain gismu, glyphs modified from Tibetan numerals as short-hands for certain often-used Lojban words such as "le", and Manchu phonetic script for everything else. The Tibetan characters (which I've assigned in a somewhat arbitrary fashion) correspond to "le" (twice), "sel-" and "do". The four kanji are "ye4" (night/nightly), "xi3" (happy/happiness), "kong1" (empty/emptiness) and "yuan3" (far, distant, quality of being far); their Lojban pronunciations are indicated in furigana on the right. The Manchu words are just "oisai", "ca'o" and "ije" written out phonetically.
    • Why the mixture? I suspect the whole can be done in Chinese without too much violence, though a stylistic difference between brivla and cmavo would be nice. pc
      • Why: well, mainly because I'm bored. I did consider the possibility of writing everything in hanzi, but that feels so... lame. — tk1@
      • I like the mixture. To me, it corresponds to how Japanese is written: content words in kanji, flectional endings and particles in kana. Similarly here: gismu in hanzi, cmavo in something else (I probably wouldn't have picked Manchu, but it's an interesting idea). I think it would be interesting to extend the gismu = hanzi concept to brivla = hanzi, but I'm not sure how to distinguish between tanru and lujvo (the "prefixed number-of-components" thing is an interesting idea, though it messes up audiovisual isomorphism). Also, fu'ivla (which are brivla) can't really be built on hanzi in this manner. --pne.
        • I think audio-visual isomorphism is nice, but requiring different characters to be used for gismu and their corresponding rafsi would be sheer madness, hence this clunky lujvo notation. And fu'ivla are tough indeed. For the name part (e.g. -xaiku in pemcrxaiku), I can simply write some random hanzi denoting its meaning, indicate its pronunciation with ruby characters, and then quote or underline (leftline?) the whole thing. But where should the type (pemcr-) go? :-( tk1@
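The three-way split described in this thread (hanzi for gismu, shorthand glyphs for a few frequent words, phonetic spelling for everything else) might be sketched like this. The table below is an assumption for illustration only: the four characters are the ones mentioned above, but pairing them with `nicte`, `gleki`, `kunti` and `darno` is a guess, and the bracketed tokens merely stand in for the Tibetan-derived shorthands, which have no standard encoding.

```python
# Illustrative gismu -> hanzi table (an assumption, not tk1's actual list).
GISMU_HANZI = {"nicte": "夜", "gleki": "喜", "kunti": "空", "darno": "遠"}

# Placeholder tokens standing in for the shorthand glyphs.
SHORTHAND = {"le": "[le]", "do": "[do]"}


def classify(word):
    """Return (script, written form) for a single Lojban word."""
    if word in GISMU_HANZI:
        return ("hanzi", GISMU_HANZI[word])
    if word in SHORTHAND:
        return ("shorthand", SHORTHAND[word])
    # Anything without an entry is spelled out phonetically.
    return ("phonetic", word)


def render(text):
    """Classify every whitespace-separated word of `text`."""
    return [classify(word) for word in text.split()]


if __name__ == "__main__":
    for script, form in render("le nicte ca'o darno"):
        print(script, form)
```

A gismu with no suitable character, such as "gismu" itself, simply falls through to the phonetic branch.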

For this haiku, there happens to be a rather good match between gismu and the (modern Chinese) meanings of single characters. From what I understand, there's no concept of "word" in Chinese: "words" are just arbitrary groups of characters which often occur together, but which can also be broken down and analyzed. A character often denotes some nebulous concept, and its exact meaning and even part of speech often depends on its position, its context and any modifier characters. .aulun.: Just curious, why don't you think it's a good idea? — tk1@despammed.com

    • Because hardly any language, except for Chinese, is adequately expressed in Hanzi (this IMHO also goes for Japanese - although I feel that the specific combination of Kanji and Hiragana looks really beautiful!). The best way of writing Lojban (a very straightforward conlang) up to now seems to be Latin script. Others might look more decorative (e.g. Tengwar etc.) and should at least be phonetic (like Yiddish-Hebrew). Hiragana is very nice but - still - more or less defective. (I cannot judge Mongolian or Manchu scripts for this Lojban purpose, although I like these very much. I'd favour Mongolian language again being written in their beautiful ancient script instead of changing from Cyrillic to Latin.) BTW, I also had the idea to give gismu using single characters, yet came to the conclusion that this doesn't seem to be possible due to the fact that Chinese basic characters semantically do not match the stock of Lojban gismu. And what about all the Lojban compounds of rafsi+rafsi or rafsi+gismu called lujvo ;-) — Aolung
      • I suppose those are just the other Chinese words — frequently occurring clusters. But then we would need a device to distinguish lujvo from tanru — either a ligature or a separator. As for the semantic mismatch, Lojban has that built in, in the way it picked the Chinese contributions to the phonetics of the gismu. The connection is better than arbitrary, possibly even to the point of being confusing for a competent Chinese-reader. pc
      • For non-fitting gismu, my current idea is to `simply' fall back on writing them phonetically. — tk1@despammed.com
  • OK... I have fleshed out the details of my proposal (most of it anyway). Feel free to attack! — tk1@
  • I guess the only problem is in calling it an ideography, since almost everything has a phonological gloss and large chunks have no ideographic component. pne seems to have pushed the ideographic side a bit further. But note that most cmavo — BAI at least and probably some others — are as "predicative" as brivla, only in a more indirect way. I'm not sure what to make of that, nor of the fact that Chinese also has characters that function like at least some non-predicative cmavo. It does seem something more ideographic could be created. pc
    • I intended the glosses to be optional, as, again, with Japanese (in fact Chinese is sometimes also written with pinyin/bopomofo glosses, albeit less commonly). That aside, ancientscripts.com does mention a fundamental problem with the term "ideography" itself, arguing that current "ideographic" scripts are better termed "logographic" scripts, and I quite agree. And bai, bau etc. are driving me nuts. :-( tk1@
      • A decent point: there can't be a separate symbol for each and every idea and any attempt to build ideoglyphs up from a basic few will reveal some features of the underlying language (AN or NA, for example), while sentences will reveal even more (or fail to reveal and leave us uncertain what is meant). Unless, of course, the combinations are all symmetric (and transitive?), which is pretty unlikely (though AUI seems to think it is, to the point of not distinguishing between sentences and compounds. But then it is basically German, an I-E language for which this is often almost as plausible as for Buddhist Synthetic Sanskrit). One could throw in explicit modifier markers and other devices to make the word order match at worst the underlying language of the writer but not compel any particular order otherwise, but this would quickly get impractically complicated. On the other hand, since what is wanted is a LOJBAN ideography, none of these is a problem. pc
        • The problems with a LOJBAN ideography (aside from explaining why one wants to do it) are all in dealing with the threefold role of gismu-concepts in the grammar: tanru (including monadic), lujvo, and cmavo. I see no way out short of a small set of markers (two at least, five that I can see as maybe required) — and some of these will correspond to morphemes anyhow. Of course, the result will still be strictly logographic, but that is inevitable for a particular language. pc
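One of the marker devices raised above, the "prefixed number-of-components" notation for telling lujvo from tanru, could look roughly like this. It is a sketch under stated assumptions: the gismu-to-hanzi pairs are illustrative, and using an ASCII digit as the prefix with a space as the tanru separator is a choice made here, not something fixed by the proposal.

```python
# Illustrative gismu -> hanzi table (an assumption, not an agreed mapping).
GISMU_HANZI = {"nicte": "夜", "gleki": "喜", "kunti": "空", "darno": "遠"}


def write_tanru(components):
    """Write a tanru as separate characters, one per component gismu."""
    return " ".join(GISMU_HANZI[g] for g in components)


def write_lujvo(components):
    """Write a lujvo as a component count followed by the fused characters.

    The count tells the reader how many following characters belong to
    one compound word.
    """
    return str(len(components)) + "".join(GISMU_HANZI[g] for g in components)


if __name__ == "__main__":
    parts = ["nicte", "gleki"]
    print(write_tanru(parts))  # two independent words
    print(write_lujvo(parts))  # one compound word
```

As noted earlier in the thread, the prefix has no spoken counterpart, so this notation gives up audiovisual isomorphism in exchange for keeping one character per gismu.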

>From what I understand, there's no concept of "word" in Chinese: "words" are just arbitrary groups of characters...

  • As you know, there are precise words for these concepts: _ci2_ is "word" (i.e. a semantic unit, often composed of several characters), _zi4_ is "character", _yan2_ seems to be (spoken, uttered) "words". A _ci2_ is one or more _zi4_, whereas a _zi4_ sometimes (but not always) can be an entire _ci2_. As far as written texts go you're correct, of course, that characters and words are not too easily discernible by someone unfamiliar with vocabulary and syntax, since compound _ci2_ are not written in clusters. It is also true, and most interesting, that the semantic units (i.e. characters) usually consist of "hard kernels" and a more or less fuzzy (nebulous) "outer seam". This is a very powerful linguistic feature of Chinese with regard to creating new vocabulary: compounded _zi4_ semantically "overlap" and thus are able to give a wide spectrum of subtle shades and colours. — Aolung
    • You're right, I stand corrected. — tk1@

>I cannot judge Mongolian or Manchu scripts for this Lojban purpose, although I like these very much. I'd favour Mongolian language again being written in their beautiful ancient script

  • For the record, classical Mongolian is full of scripting ambiguity (a technical problem), so I can understand if some Mongols are looking to other scripts because of this. Manchu removes much of this ambiguity via diacritics. I concede though that writing Lojban in Manchu may occasionally produce rather ugly results, due to Lojban's phoneme distribution. Using other phonemic scripts is, of course, possible. — tk1@

Ugh, no competing ideography proposals around? :-(

  • Well, there are not many other (well-understood) ideographies available. The ones underlying Egyptian (and alphabets generally) are very restricted, similarly the various Mesopotamian forms. And Mayan may turn out not to be nearly as ideographic as folks of my generation were led to believe. I suppose there are some Conlang ideographic forms — neo-Urquharts like AUI (of course, those are not pictographic). pc

Created by sanxiyn. Last Modification: Friday 27 of June, 2003 11:53:07 GMT by sanxiyn.