lojgloss extraneous characters
Posted by pdf23ds on Wed 18 of Jun, 2008 11:05 GMT posts: 143
Use this thread to discuss the lojgloss extraneous characters page.
Posted by pdf23ds on Wed 18 of Jun, 2008 11:05 GMT posts: 143
How should I handle non-lojban characters? I can strip out some that should be superfluous, and others that I can't make any sense out of, but should I translate some characters into cmavo? I'm thinking of parentheses, braces, and brackets here. I could translate parentheses into to-toi, and square brackets into sei-se'u. Any other ideas?

Should I just ignore any random non-lojban characters?

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)

To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Wed 18 of Jun, 2008 13:04 GMT
On 6/18/08, Chris Capel <pdf23ds@gmail.com> wrote:
> How should I handle non-lojban characters? I can strip out some that
> should be superfluous, and others that I can't make any sense out of,
> but should I translate some characters into cmavo? I'm thinking of
> parentheses, braces, and brackets here. I could translate parentheses
> into to-toi, and square brackets into sei-se'u. Any other ideas?

Sometimes people write "(to ..toi)", which would translate to "to to...toi toi". That's still grammatical, but probably not what was intended. Similarly "?" is sometimes used along with question words but not instead of them. ";" is sometimes used for "pi'e".

> Should I just ignore any random non-lojban characters?

Probably some like "_" should be taken as a space.

mu'o mi'e xorxes
Posted by skaryzgik on Wed 18 of Jun, 2008 21:31 GMT posts: 5
On Wed, Jun 18, 2008 at 6:02 AM, Chris Capel <pdf23ds@gmail.com> wrote:
> I could translate parentheses
> into to-toi, and square brackets into sei-se'u. Any other ideas?

If you're doing that, you could translate something like curly braces into tu'e-tu'u.

Curly braces remind me of something. Does there exist a program that will translate math formulae or expressions into lojban mekso and/or vice versa?

.imu'omi'e .skaryzgik.
--
.i ko tcesi'a la .diskord. http://skaryzgik.blogspot.com
.i mi'e la poi jitro be lo jdaca'i ku'o .skaryzgik. poi raibalralju selsi'afanva
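The bracket suggestions so far (parentheses as to-toi, square brackets as sei-se'u, curly braces as tu'e-tu'u, "_" as a space, and not doubling a cmavo the writer already spelled out, as in "(to ..toi)") could be sketched roughly as below. This is only an illustration in Python, not Lojgloss's actual code: the names BRACKET_CMAVO and translate_brackets are hypothetical, and treating a leading "." on the neighbouring word as a pause to be ignored is an assumption.

```python
# Hypothetical mapping taken from the suggestions in this thread;
# it is not Lojgloss's real table.
BRACKET_CMAVO = {
    "(": "to", ")": "toi",
    "[": "sei", "]": "se'u",
    "{": "tu'e", "}": "tu'u",
}
OPENERS = "([{"


def translate_brackets(text: str) -> str:
    """Replace brackets with cmavo unless the cmavo is already written next to them."""
    text = text.replace("_", " ")  # treat "_" as a space
    for ch in BRACKET_CMAVO:
        text = text.replace(ch, f" {ch} ")  # make every bracket its own token
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        cmavo = BRACKET_CMAVO.get(tok)
        if cmavo is None:
            out.append(tok)  # ordinary word, keep as-is
            continue
        # An opener looks at the next token, a closer at the previous one.
        j = i + 1 if tok in OPENERS else i - 1
        neighbour = tokens[j] if 0 <= j < len(tokens) else ""
        if neighbour.strip(".") == cmavo:
            continue  # "(to" or "toi)": the bracket is redundant, drop it
        out.append(cmavo)
    return " ".join(out)
```

With this sketch, "mi klama (do)" becomes "mi klama to do toi", while "(to ..toi)" stays "to ..toi" instead of turning into "to to ..toi toi".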
Posted by Eimi on Fri 20 of Jun, 2008 14:30 GMT posts: 18
On Wed, 18 Jun 2008, Chris Capel wrote:
> How should I handle non-lojban characters? I can strip out some that
> should be superfluous, and others that I can't make any sense out of,
> but should I translate some characters into cmavo? I'm thinking of
> parentheses, braces, and brackets here. I could translate parentheses
> into to-toi, and square brackets into sei-se'u. Any other ideas?
>
> Should I just ignore any random non-lojban characters?

I would probably translate 0..9 to qw(no pa re ci vo mu xa ze bi so) and probably a . between digits as a pi. Others get trickier. ; in numbers is usually pi'e. / and - in dates are pi'e, but - is also ni'u, va'a, and vu'u, while + is ma'u and su'i.

Most of the times I've seen punctuation other than that, it's along with the lojban word in question, like "(to ... toi)" and quotes along with lu/li'u. I think second guessing those would probably be more problems than it's worth.

--
Adam Lopresto <adam@wustl.edu>
System Administrator
Engineering IT, Washington University
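The digit mapping above (0..9 to no pa re ci vo mu xa ze bi so, "." between digits as pi, ";" between digits as pi'e) might look something like this in Python. It is a sketch under assumptions: the function name digits_to_cmavo is made up for illustration, and the ambiguous characters ("/", "-", "+") are deliberately left alone.

```python
import re

# Digit words in order 0..9, as listed in the post above.
DIGIT_CMAVO = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]

# A run of ASCII digits, optionally continued by "." or ";" plus more digits.
# A trailing "." (end of sentence) is not part of the match.
NUMBER = re.compile(r"[0-9]+(?:[.;][0-9]+)*")


def digits_to_cmavo(text: str) -> str:
    """Spell out digit strings as PA cmavo; other characters are untouched."""
    def spell(match: re.Match) -> str:
        words = []
        for ch in match.group(0):
            if ch == ".":
                words.append("pi")      # decimal point between digits
            elif ch == ";":
                words.append("pi'e")    # ";" in numbers is usually pi'e
            else:
                words.append(DIGIT_CMAVO[int(ch)])
        return " ".join(words)

    return NUMBER.sub(spell, text)
```

For example, "li 3.14" becomes "li ci pi pa vo" and "12;30" becomes "pa re pi'e ci no", while the period in "mi klama." is left as it is.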
Posted by adamgarrigus on Fri 20 of Jun, 2008 14:58 GMT posts: 92
On Wed, 18 Jun 2008, Chris Capel wrote:
> How should I handle non-lojban characters? I can strip out some that
> should be superfluous, and others that I can't make any sense out of,
> but should I translate some characters into cmavo? I'm thinking of
> parentheses, braces, and brackets here. I could translate parentheses
> into to-toi, and square brackets into sei-se'u. Any other ideas?
>
> Should I just ignore any random non-lojban characters?

I think it's probably best to leach this shorthand out of Lojban usage. It feels like a natlang security blanket to me, and seems to run counter to the principle of audiovisual isomorphism.

A problem with numerals that hasn't been brought up (at least in this thread) is: don't most non-anglophone countries use "," where anglophones use ".", and a space or "." where we use "," (e.g. 186,282.397 == 186 282,397 == 186.282,397)? Of course, the downside is that use of English-style numerals in Lojban text is semi-standard & well represented in Lojban text to date, including the instructional materials.

mu'o mi'e komfo,amonan
Posted by pdf23ds on Fri 20 of Jun, 2008 18:08 GMT posts: 143
On Fri, Jun 20, 2008 at 9:56 AM, komfo,amonan <komfoamonan@gmail.com> wrote:
> On Wed, 18 Jun 2008, Chris Capel wrote:
>
>> How should I handle non-lojban characters? I can strip out some that
>> should be superfluous, and others that I can't make any sense out of,
>> but should I translate some characters into cmavo? I'm thinking of
>> parentheses, braces, and brackets here. I could translate parentheses
>> into to-toi, and square brackets into sei-se'u. Any other ideas?
>>
>> Should I just ignore any random non-lojban characters?
>
> I think it's probably best to leach this shorthand out of Lojban usage. It
> feels like a natlang security blanket to me, and seems to run counter to the
> principle of audiovisual isomorphism.

Perhaps so, but if so a parser/glosser is probably not the place to do it. I really want Lojgloss to be beginner friendly, so that beginners can paste any lojban text into the box and see what it means. So I want to make it as permissive as possible, at least by default. For instance, I plan to convert "\n\>*" (i.e., e-mail quotes) in the input to spaces so you can get glosses for quoted lojban text.

On the other hand, perhaps there's something I can do as a step afterwards to encourage proper lojban?

> A problem with numerals that hasn't
> been brought up (at least in this thread) is: don't most non-anglophone
> countries use "," where anglophones use "." and a space or "." where we use
> "," (i.e. 186,282.397 == 186 282,397 == 186.282,397)? Of course, the
> downside is that use of English-style numerals in Lojban text is
> semi-standard & well represented in Lojban text to date, including the
> instructional materials.

Hmm. Currently the morphology parses digits as PA cmavo (except in cmene), but basically ignores "," and ".". I'm fine with leaving it that way, actually. It's close enough for non-standard input. I don't care about getting non-standard things *exactly right* every time; I just want them to at least not break the rest of the parse, and if it's possible to do something more useful than ignoring them, then I'd like to do it.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)
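The e-mail quote handling mentioned above (turning "\n\>*" into spaces) could be approximated as follows. This is a minimal Python sketch, not the Lojgloss implementation: strip_email_quotes is a hypothetical name, and also swallowing the spaces or tabs around the ">" markers (so that "> > text" works) is an assumption beyond the pattern quoted in the post.

```python
import re

# Start of text or a newline, followed by any number of ">" quote markers
# with optional spaces or tabs around them.
QUOTE_PREFIX = re.compile(r"(?:^|\n)[ \t]*(?:>[ \t]*)*")


def strip_email_quotes(text: str) -> str:
    """Join quoted lines into one line of plain text for the glosser."""
    # Each line break (plus its quote markers) becomes a single space.
    return QUOTE_PREFIX.sub(" ", text).strip()
```

For instance, "> mi klama\n> le zarci" comes out as "mi klama le zarci", so quoted lojban can be pasted straight from a mail reply.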