experimental cmavo in lojgloss
Posted by pdf23ds on Mon 03 of Nov, 2008 05:29 GMT posts: 143 I'm wondering how to handle experimental cmavo in Lojgloss. As I see it, they probably fall into at least two categories. 1) Drops right into an existing selma'o, like BAI. 2) Constitutes its own selma'o. 1 could be handled pretty straightforwardly without having to change the parser code, just using some configuration. But 2 would probably be more difficult to handle. At the extreme it would require changing the grammar and recompiling the parser. It'd be nice to avoid that. So how common is case 2? Chris Capel -- "What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?" -- The Mind's I (Hofstadter, Dennett) To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Mon 03 of Nov, 2008 12:17 GMT On Mon, Nov 3, 2008 at 2:28 AM, Chris Capel <pdf23ds@gmail.com> wrote: > I'm wondering how to handle experimental cmavo in Lojgloss. As I see > it, they probably fall into at least two categories. > > 1) Drops right into an existing selma'o, like BAI. > 2) Constitutes its own selma'o. > > 1 could be handled pretty straightforwardly without having to change > the parser code, just using some configuration. But 2 would probably > be more difficult to handle. At the extreme it would require changing > the grammar and recompiling the parser. It'd be nice to avoid that. So > how common is case 2? The only example of 2 that comes to mind is zo'oi/la'oi, of selma'o ZOhOI. The rules for selma'o ZOhOI should be just like those for ZO, except "any-word" is replaced by something similar to "zoi-word". In general, introducing a new selma'o is a bad idea, unless very explicit rules are given for it. mu'o mi'e xorxes
Posted by Anonymous on Mon 03 of Nov, 2008 18:10 GMT Here is an example of case 2 currently in use on IRC: http://www.lojban.org/tiki/tiki-index.php?page=le'ai I'm not sure about what its exact grammar should be or how it should interact with other magic words. Maybe someone who is knowledgeable in this area could suggest something? -- Daniel Brockman daniel@brockman.se
Posted by Anonymous on Mon 03 of Nov, 2008 18:41 GMT On Mon, Nov 3, 2008 at 9:29 AM, Daniel Brockman <dbrockman@gmail.com> wrote: > Here is an example of case 2 currently in use on IRC: > > http://www.lojban.org/tiki/tiki-index.php?page=le'ai > I'm not sure about what its exact grammar should be or how it should > interact with other magic words. Maybe someone who is knowledgeable in this > area could suggest something? I don't think any CFG or PEG can handle anything like that. SA is problematic enough, but this one is supposed to recognize and replace individual words, not just selma'o or higher constructions. mu'o mi'e xorxes
Posted by arj on Mon 03 of Nov, 2008 19:55 GMT posts: 953 On Sun, Nov 02, 2008 at 11:28:15PM -0600, Chris Capel wrote: > I'm wondering how to handle experimental cmavo in Lojgloss. As I see > it, they probably fall into at least two categories. > > 1) Drops right into an existing selma'o, like BAI. > 2) Constitutes its own selma'o. > > 1 could be handled pretty straightforwardly without having to change > the parser code, just using some configuration. But 2 would probably > be more difficult to handle. At the extreme it would require changing > the grammar and recompiling the parser. It'd be nice to avoid that. So > how common is case 2? Depends on how forgiving you want to make the parser. The official parser just assumes that all unknown cmavo are UI. This will often result in parse failures, but what have you. Camxes (the candidate for the next official parser) does not recognize experimental cmavo at all. One major problem with trying to parse experimental cmavo is that they are very ad-hoc. Often, someone will suggest that some new kind of cmavo is necessary/would be nice, and then grab an arbitrary cmavo form from experimental space (xVV*/CVV+'+V*) without checking if someone else has suggested the same cmavo with a completely different use. As for how common case 2 is - depends on how you count. My educated guess is that new cmavo proposals are about 50/50 additions to existing selma'o and completely new functions. But if you count usages of cmavo, not just the cmavo, the case where you need new rules to parse the text is probably in the majority. For instance, the lo'ai/sa'ai/le'ai construct for correcting mistakes is very common on IRC these days. Another point (or, if you like, a stern warning): if you decide you want to handle experimental cmavo and constructs, please flag them as such, prominently. A glosser/parser is a tool that people not only use to make sense of other people's texts, but also to check if their own prose is correct. 
If they try to use experimental cmavo, they need to know that their Lojban is not quite kosher, and might not be understood by those who aren't down with the latest lingo. -- Arnt Richard Johansen http://arj.nvg.org/ Confusion among -ate ~ -ant pairs is even more prominate, since both are legitimant suffixes. --Adam Albright
Posted by Anonymous on Mon 03 of Nov, 2008 20:36 GMT On Mon, Nov 3, 2008 at 4:53 PM, Arnt Richard Johansen <arj@nvg.org> wrote: > > Camxes (the candidate for the next official parser) does not recognize experimental cmavo at all. Well, it does recognize them as experimental cmavo, it just doesn't assign them to any particular selma'o, so they can't be used to do much. It has no problem parsing {zo la'oi cu cipra cmavo}, for example. Or recognizing {la'oi bu} as a lerfu, or {la'oi zei da} as a tanru unit. Modifying the grammar to treat all unassigned cmavo as UI would be easy if that was desirable. mu'o mi'e xorxes
Posted by dbrock on Tue 04 of Nov, 2008 09:44 GMT posts: 47 On Mon, Nov 3, 2008 at 7:40 PM, Jorge Llambías <jjllambias@gmail.com> wrote: > On Mon, Nov 3, 2008 at 9:29 AM, Daniel Brockman <dbrockman@gmail.com> wrote: > > Here is an example of case 2 currently in use on IRC: > > > > http://www.lojban.org/tiki/tiki-index.php?page=le'ai > > I'm not sure about what its exact grammar should be or how it should > > interact with other magic words. Maybe someone who is knowledgeable in this > > area could suggest something? > > I don't think any CFG or PEG can handle anything like that. SA is > problematic enough, but this one is supposed to recognize and replace > individual words, not just selma'o or higher constructions. It doesn't have to actually make the replacements at the parser level. Just having a grammar for it so that it can parse would be totally fine. It can be treated as an informal indicator that has to be interpreted at a higher level. -- Daniel Brockman daniel@brockman.se
Posted by Anonymous on Tue 04 of Nov, 2008 11:51 GMT On Tue, Nov 4, 2008 at 6:35 AM, Daniel Brockman <daniel@brockman.se> wrote: > >> > http://www.lojban.org/tiki/tiki-index.php?page=le'ai > > It doesn't have to actually make the replacements at the parser level. Just > having a grammar for it so that it can parse would be totally fine. It can > be treated as an informal indicator that has to be interpreted at a higher > level. An internal grammar for the construction, and then making the construction a free indicator, is doable. But in that case it won't be of help for fixing something that is grammatically broken, like SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable gibberish", so it will never get to a higher level for interpretation. mu'o mi'e xorxes
Posted by dbrock on Tue 04 of Nov, 2008 13:37 GMT posts: 47 On Tue, Nov 4, 2008 at 12:50 PM, Jorge Llambías <jjllambias@gmail.com> wrote: > On Tue, Nov 4, 2008 at 6:35 AM, Daniel Brockman <daniel@brockman.se> wrote: > > > >> > http://www.lojban.org/tiki/tiki-index.php?page=le'ai > > > > It doesn't have to actually make the replacements at the parser level. Just > > having a grammar for it so that it can parse would be totally fine. It can > > be treated as an informal indicator that has to be interpreted at a higher > > level. > > An internal grammar for the construction, and then making the > construction a free indicator, is doable. Yeah, that would be useful. > But in that case it won't be > of help for fixing something that is grammatically broken, like > SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama > sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable > gibberish", so it will never get to a higher level for interpretation. It will when you are talking to a human. -- Daniel Brockman daniel@brockman.se
Posted by pdf23ds on Tue 04 of Nov, 2008 23:26 GMT posts: 143 On Tue, Nov 4, 2008 at 07:36, Daniel Brockman <daniel@brockman.se> wrote: > On Tue, Nov 4, 2008 at 12:50 PM, Jorge Llambías <jjllambias@gmail.com> > wrote: >> But in that case it won't be >> of help for fixing something that is grammatically broken, like >> SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama >> sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable >> gibberish", so it will never get to a higher level for interpretation. > > It will when you are talking to a human. Right, but it would still be unparsable. The problem is that the text before it is ungrammatical, and so has to be ignored by the parser to get the whole thing to parse, which requires that the parser understand which words the lo'ai is nulling out. It can't be treated half-way and have things still parse. Chris Capel
Posted by dbrock on Wed 05 of Nov, 2008 10:18 GMT posts: 47 On Wed, Nov 5, 2008 at 12:23 AM, Chris Capel <pdf23ds@gmail.com> wrote: > On Tue, Nov 4, 2008 at 07:36, Daniel Brockman <daniel@brockman.se> wrote: > > On Tue, Nov 4, 2008 at 12:50 PM, Jorge Llambías <jjllambias@gmail.com> > > wrote: > >> But in that case it won't be > >> of help for fixing something that is grammatically broken, like > >> SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama > >> sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable > >> gibberish", so it will never get to a higher level for interpretation. > > > > It will when you are talking to a human. > > Right, but it would still be unparsable. The problem is that the text > before it is ungrammatical, and so has to be ignored by the parser to > get the whole thing to parse, which requires that the parser > understand which words the lo'ai is nulling out. It can't be treated > half-way and have things still parse. The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to just treat it as a self-contained construct that requires morphologically correct Lojban inside it, just like {lo'u .. le'u}, and syntactically correct Lojban before it (just like everything else). Anything more advanced than that is, well, more advanced. Of course it would require extraordinary methods to get things like {kwama lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n su coi} --- to parse. It's not practical and not cost-efficient. The {kjama} example falls in this category because {kj} is morphologically invalid. What _would_ be useful and cost-efficient would be to get things like {.i .ai mi cakla sa'ai ckakla le'ai} to parse. The parser shouldn't try to actually replace anything at the parser level. It should just parse the {le'ai} construct and report its syntax tree to the client. A first approximation would be to just put {lo'ai} and {sa'ai} in LOhU and {le'ai} in LEhU. 
The next step would be to give this construct its own, specifically appropriate, syntax. -- Daniel Brockman daniel@brockman.se
Posted by pdf23ds on Wed 05 of Nov, 2008 13:38 GMT posts: 143 On Wed, Nov 5, 2008 at 03:58, Daniel Brockman <daniel@brockman.se> wrote: > On Wed, Nov 5, 2008 at 12:23 AM, Chris Capel <pdf23ds@gmail.com> wrote: >> Right, but it would still be unparsable. The problem is that the text >> before it is ungrammatical, and so has to be ignored by the parser to >> get the whole thing to parse, which requires that the parser >> understand which words the lo'ai is nulling out. It can't be treated >> half-way and have things still parse. > > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to > just treat it as a self-contained construct that requires morphologically > correct Lojban inside it, just like {lo'u .. le'u'}, and syntactically > correct Lojban before it (just like everything else). How far before it? Up to the beginning of the sentence? The statement? > Of course it would require extraordinary methods to get things like {kwama > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n su > coi} --- to parse. It's not practical and not cost-efficient. The {kjama} > example falls in this category because {kj} is morphologically invalid. Hmm. I think you overestimate the difference in effort between the two implementations. They both require the same tricks, just at a slightly different level in the grammar. > What _would_ be useful and cost-efficient would be to get things like {.i > .ai mi cakla sa'ai ckakla le'ai} to parse. Actually, this is the only useful one I can think of. If the text isn't grammatical before the le'ai clause, then the user is probably going to want to manually correct it anyway before feeding it to the parser. This example happens to be both grammatical, and having nearly the same parse tree as the corrected version (since the before and after words are both brivla). > The parser shouldn't try to > actually replace anything at the parser level. 
It should just parse the > {le'ai} construct and report its syntax tree to the client. Right, I'm with you on that. Chris Capel
Posted by pdf23ds on Wed 05 of Nov, 2008 13:57 GMT posts: 143 On Mon, Nov 3, 2008 at 13:53, Arnt Richard Johansen <arj@nvg.org> wrote: > Depends on how forgiving you want to make the parser. The official parser just assumes that all unknown cmavo are UI. That would be a good first step, with specific exceptions for known ones (that aren't in fact UI). > One major problem with trying to parse experimental cmavo is that they are very ad-hoc. Often, someone will suggest that some new kind of cmavo is necessary/would be nice, and then grab an arbitrary cmavo form from experimental space (xVV*/CVV+'+V*) without checking if someone else has suggested the same cmavo with a completely different use. The solution for this is to allow multiple definitions for the same word and then let the user of Lojgloss switch between them at parse time. > Another point (or, if you like, a stern warning): if you decide you want to handle experimental cmavo and constructs, please flag them as such, prominently. A glosser/parser is a tool that people not only use to make sense of other people's texts, but also to check if their own prose is correct. If they try to use experimental cmavo, they need to know that their Lojban is not quite kosher, and might not be understood by those who aren't down with the latest lingo. I can certainly color them differently, and maybe mark them in the text output as well. But in general, I don't think users are going to have much trouble recognizing the form of experimental cmavo, if they're familiar with the concept at all. Chris Capel
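For what it's worth, the three ideas in this post (fall back to UI for unknown cmavo, let the user pick between competing definitions, and flag everything experimental) could be sketched roughly as below. This is only an illustration, not Lojgloss code: the tables, the `classify` function, the `Classified` record, and the words `xa'ai` and `bu'oi` are all hypothetical, and the two readings given for `xa'ai` are invented.

```python
from dataclasses import dataclass

# Stand-in for the official cmavo list (only a few real entries shown).
OFFICIAL = {"mi": "KOhA", "cu": "CU", "ui": "UI"}

# Hypothetical user configuration: experimental cmavo and their candidate
# selma'o. A word may carry several competing proposals.
EXPERIMENTAL = {
    "la'oi": ["ZOhOI"],
    "zo'oi": ["ZOhOI"],
    "xa'ai": ["UI", "BAI"],  # placeholder word; both readings are invented
}


@dataclass
class Classified:
    word: str
    selmaho: str
    experimental: bool  # True means "flag this prominently in the output"


def classify(word, preferred=None):
    """Pick a selma'o for a cmavo; `preferred` maps word -> chosen reading."""
    if word in OFFICIAL:
        return Classified(word, OFFICIAL[word], False)
    candidates = EXPERIMENTAL.get(word)
    if candidates is None:
        # Unknown cmavo: fall back to UI, as the official parser is said to do.
        return Classified(word, "UI", True)
    choice = (preferred or {}).get(word, candidates[0])
    if choice not in candidates:
        raise ValueError(f"{word} has no proposed reading {choice}")
    return Classified(word, choice, True)


for w in ["mi", "la'oi", "xa'ai", "bu'oi"]:
    print(classify(w, preferred={"xa'ai": "BAI"}))
```

Keeping the `experimental` flag on every result means the display layer can color or mark those words without knowing where the classification came from.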
Posted by pdf23ds on Wed 05 of Nov, 2008 13:57 GMT posts: 143 On Mon, Nov 3, 2008 at 06:29, Daniel Brockman <dbrockman@gmail.com> wrote: > Here is an example of case 2 currently in use on IRC: > > http://www.lojban.org/tiki/tiki-index.php?page=le'ai > I'm not sure about what its exact grammar should be or how it should > interact with other magic words. Maybe someone who is knowledgeable in this > area could suggest something? Another option is to just implement these le'ai using a preprocessor. Even unmorphological (is that a word?) messes will parse, just have a bunch of non-lojban-word productions. Chris Capel
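A preprocessor along these lines might look roughly like the sketch below. It works on a plain whitespace-split word list, so the words involved do not need to be morphologically valid. The function name `apply_corrections` is hypothetical, and the behavior is an assumption rather than anything specified in the thread: only complete {lo'ai X sa'ai Y le'ai} constructs are handled, X is matched against the nearest earlier occurrence, and a construct whose X is not found is simply dropped.

```python
def apply_corrections(words):
    """Apply every complete lo'ai X sa'ai Y le'ai found in a word list."""
    out = []
    i = 0
    while i < len(words):
        if words[i] != "lo'ai":
            out.append(words[i])
            i += 1
            continue
        try:
            s = words.index("sa'ai", i + 1)
            e = words.index("le'ai", s + 1)
        except ValueError:
            # Incomplete construct: leave the rest of the text untouched.
            out.extend(words[i:])
            break
        wrong, right = words[i + 1:s], words[s + 1:e]
        # Search backwards through the text emitted so far for the wrong words.
        for start in range(len(out) - len(wrong), -1, -1):
            if wrong and out[start:start + len(wrong)] == wrong:
                out[start:start + len(wrong)] = right
                break
        i = e + 1  # skip past le'ai; the construct itself is not emitted
    return out


text = "mi kjama le zarji lo'ai kjama sa'ai klama le'ai"
print(" ".join(apply_corrections(text.split())))
```

The corrected word list can then be handed to the real parser, which never sees the correction construct at all.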
Posted by Anonymous on Thu 06 of Nov, 2008 00:11 GMT Hi, On Wed, Nov 5, 2008 at 2:34 PM, Chris Capel <pdf23ds@gmail.com> wrote: > On Wed, Nov 5, 2008 at 03:58, Daniel Brockman <daniel@brockman.se> wrote: > > On Wed, Nov 5, 2008 at 12:23 AM, Chris Capel <pdf23ds@gmail.com> wrote: > >> Right, but it would still be unparsable. The problem is that the text > >> before it is ungrammatical, and so has to be ignored by the parser to > >> get the whole thing to parse, which requires that the parser > >> understand which words the lo'ai is nulling out. It can't be treated > >> half-way and have things still parse. > > > > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to > > just treat it as a self-contained construct that requires morphologically > > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically > > correct Lojban before it (just like everything else). > > How far before it? Up to the beginning of the sentence? The statement? > The {le'ai} construct doesn't care about ANYTHING else. However your parser works, that's how it works before {le'ai}. > > Of course it would require extraordinary methods to get things like > {kwama > > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n su > > coi} --- to parse. It's not practical and not cost-efficient. The > {kjama} > > example falls in this category because {kj} is morphologically invalid. > > Hmm. I think you overestimate the difference in effort between the two > implementations. They both require the same tricks, just at a slightly > different level in the grammar. > What are you talking about? One implementation is self-contained; the other requires lots of weird backtracking and re-parsing and weird, weird stuff. > > What _would_ be useful and cost-efficient would be to get things like {.i > > .ai mi cakla sa'ai ckakla le'ai} to parse. > > Actually, this is the only useful one I can think of. 
If the text > isn't grammatical before the le'ai clause, then the user is probably > going to want to manually correct it anyway before feeding it to the > parser. This example happens to be both grammatical, and having nearly > the same parse tree as the corrected version (since the before and > after words are both brivla). > It doesn't matter if it has the same parse tree. It only matters that it PARSES IN ANY WAY. If it does, then the parser will be able to continue. If it doesn't, then the parser will die. -- Daniel Brockman daniel@gointeractive.se
Posted by Anonymous on Thu 06 of Nov, 2008 00:11 GMT On Wed, Nov 5, 2008 at 2:43 PM, Chris Capel <pdf23ds@gmail.com> wrote: > On Mon, Nov 3, 2008 at 06:29, Daniel Brockman <dbrockman@gmail.com> wrote: > > Here is an example of case 2 currently in use on IRC: > > > > http://www.lojban.org/tiki/tiki-index.php?page=le'ai > > I'm not sure about what its exact grammar should be or how it should > > interact with other magic words. Maybe someone who is knowledgeable in this > > area could suggest something? > > Another option is to just implement these le'ai using a preprocessor. > Even unmorphological (is that a word?) messes will parse, just have a > bunch of non-lojban-word productions. > Sure. If you want to. Go for it. -- Daniel Brockman daniel@gointeractive.se
Posted by pdf23ds on Thu 06 of Nov, 2008 00:33 GMT posts: 143 On Wed, Nov 5, 2008 at 18:07, Daniel Brockman <daniel@gointeractive.se> wrote: >> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to >> > just treat it as a self-contained construct that requires >> > morphologically >> > correct Lojban inside it, just like {lo'u .. le'u'}, and syntactically >> > correct Lojban before it (just like everything else). >> >> How far before it? Up to the beginning of the sentence? The statement? > > The {le'ai} construct doesn't care about ANYTHING else. However your parser > works, that's how it works before {le'ai}. I don't understand. You're saying that if there's a lo'ai then everything before it in the text should get only a syntactical parse, not a grammatical parse? If not, there has to be some cutoff. >> > Of course it would require extraordinary methods to get things like >> > {kwama >> > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n >> > su >> > coi} --- to parse. It's not practical and not cost-efficient. The >> > {kjama} >> > example falls in this category because {kj} is morphologically invalid. >> >> Hmm. I think you overestimate the difference in effort between the two >> implementations. They both require the same tricks, just at a slightly >> different level in the grammar. > > What are you talking about? One implementation is self-contained; the other > requires lots of weird backtracking and re-parsing and weird, weird stuff. No, both require backtracking (but not reparsing, since this is a packrat parser) and lots of lookahead that's usually wasted (but hopefully fast). You have to check every sentence (or whatever) for lo'ai before the main grammar parse, whether you do it before or after the morph parse. If you want to see how that's implemented, take a look at SA. Now, SA has a lot more complicated grammar, so lo'ai would be easier to implement even using the same technique. 
(And contrary to Jorge, I'm not too sure it would introduce any weird interactions with the SA machinery.) > It doesn't matter if it has the same parse tree. It only matters that it > PARSES IN ANY WAY. If it does, then the parser will be able to continue. > If it doesn't, then the parser will die. I'm more concerned about interactive parsing where parse errors aren't a huge deal, especially because you get detailed and helpful error information, much, much better than jbofi'e, to help you find the problem. I think perhaps a better (simple) way to handle lo'ai is to treat it similarly to a plain old lo'u ... le'u quote. Still have it behave like a UI, but only morph parse the words until the le'ai. In fact, I imagine a number of experimental cmavo that create new selma'o could be handled cursorily as quotes of this kind. It's not ideal, but it allows a non-expert user to modify the parser with configuration to handle text using these cmavo better than before. Chris Capel
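The quote-like handling suggested here could be prototyped as a grouping pass that runs between the morphology step and the main grammar. The sketch below is only an assumption about how such a pass might be configured: `QUOTE_LIKE` and `group_quotes` are hypothetical names, and the choice to let either {lo'ai} or {sa'ai} open a span closed by {le'ai} is made just so that the {sa'ai}-only example from earlier in the thread is covered.

```python
# Hypothetical configuration: opener word -> closer word. A user could add
# other experimental quote-like cmavo here without touching the grammar.
QUOTE_LIKE = {"lo'ai": "le'ai", "sa'ai": "le'ai"}


def group_quotes(words, quote_like=QUOTE_LIKE):
    """Collapse opener ... closer spans into single opaque QUOTE nodes."""
    nodes = []
    i = 0
    while i < len(words):
        closer = quote_like.get(words[i])
        if closer is not None and closer in words[i + 1:]:
            end = words.index(closer, i + 1)
            # The span is kept verbatim; the main grammar can then treat
            # the whole node like a free indicator.
            nodes.append(("QUOTE", tuple(words[i:end + 1])))
            i = end + 1
        else:
            nodes.append(("WORD", words[i]))
            i += 1
    return nodes


for node in group_quotes(".i .ai mi cakla sa'ai ckakla le'ai".split()):
    print(node)
```

Nothing is replaced at this level; the client receives the QUOTE node and decides what, if anything, to do with the correction.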
Posted by dbrock on Thu 06 of Nov, 2008 08:56 GMT posts: 47 On Thu, Nov 6, 2008 at 1:30 AM, Chris Capel <pdf23ds@gmail.com> wrote: > On Wed, Nov 5, 2008 at 18:07, Daniel Brockman <daniel@gointeractive.se> > wrote: > >> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is > to > >> > just treat it as a self-contained construct that requires > >> > morphologically > >> > correct Lojban inside it, just like {lo'u .. le'u'}, and syntactically > >> > correct Lojban before it (just like everything else). > >> > >> How far before it? Up to the beginning of the sentence? The statement? > > > > The {le'ai} construct doesn't care about ANYTHING else. However your > parser > > works, that's how it works before {le'ai}. > > I don't understand. You're saying that if there's a lo'ai then > everything before it in the text should get only a syntactical parse, > not a grammatical parse? If not, there has to be some cutoff. Syntax and grammar is one and the same thing to me, so I don't understand the distinction. > >> > Of course it would require extraordinary methods to get things like > >> > {kwama > >> > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n > >> > su > >> > coi} --- to parse. It's not practical and not cost-efficient. The > >> > {kjama} > >> > example falls in this category because {kj} is morphologically > invalid. > >> > >> Hmm. I think you overestimate the difference in effort between the two > >> implementations. They both require the same tricks, just at a slightly > >> different level in the grammar. > > > > What are you talking about? One implementation is self-contained; the > other > > requires lots of weird backtracking and re-parsing and weird, weird > stuff. > > No, both require backtracking (but not reparsing, since this is a > packrat parser) and lots of lookahead that's usually wasted (but > hopefully fast). 
You have to check every sentence (or whatever) for > lo'ai before the main grammar parse, whether you do it before or after > the morph parse. If you want to see how that's implemented, take a > look at SA. Now, SA has a lot more complicated grammar, so lo'ai would > be easier to implement even using the same technique. (And contrary to > Jorge, I'm not too sure it would introduce any weird interactions with > the SA machinery.) I'm still not getting through. We are talking about two different things. > > It doesn't matter if it has the same parse tree. It only matters that it > > PARSES IN ANY WAY. If it does, then the parser will be able to continue. > > If it doesn't, then the parser will die. > > I'm more concerned about interactive parsing where parse errors aren't > a huge deal, especially because you get detailed and helpful error > information, much, much better than jbofi'e, to help you find the > problem. > > I think perhaps a better (simple) way to handle lo'ai is to treat it > similar to a plain-old lo'u - le'u quote. Still have it behave like a > UI, but only morph parse the words until the le'ai. In fact, I imagine > a number of experimental cmavo that create new selmaho could be > handled cursorily as quotes of this kind. It's not ideal, but it > allows a non-expert user to modify the parser with configuration to > handle text using these cmavo better than before. > Yes, this is what I've been trying to say. Thank you. Just handle it like a parenthetical expression. The more complicated implementation that actually replaces at parse time is another discussion (which I've been trying to avoid in order to keep this simple, but by all means continue if it is interesting to you). I'm not even sure I'd want my parser to erase and replace stuff. I consider an erasure or a replacement to be an additional utterance that is often best understood as such. 
It would even be interesting to make a parser that could parse through errors and resynchronize later (e.g., when {.i} is encountered), and things like that. Anyway, I'm in over my head. I'm not a parser expert. -- Daniel Brockman daniel@brockman.se
Posted by Anonymous on Thu 06 of Nov, 2008 13:51 GMT On Wed, Nov 5, 2008 at 9:30 PM, Chris Capel <pdf23ds@gmail.com> wrote: > Now, SA has a lot more complicated grammar, so lo'ai would > be easier to implement even using the same technique. (And contrary to > Jorge, I'm not too sure it would introduce any weird interactions with > the SA machinery.) I don't think interactions with SA are the problem. I think LOhAI (if the construction is to be more than just a free modifier) is much more complicated than SA, with or without interactions. I agree their interaction is mostly irrelevant. What SA does is: before each word, look ahead to see whether the word and everything that follows up to SA will end up being deleted by SA. If not, proceed. If yes, ignore everything up to the replacement word and proceed from there. LOhAI can't do just that. What it has to do is: before each word, look ahead to see whether the word and some of what follows will be deleted. If not, proceed. If yes, ignore everything that matches the part between LOhAI and SAhAI, continue with what comes after SAhAI, go back to the end of the previous match and proceed from there. The only way I can see that working with a PEG is having a different rule for each potential replacement string. But the number of potential replacements is infinite, so I don't see how that could work. (Or I may be missing something.) What Daniel proposes is much simpler, but I'm still not sure I see the point of it. It won't do anything useful for humans, nor for a computer parser. mu'o mi'e xorxes
Posted by dbrock on Thu 06 of Nov, 2008 14:33 GMT posts: 47 > > What Daniel proposes is much simpler, but I'm still not sure I see the > point of it. It won't do anything useful for humans, nor for a > computer parser. > Experienced Lojbanists are relatively unlikely to make blatant grammatical errors. They remain (depending on character, I suppose) prone to making all sorts of mistakes, grammatical or otherwise. As a result, the {le'ai} construct is often employed to make corrections even when there are no grammatical *errors* anywhere in sight. For example, the #jbosnu channel accepts only grammatically correct Lojban. Hence, you may not use {le'ai} constructions there. But that would be useful to correct mistakes, despite the fact that grammatical mistakes are not even allowed in the first place. -- Daniel Brockman daniel@brockman.se
Posted by dbrock on Thu 06 of Nov, 2008 14:37 GMT posts: 47 > > despite the fact that grammatical mistakes are not even allowed in the > first place. > I meant grammatical *errors* are not allowed. (You can make a grammatical mistake and still end up speaking valid Lojban.) -- Daniel Brockman daniel@brockman.se
Posted by Anonymous on Thu 06 of Nov, 2008 16:28 GMT On Thu, Nov 6, 2008 at 11:32 AM, Daniel Brockman <daniel@brockman.se> wrote: > > Experienced Lojbanists are relatively unlikely to make blatant grammatical > errors. > They remain (depending on character, I suppose) prone to making all sorts of > mistakes, > grammatical or otherwise. I guess the most likely error for a fluent speaker should be a typo. A typo is likely to result in ungrammatical text or grammatical but with an unintended structure, though in some cases it can result in a text with the same structure if the typo doesn't change the selma'o of the word. > As a result, the {le'ai} construct is often employed to make > corrections even when there are no grammatical *errors* anywhere in sight. A text that parses with an unintended parse tree is a grammatical error by the speaker, even if the text on its own is grammatically valid. It is as bad as one that does not parse, or perhaps even worse because it could be misleading without announcing that something is wrong. > For example, the #jbosnu channel accepts only grammatically correct Lojban. Interesting, I didn't know that. What parser does it use? > Hence, > you may not use {le'ai} constructions there. But that would be useful to > correct mistakes, > despite the fact that grammatical mistakes are not even allowed in the first > place. OK, what jbosnu seems to require is a construction like: correction <- (!SAhAI any-word)* SAhAI any-word* which constitutes a whole utterance all by itself. Any number of words (grammatical or not) followed by any number of words (grammatical or not). It would appear at the text level: text <- correction / ... Why would it be useful to be able to embed this construction in the midst of some other utterance? If the correction is not going to be used by the parser to fix anything, embedding it in a formally broken text won't work, because it will never be detected.
Would it be of any use to embed it in a formally unbroken but effectively uninterpretable or incorrectly interpretable text? In other words, this construction as you are using it is used to make a comment about some other text, it's about how some other text should be fixed, not about the text it appears in. If that's the case, only SAhAI is required, because there is no need to separate the construction from anything else. mu'o mi'e xorxes
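The `correction <- (!SAhAI any-word)* SAhAI any-word*` rule in this post amounts to splitting a whole utterance at its first SAhAI word. A minimal Python sketch of that reading follows. It assumes whitespace-separated words and that sa'ai is the only member of SAhAI; the function name `parse_correction` is made up for illustration and is not part of any existing parser.

```python
SAHAI = {"sa'ai"}  # assumed membership of selma'o SAhAI


def parse_correction(words):
    """Mimic: correction <- (!SAhAI any-word)* SAhAI any-word*

    Returns (mistake_words, replacement_words), or None when the
    utterance contains no SAhAI word and so is not a correction.
    """
    for i, word in enumerate(words):
        if word in SAHAI:
            # Everything before the first SAhAI is the mistake; everything
            # after it, including any later sa'ai, is the replacement.
            return words[:i], words[i + 1:]
    return None


print(parse_correction("klama sa'ai cadzu".split()))
```

Neither half is checked for grammaticality, which matches the "grammatical or not" wording: the words are treated as opaque, much like the contents of a quote.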
Posted by dbrock on Thu 06 of Nov, 2008 17:54 GMT posts: 47 On Thu, Nov 6, 2008 at 5:27 PM, Jorge Llambías <jjllambias@gmail.com> wrote: > > On Thu, Nov 6, 2008 at 11:32 AM, Daniel Brockman <daniel@brockman.se> wrote: > > > > Experienced Lojbanists are relatively unlikely to make blatant grammatical > > errors. > > They remain (depending on character, I suppose) prone to making all sorts of > > mistakes, > > grammatical or otherwise. > > I guess the most likely error for a fluent speaker should be a typo. A > typo is likely to result in ungrammatical text or grammatical but with > an unintended structure, though in some cases it can result in a text > with the same structure if the typo doesn't change the selma'o of the > word. Granted. > > As a result, the {le'ai} construct is often employed to make > > corrections even when there are no grammatical *errors* anywhere in sight. > > A text that parses with an unintended parse tree is a grammatical > error by the speaker, even if the text on its own is grammatically > valid. It is as bad as one that does not parse, or perhaps even worse > because it could be misleading without announcing that something is > wrong. That's not what I meant by "error". It doesn't matter. > > For example, the #jbosnu channel accepts only grammatically correct Lojban. > > Interesting, I didn't know that. What parser does it use? I'm not sure. If I were to guess, I'd say jbofi'e. > Why would it be useful to be able to embed this construction in the > midst of some other utterance? In speech, it would be useful. On IRC, it wouldn't. > If the correction is not going to be used by the parser to fix > anything, embedding it in a formally broken text won't work, because > it will never be detected. Would it be of any use to embed it in a > formally unbroken but effectively uninterpretable or incorrectly > interpretable text? Probably not with your more restricted grammar and feature set.
> In other words, this construction as you are using it is used to make > a comment about some other text, it's about how some other text should > be fixed, not about the text it appears in. How do you separate one "text" from another? > If that's the case, only SAhAI is required, because there is no need > to separate the construction from anything else. You will lose a bunch of things you can do with {le'ai} then: * Distinguish between insertions and vague mistakes. * Distinguish between deletions and vague corrections. * Distinguish between null replacements and vague replacements. * Ask about the replacement with {le'ai pei} ("did you mean ...?"). * Deny the replacement with {le'ai nai} ("I really meant ..." or just "sic"). * Express doubt about a used expression with {le'ai cu'i}. * Switch around the order of mistake and correction. -- Daniel Brockman daniel@brockman.se
Posted by Anonymous on Thu 06 of Nov, 2008 19:09 GMT On Thu, Nov 6, 2008 at 2:52 PM, Daniel Brockman <daniel@brockman.se> wrote: > On Thu, Nov 6, 2008 at 5:27 PM, Jorge Llambías <jjllambias@gmail.com> wrote: > >> In other words, this construction as you are using it is used to make >> a comment about some other text, it's about how some other text should >> be fixed, not about the text it appears in. > > How do you separate one "text" from another? By a "text" I mean the amount of speech given to the parser for parsing in one chunk. In a conversation a new text is started every time the speaker changes. Often two texts won't parse if they are simply concatenated into one. In #jbosnu each line constitutes a new text because it is parsed independently of the rest. (I call that unit "text" because that's what the official formal grammar calls it.) mu'o mi'e xorxes
Posted by pdf23ds on Thu 06 of Nov, 2008 23:30 GMT posts: 143 On Thu, Nov 6, 2008 at 07:47, Jorge Llambías <jjllambias@gmail.com> wrote: > On Wed, Nov 5, 2008 at 9:30 PM, Chris Capel <pdf23ds@gmail.com> wrote: >> Now, SA has a lot more complicated grammar, so lo'ai would >> be easier to implement even using the same technique. (And contrary to >> Jorge, I'm not too sure it would introduce any weird interactions with >> the SA machinery.) > > I don't think interactions with SA are the problem. I think LOhAI (if > the construction is to be more than just a free modifier) is much more > complicated than SA, with or without interactions. I agree their > interaction is mostly irrelevant. Sorry, I wasn't talking about full lo'ai there, just the restricted subset that would treat it like a quote. Chris Capel -- "What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?" -- The Mind's I (Hofstadter, Dennet)
Posted by dbrock on Fri 07 of Nov, 2008 09:48 GMT posts: 47 > Sorry, I wasn't talking about full lo'ai there, just the restricted > subset that would treat it like a quote. Now that we're deep into this discussion, I might as well ask, how would {lo'ai .. sa'ai .. le'ai} interact with other magic words? (I'm assuming the grammar would be consistent with already existing similar constructions.) Like, I assume {lo'ai zo sa'ai sa'ai le'ai} would be okay. What about {lo'ai lo'u sa'ai le'u sa'ai le'ai}? Is there a simple rule to follow? (I'm still, as I have been from the start, talking about parsing the construction as just a free modifier parenthetical quote-like thing.) -- Daniel Brockman daniel@brockman.se
Posted by dbrock on Fri 07 of Nov, 2008 09:48 GMT posts: 47 > By a "text" I mean the amount of speech given to the parser for > parsing in one chunk. In a conversation a new text is started every > time the speaker changes. Often two texts won't parse if they are > simply concatenated into one. In #jbosnu each line constitutes a new > text because it is parsed independently of the rest. (I call that > unit "text" because that's what the official formal grammar calls it.) Okay, that's what I thought. Yeah, that's fine for IRC. But what do you do in a live conversation if you want to correct a mistake? Do you have to wait for the other person to say something first? -- Daniel Brockman daniel@brockman.se
Posted by Anonymous on Fri 07 of Nov, 2008 11:37 GMT On Fri, Nov 7, 2008 at 6:45 AM, Daniel Brockman <daniel@brockman.se> wrote: > > Now that we're deep into this discussion, I might as well ask, > how would {lo'ai .. sa'ai .. le'ai} interact with other magic words? > (I'm assuming the grammar would be consistent with already > existing similar constructions.) If it's modelled on lo'u ... le'u, it should simply treat them as all empty words, i.e. ignore their meaning. > Like, I assume {lo'ai zo sa'ai sa'ai le'ai} would be okay. That would replace a "zo" with a "sa'ai" ("zo" and the second "sa'ai" are just empty words, not parsed with their function). > What about {lo'ai lo'u sa'ai le'u sa'ai le'ai}? Replaces "lo'u" with "le'u sa'ai". > Is there a simple rule to follow? First come, first served. The first magic word is the one with the magic. mu'o mi'e xorxes
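The "first come, first served" rule can be checked against the two examples in this post with a short Python sketch. It is only one possible reading: it assumes whitespace-separated words, requires all three of lo'ai, sa'ai and le'ai to be present, and assumes that after lo'ai only the first sa'ai is special and after that sa'ai only the first le'ai is. The helper name `split_lohai` is hypothetical.

```python
def split_lohai(words):
    """Split lo'ai WRONG sa'ai FIXED le'ai REST, first come, first served.

    Everything inside the construct is treated as empty words, so zo,
    lo'u, le'u and any later sa'ai have no special effect there.
    Returns (wrong, fixed, rest), or None if the construct is incomplete.
    """
    if not words or words[0] != "lo'ai":
        return None
    try:
        sep = words.index("sa'ai", 1)        # first sa'ai ends WRONG
        end = words.index("le'ai", sep + 1)  # first le'ai after it ends FIXED
    except ValueError:
        return None
    return words[1:sep], words[sep + 1:end], words[end + 1:]


for example in ("lo'ai zo sa'ai sa'ai le'ai",
                "lo'ai lo'u sa'ai le'u sa'ai le'ai"):
    print(example, "->", split_lohai(example.split()))
```

Under these assumptions the first example yields "zo" replaced by "sa'ai", and the second yields "lo'u" replaced by "le'u sa'ai", which agrees with the answers given above.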
Posted by Anonymous on Fri 07 of Nov, 2008 12:12 GMT On Fri, Nov 7, 2008 at 6:47 AM, Daniel Brockman <daniel@brockman.se> wrote: > > But what do you do in a live conversation if you want to correct a mistake? > Do you have to wait for the other person to say something first? In a spoken conversation typos are less likely. Also, you are less likely to want to change some particular choice of word, simply because you don't have the text in front of you to check what you just said. Unless one is listening to a recording, a fluent speaker is normally not even aware of making any slips. Someone learning a new language is much more conscious of each individual word, but a fluent speaker is more likely to concentrate on the ideas expressed, not the exact words used. When you really are concentrating on some choice of word, and talking about the word, then it's probably more clear to use the normal grammar to talk about the word, rather than use some shortcut for replacement. That's why I've never been too keen on SA, it seems so wrong to have to concentrate on the exact words you are using instead of on what you are expressing with them. SU is not so bad, only because it is so drastic: "strike all that, let me start again". That's useful when you are carefully trying to phrase something right and you realize that you are making a mess of it. That happens to fluent speakers too when dealing with complex ideas. SI is not so likely to be used by a fluent speaker, it's more of a crutch for the beginner, it's tolerable because the very last word is still fresh in the mind and still accessible as a word. Anyway, I'm not really opposed to people using things like SA or the LOhAI/SAhAI/LEhAI construction if they find it useful. Usage rules. It's just that to me it's too artificial, it approaches language from the wrong end (from the valsi instead of the se valsi). I have a similar gripe about "di'u" and "la'e di'u". 
Why is the most common and useful "la'e di'u" a compound, and the less useful "di'u" a single word? Normally we are much more likely to want to talk about la'e di'u than about di'u. mu'o mi'e xorxes