Lojban In General


posts: 143

I'm wondering how to handle experimental cmavo in Lojgloss. As I see
it, they probably fall into at least two categories.

1) Drops right into an existing selma'o, like BAI.
2) Constitutes its own selma'o.

1 could be handled pretty straightforwardly without having to change
the parser code, just using some configuration. But 2 would probably
be more difficult to handle. At the extreme it would require changing
the grammar and recompiling the parser. It'd be nice to avoid that. So
how common is case 2?
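
For case 1, the configuration could be as small as a lookup table. Here is a minimal sketch in Python; the `EXPERIMENTAL_SELMAHO` table and the `selmaho_of` helper are hypothetical names (not Lojgloss code), and the experimental entry is a made-up placeholder rather than a real proposal:

```python
# Hypothetical configuration: experimental cmavo -> existing selma'o.
# The entry below is a placeholder for illustration only.
EXPERIMENTAL_SELMAHO = {
    "xa'a": "BAI",
}

def selmaho_of(word, official, experimental=EXPERIMENTAL_SELMAHO):
    """Return (selma'o, is_experimental) for a cmavo, or (None, False)."""
    if word in official:
        return official[word], False
    if word in experimental:
        # Found only in the user's configuration: report it as experimental.
        return experimental[word], True
    return None, False

# A tiny stand-in for the official cmavo list.
official = {"ri'a": "BAI", "ui": "UI"}
print(selmaho_of("ri'a", official))
print(selmaho_of("xa'a", official))
```

The grammar itself stays untouched; only the word-to-selma'o table grows.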

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)


To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

On Mon, Nov 3, 2008 at 2:28 AM, Chris Capel <pdf23ds@gmail.com> wrote:

> I'm wondering how to handle experimental cmavo in Lojgloss. As I see
> it, they probably fall into at least two categories.
>
> 1) Drops right into an existing selma'o, like BAI.
> 2) Constitutes its own selma'o.
>
> 1 could be handled pretty straightforwardly without having to change
> the parser code, just using some configuration. But 2 would probably
> be more difficult to handle. At the extreme it would require changing
> the grammar and recompiling the parser. It'd be nice to avoid that. So
> how common is case 2?

The only example of 2 that comes to mind is zo'oi/la'oi, of selma'o ZOhOI.

The rules for selmaho ZOhOI should be just like those for ZO, except
"any-word" is replaced by something similar to "zoi-word".

In general, introducing a new selmaho is a bad idea, unless very
explicit rules are given for it.

mu'o mi'e xorxes



Here is an example of case 2 currently in use on IRC:

http://www.lojban.org/tiki/tiki-index.php?page=le'ai
I'm not sure about what its exact grammar should be or how it should
interact with other magic words. Maybe someone who is knowledgeable in this
area could suggest something?

--
Daniel Brockman
daniel@brockman.se

On Mon, Nov 3, 2008 at 9:29 AM, Daniel Brockman <dbrockman@gmail.com> wrote:
> Here is an example of case 2 currently in use on IRC:
>
> http://www.lojban.org/tiki/tiki-index.php?page=le'ai
> I'm not sure about what its exact grammar should be or how it should
> interact with other magic words. Maybe someone who is knowledgeable in this
> area could suggest something?

I don't think any CFG or PEG can handle anything like that. SA is
problematic enough, but this one is supposed to recognize and replace
individual words, not just selma'o or higher constructions.

mu'o mi'e xorxes



posts: 953

On Sun, Nov 02, 2008 at 11:28:15PM -0600, Chris Capel wrote:
> I'm wondering how to handle experimental cmavo in Lojgloss. As I see
> it, they probably fall into at least two categories.
>
> 1) Drops right into an existing selma'o, like BAI.
> 2) Constitutes its own selma'o.
>
> 1 could be handled pretty straightforwardly without having to change
> the parser code, just using some configuration. But 2 would probably
> be more difficult to handle. At the extreme it would require changing
> the grammar and recompiling the parser. It'd be nice to avoid that. So
> how common is case 2?

Depends on how forgiving you want to make the parser. The official parser just assumes that all unknown cmavo are UI. This will often result in parse failures, but what have you. Camxes (the candidate for the next official parser) does not recognize experimental cmavo at all.

One major problem with trying to parse experimental cmavo is that they are very ad-hoc. Often, someone will suggest that some new kind of cmavo is necessary/would be nice, and then grab an arbitrary cmavo form from experimental space (xVV*/CVV+'+V*) without checking if someone else has suggested the same cmavo with a completely different use.

As for how common case 2 is - depends on how you count. My educated guess is that new cmavo proposals are about 50/50 additions to existing selma'o and completely new functions. But if you count usages of cmavo, not just the cmavo, the case where you need new rules to parse the text is probably in the majority. For instance, the lo'ai/sa'ai/le'ai construct for correcting mistakes is very common on IRC these days.

Another point (or, if you like, a stern warning): if you decide you want to handle experimental cmavo and constructs, please flag them as such, prominently. A glosser/parser is a tool that people not only use to make sense of other people's texts, but also to check if their own prose is correct. If they try to use experimental cmavo, they need to know that their Lojban is not quite kosher, and might not be understood by those who aren't down with the latest lingo.

--
Arnt Richard Johansen http://arj.nvg.org/
Confusion among -ate ~ -ant pairs is even more prominate, since both
are legitimant suffixes. --Adam Albright



On Mon, Nov 3, 2008 at 4:53 PM, Arnt Richard Johansen <arj@nvg.org> wrote:
>
> Camxes (the candidate for the next official parser) does not recognize experimental cmavo at all.

Well, it does recognize them as experimental cmavo, it just doesn't
assign them to any particular selma'o, so they can't be used to do
much. It has no problem parsing {zo la'oi cu cipra cmavo}, for
example. Or recognizing {la'oi bu} as a lerfu, or {la'oi zei da} as a
tanru unit. Modifying the grammar to treat all unassigned cmavo as UI
would be easy if that was desirable.

mu'o mi'e xorxes



posts: 47

On Mon, Nov 3, 2008 at 7:40 PM, Jorge Llambías <jjllambias@gmail.com> wrote:

> On Mon, Nov 3, 2008 at 9:29 AM, Daniel Brockman <dbrockman@gmail.com>
> wrote:
> > Here is an example of case 2 currently in use on IRC:
> >
> > http://www.lojban.org/tiki/tiki-index.php?page=le'ai
> > I'm not sure about what its exact grammar should be or how it should
> > interact with other magic words. Maybe someone who is knowledgeable in
> this
> > area could suggest something?
>
> I don't think any CFG or PEG can handle anything like that. SA is
> problematic enough, but this one is supposed to recognize and replace
> individual words, not just selma'o or higher constructions.


It doesn't have to actually make the replacements at the parser level. Just
having a grammar for it so that it can parse would be totally fine. It can
be treated as an informal indicator that has to be interpreted at a higher
level.

--
Daniel Brockman
daniel@brockman.se

On Tue, Nov 4, 2008 at 6:35 AM, Daniel Brockman <daniel@brockman.se> wrote:
>
>> > http://www.lojban.org/tiki/tiki-index.php?page=le'ai
>
> It doesn't have to actually make the replacements at the parser level. Just
> having a grammar for it so that it can parse would be totally fine. It can
> be treated as an informal indicator that has to be interpreted at a higher
> level.

An internal grammar for the construction, and then making the
construction a free indicator, is doable. But in that case it won't be
of help for fixing something that is grammatically broken, like
SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama
sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable
gibberish", so it will never get to a higher level for interpretation.

mu'o mi'e xorxes



posts: 47

On Tue, Nov 4, 2008 at 12:50 PM, Jorge Llambías <jjllambias@gmail.com> wrote:

> On Tue, Nov 4, 2008 at 6:35 AM, Daniel Brockman <daniel@brockman.se>
> wrote:
> >
> >> > http://www.lojban.org/tiki/tiki-index.php?page=le'ai
> >
> > It doesn't have to actually make the replacements at the parser level.
> Just
> > having a grammar for it so that it can parse would be totally fine. It
> can
> > be treated as an informal indicator that has to be interpreted at a
> higher
> > level.
>
> An internal grammar for the construction, and then making the
> construction a free indicator, is doable.


Yeah, that would be useful.


> But in that case it won't be
> of help for fixing something that is grammatically broken, like
> SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama
> sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable
> gibberish", so it will never get to a higher level for interpretation.


It will when you are talking to a human.

--
Daniel Brockman
daniel@brockman.se

posts: 143

On Tue, Nov 4, 2008 at 07:36, Daniel Brockman <daniel@brockman.se> wrote:
> On Tue, Nov 4, 2008 at 12:50 PM, Jorge Llambías <jjllambias@gmail.com>
> wrote:
>> But in that case it won't be
>> of help for fixing something that is grammatically broken, like
>> SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama
>> sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable
>> gibberish", so it will never get to a higher level for interpretation.
>
> It will when you are talking to a human.

Right, but it would still be unparsable. The problem is that the text
before it is ungrammatical, and so has to be ignored by the parser to
get the whole thing to parse, which requires that the parser
understand which words the lo'ai is nulling out. It can't be treated
half-way and have things still parse.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



posts: 47
On Wed, Nov 5, 2008 at 12:23 AM, Chris Capel <pdf23ds@gmail.com> wrote:


> On Tue, Nov 4, 2008 at 07:36, Daniel Brockman <daniel@brockman.se> wrote:
> > On Tue, Nov 4, 2008 at 12:50 PM, Jorge Llambías <jjllambias@gmail.com>
> > wrote:
> >> But in that case it won't be
> >> of help for fixing something that is grammatically broken, like
> >> SI/SA/SU are meant to do. For example {mi kjama le zarji lo'ai kjama
> >> sa'ai klama le'ai} will parse as "{mi} followed by uninterpretable
> >> gibberish", so it will never get to a higher level for interpretation.
> >
> > It will when you are talking to a human.
>
> Right, but it would still be unparsable. The problem is that the text
> before it is ungrammatical, and so has to be ignored by the parser to
> get the whole thing to parse, which requires that the parser
> understand which words the lo'ai is nulling out. It can't be treated
> half-way and have things still parse.


The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
just treat it as a self-contained construct that requires morphologically
correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
correct Lojban before it (just like everything else). Anything more
advanced than that is, well, more advanced.

Of course it would require extraordinary methods to get things like {kwama
lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n su
coi} --- to parse. It's not practical and not cost-efficient. The {kjama}
example falls in this category because {kj} is morphologically invalid.

What _would_ be useful and cost-efficient would be to get things like {.i
.ai mi cakla sa'ai ckakla le'ai} to parse. The parser shouldn't try to
actually replace anything at the parser level. It should just parse the
{le'ai} construct and report its syntax tree to the client.

A first approximation would be to just put {lo'ai} and {sa'ai} in LOhU and
{le'ai} in LEhU. The next step would be to give this construct its own,
specifically appropriate, syntax.
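
As a rough illustration of the "just parse the construct and report it" idea, here is a minimal Python sketch that scans a word list for {lo'ai .. sa'ai .. le'ai} spans without replacing anything. The function name `find_corrections` and the tuple it returns are assumptions made for this example, and allowing {lo'ai} to be omitted is inferred only from the {cakla} example above:

```python
def find_corrections(words):
    """Return (start, end, wrong, right) for each correction span.

    `words` is a list of already-tokenized words. `start` is the index of
    the opening {lo'ai} (or {sa'ai} when {lo'ai} is omitted) and `end` is
    the index of {le'ai}. Nothing in `words` is modified.
    """
    spans = []
    start = None
    for i, w in enumerate(words):
        if w in ("lo'ai", "sa'ai") and start is None:
            start = i
        elif w == "le'ai" and start is not None:
            inner = words[start:i]
            if "sa'ai" in inner:
                k = inner.index("sa'ai")
                wrong = inner[1:k] if inner[0] == "lo'ai" else []
                right = inner[k + 1:]
            else:
                # {lo'ai ... le'ai} with no replacement given.
                wrong, right = inner[1:], []
            spans.append((start, i, wrong, right))
            start = None
    return spans

print(find_corrections(".i .ai mi cakla sa'ai ckakla le'ai".split()))
```

The client receives the span and both halves of the correction, and decides for itself what to do with them.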

--
Daniel Brockman
daniel@brockman.se

posts: 143

On Wed, Nov 5, 2008 at 03:58, Daniel Brockman <daniel@brockman.se> wrote:

> On Wed, Nov 5, 2008 at 12:23 AM, Chris Capel <pdf23ds@gmail.com> wrote:

>> Right, but it would still be unparsable. The problem is that the text
>> before it is ungrammatical, and so has to be ignored by the parser to
>> get the whole thing to parse, which requires that the parser
>> understand which words the lo'ai is nulling out. It can't be treated
>> half-way and have things still parse.
>
> The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
> just treat it as a self-contained construct that requires morphologically
> correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
> correct Lojban before it (just like everything else).

How far before it? Up to the beginning of the sentence? The statement?

> Of course it would require extraordinary methods to get things like {kwama
> lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n su
> coi} --- to parse. It's not practical and not cost-efficient. The {kjama}
> example falls in this category because {kj} is morphologically invalid.

Hmm. I think you overestimate the difference in effort between the two
implementations. They both require the same tricks, just at a slightly
different level in the grammar.

> What _would_ be useful and cost-efficient would be to get things like {.i
> .ai mi cakla sa'ai ckakla le'ai} to parse.

Actually, this is the only useful one I can think of. If the text
isn't grammatical before the le'ai clause, then the user is probably
going to want to manually correct it anyway before feeding it to the
parser. This example happens to be both grammatical, and having nearly
the same parse tree as the corrected version (since the before and
after words are both brivla).

> The parser shouldn't try to
> actually replace anything at the parser level. It should just parse the
> {le'ai} construct and report its syntax tree to the client.

Right, I'm with you on that.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



posts: 143

On Mon, Nov 3, 2008 at 13:53, Arnt Richard Johansen <arj@nvg.org> wrote:
> Depends on how forgiving you want to make the parser. The official parser just assumes that all unknown cmavo are UI.

That would be a good first step, with specific exception for known
ones (that aren't in fact UI).

> One major problem with trying to parse experimental cmavo is that they are very ad-hoc. Often, someone will suggest that some new kind of cmavo is necessary/would be nice, and then grab an arbitrary cmavo form from experimental space (xVV*/CVV+'+V*) without checking if someone else has suggested the same cmavo with a completely different use.

The solution for this is to allow multiple definitions for the same
word and then let the user of Lojgloss switch between them at parse
time.
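
A minimal sketch of what that could look like, in Python. The `DEFINITIONS` table, the proposal labels, and the `resolve` helper are all hypothetical, and the entries are placeholders rather than real proposals:

```python
# Hypothetical table: one cmavo form with several competing proposals.
DEFINITIONS = {
    "xa'a": {"proposal-1": "BAI", "proposal-2": "UI"},
}

def resolve(word, choices, definitions=DEFINITIONS, default="UI"):
    """Pick the selma'o for `word` from the user's parse-time choices.

    Falls back to `default` (UI, mirroring what the official parser does
    with unknown cmavo) when the word is unknown or nothing was chosen.
    """
    options = definitions.get(word, {})
    chosen = choices.get(word)
    return options.get(chosen, default)

print(resolve("xa'a", {"xa'a": "proposal-1"}))
print(resolve("xa'a", {}))
```

The `choices` dictionary stands in for whatever switch the user interface ends up exposing.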

> Another point (or, if you like, a stern warning): if you decide you want to handle experimental cmavo and constructs, please flag them as such, prominently. A glosser/parser is a tool that people not only use to make sense of other people's texts, but also to check if their own prose is correct. If they try to use experimental cmavo, they need to know that their Lojban is not quite kosher, and might not be understood by those who aren't down with the latest lingo.

I can certainly color them differently, and maybe mark them in the
text output as well. But in general, I don't think users are going to
have much trouble recognizing the form of experimental cmavo, if
they're familiar with the concept at all.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



posts: 143

On Mon, Nov 3, 2008 at 06:29, Daniel Brockman <dbrockman@gmail.com> wrote:
> Here is an example of case 2 currently in use on IRC:
>
> http://www.lojban.org/tiki/tiki-index.php?page=le'ai
> I'm not sure about what its exact grammar should be or how it should
> interact with other magic words. Maybe someone who is knowledgeable in this
> area could suggest something?

Another option is to just implement these le'ai using a preprocessor.
Even unmorphological (is that a word?) messes will parse; you just need a
bunch of non-Lojban-word productions.
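
A minimal sketch of such a preprocessor, in Python. It works on plain whitespace-separated strings, so morphologically invalid words are not a problem. The name `apply_corrections` is hypothetical, and replacing the most recent earlier occurrence of the erroneous words is an assumption about how the construct should be resolved, not something the le'ai page is known to specify:

```python
def apply_corrections(words):
    """Rewrite `lo'ai X sa'ai Y le'ai` by replacing the most recent X with Y.

    `words` is a list of whitespace-separated tokens. The correction
    construct itself is dropped from the output. If X cannot be found in
    the preceding text, the correction is dropped without any change.
    """
    out = []
    i = 0
    while i < len(words):
        if words[i] == "lo'ai" and "le'ai" in words[i:]:
            end = words.index("le'ai", i)
            inner = words[i + 1:end]
            if "sa'ai" in inner:
                k = inner.index("sa'ai")
                wrong, right = inner[:k], inner[k + 1:]
                # Search backwards through the text emitted so far.
                for s in range(len(out) - len(wrong), -1, -1):
                    if wrong and out[s:s + len(wrong)] == wrong:
                        out[s:s + len(wrong)] = right
                        break
                i = end + 1
                continue
        out.append(words[i])
        i += 1
    return out

text = "mi kjama le zarji lo'ai kjama sa'ai klama le'ai"
print(" ".join(apply_corrections(text.split())))
```

The output of this step would then be handed to the ordinary parser, which never sees the correction words.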

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



Hi,

On Wed, Nov 5, 2008 at 2:34 PM, Chris Capel <pdf23ds@gmail.com> wrote:


> On Wed, Nov 5, 2008 at 03:58, Daniel Brockman <daniel@brockman.se> wrote:

> > On Wed, Nov 5, 2008 at 12:23 AM, Chris Capel <pdf23ds@gmail.com> wrote:

> >> Right, but it would still be unparsable. The problem is that the text
> >> before it is ungrammatical, and so has to be ignored by the parser to
> >> get the whole thing to parse, which requires that the parser
> >> understand which words the lo'ai is nulling out. It can't be treated
> >> half-way and have things still parse.
> >
> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
> > just treat it as a self-contained construct that requires morphologically
> > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
> > correct Lojban before it (just like everything else).
>
> How far before it? Up to the beginning of the sentence? The statement?
>

The {le'ai} construct doesn't care about ANYTHING else. However your parser
works, that's how it works before {le'ai}.


> > Of course it would require extraordinary methods to get things like
> {kwama
> > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n su
> > coi} --- to parse. It's not practical and not cost-efficient. The
> {kjama}
> > example falls in this category because {kj} is morphologically invalid.
>
> Hmm. I think you overestimate the difference in effort between the two
> implementations. They both require the same tricks, just at a slightly
> different level in the grammar.
>

What are you talking about? One implementation is self-contained; the other
requires lots of weird backtracking and re-parsing and weird, weird stuff.


> > What _would_ be useful and cost-efficient would be to get things like {.i
> > .ai mi cakla sa'ai ckakla le'ai} to parse.
>
> Actually, this is the only useful one I can think of. If the text
> isn't grammatical before the le'ai clause, then the user is probably
> going to want to manually correct it anyway before feeding it to the
> parser. This example happens to be both grammatical, and having nearly
> the same parse tree as the corrected version (since the before and
> after words are both brivla).
>

It doesn't matter if it has the same parse tree. It only matters that it
PARSES IN ANY WAY. If it does, then the parser will be able to continue.
If it doesn't, then the parser will die.

--
Daniel Brockman
daniel@gointeractive.se

On Wed, Nov 5, 2008 at 2:43 PM, Chris Capel <pdf23ds@gmail.com> wrote:


> On Mon, Nov 3, 2008 at 06:29, Daniel Brockman <dbrockman@gmail.com> wrote:
> > Here is an example of case 2 currently in use on IRC:
> >
> > http://www.lojban.org/tiki/tiki-index.php?page=le'ai
> > I'm not sure about what its exact grammar should be or how it should
> > interact with other magic words. Maybe someone who is knowledgeable in
> this
> > area could suggest something?
>
> Another option is to just implement these le'ai using a preprocessor.
> Even unmorphological (is that a word?) messes will parse, just have a
> bunch of non-lojban-word productions.
>

Sure. If you want to. Go for it.

--
Daniel Brockman
daniel@gointeractive.se

posts: 143

On Wed, Nov 5, 2008 at 18:07, Daniel Brockman <daniel@gointeractive.se> wrote:
>> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is to
>> > just treat it as a self-contained construct that requires
>> > morphologically
>> > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
>> > correct Lojban before it (just like everything else).
>>
>> How far before it? Up to the beginning of the sentence? The statement?
>
> The {le'ai} construct doesn't care about ANYTHING else. However your parser
> works, that's how it works before {le'ai}.

I don't understand. You're saying that if there's a lo'ai then
everything before it in the text should get only a syntactical parse,
not a grammatical parse? If not, there has to be some cutoff.

>> > Of course it would require extraordinary methods to get things like
>> > {kwama
>> > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n
>> > su
>> > coi} --- to parse. It's not practical and not cost-efficient. The
>> > {kjama}
>> > example falls in this category because {kj} is morphologically invalid.
>>
>> Hmm. I think you overestimate the difference in effort between the two
>> implementations. They both require the same tricks, just at a slightly
>> different level in the grammar.
>
> What are you talking about? One implementation is self-contained; the other
> requires lots of weird backtracking and re-parsing and weird, weird stuff.

No, both require backtracking (but not reparsing, since this is a
packrat parser) and lots of lookahead that's usually wasted (but
hopefully fast). You have to check every sentence (or whatever) for
lo'ai before the main grammar parse, whether you do it before or after
the morph parse. If you want to see how that's implemented, take a
look at SA. Now, SA has a lot more complicated grammar, so lo'ai would
be easier to implement even using the same technique. (And contrary to
Jorge, I'm not too sure it would introduce any weird interactions with
the SA machinery.)

> It doesn't matter if it has the same parse tree. It only matters that it
> PARSES IN ANY WAY. If it does, then the parser will be able to continue.
> If it doesn't, then the parser will die.

I'm more concerned about interactive parsing where parse errors aren't
a huge deal, especially because you get detailed and helpful error
information, much, much better than jbofi'e, to help you find the
problem.

I think perhaps a better (simple) way to handle lo'ai is to treat it
similar to a plain-old lo'u - le'u quote. Still have it behave like a
UI, but only morph parse the words until the le'ai. In fact, I imagine
a number of experimental cmavo that create new selmaho could be
handled cursorily as quotes of this kind. It's not ideal, but it
allows a non-expert user to modify the parser with configuration to
handle text using these cmavo better than before.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



posts: 47
On Thu, Nov 6, 2008 at 1:30 AM, Chris Capel <pdf23ds@gmail.com> wrote:


> On Wed, Nov 5, 2008 at 18:07, Daniel Brockman <daniel@gointeractive.se>
> wrote:
> >> > The obvious way to implement {lo'ai .. sa'ai .. le'ai} in a parser is
> to
> >> > just treat it as a self-contained construct that requires
> >> > morphologically
> >> > correct Lojban inside it, just like {lo'u .. le'u}, and syntactically
> >> > correct Lojban before it (just like everything else).
> >>
> >> How far before it? Up to the beginning of the sentence? The statement?
> >
> > The {le'ai} construct doesn't care about ANYTHING else. However your
> parser
> > works, that's how it works before {le'ai}.
>
> I don't understand. You're saying that if there's a lo'ai then
> everything before it in the text should get only a syntactical parse,
> not a grammatical parse? If not, there has to be some cutoff.


Syntax and grammar are one and the same thing to me, so I don't understand
the distinction.


> >> > Of course it would require extraordinary methods to get things like
> >> > {kwama
> >> > lo'ai kwama sa'ai klama le'ai} --- or why not {fsen.45ynl5tnerg98ehg4n
> >> > su
> >> > coi} --- to parse. It's not practical and not cost-efficient. The
> >> > {kjama}
> >> > example falls in this category because {kj} is morphologically
> invalid.
> >>
> >> Hmm. I think you overestimate the difference in effort between the two
> >> implementations. They both require the same tricks, just at a slightly
> >> different level in the grammar.
> >
> > What are you talking about? One implementation is self-contained; the
> other
> > requires lots of weird backtracking and re-parsing and weird, weird
> stuff.
>
> No, both require backtracking (but not reparsing, since this is a
> packrat parser) and lots of lookahead that's usually wasted (but
> hopefully fast). You have to check every sentence (or whatever) for
> lo'ai before the main grammar parse, whether you do it before or after
> the morph parse. If you want to see how that's implemented, take a
> look at SA. Now, SA has a lot more complicated grammar, so lo'ai would
> be easier to implement even using the same technique. (And contrary to
> Jorge, I'm not too sure it would introduce any weird interactions with
> the SA machinery.)


I'm still not getting through. We are talking about two different things.


> > It doesn't matter if it has the same parse tree. It only matters that it
> > PARSES IN ANY WAY. If it does, then the parser will be able to continue.
> > If it doesn't, then the parser will die.
>
> I'm more concerned about interactive parsing where parse errors aren't
> a huge deal, especially because you get detailed and helpful error
> information, much, much better than jbofi'e, to help you find the
> problem.
>
> I think perhaps a better (simple) way to handle lo'ai is to treat it
> similar to a plain-old lo'u - le'u quote. Still have it behave like a
> UI, but only morph parse the words until the le'ai. In fact, I imagine
> a number of experimental cmavo that create new selmaho could be
> handled cursorily as quotes of this kind. It's not ideal, but it
> allows a non-expert user to modify the parser with configuration to
> handle text using these cmavo better than before.
>

Yes, this is what I've been trying to say. Thank you. Just handle it like
a parenthetical expression.

The more complicated implementation that actually replaces at parse time is
another discussion (which I've been trying to avoid in order to keep this
simple, but by all means continue if it is interesting to you).

I'm not even sure I'd want my parser to erase and replace stuff. I consider
an erasure or a replacement to be an additional utterance that is often best
understood as such. It would even be interesting to make a parser that
could parse through errors and resynchronize later (e.g., when {.i} is
encountered), and things like that.
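
To make the resynchronization idea concrete, here is a rough Python sketch. It assumes a hypothetical `parse_sentence` callable that raises `ValueError` on a parse failure, and it treats every {.i} token as a safe restart point, which ignores cases such as {.i} inside a quote:

```python
def parse_with_recovery(words, parse_sentence):
    """Parse sentence by sentence, resynchronizing at each {.i}.

    `parse_sentence` is a hypothetical callable that returns a parse tree
    or raises ValueError. A failure in one sentence does not stop the
    sentences after it from being parsed.
    """
    sentences, current = [], []
    for w in words:
        if w == ".i" and current:
            sentences.append(current)
            current = []
        current.append(w)
    if current:
        sentences.append(current)

    results = []
    for s in sentences:
        try:
            results.append(("ok", " ".join(s), parse_sentence(s)))
        except ValueError as err:
            results.append(("error", " ".join(s), str(err)))
    return results

def toy_parser(sentence):
    # Stand-in for a real parser: rejects one known-bad word.
    if "kjama" in sentence:
        raise ValueError("invalid word: kjama")
    return tuple(sentence)

for status, text, _ in parse_with_recovery(
        ".i mi klama .i mi kjama .i mi klama".split(), toy_parser):
    print(status, text)
```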

Anyway, I'm in over my head. I'm not a parser expert.

--
Daniel Brockman
daniel@brockman.se

On Wed, Nov 5, 2008 at 9:30 PM, Chris Capel <pdf23ds@gmail.com> wrote:

> Now, SA has a lot more complicated grammar, so lo'ai would
> be easier to implement even using the same technique. (And contrary to
> Jorge, I'm not too sure it would introduce any weird interactions with
> the SA machinery.)

I don't think interactions with SA are the problem. I think LOhAI (if
the construction is to be more than just a free modifier) is much more
complicated than SA, with or without interactions. I agree their
interaction is mostly irrelevant.

What SA does is: before each word, look ahead to see whether the word
and everything that follows up to SA will end up being deleted by SA.
If not, proceed. If yes, ignore everythig up to the replacement word
and proceed from there.

LOhAI can't do just that. What it has to do is: before each word, look
ahead to see whether the word and some of what follows will be
deleted. If not, proceed. If yes, ignore everything that matches the
part between LOhAI and SAhAI, continue with what comes after SAhAI, go
back to the end of the previous match and proceed from there. The
only way I can see that working with a PEG is having a different rule
for each potential replacement string. But the number of potential
replacements is infinite, so I don't see how that could work. (Or I
may be missing something.)

What Daniel proposes is much simpler, but I'm still not sure I see the
point of it. It won't do anything useful for humans, nor for a
computer parser.

mu'o mi'e xorxes



posts: 47

>
> What Daniel proposes is much simpler, but I'm still not sure I see the
> point of it. It won't do anything useful for humans, nor for a
> computer parser.
>

Experienced Lojbanists are relatively unlikely to make blatant grammatical
errors. They remain (depending on character, I suppose) prone to making all
sorts of mistakes, grammatical or otherwise. As a result, the {le'ai}
construct is often employed to make corrections even when there are no
grammatical *errors* anywhere in sight.

For example, the #jbosnu channel accepts only grammatically correct Lojban.
Hence, you may not use {le'ai} constructions there. But that would be useful
to correct mistakes, despite the fact that grammatical mistakes are not even
allowed in the first place.

--
Daniel Brockman
daniel@brockman.se

posts: 47

>
> despite the fact that grammatical mistakes are not even allowed in the
> first place.
>

I meant grammatical *errors* are not allowed. (You can make a grammatical
mistake and still end up speaking valid Lojban.)

--
Daniel Brockman
daniel@brockman.se

On Thu, Nov 6, 2008 at 11:32 AM, Daniel Brockman <daniel@brockman.se> wrote:
>
> Experienced Lojbanists are relatively unlikely to make blatant grammatical
> errors. They remain (depending on character, I suppose) prone to making all
> sorts of mistakes, grammatical or otherwise.

I guess the most likely error for a fluent speaker should be a typo. A
typo is likely to result in text that is either ungrammatical or
grammatical with an unintended structure, though in some cases it can
result in a text with the same structure if the typo doesn't change the
selma'o of the word.

> As a result, the {le'ai} construct is often employed to make
> corrections even when there are no grammatical *errors* anywhere in sight.

A text that parses with an unintended parse tree is a grammatical
error by the speaker, even if the text on its own is grammatically
valid. It is as bad as one that does not parse, or perhaps even worse
because it could be misleading without announcing that something is
wrong.

> For example, the #jbosnu channel accepts only grammatically correct Lojban.

Interesting, I didn't know that. What parser does it use?

> Hence, you may not use {le'ai} constructions there. But that would be useful
> to correct mistakes, despite the fact that grammatical mistakes are not even
> allowed in the first place.

OK, what jbosnu seems to require is a construction like:

correction <- (!SAhAI any-word)* SAhAI any-word*

which constitutes a whole utterance all by itself: any number of words
(grammatical or not), then a SAhAI word, followed by any number of words
(grammatical or not). It would appear at the text level:

text <- correction / ...
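
As a rough illustration of how the correction rule above behaves, here is a
minimal Python sketch. It assumes the input has already been split into
words and that selma'o SAhAI contains only the cmavo sa'ai; the name
match_correction is made up for this example and does not come from any
existing parser.

```python
# Hypothetical sketch of: correction <- (!SAhAI any-word)* SAhAI any-word*
# Assumption: selma'o SAhAI contains just the cmavo sa'ai.
SAHAI = {"sa'ai"}


def match_correction(words):
    """Return (before, after) if the word list matches the rule, else None.

    (!SAhAI any-word)* consumes every word up to the first SAhAI word,
    SAhAI consumes that word, and any-word* consumes everything after it,
    including any further SAhAI words.
    """
    for i, word in enumerate(words):
        if word in SAHAI:
            return words[:i], words[i + 1:]
    return None  # no SAhAI word, so the other alternatives of text apply


if __name__ == "__main__":
    print(match_correction("mi klama sa'ai mi cliva".split()))
```

Because neither side is parsed, the words before and after sa'ai may be
ungrammatical, which is what lets the whole line pass as one utterance.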

Why would it be useful to be able to embed this construction in the
midst of some other utterance?

If the correction is not going to be used by the parser to fix
anything, embedding it in a formally broken text won't work, because
it will never be detected. Would it be of any use to embed it in a
formally unbroken but effectively uninterpretable or incorrectly
interpretable text?

In other words, this construction as you are using it is used to make
a comment about some other text, it's about how some other text should
be fixed, not about the text it appears in.

If that's the case, only SAhAI is required, because there is no need
to separate the construction from anything else.

mu'o mi'e xorxes



posts: 47

On Thu, Nov 6, 2008 at 5:27 PM, Jorge Llambías <jjllambias@gmail.com> wrote:
>
> On Thu, Nov 6, 2008 at 11:32 AM, Daniel Brockman <daniel@brockman.se> wrote:
> >
> > Experienced Lojbanists are relatively unlikely to make blatant grammatical
> > errors. They remain (depending on character, I suppose) prone to making all
> > sorts of mistakes, grammatical or otherwise.
>
> I guess the most likely error for a fluent speaker should be a typo. A
> typo is likely to result in ungrammatical text, or in grammatical text with
> an unintended structure, though in some cases it can result in a text
> with the same structure if the typo doesn't change the selma'o of the
> word.

Granted.

> > As a result, the {le'ai} construct is often employed to make
> > corrections even when there are no grammatical *errors* anywhere in sight.
>
> A text that parses with an unintended parse tree is a grammatical
> error by the speaker, even if the text on its own is grammatically
> valid. It is as bad as one that does not parse, or perhaps even worse
> because it could be misleading without announcing that something is
> wrong.

That's not what I meant by "error". It doesn't matter.

> > For example, the #jbosnu channel accepts only grammatically correct Lojban.
>
> Interesting, I didn't know that. What parser does it use?

I'm not sure. If I were to guess, I'd say jbofi'e.

> Why would it be useful to be able to embed this construction in the
> midst of some other utterance?

In speech, it would be useful. On IRC, it wouldn't.

> If the correction is not going to be used by the parser to fix
> anything, embedding it in a formally broken text won't work, because
> it will never be detected. Would it be of any use to embed it in a
> formally unbroken but effectively uninterpretable or incorrectly
> interpretable text?

Probably not with your more restricted grammar and feature set.

> In other words, this construction as you are using it is used to make
> a comment about some other text, it's about how some other text should
> be fixed, not about the text it appears in.

How do you separate one "text" from another?

> If that's the case, only SAhAI is required, because there is no need
> to separate the construction from anything else.

You will lose a bunch of things you can do with {le'ai} then:

* Distinguish between insertions and vague mistakes.
* Distinguish between deletions and vague corrections.
* Distinguish between null replacements and vague replacements.
* Ask about the replacement with {le'ai pei} ("did you mean ...?").
* Deny the replacement with {le'ai nai} ("I really meant ..." or just "sic").
* Express doubt about a used expression with {le'ai cu'i}.
* Switch around the order of mistake and correction.

--
Daniel Brockman
daniel@brockman.se

On Thu, Nov 6, 2008 at 2:52 PM, Daniel Brockman <daniel@brockman.se> wrote:
> On Thu, Nov 6, 2008 at 5:27 PM, Jorge Llambías <jjllambias@gmail.com> wrote:
>
>> In other words, this construction as you are using it is used to make
>> a comment about some other text, it's about how some other text should
>> be fixed, not about the text it appears in.
>
> How do you separate one "text" from another?

By a "text" I mean the amount of speech given to the parser for
parsing in one chunk. In a conversation a new text is started every
time the speaker changes. Often two texts won't parse if they are
simply concatenated into one. In #jbosnu each line constitutes a new
text because it is parsed independently of the rest. (I call that
unit "text" because that's what the official formal grammar calls it.)

mu'o mi'e xorxes



posts: 143

On Thu, Nov 6, 2008 at 07:47, Jorge Llambías <jjllambias@gmail.com> wrote:

> On Wed, Nov 5, 2008 at 9:30 PM, Chris Capel <pdf23ds@gmail.com> wrote:

>> Now, SA has a lot more complicated grammar, so lo'ai would
>> be easier to implement even using the same technique. (And contrary to
>> Jorge, I'm not too sure it would introduce any weird interactions with
>> the SA machinery.)
>
> I don't think interactions with SA are the problem. I think LOhAI (if
> the construction is to be more than just a free modifier) is much more
> complicated than SA, with or without interactions. I agree their
> interaction is mostly irrelevant.

Sorry, I wasn't talking about full lo'ai there, just the restricted
subset that would treat it like a quote.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)



posts: 47

> Sorry, I wasn't talking about full lo'ai there, just the restricted
> subset that would treat it like a quote.

Now that we're deep into this discussion, I might as well ask,
how would {lo'ai .. sa'ai .. le'ai} interact with other magic words?
(I'm assuming the grammar would be consistent with already
existing similar constructions.)

Like, I assume {lo'ai zo sa'ai sa'ai le'ai} would be okay.
What about {lo'ai lo'u sa'ai le'u sa'ai le'ai}?

Is there a simple rule to follow?

(I'm still, as I have been from the start, talking about parsing the
construction as just a free modifier parenthetical quote-like thing.)

--
Daniel Brockman
daniel@brockman.se



posts: 47

> By a "text" I mean the amount of speech given to the parser for
> parsing in one chunk. In a conversation a new text is started every
> time the speaker changes. Often two texts won't parse if they are
> simply concatenated into one. In #jbosnu each line constitutes a new
> text because it is parsed independently of the rest. (I call that
> unit "text" because that's what the official formal grammar calls it.)

Okay, that's what I thought. Yeah, that's fine for IRC.

But what do you do in a live conversation if you want to correct a mistake?
Do you have to wait for the other person to say something first?

--
Daniel Brockman
daniel@brockman.se



On Fri, Nov 7, 2008 at 6:45 AM, Daniel Brockman <daniel@brockman.se> wrote:
>
> Now that we're deep into this discussion, I might as well ask,
> how would {lo'ai .. sa'ai .. le'ai} interact with other magic words?
> (I'm assuming the grammar would be consistent with already
> existing similar constructions.)

If it's modelled on lo'u ... le'u, it should simply treat them all as
empty words, i.e. ignore their meaning.

> Like, I assume {lo'ai zo sa'ai sa'ai le'ai} would be okay.

That would replace a "zo" with a "sa'ai" ("zo" and the second "sa'ai"
are just empty words, not parsed with their function).

> What about {lo'ai lo'u sa'ai le'u sa'ai le'ai}?

Replaces "lo'u" with "le'u sa'ai".

> Is there a simple rule to follow?

First come, first served. The first magic word is the one with the magic.
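
The "first come, first served" rule can be sketched in Python as follows.
This is only an illustration of the quote-like reading discussed in this
thread, not an implementation from any real parser: the function name
split_replacement is hypothetical, the input is assumed to be pre-split
into words, and returning None for a missing sa'ai part is a choice made
for this sketch.

```python
# Hypothetical sketch of the quote-like lo'ai ... sa'ai ... le'ai reading.
def split_replacement(words):
    """Split a lo'ai construction into (mistake, correction).

    After lo'ai, the first sa'ai acts as the separator and the first le'ai
    ends the construction. Every other word, magic or not, is treated as an
    empty word. correction is None when no sa'ai appears before le'ai.
    Words after the closing le'ai are ignored by this sketch.
    """
    if not words or words[0] != "lo'ai":
        raise ValueError("expected lo'ai at the start")
    mistake, correction = [], None
    for word in words[1:]:
        if word == "le'ai":
            return mistake, correction
        if word == "sa'ai" and correction is None:
            correction = []  # first sa'ai: switch to the correction part
        elif correction is None:
            mistake.append(word)
        else:
            correction.append(word)  # later sa'ai lands here as a plain word
    raise ValueError("no closing le'ai")


if __name__ == "__main__":
    print(split_replacement("lo'ai zo sa'ai sa'ai le'ai".split()))
    print(split_replacement("lo'ai lo'u sa'ai le'u sa'ai le'ai".split()))
```

Run on the two examples above, the sketch replaces "zo" with "sa'ai" in the
first case and "lo'u" with "le'u sa'ai" in the second.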

mu'o mi'e xorxes



On Fri, Nov 7, 2008 at 6:47 AM, Daniel Brockman <daniel@brockman.se> wrote:
>
> But what do you do in a live conversation if you want to correct a mistake?
> Do you have to wait for the other person to say something first?

In a spoken conversation typos are less likely. :-)

Also, you are less likely to want to change some particular choice of
word, simply because you don't have the text in front of you to check
what you just said. Unless one is listening to a recording, a fluent
speaker is normally not even aware of making any slips. Someone
learning a new language is much more conscious of each individual
word, but a fluent speaker is more likely to concentrate on the ideas
expressed, not the exact words used.

When you really are concentrating on some choice of word, and talking
about the word, then it's probably more clear to use the normal
grammar to talk about the word, rather than use some shortcut for
replacement.

That's why I've never been too keen on SA, it seems so wrong to have
to concentrate on the exact words you are using instead of on what you
are expressing with them. SU is not so bad, only because it is so
drastic: "strike all that, let me start again". That's useful when you
are carefully trying to phrase something right and you realize that
you are making a mess of it. That happens to fluent speakers too when
dealing with complex ideas. SI is not so likely to be used by a fluent
speaker, it's more of a crutch for the beginner, it's tolerable
because the very last word is still fresh in the mind and still
accessible as a word.

Anyway, I'm not really opposed to people using things like SA or the
LOhAI/SAhAI/LEhAI construction if they find it useful. Usage rules.
It's just that to me it's too artificial, it approaches language from
the wrong end (from the valsi instead of the se valsi).

I have a similar gripe about "di'u" and "la'e di'u". Why is the most
common and useful "la'e di'u" a compound, and the less useful "di'u" a
single word? Normally we are much more likely to want to talk about
la'e di'u than about di'u.

mu'o mi'e xorxes

