PEG grammar issues

Posted by pdf23ds on Mon 16 of Jun, 2008 06:01 GMT posts: 143

I've found a few minor issues with the PEG grammar over the course of
working on it, and I thought I'd start to tell people about them in
case I forget them.

First, the top-level production should fail if it can't parse the
whole string. Currently 'text' ends with an EOF?, which makes it never
fail. However, it can't be EOF, because LU references the text
production, and doesn't require an EOF before "li'u". 'text' should be
changed to something like

text-eof <- text EOF

text <- intro-null NAI-clause* text-part-2 (!text-1 joik-jek)? text-1?
faho-clause

I made this change in my parser in the first release, and it seems to work fine.
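
A minimal sketch of the distinction, assuming a toy stand-in for 'text' (the `parse_text` function and its character class are hypothetical, not the real production): the inner rule consumes what it can and never fails, and only the wrapper insists on reaching the end of input.

```python
import re

# Hypothetical stand-in for the 'text' production: it consumes a run of
# lowercase letters, apostrophes and spaces, and never fails.
_TEXT = re.compile(r"[a-z' ]*")

def parse_text(s, pos=0):
    """Return the position where the toy 'text' rule stops matching."""
    return _TEXT.match(s, pos).end()

def parse_text_eof(s):
    """text-eof <- text EOF: succeed only if 'text' consumed everything."""
    end = parse_text(s)
    return end if end == len(s) else None
```

With this split, a rule like LU can keep calling the inner `parse_text`, while a top-level caller that wants strict behavior uses `parse_text_eof`.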

Second, selbri-3 should parse its child selbri-4 into left-associative
groups. Currently it just parses them all into one group, which is
misleading and possibly wrong, depending on your interpretation. I
tried to figure out a way to fix this, but couldn't find a way to do
so and avoid left recursion in the definition. So I gave up and added
a post-parsing step in my own parser to group them properly.
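
A post-parsing step of that kind might look roughly like the sketch below. It assumes the parser hands back the selbri-4 children as a flat Python list; the `group_left` name and the tuple representation are illustrative only.

```python
from functools import reduce

def group_left(units):
    """Fold a flat list [a, b, c, d] into nested pairs (((a, b), c), d)."""
    if not units:
        raise ValueError("need at least one unit")
    # reduce pairs the accumulated left group with each following unit.
    return reduce(lambda left, right: (left, right), units)
```

A single unit comes back unchanged, so the step is safe to run on every selbri-3 node.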

Third, tenses that probably ought to be parsed as part of the bridi
are currently being parsed as head terms, because of the term-1
production:

term-1 <- sumti / ( !gek (tag / FA-clause free*) (sumti /
KU-clause? free*) ) / termset / NA-clause KU-clause free*

{mi} {pu} <klama le zarci>

(In braces are term-1 matches, and in angle brackets is the
bridi-tail.) 'term-1' matches "mi", and then it matches "pu". 'sumti'
fails on "pu", but '!gek' and 'tag' succeed, and then since
'KU-clause' and 'free' are both optional, the second option of
'term-1' succeeds. I'm not exactly sure how this one needs to be
fixed, but what about this:

term-1 <- sumti / term-2 / termset / NA-clause KU-clause free*

term-2 <- !gek (tag (sumti / KU-clause free*) / FA-clause free*
(sumti / KU-clause? free*) )

Here, 'term-2' is the second option of the original 'term-1', except
that the third item in the sequence has been factored into the second,
and the ? removed from 'KU-clause' after 'tag'. It seems to work in my
parser!

Fourth, 'term-sa' only appears to match one term sa under some
conditions. For instance, it doesn't match this:

mi ba klama lo sa lo sa do

which one might imagine could be said by someone with a stutter.
Here's one possible fix:

term-sa <- term-start (!term-1 (sa-word / SA-clause !term-1) )*
SA-clause &term-1

Well, that's about it for now. Number 3 is currently a problem for my
parser, as it makes it hard for me to correctly gloss tense cmavo in
context. So if anyone sees a problem with my correction, let me know.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.


Posted by Anonymous on Mon 16 of Jun, 2008 16:36 GMT

On Sun, Jun 15, 2008 at 11:39 PM, Chris Capel <pdf23ds@gmail.com> wrote:

>
> First, the top-level production should fail if it can't parse the
> whole string. Currently 'text' ends with an EOF?, which makes it never
> fail.

I think that was on purpose: parse as much as you can parse, and
discard anything unparsable that follows.

> Second, selbri-3 should parse its child selbri-4 into left-associative
> groups. Currently it just parses them all into one group, which is
> misleading and possibly wrong, depending on your interpretation. I
> tried to figure out a way to fix this, but couldn't find a way to do
> so and avoid left recursion in the definition. So I gave up and added
> a post-parsing step in my own parser to group them properly.

The same applies to statement-1, bridi-tail-1 and sumti-2, right?

> Third, tenses that probably ought to be parsed as part of the bridi
> are currently being parsed as head terms, because of the term-1
> production:
>
> term-1 <- sumti / ( !gek (tag / FA-clause free*) (sumti /
> KU-clause? free*) ) / termset / NA-clause KU-clause free*
>
> {mi} {pu} <klama le zarci>
>
> (In braces are term-1 matches, and in angle brackets is the
> bridi-tail.) 'term-1' matches "mi", and then it matches "pu". 'sumti'
> fails on "pu", but '!gek' and 'tag' succeed, and then since
> 'KU-clause' and 'free' are both optional, the second option of
> 'term-1' succeeds. I'm not exactly sure how this one needs to be
> fixed, but what about this:
>
> term-1 <- sumti / term-2 / termset / NA-clause KU-clause free*
>
> term-2 <- !gek (tag (sumti / KU-clause free*) / FA-clause free*
> (sumti / KU-clause? free*) )
>
> Here, 'term-2' is the second option of the original 'term-1', except
> that the third item in the sequence has been factored into the second,
> and the ? removed from 'KU-clause' after 'tag'. It seems to work in my
> parser!

That makes it impossible to omit {ku} in other positions as well.
For example, {mi ka'e pu klama} would fail.

How about "!gek !selbri" instead of just "!gek" in the original rule?
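
A toy model of the effect of that extra lookahead, under heavy simplification: the word classes below are made up for the example, 'selbri' is reduced to "any tags followed by a brivla", and gek, FA, free, termset and NA are left out entirely. It is a sketch of the idea, not of the real grammar.

```python
SUMTI = {"mi", "do"}          # toy word classes, not the real lexicon
TAG = {"pu", "ba", "ka'e"}
KU = {"ku"}
BRIVLA = {"klama"}

def selbri(toks, i):
    """Toy selbri <- tag* brivla; returns the new position or None."""
    while i < len(toks) and toks[i] in TAG:
        i += 1
    return i + 1 if i < len(toks) and toks[i] in BRIVLA else None

def term_1(toks, i, guard_selbri):
    """Toy term-1 <- sumti / !selbri tag (sumti / KU?), guard optional."""
    if i < len(toks) and toks[i] in SUMTI:
        return i + 1
    if guard_selbri and selbri(toks, i) is not None:
        return None           # !selbri: leave the tag for the bridi-tail
    if i < len(toks) and toks[i] in TAG:
        j = i + 1
        if j < len(toks) and (toks[j] in SUMTI or toks[j] in KU):
            j += 1
        return j
    return None

def head_terms(toks, guard_selbri):
    """Collect consecutive term-1 matches from the start of the input."""
    i, out = 0, []
    while (j := term_1(toks, i, guard_selbri)) is not None:
        out.append(toks[i:j])
        i = j
    return out
```

Without the guard, {pu} in {mi pu klama} is taken as a head term. With it, {pu} is left for the selbri, while {pu ku} still parses as a term and {mi ka'e pu klama} keeps both tags with the selbri.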

> Fourth, 'term-sa' only appears to match one term sa under some
> conditions. For instance, it doesn't match this:
>
> mi ba klama lo sa lo sa do
>
> which one might imagine could be said by someone with a stutter.
> Here's one possible fix:
>
> term-sa <- term-start (!term-1 (sa-word / SA-clause !term-1) )*
> SA-clause &term-1

SA ought to be ditched or completely reformulated, IMHO.

mu'o mi'e xorxes


Posted by pdf23ds on Mon 16 of Jun, 2008 23:15 GMT posts: 143

On Mon, Jun 16, 2008 at 11:34 AM, Jorge Llambías <jjllambias@gmail.com> wrote:

> On Sun, Jun 15, 2008 at 11:39 PM, Chris Capel <pdf23ds@gmail.com> wrote:

>>
>> First, the top-level production should fail if it can't parse the
>> whole string. Currently 'text' ends with an EOF?, which makes it never
>> fail.
>
> I think that was on purpose: parse as much as you can parse, and
> discard anything unparsable that follows.

Sure, but I think that both behaviors are needed in different
contexts. But it doesn't really matter much--individual parsers will
do what they will.

>> Second, selbri-3 should parse its child selbri-4 into left-associative
>> groups.
>
> The same applies to statement-1, bridi-tail-1 and sumti-2, right?

Don't think so, maybe, and maybe, except that in the last two cases,
the parse tree actually shows them as right-associative, which would
make it harder to fix. But I'm not terribly clear on the grammar (in
the wider sense) here. No to 'statement-1' because I don't think
statement-2's really have associativity, so the correct parse tree
would be flat, and the current parse tree is flat, so it's not broken.
Same thing for bridi-tail-1 and sumti-2--do those really have
associativity? Does it matter which order you interpret the giheks or
jeks?

>> Third, tenses that probably ought to be parsed as part of the bridi
>> are currently being parsed as head terms, because of the term-1
>> production:

>> I'm not exactly sure how this one needs to be
>> fixed, but what about this:
>>
>> term-1 <- sumti / term-2 / termset / NA-clause KU-clause free*
>>
>> term-2 <- !gek (tag (sumti / KU-clause free*) / FA-clause free*
>> (sumti / KU-clause? free*) )
>
> That makes it impossible to omit {ku} in other positions as well.
> For example, {mi ka'e pu klama} would fail.
>
> How about "!gek !selbri" instead of just "!gek" in the original rule?

Sounds good to me.

>> Fourth, 'term-sa' only appears to match one term sa under some
>> conditions. For instance, it doesn't match this:
>>
>> mi ba klama lo sa lo sa do
>>
>> which one might imagine could be said by someone with a stutter.
>> Here's one possible fix:
>>
>> term-sa <- term-start (!term-1 (sa-word / SA-clause !term-1) )*
>> SA-clause &term-1
>
> SA ought to be ditched or completely reformulated, IMHO.

Well, my suggestion's there for posterity's sake.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)


Posted by Anonymous on Tue 17 of Jun, 2008 12:43 GMT

On 6/16/08, Chris Capel <pdf23ds@gmail.com> wrote:

> On Mon, Jun 16, 2008 at 11:34 AM, Jorge Llambías <jjllambias@gmail.com> wrote:

> > On Sun, Jun 15, 2008 at 11:39 PM, Chris Capel <pdf23ds@gmail.com> wrote:

> >> Second, selbri-3 should parse its child selbri-4 into left-associative
> >> groups.
> > The same applies to statement-1, bridi-tail-1 and sumti-2, right?
>
> No to 'statement-1' because I don't think
> statement-2's really have associativity, so the correct parse tree
> would be flat, and the current parse tree is flat, so it's not broken.

ijeks are left associative:

(broda ije brode) ija brodi

not:

broda ije (brode ija brodi)

To get the latter you need {ijabo}

> Same thing for bridi-tail-1 and sumti-2--do those really have
> associativity? Does it matter which order you interpret the giheks or
> jeks?

In general, yes. In particular cases, for example if you have only
{je}s, or only {ja}s, it doesn't matter.
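
A quick truth-table check of that point, reading {je} as logical AND and {ja} as logical OR. The helper names are just for the example.

```python
from itertools import product

def je(a, b):
    return a and b            # {je}: logical AND

def ja(a, b):
    return a or b             # {ja}: logical OR

def differing_rows(op1, op2):
    """Assignments where (a op1 b) op2 c differs from a op1 (b op2 c)."""
    return [
        (a, b, c)
        for a, b, c in product([False, True], repeat=3)
        if op2(op1(a, b), c) != op1(a, op2(b, c))
    ]
```

`differing_rows(je, ja)` is non-empty (the two groupings disagree whenever the first operand is false and the third is true), while `differing_rows(je, je)` and `differing_rows(ja, ja)` are both empty.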

mu'o mi'e xorxes


Posted by pdf23ds on Tue 17 of Jun, 2008 18:56 GMT posts: 143

On Tue, Jun 17, 2008 at 7:41 AM, Jorge Llambías <jjllambias@gmail.com> wrote:
> ijeks are left associative:
>
> (broda ije brode) ija brodi
>
> not:
>
> broda ije (brode ija brodi)
>
> To get the latter you need {ijabo}

Are ijabos right associative with each other?

broda ijabo (brode ijabo brodi)?

Oh, here's another one: the NU-clause option of tanru-unit-2. Does
that need associativity? Or what about terms-1 or selbri-4 (or tag)?

Oh, here's another weirdness thing (indented for clarity):

statement-2 <-
statement-3 (I-clause (jek / joik)? stag? BO-clause free* statement-2)? /
statement-3 (I-clause (jek / joik)? stag? BO-clause free*)?

Isn't that identical to

statement-2 <- statement-3 (I-clause (jek / joik)? stag? BO-clause
free* statement-2?)?

It'd be nice to simplify.
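
One way to test the equivalence is a toy model with PEG ordered choice, sketched below under loud assumptions: the token "s" stands for a whole statement-3, and "x" stands for the whole `I-clause (jek / joik)? stag? BO-clause free*` sequence. If ordered choice is modeled correctly here, the first alternative of the two-alternative rule succeeds whenever statement-3 does, so the second alternative never contributes anything. The two forms then agree on complete chains and differ only on a trailing connective with no statement after it.

```python
def s3(t, i):
    """Toy statement-3: a single 's' token."""
    return i + 1 if i < len(t) and t[i] == "s" else None

def link(t, i):
    """Toy stand-in for the I ... BO connective sequence: a single 'x'."""
    return i + 1 if i < len(t) and t[i] == "x" else None

def original(t, i=0):
    # First alternative: statement-3 (link statement-2)?
    j = s3(t, i)
    if j is not None:
        k = link(t, j)
        k = original(t, k) if k is not None else None
        return k if k is not None else j   # the optional group never fails
    # Second alternative: statement-3 (link)?
    # Only reached when statement-3 failed above, so it fails here too.
    j = s3(t, i)
    if j is not None:
        k = link(t, j)
        return k if k is not None else j
    return None

def proposed(t, i=0):
    # statement-3 (link statement-2?)?
    j = s3(t, i)
    if j is None:
        return None
    k = link(t, j)
    if k is None:
        return j
    m = proposed(t, k)
    return m if m is not None else k
```

On "sxs" both consume all three tokens. On "sx" the two-alternative form stops after "s", while the single rule also consumes the dangling "x", which is presumably what the second alternative was meant to allow.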

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)


Posted by Anonymous on Tue 17 of Jun, 2008 19:32 GMT

On 6/17/08, Chris Capel <pdf23ds@gmail.com> wrote:

> Are ijabos right associative with each other?
>
> broda ijabo (brode ijabo brodi)?

Yes. (With two {ija} it makes no difference though, you need to
mix {ija} with {ije} for example to get different meanings.)
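
The two grouping rules can be combined in a small sketch. It assumes a well-formed token list that alternates operand, connective, operand, and it uses a crude test (the connective ends in "bo") to spot the tighter-binding forms. Both are simplifications for the example.

```python
def group(tokens):
    """Group operand/connective tokens into (connective, left, right) tuples.

    Connectives ending in 'bo' bind tighter and group to the right;
    all other connectives group to the left.
    """
    def bo_chain(i):
        # operand (bo-connective bo_chain)?  -> right-associative
        left = tokens[i]
        i += 1
        if i < len(tokens) and tokens[i].endswith("bo"):
            right, j = bo_chain(i + 1)
            return (tokens[i], left, right), j
        return left, i

    tree, i = bo_chain(0)
    while i < len(tokens):
        # Plain connectives: fold each new chain onto the tree so far.
        op = tokens[i]
        right, i = bo_chain(i + 1)
        tree = (op, tree, right)
    return tree
```

So {broda ije brode ija brodi} groups as (broda ije brode) ija brodi, while replacing {ija} with {ijabo} gives broda ije (brode ijabo brodi).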

> Oh, here's another one: the NU-clause option of tanru-unit-2. Does
> that need associativity?

Probably, but since it's a construction that nobody ever uses (and
hopefully nobody ever will) it doesn't matter.

> Or what about terms-1 or selbri-4 (or tag)?

Yes, they are all left-assoc.

> Oh, here's another weirdness thing (indented for clarity):
>
> statement-2 <-
> statement-3 (I-clause (jek / joik)? stag? BO-clause free* statement-2)? /
> statement-3 (I-clause (jek / joik)? stag? BO-clause free*)?
>
> Isn't that identical to
>
> statement-2 <- statement-3 (I-clause (jek / joik)? stag? BO-clause
> free* statement-2?)?
>
> It'd be nice to simplify.

Yes. I think there are some other slightly weird points. If I remember
correctly, one rule has an unreachable part.

mu'o mi'e xorxes
