Lojban In General


gleki xisri'i

posts: 3588

coi rodo

It is said (I think) that the spirit of the holiday season brings out the best
in us; however, the best is not always entirely goodwill. In my case, I
sometimes find myself programming more at this time of year, and this year I
finally turned to a problem that concerns us all: the context-free grammar for
Lojban. It turns out that the solution is quite simple, yet if you are locked
in to the idea of strict BNF grammars and the like, it probably wouldn't have
occurred to you. The solution lies in the format of Yacc input, the most
common implementation of BNFs and the format in which the current official
attempt at a Lojban grammar is written.

First, the problem: as I understand the rules, a Lojban parser must absorb as
many tokens as possible into the current (innermost) grammatical construct,
stopping only when it encounters either a terminator or a token that cannot
continue the construct. When a terminator is encountered that could seemingly
terminate more than one construct (as in "{lonu djica lonu le gerna cu mulno
kei}"), it terminates the innermost one, and parsing of the containing
construct resumes. In discussion of parsers, this behavior is described by
saying that shift/reduce conflicts are resolved via shifting. However,
shift/reduce conflicts in Yacc can only be resolved by examining the precedence
rules set forth in the input file, and it is not possible to force Yacc to
shift for all of them with just a single command. (Note that, although Yacc's
default resolution of shift/reduce conflicts is to shift, this rule is only
needed when the input rules are insufficient, in which case the grammar is
considered ambiguous and generates errors from Yacc.) The solution is to
indicate for those rules that have shift/reduce conflicts that shifting should
be performed, and in Yacc this is usually done by assigning precedences to the
relevant terminal symbols. However, the official grammar's abstraction around
terminals in order to handle free modifiers means that the rules that have
shift/reduce conflicts consist solely of non-terminals. This can be solved
through the use of the %prec keyword, which changes the precedence of the rule
it is attached to. When a shift/reduce conflict is encountered, the parser
compares the precedences of the rule and the current look-ahead token; if the
rule's precedence is higher, reduction is chosen, and shifting is chosen if the
token's precedence is higher. The solution now becomes: assign all ambiguous
rules a precedence value lower than that of the terminal symbols, and this is
what I have done.
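
As a rough illustration of the resolution rule just described, the following Python sketch models how a single shift/reduce conflict is decided. It is not Yacc's actual code: the function name and the numeric precedence levels are assumptions made for the example, and only the name shiftPrec comes from the patch.

```python
from typing import Optional


def resolve_shift_reduce(rule_prec: Optional[int],
                         token_prec: Optional[int],
                         assoc: str = "left") -> str:
    """Decide one shift/reduce conflict by comparing precedences."""
    if rule_prec is None or token_prec is None:
        # Without precedence on both sides, Yacc shifts by default
        # and reports the conflict.
        return "shift"
    if rule_prec > token_prec:
        return "reduce"
    if rule_prec < token_prec:
        return "shift"
    # Equal precedence: the token's associativity breaks the tie.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


# Hypothetical levels: the ambiguous rules (tagged "%prec shiftPrec")
# sit below every terminal, so the look-ahead token always wins.
SHIFT_PREC = 1
TERMINAL_PREC = 2

print(resolve_shift_reduce(SHIFT_PREC, TERMINAL_PREC))  # shift
```

Because the comparison uses only values fixed in the grammar file, the outcome of every conflict is known when the parse tables are built.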

I am quite certain that the rules that now have precedences assigned to them
are the minimum number necessary to eliminate all ambiguity; I tested removing
most of them, one at a time, and those that remain should all be necessary to
avoid conflicts. I only found one problem in the official grammar that could
not be solved with precedence alone; the rule on line 733 (747 in my completed
version) that read "MEX_310 : FUhA_441 rp_expression_330" caused reduce/reduce
conflicts and had to be changed to "MEX_310 : FUhA_441 rp_operand_332 %prec
shiftPrec".

Regardless of how I developed it, my patch for the official grammar is
available at <http://jwodder.freeshell.org/downloads/jbogeha.diff> and was made
against the grammar at
<http://www.lojban.org/publications/formal-grammars/grammar.300.txt>. I admit
that there could be places in which constructs are parsed incorrectly, but this
is based on my somewhat average understanding of Lojban grammar, and a
half-right unambiguous fix to the parser should at least be a step in the right
direction.

To Powell and Cowan: I await your judgment.

mu'omi'e la'o gy. Minimiscience .gy.
me'e ji'a zoi gy. John T. Wodder II .gy.

--
mi klama .i mi viska .i mi fanva fi la lojban.


To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

posts: 14214

On Thu, Dec 25, 2008 at 04:39:46AM +0000, Minimiscience wrote:
> Regardless of how I developed it, my patch for the official
> grammar is available at
> <http://jwodder.freeshell.org/downloads/jbogeha.diff> and was made
> against the grammar at
> <http://www.lojban.org/publications/formal-grammars/grammar.300.txt>.
> I admit that there could be places in which constructs are parsed
> incorrectly, but this is based on my somewhat average
> understanding of Lojban grammar, and a half-right unambiguous fix
> to the parser should at least be a step in the right direction.

You seem to be relying on yacc's %prec marker. I'd like some
evidence that this reduces to a CFG, please, because it sure doesn't
look that way to me. It looks like a yacc-specific trick.
Interesting, but definitely not a CFG solution for Lojban.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 14214

On Thu, Dec 25, 2008 at 01:40:52AM -0800, Robin Lee Powell wrote:
> On Thu, Dec 25, 2008 at 04:39:46AM +0000, Minimiscience wrote:
> > Regardless of how I developed it, my patch for the official
> > grammar is available at
> > <http://jwodder.freeshell.org/downloads/jbogeha.diff> and was
> > made against the grammar at
> > <http://www.lojban.org/publications/formal-grammars/grammar.300.txt>.
> > I admit that there could be places in which constructs are
> > parsed incorrectly, but this is based on my somewhat average
> > understanding of Lojban grammar, and a half-right unambiguous
> > fix to the parser should at least be a step in the right
> > direction.
>
> You seem to be relying on yacc's %prec marker. I'd like some
> evidence that this reduces to a CFG, please, because it sure
> doesn't look that way to me. It looks like a yacc-specific trick.
> Interesting, but definitely not a CFG solution for Lojban.

It would also be helpful if you showed how to use this to build a
working grammar, so I could play with it a bit. The source I'm
using is the one at http://home.ccil.org/~cowan/parser-3.0.00.tar.gz

The particular problem I'm having right now is that mkgramy
generates qualified %token lines, but even fixing that doesn't seem
to be helping.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 3588

de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku zoi
skamyxatra.
> You seem to be relying on yacc's %prec marker.
.skamyxatra

Yes, that's the idea.

> I'd like some evidence that this reduces to a CFG, please, because it sure
> doesn't look that way to me.

I don't see how it wouldn't be a context-free grammar. A Yacc grammar without
precedence rules is context-free (albeit possibly ambiguous), correct? I am
simply indicating to Yacc how the ambiguous rules should be resolved using only
the precedences of the rules and a single look-ahead token. If using LALR(1)
makes a grammar context-sensitive, you've really shot yourself in the foot.

At the very least, all of the relevant information I can find online indicates
that Yacc input is always context-free, including the official Yacc
specification at
<http://www.opengroup.org/onlinepubs/000095399/utilities/yacc.html>. Also
worth noting is the Wikipedia page for "LALR parser", which explicitly states
that LALR is used for context-free grammars.

> It looks like a yacc-specific trick.

I wouldn't call it a "trick"; it's a documented & well-known feature. Besides,
so what if it's Yacc-specific? Yacc (along with its GNU implementation, Bison)
is one of the more popular parser generators (perhaps the most popular; I can't
seem to find any usage statistics right now), and you even said in the original
e-mail announcing the challenge:

> If you produce a grammar, I don't care what parser generator it needs, or
> even if such a parser generator exists. I care only that the language is
> actually a CFG, and that a parser generator could, in principle, be built for
> whatever you came up with.

So, I guess I have two questions: what makes you think that %prec makes the
grammar non-context-free, and if you were to somehow get a working
context-sensitive grammar for a parser generator that is only expected to work
for CFGs, what would be the problem, other than being kept from showing that
Lojban grammar is context-free, which seems to be nothing more than a personal
goal of yours?

mu'omi'e la'o gy. Minimiscience .gy.

--
ko senpi lo du'u do bilga lonu senpi



posts: 14214

On Thu, Dec 25, 2008 at 07:41:52PM +0000, Minimiscience wrote:
> > It looks like a yacc-specific trick.
>
> I wouldn't call it a "trick"; it's a documented & well-known
> feature. Besides, so what if it's Yacc-specific?

We asked for a general CFG, not something that only works in some
specific grammar generator.

> So, I guess I have two questions: what makes you think that %prec
> makes the grammar non-context-free,

Because there is no formal definition of it, for one thing. It's
not nearly as obvious as you seem to think; there have been serious
papers written on the topic:

http://shrunklink.com/bkop

Furthermore, and this is much more important, unless I'm really
missing something you're *changing the precedence as the grammar is
parsed*. That is, if a parse doesn't work without a particular
terminator you drop the precedence of that terminator-free
production to zero *at parse time*. If that's reducible to a
4-tuple a la http://en.wikipedia.org/wiki/Context_free_grammar I'd
be *really* surprised indeed!

> and if you were to somehow get a working context-sensitive grammar
> for a parser generator that is only expected to work for CFGs,
> what would be the problem,

We already have one; that's exactly what the official parser is.
The only difference is that you use %prec to modify precedence
during parsing, whereas the current official parser uses the error
production to do the same thing.

Such a thing cannot be formally reasoned about, and (much more
importantly) is hard to port to other parser generators. A pure CFG
should be trivially portable to (picking one at random here) say,
ANTLR, but that's not true with %prec, since every grammar generator
that has such a thing, if it does, will do it differently.

> other than being kept from showing that Lojban grammar is
> context-free, which seems to be nothing more than a personal goal
> of yours?

Well, that was the point of the contest, so it matters inasmuch as
you seemed to be asking John and me to judge this as an entry to the
contest. I'm afraid I have to say that it fails. John may
disagree, but I doubt it.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 3588

de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku zoi
skamyxatra.
> We asked for a general CFG, not something that only works in some
> specific grammar generator.
.skamyxatra

When did you say that? In your original e-mail
(<http://www.lojban.org/lists/lojban-beginners/msg06812.html>) you said, in
addition to the passage quoted earlier, "Produce a working CFG for Lojban, in
any format that some parser generator somewhere can accept...". I don't see
any restriction to a general CFG anywhere in that thread.

> Furthermore, and this is much more important, unless I'm really
> missing something you're *changing the precedence as the grammar is
> parsed*. That is, if a parse doesn't work without a particular
> terminator you drop the precedence of that terminator-free
> production to zero *at parse time*. If that's reducible to a
> 4-tuple a la http://en.wikipedia.org/wiki/Context_free_grammar I'd
> be *really* surprised indeed!

That's not how it works. If the parser must choose between reducing the
current construct and shifting the next terminal before reducing, it uses the
precedence rules to determine which action to take, and these precedence rules
are set in stone when Yacc parses its input. In the case of the Lojban
grammar, if a construct can be terminated and the terminator is the next input
symbol, the terminator is shifted onto the stack, and then the entire construct
is reduced. This takes place regardless of whether it leads to a valid parse
tree. See
<http://www.opengroup.org/onlinepubs/000095399/utilities/yacc.html#tag_04_174_13_07>
for more information.


I also feel it worth pointing out that since most implementations of Yacc
(including the ones I tested this with), when given the -v option, generate a
file describing every state of the parsing automaton for the given grammar in
all of its deterministic glory, any actions that the parser takes at runtime
must be foreseeable and have results that can be predetermined in a
context-free manner.

mu'omi'e la'o gy. Minimiscience .gy.

--
mi klama .i mi viska .i mi fanva fi la lojban.




posts: 14214

On Thu, Dec 25, 2008 at 10:04:05PM +0000, Minimiscience wrote:
> de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku
> zoi skamyxatra.
> > We asked for a general CFG, not something that only works in
> > some specific grammar generator.
> .skamyxatra
>
> When did you say that? In your original e-mail
> (<http://www.lojban.org/lists/lojban-beginners/msg06812.html>) you
> said, in addition to the passage quoted earlier, "Produce a
> working CFG for Lojban, in any format that some parser generator
> somewhere can accept...". I don't see any restriction to a
> general CFG anywhere in that thread.

What part of "working CFG" was unclear? What you produced isn't
one, as far as I can tell.

What you produced is exactly the same as the current official
parser: a yacc grammar that uses hard-to-port tricks to deal with
elidable terminators.

> > Furthermore, and this is much more important, unless I'm really
> > missing something you're *changing the precedence as the grammar
> > is parsed*. That is, if a parse doesn't work without a
> > particular terminator you drop the precedence of that
> > terminator-free production to zero *at parse time*. If that's
> > reducible to a 4-tuple a la
> > http://en.wikipedia.org/wiki/Context_free_grammar I'd be
> > *really* surprised indeed!
>
> That's not how it works. If the parser must choose between
> reducing the current construct and shifting the next terminal
> before reducing, it uses the precedence rules to determine which
> action to take, and these precedence rules are set in stone when
> Yacc parses its input. In the case of the Lojban grammar, if a
> construct can be terminated and the terminator is the next input
> symbol, the terminator is shifted onto the stack, and then the
> entire construct is reduced. This takes place regardless of
> whether it leads to a valid parse tree.

OK. That's not how I read the page; I read it that %prec isn't
processed until that point in the tree is reached.

Regardless, I can't verify any of this myself because I haven't been
able to run it, as I said in another mail. Some help there would be
nice.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 40

On Thu, Dec 25, 2008 at 10:41 PM, Minimiscience <minimiscience@gmail.com> wrote:

> I don't see how it wouldn't be a context-free grammar. A Yacc grammar without
> precedence rules is context-free (albeit possibly ambiguous), correct?

Let me try. The grammar is context-free. But when Yacc applies a
precedence rule, its choice is fixed and never revisited.
Therefore, some parse trees that the CFG can generate will not be
produced by Yacc, and the phrase in question will be rejected. In
such cases, the phrase is not valid Lojban under the current
definition, although it can be generated by the Yacc grammar
considered as a CFG. An example of such an invalid phrase:

nu le broda broda

(Courtesy of Robin; he is probably tired of repeating this again and
again, so it is my turn ;-)
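
The difference can be sketched with a toy recursive-descent parser in Python. This is only an analogy: it is neither LALR(1) nor the official grammar, and the four-rule grammar, the function names, and the `backtrack` flag are all invented for the illustration. With `backtrack=False`, each run of brivla is taken as long as possible and the choice is never revisited, which imitates always shifting.

```python
BRIVLA = {"broda", "brode", "brodi"}


def selbri(toks, i, backtrack):
    """selbri := BRIVLA+ ; yields (tree, next_index), longest run first."""
    j = i
    while j < len(toks) and toks[j] in BRIVLA:
        j += 1
    ends = list(range(j, i, -1))
    if not backtrack:
        ends = ends[:1]  # commit to the longest run, like always shifting
    for end in ends:
        yield ("selbri", toks[i:end]), end


def sumti(toks, i, backtrack):
    """sumti := 'le' selbri ['ku']"""
    if i < len(toks) and toks[i] == "le":
        for tree, j in selbri(toks, i + 1, backtrack):
            if j < len(toks) and toks[j] == "ku":
                yield ("sumti", tree, "ku"), j + 1
            else:
                yield ("sumti", tree, "KU (elided)"), j


def sentence(toks, i, backtrack):
    """sentence := [sumti] selbri"""
    if i < len(toks) and toks[i] == "le":
        for s, j in sumti(toks, i, backtrack):
            for p, k in selbri(toks, j, backtrack):
                yield ("sentence", s, p), k
    else:
        for p, k in selbri(toks, i, backtrack):
            yield ("sentence", p), k


def abstraction(toks, i, backtrack):
    """abstraction := 'nu' sentence ['kei']"""
    if i < len(toks) and toks[i] == "nu":
        for s, j in sentence(toks, i + 1, backtrack):
            if j < len(toks) and toks[j] == "kei":
                yield ("nu", s, "kei"), j + 1
            else:
                yield ("nu", s, "KEI (elided)"), j


def parse(text, backtrack):
    """Return the first parse covering the whole input, or None."""
    toks = text.split()
    for tree, end in abstraction(toks, 0, backtrack):
        if end == len(toks):
            return tree
    return None


print(parse("nu le broda broda", backtrack=False))                 # None
print(parse("nu le broda ku broda", backtrack=False) is not None)  # True
print(parse("nu le broda broda", backtrack=True) is not None)      # True
```

The greedy run rejects the phrase because the second "broda" is absorbed into the description and nothing is left for the main selbri; only the backtracking run finds the reading with an elided "ku".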

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



posts: 40

On Fri, Dec 26, 2008 at 3:22 AM, Cyril Slobin <cyril@slobin.pp.ru> wrote:

> An example of such an invalid phrase:
>
> nu le broda broda
>
> (Courtesy of Robin; he is probably tired of repeating this again and
> again, so it is my turn ;-)

To Robin: after a cup of tea, the problem seems to me worse than
before. I don't remember the exact CLL wording, but it is some
informal English prose like "terminators may be omitted unless this
leads to ambiguity", right? As a formalization of this informal
English prose, we have the official yacc-based parser with its
shift-over-reduce preference, right? But this formalization fails in
the example above! Correct me if I am wrong, but I believe there is
one and only one way to restore the omitted terminators there:

nu le broda KU broda KEI

Therefore, by definition, there is no ambiguity, and the phrase is
correct. But the official parser rejects it! So, I think, we should
either (1) decide that the parser is always right, and clarify the
English wording, or (2) make a better parser. Or am I missing something?

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



posts: 3588

de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku zoi
skamyxatra.
> What part of "working CFG" was unclear? What you produced isn't
> one, as far as I can tell.
>
> What you produced is exactly the same as the current official
> parser: a yacc grammar that uses hard-to-port tricks to deal with
> elidable terminators.
.skamyxatra

Good point. However, note that the entire point of the %prec trick is to force
Yacc to shift when it detects ambiguity in the rules. If another parser
generator can be told in some implementation-defined manner to always shift,
you can port the grammar over by stripping out the %precs, reformatting the BNF
as necessary, and adding whatever the other parser needs to know to shift. I'm
not familiar with any other parser generators, so I don't know how hard this
would be.
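
The first porting step mentioned above, stripping out the %precs, could look roughly like the following Python sketch. The function name and the regular expression are assumptions for illustration; any precedence declarations at the top of the grammar file would still need separate handling, and the replacement directive for the target parser generator is not shown.

```python
import re

# Matches a "%prec NAME" annotation together with the whitespace before it.
PREC_ANNOTATION = re.compile(r"\s*%prec\s+\w+")


def strip_prec(grammar_text: str) -> str:
    """Remove every %prec annotation from Yacc rule text."""
    return PREC_ANNOTATION.sub("", grammar_text)


rule = "MEX_310 : FUhA_441 rp_operand_332 %prec shiftPrec"
print(strip_prec(rule))  # MEX_310 : FUhA_441 rp_operand_332
```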

> Regardless, I can't verify any of this myself because I haven't been
> able to run it, as I said in another mail. Some help there would be
> nice.

I'm working on that. If you can tolerate a subpar lexer (i.e., one that
accepts invalid {brivla}) for the purposes of testing, I should be able to have
a program that shows the grouping of grammatical constructs with my grammar by
late tonight.

mu'omi'e la'o gy. Minimiscience .gy.

--
mi klama .i mi viska .i mi fanva fi la lojban.



posts: 3588

de'i li 26 pi'e 12 pi'e 2008 la'o fy. Cyril Slobin .fy. cusku zoi skamyxatra.
> To Robin: after a cup of tea, the problem seems to me worse than
> before. I don't remember the exact CLL wording, but it is some
> informal English prose like "terminators may be omitted unless this
> leads to ambiguity", right? As a formalization of this informal
> English prose, we have the official yacc-based parser with its
> shift-over-reduce preference, right? But this formalization fails in
> the example above! Correct me if I am wrong, but I believe there is
> one and only one way to restore the omitted terminators there:
>
> nu le broda KU broda KEI
>
> Therefore, by definition, there is no ambiguity, and the phrase is
> correct. But the official parser rejects it! So, I think, we should
> either (1) decide that the parser is always right, and clarify the
> English wording, or (2) make a better parser. Or am I missing something?
.skamyxatra

I interpret that rule as meaning to say "unless a change in meaning results"
and treat the grammar as always shifting. Forcing "{nu le broda broda}" to
contain an implicit "{ku}" in the middle leads to too many problems. Among
other things, if you want to answer a "{ma}" question with a {sumti} that
contains a {tanru}, you have to add a seemingly superfluous "{ku}" onto the end
in order to keep the {tanru} from being broken into a {sumti} descriptor and a
{selbri}. Moreover, where would the "{ku}" be placed in "{nu le broda brode
brodi}"? Designing the parser such that it rewrites constructs that don't work
would ... well, it probably wouldn't be LALR(1) unless you took the
brute-force, grillion-productions approach, and it would definitely lend weight
to the idea of an impossible Lojban CFG. If this unintuitive "implicit
terminator insertion in the middle of constructs" approach is how Lojban is
meant to work, I would really, really like to hear the rationale for it, among
other things.

mu'omi'e la'o gy. Minimiscience .gy.

--
do ganai ka'e tcidu dei gi djuno lo dukse



posts: 40

On Fri, Dec 26, 2008 at 4:21 AM, Minimiscience <minimiscience@gmail.com> wrote:

> I interpret that rule as meaning to say "unless a change in meaning results"

You cannot say "change in meaning" before you define the meaning of
a terminator-less phrase. If you say "this phrase is meaningless,
because inserting a terminator changes it from meaningless to
meaningful", your definition is circular. On the contrary, your example
{nu le broda brode brodi} IS ambiguous, because terminators can be
inserted in at least two ways.
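
A small Python sketch makes the count concrete. It considers only readings of the form "nu le X ku Y kei", where X and Y are non-empty runs of brivla; the function name is invented for the example, and other readings may exist outside this pattern.

```python
def terminator_insertions(brivla):
    """List every reading of 'nu le b1 ... bn' as 'nu le X ku Y kei'."""
    return [
        "nu le {} ku {} kei".format(" ".join(brivla[:k]), " ".join(brivla[k:]))
        for k in range(1, len(brivla))
    ]


for reading in terminator_insertions(["broda", "brode", "brodi"]):
    print(reading)
# nu le broda ku brode brodi kei
# nu le broda brode ku brodi kei
```

With only two brivla, as in {nu le broda broda}, the same function returns a single reading.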

I do NOT suggest changing the language; the current practice is good
enough. But I suggest finding a better way to explain it in plain
English. "Always shifting" is NOT plain English: LALR(1) is just one
of many parsing methods, and the language definition should not be
formulated in its terms.

Consider another shift-reduce conflict, the most famous one: the
"dangling else" case. Neither Wirth nor Kernighan and Ritchie said
"always reduce", they said "else is associated with closest previous
else-less if". What I suggest is to find and explicitly write a
similar rule for Lojban, or to create a really unambiguous grammar,
making such additional rules unnecessary. BTW, for dangling else the
latter is technically almost trivial, but the resulting grammar seems
unnatural to human readers.
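
For reference, the quoted association rule can be sketched as a tiny recursive-descent parser in Python. The token format (an "if", a one-word condition, then a statement) and the function name are invented for the illustration; the point is only that taking an "else" as soon as one is available attaches it to the closest else-less "if".

```python
def parse_stmt(toks, i=0):
    """stmt := 'if' cond stmt ['else' stmt] | atom ; returns (tree, next_index)."""
    if toks[i] == "if":
        cond = toks[i + 1]
        then, i = parse_stmt(toks, i + 2)
        if i < len(toks) and toks[i] == "else":
            # Take the else immediately, so the innermost open "if" gets it.
            other, i = parse_stmt(toks, i + 1)
            return ("if", cond, then, other), i
        return ("if", cond, then, None), i
    return toks[i], i + 1


tree, _ = parse_stmt("if c1 if c2 a else b".split())
print(tree)  # ('if', 'c1', ('if', 'c2', 'a', 'b'), None)
```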

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



posts: 40

Cyril (me) said:

> Neither Wirth nor Kernighan and Ritchie said "always reduce",

"Always shift", of course. Just a typo.

Minimiscience said:

> A Lojban parser must absorb as many tokens as possible into the current
> (innermost) grammatical construct, stopping only when it encounters either
> a terminator or a token that cannot continue the construct.

This is the current practice, and it is a good practice, but CLL
contains no such words. CLL talks about "unambiguity", and a reader
unaware of current practice will not necessarily read that as
"try to stretch the innermost construct as long as possible". In fact
the first does not imply the second. What we need is to approve this or
similar wording, and say it explicitly.

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



posts: 14214

On Fri, Dec 26, 2008 at 01:05:33AM +0000, Minimiscience wrote:
> > Regardless, I can't verify any of this myself because I haven't
> > been able to run it, as I said in another mail. Some help there
> > would be nice.
>
> I'm working on that. If you can tolerate a subpar lexer (i.e.,
> one that accepts invalid {brivla}) for the purposes of testing, I
> should be able to have a program that shows the grouping of
> grammatical constructs with my grammar by late tonight.

Why, in the name of unnamed things, are you not just using the
official parser code?

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 14214

On Fri, Dec 26, 2008 at 03:50:28AM +0300, Cyril Slobin wrote:
> On Fri, Dec 26, 2008 at 3:22 AM, Cyril Slobin <cyril@slobin.pp.ru>
> wrote:
>
> > An example of such an invalid phrase:
> >
> > nu le broda broda
> >
> > (Courtesy of Robin; he is probably tired of repeating this again
> > and again, so it is my turn ;-)
>
> To Robin: after a cup of tea, the problem seems to me worse than
> before. I don't remember the exact CLL wording, but it is some
> informal English prose like "terminators may be omitted unless
> this leads to ambiguity", right?

Something to that effect, specifically:

// encloses an elidable terminator, which may be omitted
(without change of meaning) if no grammatical ambiguity results.

> As a formalization of this informal English prose, we have the
> official yacc-based parser with its shift-over-reduce preference,
> right? But this formalization fails in the example above!

The official yacc-based parser uses magical error productions to
avoid being a real CFG.

> Correct me if I am wrong, but I believe there is one and only one
> way to restore the omitted terminators there:
>
> nu le broda KU broda KEI

Seems to me, yes. Note that "si" can also fix it in interesting
ways. :-)

> Therefore, by definition, there is no ambiguity, and the phrase is
> correct. But the official parser rejects it! So, I think, we
> should either (1) decide that the parser is always right, and clarify
> the English wording, or (2) make a better parser. Or am I missing
> something?

(1) is already true, actually; the CLL makes it clear that the
official parser wins.

The problem is that there *is* grammatical ambiguity: the sentence
*could* be an error; someone could have forgotten to say "cu brodi"
afterwards, or it's taken out of context or whatever, but
regardless, nu le broda broda KU KEI is perfectly valid, it's just
not a valid Lojban utterance by itself.

Saying that the parser will find *some* valid parse if it exists
strikes me as expanding Lojban rather a lot. I imagine that
figuring out that "nu le broda broda" means "nu le broda KU broda
KEI" in real time would *really* suck. I don't think a human could
keep up.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/


To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

posts: 3588

de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku zoi
skamyxatra.
> Why, in the name of unnamed things, are you not just using the
> official parser code?
.skamyxatra

Wait, what? There's more to the official parser than just the error-using Yacc
input? I don't think I've ever encountered any references to other official
source code. Oh, how I hate the tiki.

mu'omi'e la'o gy. Minimiscience .gy.

--
no zo mi nenri zo bende



posts: 14214

On Fri, Dec 26, 2008 at 03:35:38AM +0000, Minimiscience wrote:
> de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku
> zoi skamyxatra.
> > Why, in the name of unnamed things, are you not just using the
> > official parser code?
> .skamyxatra
>
> Wait, what? There's more to the official parser than just the
> error-using Yacc input? I don't think I've ever encountered any
> references to other official source code.

Dude, I sent you the link in my second mail on this thread!

> Oh, how I hate the tiki.

http://www.lojban.org/tiki/tiki-index.php?page=Official+Parser&bl=y

But the link I sent you was:

http://home.ccil.org/~cowan/parser-3.0.00.tar.gz

Yes, it is actual working code; I run it regularly.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 3588

de'i li 25 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku zoi
skamyxatra.
> Dude, I sent you the link in my second mail on this thread!
.skamyxatra

In my defense, you never actually said it was the *official* code.

I'm going to work on getting my patch to work with this. I am either going to
succeed in getting the parser to work or else die trying.

mu'omi'e la'o gy. Minimiscience .gy.

--
ko na xalni



posts: 40

On Fri, Dec 26, 2008 at 6:29 AM, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:

> The problem is that there *is* grammatical ambiguity: the sentence
> *could* be an error;

If one counts could-be errors as ambiguities, then any and every phrase
is ambiguous. And any sequence of Lojban words becomes valid if one just says
that someone forgot lo'u..le'u around it. ;-)

> nu le broda broda KU KEI is perfectly valid, it's just
> not a valid Lojban utterance by itself.

It is not; we need a bridi inside nu..KEI, not a sumti.

> Saying that the parser will find *some* valid parse if it exists
> strikes me as expanding Lojban rather a lot.

When I read the words "if no ambiguity occurs", I take them to mean
"if the parser finds two or more parses, then the phrase is wrong;
but if exactly one can be found, it's OK".
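
One way to make this reading concrete is a small brute-force sketch in Python.
It is only an illustration under stated assumptions: the grammar is a made-up
toy fragment (bare brivla, le ... [ku] and nu ... [kei]), not the official
grammar, and the names count_parses, bridi, sumti_list, selbri and unit are
hypothetical. It counts every derivation of a token sequence when each
terminator may be freely present or elided.

```python
BRIVLA = {"broda", "brode", "brodi"}  # toy vocabulary (assumption)


def unit(tokens, pos):
    """Yield an end position for each way to read one selbri unit at pos."""
    if pos >= len(tokens):
        return
    if tokens[pos] in BRIVLA:
        yield pos + 1
    elif tokens[pos] == "nu":
        for end in bridi(tokens, pos + 1):
            if end < len(tokens) and tokens[end] == "kei":
                yield end + 1  # explicit kei closes this nu
            yield end          # kei treated as elided


def selbri(tokens, pos):
    """Yield an end position for each reading of one or more units."""
    for end in unit(tokens, pos):
        yield end
        yield from selbri(tokens, end)


def sumti_list(tokens, pos):
    """Yield an end position for each reading of zero or more 'le' sumti."""
    yield pos
    if pos < len(tokens) and tokens[pos] == "le":
        for end in selbri(tokens, pos + 1):
            if end < len(tokens) and tokens[end] == "ku":
                yield from sumti_list(tokens, end + 1)  # explicit ku
            yield from sumti_list(tokens, end)          # ku treated as elided


def bridi(tokens, pos):
    """Yield an end position for each reading of: sumti* selbri."""
    for end in sumti_list(tokens, pos):
        yield from selbri(tokens, end)


def count_parses(text):
    """Count the derivations that consume the whole text."""
    tokens = text.lower().split()
    return sum(1 for end in bridi(tokens, 0) if end == len(tokens))
```

Under this toy grammar, count_parses("nu le broda broda") is 1, which matches
the "exactly one parse" reading, while count_parses("nu nu broda kei") is 2,
because the single kei could close either nu. The second case is the kind that
an innermost-first rule has to settle.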

> I imagine that figuring out that "nu le broda broda" means "nu le broda KU
> broda KEI" in real time would *really* suck. I don't think a human could
> keep up.

I agree. I am not suggesting that we change the rules; I am trying to say that
we need a better explanation of "the innermost the longest" rule. In fact we
need *some* explanation of this rule: it is common Lojban lore, but I haven't
seen it written explicitly in the CLL or elsewhere. And I believe that the
explanation should *not* refer to YACC, LALR, and other low-level tech details.
We must find a way to explain this in terms of the language itself.

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



posts: 40

On Fri, Dec 26, 2008 at 7:17 AM, Minimiscience <minimiscience@gmail.com> wrote:

> In my defense, you never actually said it was the *official* code.

In defense of Robin: if you are referring to the sentence "No idea
about it's official status although" on the wiki page, it is mine. ;-)
The story behind this: only the old versions of the parser were on the
page for a long time. Then somebody on this list (sorry, I don't
remember who) asked whether someone could add a link to the new code. I
happened to be the volunteer, and I appended this note
because I really have no idea. I am very far from LLG officials and
all such things.

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



posts: 14214

On Fri, Dec 26, 2008 at 07:18:43AM +0300, Cyril Slobin wrote:
> > I imagine that figuring out that "nu le broda broda" means "nu
> > le broda KU broda KEI" in real time would *really* suck. I
> > don't think a human could keep up.
>
> I agree. I am not suggesting changing the rules; I am trying to say
> that we need a better explanation of "the innermost the longest"
> rule. In fact we need *some* explanation of this rule: it is
> common Lojban lore, but I haven't seen it written explicitly in
> the CLL or elsewhere. And I believe that the explanation should *not*
> refer to YACC, LALR, and other low-level tech details. We must
> find a way to explain this in terms of the language itself.

IIRC, jcowan was able to explain this to me without resorting to
same, although I don't remember how. But yes, I agree.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 143

On Thu, Dec 25, 2008 at 18:50, Cyril Slobin <cyril@slobin.pp.ru> wrote:
> nu le broda KU broda KEI
>
> Therefore, by definition, there is no ambiguity, and the phrase is
> correct. But the official parser rejects it!

Hmm. The PEG rejects this as well (without the {ku}).

> So, I think, we should
> either (1) decide that parser is always right, and clarify English
> wording, or (2) make a better parser. Or have I missed something?

I agree--we should decide if this ought to be grammatical.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



On Fri, Dec 26, 2008 at 1:18 AM, Cyril Slobin <cyril@slobin.pp.ru> wrote:
>
> I am not suggesting changing the rules; I am trying to say that we need
> a better explanation of "the innermost the longest" rule. In fact we need
> *some* explanation of this rule: it is common Lojban lore, but I haven't
> seen it written explicitly in the CLL or elsewhere. And I believe that the
> explanation should *not* refer to YACC, LALR, and other low-level tech
> details. We must find a way to explain this in terms of the language itself.

I think the correct statement of the elidability rule is something like this:

"An elidable terminator terminates some construct. The terminator can
be elided if and only if the construct it terminates won't be extended
in its absence."

In the case of "nu le broda KU broda", KU is terminating the construct
"le broda KU". If we elide it, that construct will be extended to "le
broda broda KU", and so it cannot be elided. It doesn't matter where
the construct under consideration is embedded, or whether inserting
a terminator in the middle of a construct could make some
ungrammatical text grammatical. The only thing to consider is
the single construct that the terminator terminates and whether
anything that follows will be absorbed by that construct if the
terminator is absent.

mu'o mi'e xorxes
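
The rule as stated above can be sketched as a greedy parser with no
backtracking. This is a minimal illustration in Python, assuming a made-up toy
fragment (bare brivla, le ... [ku] and nu ... [kei]); it is not the official
grammar or any existing parser's code, and all names in it are hypothetical.
Each construct absorbs as much as it can, and a terminator only matters when
it actually appears in the input.

```python
BRIVLA = {"broda", "brode", "brodi"}  # toy vocabulary (assumption)


class ParseError(Exception):
    """Raised when the greedy parse cannot continue."""


def parse_selbri(tokens, pos):
    """Absorb as many units (brivla or nu-abstractions) as possible."""
    units = []
    while pos < len(tokens):
        if tokens[pos] in BRIVLA:
            units.append(tokens[pos])
            pos += 1
        elif tokens[pos] == "nu":
            inner, pos = parse_bridi(tokens, pos + 1)
            if pos < len(tokens) and tokens[pos] == "kei":
                pos += 1  # explicit kei closes the innermost open nu
            units.append(("nu", inner))
        else:
            break  # this token cannot extend the selbri
    if not units:
        raise ParseError(f"selbri expected at token {pos}")
    return ("selbri", units), pos


def parse_sumti(tokens, pos):
    """Parse 'le' selbri [ku]; the selbri is extended greedily."""
    selbri, pos = parse_selbri(tokens, pos + 1)
    if pos < len(tokens) and tokens[pos] == "ku":
        pos += 1
    return ("sumti", selbri), pos


def parse_bridi(tokens, pos):
    """Parse zero or more leading sumti followed by a selbri."""
    sumti = []
    while pos < len(tokens) and tokens[pos] == "le":
        node, pos = parse_sumti(tokens, pos)
        sumti.append(node)
    selbri, pos = parse_selbri(tokens, pos)
    return ("bridi", sumti, selbri), pos


def parse(text):
    """Parse a whole text greedily, or raise ParseError."""
    tokens = text.lower().split()
    tree, pos = parse_bridi(tokens, 0)
    if pos != len(tokens):
        raise ParseError(f"unexpected {tokens[pos]!r} at token {pos}")
    return tree
```

In this sketch parse("nu le broda ku broda") succeeds, while
parse("nu le broda broda") raises ParseError: the le construct absorbs both
broda, so the enclosing nu is left without a selbri, and the parser never goes
back to try a shorter sumti.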



posts: 143

On Fri, Dec 26, 2008 at 08:42, Jorge Llambías <jjllambias@gmail.com> wrote:
> I think the correct statement of the elidability rule is something like this:
>
> "An elidable terminator terminates some construct. The terminator can
> be elided if and only if the construct it terminates won't be extended
> in its absence."

This statement of the rule (just a clarification? or is it stronger?)
seems to make it even clearer that a pure CFG will not suffice to
handle elidable terminators. It would require one part of the parse
tree to know what's going on in a completely different (though
textually adjacent) part.

> In the case of "nu le broda KU broda", KU is terminating the construct
> "le broda KU". If we elide it, that construct will be extended to "le
> broda broda KU", and so it cannot be elided.

Robin, would a proof that a CFG could not correctly parse this one
specific example suffice for the lesser prize?

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)



posts: 40

On Fri, Dec 26, 2008 at 7:15 PM, Chris Capel <pdf23ds@gmail.com> wrote:

> Robin, would a proof that a CFG could not correctly parse this one
> specific example suffice for the lesser prize?

I believe that this one specific example *can* be correctly parsed by
a CFG. I'll try to write such a CFG for you tonight. The problem here is
that a correct CFG is much longer than the "traditional" one. And even if
a similar transformation is possible for the whole Lojban grammar,
the result will be far too large to be used in practice.

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'



On Fri, Dec 26, 2008 at 2:19 PM, Cyril Slobin <cyril@slobin.pp.ru> wrote:
>
> I believe that this one specific example *can* be correctly parsed by
> a CFG. I'll try to write such a CFG for you tonight. The problem here is
> that a correct CFG is much longer than the "traditional" one. And even if
> a similar transformation is possible for the whole Lojban grammar,
> the result will be far too large to be used in practice.

I agree. Indeed I don't see any reason for a complete CFG not to be
possible, but it will almost certainly require more than 2000 lines.
Orders of magnitude more, probably, as there are dozens of terminators
and handling each one will expand the grammar by some factor of, I
suspect, around 2.

mu'o mi'e xorxes



posts: 40

On Fri, Dec 26, 2008 at 9:00 PM, Jorge Llambías <jjllambias@gmail.com> wrote:

> Orders of magnitude more, probably, as there are dozens of terminators
> and handling each one will expand the grammar by some factor of, I
> suspect, around 2.

I have some small hope that the dependence here is not exponential,
but only quadratic or even, with good luck, n*log(n). No,
I do not claim this yet, but I'll try to investigate it during the New
Year holidays. In the worst case I will simply not succeed. ;-)
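
To put rough numbers on the difference between these growth rates, here is a
small Python calculation. It is an illustration only: the baseline of 2000
lines is the figure mentioned earlier in the thread, the terminator counts are
made-up values standing in for "dozens", and treating the multiplier as exactly
2**n, n**2 or n*log2(n) is an assumption about what each growth rate would
mean. The name estimated_lines is hypothetical.

```python
import math

BASE_LINES = 2000  # rough grammar size mentioned in the thread


def estimated_lines(n_terminators):
    """Return hypothetical grammar sizes under three growth assumptions."""
    n = n_terminators
    return {
        "exponential": BASE_LINES * 2 ** n,      # each terminator doubles the size
        "quadratic": BASE_LINES * n ** 2,
        "n log n": round(BASE_LINES * n * math.log2(n)),
    }


if __name__ == "__main__":
    for n in (12, 24, 36):  # assumed terminator counts, not measured values
        sizes = estimated_lines(n)
        print(n, {name: f"{lines:.2e}" for name, lines in sizes.items()})
```

For 24 terminators the exponential estimate is about 3.4e10 lines, against
roughly 1.2e6 for the quadratic one, which is why the shape of the dependence
matters far more than the exact baseline.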

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'

posts: 3588

I have good news and bad news.

The good news is that I got Cowan's parser to compile with my patch with a few
tweaks. The complete patch is at
<http://jwodder.freeshell.org/downloads/jbogeha2.diff>; just apply it to the
source at <http://home.ccil.org/~cowan/parser-3.0.00.tar.gz> and run `make`.
Note that you need to be using version 2.4 of Bison; version 2.3 (at least the
build that I have) doesn't accept assigning numeric values to %nonassoc tokens,
in violation of the Yacc specification.

The bad news is that the outputs from the respective programs differ for some
constructs, and my version even chokes on two specific "{li'u}"s in openwm.txt
for some reason that I can't yet work out. Regardless, I still feel that my
underlying theory is sound, and a handful of careful edits to the grammar file
should be all that's necessary to bring the programs into alignment. Now I
just need to work out what those edits are.

mu'omi'e la'o gy. Minimiscience .gy.

--
ko na xalni



posts: 14214

On Fri, Dec 26, 2008 at 10:51:52PM +0000, Minimiscience wrote:
> I have good news and bad news.
>
> The good news is that I got Cowan's parser to compile with my
> patch with a few tweaks. The complete patch is at
> <http://jwodder.freeshell.org/downloads/jbogeha2.diff>; just apply
> it to the source at
> <http://home.ccil.org/~cowan/parser-3.0.00.tar.gz> and run `make`.

Which file did you patch, exactly? "make" just runs "cc -o parser
*.c", which doesn't use grammar.300 at all, which is the only file
that the patch works on. You need to use mkgramy, doyacc, and
mknames to actually generate the .c files, and I couldn't get those
to work.

> The bad news is that the outputs from the respective programs
> differ for some constructs, and my version even chokes on two
> specific "{li'u}"s in openwm.txt for some reason that I can't yet
> work out. Regardless, I still feel that my underlying theory is
> sound, and a handful of careful edits to the grammar file should
> be all that's necessary to bring the programs into alignment. Now
> I just need to work out what those edits are.

OK. I'm still wondering what difference it makes to use %prec
rather than error productions.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/



posts: 3588

de'i li 26 pi'e 12 pi'e 2008 la'o fy. Robin Lee Powell .fy. cusku zoi
skamyxatra.
> Which file did you patch, exactly? "make" just runs "cc -o parser
> *.c", which doesn't use grammar.300 at all, which is the only file
> that the patch works on. You need to use mkgramy, doyacc, and
> mknames to actually generate the .c files, and I couldn't get those
> to work.
.skamyxatra

The patch affects both grammar.300 and mkgramy, and it also changes the
Makefile to depend on & use grammar.300 properly. I was unaware that mknames
might also have to be edited; however, given that running `./mknames` inside a
pristine (and supposedly compilation-ready) copy of the source code causes
rulename.i to be changed in dubious-looking ways, I'm rather hesitant to touch
it.

mu'omi'e la'o gy. Minimiscience .gy.

--
ko senpi lo du'u do bilga lonu senpi



posts: 3588

mulno .ui .o'u

I've fixed my patch for the grammar; the problem was simply that discursive
{bridi} (and presumably other free modifiers) weren't binding as tightly as
they should have. The output from the patched parser now matches that of the
original parser, and the final patch can be found at
<http://jwodder.freeshell.org/downloads/jbogeha3.diff>. Hopefully, the grammar
now works exactly as it's supposed to.

As to the question of what the difference between using `error' and using
`%prec' is, I'm not entirely sure what you want me to say. `%prec' uses
features as they were intended and is better (and less obfuscated) style?
Specifying how to resolve shift/reduce conflicts is more understandable &
portable than error recovery? Using `error' here is like passing void pointers
to structures that start with a type-identifying integer in an effort to
implement polymorphism when you could just code in C++? It "just works"? Both
ways get the job done, but using %prec is the way that it's supposed to get
done.

mu'omi'e la'o gy. Minimiscience .gy.

--
do ganai ka'e tcidu dei gi djuno lo dukse



posts: 14214

On Sun, Dec 28, 2008 at 01:27:25AM +0000, Minimiscience wrote:
> mulno .ui .o'u
>
> I've fixed my patch for the grammar; the problem was simply that
> discursive {bridi} (and presumably other free modifiers) weren't
> binding as tightly as they should have. The output from the
> patched parser now matches that of the original parser, and the
> final patch can be found at
> <http://jwodder.freeshell.org/downloads/jbogeha3.diff>.
> Hopefully, the grammar now works exactly as it's supposed to.

I haven't tested it thoroughly (I actually have a big script set for
this purpose), but so far it seems to be doing pretty well.

> As to the question of what the difference between using `error'
> and using `%prec' is, I'm not entirely sure what you want me to
> say. `%prec' uses features as they were intended and is better
> (and less obfuscated) style? Specifying how to resolve
> shift/reduce conflicts is more understandable & portable than
> error recovery? Using `error' here is like passing void pointers
> to structures that start with a type-identifying integer in an
> effort to implement polymorphism when you could just code in C++?
> It "just works"? Both ways get the job done, but using %prec is
> the way that it's supposed to get done.

.u'u sai

I'm so sorry; I phrased that poorly and I've been rude to you.

Yes, this solution is *substantially* better than the extant trick
that the official parser uses. I'm particularly curious to see what
code can be removed with this version (i.e. the error handling
code). For the record, looking at the changelog seems to indicate
that %prec may not actually have existed when the official parser
was made.

I'm just disappointed that this doesn't seem to put us any closer to
an actual CFG for Lojban (not that I've analyzed it properly; been
busy). I was way too excited when I saw your first post. Sorry if
I was an ass.

-Robin

--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" — http://shorl.com/tydruhedufogre
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/

