A challenge for computer science/programming geeks: The LLG wants to give you $500!

Posted by pdf23ds on Tue 28 of Oct, 2008 22:51 GMT posts: 143

On Tue, Oct 28, 2008 at 16:51, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:
> Your mission, should you choose to accept it, is to create a CFG for
> Lojban. There are three options, with different monetary values
> attached, in order of what we in the LLG board would prefer to get.

How exactly is PEG not context free? Wikipedia's not helping me much
here. AFAICT, PEGs are a subset of CFGs where each rule is unique and
all operators are greedy and unambiguous.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by pdf23ds on Tue 28 of Oct, 2008 23:07 GMT posts: 143

On Tue, Oct 28, 2008 at 16:51, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:
> For $100: Formally prove that encoding Lojban's elidable terminators
> is not possible in a CFG.

Hmm. Don't get too excited, there. What's the standard of formality,
here? Computer proof level, or simply very explicit English?

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by Anonymous on Wed 29 of Oct, 2008 00:39 GMT

On Tue, Oct 28, 2008 at 6:51 PM, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:
>
> For $500: Produce a working CFG for Lojban, in any format that some
> parser generator somewhere can accept, *or* in BNF or a well defined
> extension thereof. It must support every aspect of the language
> except for ZOI (which is fully context sensitive). Elidable
> terminators are the hard part.

You don't specify that the CFG must be unambiguous, so ambiguous CFG's
are acceptable? (We already know the language is unambiguous, but a
grammar that generates it can still be ambiguous, allowing more than
one way to generate the same valid string.)

Terminals are selmaho, not phonemes, right?

mu'o mi'e xorxes

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by Anonymous on Wed 29 of Oct, 2008 01:24 GMT

On Tue, Oct 28, 2008 at 9:59 PM, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:
>
> If the grammar is ambiguous, I think you've failed to make a grammar for Lojban.

A CFG is a set of rules that generates each and only the valid strings
of the language. Unambiguity is desirable for a parsing grammar (which
we already have, the PEG), but a generating grammar that generates all
the Lojban strings is also a grammar for Lojban, even if it has more
than one way to generate the same valid string. A machine that
generated its output from such a grammar would generate valid Lojban.
If you want an unambiguous CFG (which is obviously a somewhat harder
problem) I think you need to specify it.

mu'o mi'e xorxes

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by Anonymous on Wed 29 of Oct, 2008 01:35 GMT

On Tue, Oct 28, 2008 at 10:24 PM, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:
>
> I don't believe, but am not certain, that that is possible. That
> is, I believe that if there is more than one way to generate a
> string, you've describe an ambiguous language, and that such a
> description would not, in fact, describe Lojban.

See example 3 in <http://en.wikipedia.org/wiki/Context-free_grammar>

mu'o mi'e xorxes

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by slobin on Wed 29 of Oct, 2008 08:42 GMT posts: 40

On 10/29/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

> > A machine that generated its output from such a grammar would
> > generate valid Lojban. If you want an unambiguous CFG (which is
> > obviously a somewhat harder problem) I think you need to specify
> > it.
>
> Then please, everyone, consider it so specified. Apologies for
> triple posting again, but if xorxes is right this is important.

Correct me if I mislook something, but without this clarification the
task seems to be trivial, isn't it? If we don't care about parsing
tree, we can just make all elidable terminators optional and get the
solution. Look at a very tiny subset of the language:

<bridi> ::= broda | <bridi> <bridi> | nu <bridi> | nu <bridi> kei

The string "nu broda broda" can be generated in two ways:

nu broda) broda) and (nu (broda broda

, but the grammar generates all good strings and only good strings.
Therefore unambiguity is the central difficulty there.

--
http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said,
<cyril@slobin.pp.ru> `it means just what I choose it to mean'

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by kpreid on Wed 29 of Oct, 2008 10:32 GMT posts: 7

On Oct 28, 2008, at 21:32, Robin Lee Powell wrote:

> On Tue, Oct 28, 2008 at 10:30:16PM -0300, Jorge Llambas wrote:
>> On Tue, Oct 28, 2008 at 10:24 PM, Robin Lee Powell
>> <rlpowell@digitalkingdom.org> wrote:
>>>
>>> I don't believe, but am not certain, that that is possible.
>>> That is, I believe that if there is more than one way to
>>> generate a string, you've describe an ambiguous language, and
>>> that such a description would not, in fact, describe Lojban.
>>
>> See example 3 in
>> <http://en.wikipedia.org/wiki/Context-free_grammar>
>
> You win.

It seems to me that if you have such an ambiguous grammar, then it is,
if not actually wrong, then misleading: the parse tree has degrees of
freedom which are not actually part of the structure of the language.

For example, the trivial grammar

SELBRI -> BRIVLA | SELBRI SELBRI

will generate only valid selbri, but non-left-leaning 'parse' trees
may suggest tanru structure which isn't actually present (due to lack
of ke...ke'e).

(On the other hand, you can have an unambiguous grammar whose trees go
the wrong direction; e.g. right-associative tanru.)

(An unambiguous grammar for the same set of strings would be

SELBRI -> BRIVLA | SELBRI BRIVLA

An unambiguous grammar for the same set of parse trees would be

SELBRI -> SBATOM | SELBRI SBATOM
SBATOM -> BRIVLA | "ke" SELBRI "ke'e"

)

--
Kevin Reid <http://homepage.mac.com/kpreid/>

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by Anonymous on Wed 29 of Oct, 2008 12:44 GMT

On Wed, Oct 29, 2008 at 7:27 AM, Kevin Reid <kpreid@mac.com> wrote:
>
> An unambiguous grammar for the same set of parse trees would be
>
> SELBRI -> SBATOM | SELBRI SBATOM
> SBATOM -> BRIVLA | "ke" SELBRI "ke'e"

And to make "ke'e" elidable:

SELBRI -> SELBRI-closed | SELBRI-open
SELBRI-closed -> SBATOM-closed | SELBRI-closed SBATOM-closed
SELBRI-open -> SBATOM-open | SELBRI-closed SBATOM-open
SBATOM-closed -> BRIVLA | "ke" SELBRI "ke'e"
SBATOM-open -> "ke" SELBRI

My first impresssion is that the mission truly is impossible: i.e. it
is possible to handle the elidable terminators, but not with less than
2000 production rules, because handling each terminator will multiply
the number of productions by some factor, and there are a lot of
terminators. But that's just my first impression.

mu'o mi'e xorxes

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by Anonymous on Wed 29 of Oct, 2008 18:41 GMT

On Wed, Oct 29, 2008 at 3:19 PM, Robin Lee Powell
<rlpowell@digitalkingdom.org> wrote:
> On Wed, Oct 29, 2008 at 09:41:57AM -0300, Jorge LlambÃas wrote:
>> On Wed, Oct 29, 2008 at 7:27 AM, Kevin Reid <kpreid@mac.com> wrote:
>> >
>> > An unambiguous grammar for the same set of parse trees would be
>> >
>> > SELBRI -> SBATOM | SELBRI SBATOM
>> > SBATOM -> BRIVLA | "ke" SELBRI "ke'e"
>>
>> And to make "ke'e" elidable:
>>
>> SELBRI -> SELBRI-closed | SELBRI-open
>> SELBRI-closed -> SBATOM-closed | SELBRI-closed SBATOM-closed
>> SELBRI-open -> SBATOM-open | SELBRI-closed SBATOM-open
>> SBATOM-closed -> BRIVLA | "ke" SELBRI "ke'e"
>> SBATOM-open -> "ke" SELBRI
>
> Again, this is too simple to actually illustrate the problem: there
> is no sentence that Kevin's example generates that is illegal if
> you remove a ke'e.

True, but there are sentences that could be ambiguously parsed if ke'e
is simply optional. "ke broda brode" would have two valid parses. What
I gave is an unambiguous grammar that handles the elidability of
"ke'e". Indeed this is too simple to show all the isssues with
terminators, but it already shows how the number of rules required
increases. (I later thought of a way of doing it with four rules
rather than five, but still.)

>> My first impresssion is that the mission truly is impossible: i.e.
>> it is possible to handle the elidable terminators, but not with
>> less than 2000 production rules, because handling each terminator
>> will multiply the number of productions by some factor, and there
>> are a lot of terminators. But that's just my first impression.
>
> I don't believe it's possible even with 100K productions; I would be
> very interested to be proven wrong on that.

But you didn't offer any money for that proof! ;)

mu'o mi'e xorxes

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by sjp on Thu 30 of Oct, 2008 19:23 GMT posts: 4034

On 10/28/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:
>
> Your mission, should you choose to accept it, is to create a CFG for
> Lojban. There are three options, with different monetary values
> attached, in order of what we in the LLG board would prefer to get.

> For $100: Formally prove that encoding Lojban's elidable terminators
> is not possible in a CFG.

I've been thinking about the problem as well as I think the issue is
the ambiguity. For instance
I beleive that any peg grammar can be reduced into a most likely
weaker cfg grammar that has potentialy multiple parse trees for the
same valid statement in the grammar. If you get rid of all things that
are of the form &pattern or !pattern and relax the greadyness
contraint so that ? matches zero or one nongreadily, and * matches
zero or more non greadily than the result of that transformation
should be a cfg. So any proof that shows lojbo gerna can't be encoded
with or without ambiguity in cfg is also a proof that a peg can't do
it either. What a peg can do that a cfg can is provide one unique
parse; if you weaken a peg into a cfg then if there was a peg parse
the cfg will have a nonempty set of valid parses which includes the
one that the peg found. That set is not guarentied to include one and
only one element though. so if you take the first one it finds then it
might not be the right one you wanted.

http://en.wikipedia.org/wiki/Context-free_grammar
Strickly speaking a cfg can't even handle the boi eildiable terminator
correctly without ambiguity.

This is a very minimal cfg grammar that is ambiguouis for both and SUMTI
if you parse "sypy co'e" with it then it can be parsed four different ways.
SUMTI(sy) SUMTI(py ) COhE(co'e)
SUMTI(sypy) COhE( co'e)
SUMTI(sy) SUMTI(py) COhE( co'e)
SUMTI(sypy ) COhE(co'e)

whitespace is semanticly insignificant. adding one boi still doesn't matter.
"sypy boi co'e bycy" produces a whole bunch of valid trees among them
SUMTI(sy) SUMTI(py boi ) COhE(co'e) SUMTI(bycy)
SUMTI(sypyboi) COhE( co'e) SUMTI(bycy)
SUMTI(sy) SUMTI(py boi ) COhE(co'e) SUMTI(by) SUMTI(cy)
SUMTI(sypyboi) COhE( co'e) SUMTI(by) SUMTI(cy)

[[the widely used LR and LL parsers are more efficient algorithms that
deal only with more restrictive subsets of context-free grammars.]]
it's trivial to see that cfg isn't powerful enough to handle many of
lojban's eildiable terminators without ambiguity, but maybe you really
meant one of it's subsets like lalr(1) like what yacc uses.

It is also interesting to me that if you change:
"SUMTI := CYC | CYC BOI" in to "SUMTI := CYC BOI"
then it's no longer semanticly ambigous but it requires you to use
boi, which is the exact behavior jbofi'e seems to have.

The rule to me for elidable terminators seems to me that the addition
of the least amount of terminators at the most delayed spot is the
correct answer. Assuming another version of the grammar that requires
all terminators.

{broda gi'e brode le brodi ku le brodo ku}
is {broda gi'e brode le brodi ku le brodo ku vau vau}
even though {broda gi'e brode le brodi vau ku le brodo ku vau} and
{broda gi'e brode vau le brodi ku le brodo ku vau} have the same
number of additions.

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by sjp on Thu 30 of Oct, 2008 20:01 GMT posts: 4034

On 10/30/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:
> On Thu, Oct 30, 2008 at 11:21:45AM -0800, Stephen Pollei wrote:
> > On 10/28/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

>
> You're wrong, I'm afraid.
>
> http://en.wikipedia.org/wiki/Parsing_expression_grammar#Examples
>
> Look for "The following parsing expression grammar describes the
> classic non-context-free language".
S â† &(A !b) a+ B !c
A â† a A? b
B â† b B? c
would will my transformations become
S â† a+ B
A â† a A? b
B â† b B? c
all since I specified that ?, *, and + would be non greedy than this
cfg grammar would match whatever that peg grammar matches, only
problem is that it would over match, and would possible have multiple
matches.
By over match I mean the validation would be much weaker, there might
exist strings the cfg would match that the peg would not.
By multiple matches I mean where the peg might have only one valid
match, the cfg might have more than one.
However as I stated before a peg grammar matched then if you weaken
that peg into a cfg uses the methods I specified then that cfg will
have a non-empty set of matches which includes the same match the peg
had.

for instance the weakened cfg would match "abcc" and "aaaabc" for S
which the peg would reject.
for "aaabbbccc" and others it looks like the parse tree would be
similar, so it's bad example for ambiguity. However it is clearer the
case that when you weaken a peg to a cfg that it can cause ambiquity
issues.

In any case for any peg, you can build a cfg that finds the same
matchs that the peg found.
Only problem is that it is likely to find other matches in addition.

>
> PEGs can encode things that CFGs simply cannot, ambiguously or
> otherwise. This is why Lojban's status is an interesting open
> question: we can do elidable terminators in PEG, but haven't been
> able to in CFGs.
and I mentioned that you can weaken a peg into a cfg that handles them
ambiguously and likely over matches. Also I never claimed that cfg
could be effciently used.
PEGs can do extra validation and resolve ambiquity, both are wonderful things.

[[The following recursive rule matches standard C-style if/then/else
statements in such a way that the optional "else" clause always binds
to the innermost "if", because of the implicit prioritization of the
'/' operator. (In a context-free grammar, this construct yields the
classic dangling else ambiguity.)]]

I also don't say that the weaker cfg won't have ambiguity, in fact I
claim they most likely will. I also claimed that cfg can't handle boi
without issues.

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by sjp on Thu 30 of Oct, 2008 20:34 GMT posts: 4034

>
> That's not even remotely the same language. For one thing, there's
> no way to get to A, so it's actually:
>
> S => a+ B
> B => b B? c
>

> which is just a

mb

nc

n, which is a completely different language.

the weakened cfg clearly parses a superset of the example peg, that is
true and I'm a 100% possitive that if you took you and xorxes peg
things and reduced it to a cfg that it would also parse a superset of
what your peg parses and do it ambiguiously to boot.

> You may already understand this, and that's fine, I just want there
> to be no confusion: a language that is as close to Lojban as that is

> to a

nb

nc

n is of no use to us, and we won't pay out for it.

Yes I never argued that the resultant cfg would be any good merely
that it would parse all valid lojban that your peg already parses.
Also I'd claim that the cfg would produce set of answers that would
include your peg answer for any given valid lojban text which your peg
parses.

Also yes the resulting cfg would parse some text which is invalid
lojban, especially at the word morphology level I see that it has some
validation checks for things like invalid consonant clusters which the
cfg version of it would lose. At the grammar level though I'm unsure
if the extra invalid texts it would parse would: a) exist b) be
noticeable . The one thing I'm certain of is that it would be ambigous
not even handling {boi} correctly.
The {boi} example is really what I thought was more interesting then
the thought that you could weaken any peg.

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Posted by pdf23ds on Thu 30 of Oct, 2008 20:45 GMT posts: 143

On Thu, Oct 30, 2008 at 15:33, Stephen Pollei <stephen.pollei@gmail.com> wrote:
> At the grammar level though I'm unsure
> if the extra invalid texts it would parse would: a) exist

Yes, definitely. There are some places where the greedy parsing is
wrong, including one bug I myself fixed.

> b) be
> noticeable.

Yes. I imagine the PEG-superset-CFG would give dozens or hundreds of
parse trees for pretty much every sentence longer than a few words.

Chris Capel
--
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennet)

To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

Link

Lojban In General

A challenge for computer science/programming geeks: The LLG wants to give you $500!

Search Lojban Resources

Lojban In General

Thread actions

Search Lojban Resources