A challenge for computer science/programming geeks: The LLG wants to give you $500! Posted by pdf23ds on Tue 28 of Oct, 2008 22:51 GMT posts: 143 Use this thread to discuss the A challenge for computer science/programming geeks: The LLG wants to give you $500! page.
Posted by pdf23ds on Tue 28 of Oct, 2008 22:51 GMT posts: 143 On Tue, Oct 28, 2008 at 16:51, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > Your mission, should you choose to accept it, is to create a CFG for > Lojban. There are three options, with different monetary values > attached, in order of what we in the LLG board would prefer to get. How exactly is PEG not context free? Wikipedia's not helping me much here. AFAICT, PEGs are a subset of CFGs where each rule is unique and all operators are greedy and unambiguous. Chris Capel -- "What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?" -- The Mind's I (Hofstadter, Dennet) To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by pdf23ds on Tue 28 of Oct, 2008 23:07 GMT posts: 143 On Tue, Oct 28, 2008 at 16:51, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > For $100: Formally prove that encoding Lojban's elidable terminators > is not possible in a CFG. Hmm. Don't get too excited, there. What's the standard of formality, here? Computer proof level, or simply very explicit English? Chris Capel -- "What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?" -- The Mind's I (Hofstadter, Dennet) To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Wed 29 of Oct, 2008 00:39 GMT On Tue, Oct 28, 2008 at 6:51 PM, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > > For $500: Produce a working CFG for Lojban, in any format that some > parser generator somewhere can accept, *or* in BNF or a well defined > extension thereof. It must support every aspect of the language > except for ZOI (which is fully context sensitive). Elidable > terminators are the hard part. You don't specify that the CFG must be unambiguous, so ambiguous CFG's are acceptable? (We already know the language is unambiguous, but a grammar that generates it can still be ambiguous, allowing more than one way to generate the same valid string.) Terminals are selmaho, not phonemes, right? mu'o mi'e xorxes To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Wed 29 of Oct, 2008 01:24 GMT On Tue, Oct 28, 2008 at 9:59 PM, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > > If the grammar is ambiguous, I think you've failed to make a grammar for Lojban. A CFG is a set of rules that generates each and only the valid strings of the language. Unambiguity is desirable for a parsing grammar (which we already have, the PEG), but a generating grammar that generates all the Lojban strings is also a grammar for Lojban, even if it has more than one way to generate the same valid string. A machine that generated its output from such a grammar would generate valid Lojban. If you want an unambiguous CFG (which is obviously a somewhat harder problem) I think you need to specify it. mu'o mi'e xorxes To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Wed 29 of Oct, 2008 01:35 GMT On Tue, Oct 28, 2008 at 10:24 PM, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > > I don't believe, but am not certain, that that is possible. That > is, I believe that if there is more than one way to generate a > string, you've describe an ambiguous language, and that such a > description would not, in fact, describe Lojban. See example 3 in <http://en.wikipedia.org/wiki/Context-free_grammar> mu'o mi'e xorxes To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by slobin on Wed 29 of Oct, 2008 08:42 GMT posts: 40 On 10/29/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > > A machine that generated its output from such a grammar would > > generate valid Lojban. If you want an unambiguous CFG (which is > > obviously a somewhat harder problem) I think you need to specify > > it. > > Then please, everyone, consider it so specified. Apologies for > triple posting again, but if xorxes is right this is important. Correct me if I mislook something, but without this clarification the task seems to be trivial, isn't it? If we don't care about parsing tree, we can just make all elidable terminators optional and get the solution. Look at a very tiny subset of the language: <bridi> ::= broda | <bridi> <bridi> | nu <bridi> | nu <bridi> kei The string "nu broda broda" can be generated in two ways: nu broda) broda) and (nu (broda broda , but the grammar generates all good strings and only good strings. Therefore unambiguity is the central difficulty there. -- http://slobin.pp.ru/ `When I use a word,' Humpty Dumpty said, <cyril@slobin.pp.ru> `it means just what I choose it to mean' To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by kpreid on Wed 29 of Oct, 2008 10:32 GMT posts: 7 On Oct 28, 2008, at 21:32, Robin Lee Powell wrote: > On Tue, Oct 28, 2008 at 10:30:16PM -0300, Jorge Llambas wrote: >> On Tue, Oct 28, 2008 at 10:24 PM, Robin Lee Powell >> <rlpowell@digitalkingdom.org> wrote: >>> >>> I don't believe, but am not certain, that that is possible. >>> That is, I believe that if there is more than one way to >>> generate a string, you've describe an ambiguous language, and >>> that such a description would not, in fact, describe Lojban. >> >> See example 3 in >> <http://en.wikipedia.org/wiki/Context-free_grammar> > > You win. It seems to me that if you have such an ambiguous grammar, then it is, if not actually wrong, then misleading: the parse tree has degrees of freedom which are not actually part of the structure of the language. For example, the trivial grammar SELBRI -> BRIVLA | SELBRI SELBRI will generate only valid selbri, but non-left-leaning 'parse' trees may suggest tanru structure which isn't actually present (due to lack of ke...ke'e). (On the other hand, you can have an unambiguous grammar whose trees go the wrong direction; e.g. right-associative tanru.) (An unambiguous grammar for the same set of strings would be SELBRI -> BRIVLA | SELBRI BRIVLA An unambiguous grammar for the same set of parse trees would be SELBRI -> SBATOM | SELBRI SBATOM SBATOM -> BRIVLA | "ke" SELBRI "ke'e" ) -- Kevin Reid <http://homepage.mac.com/kpreid/> To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Wed 29 of Oct, 2008 12:44 GMT On Wed, Oct 29, 2008 at 7:27 AM, Kevin Reid <kpreid@mac.com> wrote: > > An unambiguous grammar for the same set of parse trees would be > > SELBRI -> SBATOM | SELBRI SBATOM > SBATOM -> BRIVLA | "ke" SELBRI "ke'e" And to make "ke'e" elidable: SELBRI -> SELBRI-closed | SELBRI-open SELBRI-closed -> SBATOM-closed | SELBRI-closed SBATOM-closed SELBRI-open -> SBATOM-open | SELBRI-closed SBATOM-open SBATOM-closed -> BRIVLA | "ke" SELBRI "ke'e" SBATOM-open -> "ke" SELBRI My first impresssion is that the mission truly is impossible: i.e. it is possible to handle the elidable terminators, but not with less than 2000 production rules, because handling each terminator will multiply the number of productions by some factor, and there are a lot of terminators. But that's just my first impression. mu'o mi'e xorxes To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by Anonymous on Wed 29 of Oct, 2008 18:41 GMT On Wed, Oct 29, 2008 at 3:19 PM, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > On Wed, Oct 29, 2008 at 09:41:57AM -0300, Jorge LlambÃas wrote: >> On Wed, Oct 29, 2008 at 7:27 AM, Kevin Reid <kpreid@mac.com> wrote: >> > >> > An unambiguous grammar for the same set of parse trees would be >> > >> > SELBRI -> SBATOM | SELBRI SBATOM >> > SBATOM -> BRIVLA | "ke" SELBRI "ke'e" >> >> And to make "ke'e" elidable: >> >> SELBRI -> SELBRI-closed | SELBRI-open >> SELBRI-closed -> SBATOM-closed | SELBRI-closed SBATOM-closed >> SELBRI-open -> SBATOM-open | SELBRI-closed SBATOM-open >> SBATOM-closed -> BRIVLA | "ke" SELBRI "ke'e" >> SBATOM-open -> "ke" SELBRI > > Again, this is too simple to actually illustrate the problem: there > is no sentence that Kevin's example generates that is illegal if > you remove a ke'e. True, but there are sentences that could be ambiguously parsed if ke'e is simply optional. "ke broda brode" would have two valid parses. What I gave is an unambiguous grammar that handles the elidability of "ke'e". Indeed this is too simple to show all the isssues with terminators, but it already shows how the number of rules required increases. (I later thought of a way of doing it with four rules rather than five, but still.) >> My first impresssion is that the mission truly is impossible: i.e. >> it is possible to handle the elidable terminators, but not with >> less than 2000 production rules, because handling each terminator >> will multiply the number of productions by some factor, and there >> are a lot of terminators. But that's just my first impression. > > I don't believe it's possible even with 100K productions; I would be > very interested to be proven wrong on that. But you didn't offer any money for that proof! ;) mu'o mi'e xorxes To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by sjp on Thu 30 of Oct, 2008 19:23 GMT posts: 4034 On 10/28/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > > Your mission, should you choose to accept it, is to create a CFG for > Lojban. There are three options, with different monetary values > attached, in order of what we in the LLG board would prefer to get. > For $100: Formally prove that encoding Lojban's elidable terminators > is not possible in a CFG. I've been thinking about the problem as well as I think the issue is the ambiguity. For instance I beleive that any peg grammar can be reduced into a most likely weaker cfg grammar that has potentialy multiple parse trees for the same valid statement in the grammar. If you get rid of all things that are of the form &pattern or !pattern and relax the greadyness contraint so that ? matches zero or one nongreadily, and * matches zero or more non greadily than the result of that transformation should be a cfg. So any proof that shows lojbo gerna can't be encoded with or without ambiguity in cfg is also a proof that a peg can't do it either. What a peg can do that a cfg can is provide one unique parse; if you weaken a peg into a cfg then if there was a peg parse the cfg will have a nonempty set of valid parses which includes the one that the peg found. That set is not guarentied to include one and only one element though. so if you take the first one it finds then it might not be the right one you wanted. http://en.wikipedia.org/wiki/Context-free_grammar Strickly speaking a cfg can't even handle the boi eildiable terminator correctly without ambiguity. BOI := bBoOiI | bBoOiI | bBoOiI COhE := cCoO'heE | cCoO'heE | cCoO'heE CY := bcdfgjspBCDFGJSPyY | bcdfgjspBCDFGJSPyY | bcdfgjspBCDFGJSPyY CYC := CY | CYC CY SUMTI := CYC | CYC BOI SUMTIS := SUMTI | SUMTIS SUMTI DONE := SUMTIS | SUMTIS COhE | COhE SUMTIS | SUMTIS COhE SUMTIS This is a very minimal cfg grammar that is ambiguouis for both and SUMTI if you parse "sypy co'e" with it then it can be parsed four different ways. SUMTI(sy) SUMTI(py ) COhE(co'e) SUMTI(sypy) COhE( co'e) SUMTI(sy) SUMTI(py) COhE( co'e) SUMTI(sypy ) COhE(co'e) whitespace is semanticly insignificant. adding one boi still doesn't matter. "sypy boi co'e bycy" produces a whole bunch of valid trees among them SUMTI(sy) SUMTI(py boi ) COhE(co'e) SUMTI(bycy) SUMTI(sypyboi) COhE( co'e) SUMTI(bycy) SUMTI(sy) SUMTI(py boi ) COhE(co'e) SUMTI(by) SUMTI(cy) SUMTI(sypyboi) COhE( co'e) SUMTI(by) SUMTI(cy) [[the widely used LR and LL parsers are more efficient algorithms that deal only with more restrictive subsets of context-free grammars.]] it's trivial to see that cfg isn't powerful enough to handle many of lojban's eildiable terminators without ambiguity, but maybe you really meant one of it's subsets like lalr(1) like what yacc uses. It is also interesting to me that if you change: "SUMTI := CYC | CYC BOI" in to "SUMTI := CYC BOI" then it's no longer semanticly ambigous but it requires you to use boi, which is the exact behavior jbofi'e seems to have. The rule to me for elidable terminators seems to me that the addition of the least amount of terminators at the most delayed spot is the correct answer. Assuming another version of the grammar that requires all terminators. {broda gi'e brode le brodi ku le brodo ku} is {broda gi'e brode le brodi ku le brodo ku vau vau} even though {broda gi'e brode le brodi vau ku le brodo ku vau} and {broda gi'e brode vau le brodi ku le brodo ku vau} have the same number of additions. To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by sjp on Thu 30 of Oct, 2008 20:01 GMT posts: 4034 On 10/30/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > On Thu, Oct 30, 2008 at 11:21:45AM -0800, Stephen Pollei wrote: > > On 10/28/08, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote: > > You're wrong, I'm afraid. > > http://en.wikipedia.org/wiki/Parsing_expression_grammar#Examples > > Look for "The following parsing expression grammar describes the > classic non-context-free language". S ↠&(A !b) a+ B !c A ↠a A? b B ↠b B? c would will my transformations become S ↠a+ B A ↠a A? b B ↠b B? c all since I specified that ?, *, and + would be non greedy than this cfg grammar would match whatever that peg grammar matches, only problem is that it would over match, and would possible have multiple matches. By over match I mean the validation would be much weaker, there might exist strings the cfg would match that the peg would not. By multiple matches I mean where the peg might have only one valid match, the cfg might have more than one. However as I stated before a peg grammar matched then if you weaken that peg into a cfg uses the methods I specified then that cfg will have a non-empty set of matches which includes the same match the peg had. for instance the weakened cfg would match "abcc" and "aaaabc" for S which the peg would reject. for "aaabbbccc" and others it looks like the parse tree would be similar, so it's bad example for ambiguity. However it is clearer the case that when you weaken a peg to a cfg that it can cause ambiquity issues. In any case for any peg, you can build a cfg that finds the same matchs that the peg found. Only problem is that it is likely to find other matches in addition. > > PEGs can encode things that CFGs simply cannot, ambiguously or > otherwise. This is why Lojban's status is an interesting open > question: we can do elidable terminators in PEG, but haven't been > able to in CFGs. and I mentioned that you can weaken a peg into a cfg that handles them ambiguously and likely over matches. Also I never claimed that cfg could be effciently used. PEGs can do extra validation and resolve ambiquity, both are wonderful things. [[The following recursive rule matches standard C-style if/then/else statements in such a way that the optional "else" clause always binds to the innermost "if", because of the implicit prioritization of the '/' operator. (In a context-free grammar, this construct yields the classic dangling else ambiguity.)]] I also don't say that the weaker cfg won't have ambiguity, in fact I claim they most likely will. I also claimed that cfg can't handle boi without issues. To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by sjp on Thu 30 of Oct, 2008 20:34 GMT posts: 4034 > > That's not even remotely the same language. For one thing, there's > no way to get to A, so it's actually: > > S => a+ B > B => b B? c > > which is just ambncn, which is a completely different language. the weakened cfg clearly parses a superset of the example peg, that is true and I'm a 100% possitive that if you took you and xorxes peg things and reduced it to a cfg that it would also parse a superset of what your peg parses and do it ambiguiously to boot. > You may already understand this, and that's fine, I just want there > to be no confusion: a language that is as close to Lojban as that is > to anbncn is of no use to us, and we won't pay out for it. Yes I never argued that the resultant cfg would be any good merely that it would parse all valid lojban that your peg already parses. Also I'd claim that the cfg would produce set of answers that would include your peg answer for any given valid lojban text which your peg parses. Also yes the resulting cfg would parse some text which is invalid lojban, especially at the word morphology level I see that it has some validation checks for things like invalid consonant clusters which the cfg version of it would lose. At the grammar level though I'm unsure if the extra invalid texts it would parse would: a) exist b) be noticeable . The one thing I'm certain of is that it would be ambigous not even handling {boi} correctly. The {boi} example is really what I thought was more interesting then the thought that you could weaken any peg. To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.
Posted by pdf23ds on Thu 30 of Oct, 2008 20:45 GMT posts: 143 On Thu, Oct 30, 2008 at 15:33, Stephen Pollei <stephen.pollei@gmail.com> wrote: > At the grammar level though I'm unsure > if the extra invalid texts it would parse would: a) exist Yes, definitely. There are some places where the greedy parsing is wrong, including one bug I myself fixed. > b) be > noticeable. Yes. I imagine the PEG-superset-CFG would give dozens or hundreds of parse trees for pretty much every sentence longer than a few words. Chris Capel -- "What is it like to be a bat? What is it like to bat a bee? What is it like to be a bee being batted? What is it like to be a batted bee?" -- The Mind's I (Hofstadter, Dennet) To unsubscribe from this list, send mail to lojban-list-request@lojban.org with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if you're really stuck, send mail to secretary@lojban.org for help.