Lojban In General


Parsing NIhO sections of text

coi rodo,

I'm trying to parse out sections of Lojban text delimited by sequences
of NIhO cmavo into their respective paragraphs, sections, chapters,
etc.

So, if I have:

ni'o ni'o
broda
ni'o
broda
ni'o ni'o
broda

I would like to get something like:

[[broda,broda],[broda]]

where the inner brackets represent paragraphs, the outer brackets
represent sections, and further containing brackets would designate
chapters, parts, volumes, etc.

I'm trying to use the DCG facilities of Prolog to do this. For
simplicity, I'm using "p" to represent a paragraph and "n" to
represent a cmavo from NIhO.

The CLL states that a text utilizing NIhO should start with a string
of NIhOs as long as any other NIhO string in the text. I managed to
create grammar rules to parse paragraph structure, AS LONG AS the
above condition is met. The following DCG clauses do this well:

parse(0,p) --> [p].
parse(_,[]) --> [].
parse(s(N),[H|T]) --> [n], parse(N,H), {H \= []}, parse(s(N),T).

They find the correct parse, and only the correct parse. (i.e.,
backtracking always terminates and never finds any more solutions.)

Here's an example of the parser in action:

| ?- phrase(parse(Depth,Parse),[n,n,p,n,p,n,n,p]).

Depth = s(s(0))
Parse = [[p,p],[p]] ?

This is the same structure as in the {broda} example above. (Note:
the Depth returned is in Peano form: 1 = s(0), 2 = s(s(0)), etc.)

The problem I'm having is that when the CLL condition is NOT
met... that is, when a longer string of NIhOs appears somewhere down
the line, the text will fail to parse. For example:

| ?- phrase(parse(Depth,Parse),[n,n,p,n,n,n,p,n,n,p]).

no

That "no" is Prolog's way of saying that the phrase
"n,n,p,n,n,n,p,n,n,p" doesn't satisfy the grammar defined for
"parse". That's because the text starts with NIhO NIhO (two NIhOs)
but has NIhO NIhO NIhO (three NIhOs) further along in it.

I've tried two different approaches, now, to infer how many NIhOs are
missing at the front of the text. One of these approaches worked
correctly, but it required the use of cuts (!) to prevent infinite
recursion. While that's fine for parsing, using cuts really is
cheating when it comes to writing grammar rules.
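
Counting the missing NIhOs does not itself seem to need cuts. What
follows is only a minimal sketch, assuming every paragraph is preceded
by at least one "n"; the names run_lengths//1, run//1, longest/2 and
missing_nihos/2 are hypothetical and are not taken from either of the
approaches mentioned above.

```prolog
% run_lengths(-Ks)//0: the text is a series of NIhO runs, each followed
% by one paragraph. Ks collects the run lengths, e.g. [2,3,2].
run_lengths([]) --> [].
run_lengths([K|Ks]) --> run(K), [p], run_lengths(Ks).

% run(-K)//0: K consecutive n tokens (K >= 1). No cut is needed: a run
% that stops too early fails at the following [p] and is retried.
run(1) --> [n].
run(K) --> [n], run(K0), { K is K0 + 1 }.

% longest(+Ks, -Max): largest element of a non-empty list of integers.
longest([K], K).
longest([K|Ks], Max) :-
    Ks = [_|_],
    longest(Ks, Max0),
    Max is max(K, Max0).

% missing_nihos(+Tokens, -Missing): how many n tokens would have to be
% put in front of Tokens for the first run to be as long as the longest.
missing_nihos(Tokens, Missing) :-
    phrase(run_lengths([First|Rest]), Tokens),
    longest([First|Rest], Longest),
    Missing is Longest - First.
```

With these definitions, missing_nihos([n,n,p,n,n,n,p,n,n,p], M) gives
M = 1, and the well-formed text [n,n,p,n,p,n,n,p] gives M = 0.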

Does anyone here know how I could use context-free grammar rules to
parse the different sections separated by NIhO sequences?

Any ideas (expressed in EBNF, Prolog, YACC, or whatever you speak)
would be much appreciated!

ki'e

mi'e brablonau


To unsubscribe from this list, send mail to lojban-list-request@lojban.org
with the subject unsubscribe, or go to http://www.lojban.org/lsg2/, or if
you're really stuck, send mail to secretary@lojban.org for help.

de'i li 06 pi'e 08 pi'e 2009 la'o fy. sunrise2000@comcast.net .fy. cusku zoi
skamyxatra.
> I'm trying to parse out sections of Lojban text delimited by sequences
> of NIhO cmavo into their respective paragraphs, sections, chapters,
> etc.
...
> Does anyone here know how I could use context-free grammar rules to
> parse the different sections separated by NIhO sequences?
>
> Any ideas (expressed in EBNF, Prolog, YACC, or whatever you speak)
> would be much appreciated!
.skamyxatra

If the length of a NIhO sequence exceeds the maximum depth of the parse
tree/list/structure, can't you just enclose the list in another list until the
depths match? E.g., when you have the list X=[[broda, broda], [broda]], and
you encounter four NIhOs in a row, let X=[[X]] (two levels of lists because
four minus the depth of X is two), and then append to X whatever comes after
that.
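
The wrapping idea can be sketched in Prolog roughly as follows. This is
only an illustration under stated assumptions: "n" and "p" are the
stand-ins from the original post, every paragraph is preceded by at
least one n, and all predicate names (structure/2, units//1, run//1,
build/4, wrap/5, wrap_once/2, add/4, chain/2) are made up for this
example. It uses append/3 from library(lists) as found in SWI-Prolog.

```prolog
:- use_module(library(lists)).   % append/3

% structure(+Tokens, -Tree): nest paragraphs by the length of the NIhO
% run in front of each one, deepening the tree whenever a longer run
% turns up later in the text.
structure(Tokens, Tree) :-
    phrase(units(Ks), Tokens),
    build(Ks, 0, [], Tree).

% units(-Ks)//0: run lengths, one per paragraph, e.g. [2,1,2].
units([]) --> [].
units([K|Ks]) --> run(K), [p], units(Ks).

run(1) --> [n].
run(K) --> [n], run(K0), { K is K0 + 1 }.

% build(+Ks, +Depth, +TreeSoFar, -Tree)
build([], _, Tree, Tree).
build([K|Ks], D0, T0, Tree) :-
    wrap(K, D0, T0, D1, T1),
    add(K, D1, T1, T2),
    build(Ks, D1, T2, Tree).

% wrap(+K, +D0, +T0, -D, -T): enclose T0 in lists until its depth is
% at least K. An empty tree stays empty; only its depth grows.
wrap(K, D, T, D, T) :- K =< D.
wrap(K, D0, T0, D, T) :-
    K > D0,
    D1 is D0 + 1,
    wrap_once(T0, T1),
    wrap(K, D1, T1, D, T).

wrap_once([], []).
wrap_once([X|Xs], [[X|Xs]]).

% add(+K, +D, +T0, -T): open a new level-K unit at the right edge of a
% tree of depth D (D >= K) and put one paragraph in it.
add(K, K, T0, T) :-
    K1 is K - 1,
    chain(K1, Unit),
    append(T0, [Unit], T).
add(K, D, T0, T) :-
    D > K,
    D1 is D - 1,
    append(Init, [Last], T0),
    add(K, D1, Last, Last1),
    append(Init, [Last1], T).

% chain(+N, -Unit): a paragraph wrapped in N lists.
chain(0, p).
chain(N, [U]) :- N > 0, N1 is N - 1, chain(N1, U).
```

With these definitions, structure([n,n,p,n,p,n,n,p], T) gives
T = [[p,p],[p]], and the text that starts with too few NIhOs,
structure([n,n,p,n,n,n,p,n,n,p], T), gives T = [[[p]],[[p],[p]]].
The guards K =< D, K > D0 and D > K keep the clauses mutually exclusive,
so no cuts are used.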

I don't think this can be handled by a CFG without encoding the lengths of the
NIhO strings into the productions, which would lead to an infinitely large
grammar. Note that the official Yacc and BNF grammars treat a sequence of
NIhOs as a single NIhO and leave the structuring of the text to whatever
semantic engine comes after.

mu'omi'e .kamymecraijun.

--
li'a .e'i ca vondei .i mi na'e pu'i kufra loi vondei

