This document was started as a clearing house for discussion related to the BPFK Magic Words checkpoint, and as a place safer than /tmp to store my ongoing description of the magic word interactions, which follows.
Magic words are all the cmavo that interact directly with the speech stream itself: SI, SA, SU, ZO, ZOI, LOhU, LEhU, ZEI, BU, BAhE, and FAhO.
The two documents that were used to construct this page are grammar.300 (which is a plain text document, and really should be renamed to have .txt after it) and RefGram Chapter 19, section 16.
It is worth noting that these two documents contradict each other on many points. In particular, the order of operations is substantially different (grammar.300 handles zoi before lo'u...le'u, for example).
(based on Chapter 19, section 16 of the RefGram)
In order of precedence. selma'o names are used throughout. "A+B" means a member of selma'o A followed immediately by a member of selma'o B (ignoring words taken out of the equation by previous steps). For example, "ZO+LEhU" means a string like "zo le'u", since both of these are currently (6 Nov 2004) single-member selma'o.
SI gets very complicated in a few cases, so here we go. Some SA notes here too.
Deleting bits as I integrate them into the list above.
Step 2 - Filtering
From start to end, performing the following filtering and lexing tasks
using the given order of precedence in case of conflict:
a. If the Lojban word "zoi" (selma'o ZOI) is identified, take the
following Lojban word (which should be end delimited with a pause for
separation from the following non-Lojban text) as an opening delimiter.
Treat all text following that delimiter, until that delimiter recurs
*after a pause*, as grammatically a single token (labelled 'anything_699'
in this grammar). There is no need for processing within this text
except as necessary to find the closing delimiter.
Please note that since this is the very first rule, grammar.300 allows ZOI quotes inside LOhU...LEhU, which the Red Book seems to disagree with.
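Rule a can be sketched in Python under some loud assumptions: the input is already split into words, every word boundary stands in for a pause, and `ZOI_WORDS` holds only the word "zoi" named in the rule. The function name `filter_zoi` and the `(kind, text)` pair representation are hypothetical and do not come from grammar.300.

```python
# Assumption: only the word named in rule a is treated as selma'o ZOI here.
ZOI_WORDS = {"zoi"}


def filter_zoi(words):
    """Collapse the text between ZOI delimiters into one 'anything_699' token.

    Returns a list of (kind, text) pairs, where kind is either "word" or
    "anything_699". Every word boundary is assumed to be a pause.
    """
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in ZOI_WORDS and i + 1 < len(words):
            delim = words[i + 1]
            try:
                # Find where the opening delimiter recurs.
                close = words.index(delim, i + 2)
            except ValueError:
                # No closing delimiter: pass the rest through unchanged.
                out.extend(("word", w) for w in words[i:])
                break
            out.append(("word", word))
            out.append(("word", delim))
            # The quoted text is not examined beyond finding the delimiter.
            out.append(("anything_699", " ".join(words[i + 2:close])))
            out.append(("word", delim))
            i = close + 1
        else:
            out.append(("word", word))
            i += 1
    return out


tokens = filter_zoi("mi cusku zoi gy hello si there gy do".split())
```

In this sketch the "si" inside the quote stays part of the single `anything_699` token, so a later erasure pass that only looks at `"word"` entries would not see it.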
e. If the word "si" (selma'o SI) is identified, erase it and the
previous word (or token, if the previous text has been condensed into a
single token by one of the above rules).
f. If the word "sa" (selma'o SA) is identified, erase it and all
preceding text as far back as necessary to make what follows attach to
what precedes. (This rule is hard to formalize and may receive further
definition later.)
g. If the word 'su' (selma'o SU) is identified, erase it and all
preceding text back to and including the first preceding token word
which is in one of the selma'o: NIhO, LU, TUhE, and TO. However, if
speaker identification is available, a SU shall only erase to the
beginning of a speaker's discourse, unless it occurs at the beginning of
a speaker's discourse. (Thus, if the speaker has said something, two
"su"'s are required to erase the entire conversation.)
Step 3 - Termination
If the text contains a FAhO, treat that as the end-of-text and ignore
everything that follows it.
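The erasure rules e and g of Step 2 and the termination rule of Step 3 can be sketched together in Python. This is a simplification and not the grammar.300 algorithm: it works on a flat list of words, ignores quoting (so none of the complicated SI cases are covered), skips SA entirely since rule f is not formalized, and ignores speaker identification for SU. The function names are hypothetical. Treating a SU with no preceding NIhO, LU, TUhE, or TO word as erasing back to the start of the text is also an assumption, as is dropping the "fa'o" word itself.

```python
# Words of the selma'o named in rule g: NIhO, LU, TUhE, and TO.
SU_STOPS = {"ni'o", "no'i", "lu", "tu'e", "to", "to'i"}


def apply_erasures(words):
    """Apply SI and SU erasure in a single left-to-right pass."""
    out = []
    for word in words:
        if word == "si":
            # Rule e: erase "si" and the previous word, if there is one.
            if out:
                out.pop()
        elif word == "su":
            # Rule g: erase back to and including the nearest stop word.
            # Assumption: with no stop word, everything so far is erased.
            while out:
                if out.pop() in SU_STOPS:
                    break
        else:
            out.append(word)
    return out


def truncate_at_faho(words):
    """Step 3: ignore everything from the first "fa'o" onward."""
    if "fa'o" in words:
        return words[:words.index("fa'o")]
    return list(words)


def filter_and_terminate(words):
    """Run the erasure pass first, then termination, following the step order."""
    return truncate_at_faho(apply_erasures(words))


result = filter_and_terminate("mi klama si citka fa'o do".split())
```

Because the output list is used as a stack, consecutive "si" words erase one earlier word each, which is the behaviour rule e describes for plain words.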
Step 4 - Absorption of Grammar-Free Tokens
In a new pass, perform the following absorptions (absorption means that
the token is removed from the grammar for processing in following steps,
and optionally reinserted, grouped with the absorbing token after
parsing is completed).
a. Token sequences of the form any - (ZEI - any) ..., where there may be
any number of ZEIs, are merged into a single token of selma'o BRIVLA.
b. Absorb all selma'o BAhE tokens into the following token. If
they occur at the end of text, leave them alone (they are errors).
c. Absorb all selma'o BU tokens into the previous token. Relabel the
previous token as selma'o BY.
d. If selma'o NAI occurs immediately following any of tokens UI or CAI,
absorb the NAI into the previous token.
e. Absorb all members of selma'o DAhO, FUhO, FUhE, UI, Y, and CAI
into the previous token. All of these null grammar tokens are permitted
following any word of the grammar, without interfering with that word's
grammatical function, or causing any effect on the grammatical
interpretation of any other token in the text. Indicators at the
beginning of text are explicitly handled by the grammar.
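The absorption rules of Step 4 can be sketched in Python as follows. The `Token` class, the function names, and the choice to record absorption by joining the word texts with spaces are all hypothetical; grammar.300 only says that absorbed tokens are removed from the grammar and optionally reinserted after parsing. Tokens are assumed to arrive already labelled with their selma'o.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    selmaho: str


INDICATORS = {"DAhO", "FUhO", "FUhE", "UI", "Y", "CAI"}


def absorb_back(tokens, should_absorb, relabel=None):
    """Merge each matching token into the token before it."""
    out = []
    for tok in tokens:
        if out and should_absorb(out[-1], tok):
            prev = out.pop()
            out.append(Token(f"{prev.text} {tok.text}", relabel or prev.selmaho))
        else:
            # Also reached at the beginning of text, where nothing precedes.
            out.append(tok)
    return out


def merge_zei(tokens):
    """Rule a: any - (ZEI - any) ... becomes a single BRIVLA token."""
    out = []
    i = 0
    while i < len(tokens):
        if out and tokens[i].selmaho == "ZEI" and i + 1 < len(tokens):
            left, right = out.pop(), tokens[i + 1]
            out.append(Token(f"{left.text} {tokens[i].text} {right.text}", "BRIVLA"))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def absorb_bahe(tokens):
    """Rule b: BAhE joins the following token; trailing BAhE is left alone."""
    out, pending = [], []
    for tok in tokens:
        if tok.selmaho == "BAhE":
            pending.append(tok)
            continue
        if pending:
            prefix = " ".join(p.text for p in pending)
            tok = Token(f"{prefix} {tok.text}", tok.selmaho)
            pending = []
        out.append(tok)
    return out + pending


def absorb(tokens):
    """Apply rules a through e in the order they are listed."""
    tokens = merge_zei(tokens)
    tokens = absorb_bahe(tokens)
    # Rule c: BU joins the previous token, which is relabelled BY.
    tokens = absorb_back(tokens, lambda p, t: t.selmaho == "BU", relabel="BY")
    # Rule d: NAI directly after UI or CAI joins that token.
    tokens = absorb_back(
        tokens, lambda p, t: t.selmaho == "NAI" and p.selmaho in {"UI", "CAI"}
    )
    # Rule e: indicators join the previous token without changing its selma'o.
    return absorb_back(tokens, lambda p, t: t.selmaho in INDICATORS)


example = absorb([
    Token("mi", "KOhA"), Token("ba'e", "BAhE"), Token("klama", "BRIVLA"),
    Token("ui", "UI"), Token("nai", "NAI"),
])
```

In the example, "ba'e" is absorbed forward into "klama", "nai" is absorbed into "ui", and the resulting indicator is then absorbed back into the "klama" token, which keeps its BRIVLA label.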