Morphology: Algorithm Posted by pycyn on Sat 26 of Feb, 2005 02:29 GMT posts: 2388 Use this thread to discuss the Morphology: Algorithm page.
Posted by pycyn on Sat 26 of Feb, 2005 02:29 GMT posts: 2388

OK, so I haven't been following this as closely as I ought. To bring me up to speed, let me lay out what I understand to be the status quo ante for "pure" Lojban (no names, no fuhivla). There are then only cmavo, gismu and lujvo.

Using W to stand for any of V, diphthong, V'V, y:

cmavo are all
(C)W'W
where the current number of repetitions of the boxed syllable is 1, there are no cases of sibilant + iV nor syllabic consonant + uV, and y has some sort of restrictions. The restrictions on y can probably be overridden. The restriction on uV seems odd to reasonably competent speakers of French and Spanish and might be overridden. For some purposes {y bu} counts as a cmavo, but it need not for these phonological ones.

Gismu are all either CV*CCV or CCV*CV (V* being a stressed vowel). This, the oldest rule in the book, is not going to change. The most that can happen is that a new class of brivla is created whose members are unanalyzable within Lojban, like gismu, but are not derived in the gismu-building way from other languages.

Lujvo are built up from strings of (more or less) reduced forms of gismu and so are constructed as follows. Each begins with
CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
where
initial CC must be a permitted initial CC,
VV must be a diphthong permitted in that place (see cmavo list),
V is not y,
* is y if the preceding and following Cs form a permitted initial CC (and the whole is not CCC?) and V is unstressed. Otherwise * is void.
# is r if the next syllable does not begin with CC or begins with a permitted initial CC. It is n in these cases where the first C is r. Otherwise # is void.
^ is y if the preceding and following Cs form an impermissible CC. Otherwise ^ is void.

If the V (the second in the case of CV'V) in this is stressed (V+), the whole can be followed only by CCV/CVV.
Otherwise the whole can be followed by CCVCy/CVCCy/CVC^/CCV/CV(')V repeated any number of times with unstressed V.
Finally, there is any of the stressed initial + final chunks from above, or CV+CCV/CCV+CV/CV+'V.

This is a much later set of rules, but is also unlikely to change.

Now, given the original specification of brivla, how can this be liberalized? According to the original specs, a brivla
1. ends in a vowel
2. has penultimate stress
3. has a CC in its first 5 phones (not counting ' and y).
Anything more than this (and the specifics of Lojban phonotactics) is irrelevant for the separation of words in a string, and even for sorting them into categories. Introducing names and direct borrowings does not change this, assuming we keep some barriers around them (initial {la} and the like with final consonant and pause, {la'o} and {zoi}, or explicit markers like {iy} and {uy}).

So, once the beginning of a brivla is set, by the CC or y, what happens after that until the next stressed vowel is not relevant to the separation algorithm (except, as noted, to make sure everything that occurs is permissible in Lojban, a question we appear now to be discussing). Nor need setting the beginning (finding the first CC or y) be restricted as it is above: a string of up to three vowels, whether as a triphthong, a diphthong and another syllable joined by ', or three syllables joined by ', will work equally well, provided the first CC is not a permitted initial (and if the rightmost of the vowels is stressed, even that is allowed). At the end, any permitted combination of syllable-final and syllable-initial clusters can occur, including an interpolated complete y syllable (I think more than one would be pushing it a bit too far), ending with a vowel, diphthong or triphthong.

If this is essentially correct (and I am pretty sure I have not taken proper care of at least the r (n) hyphens when syllabic here), then the only question is what is permissible in Lojban, and even that does not affect the separation algorithm as such but rather the question of whether what has been separated out are Lojban words. And the only construction problem with naturalized borrowings is to be sure you don't accidentally produce, from some other source, a lujvo, which given lujvo's restricted form is just a matter of (computer-assisted; these things don't get created on the fly) checking.
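A minimal sketch of the textual part of the three-point brivla specification above, for readers who want to try it on examples. It assumes plain Lojban orthography with no stress marking, so point 2 (penultimate stress) is not checked at all; the function names are hypothetical and this is not the official algorithm.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")

def has_early_cluster(word: str) -> bool:
    """True if two consonants are adjacent within the first five phones,
    where ' and y are skipped (so CyC counts as a cluster)."""
    phones = [ch for ch in word.lower() if ch not in "'y"]
    head = phones[:5]
    return any(a in CONSONANTS and b in CONSONANTS
               for a, b in zip(head, head[1:]))

def looks_like_brivla(word: str) -> bool:
    """Checks points 1 and 3 only; stress (point 2) is not visible in text."""
    w = word.lower()
    return bool(w) and w[-1] in VOWELS and has_early_cluster(w)

if __name__ == "__main__":
    for w in ["broda", "tosmabru", "coi", "lojban"]:
        print(w, looks_like_brivla(w))
```

Note that this only classifies an already isolated word; it says nothing about whether every cluster in it is permissible, which the post treats as a separate question.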
Posted by xorxes on Sat 26 of Feb, 2005 02:30 GMT posts: 1912

> Using W to stand for any of V, diphthong, V'V, y
> cmavo are all
> (C)W'W
> where the current number of repetitions of the boxed syllable is 1 and there are no cases of
> sibilant + iV nor syllabic consonant + uV and y has some sort of restrictions.

In fact there are no cases of any consonant plus iV or uV, not just sibilants or syllabic consonants.

y only appears in Cy, y'y and y.

> Lujvo are built up from strings of (more or less) reduced forms of gismu and so are constructed as
> follows. Each begins with
> CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
>
> where initial CC must be permitted initial CC,
> VV must be a diphthong permitted in that place (see cmavo list)
> V is not y,
> * is y if the preceding and following Cs form a permitted initial CC (and the whole is not
> CCC?) and V is unstressed. Otherwise * is void.
> # is r if the next syllable does not begin with CC or begins with a permitted initial CC. It is
> n in these cases where the first C is r. Otherwise # is void.
> ^ is y if the preceding and following Cs form an impermissible CC. Otherwise ^ is void.

* and ^ look very similar.

> If the V (the second in the case of CV'V) in this is stressed (V+), the whole can be followed only
> by CCV/CVV.
> Otherwise the whole can be followed by CCVCy/CVCCy/CVC^/CCV/CV(')V repeated any number
> of times with unstressed V.
> Finally, there is any of the stressed initial + final chunks from above, or CV+CCV/CCV+CV/CV+'V

You also have to take into account the tosmabru test. CVC+CVCCV appears to be legitimate, but in fact sometimes it is not, because it could break as CV CCV+CCV. In order to avoid this breakage, you have to use CVCyCVCCV in that and analogous cases.

In the case of fu'ivla, in addition to the tosmabru test you also need to apply the slinku'i test: a fu'ivla can't consist of a consonant plus a string of rafsi, even if it fulfills the other criteria, because when a CV cmavo is in front of it, it will look just like a lujvo.

But other than that, yes, it's basically right.

mu'o mi'e xorxes
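The two tests described in this post can be sketched roughly as follows. This is a simplified illustration, not the official morphology: it assumes rafsi are recognised by shape alone, checks only that a word-piece starting with two consonants uses a permitted initial pair, and ignores r/n hyphens and impermissible medial pairs. The names `is_rafsi_string`, `falls_apart` and `fails_slinkuhi`, and the choice of {le} as the test cmavo, are hypothetical.

```python
import re

C, V = "[bcdfgjklmnprstvxz]", "[aeiou]"
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr "
    "zb zd zg zm zv".split()
)

# (pattern, may end the word, may be followed by more rafsi)
RAFSI = [
    (f"{C}{V}{C}y?", False, True),      # CVC, optionally with a y-hyphen
    (f"{C}{C}{V}", True, True),         # CCV
    (f"{C}(?:ai|au|ei|oi)", True, True),  # CVV
    (f"{C}{V}'{V}", True, True),        # CV'V
    (f"{C}{V}{C}{C}y", False, True),    # CVCCy
    (f"{C}{C}{V}{C}y", False, True),    # CCVCy
    (f"{C}{V}{C}{C}{V}", True, False),  # final gismu CVCCV
    (f"{C}{C}{V}{C}{V}", True, False),  # final gismu CCVCV
]

def _initial_ok(piece: str) -> bool:
    # A piece that starts with two consonants needs a permitted initial pair.
    return not re.match(C + C, piece) or piece[:2] in INITIAL_PAIRS

def is_rafsi_string(s: str, count: int = 0) -> bool:
    """True if s splits into at least two rafsi shapes (lujvo-like)."""
    for pattern, may_end, may_continue in RAFSI:
        m = re.match(pattern, s)
        if not m or not _initial_ok(m.group()):
            continue
        rest = s[m.end():]
        if not rest:
            if may_end and count >= 1:
                return True
        elif may_continue and is_rafsi_string(rest, count + 1):
            return True
    return False

def falls_apart(word: str) -> bool:
    """tosmabru-style check: a leading CV splits off and a lujvo remains."""
    return bool(re.match(C + V, word)) and is_rafsi_string(word[2:])

def fails_slinkuhi(word: str, cmavo: str = "le") -> bool:
    """slinku'i-style check: CV cmavo + word looks like a lujvo."""
    return is_rafsi_string(cmavo + word)
```

Under these assumptions `falls_apart("tosmabru")` holds while `falls_apart("tosymabru")` does not, which is the point of the CVCyCVCCV form mentioned above.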
Posted by pycyn on Sat 26 of Feb, 2005 02:31 GMT posts: 2388

> --- John E Clifford wrote:
> > Using W to stand for any of V, diphthong, V'V, y
> > cmavo are all
> > (C)W'W
> > where the current number of repetitions of the boxed syllable is 1 and there are no cases of
> > sibilant + iV nor syllabic consonant + uV and y has some sort of restrictions.
>
> In fact there are no cases of any consonant plus iV or uV,
> not just sibilants or syllabic consonants.

Yeah, I know it says that, but I thought that had changed since CLL. Well, add it in, along with a note that it is way too restrictive and should be removed before a lot more expansion into 'W space.

> y only appears in Cy, y'y and y.

Okay, so the restriction is that 'y is the only syllable to follow y, and that only once.

> > Lujvo are built up from strings of (more or less) reduced forms of gismu and so are constructed as
> > follows. Each begins with
> > CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
> >
> > where initial CC must be permitted initial CC,
> > VV must be a diphthong permitted in that place (see cmavo list)
> > V is not y,
> > * is y if the preceding and following Cs form a permitted initial CC (and the whole is not
> > CCC?) and V is unstressed. Otherwise * is void.
> > # is r if the next syllable does not begin with CC or begins with a permitted initial CC. It is
> > n in these cases where the first C is r. Otherwise # is void.
> > ^ is y if the preceding and following Cs form an impermissible CC. Otherwise ^ is void.
>
> * and ^ look very similar.
>
> > If the V (the second in the case of CV'V) in this is stressed (V+), the whole can be followed only
> > by CCV/CVV.
> > Otherwise the whole can be followed by CCVCy/CVCCy/CVC^/CCV/CV(')V repeated any number
> > of times with unstressed V.
> > Finally, there is any of the stressed initial + final chunks from above, or CV+CCV/CCV+CV/CV+'V
>
> You also have to take into account the tosmabru test.
> CVC+CVCCV appears to be legitimate, but in fact sometimes it is not because it could break as
> CV CCV+CCV.

I thought * took care of that; have I forgotten some further twist?

> In order to avoid this breakage, you have to use CVCyCVCCV in that and analogous cases.
>
> In the case of fu'ivla, in addition to the tosmabru test you also need to apply the slinku'i test:
> A fu'ivla can't consist of a consonant plus a string of rafsi, even if it fulfills the other criteria,
> because when a CV cmavo is in front of it, it will look just like a lujvo.

Well, I didn't deal with fuhivla just because it caused further problems like this, but that at least is an easy rule to write in (and an excuse to separate lujvo from other brivla just as gismu are).

> But other than that, yes it's basically right.

Glad to hear it. Maybe I can follow a bit better now.
Posted by xorxes on Sat 26 of Feb, 2005 02:31 GMT posts: 1912

> > y only appears in Cy, y'y and y.
>
> Okay, so the restriction is that 'y is the only syllable to follow y and that only once.

Hmmm, are we talking about permitted forms, or instantiated forms? Those are the only instantiated forms, but otherwise y is allowed in cmavo-forms like other single vowels.

> > > follows. Each begins with
> > > CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
> > > * is y if the preceding and following Cs form a permitted initial CC (and the whole is not
> > > CCC?) and V is unstressed. Otherwise * is void.
> > > ^ is y if the preceding and following Cs form an impermissible CC. Otherwise ^ is void.
> >
> > You also have to take into account the tosmabru test.
> > CVC+CVCCV appears to be legitimate, but in fact sometimes it is not because it could break as
> > CV CCV+CCV.
>
> I thought * took care of that; have I forgotten some further twist?

Sorry, I took * and ^ to be saying the same thing, but I see they aren't.

But * is not quite it. For example, CVCCV'V doesn't need a y, but rule * as stated would require it.

If the thing after CV is a string of rafsi, then you need the 'y'. I don't think you can escape examining the whole thing till the end.

mu'o mi'e xorxes
Posted by pycyn on Sat 26 of Feb, 2005 03:29 GMT posts: 2388

> --- John E Clifford wrote:
> > > y only appears in Cy, y'y and y.
> >
> > Okay, so the restriction is that 'y is the only syllable to follow y and that only once.
>
> Hmmm, are we talking about permitted forms, or instantiated forms?

Great, scratch the restriction, though I don't expect a lot of complex cmavo with y in the middle.

> Those are the only instantiated forms, but otherwise y is allowed
> in cmavo-forms like other single vowels.
>
> > > > follows. Each begins with
> > > > CVCCy/CCVCy/CCV/CVC*/CVC^/CV(')V(#)
> > > > * is y if the preceding and following Cs form a permitted initial CC (and the whole is not
> > > > CCC?) and V is unstressed. Otherwise * is void.
> > > > ^ is y if the preceding and following Cs form an impermissible CC. Otherwise ^ is void.
> > >
> > > You also have to take into account the tosmabru test.
> > > CVC+CVCCV appears to be legitimate, but in fact sometimes it is not because it could break as
> > > CV CCV+CCV.
> >
> > I thought * took care of that; have I forgotten some further twist?
>
> Sorry, I took * and ^ to be saying the same thing, but I see they aren't.
>
> But * is not quite it. For example CVCCV'V doesn't need a y,
> but rule * as stated would require it.

What is the status of {spa'i}, which used to be bruited about? The rule, though wrong when only lujvo are involved, looks to be right when further forms are permitted. It shouldn't need to look beyond determining the first CC and what goes before it, since from there on there are no uniqueness requirements until we get to the stressed vowel, and this doesn't play a role there.

But other rules may have to change if further brivla forms are allowed too. For example, I suppose the slinku'i test would require that dropping the first consonant did not give a brivla of any form (which shoots down initial CCC pretty fast), but since then it is only necessary to check details through the first CC (or CyC) and the bit after the stressed vowel, this looks pretty manageable. The rest is just not using illegal groupings, and even those go beyond simply slicing a correct speech stream correctly to also checking that every slice is a (potential) Lojban word, a logically distinct task (unless written into the original claim, which I am not sure was the case).

> If the thing after CV is a string of rafsi, then you need the 'y'.
> I don't think you can escape examining the whole thing till the end.
>
> mu'o mi'e xorxes
Posted by xorxes on Sat 26 of Feb, 2005 03:29 GMT posts: 1912

> > But * is not quite it. For example CVCCV'V doesn't need a y,
> > but rule * as stated would require it.
>
> What is the status of {spa'i}, which used to be bruited about?

It fails the slinku'i test. {le spa'i} could be pronounced as {lespa'i}, which is a lujvo.

> The rule, though wrong when only lujvo are involved, looks to be right when
> further forms are permitted.

lujvo have priority over fu'ivla, so the possibility of {lespa'i} blocks the possibility of {spa'i} as a word.

> It shouldn't need to look beyond determining the first CC and what
> goes before it, since from there on there are no uniqueness requirements until we get to the
> stressed vowel and this doesn't play a role there.

Maybe, but that's not how the system is set up.

> But other rules may have to change if further brivla forms are allowed too. For example, I
> suppose the slinku'i test would require that dropping the first consonant did not give a
> brivla of any form (which shoots down initial CCC pretty fast),

No, it just requires that it doesn't give a string of rafsi (possibly with a final gismu). If it gives another fu'ivla it's all right, because fu'ivla can't join with y-less rafsi to form compound forms.

> but since then it is only necessary to check details through the first CC (or CyC)
> and the bit after the stressed vowel, this looks pretty manageable.

I'm not sure I follow what your system would be, but the current system is strongly biased to favour lujvo over fu'ivla; that's why you need to do a full rafsi check in general. In some cases you find out soon enough that it's not a rafsi string: anything starting with CCVCrC- or CVCCrC-, for example, can never be a lujvo, hence the easy-to-make type III fu'ivla.

mu'o mi'e xorxes
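The early-exit case mentioned at the end of this post can be illustrated with a short pattern check. This is only a sketch: it takes the "CCVCrC-" and "CVCCrC-" heads literally (with the letter r as the hyphen), does not verify that the four-letter part is a real rafsi or that the clusters are permissible, and the name `has_type3_head` is hypothetical.

```python
import re

C, V = "[bcdfgjklmnprstvxz]", "[aeiou]"

# CCVC or CVCC, then the hyphen letter r, then another consonant.
TYPE3_HEAD = re.compile(f"(?:{C}{C}{V}{C}|{C}{V}{C}{C})r{C}")

def has_type3_head(word: str) -> bool:
    """True if the word starts with CCVCrC- or CVCCrC-, a shape that,
    per the post above, can never begin a lujvo."""
    return bool(TYPE3_HEAD.match(word.lower()))

if __name__ == "__main__":
    for w in ["cidjrspageti", "tosmabru", "lespa'i"]:
        print(w, has_type3_head(w))
```

A word with such a head needs no further rafsi check, whereas anything else still has to go through the full lujvo test described above.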
Posted by pycyn on Sat 26 of Feb, 2005 18:12 GMT posts: 2388

> > But * is not quite it. For example CVCCV'V doesn't need a y,
> > but rule * as stated would require it.
>
> What is the status of {spa'i}, which used to be bruited about?

Never mind. I see that this will fail something like slinku'i because CVCCV+'V clearly has preferred status. CVCCV+'V just has to go as a special case. Are there others?
Posted by xorxes on Sat 26 of Feb, 2005 18:12 GMT posts: 1912

> --- John E Clifford <clifford-j@sbcglobal.net>
>
> > What is the status of {spa'i}, which used to be bruited about?
>
> Never mind. I see that this will fail something like slinku'i because CVCCV+'V clearly has
> preferred status. CVCCV+'V just has to go as a special case. Are there others?

We are looking for lujvo that start with CVCCV and don't need a y-hyphen, right?

CVC-CVV-CVV
CVC-CVV-CVCCV
CVC-CVV-CVC-CVV
....
CVC-CVC-CVV
CVC-CVC-CVC-CVV
....
CVC-CVCCy-CVV
....

and lots, lots more.

mu'o mi'e xorxes
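One way to see why the forms listed here are safe is to work purely on C/V shapes: drop the leading CV and ask whether what remains could still be cut into rafsi. The sketch below does that under stated assumptions: it looks only at shapes (not at which consonant pairs are permitted), ignores r/n hyphens, and only addresses the tosmabru-style breakage, not y-hyphens required for other reasons. The names `parses_as_rafsi` and `safe_from_tosmabru` are hypothetical.

```python
NONFINAL = ["CVC", "CVCy", "CVV", "CV'V", "CCV", "CVCCy", "CCVCy"]
FINAL = ["CVV", "CV'V", "CCV", "CVCCV", "CCVCV"]

def parses_as_rafsi(shape: str, count: int = 0) -> bool:
    """True if the shape string splits into two or more rafsi shapes,
    the last of which may end a word."""
    if shape in FINAL and count >= 1:
        return True
    return any(
        shape.startswith(r) and parses_as_rafsi(shape[len(r):], count + 1)
        for r in NONFINAL
        if len(r) < len(shape)
    )

def safe_from_tosmabru(rafsi_shapes: list[str]) -> bool:
    """For a CVC-initial form: True if dropping the leading CV can never
    leave something that is itself shaped like a lujvo."""
    joined = "".join(rafsi_shapes)
    return not parses_as_rafsi(joined[2:])

if __name__ == "__main__":
    forms = [
        ["CVC", "CVV", "CVV"],
        ["CVC", "CVC", "CVV"],
        ["CVC", "CVCCy", "CVV"],
        ["CVC", "CVCCV"],  # the tosmabru shape itself
    ]
    for f in forms:
        print("-".join(f), safe_from_tosmabru(f))
```

The last form, CVC-CVCCV, is the one that can break as CV CCV+CCV, so it is reported as not safe, while the listed forms are.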
Posted by pycyn on Sat 26 of Feb, 2005 18:12 GMT posts: 2388

> --- John E Clifford wrote:
> > --- John E Clifford <clifford-j@sbcglobal.net>
> >
> > > What is the status of {spa'i}, which used to be bruited about?
> >
> > Never mind. I see that this will fail something like slinku'i because CVCCV+'V clearly has
> > preferred status. CVCCV+'V just has to go as a special case. Are there others?
>
> We are looking for lujvo that start with CVCCV
> and don't need a y-hyphen, right?
>
> CVC-CVV-CVV
> CVC-CVV-CVCCV
> CVC-CVV-CVC-CVV
> ...
> CVC-CVC-CVV
> CVC-CVC-CVC-CVV
> ...
> CVC-CVCCy-CVV
> ...
>
> and lots, lots more.

Thanks. Using these to guide me through the complexities (I won't for once say "obscurities" of CLL), I have come to realize what a mass of apparently needless complexity and restriction has arisen from the historical and uncritical development of this morphology. The two needed things were unique segmentation and distinct form classes: at least cmene, cmavo and brivla, but within the latter maybe also gismu, lujvo and fuhivla. Given the general characterizations of the form classes, a much simpler system would have been possible (starting perhaps at the moment of GMR, had anyone then done the deeper analysis now going on) with essentially the same result, though allowing some additional brivla. (I am not sure why we would want more brivla space, but I would have been willing to accept it for the simpler algorithm: "can" ain't "is" after all.)

The main thing blocking this simpler algorithm for the present system is the unhyphenated initial CVCCVs. There are probably other problems with it as well, but they have not yet emerged (partly, I'm sure, because I haven't formulated the scheme completely). Changing the rules to require all cases of CVCCV where the cluster is initial and the first vowel is unstressed to be hyphenated (CVCyCV) would eliminate the tosmabru problem for which it was originally designed, eliminates the slinku'i problem in at least a number of typical cases, and legitimates CCV'V brivla (for whatever that is worth) and apparently a large number of others. (I keep adding to the list as possibilities come to me; so far we have all the CV(')VCyC... and V')V((')VCyC... which loosens up the y rules as well. Indeed, I think that the difference between this algorithm and the present complex system comes from thinking just in terms of strings of phones rather than in terms of building up strings out of rafsi blocks. The latter process is a good one for construction of systematic expressions, lujvo and type III fuhivla, but restrictive for generating type IVs and, indeed, forms wanted that have no sources in other languages at all.)

As matters stand, the simpler algorithm is not quite useless, but needs to be used in conjunction with another that checks to see if the whole (or some part) is a lujvo (but this is apparently needed to deal with slinku'i problems anyhow; it is just needed more often here).
Posted by xorxes on Sat 26 of Feb, 2005 22:14 GMT posts: 1912

> Thanks. Using these to guide me through the complexities (I won't for once say "obscurities"
> of CLL), I have come to realize what a mass of apparently needless complexity and restriction
> has arisen from the historical and uncritical development of this morphology.

"Needless complexity" sounds like an appropriate description of Lojban morphology.

> Changing the rules to require all cases of CVCCV where the
> cluster is initial and the first vowel is unstressed to be hyphenated: CVCyCV would
> eliminate the tosmabru problem for which it was originally designed, eliminates the slinku'i
> problem in at least a number of typical cases

Not in all cases though. For example, {zbroda} would still fail slinku'i, because you would not require a y in {lez-broda}. What you would need to do to eliminate the slinku'i problem completely is require a y-hyphen after CVC whenever the C forms a valid initial *cluster* with what follows, not just a valid initial *pair*.

mu'o mi'e xorxes
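The stricter hyphenation rule proposed in this post might be sketched as below. Two assumptions are made and should be checked against the reference grammar: the table is taken to be the list of permitted initial consonant pairs, and a longer cluster is treated as a valid initial cluster when every adjacent pair in it is a permitted initial pair. The function names are hypothetical, and stress and following-y exceptions are not modelled.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr "
    "zb zd zg zm zv".split()
)

def is_initial_cluster(cluster: str) -> bool:
    """Assumed rule: every adjacent pair must be a permitted initial pair."""
    if len(cluster) < 2:
        return False
    return all(a + b in INITIAL_PAIRS for a, b in zip(cluster, cluster[1:]))

def needs_y_hyphen(cvc: str, rest: str) -> bool:
    """Proposed rule: hyphenate after a CVC piece whenever its final
    consonant plus the consonants opening `rest` form an initial cluster."""
    lead = ""
    for ch in rest:
        if ch not in CONSONANTS:
            break
        lead += ch
    return is_initial_cluster(cvc[-1] + lead)

if __name__ == "__main__":
    print(needs_y_hyphen("lez", "broda"))  # z + br: the {zbroda} case
    print(needs_y_hyphen("tos", "mabru"))  # s + m: the tosmabru case
    print(needs_y_hyphen("bal", "broda"))  # l + br: lb is not an initial pair
```

Checking the whole cluster rather than just the pair at the joint is what distinguishes this from the pair-only rule discussed earlier in the thread.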
Posted by pycyn on Sat 26 of Feb, 2005 22:14 GMT posts: 2388

> --- John E Clifford wrote:
> > Thanks. Using these to guide me through the complexities (I won't for once say "obscurities"
> > of CLL), I have come to realize what a mass of apparently needless complexity and restriction
> > has arisen from the historical and uncritical development of this morphology.
>
> "Needless complexity" sounds like an appropriate
> description of Lojban morphology.
>
> > Changing the rules to require all cases of CVCCV where the
> > cluster is initial and the first vowel is unstressed to be hyphenated: CVCyCV would
> > eliminate the tosmabru problem for which it was originally designed, eliminates the slinku'i
> > problem in at least a number of typical cases
>
> Not in all cases though. For example {zbroda} would still fail slinku'i, because you would not
> require a y in {lez-broda}. What you would need to do to eliminate the slinku'i problem completely is
> require a y-hyphen after CVC whenever the C forms a valid initial *cluster* with what follows, not
> just a valid initial *pair*.

Thanks again. I was already, from other considerations, making the rule
"in initial CVCC, the CC must be non-initial (either off the list of initials or hyphenated) except when the immediately preceding vowel is stressed or the CC is followed by y"
I think this can extend to the whole cmavoform+CC pattern.
Posted by xorxes on Sat 26 of Feb, 2005 22:14 GMT posts: 1912

> I was already from other considerations making the rule
> "in initial CVCC, the CC must be non-initial (either off the list of initials or hyphenated)
> except when the immediately preceding vowel is stressed or the CC is followed by y"

Sounds right.

> I think this can extend to the whole cmavoform+CC pattern.

For other cmavo forms the rule is already valid, no need to change anything. (Forms involving 'y' are special though, because 'y' is always a hyphen in brivla, so they are always a compound of at least two things.)

mu'o mi'e xorxes
Posted by pycyn on Sun 27 of Feb, 2005 00:46 GMT posts: 2388

> --- John E Clifford wrote:
> > I was already from other considerations making the rule
> > "in initial CVCC, the CC must be non-initial (either off the list of initials or hyphenated)
> > except when the immediately preceding vowel is stressed or the CC is followed by y"
>
> Sounds right.
>
> > I think this can extend to the whole cmavoform+CC pattern.
>
> For other cmavo forms the rule is already valid,
> no need to change anything. (Forms involving 'y'
> are special though, because 'y' is always a hyphen
> in brivla, so they are always a compound of at least
> two things.)

Well, of course this approach would not require that there be things glued by the y introduced by this rule (even though I called it a hyphen). I do think y should be kept out of the part before the first CC, however. As for the other cmavo initials, they are covered, so far as I can tell, only by the rule that says "If you are adding CV(')V and the immediately following is CV then add r (or n)", but I am not sure that applies to V')V((')V. And, of course, in this procedure we are not working in terms of adding something but simply in terms of strings, so "is a compound of" need not apply. (It might be a good idea to use something other than y for this connection just to make that point, but then it screws up the cases where it is a compound, not that that is a concern for this system.)