This article contains opinions that may not necessarily reflect the views of the LLWiki or larger loglanger community.

A language is said to have a self-segregating (occasionally self-segmenting) morphology if every utterance in the language can be broken up into words and morphemes, or parsed, in only one way. The written forms of natural and constructed languages are often self-segregating, but in most languages it is possible for a spoken phrase to have two or more possible parses, which creates ambiguity. For languages engineered to be unambiguous, it is necessary to define phonological patterns for words so that no two distinct sequences of words sound identical. All or most loglangs, depending on the definition used, have self-segregating morphologies.

Related terms

The terms monoparsing and audio-visual isomorphism are closely related to self-segregation, but both name stronger properties. Monoparsing is synonymous with syntactic unambiguity: the property whereby every well-formed text has a unique, transparent grammatical structure. Audio-visual isomorphism, or AVI, is the property whereby there is an exact one-to-one correspondence of informational content between the spoken and written forms of a language. Orthographic features like italic or bold typefaces disrupt audio-visual isomorphism, since there is no exact spoken analogue for them. Therefore, a language can have a self-segregating morphology but lack AVI.

The problem: Homophones in natural and constructed languages

Often, in natural languages, pairs of homophones exist as the result of sound changes or borrowing. These types of homophones, like dear and deer, are typically absent from engineered languages. However, in most constructed languages, compounding, derivational morphology or phrase formation can produce homophones. An ideal self-segregating morphology will prevent homophones on both the word level and the intra-word level.

In English, word-level homophony occurs in many phrases: attack and a tack are homophonous if pronounced normally, as are euthanasia and youth in Asia. There are in addition a vast number of phrases with subtle phonetic distinctions that can be hard to perceive, such as the sky [ðɪ̈ˈskaɪ̯] and this guy [ðɪs ˈg̊aɪ̯].
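The ambiguity described above can be made concrete by counting parses. The following is a minimal sketch, not a model of English phonology: the function name `parses`, the toy lexicon and the simplified phonemic spellings ("ə" for a, "tæk" for tack, "ətæk" for attack) are all assumptions made for illustration.

```python
def parses(sounds, lexicon):
    """Return every way to split `sounds` into a sequence of lexicon entries."""
    if not sounds:
        return [[]]  # exactly one way to parse nothing: the empty sequence
    results = []
    for word in lexicon:
        if sounds.startswith(word):
            # Try this word first, then parse whatever is left over.
            for rest in parses(sounds[len(word):], lexicon):
                results.append([word] + rest)
    return results


# Toy lexicon in a simplified phonemic spelling (assumed for illustration).
LEXICON = ["ə", "tæk", "ətæk"]

for parse in parses("ətæk", LEXICON):
    print(" + ".join(parse))
```

A morphology is self-segregating, in the sense used here, when such a function can never return more than one parse for any utterance.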

Morpheme-level homophony occurs frequently in agglutinative languages like Esperanto, in words like fireĝido. This word can be parsed as fi‐reĝido ‘a corrupt prince’ or fireĝ‐ido ‘offspring of a tyrant’.^[1]

Engineered languages like Lojban, Latejami and Toaq exemplify the means by which such accidental ambiguities can be prevented. These languages have formulas that describe every possible word. It is, at least in theory, provable that their formulas generate only words that self-segregate.

Self-segregation strategies

Self-segregation strategies (hereafter SS strategies) come in several varieties that can be treated separately, even though their borders are fuzzy and they are often mixed together in the design of a given language. In general, all SS strategies are analogous to types of codes in coding theory. For simplicity, this article will focus on self-segregation at the level of words.

The fixed-length strategy

If every word in a language is the same length, then the language is trivially self-segregating at the word level: the stream of speech can simply be cut at every multiple of that length. (See fixed-length codes.) The same is true at the morpheme level. In some logical languages, all affixes are the same length (at least all nonfinal affixes).
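A minimal sketch of fixed-length segmentation follows. The function name `split_fixed` and the sample stream are hypothetical, and the sketch assumes that each character stands for one unit of length.

```python
def split_fixed(stream, width):
    """Cut `stream` into consecutive words of exactly `width` units each."""
    if width <= 0:
        raise ValueError("word length must be positive")
    if len(stream) % width != 0:
        raise ValueError("stream length is not a multiple of the word length")
    return [stream[i:i + width] for i in range(0, len(stream), width)]


# "patokisu" is a made-up stream, not taken from any particular language.
print(split_fixed("patokisu", 2))  # ['pa', 'to', 'ki', 'su']
```

No lexicon is consulted at any point: the boundaries follow from the length alone, which is what makes the parse unique.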

Variable-length strategies

Strategies in which the length of words or morphemes can vary are more naturalistic, and hence much more common.

AB Strategies

These strategies are the most common of all. They are analogous to prefix codes. They work by defining at least two sets of parsing elements, A and B. The elements may be any type of phonological entity, including:

phonemes; obstruents vs. sonorants (e.g. Ceqli)

syllables; heavy vs. light (e.g. Tanbau)

tone-bearing sequences (e.g. Toaq)

A self-segregation formula or word-shape formula is defined in terms of A and B. The simplest good formula is A*B, that is, one B element, optionally preceded by any number of A elements. This may also be called the “right-breaking” formula or method. It is common for a good reason: pausing, or cutting off speech, in the middle of a word will never create a new word.
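The right-breaking formula A*B can be sketched as a segmenter that closes a word at every B element. This is an illustrative sketch only: the name `split_right_breaking` and the use of the bare labels "A" and "B" as elements are assumptions, and a real language would supply its own classifier for `is_b`.

```python
def split_right_breaking(elements, is_b):
    """Segment a sequence of elements into words of shape A*B.

    Every B element closes the current word, so no lookahead is needed.
    """
    words, current = [], []
    for element in elements:
        current.append(element)
        if is_b(element):
            words.append(current)
            current = []
    if current:
        # Only A elements are left over: the speaker stopped mid-word.
        raise ValueError("utterance ends in the middle of a word")
    return words


def is_b(element):
    return element == "B"


print(split_right_breaking(list("AABBAB"), is_b))
# [['A', 'A', 'B'], ['B'], ['A', 'B']]
```

Cutting a word short leaves only A elements, which the sketch rejects instead of reading them as a shorter word. For example, `split_right_breaking(list("AA"), is_b)` raises `ValueError`.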

A related strategy uses the “left-breaking” formula AB*: one A element, optionally followed by any number of B elements. This is inferior in one respect: pausing in the middle of a word can create another word. At the level of syntax, Lojban’s sentence-starter particle, .i, exemplifies the method. In Lojban morphology, too, this method plays a role in self-segregation. Content words (brivla) must have a consonant cluster within the first five segments.^[2] The presence of a cluster signifies the approximate left edge of a word.
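A comparable sketch for the left-breaking formula AB* opens a new word at every A element. The name `split_left_breaking` and the bare "A"/"B" labels are again assumptions for illustration. This is not an implementation of Lojban's actual morphology algorithm.

```python
def split_left_breaking(elements, is_a):
    """Segment a sequence of elements into words of shape AB*.

    Every A element opens a new word; B elements attach to the open word.
    """
    words = []
    for element in elements:
        if is_a(element):
            words.append([element])
        elif not words:
            raise ValueError("utterance must begin with an A element")
        else:
            words[-1].append(element)
    return words


def is_a(element):
    return element == "A"


print(split_left_breaking(list("ABBAAB"), is_a))
# [['A', 'B', 'B'], ['A'], ['A', 'B']]

# Truncating the word ABB after its second element still yields a valid word.
print(split_left_breaking(list("ABB")[:2], is_a))
# [['A', 'B']]
```

The second call shows the weakness: a word cut off midway is itself a well-formed word, so the listener cannot tell that anything is missing.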

A third common strategy is A+B+: one or more A, followed by one or more B. The minimal word is AB. A word ends after the last B before an A. Ceqli uses this strategy: A is an obstruent consonant such as /p/, /g/ or /z/; B is a sonorant consonant, such as /ŋ/, /l/, /r/ or /w/, or a vowel. Legal words include grin (ABBB), diyan (ABBBB) and starloremi (AABBBBBBBB).^[3]
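The A+B+ rule can be sketched as follows. The letter set `OBSTRUENTS` is a simplifying assumption made for this example and is not Ceqli's actual inventory or orthography; each letter is treated as one segment, and any letter outside the set counts as a B element. The function names are likewise hypothetical.

```python
import re

# Assumed obstruent letters, for illustration only; every other letter
# (sonorant consonant or vowel) is treated as a B element.
OBSTRUENTS = set("pbtdkgfvsz")


def shape(text):
    """Map each letter to its element class, A (obstruent) or B (other)."""
    return "".join("A" if ch in OBSTRUENTS else "B" for ch in text)


def split_words(stream):
    """Segment an unspaced stream into words of shape A+B+."""
    if not re.fullmatch(r"(A+B+)+", shape(stream)):
        raise ValueError("stream is not a sequence of A+B+ words")
    words, start = [], 0
    for i in range(1, len(stream)):
        # A boundary falls wherever an A element directly follows a B element.
        if stream[i] in OBSTRUENTS and stream[i - 1] not in OBSTRUENTS:
            words.append(stream[start:i])
            start = i
    words.append(stream[start:])
    return words


print(shape("starloremi"))                 # AABBBBBBBB
print(split_words("grindiyanstarloremi"))  # ['grin', 'diyan', 'starloremi']
```

The boundary test looks only at two adjacent elements, so the three example words are recovered from the unspaced stream without consulting a lexicon.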

There are many other word-shape formulas that produce self-segregating words. A variety of formulas are possible with just A and B elements. Many languages have more than two sets of elements as well. Latejami’s formula uses three sets. Latejami is largely a CV language, but, for the purposes of word-level self-segregation, only consonants matter. Certain consonants are A elements, an apparently arbitrary set: {b c d f j k q r t x z}^[4]. Others are B elements: {g m n p s v}. The basic pattern is A+B+, ignoring the vowels. The third element set has only a single phoneme, /l/, so it can be called L. L allows variations on the formula that are useful in the broader context of Latejami morphology.
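The consonant classes listed above can be turned into a small shape checker. This sketch covers only the basic A+B+ pattern: the variations involving L are not specified here, so any shape containing L is simply reported as not matching the basic pattern. The vowel set, the function names and the test strings are assumptions; the strings are made up and are not claimed to be Latejami words.

```python
import re

A_SET = set("bcdfjkqrtxz")  # A elements, as listed above
B_SET = set("gmnpsv")       # B elements, as listed above
VOWELS = set("aeiou")       # assumed vowel letters; ignored for segmentation


def consonant_shape(word):
    """Return the word's shape over A, B and L, skipping vowels."""
    classes = []
    for ch in word:
        if ch in A_SET:
            classes.append("A")
        elif ch in B_SET:
            classes.append("B")
        elif ch == "l":
            classes.append("L")
        elif ch not in VOWELS:
            raise ValueError(f"unexpected letter: {ch!r}")
    return "".join(classes)


def fits_basic_pattern(word):
    """True if the consonants alone follow the basic A+B+ pattern."""
    return re.fullmatch(r"A+B+", consonant_shape(word)) is not None


# Made-up strings, used only to exercise the checker.
print(consonant_shape("tasopi"), fits_basic_pattern("tasopi"))  # ABB True
print(consonant_shape("samoto"), fits_basic_pattern("samoto"))  # BBA False
```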

[incomplete]

Lexical exclusivity

References

[1] Rye, Justin B. 2021. “Learn Not to Speak Esperanto,” section 07. Accessed from http://jbr.me.uk/ranto/index.html#07 (19 June 2021).
[2] Cowan, John W. 2016. The Complete Lojban Language, section 4.3. Fairfax, VA: The Logical Language Group. Accessed from https://lojban.org/publications/cll/cll_v1.1_xhtml-section-chunks/section-morphology-brivla.html (19 June 2021).
[3] May, R. 2017. “The Alphabet and Sounds.” In Ceqli. Accessed from http://ceqli.pbworks.com/w/page/5455985/The%20Alphabet%20and%20Sounds (19 June 2021).
[4] Morneau, R. 2007. The Lexical Semantics of a Machine Translation Interlingua, section 2.5.1. On Rick Morneau’s homepage. Accessed from http://www.rickmor.x10.mx/lexical_semantics.html#S2_5_1 (19 June 2021).