Self-segregating morphology

From the Logical Languages Wiki
Revision as of 10:52, 19 June 2021 by Selguha (talk | contribs) (began page, left incomplete)

A language is said to have a self-segregating (occasionally self-segmenting) morphology if every utterance in the language can be broken up into words and morphemes (that is, parsed) in only one way. The written forms of natural and constructed languages are often self-segregating, but in most languages a spoken phrase can have two or more possible parses, which creates ambiguity. For languages engineered to be unambiguous, it is necessary to define phonological patterns for words so that no two distinct sequences of words sound identical. All or most loglangs, depending on the definition used, have self-segregating morphologies.

Related terms

The terms monoparsing and audio-visual isomorphism are nearly synonymous with self-segregating morphology, but both denote stronger properties. Monoparsing is synonymous with syntactic unambiguity: the property whereby every well-formed text has a unique, transparent grammatical structure. Audio-visual isomorphism, or AVI, is the property whereby there is an exact one-to-one correspondence of informational content between the spoken and written forms of a language. Orthographic features like italic or bold typefaces disrupt audio-visual isomorphism, since there is no exact spoken analogue for them. Therefore, a language can have a self-segregating morphology but lack AVI.

The problem: Homophones in natural and constructed languages

Often, in natural languages, pairs of homophones exist as the result of sound changes or borrowing. These types of homophones, like dear and deer, are typically absent from engineered languages. However, in most constructed languages, compounding, derivational morphology or phrase formation can produce homophones. An ideal self-segregating morphology will prevent homophones on both the word level and the intra-word level.

In English, word-level homophony occurs in many phrases: attack and a tack are homophonous if pronounced normally, as are euthanasia and youth in Asia. There are in addition a vast number of phrases with subtle phonetic distinctions that can be hard to perceive, such as the sky [ðɪ̈ˈskaɪ̯] and this guy [ðɪs ˈg̊aɪ̯].
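The idea of an utterance having more than one parse can be made concrete with a small sketch. The function below is only an illustration, not a method from any particular language: it enumerates every way to split a string into words from a given lexicon. The lexicon and its rough phonemic spellings are hypothetical, chosen to mirror the attack / a tack example.

```python
def all_parses(utterance, lexicon):
    """Return every way to split `utterance` into words from `lexicon`."""
    if not utterance:
        return [[]]  # the empty string has exactly one parse: no words
    parses = []
    for word in lexicon:
        if utterance.startswith(word):
            # Parse the remainder and prepend the word just matched.
            for rest in all_parses(utterance[len(word):], lexicon):
                parses.append([word] + rest)
    return parses


# Hypothetical toy lexicon in a rough phonemic spelling (an assumption).
lexicon = ["ə", "tæk", "ətæk"]
parses = all_parses("ətæk", lexicon)
print(len(parses))  # more than one parse means the utterance is ambiguous
```

A morphology is self-segregating, in the sense used here, when a function like this can never return more than one parse for any utterance.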

Morpheme-level homophony occurs frequently in agglutinative languages like Esperanto, in words like fireĝido. This word can be parsed as fi‐reĝido ‘a corrupt prince’ or fireĝ‐ido ‘offspring of a tyrant’.[1]

Engineered languages like Lojban, Latejami and Toaq exemplify the means by which such accidental ambiguities can be prevented. These languages have formulas that describe every possible word. It is, at least in theory, provable that their formulas generate only words that self-segregate.

Self-segregation strategies

Self-segregation strategies (hereafter SS strategies) come in several varieties that can be treated separately, even though their borders are fuzzy and they are often mixed together in the design of a given language. In general, all SS strategies are analogous to types of codes in coding theory. For simplicity, this article will focus on self-segregation at the level of words.

The fixed-length strategy

If every word in a language is the same length, then the language is trivially self-segregating at the word level. (See fixed-length codes.) The same is true at the morpheme level. In some logical languages, all affixes (or at least all nonfinal affixes) are the same length.
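As a minimal sketch of why the fixed-length case is trivial, the function below cuts a stream of symbols into words of a fixed width. The sample stream and the width of two symbols are hypothetical and not taken from any real language.

```python
def segment_fixed(stream, width):
    """Split `stream` into consecutive words of exactly `width` symbols."""
    if len(stream) % width != 0:
        raise ValueError("stream length is not a multiple of the word length")
    # Word boundaries fall at every multiple of `width`, so the split is unique.
    return [stream[i:i + width] for i in range(0, len(stream), width)]


# Hypothetical stream of five two-symbol words.
words = segment_fixed("kalimotupa", 2)
print(words)
```

No lookup in a lexicon is needed: the boundaries depend only on position.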

Variable-length strategies

Strategies in which the length of words or morphemes can vary are more naturalistic, and hence much more common.

AB strategies

These strategies are the most common of all. They are analogous to prefix codes. They work by defining at least two sets of parsing elements, A and B. The elements may be any type of phonological entity, including:

  • consonants: obstruents vs. sonorants (e.g. Ceqli)
  • syllables: heavy vs. light (e.g. Tanbau)
  • tone-bearing sequences (e.g. Toaq)

A self-segregation formula or word-shape formula is defined in terms of A and B. The simplest good formula is A*B, that is, one B element, optionally preceded by any number of A elements. This may also be called the “right-breaking” formula or method. It is common for a good reason: pausing, or cutting off speech, in the middle of a word will never create a new word.
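The right-breaking formula can be sketched as a simple segmenter. This is an illustrative sketch only: it assumes the stream has already been divided into elements, and the element names and the membership of the B set are hypothetical.

```python
def segment_right_breaking(elements, b_set):
    """Split a sequence of elements into words of shape A*B.

    Any element not in `b_set` is treated as an A element. Every B element
    closes the current word, so there is only one possible split.
    """
    words, current = [], []
    for element in elements:
        current.append(element)
        if element in b_set:
            words.append(current)
            current = []
    if current:
        raise ValueError("stream ends inside a word (no final B element)")
    return words


B = {"tu", "pa"}  # hypothetical B elements; everything else counts as A
stream = ["ka", "li", "tu", "pa", "mo", "pa"]
words = segment_right_breaking(stream, B)
print(words)
```

The segmenter never needs to look ahead or backtrack, which is one way to see that words of this shape self-segregate.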

A related strategy uses the “left-breaking” formula AB*: one A element, optionally followed by any number of B elements. This is inferior in one respect, because pausing in the middle of a word can create another word. The left-breaking method does not appear to be attested in loglang morphologies, but Lojban’s sentence-starter particle, .i, exemplifies the method.
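The difference between the two formulas under interruption can be checked with a short sketch. The code below is an illustration under stated assumptions: every element belongs to either A or B, and the element names are hypothetical. It segments a left-breaking stream, then cuts one word of each shape off after its second element and tests whether the fragment is itself a well-formed word.

```python
def segment_left_breaking(elements, a_set):
    """Split a sequence of elements into words of shape AB*."""
    words = []
    for element in elements:
        if element in a_set:
            words.append([element])    # an A element opens a new word
        elif words:
            words[-1].append(element)  # a B element extends the open word
        else:
            raise ValueError("stream starts with a B element")
    return words


def is_right_breaking_word(elements, a_set):
    """True if `elements` has the shape A*B (every non-A element counts as B)."""
    return (len(elements) >= 1
            and all(e in a_set for e in elements[:-1])
            and elements[-1] not in a_set)


def is_left_breaking_word(elements, a_set):
    """True if `elements` has the shape AB* (every non-A element counts as B)."""
    return (len(elements) >= 1
            and elements[0] in a_set
            and all(e not in a_set for e in elements[1:]))


A = {"ka", "li"}  # hypothetical A elements; "tu" and "pa" act as B elements

left_words = segment_left_breaking(["ka", "tu", "pa", "li", "ka", "tu"], A)

right_word = ["ka", "li", "tu"]  # shape AAB
left_word = ["ka", "tu", "pa"]   # shape ABB

# Cut each word off after its second element.
right_cut_is_word = is_right_breaking_word(right_word[:2], A)
left_cut_is_word = is_left_breaking_word(left_word[:2], A)
print(right_cut_is_word, left_cut_is_word)
```

The truncated right-breaking word lacks its closing B element and so is not a word, while the truncated left-breaking word is still a valid AB* word.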

There are many other word-shape formulas that produce self-segregating words. A variety of formulas are possible with just A and B elements. Many languages have other sets of elements as well; Latejami’s formula, for example, uses a third set of elements.

[incomplete]

References

  1. Rye, Justin B. 2021. “Learn Not to Speak Esperanto,” Section 07. Accessed from http://jbr.me.uk/ranto/index.html#07 (19 June 2021).