From the Logical Languages Wiki

See also my Essays page, Comparison of orthographies page, and Lido phonotactics page. More recent stuff is shown first.

Epenthesis, and other notes

Epenthesis is the insertion of a consonant or vowel segment to repair a sequence that violates the phonotactics of a language. Lojban primarily has vocalic epenthesis; so does TALM. Lojban has two types of vocalic epenthesis (not counting final vowels appended to loanwords, etc.). TALM has vocalic epenthesis in two forms, and one other type of specialized, nonproductive epenthesis. In Lojban, a schwa must be inserted at certain syllable junctures in compound words, and optionally, a non-schwa, non-/a e i o u/ “buffer vowel” can be inserted between consonants.

In TALM, the schwa and the buffer vowel are combined, and made nondistinctive. Nonetheless, schwa insertion is a feature of standard pronunciation. It is necessary to prevent certain consonant pairs from coalescing or becoming indistinguishable from other pairs (or, in a few cases, three-consonant sequences). These pairs include the following:

- Two identical consonants (e.g., /ss/, /bb/, /nn/);

- Two obstruent consonants of different voicing (e.g., /pd/, /kv/, /zt/, potentially homophonous with /bd/~/pt/, /gv/~/kf/, /st/~/zd/);

- Two sibilants or affricates of different place (e.g., /sʃ/, /ʃs/, /t͡ʃs/, /zd͡ʒ/, potentially homophonous with /ʃʃ/~/ʃ/, /ss/~/s/, /ts/, /d͡ʒ/);

- Two sounds that mimic a single contour phoneme (/tʃ/, /t͡ʃʃ/, /tt͡ʃ/, /dd͡ʒ/, potentially homophonous with /t͡ʃ/, /t͡ʃ/, /t͡ʃ/, /d͡ʒ/);

- Two sounds that have a tendency to be pronounced with an intervening stop sound, becoming homophonous with another legal sequence of phonemes (/ml/, /mr/, /nr/, /lr/, /lʃ/, /nʃ/, potentially homophonous with /mbl/, /mbr/, /ndr/, /ldr/, /lt͡ʃ/, /nt͡ʃ/).

It may be difficult to remember to insert a schwa in just these cases. Erring on the side of caution is recommended: when in doubt, epenthesize. Two other options are available as standard pronunciation variants: (1) insert schwa at every coda-onset pair; or (2) insert schwa between any two consonants that are not one of the pairs allowed in root words (see Table [TBD]). It is of course possible, but not recommended, to rely entirely on consonant length and careful pronunciation to distinguish the above pairs.
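The five cases above can be sketched as a single predicate. This is only an illustration, not a reference implementation: it writes phonemes with TALM letters (c = /t͡ʃ/, j = /d͡ʒ/, x = /ʃ/, h = /x/), it leaves the glottal stop out of the obstruent sets (an assumption), and the function name `needs_schwa` is made up for the example.

```python
# TALM letters stand in for phonemes: c = /t͡ʃ/, j = /d͡ʒ/, x = /ʃ/, h = /x/.
VOICELESS = set("ptkfsxhc")
VOICED = set("bdgvzj")
ALVEOLAR_SIBILANTS = set("sz")
POSTALVEOLAR_SIBILANTS = set("xcj")  # sibilants and affricates
# Pairs that mimic a single contour phoneme.
CONTOUR_LIKE = {("t", "x"), ("c", "x"), ("t", "c"), ("d", "j")}
# Pairs that tend to be pronounced with an intrusive stop.
STOP_INTRUSION = {("m", "l"), ("m", "r"), ("n", "r"),
                  ("l", "r"), ("l", "x"), ("n", "x")}


def needs_schwa(c1: str, c2: str) -> bool:
    """Return True if a schwa should separate coda c1 from onset c2."""
    if c1 == c2:  # identical consonants
        return True
    if (c1 in VOICELESS and c2 in VOICED) or (c1 in VOICED and c2 in VOICELESS):
        return True  # obstruents of different voicing
    if (c1 in ALVEOLAR_SIBILANTS and c2 in POSTALVEOLAR_SIBILANTS) or \
       (c1 in POSTALVEOLAR_SIBILANTS and c2 in ALVEOLAR_SIBILANTS):
        return True  # sibilants or affricates of different place
    return (c1, c2) in CONTOUR_LIKE or (c1, c2) in STOP_INTRUSION
```

The last two rules are order-sensitive, as in the list: /ml/ is flagged, but /lm/ is not.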

The schwa does not count as a syllable nucleus, and may in fact be regarded as a phonetic feature of the preceding consonant. That is to say, it can be seen merely as the release of a coda consonant. It may even be voiceless.

There is another kind of epenthesis in TALM, which occurs when a foreign consonant cluster would exceed the native syllable structure. Such a phonotactic violation requires insertion of a phonemic vowel, and, as such, an extra syllable.

The maximal syllable in TALM is CCVVC. Suppose that a loanword ends in two consonants, e.g. flax /flaks/. This word must be given a final vowel. The solution is to echo the nearest vowel to the left, which here is /a/, giving flaksa. The same rule is applied to clusters of three or more consonants within a word that cannot be resolved into a single coda consonant and an onset pair. Here, too, the inserted vowel is an echo vowel rather than the schwa, copied from the nearest vowel on the left: thus, /ˈfredriksburg/ → /fredriksiˈburgu/; /ˈganksta/ → /gankasˈta/. Where epenthesis would cause stress to fall before the penultimate syllable, stress shifts to the ultimate syllable. Where there is no vowel to the left to copy, the vowel to the right is copied instead: /hrˈvatski/ → /hraˈvatski/.

Words with no vowels require an arbitrary vowel to be inserted where syllable nuclei must occur. The suggested vowel is /e/. This vowel is apt because it is a common epenthetic vowel in natural languages. Also, it parallels the use of /e/ in acronyms. Acronyms in TALM (discussed later) are truncated initialisms; they are based on a string of letters. Since consonant letters are named be, ce, de, etc., acronyms normally have /e/ in nucleus positions.

In summary, there are three rules for inserting phonemic vowels to repair the syllable structure of nonnative words.

1. If there is a vowel to the left, copy that vowel.

2. Else, if there is no vowel to the left, copy the vowel to the right.

3. Else, if there is no vowel in an adjacent syllable, insert /e/.

Nonnative words often come from some natural language. A posteriori epenthetic vowels may be used instead of the preceding system if justified by orthography, etymology, or other factors. For loans from natural languages, the rules above are only guidelines.

Finally, there is one other case of epenthesis built into TALM that does not need to be consciously learned. The projector particles i and u function like epenthetic vowels. They join with the following word, repairing illegal onsets. So, /i zbignjef/ is pronounced /i‿z.ˈbignjef/, and /u mxedruli/ pronounced /u‿m.xedruˈli/.

Orthography/phonology tables

Transcription of Hanyu Pinyin initials
Pinyin Phonetic value (IPA) Transcription
b [p] b
p [pʰ] p
m [m] m
f [f] f
d [t] d
t [tʰ] t
n [n] n
l [l] l
g [k] g
k [kʰ] k
h [x~h] h
j [tɕ] j (before i or y), jy
q [tɕʰ] c (before i or y), cy
x [ɕ] x (before i or y), xy
zh [ʈʂ] j
ch [ʈʂʰ] c
sh [ʂ] x
r [ɻ~ʐ] r
z [ts] dz (word-initially or after a vowel), z
c [tsʰ] ts (word-initially or after a vowel), s
s [s] s
w [w] w
y [j] y
Transcription of Hanyu Pinyin finals
Pinyin Phonetic value (IPA) TALM transcription TALM phoneme (IPA)
-i (after apical sibilants) [ɹ̩~z̩], [ɻ̩~ʐ̩] i /i/
a [a] a /a/
e [ɤ] e /e/
ai [ai̯] ai /ai̯/
ei [ei̯] ei /ei̯/
ao [au̯] au /au̯/
ou [ou̯] ou /ou̯/
an [an] an /an/
en [ən] en /en/
ang [aŋ] an /an/
eng [əŋ] en /en/
ong [ʊŋ] on /on/
er [aɚ̯] ar /ar/
-i, yi [i], [(j)i] i, yi /i/, /ji/
-ia, ya [ja] ya /ja/
-ie, ye [je] ye /je/
-iao, yao [jau̯] yau /jau̯/
-iu, you [jou̯] you /jou̯/
-ian, yan [jɛn] yen /jen/
-in, yin [in], [(j)in] -in, yin /in/, /jin/
-iang, yang [jaŋ] yan /jan/
-ing, ying [iŋ], [(j)iŋ] -in, yin /in/, /jin/
-iong, yong [jʊŋ] yon /jon/
-u, wu [u], [(w)u] u, wu /u/, /wu/
-ua, wa [wa] wa /wa/
-uo, -o (after labials) [wo] wo /wo/
-uai, wai [wai̯] wai /wai̯/
-ui, wei [wei̯] wei /wei̯/
-uan, wan [wan] wan /wan/
-un, wen [wən] wen /wen/
-uang, wang [waŋ] wan /wan/
(n/a), weng [wəŋ] wen /wen/
-u (after sibilants) / -ü, yu [y] -u, yu /u/, /ju/
-ue (after sibilants) / -üe, yue [ɥe] -we, yuwe??? /we/, /juwe/
-uan, yuan [ɥɛn] -wen, yuwen??? /wen/, /juwen/
-un, yun [yn] -un, yun /un/, /jun/
(n/a), ê [ɛ] e /e/
(n/a), o [ɔ] o /o/
(n/a), yo [jɔ] yo /jo/

Draft of TALM (23 June 2021)

TALM (“Third Attempt at Loglan Morphology”) is a blueprint for a logical language, excluding syntax or semantics. Its main goal is to implement monoparsing with a simple and average phonology, so that the eventual language it will support can act as an aux-loglang with an international lexicon, like Ceqli.

Phonemic inventory and orthography

There are 26 phonemes in TALM: five vowels and 21 consonants.

Vowel phonemes
Front Back
Close i u
Mid e o
Open a

The consonants are divided into three classes, through which the parsing morphology is defined.

Consonant phonemes
Labial Alveolar Palatal Velar Glottal
Click (ǃ)
Plosive p b t d t͡ʃ d͡ʒ k g ʔ
Fricative f v s z ʃ x
Nasal m n
Lateral l
Rhotic r
Semivowel w j

Regular consonants (C) are highlighted in blue. Medial consonants (M) are highlighted in pink. The glottal stop is in its own class (Q). The alveolar click is used for a single paralinguistic function, and its use is optional. It never occurs within a word.

The alphabet is as follows. Major differences from the IPA are highlighted in red.

The TALM alphabet
Grapheme a b c d e f g h i j k l m n o p q r s t tk ~ ! u v w x y z
Phoneme a b t͡ʃ d e f g x i d͡ʒ k l m n o p ʔ r s t ǃ u v w ʃ j z

Morphology and phonology

All native words have phonological shapes that allow them to be parsed without ambiguity as to word boundaries.

General word-shape formula


H is a heavy, or closed, syllable, i.e. a syllable with a final consonant. S is a stressed syllable, either heavy or light. L is a light, or open, syllable that is always unstressed. Q is a glottal stop; SQ is a stressed light syllable with a glottal-stop coda. The glottal stop behaves like a light syllable, except that no word may consist of a glottal stop alone. So, in plain language, the formula can be stated as follows:

A word consists of (a) an unstressed light syllable, optionally preceded by a string of any number of heavy syllables ending in a stressed syllable; or (b) a stressed light syllable followed by a glottal stop.
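One way to read the plain-language statement is as a regular expression over the syllable symbols H, S, L and Q. The pattern below is my own rendering of that statement, not necessarily the notation of the original formula, and it cannot express the requirement that the S in SQ be light. The function names are invented for the example.

```python
import re

# (a) optional run of heavy syllables ending in a stressed one, then L; or (b) SQ.
WORD_SHAPE = re.compile(r"(?:H*S)?L|SQ")


def is_word_shape(symbols: str) -> bool:
    """True if the whole symbol string is a single word."""
    return WORD_SHAPE.fullmatch(symbols) is not None


def segment(symbols: str) -> list[str]:
    """Split a stream of syllable symbols into words, left to right."""
    words, pos = [], 0
    while pos < len(symbols):
        match = WORD_SHAPE.match(symbols, pos)
        if match is None:
            raise ValueError(f"no word can start at position {pos}")
        words.append(match.group())
        pos = match.end()
    return words


print(segment("SLLHSLSQL"))  # ['SL', 'L', 'HSL', 'SQ', 'L']
```

Every word ends at its first unstressed light syllable or glottal stop, which is why the left-to-right scan never has to choose between two parses.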

Two syllables whose intervening consonant is M behave as one syllable for the purposes of the formula. H may be a bisyllable of the form CVMVC, CVMVM, CVMVV, CMVMVC, CMVMVM or CMVMVV. S may be a bisyllable of the form CVˈMV, CMVˈMV, CVˈMVC, CVˈMVM, CVˈMVV, CMVˈMVC, CMVˈMVM or CMVˈMVV, where ⟨ˈ⟩ indicates that the following syllable is stressed (as in the IPA). L, however, is always a CV or CVV syllable.

Word classes

There are seven word classes. The definition of a class is primarily phonological and morphological. Classes may be divided in three ways. Lexically, there are function words, content words and names – just like in Lojban. Phonologically, there are native, adapted and foreign words, the last of which can be transcribed or untranscribed. In terms of parsing morphology or self-segregation method, there are Type 1 words, which use an A*B self-segregation method; Type 2 words, which use a “projection” method (to be described below); and Type 3 words, which are bracketed to set them apart from the rest of the utterance. Types 1, 2 and 3 have a one-to-one correspondence with native, adapted and foreign words. Each word class has a conventional designation of its type number plus an identifying letter; for instance, 2b.

Word classes
Class No. Class name Lojban equivalent Lexical category Nativeness Self-seg. method
1a Function word cmavo Function word Native A*B
1b Root word gismu Content word Native A*B
1c Compound word lujvo Content word Native A*B
2a Loanword zi'evla / fu'ivla Content word Adapted Projection
2b Name cmevla Name Adapted Projection
3a Delimited name Type 1 fu'ivla / Non-Lojban name/quote Name Foreign, transcribed Bracketing
3b Foreign material Type 1 fu'ivla / Non-Lojban name/quote Name Foreign, untranscribed Bracketing

Function words

Function words are one to two syllables in length. Their word shapes fit the formula (CM?)?V(V|(M?VV?))?. In other words, their word-shapes range from V to CMVV or CMVMVV.
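The formula can be checked mechanically against strings of class symbols (C, M, V). The sketch below transcribes the formula directly into Python's `re` syntax; the function name is invented for the example.

```python
import re

# Direct transcription of (CM?)?V(V|(M?VV?))? with non-capturing groups.
FUNCTION_WORD = re.compile(r"(?:CM?)?V(?:V|M?VV?)?")


def is_function_word_shape(skeleton: str) -> bool:
    """True if a C/M/V skeleton fits the function-word formula."""
    return FUNCTION_WORD.fullmatch(skeleton) is not None


for shape in ["V", "CV", "CMVV", "CVMV", "CMVMVV", "CVCV", "CVVMV"]:
    print(shape, is_function_word_shape(shape))
```

The last two shapes are rejected: CVCV belongs to root words, and CVVMV has a medial consonant after a vowel pair.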

Monosyllabic function words cannot bear primary stress without potentially absorbing a following monosyllable; CVCV is a root-word shape. However, any monosyllabic function word can be stressed if it takes a glottal stop coda; its word shape is then SQ. This serves to make the word boundary unambiguous. Lojban uses glottal stops in approximately the same way, but in TALM the glottal stop is regarded as part of the function word rather than a “pause” between words. In a string of monosyllabic function words, every odd-numbered word is pronounced with a glottal stop coda.
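The alternation in the last sentence is easy to state in code. This is a minimal sketch: the words la, le and li are placeholders rather than actual TALM vocabulary, the glottal stop is written q as in the alphabet, and I assume that each odd-numbered word also carries the stress mark, since a word with a glottal-stop coda has the shape SQ.

```python
def pronounce_function_words(words: list[str]) -> list[str]:
    """Give every odd-numbered word (1st, 3rd, ...) stress and a glottal-stop coda."""
    return ["ˈ" + word + "q" if index % 2 == 0 else word
            for index, word in enumerate(words)]


print(pronounce_function_words(["la", "le", "li"]))  # ['ˈlaq', 'le', 'ˈliq']
```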


Shapes such as CVMMV or CVVMV are banned for purely phonotactical reasons; the clusters [rw ry u̯r i̯r u̯y i̯w] were deemed unstable, i.e. likely to result in a sound change such as coalescence or fortition.

A number of other sequences are banned, including, among others, /wu/ and /wo/, /yi/ and /ye/, and the triples /wiy/, /wey/, /way/, /waw/, /yay/, /yaw/, /yoy/, /yow/, /yuy/ and /yuw/.

No two function words can coexist where one is of the shape CwV(MVV?)? and the other is of the shape CuwV(MVV?)?. The same is the case for CyV(MVV?)? and CiyV(MVV?)?. (Without this restriction, emphatic forms of monosyllabic function words, such as /ˈtwaʔ/, could be homophonous with disyllabic function words, such as /ˈtuwaʔ/.) This is ensured as follows: the initial consonant of a CuwV(MVV?)? function word must be one that cannot form an onset pair with /w/; and the initial consonant of a CiyV(MVV?)? function word must be one that cannot form an onset cluster with /y/. The algorithm used to generate function words incorporates this constraint.

Root words

Root words are two to three syllables in length, and may have between four and nine segments. They have the word-shape formula C((M?V)?M)?V(V|M|C)?C(M|C)?V. The minimal shape is CVCV. The maximal shape may be CMVMVVCMV (e.g. triraitra), CMVMVVCCV (triraikla), CMVMVMCMV (triyartra), CMVMVMCCV (triyarsta), CMVMVCCMV (trirantra) or CMVMVCCCV (triyankla).
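The shapes listed above can be tested against the formula with a regular expression over C/M/V skeletons. In this sketch the slot before the final vowel is treated as optional, so that the stated minimal shape CVCV is admitted; that reading is an assumption on my part. The function name is invented for the example.

```python
import re

# C((M?V)?M)?V(V|M|C)?C(M|C)?V, with the pre-final (M|C) slot taken as optional.
ROOT_WORD = re.compile(r"C(?:(?:M?V)?M)?V[VMC]?C[MC]?V")


def is_root_shape(skeleton: str) -> bool:
    """True if a C/M/V skeleton fits the root-word formula."""
    return ROOT_WORD.fullmatch(skeleton) is not None


MAXIMAL = ["CMVMVVCMV", "CMVMVVCCV", "CMVMVMCMV",
           "CMVMVMCCV", "CMVMVCCMV", "CMVMVCCCV"]
print(all(is_root_shape(shape) for shape in MAXIMAL + ["CVCV"]))
```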


- Want arsta to be syllabified /ar.sta/; insta to be syllabified /in.sta/ (MOP); and absta should be allowed

- Therefore st must be a Level 1 onset.

- Therefore triyarsta is /triyar.sta/; hence --> triyas

- But triyasta should --> triyat

- Cf. papla --> pap, pampla --> pap

- Solution A: make sp st sk (sm (sn (sl))) semi-native onsets but not native onsets. вспы́шка (fspixka) 'flash'; insta, *absterakt (normally abestrakt). Remove the requirement that all native onsets be allowed intervocalically in root words. (Should loanwords be distinguished from names again? Name: fspixka. Loanword: spixka.)

- Another solution: last C retained. papra --> pap; pampra --> pap; papla --> pal; pampla --> pal; pasta --> pat; parsta --> pat; parsna --> pan.


Root words have short combining forms, called affixes. Every root word has one affix, and the phonological form of the affix is predictable given the form of the parent word. The converse is only true if the entire lexicon is known. The phonological derivation procedure involves truncation, or the stripping away of segments from the word. To see the pattern, it is necessary to number each segment.

C₁((M₁?V₁)?M₂)?V₂(V₃|M₃|C₂)?C₃(M₄|C₄)?V₄ → C₁((M₁?V₁)?M₂)?V₂C₃

All affixes begin with the consonant that occupies the C₁ slot and end with the consonant that occupies the C₃ slot. If a root word begins with the string C₁M₁V₁M₂V₂(V₃|M₃|C₂)C₃, its affix is C₁M₁V₁M₂V₂C₃; this is the maximal affix shape. If a root word begins with the string C₁V₂C₃, its affix is C₁V₂C₃; this is the minimal affix shape.
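The truncation can be sketched with a regular expression that locates the C₁…V₂ stretch and the C₃ slot. Several things here are assumptions: the medial class M is taken to be r, w and y (inferred from the example words, since the colour coding of the consonant chart is not reproduced here); the (M₄|C₄) slot is treated as optional so that CVCV roots are covered; and where two parses are possible, the one that fills the (M₄|C₄) slot is preferred. The function names are invented for the example.

```python
import re

VOWELS = set("aeiou")
MEDIALS = set("rwy")  # assumed membership of the M class

ROOT = re.compile(
    r"(?P<head>C(?:(?:M?V)?M)?V)"  # C1 ((M1? V1)? M2)? V2, kept in the affix
    r"[VMC]??"                     # V3 | M3 | C2, dropped (lazy: skipped if possible)
    r"(?P<c3>C)"                   # C3, kept in the affix
    r"[MC]?V"                      # (M4 | C4)? V4, dropped
)


def skeleton(word: str) -> str:
    """Map each letter to its class symbol: V, M or C."""
    return "".join("V" if ch in VOWELS else "M" if ch in MEDIALS else "C"
                   for ch in word)


def affix(root: str) -> str:
    """Derive the affix of a root word by truncation."""
    match = ROOT.fullmatch(skeleton(root))
    if match is None:
        raise ValueError(f"{root!r} is not a root-word shape")
    return root[:match.end("head")] + root[match.start("c3")]


print(affix("papla"), affix("pampla"), affix("triyarsta"))
```

Under these assumptions papla and pampla both yield pap, which is the many-to-one relation described next.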

The relation between root-word forms and affixes is potentially many-to-one. For instance, the following root words, if they existed, would all have the same affix:





In actuality, the relation is one-to-one. Only one of the words above may exist in the lexicon. This is ensured by generating an alphabetized list of all affixes before the first root word is created. Each new root word is then placed next to its affix on the list. Thus, a root word blocks all competitors for a given affix when it is entered into the dictionary.
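The blocking behaviour can be illustrated with a small registry. This is a simplification of the procedure described above: instead of pre-generating the full alphabetized list of affixes, it records affixes as they are claimed. The class and method names are invented for the example.

```python
class AffixRegistry:
    """Maps each affix to the single root word that owns it."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}

    def register(self, root: str, affix: str) -> None:
        """Claim `affix` for `root`; fail if another root already holds it."""
        holder = self._owner.get(affix)
        if holder is not None and holder != root:
            raise ValueError(f"affix {affix!r} already belongs to {holder!r}")
        self._owner[affix] = root

    def owner(self, affix: str):
        """Return the root that owns `affix`, or None if it is free."""
        return self._owner.get(affix)

    def listing(self) -> list[tuple[str, str]]:
        """Alphabetized (affix, root) pairs, as in the dictionary list."""
        return sorted(self._owner.items())
```

Registering papla with the affix pap and then attempting pampla with the same affix raises an error, so the second word never enters the lexicon.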

Final vowels

The final vowel of a root word behaves like an inflectional ending. At this time, the grammatical function of final vowels has not yet been determined. One possibility is to indicate changes in argument structure. Another is to indicate syntactic role, somewhat like Esperanto or Latejami's part-of-speech endings. However, it is intended that loanwords (and names) should keep their original final vowels, so the vowel-changing rule will only be productive for native vocabulary.


Root words have phonotactics beyond their shape constraints. Only a relatively small subset of consonant pairs may appear in medial position, and a much smaller subset in onset position.

Word-medial consonant pairs
p b f v m t d s z n l r c j x k g h
p pt ps pn pl pr px
b bd bz bn bl br
f ft fs fn fl fr fx
v vd vz vn vl
m mp mb mf mv mt md ms mz mn ml mx
t ts tr
d dz dr
s sp sm st sn sl sk
z zm zd zn zl
n nt nd ns nz nl nc nj nk ng nh
l lp lb lf lv lm lt ld ls lz ln lc lj lx lk lg lh
r rp rb rf rv rm rt rd rs rz rn rl rc rj rx rk rg rh
x xp xm xt xn xl xk
k kt ks kn kl kr kx
g gd gz gn gl gr
h ht hs hn hl hr hx

Compound words

Compound words consist of two or more affixes plus a final vowel. Thus, their normal shape is (C((M?V)?M)?VC)+C((M?V)?M)?VCV. An example of a compound word is batdirikfraja: -bat- + -dirik- + -fraj- + -a.
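As a rough check, the compound formula can be run against the example word. The sketch assumes, as elsewhere in these notes' examples, that the medial class M consists of r, w and y; that membership and the function names are my own assumptions.

```python
import re

VOWELS = set("aeiou")
MEDIALS = set("rwy")  # assumed membership of the M class

AFFIX = r"C(?:(?:M?V)?M)?VC"
# One or more affixes, then a final affix plus vowel: two affixes at minimum.
COMPOUND = re.compile(rf"(?:{AFFIX})+{AFFIX}V")


def skeleton(word: str) -> str:
    """Map each letter to its class symbol: V, M or C."""
    return "".join("V" if ch in VOWELS else "M" if ch in MEDIALS else "C"
                   for ch in word)


def compose(affixes: list[str], final_vowel: str) -> str:
    """Join affixes and a final vowel, checking the result against the formula."""
    word = "".join(affixes) + final_vowel
    if COMPOUND.fullmatch(skeleton(word)) is None:
        raise ValueError(f"{word!r} is not a compound-word shape")
    return word


print(compose(["bat", "dirik", "fraj"], "a"))  # batdirikfraja
```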

Certain irregular affixes may be introduced for special uses, namely ones of the shape CMV, CVV or CMVV. These shapes are safe to be used anywhere after the first syllable, so long as the only primary word stress is penultimate. The function of such affixes has not been worked out yet.

A word with only regular C...VC affixes can be arbitrarily long and still parse as a unitary word; it can even have an arbitrary number of stressed syllables. This feature is useful for poetry and wordplay, but very long words, i.e., words of over seven syllables, should be avoided generally. The general rules are that native words have a single tonic, or stressed, syllable, which is always penultimate; and that they have no more than five consecutive unstressed syllables. As a matter of good style, the maximum number of syllables in a compound word should be seven. Consequently, the maximum number of affixes is six.

A consonant cluster occurs at every juncture between affixes. With only a few exceptions, any two consonants may be adjacent in a compound word. A problem arises because of this: some clusters are homophonous (or nearly so) with single consonants or with other clusters. The cluster /t.ʃ/ is homophonous with the phoneme /t͡ʃ/, for instance. The solution is to require certain consonants to be released before certain other consonants; or, equivalently, to require the insertion of an epenthetic schwa within specific pairs of consonants. This shall be called schwa epenthesis, although it should be borne in mind that the schwa, which is not a phonemic vowel, can be voiceless and can be very short. In general, schwa epenthesis should happen in the following environments:

  • between any two identical consonants;
  • between any obstruents of different voicing;
  • between any two sibilants of different place (alveolar and postalveolar, or vice versa);
  • between /t/ and /ʃ/, /t/ and /t͡ʃ/, /t͡ʃ/ and /ʃ/, and /d/ and /d͡ʒ/;
  • between /n/ and /ʃ/, to prevent stop epenthesis that would cause this pair to surface as [nt͡ʃ];
  • between /n/ and /p b f v/, to prevent assimilation that would cause these pairs to surface as [mp mb mf mv].

If it is too difficult to remember all these rules, a speaker can always adopt one of the following:

  • insert schwa at every juncture between affixes;
  • insert schwa at every consonant pair that is not one of the pairs found in root words (listed above).

Or, a speaker can choose not to epenthesize, but instead rely on consonant length and careful pronunciation. This is discouraged, however.

Because schwa epenthesis is not contrastive, strictly speaking, it is not normally indicated in writing. If need be, it can be indicated with an apostrophe, which suggests its phonetic character as a short release of a consonant.

In non-native words, schwa epenthesis occurs alongside other kinds of epenthesis, where the inserted vowel is /e/ or an echo vowel. We will return to this later.


Loanwords and names can have a greater diversity of shapes than native words. As a consequence, they require a different method by which to parse them in a stream of speech. The method used in TALM is novel among Loglanic languages. It utilizes mandatory particles that determine the length of a following word in syllables. We will introduce a term for this: we will say that such words project the length of the word to their right.

If stress were not distinctive in TALM, projection would be complicated. There would have to be a separate projector word for monosyllabic words, disyllabic words, trisyllabic words, and so on. Stress allows for greater economy. A basic requirement of TALM phonology is that words of two or more syllables have exactly one tonic syllable. (Let this be abbreviated as T.) Because of this rule, word lengths from one to eight syllables can be accommodated with only three separate projector words, one for each of three stress patterns: ultimate, penultimate and antepenultimate. Let us give somewhat arbitrary phonological forms to the three words to show how this works.

Projector words
Word Short definition Full definition
luwa Projects to T+0 Indicates that the following word continues up to the end of the next tonic syllable.
i Projects to T+1; loanword Indicates that the following word continues up to the end of the first syllable after the next tonic syllable.
lai Projects to T+1; name Indicates that the following word continues up to the end of the first syllable after the next tonic syllable and is a name.
liya Projects to T+2 Indicates that the following word continues up to the end of the second syllable after the next tonic syllable.

Words of the loanword and name classes are subject to some constraints to make the number of posttonic syllables unambiguous. With few exceptions, /uwa/ and /ija/ may not appear after a tonic syllable. One additional expedient is available for making sure a listener perceives the end of a name correctly: a name can be terminated by an alveolar click, /ǃ/.

For names, all three stress patterns are available. Loanwords can only have penultimate stress, reflecting their status as phonologically semi-nativized. Therefore, loanwords only take the i particle.
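The projection mechanism amounts to counting syllables from the next tonic syllable. In the sketch below, the input is assumed to be already divided into syllables, with the stress mark ⟨ˈ⟩ prefixed to the tonic ones. The function name and the syllables in the usage line are illustrative only.

```python
# Number of syllables the word continues after its tonic syllable.
PROJECTOR_OFFSET = {"luwa": 0, "i": 1, "lai": 1, "liya": 2}


def split_projected(projector: str, syllables: list[str]):
    """Split the syllables after a projector into (projected word, remainder)."""
    offset = PROJECTOR_OFFSET[projector]
    for index, syllable in enumerate(syllables):
        if syllable.startswith("ˈ"):  # first tonic syllable after the projector
            end = index + offset + 1
            if end > len(syllables):
                raise ValueError("utterance ends before the projected word does")
            return syllables[:end], syllables[end:]
    raise ValueError("no tonic syllable follows the projector")


word, rest = split_projected("lai", ["ˈsnas", "ka", "ˈbro", "da"])
print(word, rest)  # ['ˈsnas', 'ka'] ['ˈbro', 'da']
```

Any number of unstressed syllables may precede the tonic one, which is how three stress patterns cover many word lengths.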

Any word occurring after a projector is necessarily a nonnative word. It may be exactly homophonous with a native word, and in some exceptional cases may have the same etymology, but should always have a different definition. Say that valsi is a root word meaning ‘word’; i valsi, on the other hand, may mean anything but ‘word’. It has a different entry in the dictionary (it may even belong to a different section). Possibly, i valsi might mean something related to words, but this cannot be predicted by its form. Most likely, its etymology is foreign.

It is solely a matter of convention to treat loanwords as separate from their projector particles. The particle and the loanword together can form a single phonological word.

Let us now look at a potential problem: What if a projector-plus-loanword unit is homophonous with a native word? This problem is averted by various means. The forms of the projector words, while mostly arbitrary, are not totally so. Consider lai. The string lai cannot ever occur unstressed at the beginning of a word. Root words may have /lai̯/ as their first syllable, but this syllable is always stressed, while the projector word is not. Compound words cannot contain the syllable lai word-initially because they are made up of affixes, and CVV, the shape of /lai̯/, is not a legal affix shape (at least not one that can occur word-initially). /lai̯C/ cannot be an affix either. CVVC is not a legal affix shape; indeed, it is not natively a legal syllable. Therefore, /lai̯sˈnaska/ can only be parsed as lai snaska, where snaska is a loanword, rather than the compound word lais-naska.

The sound i also never occurs unstressed at the beginning of a word. V.CVCV, VC.CVCV and similar shapes are not legal for root words. Function words like /ira/ could exist, but would normally have stress on the first syllable. (Such words might be banned anyway, out of caution.) luwa and liya are not completely safe per se. They require the exclusion of /lijaC/ and /luwaC/ from the space of legal affixes.

The word snaska above is illustrative of another feature enabled by the projector words. The cluster /sn/ is not a native onset. Loanwords and names should, by rights, be able to begin with a variety of consonant clusters beyond those few that appear in native words. Clusters consisting of /s/ followed by a stop, nasal or /l/ are fairly common in widely borrowed European words, and these should be allowed as onsets. There are other types of clusters that should be accommodated as well. This is done with the following rule: clusters consisting of any consonant (except /ʔ w j/) followed by a legal onset are allowed at the beginning of a word – and only there. In other words, a single extrasyllabic consonant is permitted at the left edge. This device allows for a great deal of faithfulness to most major languages. It allows for affricates such as /t͡s/ in tsunami. It lets loanwords and names have prenasalized stops in initial position, which are very common in African languages, among others, and are found in well-known words such as mbira, ‘thumb piano’ (a southern African musical instrument). It also allows for many shared Greco-Latin words like strategy to be borrowed with little modification. The leftmost consonant, /m/ in /mˈbira/ and /s/ in (say) /strateˈd͡ʒija/, can be syllabified with the preceding projector word. It is always permissible to pronounce i mbira as /im‿ˈbi.ra/. The forms of the projector words have been selected to work well with liaison, mimicking the presence of a prothetic vowel such as occurs in English, in mbira, which is pronounced /əmˈbɪərə/; in Spanish, in estrategia; in Turkish, where the Greek name Smyrna became /izmir/; and in a great number of other languages.

One final feature worth noting about luwa, i, lai and liya is that these words have been chosen so that they can never precede words that are homophonous with themselves. The English name Leah, for instance, must take lai and cannot take liya as its projector.

Adapted names

Names of the adapted type are distinguished from loanwords by having the distinctive projectors lai, luwa or liya. (These forms are intended to suggest Lojban's la name-introducer particle; lai is, in fact, a portmanteau of la and the T+1 projector i. There is potential for a variety of other coalesced forms.)


As mentioned previously, names with antepenultimate stress cannot have the strings /uwa/ and /ija/ after the tonic syllable, except under certain rare conditions. Normally, names with /uwa/ or /ija/ after the tonic syllable have the /i/ or /u/ removed. Thus Russia, natively /ˈrosija/, might become /ˈrosja/. However, immediately following a consonant or consonant pair that may not legally be adjacent to /w/ or /j/, respectively, the high vowel cannot be removed. So, while /ˈparija/ is banned and /ˈparja/ is allowed, /ˈpatrja/ is banned and /ˈpatrija/ allowed, because /trj/ is not a legal consonant triple. In turn, this follows from the fact that /tr/ is not a legal coda and /rj/ is not a legal onset.

Adapted names can have up to three consonants in sequence. Consonant triples can occur word-initially or word-medially. They always consist of either a single coda consonant followed by an onset pair, or a coda pair followed by a single onset consonant. Furthermore, all consecutive pairs of consonants must be legal as heterosyllabic pairs.

Unadapted names

It is unreasonable to subject every name on a map, or in an encyclopedia, or even a novel, to such phonological rules. Unadapted names exist for use in places where faithfulness outweighs pronounceability.

Old stuff

Drafts predating June 2021 below.

Levels, etc.

Ordering A

Level 1a / 1.1: function words (closed class); cmavo.

Level 1b / 1.2: root words (semi-closed class); gismu.

Level 1c / 1.3: compound words (open class); lujvo.

Level 2: loanwords and specialized words; Type 4 fu'ivla/zi'evla.

Level 3: stress-delimited, partially assimilated foreign words (typically names, nine or fewer syllables).

Level 4: bracketed, partially assimilated foreign material; foreign or ungrammatical speech, transcribed into native phonemes.

Level 5: bracketed, unassimilated foreign material; may contain foreign phones or foreign characters.

Function words involved in Level 4 and Level 5

  • "laya": Prefixed to a Level 4 quote; terminated by /!/ (a syllabic alveolar click, or another sound specified as the right-bracket).
  • "liyu": Prefixed to a Level 5 quote; terminated by /!/.
  • "luwa": Elidable/elided written material. Prefixed to a Level 5 quote/phrase by a writer. Indicates that an attempt at a foreign pronunciation is unnecessary; the word "luwa" may be substituted for the quote in read-aloud speech. When followed immediately by a Level 4 name, indicates that the Level 4 name is a transcription of the Level 5 quote.
  • "lirai": Un-elidable written material. Elicits a read-aloud pronunciation as faithful to the original foreign pronunciation as possible.
  • "lurau": Metalinguistic indicator of failure to read aloud faithfully.


1a. Written: lirai Abraham Lincoln; laya eibraham linkon; broda

1b. Read aloud successfully: /ˈliraɪ̯ ˈeɪ̯bɻəhæm ˈlɪnkn̩ ǃ, ˈlaja ˈeɪ̯braham ˈlinkon, ˈbroda/

1c. Read aloud unsuccessfully: /ˈluraʊ̯ ˈlaja ˈeibraham ˈlinkon ! ˈbroda/

Heterosyllabic clusters for a phonology

Set No. 1
p b f v m t d s z n l r c j x k g h
p pt ps pn pl pr px
b bd bz bn bl br bj
f ft fs fn fl fr fx
v vd vz vn vl vr vj
m mt md ms mz mn ml mj mx
t ts tr
d dz dr
s sp sm st sn sl sk
z zm zd zn zl
n nt nd ns nz nl nc nj nk ng nh
l lp lb lf lv lm lt ld ls lz ln lc lj lx lk lg lh
r rp rb rf rv rm rt rd rs rz rn rl rc rj rx rk rg rh
j jm jd jn jl jr
x xp xm xt xn xl xr xc xk
k kt ks kn kl kr kx
g gd gz gn gl gr gj
h ht hs hn hl hr hx

Comparison of orthographies

Orthographic representation of selected diaphonemes
English Esperanto Hanyu Pinyin Malay Latejami Xorban Loglan Lojban Toaq
//ʔ// ∅ / k1 q . . 2
//t͡s// c c c
//d͡z// dz3 z4 z
//t͡ʃ// ch ĉ ch q5 c c ch
//d͡ʒ// j ĝ zh j j j j
//z// z z z z z z z
//ʃ// sh ŝ sh x sy x c c c sh
//ʒ// zh ĵ q j j j
//x// ĥ h kh x x x
//h// h h h h6 7 h h
//w// w ŭ w / u w w w u u w
//j// y j y / i y y y i i y

1 Intervocalic glottal stop is implied when certain vowels appear back-to-back or doubled, such as in the word kemuliaan /kəmuli.aʔan/ 'glory; dignity'. Syllable-finally, glottal stop is written k.

2 Glottal stop is never written in Toaq, but does occur, at least phonetically, as the realization of the empty, or null, onset.

3 There is disagreement over whether Esperanto has a /d͡z/ phoneme. However, Kalocsay and Waringhien consider it to be one in their influential Plena Analiza Gramatiko de Esperanto (1985; p. 47).

4 For the sole purpose of orthographic comparison, we treat Standard Chinese's unaspirated stops as voiced consonants in this chart; Pinyin uses traditionally voiced consonant letters for these sounds.

5 Similarly, we have collapsed the Chinese retroflex and alveolopalatal series into the diaphonemic category of postalveolars. Admittedly this reflects the perceptual habits of the English speaker; it may be more intuitive to Chinese speakers to group the retroflex and alveolar sounds together as apicals. This would be less useful for the purposes of this chart, however.

6 The Latejami phoneme is allowed to vary between a dorsal fricative and a glottal fricative or stop; its main allophones can be inferred to be [ç], [x], [χ], [h] and [ʔ].

7 This sound is actually voiced, /ɦ/, with permitted allophones including [ɣ] and [ʁ].

Comparison of the IPA values of selected consonant letters
Grapheme English Pinyin Malay Latejami Xorban Loglan Lojban Toaq
’ (apostrophe) θ h ʔ
, (comma) ʔ .
. ʔ
; ʔ
c k / s t͡sʰ t͡ʃ t͡ʃ ʃ ʃ t͡sʰ
ch t͡ʃ t͡ʂʰ
h h x h h~x θ h h
j d͡ʒ t͡ɕ d͡ʒ d͡ʒ ʒ ʒ ʒ d͡ʑ
q t͡ɕʰ ʒ ʔ θ ŋ
sh ʃ ʂ ɕ
w w w w w w y w
x ks~gz ɕ ks ʃ x x x
y j j j j j ə ə j
z z t͡s z z z z z
zh ʒ t͡ʂ
Comparison of IPA values for consonant letters and digraphs in various languages
English Spanish Italian German Albanian Pinyin Malay Latejami Xorban Loglan Lojban Toaq
. ɦ ??? h
. ʔ ʔ
c k / s k / s~θ k / t͡ʃ k / t͡s t͡s t͡sʰ t͡ʃ t͡ʃ ʃ ʃ t͡sʰ
ch t͡ʃ ch k x t͡ʂʰ t͡ɕʰ
dh ð
g g / d͡ʒ g~ɣ / x g / d͡ʒ g g g g g g g g
gh g / f~∅ g ɣ~x
gj ɟ~d͡ʑ
gn ɲ
h h / ∅ h / ː h x h h~x θ h h
j d͡ʒ x (j) j j d̥͡ʑ̥ d͡ʒ d͡ʒ ʒ ʒ ʒ d͡ʑ
kh (x) x
ll ʎ~ʝ ɫ
ng ŋ / ŋg ŋg ŋg ŋg ŋg ŋ ŋ ŋg ŋg ŋg ŋg
nj ɲ
ny ɲ
ñ (nj) ɲ
q c~t͡ɕ t͡ɕʰ ʒ ʔ θ ŋ
qu kw kw / k kw k / kv
r ɻ ɾ r ʁ / ɐ̯ ɾ ɻ~ʐ / ʵ r r r r r ɾ
rr r r
s s s s z s s s s s s s s
sc sk / s sk sk / ʃ ???
sch (ʃ) ʃ
sh ʃ ʃ ʂ ɕ
sy ʃ
ß s
th θ / ð t θ
v v b~β v f v v~f v v v v
w w (w) (w) v w w w w y w
x ks~gz / z ks~gz ks~gz? ks d͡z ɕ ks ʃ x x x
xh d͡ʒ
y j ʝ / i̯ (j) ??? y j j j j ə ə j
z z s~θ t͡s~d͡z t͡s z d̥͡z̥ z z z z z
zh ʒ ʒ d̥͡ʐ̥

Phonemic inventories for inspiration for new Ithkuiloids

Consonant phonemes of Ubykh, plus some others
Labial Alveolar Postalveolar Palatal Velar Uvular Epiglottal Glottal
central lateral laminal
laminal apical
plain pal. phar. plain lab. plain lab. plain lab. plain lab. plain lab. plain lab. pal. plain lab. pal. plain lab. phar. phar. & lab. plain lab. pal. plain lab.
Plosive voiceless p t k q qˤʷ ʡ ʡʷ ʔʲ ʔ ʔʷ
voiced b d ɡʲ ɡ ɡʷ ɢʲ ɢ ɢʷ ɢˤ ɢˤʷ
ejective pʲʼ pˤʼ tʷʼ kʲʼ kʷʼ qʲʼ qʷʼ qˤʼ qˤʷʼ
Affricate voiceless t͡s t͡sʷ t͡ɬ t͡ɬʷ t̠͡ʃ t̠͡ʃʷ ȶ͡ɕ ȶ͡ɕʷ ʈ͡ʂ ʈ͡ʂʷ
voiced d͡z d͡zʷ d͡ɮ d͡ɮʷ d̠͡ʒ d̠͡ʒʷ ȡ͡ʑ ȡ͡ʑʷ ɖ͡ʐ ɖ͡ʐʷ
ejective t͡sʼ t͡sʷʼ t͡ɬʼ t͡ɬʷʼ t̠͡ʃʼ t̠͡ʃʷʼ ȶ͡ɕʼ ȶ͡ɕʷʼ ʈ͡ʂʼ ʈ͡ʂʷʼ
Fricative voiceless f s ɬ ɬʷ ʃ ʃʷ ɕ ɕʷ ʂ ʂʷ x χʲ χ χʷ χˤ χˤʷ ʜ ʜʷ h
voiced v z ɮ ɮʷ ʒ ʒʷ ʑ ʑʷ ʐ ʐʷ ɣʲ ɣ ɣʷ ʁʲ ʁ ʁʷ ʁˤ ʁˤʷ
ejective ɬʼ xʲʼ χʼ
Nasal m n ȵ ȵʷ ɳ ɳʷ ŋʲ ŋ ŋʷ
Approximant w l j ɥ
Trill r
Consonant phonemes of Naxi, plus some others & minus /ɥ/
Labial Dental/
Retroflex Palatal Velar Glottal
Plosive voiceless p t ʈ c k ʔ
aspirated ʈ
voiced b d ɖ ɟ ɡ
prenasalized ᵐb ⁿd ᶯɖ ᶮɟ ᵑɡ
Affricate voiceless ts ʈʂ
aspirated tsʰ ʈʂʰ tɕʰ
voiced dz ɖʐ
prenasalized ⁿdz ᶯɖʐ ⁿdʑ
Fricative voiceless f s ʂ ɕ x h
voiced v z ʐ ʑ ɣ
Nasal m n ɳ ɲ ŋ
Lateral approximant l ɭ ʎ
Flap or trill r ɽ
Semivowel w j
Consonant phonemes of Eastern Arrernte
Peripheral Coronal
Laminal Apical
Bilabial Velar Palatal Dental Alveolar Retroflex
Stop p pʷ k kʷ c cʷ t̪ t̪ʷ t tʷ ʈ ʈʷ
Nasal m mʷ ŋ ŋʷ ɲ ɲʷ n̪ n̪ʷ n nʷ ɳ ɳʷ
Prestopped nasal ᵖm ᵖmʷ ᵏŋ ᵏŋʷ ᶜɲ ᶜɲʷ ᵗn̪ ᵗn̪ʷ ᵗn ᵗnʷ ᵗɳ ᵗɳʷ
Prenasalized stop ᵐb ᵐbʷ ᵑɡ ᵑɡʷ ᶮɟ ᶮɟʷ ⁿd̪ ⁿd̪ʷ ⁿd ⁿdʷ ⁿɖ ⁿɖʷ
Lateral Approximant ʎ ʎʷ l̪ l̪ʷ l lʷ ɭ ɭʷ
Approximant β̞ ɰ j jʷ ɻ ɻʷ
Tap ɾ ɾʷ

A mild critique of Latejami (draft)

Rick Morneau's Latejami is one of the most complete and ingenious languages ever constructed. On the whole, it seems to be a remarkable success at being what it sets out to be: an easily learnable and easily speakable intermediary language for machine translation, capable of making translation from any source language straightforward and translation into any target language 'almost trivially easy'. If it has not been utilized as such, that would seem to reflect less on the language itself than on the direction that machine translation has taken since Latejami's publication -- or perhaps it's just a simple case of bad luck and undeserved obscurity. I'm unqualified to assess most of Morneau's work in the Latejami reference grammar, which, as the title suggests, deals with lexical semantics, other than to comment that it is rigorous, awesomely detailed and worthy of study by every conlanger. However, I can weigh in on a couple of the less important components of the language: morphology and phonology. I question a few of Morneau's choices in these matters. Certain things could have been done in more mnemonic and naturalistic ways, better serving Latejami's goals.

On the plus side, Morneau's phonemic inventory and orthography are both very sensible. He uses a standard five-vowel system and the following set of consonants:

Latejami consonant phonemes
Labial Alveolar Palatal Velar Glottal
Plosive unvoiced p t c /t͡ʃ/ k
voiced b d j /d͡ʒ/ g
Fricative unvoiced f s x /ʃ/ h
voiced v z q /ʒ/
Nasal m n
Lateral l
Rhotic r
Semivowel w y /j/

Phonetic diphthongs are treated as vowel-semivowel sequences. Phonotactics are very strict, permitting no more than one consonant plus an optional semivowel in onset position and, where coda is present, only nasals (N) and optionally semivowels (S) in coda position. The syllable template is thus C(S)V(S)(N), but words always end in a vowel or semivowel. The result is a language that resembles the stereotypical Niger-Congo language, say, Swahili, in its phonotactics. This isn't at all a bad thing. I'd perhaps be laxer on onsets, as I'll explain below, and tighter on codas. Diphthongs are fine, but codas such as that of loyn (/ojn/) are probably not cross-linguistically common enough to be necessary, as well as being subjectively ugly. The other phonotactic rules are all sound.

As far as orthography goes -- and this is a really minor point of preference -- I wouldn't represent semivowels with consonant letters in coda. The most common practice is to use vowel letters; this is the convention in almost every natural language written in the Latin alphabet outside of Eastern Europe. Also, I think q is better for glottal stop than for /ʒ/, both in the sense of grapheme assignments and phonemes in the inventory. The phoneme /ʒ/ just isn't very important. In what major languages is it fully contrastive with /d͡ʒ/ or /zj/, not just contrastive in loanwords and/or restricted in occurrence? English, if only arguably; Polish, and not much else. Only the /v/-/w/ contrast appears to be rarer among major languages. Latejami has both /d͡ʒ/ and /zj/ (in at least one relatively common syllable, zyu), and this is enough. The glottal stop, on the other hand, is perfect for Latejami: it's very cross-linguistically common as a phoneme, and probably near-universal as an allophone; and the fact that it's not contrastive in the most widely spoken languages doesn't matter much, since Latejami is an a priori language.

But so far, so good; these are just quibbles. A bigger problem is that Latejami doesn't logically map consonants to morphological classes.

Latejami's self-segregation strategy depends on words being composed of certain types of morphemes in certain possible orders; morpheme types are distinguished by different groups of segments. Morneau's description of these classes is reproduced below:

() indicates that the enclosed item is optional
{} indicates that the enclosed item may appear zero or more times
[] indicates that the enclosed item must appear one or more times
| ::= logical or
V ::= any vowel ::= a | e | i | o | u
S ::= any semivowel ::= y | w
C ::= any consonant ::= b | c | d | f | g | j | k | l | m | n | p | q | r | s | t | v | x | z
	[The letter 'h' is reserved for anaphora ...]
C1 ::= modifier starter ::= b c d f j k q r t x z
	[q and r not used in native words]
C2 ::= classifier terminator ::= g l m p s v
C3 ::= suffix terminator ::= g m n p s v
[Note that C3 is any classifier terminator except l, which is reserved for prefixes and classifier terminators. C3 also includes n, which can never start a modifier (but can terminate one).]
N ::= vocalic-nucleus ::= [V]
prefix ::= l N (n)
suffix ::= N C3 | N m C | N n C
classifier ::= C1 N C2
modifier ::= C1 N (n)
root-morpheme ::= modifier | classifier
root ::= {modifier} classifier
POS ::= part-of-speech marker ::= a, e, aw, yu, etc
word = {prefix} + root + {suffix} + POS
anaphor ::= first-root-CN(n) + h + POS

I will attempt to put this in plain language. It will help to use Morneau's convention of curly braces around elements that may appear zero or more times.

We can unfold Morneau's morphological formula for the Latejami word into the following: {prefix} + {modifier} + classifier + {suffix} + POS marker. Every word has a classifier morpheme. Every word ends with a POS morpheme, which is always a vowel, with an optional on- or offglide. Since the part-of-speech vowels can occur in positions other than word-final, they are not sufficient for self-segregation. Indeed, vowels could be omitted entirely from Latejami's word-resolution algorithm; all that matters, abjad-like, is consonant strings.
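As a rough illustration, the unfolded formula can be written as a regular expression. This is only a sketch of the grammar as quoted above, not Morneau's own algorithm. The POS pattern (a single vowel with an optional on- or offglide) is my assumption, since the quoted list of POS markers ends in 'etc'; the name is_wellformed and the test strings are invented shapes, not real Latejami vocabulary.

```python
import re

# Segment classes copied from the grammar quoted above.
V = "[aeiou]"
C = "[bcdfgjklmnpqrstvxz]"
C1 = "[bcdfjkqrtxz]"   # modifier/classifier starters
C2 = "[glmpsv]"        # classifier terminators
C3 = "[gmnpsv]"        # suffix terminators
N = f"{V}+"            # vocalic nucleus: one or more vowels

PREFIX = f"l{N}n?"
MODIFIER = f"{C1}{N}n?"
CLASSIFIER = f"{C1}{N}{C2}"
SUFFIX = f"{N}(?:[mn]{C}|{C3})"
# ASSUMPTION: the full POS inventory is not given in the quoted grammar,
# so any vowel with an optional on- or offglide is accepted here.
POS = f"[yw]?{V}[yw]?"

# word = {prefix} + {modifier} + classifier + {suffix} + POS
WORD = re.compile(
    f"(?:{PREFIX})*(?:{MODIFIER})*{CLASSIFIER}(?:{SUFFIX})*{POS}"
)

def is_wellformed(word):
    """Return True if the whole string fits the unfolded word formula."""
    return WORD.fullmatch(word) is not None

# Invented shapes for illustration only.
print(is_wellformed("tebama"))  # modifier te + classifier bam + POS a
print(is_wellformed("bate"))    # no classifier terminator anywhere
```

Because the POS pattern is so permissive, this check says only that a string has a legal shape, not that it is an actual word.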

The key to self-segregation is the penultimate morpheme in a word, which is either a classifier or a suffix. These morpheme classes are differentiated by having one or two final consonants, which must come from a particular, restricted set, the 'terminator' consonants. The classifier terminators are [g l m p s v], and the suffix terminators are [g m n p s v], plus any cluster of a nasal and another consonant. The presence of one of these elements signals a word break after the following syllable, unless the next consonant is also a suffix terminator. All other segments extend a word to the right.

The letter l is tricky due to its dual role as prefix initiator and classifier terminator, but any temporary ambiguity is always resolved by context. If the consonant before l is a terminator consonant, l starts a prefix. If the consonant before l is a non-terminator consonant, l terminates a classifier. Anyway, if we leave out l's prefix-initiator role, as well as a few other details like nasal codas, the picture of Latejami morphology becomes much clearer. A Latejami word, from an algorithmic point of view, is basically composed of consonants. These may be either from set A, [b c d f j k q r t x z], or set B, [g l m n p s v]. A word has the pattern {A}AB{B}, and every B-A juncture marks a word boundary, which falls after the intervening POS vowel and just before the A consonant.
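The simplified A/B picture is easy to turn into code. The sketch below is my own illustration of that simplified model, not Morneau's resolution algorithm: it ignores l-initial prefixes, nasal codas inside modifiers, and the anaphoric h, and it treats vowels and semivowels as transparent. The function name and the sample stream are invented.

```python
SET_A = set("bcdfjkqrtxz")  # consonants that extend a word to the right
SET_B = set("glmnpsv")      # terminator consonants

def segment(stream):
    """Split an unspaced stream at every B-A juncture.

    The cut is placed just before the A consonant, so the vowel(s)
    between B and A stay with the earlier word as its POS marker.
    """
    words = []
    start = 0
    prev_class = None
    for i, ch in enumerate(stream):
        if ch in SET_A:
            if prev_class == "B":
                words.append(stream[start:i])
                start = i
            prev_class = "A"
        elif ch in SET_B:
            prev_class = "B"
        # Vowels, semivowels and h do not affect the consonant skeleton.
    if start < len(stream):
        words.append(stream[start:])
    return words

# Invented word shapes, not real vocabulary.
print(segment("tebamakisudopena"))  # ['tebama', 'kisu', 'dopena']
```

A modifier with a nasal coda would produce a false cut under this model, because the n counts as a set B consonant. That is one of the details the simplification deliberately leaves out.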

It can be difficult for a new learner to remember which segments serve the crucial 'terminator' role. To recap, the single segments that act as terminators are [g l m n p s v]. It is not clear why Morneau chose these consonants for this class. They have no salient phonological features in common; they are not even an alphabetical grouping such as [m n p q r s t]. If Morneau had instead picked a natural class of consonants, or a union of natural classes, word resolution would likely be more easily intuited. For instance, the class of single-segment terminators could have been [v z q l m n r] -- the set of sonorants plus the set of voiced continuants. Or it could be [t d s z n l r], the set of alveolar consonants; or [p k b g f h v], the set of peripheral (i.e. non-coronal) obstruents; or [f s x h v z q], the fricatives; etc. Perhaps Morneau wanted maximal phonetic variety within each morphological segment class; if so, it is not clear why.

For reasons of aesthetics, Latejami has an idiosyncratic system of stress. Stress is unnecessary for self-segregation, and strictly speaking is allophonic. However, it is not easily predictable either; it depends on the type and order of morphemes present in a word. Morneau gives four rules that together determine stress placement for any word. Rules 3 and 4 will give the reader a sense of the system:

If a word contains at least one modifier and one suffix, the suffix should be given primary (i.e., heavier) stress, and the modifier should be given secondary (i.e., lighter) stress.

If a word contains neither a modifier nor a suffix, then the final vowel of the classifier should be stressed.

Latejami's stress system is remarkably odd. Stress could have turned Latejami into one of the simplest engineered languages. Instead, it reinforces the language's morphological complexity and adds a layer of needless complexity on top of that! Consider that every word in Latejami is two syllables or more in length. The minimal word pattern is classifier + POS, i.e. CVC + V. This means that if Latejami had fixed penultimate stress, word boundaries would be totally unambiguous without even looking at individual segments. Words would continue up to the syllable after a stressed syllable. To illustrate, take a string, cvcvCVcvcvcvcvcvCVcvCVcv. ('cv' represents an unstressed syllable and 'CV' a stressed syllable.) The word boundaries must be as follows: cvcvCVcv cvcvcvcvCVcv CVcv. Of course, this would make stress phonemic, as well as de-correlate it from morphemic salience or prominence within a word, but I don't see why these outcomes are worth such convolutions to avoid. With phonemic stress, Latejami would no longer need different morphological segment classes at all, though they are still worth having to aid the breaking down of unfamiliar words into morphemes.
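The penultimate-stress proposal can be checked mechanically. The following sketch assumes the notation used in the illustration above: two-letter syllables, with upper case marking stress. The function names are mine, and this describes the hypothetical fixed-stress variant, not Latejami as Morneau specifies it.

```python
def syllabify(stream):
    """Cut the illustrative notation into two-letter syllables."""
    return [stream[i:i + 2] for i in range(0, len(stream), 2)]

def split_by_penultimate_stress(stream):
    """Close each word on the syllable right after a stressed one."""
    words = []
    current = []
    close_after_this = False
    for syl in syllabify(stream):
        current.append(syl)
        if close_after_this:
            # This is the final syllable of the word.
            words.append("".join(current))
            current = []
            close_after_this = False
        elif syl.isupper():
            # Stressed syllable: the word ends one syllable later.
            close_after_this = True
    if current:
        # Leftover syllables with no completed stress pattern.
        words.append("".join(current))
    return words

print(split_by_penultimate_stress("cvcvCVcvcvcvcvcvCVcvCVcv"))
# ['cvcvCVcv', 'cvcvcvcvCVcv', 'CVcv']
```

Note that the loop never inspects which consonants or vowels a syllable contains; stress alone carries the word boundaries.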