Conlang 102: Phonology
Greetings and spinigerous! Well, maybe I am a porcupine! Or not. I am a Limax, however, the mastress of worldbuilding! Today, I will continue on the conlanging adventure! This time with the foundational unit for almost all human languages, phonology! Can’t speak without your face making sounds, right?
Sound inventory
Normally, I like to start with definitions, and now is not any different, but the issue is, there are simply so many words and parts to define. But phonology can be defined as
The sounds that native speakers naturally produce as they speak the language.
Sure, there are languages, sign language for example, that don't use sounds, but we are talking about ordinary basic human language here. I will do some for truly alien methods of communication… later.
You might be asking yourself why you need to know this. Well, every language has a feel to it. You can often tell which language is being spoken even if you don’t understand what is said, because it has specific sounds and a style to it. Sometimes, that is important. Tolkien was very much into “phonoaesthetics,” as it is called: how a language sounds and feels. So that alone is enough of a reason. Another reason is that this is literally the foundational building block of languages, and not doing phonology properly for a conlang is like wanting to build a Lego model without dealing with actual Lego bricks; kind of defeats the point of it all, right?
Sounds can broadly be divided into consonants and vowels, but some are kinda half between because of course there are. Keep in mind, I will be simplifying a lot here. Another thing to remember is that when humans listen and hear, it is relative. It is not absolute frequencies and such but rather how it sounds relative to everything else the person is producing.
Vowels
Vowels are, essentially, a free passage of air from the lungs, with the vocal cords always vibrating; it is the shape of the mouth, tongue, and pathway from the vocal cords that determines how a vowel sounds. This lets the sound resonate freely in the mouth, and it is this resonance that is the vowel.
In essence, they are loud and can be maintained as long as you have your breath. How many are there?
Well, 7 heights for the tongue, 5 placements of the tongue, you can have rounded or unrounded, nasalised or not nasalised–I’ll get to what all those mean in a bit–that gives us a theoretical max of 140 possible vowels. The chart skips a lot of these simply because at some locations there isn’t enough distinction in the sound for any language to use it to carry meaning.
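If you want to check that number, it is just a product of the independent choices. Here is a minimal Python sketch, assuming the four axes combine freely; the variable names are placeholders of mine, and only the number of options on each axis matters.

```python
from itertools import product

# Placeholder axes: only the number of options per axis matters here.
heights = range(7)        # 7 tongue heights
placements = range(5)     # 5 tongue placements
rounding = (True, False)  # rounded or unrounded
nasality = (True, False)  # nasalised or not

# Every combination of one option per axis is one theoretical vowel slot.
vowel_slots = list(product(heights, placements, rounding, nasality))
print(len(vowel_slots))  # 7 * 5 * 2 * 2 = 140
```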
The rounded/unrounded distinction is whether you round your lips while producing the vowel or not. Think about your lips when you say “ooh” and “ah”: they are rounded in the first one and unrounded in the second.
And for nasalised, think about the French and the classic laugh they are portrayed as having. Or how they say “Bon”; you let some of the sound go up into your nasal cavity, which gives it that distinct sound. Unnasalised vowels are the regular vowels we are all used to.
The Germanic language family–Scandinavian languages, German, Dutch, English–is a vowel-loving family with 20-30 vowels each; most languages tend to not go above 10. A huge portion sticks to 5 (look up the most common ones for it), and believe it or not, 3 vowels is even possible. It is the nigh universal triplet of /a/, /i/, /u/. Almost all languages have those 3 vowels. What the // means, I will explain later. It is, however, as I have seen, exceedingly rare to go below 3; it might not even exist.
Consonants
Consonants can best be described as obstructions to the airflow. While vowels had virtually nothing actually disturbing the airflow and thus it is pure resonation in the mouth (and potentially nose), consonants somehow disturb the flow of air, which in turn generates sounds. And boy do we have a lot of sounds here. Before we move on, I will not go into sounds that are called clicks; they are highly localised in the world and difficult, but they exist and are valid, I just won’t do them here.
As we can see, there are a lot of consonants, all depending on which way it is articulated and WHERE it is articulated. How many, theoretically maxed? 11 places of articulation, 8 manners of articulation, and many binary choices, but the binary ones I will cover this time are voiced/unvoiced, labialised/not, palatalized/not, aspirated/not. This gives us 1408 possible consonants. Some of these combinations are not physically possible, and some of these combinations are so difficult to make out that they are not used by humans.
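The same kind of back-of-the-envelope count works here. A small Python sketch, assuming every place, manner, and binary feature could combine freely (which, as noted, they cannot in practice):

```python
places = 11   # places of articulation
manners = 8   # manners of articulation
binary_features = ["voiced", "labialised", "palatalised", "aspirated"]

# Each binary feature doubles the number of combinations.
consonant_slots = places * manners * 2 ** len(binary_features)
print(consonant_slots)  # 11 * 8 * 16 = 1408
```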
The place of articulation is where in the mouth the tongue is moved to, and the manner of articulation is what the tongue does there to generate the sound. At the very front of the mouth it is sometimes the lips that do the work, but the tongue is the main player.
Places of articulation run from the lips (labial) all the way back to the throat (glottal). I won’t go into the details of it; the chart above shows their ordering from front to back and their names.
Now here’s a quick rundown on the manners of articulation:
Plosives: Stopping the airflow temporarily and then releasing it in an explosive release.
Nasal: Blocking the airflow through the mouth at a specific point and letting the resonation go entirely through the nose.
Fricative: Narrowing the passage without fully blocking it, so the air has to squeeze past the tongue/lips as it is exhaled.
Trill: Letting the part that makes the sound, such as the tip of the tongue or the lips, vibrate as the air flows through.
The rest: Kind of mixes of the previous but are too difficult to describe concisely. /l/ sound is the most popular one in this group.
And for the binary options I mentioned:
Voiced/Unvoiced: Whether or not your vocal cords are vibrating, as they do when you pronounce a vowel. Voiced means they are vibrating. Think of f and v: hold your hand on your Adam’s apple while saying each and you can feel the difference. This is marked by nothing because it was important to Western powers so they got their own bloody letters, no bias at all.
Labialised/Not: Not is standard, but labialised means you round your lips as you pronounce the consonant. Think of the s in saw and see: saw has a labialised /s/, while see does not. This is marked by a superscript w.
Palatalised/Not: You pronounce a /j/ at the same time as you do the rest of the articulation. This one is a favourite in Slavic languages. It is a bit confusing for English speakers because /j/ is written as y in English, but as you all might have noticed in my writing, I always use J for it; that is why. Dew vs do is an example of this, where the d is palatalised in dew. This is marked by a superscript j.
Aspirated/Not: Aspirated is when you “puff out more air” with the sound. This one is actually very popular in Asian languages, and many letters used for voiced actually mean aspirated there instead. If you want to see an example, hold your hand in front of your mouth and say the words “spy” and “pie”; the /p/ is aspirated in pie, but it is not in spy. This is marked by a superscript h.
These are also called secondary pronunciations because they are done on top of the primary pronunciation, which is the placement and style as mentioned above. Personally, I have a huge thing for labialised and palatalised consonants, but to do that properly, we need to discuss how to select sounds… And before that, we need to discuss how sounds are to the speaker!
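To make the superscript markings concrete, here is a minimal Python sketch that attaches them to a base letter. The `transcribe` helper and the `MODIFIERS` table are hypothetical names of mine for illustration, not part of any standard tool, and the markers are simply appended in the order you list them.

```python
# Superscript markers for the binary options described above.
# Voicing gets no marker because voiced and unvoiced sounds
# already have their own base letters.
MODIFIERS = {
    "labialised": "\u02b7",   # superscript w
    "palatalised": "\u02b2",  # superscript j
    "aspirated": "\u02b0",    # superscript h
}

def transcribe(base, *features):
    """Append the superscript marker for each listed feature to a base letter."""
    return base + "".join(MODIFIERS[f] for f in features)

print(transcribe("s", "labialised"))   # sʷ
print(transcribe("d", "palatalised"))  # dʲ
print(transcribe("p", "aspirated"))    # pʰ
```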
Phones
No, we are not talking about calling someone, but you might call your mum anyway. Have you done it lately?
When it comes to sounds of languages, there are two categories within the brain that we can observe. Have you ever talked with a non-native, or even a native speaker, where your brain hears something and registers one word being said as meaning one thing, but you know from context that this cannot possibly be that word they meant?
And then you have the other case, where their words are just off? Not only dialectal, but they just sound a bit odd with words; your brain registers it as the right word, but it just feels wrong?
This is because your brain essentially groups a bunch of sounds as being the same, hence it registers some sounds as being clearly distinct, and other sounds as being “Off” but still registering as the right sound. Different languages group these differently.
Phonemes
These are the core sounds, the “Idealised” sound of one of these categories. What makes phonemes unique is that all phonemes within a language, generally, always carry meaning. That is, if you grab two random phonemes of a language, they are either so distinct that no one would conflate the two, think like /p/ and /n/, or there is generally what is called a “minimal pair” of words.
A minimal pair is a pair of words that differ by exactly one sound. Minimal pairs are used to demonstrate that two sounds are in fact separate phonemes. An example in English is bat-pat, demonstrating that /p/ and /b/ are in fact phonemes. And that is what the // I have been using indicates: it marks phonemes, the sounds that your brain, in that language, registers as possibly meaning-changing. So writing /pat/ and /pʰat/ as different means the aspirated p carries meaning somewhere in that language, even if in that word it might not.
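The minimal pair test is mechanical enough to sketch in a few lines of Python. This is a simplified version that assumes both words are already split into phonemes and have the same number of them; `is_minimal_pair` is a name I made up for the example.

```python
def is_minimal_pair(word_a, word_b):
    """True if two equal-length phoneme sequences differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1

# Words are lists of phonemes, not spellings.
bat = ["b", "æ", "t"]
pat = ["p", "æ", "t"]
pit = ["p", "ɪ", "t"]

print(is_minimal_pair(bat, pat))  # True: only /b/ vs /p/ differs
print(is_minimal_pair(bat, pit))  # False: two sounds differ
```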
Allophones
This is in stark contrast to sounds that your brain thinks “Just sound funny”; these are allowed to be there, but sound a bit off. That is again because your brain collapses all these sounds into one phoneme. All these sounds that your brain thinks are the same phoneme are called allophones.
There can be a lot of these in a single phoneme. Vowels are exceptionally prone to have a lot of allophones, relative to how many vowels humans can make. An English example is the dew/do pair. The distinction between them, despite technically being different words, has not gotten so big that they register as different phonemes but rather register as allophones.
These are plentiful exactly because it is freaking difficult to move around that large flap of meat in your mouth and make enough sounds clear, so you try to do it as little as possible and as easily as possible in relation to every sound that came before and that you are expecting to come. So sounds morph and change to minimise the effort by you, the speaker, while being clear enough for the listener.
In linguistics, the exact pronunciation is often written with []. So while spit and pit are written phonemically as /spɪt/ and /pɪt/, the same except for the extra s, their exact pronunciations are [spɪt] and [pʰɪt]; the superscript h for aspiration comes in because we are now trying to convey EXACTLY how it is pronounced, not how the brain perceives it.
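One way to picture the jump from // to [] is as a rule applied to the phonemes. Below is a toy Python sketch with a single, heavily simplified rule: a /p/, /t/, or /k/ at the very start of a word gets aspirated. Real English aspiration also depends on stress and syllable position, so treat the rule and the `to_phonetic` name as assumptions for illustration only.

```python
VOICELESS_STOPS = {"p", "t", "k"}

def to_phonetic(phonemes):
    """Apply one toy rule: aspirate a voiceless stop at the start of the word."""
    phones = list(phonemes)
    if phones and phones[0] in VOICELESS_STOPS:
        phones[0] += "\u02b0"  # superscript h marks aspiration
    return "[" + "".join(phones) + "]"

print(to_phonetic(["p", "ɪ", "t"]))       # [pʰɪt]
print(to_phonetic(["s", "p", "ɪ", "t"]))  # [spɪt]
```

The /p/ in both words is the same phoneme going in; only the bracketed output shows the two allophones.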
But for most conlangers, allophones are… generally not considered. It is simply too much work for a detail that adds little value and that no one thinks about anyway. So why did I include it? Primarily to make you aware that you DO in fact produce many of what people call “difficult” sounds, clusters, and more, but you are unaware of it. And that is important because a lot of conlangers are like me: they want to be able to speak the language and avoid everything that makes it not so easy, unlike someone I know that wants everything to be Anglophone friendly and doesn't believe me when suggestions come.
Picking phones
Well, clearly you pick an iPhone first of all 😛 Shitty jokes aside, how do you pick and decide on what sounds to include? And how do you reach the look you want? Well, a good way to do it is to follow these steps:
1. Take the universals.
Vowels: /a/, /i/, /u/
Consonants: /p/, /t/, /k/, /m/, /n/, /s/
2. Pick a pattern of sounds.
3. Pick a few rare outliers.
I’ll take each of these and describe them in more detail.
Universals
Step 1 is hopefully self-explanatory, but I will go through it anyway. These sounds are present in virtually every known language, and my philosophy is, if something is almost entirely omnipresent, there is a damn good reason for it, whether that is multicellular life preferring sex over asex or languages having these sounds. If you are going to change these, you need a damn good reason and you need to know what you are doing.
Pattern
Step 2 is trickier. What do I mean by patterns? It can mean a lot of things. For the English language with consonants, it means that it has a huge love for central consonants, that is Dental, Alveolar, and Post-Alveolar consonants, read link or see image earlier. This is combined with the pattern of voiced/unvoiced distinction.
For Arabic, the preference is more toward the back, glottal, uvular, pharyngeal, and a secondary half pattern is pharyngealization–it is a pain that I will get into at a later date. For Russian, the big pattern is that they have palatalization of almost every consonant they have. The rest of the consonants are fairly spread out and “classical,” but the palatalization is what is the big pattern for them.
So, my recommendation is to pick one region of the mouth–front, middle, or back–and use more sounds there than in any other area of the mouth. If you are going for secondary pronunciation features, it probably should be somehow related to said region as well. So if you do front, labialization of consonants could be a thing. If it is middle, palatalization, etc. Albeit here, you can make a discrepancy easily, and it won’t be a problem. Front heavy sounds with palatalization of consonants in general is nothing big. Keep in mind, the voiced/unvoiced distinction is a form of secondary pronunciation.
Outliers
Of course, no language is perfectly symmetrical or has a perfect pattern, so always throw in some curveballs. Do you have front heavy phonology? Add some back sounds. All consonants are palatalized? Well, break it up: some consonants lack the distinction and are just plain, and maybe one only occurs in the palatalized version and has no plain counterpart. Make it appear a bit random so it feels like it came about through natural processes.
Summa Phonorum
So in summary of it all, don’t touch the universals unless you know what you are doing. Then add some patterns and sprinkle in randomness.
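As a rough illustration, the three steps can be sketched in Python as a few sets merged together. Only the universals come from the list earlier; the pattern (front-heavy with labialisation) and the two outliers are hypothetical picks of mine to show the shape of the process, not a recommendation.

```python
# Step 1: the universals listed earlier.
vowels = {"a", "i", "u"}
consonants = {"p", "t", "k", "m", "n", "s"}

# Step 2 (hypothetical choice): a front-heavy pattern with labialisation.
pattern = {"f", "pʷ", "mʷ", "fʷ"}

# Step 3 (hypothetical choice): a couple of back sounds as outliers.
outliers = {"q", "h"}

# Merge everything into one inventory.
inventory = {
    "vowels": sorted(vowels),
    "consonants": sorted(consonants | pattern | outliers),
}

print(len(inventory["vowels"]), "vowels:", " ".join(inventory["vowels"]))
print(len(inventory["consonants"]), "consonants:", " ".join(inventory["consonants"]))
```

Keeping each step in its own set makes it easy to swap out the pattern or the outliers later without touching the universals.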
Summa Summarum
As you can see from this, there are a lot of sounds to pick from. My recommendation is taking fairly friendly sounds, which, believe it or not, excludes voiced consonants. But I will do a practicum eventually for one of my conlangs so you can see my reasoning on the sounds picked. A thing to keep in mind is that you don’t need large inventories of sounds for it to be a valid phonology. Hawaiian is famous for having an incredibly tiny one: 8 consonants and 5 vowels.
But to determine how your language will truly sound, you will need to get into phonotactics, and what is that? That is for the next blogpost on conlanging!
Copyright ©️ 2025 Vivian Sayan. Original ideas belong to the respective authors. Generic concepts such as conlanging and phonology are common knowledge and not copyrighted. Any specific information is copyrighted under Creative Commons with attribution, and any derivatives must also be Creative Commons. However, specific language and/or exact phrasing are individually copyrighted by the respective authors. Contact them for information on usage and questions if uncertain what falls under Creative Commons. We’re almost always happy to give permission. Please contact the authors through this website’s contact page.
We at Stellima value human creativity but are exploring ways AI can be ethically used. Please read our policy on AI and know that every word in the blog is written and edited by humans or aliens.