C Declarations: A Short Primer

Based on an article by Greg Comeau,

published in the September 1998 edition of the Microsoft Systems Journal

In the article "A Guide to Understanding Even the Most Complex C Declarations", Greg Comeau presents a set of rules that can be applied to interpret any C declaration, however complex it may seem. While the rules are intuitive and might appeal to most advanced C programmers, the beginner may find them difficult to grasp. He does, however, start the article by presenting a simple rule set for reading and writing declarations in the style of Kernighan and Ritchie (authors of the famous book The C Programming Language, 1978). In this primer I will present these rules and elaborate on them with examples.

The syntax of a C declaration is of the form:
storage-class type qualifier declarator = initializer;
where storage-class is only one of the following:

typedef
extern
static
auto
register

A type could be one or more of the following:

void
char
short, int, long
float, double
signed, unsigned
struct ...
union ...
typedef type

A qualifier could be one or more of the following:

const
volatile

A declarator contains an identifier and zero or more of the following, in a variety of combinations:

*
()
[]

possibly grouped within parentheses to create different bindings

The term storage-class refers to the method by which an object is assigned space in memory. Chapter 4 of the C Primer gives detailed descriptions of what each of the storage classes means. Suffice it to say that the storage class has no bearing on understanding what is being declared; it only tells you where and how the object is assigned space in memory. Likewise, the qualifiers const and volatile say, respectively, that an entity may not be modified and that it may be modified elsewhere. Therefore, henceforth we will ignore these two pieces of information.

The above definition simply says what a declaration ought to look like (its syntax, that is). The key phrase in it is "to create different bindings": parenthesizing a declarator differently gives the declaration a different interpretation. To understand any complex C declaration, then, all one has to know is that declarations are based on the C operator precedence chart, the same one you use to evaluate expressions in C:

Precedence	Operators	Associativity
highest	() [] . -> ++(postfix) --(postfix)	left to right
	++(prefix) --(prefix) ! ~ sizeof (type) +(unary) -(unary) &(address) *(dereference)	right to left
	* / %	left to right
	+ -	left to right
	<< >>	left to right
	< <= > >=	left to right
	== !=	left to right
	&	left to right
	^	left to right
	|	left to right
	&&	left to right
	||	left to right
	? :	right to left
	= += -= *= /= %= <<= >>= |= &= ^=	right to left
lowest	,	left to right

This chart is complicated because it gives the precedence and associativity of all C operators. With declarations, we are only dealing with unary tokens (unary operators need only one operand), so it is a lot simpler. The operators of interest to us are the postfix () and [] in the top row and the prefix * (dereference) in the second row.

So, here then are the rules for reading and writing C declarations:

1. Parenthesize declarations as if they were expressions.
2. Locate the innermost parentheses.
3. Say "identifier is", where the identifier is the name of the variable.
   a. Say "an array of X" if you see [X].
   b. Say "a pointer to" if you see *.
   c. Say "a function returning" if you see ().
4. Move to the next set of parentheses.
5. If there are more, go back to step 3.
6. Otherwise, say "type" for the remaining type (such as char, int, unsigned 16-bit integer, etc.).
(Pluralize and add prepositions as needed.)

Here are some examples to clarify this process and visualize the declarations:

Example 1:

uint16_t i;

The parenthesization of the above declaration is:

uint16_t (i); {1}

Applying the rules (see above) to the parenthesized expression can be done as follows:
The innermost set of parentheses is (i) {2}
i is the variable name, therefore we say "i is ..." {3}
No more parentheses left so we say "an unsigned 16-bit integer". {4,5,6}
That is, "i is an unsigned 16-bit integer"

Example 2:

uint16_t *i;

The parenthesization of the above declaration is:

uint16_t (*(i)); {1}

Applying the rules (see above) to the parenthesized expression can be done as follows:

The innermost set of parentheses is (i) {2}

i is the variable name, therefore we say "i is ..." {3}

Move to the next set of parentheses: (*(i)) {4}
Go back to step 3. {5}
We say "a pointer to" since we see a * {3.b}
No more parentheses left so we say "an unsigned 16-bit integer". {4,5,6}
That is, "i is a pointer to an unsigned 16-bit integer"

Example 3:

uint16_t *i[3];

The parenthesization of the above declaration is:

uint16_t (*((i)[3])); {1} // Note that () and [] have the same
// precedence but we deal with them from left to right.

Applying the rules (see above) to the parenthesized expression can be done as follows:

The innermost set of parentheses is (i) {2}

i is the variable name, therefore we say "i is ..." {3}

Move to the next set of parentheses: ((i)[3]) {4}

Go back to step 3. {5}

We say "an array of 3 ..." since we see a [3] {3.a}
Move to the next set of parentheses: (*((i)[3])) {4}

Go back to step 3. {5}

We say "pointers to ..." since we see a * {3.b}

No more parentheses left so we say "unsigned 16-bit integers". {4,5,6}

That is, "i is an array of 3 pointers to unsigned 16-bit integers"

Example 4:

uint16_t (*i)[3];

The parenthesization of the above declaration is:

uint16_t ((*(i))[3]); {1} // Note that parentheses are valid tokens
// in a declaration and therefore must
// be left in place when finding the final
// parenthesization

Applying the rules (see above) to the parenthesized expression can be done as follows:

The innermost set of parentheses is (i) {2}

i is the variable name, therefore we say "i is ..." {3}

Move to the next set of parentheses: (*(i)) {4}

Go back to step 3. {5}

We say "a pointer to ..." since we see a * {3.b}
Move to the next set of parentheses: ((*(i))[3]) {4}

Go back to step 3. {5}
We say "an array of 3 ..." since we see a [3] {3.a}

No more parentheses left so we say "unsigned 16-bit integers". {4,5,6}

That is, "i is a pointer to an array of 3 unsigned 16-bit integers"

Example 5:

uint16_t *i();

The parenthesization of the above declaration is:

uint16_t (*((i)())); {1} // Note that * has a lower precedence than
// parentheses, ()

Applying the rules (see above) to the parenthesized expression can be done as follows:

The innermost set of parentheses is (i) {2}
// One could argue that () is also the innermost set of parentheses, but
// it does not contain anything so we know it must indicate
// a function

i is the variable name, therefore we say "i is ..." {3}

Move to the next set of parentheses: ((i)()) {4}

Go back to step 3. {5}

We say "a function returning" since we see a () {3.c}
Move to the next set of parentheses: (*((i)())) {4}

Go back to step 3. {5}
We say "a pointer to ..." since we see a * {3.b}

No more parentheses left so we say "an unsigned 16-bit integer". {4,5,6}
That is, "i is a function returning a pointer to an unsigned 16-bit integer"

Example 6:

uint16_t (*i)();

The parenthesization of the above declaration is:

uint16_t ((*(i))()); {1} // Note that parentheses are valid tokens
// in a declaration and therefore must
// be left in place when finding the final
// parenthesization

Applying the rules (see above) to the parenthesized expression can be done as follows:

The innermost set of parentheses is (i) {2}

i is the variable name, therefore we say "i is ..." {3}

Move to the next set of parentheses: (*(i)) {4}

Go back to step 3. {5}

We say "a pointer to" since we see a * {3.b}
Move to the next set of parentheses: ((*(i))()) {4}

Go back to step 3. {5}
We say "a function returning" since we see a () {3.c}

No more parentheses left so we say "an unsigned 16-bit integer". {4,5,6}
That is, "i is a pointer to a function returning an unsigned 16-bit integer"

This is pretty much all one needs to know to read and write declarations in C.

Try this for a challenging declaration: