Chapter 2: Tokens
What's in Chapter 2?
ASCII characters
Literals include numbers, characters, and strings
Keywords are predefined
Names are user-defined
Punctuation marks
Operators
This chapter defines the basic building blocks of a C program. Understanding the concepts in this chapter will help eliminate the syntax bugs that confuse even the veteran C programmer. A single syntax error can generate hundreds of obscure compiler errors. In this chapter we will introduce some of the syntax of the language.
To understand the syntax of a C program, we divide it into tokens separated by white space and punctuation. Remember that white space includes the space, tab, carriage return, and line feed characters. A token may be a single character or a sequence of characters that form a single item. The first step of a compiler is to process the program into a list of tokens and punctuation marks. The compiler then checks for proper syntax and, finally, creates object code that performs the intended operations. The following example includes the punctuation marks ( ) { } ;
void main(void){ long z;
  z = 0;
  while(1){
    z = z+1;
  }
}
The following sequence shows the tokens and punctuation marks from the above listing:
void main ( void ) { long z ; z = 0 ; while ( 1 ) { z = z + 1 ; } }
Since tokens are the building blocks of programs, we begin our study of C language by defining its tokens.
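The tokenizing step can be sketched in a few lines of C. This is a simplified illustration, not how a real compiler is written. The function name CountTokens is hypothetical, every run of letters, digits, and underscores is treated as one token, and every other non-white-space character is treated as a one-character token, so multiple-character operators such as ++ are not handled.

```c
#include <ctype.h>

/* Hypothetical helper: counts the tokens in a C source string.
   Simplification: names, keywords, and numbers are runs of letters, digits,
   and underscores; any other non-white-space character is its own token. */
int CountTokens(const char *src){ int count = 0;
  while(*src){
    if(isspace((unsigned char)*src)){
      src++;                       /* white space only separates tokens */
    } else if(isalnum((unsigned char)*src) || (*src == '_')){
      while(isalnum((unsigned char)*src) || (*src == '_')){
        src++;                     /* consume the whole name or number */
      }
      count++;
    } else {
      src++;                       /* one punctuation mark or operator */
      count++;
    }
  }
  return count;
}
```

Calling CountTokens("void main(void){ long z; z = 0; while(1){ z = z+1; } }") returns 26, which matches the number of items in the token sequence listed above.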
Like most programming languages, C uses the standard ASCII character set. The following table shows the 128 standard ASCII codes. One or more white space characters can be used to separate tokens and/or punctuation marks. The white space characters in C include the horizontal tab (9=0x09), the carriage return (13=0x0D), the line feed (10=0x0A), and the space (32=0x20).
BITS 0 to 3 (rows) and BITS 4 to 6 (columns)
  | 0   | 1   | 2  | 3 | 4 | 5 | 6 | 7   |
0 | NUL | DLE | SP | 0 | @ | P | ` | p   |
1 | SOH | DC1 | !  | 1 | A | Q | a | q   |
2 | STX | DC2 | "  | 2 | B | R | b | r   |
3 | ETX | DC3 | #  | 3 | C | S | c | s   |
4 | EOT | DC4 | $  | 4 | D | T | d | t   |
5 | ENQ | NAK | %  | 5 | E | U | e | u   |
6 | ACK | SYN | &  | 6 | F | V | f | v   |
7 | BEL | ETB | '  | 7 | G | W | g | w   |
8 | BS  | CAN | (  | 8 | H | X | h | x   |
9 | HT  | EM  | )  | 9 | I | Y | i | y   |
A | LF  | SUB | *  | : | J | Z | j | z   |
B | VT  | ESC | +  | ; | K | [ | k | {   |
C | FF  | FS  | ,  | < | L | \ | l | |   |
D | CR  | GS  | -  | = | M | ] | m | }   |
E | SO  | RS  | .  | > | N | ^ | n | ~   |
F | SI  | US  | /  | ? | O | _ | o | DEL |
The first 32 (values 0 to 31 or 0x00 to 0x1F) and the last one (127=0x7F) are classified as control characters. Codes 32 to 126 (or 0x20 to 0x7E) include the "normal" characters. Normal characters are divided into
the space character (32=0x20),
the numeric digits 0 to 9 (48 to 57 or 0x30 to 0x39),
the uppercase alphabet A to Z (65 to 90 or 0x41 to 0x5A),
the lowercase alphabet a to z (97 to 122 or 0x61 to 0x7A), and
the special characters (all the rest).
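The ranges above translate directly into comparisons on the character code. The following is a minimal sketch; the function name AsciiClass and the one-letter result codes are hypothetical and chosen only for this illustration.

```c
/* Hypothetical helper: returns a one-letter class for a 7-bit ASCII code.
   'C' control, ' ' space, 'D' digit, 'U' uppercase, 'L' lowercase,
   'S' special character. Codes above 127 are not ASCII and return '?'. */
char AsciiClass(unsigned char code){
  if(code > 127) return '?';
  if((code < 32) || (code == 127)) return 'C';   /* 0x00-0x1F and 0x7F */
  if(code == 32) return ' ';                     /* 0x20 */
  if((code >= 48) && (code <= 57)) return 'D';   /* 0x30-0x39 */
  if((code >= 65) && (code <= 90)) return 'U';   /* 0x41-0x5A */
  if((code >= 97) && (code <= 122)) return 'L';  /* 0x61-0x7A */
  return 'S';                                    /* all the rest */
}
```

For example, AsciiClass(9) classifies the horizontal tab as a control character, and AsciiClass('{') falls through to the special characters.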
Numeric literals consist of an uninterrupted sequence of digits delimited by white space or special characters (operators or punctuation). Although Metrowerks does support floating point, this document will not cover it. The use of floating point requires a substantial amount of program memory and execution time; therefore, most applications should be implemented using integer math. Consequently, the period will not appear in numbers as described in this document. For more information about numbers see the sections on decimals, octals, or hexadecimals in Chapter 3.
Character literals are written by enclosing an ASCII character in apostrophes (single quotes). We would write 'a' for a character with the ASCII value of the lowercase a (97). The control characters can also be defined as constants. For example '\t' is the tab character. For more information about character literals see the section on characters in Chapter 3.
String literals are written as a sequence of ASCII characters bounded by quotation marks (double quotes). Thus, "ABC" describes a string of characters containing the first three letters of the alphabet in uppercase. For more information about string literals see the section on strings in Chapter 3.
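The values described for these literals can be checked with a short sketch. The function name LiteralsCheck is hypothetical, and the numeric values assume a system that uses ASCII.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the literal values match the text above
   (on a system that uses ASCII). */
int LiteralsCheck(void){
  char letter = 'a';          /* character literal, ASCII value 97 */
  char tab = '\t';            /* control character, ASCII value 9 */
  const char *abc = "ABC";    /* string literal holding three letters */
  return (letter == 97) && (tab == 9) &&
         (strlen(abc) == 3) && (abc[0] == 65) && (abc[2] == 67);
}
```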
There are some predefined tokens, called keywords, that have specific meaning in C programs. The reserved words we will cover in this document are:
keyword | meaning
auto | Specifies a variable as automatic (created on the stack)
break | Causes the program control structure to finish
case | One possibility within a switch statement
char | 8-bit integer
const | Defines a global parameter as constant in ROM, and defines a local parameter as a fixed value
continue | Causes the program to skip to the next iteration of the loop
default | Used in a switch statement for all other cases
do | Used for creating program loops
double | Specifies a variable as double precision floating point
else | Alternative part of a conditional
extern | Defined in another module
float | Specifies a variable as single precision floating point
for | Used for creating program loops
goto | Causes the program to jump to a specified location
if | Conditional control structure
int | 32-bit integer (same as long on the ARM). It should be avoided in most cases because the implementation will vary from compiler to compiler.
long | 32-bit integer
register | Specifies how to implement a local
return | Leave function
short | 16-bit integer
signed | Specifies a variable as signed (default)
sizeof | Built-in operator that returns the size of an object
static | Stored permanently in memory, accessed locally
struct | Used for creating data structures
switch | Complex conditional control structure
typedef | Used to create new data types
unsigned | Always greater than or equal to zero
void | Used in a parameter list to mean no parameter
volatile | Can change implicitly outside the direct action of the software. It disables compiler optimization, forcing the compiler to fetch a new value each time
while | Used for creating program loops
Did you notice that all of the keywords in C are lowercase? Notice also that as a matter of style, I used a mixture of upper and lowercase for the names I created, and all uppercase for the I/O ports. Because the keywords are reserved, you cannot use them for your variable or function names.
We use names to identify our variables, functions, and macros. Names must begin with a letter or underscore, and the remaining characters must be letters, digits, or underscores. We can use a mixture of upper and lower case or the underscore character to create self-explaining symbols. E.g.,
time_of_day go_left_then_stop
TimeOfDay GoLeftThenStop
The careful selection of names goes a long way to making our programs more readable. Names may be written with both upper and lowercase letters. The names are case sensitive. Therefore the following names are different:
thetemperature
THETEMPERATURE
TheTemperature
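A minimal sketch of case sensitivity follows. The initial values and the function name SumOfTemperatures are hypothetical; the point is only that the compiler accepts the three names as three separate variables.

```c
/* Three distinct variables: C treats upper and lowercase letters as different. */
short thetemperature = 10;
short THETEMPERATURE = 20;
short TheTemperature = 30;

/* Hypothetical helper: adds the three separate variables. */
short SumOfTemperatures(void){
  return thetemperature + THETEMPERATURE + TheTemperature;
}
```

Although this compiles, names that differ only in case are easy to confuse, which is one reason to adopt a naming convention.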
The practice of naming macros in uppercase calls attention to the fact that they are not variable names but defined symbols. Remember the I/O port names are implemented as macros in the header file tm4c123gh6pm.h.
We can use the map file to get the absolute addresses for these labels, then use the debugger to observe and modify their contents.
Developing a naming convention will avoid confusion. Possible ideas to consider include:
1. Start every variable name with its type. E.g.,
b means Boolean true/false
s8 means 8-bit signed integer
u8 means 8-bit unsigned integer
s16 means 16-bit signed integer
u16 means 16-bit unsigned integer
s32 means 32-bit signed integer
u32 means 32-bit unsigned integer
c means 8-bit ASCII character
s means null terminated ASCII string
2. Start every local variable with "the" or "my"
3. Start every global variable and function with the associated file or module name. In the following example the names all begin with Bit_. Notice how closely this naming convention recreates the look and feel of the modularity achieved by classes in C++. E.g.,
/* **********file=Bit.c*************
   Pointer implementation of the Bit_Fifo
   These routines can be used to save (Bit_Put) and
   recall (Bit_Get) binary data 1 bit at a time (bit streams)
   Information is saved/recalled in a first in first out manner
   Bit_FifoSize is the number of 16-bit words in the Bit_Fifo
   The Bit_Fifo is full when it has 16*Bit_FifoSize-1 bits */
#define Bit_FifoSize 4                 // 16*4-1=63 bits of storage
unsigned short Bit_Fifo[Bit_FifoSize]; // storage for Bit Stream
struct Bit_Pointer{
  unsigned short Mask;                 // 0x8000, 0x4000,...,2,1
  unsigned short *WPt;};               // Pointer to word containing bit
typedef struct Bit_Pointer Bit_PointerType;
Bit_PointerType Bit_PutPt; // Pointer of where to put next
Bit_PointerType Bit_GetPt; // Pointer of where to get next
/* Bit_FIFO is empty if Bit_PutPt==Bit_GetPt */
/* Bit_FIFO is full if Bit_PutPt+1==Bit_GetPt */
short Bit_Same(Bit_PointerType p1, Bit_PointerType p2){
  if((p1.WPt==p2.WPt)&&(p1.Mask==p2.Mask))
    return(1);  // yes
  return(0);}   // no
void Bit_Init(void) {
  Bit_PutPt.Mask=Bit_GetPt.Mask=0x8000;
  Bit_PutPt.WPt=Bit_GetPt.WPt=&Bit_Fifo[0]; /* Empty */
}
// returns TRUE=1 if successful,
// FALSE=0 if full and data not saved
// input is boolean FALSE if data==0
short Bit_Put(short data) { Bit_PointerType myPutPt;
  myPutPt=Bit_PutPt;
  myPutPt.Mask=myPutPt.Mask>>1;
  if(myPutPt.Mask==0) {
    myPutPt.Mask=0x8000;
    if((++myPutPt.WPt)==&Bit_Fifo[Bit_FifoSize])
      myPutPt.WPt=&Bit_Fifo[0]; // wrap
  }
  if (Bit_Same(myPutPt,Bit_GetPt))
    return(0);   /* Failed, Bit_Fifo was full */
  else {
    if(data)
      (*Bit_PutPt.WPt) |= Bit_PutPt.Mask;  // set bit
    else
      (*Bit_PutPt.WPt) &= ~Bit_PutPt.Mask; // clear bit
    Bit_PutPt=myPutPt;
    return(1);
  }
}
// returns TRUE=1 if successful,
// FALSE=0 if empty and data not removed
// output is boolean 0 means FALSE, nonzero is true
short Bit_Get(unsigned short *datapt) {
  if (Bit_Same(Bit_PutPt,Bit_GetPt))
    return(0);   /* Failed, Bit_Fifo was empty */
  else {
    *datapt=(*Bit_GetPt.WPt)&Bit_GetPt.Mask;
    Bit_GetPt.Mask=Bit_GetPt.Mask>>1;
    if(Bit_GetPt.Mask==0) {
      Bit_GetPt.Mask=0x8000;
      if((++Bit_GetPt.WPt)==&Bit_Fifo[Bit_FifoSize])
        Bit_GetPt.WPt=&Bit_Fifo[0]; // wrap
    }
    return(1);
  }
}
Punctuation marks (semicolons, colons, commas, apostrophes, quotation marks, braces, brackets, and parentheses) are very important in C. They are one of the most frequent sources of errors for both beginning and experienced programmers.
Semicolons are used as statement terminators. Strange and confusing syntax errors may be generated when you forget a semicolon, so this is one of the first things to check when trying to remove syntax errors. In this example we assume that Port B has been initialized as an output. Notice that one semicolon is placed at the end of every simple statement in the following example:
#define PORTB (*((volatile unsigned long *)0x400053FC))
void Step(void){
  PORTB = 10;
  PORTB = 9;
  PORTB = 5;
  PORTB = 6;
}
Preprocessor directives do not end with a semicolon since they are not actually part of the C language proper. Preprocessor directives begin in the first column with the # and conclude at the end of the line. Semicolons are also used in the for loop statement (see also Chapter 6). The following example fills the array DataBuffer with data read from the input port (PORTB). We assume in this example that Port B has been initialized as an input.
void Fill(void){ short j;
  for(j=0; j<100; j++){
    DataBuffer[j] = PORTB;
  }
}
We can define a label using the colon. Although C has a goto statement, I strongly discourage its use. I believe the software is easier to understand using the block-structured control statements (if, if else, for, while, do while, and switch case). The following example will return after the Port B input reads the same value 100 times in a row. In this example, we assume Port B has been initialized as an input. Notice that every time the current value on Port B is different from the previous value the counter is reinitialized.
char Debounce(void){ short Cnt; unsigned char LastData;
Start: Cnt=0;                      /* number of times Port B is the same */
  LastData=PORTB;
Loop: if(++Cnt==100) goto Done;    /* same thing 100 times */
  if(LastData!=PORTB) goto Start;  /* changed */
  goto Loop;
Done: return(LastData);
}
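For comparison, the same debounce logic can be written with block-structured control statements instead of goto. This is one possible sketch, not the only way to do it. The name DebounceNoGoto is hypothetical, and a plain variable (SimPortB) stands in for the Port B register so the code can run on a PC.

```c
/* Assumption: a plain variable stands in for the Port B input register so the
   sketch can run on a PC; on the microcontroller PORTB would be the macro
   that accesses the real port. */
static volatile unsigned char SimPortB = 0x05;
#define PORTB SimPortB

unsigned char DebounceNoGoto(void){ short cnt; unsigned char lastData;
  do{
    lastData = PORTB;            /* new reference value, restart the count */
    cnt = 1;
    while((cnt < 100) && (PORTB == lastData)){
      cnt++;                     /* one more matching reading */
    }
  } while(cnt < 100);            /* input changed, so start over */
  return lastData;
}
```

The outer do while loop plays the role of the Start label, and the inner while loop plays the role of the Loop label, so no labels are needed.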
Colons also terminate the case and default prefixes that appear in switch statements. For more information see the section on switch in Chapter 6. In the following example, the next stepper motor output is found (the proper sequence is 10,9,5,6). The default case is used to restart the pattern.
unsigned char NextStep(unsigned char step){ unsigned char theNext;
  switch(step){
    case 10: theNext=9; break;
    case 9:  theNext=5; break;
    case 5:  theNext=6; break;
    case 6:  theNext=10; break;
    default: theNext=10;
  }
  return(theNext);
}
For both applications of the colon (goto and switch), we see that a label is created that is a potential target for a transfer of control.
Commas separate items that appear in lists. We can create multiple variables of the same type. E.g.,
unsigned short beginTime,endTime,elapsedTime;
Lists are also used with functions having multiple parameters (both when the function is defined and called):
short add(short x, short y){ short z;
  z = x+y;
  if((x>0)&&(y>0)&&(z<0))z = 32767;    /* positive overflow */
  if((x<0)&&(y<0)&&(z>=0))z = -32768;  /* negative overflow (sum can wrap to 0) */
  return(z);
}
void main(void){ short a,b;
  a=add(2000,2000);
  b=0;
  while(1){
    b=add(b,1);
  }
}
Listing 2-6: Commas separate the parameters of a function
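The listing detects overflow after the fact by looking at the sign of the 16-bit result, which depends on how the compiler converts an out-of-range sum back to a short. A possible alternative, shown here only as a sketch, forms the sum in a wider type and then limits it. The name AddSat is hypothetical and does not appear elsewhere in this document.

```c
/* Hypothetical variant of add: the sum is formed in a 32-bit long, so it
   cannot overflow, and is then limited to the 16-bit range. */
short AddSat(short x, short y){ long z;
  z = (long)x + (long)y;
  if(z > 32767) z = 32767;     /* clip at the largest 16-bit value */
  if(z < -32768) z = -32768;   /* clip at the smallest 16-bit value */
  return (short)z;
}
```

As in the listing, commas separate the two parameters both in the definition and in a call such as AddSat(2000,2000).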
Lists can also be used in general expressions. Sometimes it adds clarity to a program if related variables are modified at the same place. The value of a list of expressions is always the value of the last expression in the list. In the following example, first thetime is incremented, thedate is decremented, then x is set to k+2.
x=(thetime++,--thedate,k+2);
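The effect of this expression can be confirmed with a small test. The starting values and the function name CommaCheck are hypothetical, chosen only to make the result easy to follow.

```c
/* Hypothetical helper: returns 1 if the comma expression behaves as described. */
int CommaCheck(void){ short thetime = 5, thedate = 12, k = 3, x;
  x = (thetime++, --thedate, k+2);   /* evaluated left to right */
  return (thetime == 6) && (thedate == 11) && (x == 5);
}
```

Note that the commas in the declaration of thetime, thedate, k, and x simply separate the items of a list, while the commas inside the parentheses form an expression whose value is that of its last item.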
Apostrophes are used to specify character literals. For more information about character literals see the section on characters in Chapter 3. Assuming the function OutChar will print a single ASCII character, the following example will print the lower case alphabet:
void Alphabet(void){ unsigned char mych;
  for(mych = 'a'; mych <= 'z'; mych++){
    OutChar(mych);   /* Print next letter */
  }
}
Quotation marks are used to specify string literals. For more information about string literals see the section on strings in Chapter 3. For example:
unsigned const char Msg[12]= "Hello World"; /* Msg has 11 characters and termination */
void PrintHelloWorld(void){
  UART_OutString("Hello World");
  UART_OutString(Msg);
}
The command Letter='A'; places the ASCII code (65) into the variable Letter.
The command pt="A"; creates an ASCII string and places a pointer to it into the variable pt.
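The difference between the two commands can be seen in the following sketch. The function name CharVersusString is hypothetical, and the value 65 assumes an ASCII system.

```c
/* Hypothetical helper: returns 1 if both literals behave as described. */
int CharVersusString(void){ char Letter; const char *pt;
  Letter = 'A';     /* one byte holding the ASCII code 65 */
  pt = "A";         /* address of a two-byte string: 'A' followed by 0 */
  return (Letter == 65) && (pt[0] == 'A') && (pt[1] == 0) &&
         (sizeof("A") == 2);
}
```

Letter holds the character itself, whereas pt holds only an address; the character and its termination are stored elsewhere in memory.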
Braces {} are used throughout C programs. The most common application is for creating a compound statement. Each open brace { must be matched with a closing brace }. One approach that helps to match up braces is to use indenting. Each time an open brace is used, the source code is tabbed over. In this way, it is easy to see at a glance the brace pairs. Examples of this approach to tabbing are the Bit_Put function within Listing 2-2 and the median function in Listing 1-4.
Square brackets enclose array dimensions (in declarations) and subscripts (in expressions). Thus,
short Fifo[100];
declares an integer array named Fifo consisting of 100 16-bit words numbered from 0 through 99, and
PutPt = &Fifo[0];
sets the variable PutPt to the address of the first entry of the array.
Parentheses enclose argument lists that are associated with function declarations and calls. They are required even if there are no arguments.
As with all programming languages, C uses parentheses to control the order in which expressions are evaluated. Thus, (11+3)/2 yields 7, whereas 11+3/2 yields 12. Parentheses are very important when writing expressions.
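The two expressions can be checked directly. The function names below are hypothetical; the arithmetic is integer arithmetic, so 3/2 is 1.

```c
/* Hypothetical helpers showing integer arithmetic with and without parentheses. */
short WithParens(void){ return (11+3)/2; }    /* 14/2 = 7 */
short WithoutParens(void){ return 11+3/2; }   /* 3/2 is 1, so 11+1 = 12 */
```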
The special characters used as expression operators are covered in the operator section in Chapter 5. There are many operators, some of which are single characters
~ ! @ % ^ & * - + = | / : ? < >
while others require two characters
++ -- << >> <= >= += -= *= /= == |= %= &= ^= || && !=
and some even require three characters
<<= >>=
The multiple-character operators cannot have white space or comments between the characters.
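A short sketch shows why the spacing matters. The function name OperatorSpacing and the starting values are hypothetical.

```c
/* Hypothetical helper: returns 1 if the operators behave as commented. */
int OperatorSpacing(void){ short x = 1, y = 8;
  x += 2;        /* one token +=  : x becomes 3 */
  y >>= 1;       /* one token >>= : y becomes 4 */
  x = x - -y;    /* two tokens, - and -  : x = 3 - (-4) = 7 */
  /* x + = 2;       would not compile: + and = are separate tokens */
  return (x == 7) && (y == 4);
}
```

With the space removed, x - -y would become x--y, which the compiler reads as the decrement operator followed by y and rejects as a syntax error.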
The C syntax can be confusing to the beginning programmer. For example
z = x+y; /* sets z equal to the sum of x and y */
z = x_y; /* sets z equal to the value of x_y */