Chapter 2: Tokens

What's in Chapter 2?

ASCII characters
Literals include numbers, characters, and strings
Keywords are predefined
Names are user-defined
Punctuation marks
Operators

This chapter defines the basic building blocks of a C program. Understanding the concepts in this chapter will help eliminate the syntax bugs that confuse even the veteran C programmer. A simple syntax error can generate hundreds of obscure compiler errors. In this chapter we will introduce some of the syntax of the language.

To understand the syntax of a C program, we divide it into tokens separated by white space and punctuation. Remember that white space includes spaces, tabs, carriage returns, and line feeds. A token may be a single character or a sequence of characters that form a single item. The first step of a compiler is to process the program into a list of tokens and punctuation marks. The compiler then checks for proper syntax. Finally, it creates object code that performs the intended operations. Consider the following example, which includes the punctuation marks ( ) { } and ;

void main(void){ short z;
  z = 0;
  while(1){
    z = z+1;
  }
}

Listing 2-1: Example of a simple function

The following sequence shows the tokens and punctuation marks from the above listing:

void main ( void ) { short z ; z = 0 ; while ( 1 ) { z = z + 1 ; } }

Since tokens are the building blocks of programs, we begin our study of C language by defining its tokens.
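To make the idea of tokens concrete, here is a minimal tokenizer sketch. It is an illustration under simplifying assumptions, not how ICC11/ICC12 or Metrowerks actually scan source code: white space is discarded, a name or keyword is a letter or underscore followed by letters, digits, or underscores, a number is a run of digits, and every other character becomes a one-character token. Multi-character operators such as ++ are deliberately not recognized. The function name CountTokens and its print flag are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Splits src into tokens and returns how many were found.
   If print is nonzero, each token is printed followed by a space. */
int CountTokens(const char *src, int print){
  int count = 0;
  const char *start;
  while(*src){
    if(isspace((unsigned char)*src)){      /* white space only separates tokens */
      src++;
      continue;
    }
    start = src;
    if(isalpha((unsigned char)*src) || *src == '_'){
      /* name or keyword: letters, digits, underscores */
      while(isalnum((unsigned char)*src) || *src == '_') src++;
    } else if(isdigit((unsigned char)*src)){
      /* decimal number: a run of digits */
      while(isdigit((unsigned char)*src)) src++;
    } else {
      /* punctuation or operator: one character in this sketch */
      src++;
    }
    if(print) printf("%.*s ", (int)(src - start), start);
    count++;
  }
  if(print) printf("\n");
  return count;
}
```

Calling CountTokens on the text of Listing 2-1 with print set to 1 prints the same 26-token sequence listed above.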

 

ASCII Character Set

Like most programming languages, C uses the standard ASCII character set. The following table shows the 128 standard ASCII codes. One or more white-space characters can be used to separate tokens and/or punctuation marks. The white-space characters in C include horizontal tab (9=$09), line feed (10=$0A), vertical tab (11=$0B), form feed (12=$0C), carriage return (13=$0D), and space (32=$20).

              Bits 4 to 6
Bits 0 to 3   0    1    2    3    4    5    6    7
     0        NUL  DLE  SP   0    @    P    `    p
     1        SOH  DC1  !    1    A    Q    a    q
     2        STX  DC2  "    2    B    R    b    r
     3        ETX  DC3  #    3    C    S    c    s
     4        EOT  DC4  $    4    D    T    d    t
     5        ENQ  NAK  %    5    E    U    e    u
     6        ACK  SYN  &    6    F    V    f    v
     7        BEL  ETB  '    7    G    W    g    w
     8        BS   CAN  (    8    H    X    h    x
     9        HT   EM   )    9    I    Y    i    y
     A        LF   SUB  *    :    J    Z    j    z
     B        VT   ESC  +    ;    K    [    k    {
     C        FF   FS   ,    <    L    \    l    |
     D        CR   GS   -    =    M    ]    m    }
     E        SO   RS   .    >    N    ^    n    ~
     F        SI   US   /    ?    O    _    o    DEL

Table 2-1. ASCII Character codes.

The first 32 (values 0 to 31 or $00 to $1F) and the last one (127=$7F) are classified as control characters. Codes 32 to 126 (or $20 to $7E) include the "normal" characters. Normal characters are divided into

the space character (32=$20),
the numeric digits 0 to 9 (48 to 57 or $30 to $39),
the uppercase alphabet A to Z (65 to 90 or $41 to $5A),
the lowercase alphabet a to z (97 to 122 or $61 to $7A), and
the special characters (all the rest).
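The categories above translate directly into range checks. The sketch below is illustrative only: the enum and the function name Classify are hypothetical, and the hexadecimal constants are the same values written with the $ prefix in the text.

```c
/* Illustrative categories for a 7-bit ASCII code (hypothetical names) */
enum AsciiClass { ASCII_CONTROL, ASCII_SPACE, ASCII_DIGIT,
                  ASCII_UPPER, ASCII_LOWER, ASCII_SPECIAL, ASCII_INVALID };

enum AsciiClass Classify(unsigned char c){
  if(c > 0x7F)               return ASCII_INVALID;  /* not a 7-bit ASCII code */
  if(c < 0x20 || c == 0x7F)  return ASCII_CONTROL;  /* 0 to 31, and 127 */
  if(c == 0x20)              return ASCII_SPACE;    /* 32 */
  if(c >= 0x30 && c <= 0x39) return ASCII_DIGIT;    /* '0' to '9' */
  if(c >= 0x41 && c <= 0x5A) return ASCII_UPPER;    /* 'A' to 'Z' */
  if(c >= 0x61 && c <= 0x7A) return ASCII_LOWER;    /* 'a' to 'z' */
  return ASCII_SPECIAL;                             /* all the rest */
}
```

The order of the tests matters only for the control check, which must catch 127 before the "special" fallback does.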

Literals

Numeric literals consist of an uninterrupted sequence of digits delimited by white space or special characters (operators or punctuation). Although Metrowerks does support floating point, this document will not cover it. The use of floating point requires a substantial amount of program memory and execution time; therefore, most applications should be implemented using integer math. Consequently, the period will not appear in numbers as described in this document. For more information about numbers see the sections on decimals, octals, or hexadecimals in Chapter 3.

Character literals are written by enclosing an ASCII character in apostrophes (single quotes). We would write 'a' for a character with the ASCII value of the lowercase a (97). The control characters can also be defined as constants. For example '\t' is the tab character. For more information about character literals see the section on characters in Chapter 3.

String literals are written as a sequence of ASCII characters bounded by quotation marks (double quotes). Thus, "ABC" describes a string of characters containing the first three letters of the alphabet in uppercase. For more information about string literals see the section on strings in Chapter 3.
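The three kinds of literal can be seen side by side in a few declarations. This is a small sketch with hypothetical variable names; hexadecimal notation (0x64) is described in Chapter 3 and is used here only to show that two different spellings can denote the same number.

```c
const short Hundred = 100;        /* numeric literal, decimal */
const short AlsoHundred = 0x64;   /* the same value written in hexadecimal */
const char  Letter = 'a';         /* character literal: ASCII 97 */
const char  Tab = '\t';           /* control character literal: ASCII 9 */
const char  Abc[] = "ABC";        /* string literal: 'A', 'B', 'C', then a 0 terminator */
```

Note that the string "ABC" occupies four bytes, because the compiler appends a null (0) terminator to every string literal.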

Keywords

There are some predefined tokens, called keywords, that have specific meaning in C programs. The reserved words we will cover in this document are:

keyword    meaning
asm        Insert assembly code (a compiler-specific extension, not standard C)
auto       Specifies a variable as automatic (created on the stack)
break      Exits the innermost enclosing loop or switch statement
case       One possibility within a switch statement
char       8-bit integer
const      Defines a global parameter as constant in ROM, and defines a local parameter as a fixed value
continue   Skips the rest of the loop body and starts the next iteration of the loop
default    Used in a switch statement for all other cases
do         Used for creating program loops
double     Specifies a variable as double-precision floating point
else       Alternative part of a conditional
extern     Declares an object that is defined in another module
float      Specifies a variable as single-precision floating point
for        Used for creating program loops
goto       Causes the program to jump to a specified label
if         Conditional control structure
int        16-bit integer (same as short on the 6811 and 6812); it should be avoided in most cases because its size varies from compiler to compiler
long       32-bit integer
register   Suggests that a local variable be kept in a CPU register
return     Leaves the function
short      16-bit integer
signed     Specifies a variable as signed (the default for integer types other than char)
sizeof     Built-in operator that returns the size of an object or type, in bytes
static     Stored permanently in memory, accessed locally
struct     Used for creating data structures
switch     Complex conditional control structure
typedef    Used to create new data types
unsigned   Specifies a variable as unsigned (always greater than or equal to zero)
void       Used in a parameter list to mean no parameters, or as a return type to mean no return value
volatile   Can change implicitly outside the direct action of the software; it disables compiler optimization, forcing the compiler to fetch a new value each time
while      Used for creating program loops

 Table 2-2. Keywords have predefined meanings.

Did you notice that all of the keywords in C are lowercase? Notice also that, as a matter of style, I used a mixture of upper and lowercase for the names I created, and all uppercase for the I/O ports. Keywords are reserved, so they cannot be used as variable or function names.
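A few entries in Table 2-2 are easier to understand in action. The sketch below uses static, unsigned, const, and sizeof; the names CallCount and ByteSizes are hypothetical. Only sizeof(char) is guaranteed to be 1; the sizes of short and long depend on the compiler (2 and 4 bytes on the 6811/6812 compilers described here).

```c
/* static: Count is created once and keeps its value between calls,
   but it can only be accessed inside this function */
unsigned short CallCount(void){
  static unsigned short Count = 0;
  Count = Count + 1;
  return Count;
}

/* sizeof is an operator evaluated by the compiler, not a function call,
   so its results can initialize a const array */
const unsigned char ByteSizes[3] = { sizeof(char), sizeof(short), sizeof(long) };
```

Because sizeof is an operator, the parentheses are optional when it is applied to an object rather than a type: sizeof ByteSizes is 3.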

Names

We use names to identify our variables, functions, and macros. ICC11/ICC12 names may be up to 31 characters long. Metrowerks names may be up to xxx characters long. Names must begin with a letter or underscore, and the remaining characters must be letters, digits, or underscores. We can use a mixture of upper and lowercase letters or the underscore character to create self-explaining symbols. E.g.,

time_of_day    go_left_then_stop

TimeOfDay      GoLeftThenStop
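The naming rule can be checked mechanically. This hypothetical IsValidName function is a sketch of the rule only: it does not reject keywords such as while, and it does not enforce any compiler-specific length limit.

```c
#include <ctype.h>

/* Returns 1 if s is a legal C name: a letter or underscore first,
   followed by any number of letters, digits, or underscores. */
int IsValidName(const char *s){
  if(!(isalpha((unsigned char)*s) || *s == '_')) return 0;  /* also rejects "" */
  for(s++; *s; s++){
    if(!(isalnum((unsigned char)*s) || *s == '_')) return 0;
  }
  return 1;
}
```

For example, time_of_day and GoLeftThenStop pass, while 2fast (starts with a digit) and go-left (contains an operator character) fail.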

The careful selection of names goes a long way toward making our programs more readable. Names may be written with both upper and lowercase letters, and they are case sensitive. Therefore the following names are different:

thetemperature
THETEMPERATURE
TheTemperature

The practice of naming macros in uppercase calls attention to the fact that they are not variable names but defined symbols. Remember the I/O port names are implemented as macros in the header files HC11.h and HC12.h.

Every global name defined with the ICC11/ICC12 compiler generates an assembly language label of the same name, but preceded by an underscore. The purpose of the underscore is to avoid clashes with the assembler's reserved words. So, as a matter of practice, we should not ordinarily name globals with leading underscores. Metrowerks labels will not include the underscore. For examples of this naming convention, observe the assembly generated by the compiler (either the assembly itself in the *.s file or the listing in the *.lst file). These assembly names are important during the debugging stages. We can use the map file to get the absolute addresses for these labels, then use the debugger to observe and modify their contents.

Since the ImageCraft compiler adds its own underscore, names written with a leading underscore appear in the assembly file with two leading underscores.

Developing a naming convention will avoid confusion. Possible ideas to consider include:

1. Start every variable name with its type. E.g.,

b means Boolean true/false
n means 8 bit signed integer
u means 8 bit unsigned integer
m means 16 bit signed integer
v means 16 bit unsigned integer
c means 8 bit ASCII character
s means null terminated ASCII string

2. Start every local variable with "the" or "my"

3. Start every global variable and function with the associated file or module name. In the following example the names all begin with Bit_. Notice how closely this naming convention recreates the look and feel of the modularity achieved by classes in C++. E.g.,

/* **********file=Bit.c*************
   Pointer implementation of a Bit_Fifo
   These routines can be used to save (Bit_Put) and
   recall (Bit_Get) binary data 1 bit at a time (bit streams)
   Information is saved/recalled in a first in first out manner
   Bit_FifoSize is the number of 16 bit words in the Bit_Fifo
   The Bit_Fifo is full when it has 16*Bit_FifoSize-1 bits */
#define Bit_FifoSize 4
// 16*4-1=31 bits of storage
unsigned short Bit_Fifo[Bit_FifoSize]; // storage for Bit Stream
struct Bit_Pointer{
   unsigned short Mask; // 0x8000, 0x4000,...,2,1
   unsigned short *WPt;}; // Pointer to word containing bit
typedef struct Bit_Pointer Bit_PointerType;
Bit_PointerType Bit_PutPt; // Pointer of where to put next
Bit_PointerType Bit_GetPt; // Pointer of where to get next
/* Bit_FIFO is empty if Bit_PutPt==Bit_GetPt */
/* Bit_FIFO is full if Bit_PutPt+1==Bit_GetPt */
short Bit_Same(Bit_PointerType p1, Bit_PointerType p2){
   if((p1.WPt==p2.WPt)&&(p1.Mask==p2.Mask))
      return(1); //yes
   return(0);} // no
void Bit_Init(void) {
   Bit_PutPt.Mask=Bit_GetPt.Mask=0x8000;
   Bit_PutPt.WPt=Bit_GetPt.WPt=&Bit_Fifo[0]; /* Empty */
}
// returns TRUE=1 if successful,
// FALSE=0 if full and data not saved
// input is boolean FALSE if data==0
short Bit_Put (short data) { Bit_PointerType myPutPt;
   myPutPt=Bit_PutPt;
   myPutPt.Mask=myPutPt.Mask>>1;
   if(myPutPt.Mask==0) {
      myPutPt.Mask=0x8000;
      if((++myPutPt.WPt)==&Bit_Fifo[Bit_FifoSize])
         myPutPt.WPt=&Bit_Fifo[0]; // wrap
   }
   if (Bit_Same(myPutPt,Bit_GetPt))
      return(0); /* Failed, Bit_Fifo was full */
   else {
      if(data)
         (*Bit_PutPt.WPt) |= Bit_PutPt.Mask; // set bit
      else
         (*Bit_PutPt.WPt) &= ~Bit_PutPt.Mask; // clear bit
      Bit_PutPt=myPutPt;
      return(1);
   }
}
// returns TRUE=1 if successful,
// FALSE=0 if empty and data not removed
// output is boolean 0 means FALSE, nonzero is true
short Bit_Get (unsigned short *datapt) {
   if (Bit_Same(Bit_PutPt,Bit_GetPt))
      return(0); /* Failed, Bit_Fifo was empty */
   else {
      *datapt=(*Bit_GetPt.WPt)&Bit_GetPt.Mask;
      Bit_GetPt.Mask=Bit_GetPt.Mask>>1;
      if(Bit_GetPt.Mask==0) {
         Bit_GetPt.Mask=0x8000;
         if((++Bit_GetPt.WPt)==&Bit_Fifo[Bit_FifoSize])
            Bit_GetPt.WPt=&Bit_Fifo[0]; // wrap
      }
      return(1);
   }
}

Listing 2-2: This naming convention can create modularity similar to classes in C++.

 

Punctuation

Punctuation marks (semicolons, colons, commas, apostrophes, quotation marks, braces, brackets, and parentheses) are very important in C. They are one of the most frequent sources of errors for both beginning and experienced programmers.

Semicolons

Semicolons are used as statement terminators. Strange and confusing syntax errors may be generated when you forget a semicolon, so this is one of the first things to check when trying to remove syntax errors. Notice that one semicolon is placed at the end of every simple statement in the following example:

#define PORTB (*(unsigned char volatile *)(0x0001))
void Step(void){
  PORTB = 10;
  PORTB = 9;
  PORTB = 5;
  PORTB = 6;
}

Listing 2-3: Semicolons are used to separate one statement from the next.

Preprocessor directives do not end with a semicolon since they are handled by the preprocessor before compilation rather than being C statements. Preprocessor directives begin with the # (traditionally in the first column) and conclude at the end of the line.

Semicolons are also used in the for loop statement (see also Chapter 6). The following example will fill the array DataBuffer with data read from the input port (PORTC). We assume in this example that Port C has been initialized as an input.

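unsigned char DataBuffer[100];  /* holds 100 samples; PORTC comes from HC11.h or HC12.h */
void Fill(void){ short j;
  for(j=0; j<100; j++){   /* initialization; test; update */
    DataBuffer[j] = PORTC;
  }
}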

Listing 2-4: Semicolons are used to separate the three fields of the for statement.
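One way to see what the three semicolon-separated fields do is to rewrite a for loop as the equivalent while loop. The two hypothetical functions below compute the same sum; the equivalence holds as long as the loop body contains no continue statement (a continue in a for loop still executes the update field, but in this while version it would skip j++).

```c
/* for version: initialization; test; update all in one line */
short SumFor(short n){ short j, sum = 0;
  for(j=0; j<n; j++){
    sum = sum + j;
  }
  return sum;
}

/* while version: the same three fields, spread out */
short SumWhile(short n){ short j, sum = 0;
  j = 0;            /* field 1: initialization, done once */
  while(j < n){     /* field 2: test, checked before each pass */
    sum = sum + j;
    j++;            /* field 3: update, done after each pass */
  }
  return sum;
}
```

Both return 0+1+...+9 = 45 when n is 10.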

 

Colons

We can define a label using the colon. Although C has a goto statement, I strongly discourage its use. I believe the software is easier to understand using the block-structured control statements (if, if else, for, while, do while, and switch case.) The following example will return after the Port C input reads the same value 100 times in a row. Again we assume Port C has been initialized as an input. Notice that every time the current value on Port C is different from the previous value the counter is reinitialized.

unsigned char Debounce(void){ short Cnt; unsigned char LastData;
Start:    Cnt=0;          /* number of times Port C is the same */
          LastData=PORTC; 
Loop:     if(++Cnt==100) goto Done;     /* same thing 100 times */
          if(LastData!=PORTC) goto Start;/* changed */ 
          goto Loop; 
Done:     return(LastData);
}

Listing 2-4: Colons are used to define labels (places we can jump to)
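Since goto is discouraged, here is a sketch of the same debounce logic written with block-structured loops only. To make it runnable on a PC, the hypothetical function ReadPort stands in for reading PORTC: it returns a short bouncing pattern and then holds a settled value. On the real hardware, each ReadPort() call would be a read of PORTC.

```c
/* Hypothetical stand-in for PORTC: returns these samples in order,
   then keeps returning the last one (the input has settled). */
static const unsigned char Samples[] = {0x00, 0x55, 0x00, 0x55};
static unsigned short SampleIndex = 0;
static unsigned char ReadPort(void){
  if(SampleIndex < sizeof(Samples)){
    return Samples[SampleIndex++];
  }
  return Samples[sizeof(Samples)-1];
}

/* Same behavior as the goto version: returns after the input
   reads the same value 100 times in a row. */
unsigned char Debounce(void){ short Cnt; unsigned char LastData;
  do{
    Cnt = 0;                  /* number of times the input is the same */
    LastData = ReadPort();
    while((++Cnt != 100) && (LastData == ReadPort())){
      /* input unchanged: keep counting */
    }
  }while(Cnt != 100);         /* input changed before 100 counts: start over */
  return LastData;
}
```

The outer do-while plays the role of the Start label and the inner while plays the role of Loop; the && operator stops before reading the port once the count reaches 100, exactly as the goto version jumps to Done.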

Colons also terminate the case and default prefixes that appear in switch statements. For more information see the section on switch in Chapter 6. In the following example, the next stepper motor output is found (the proper sequence is 10,9,5,6). The default case is used to restart the pattern.

unsigned char NextStep(unsigned char step){ unsigned char theNext;
  switch(step){
    case 10: theNext=9; break;
    case 9: theNext=5; break;
    case 5: theNext=6; break;
    case 6: theNext=10; break;
    default: theNext=10;
  }
  return(theNext);
}

Listing 2-5: Colons are also used with the switch statement

For both applications of the colon (goto and switch), we see that a label is created that is a potential target for a transfer of control.

Commas

Commas separate items that appear in lists. We can create multiple variables of the same type. E.g.,

unsigned short beginTime,endTime,elapsedTime;

Lists are also used with functions having multiple parameters (both when the function is defined and called):

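short add(short x, short y){ short z;
  z = x+y;     /* assumes the 16-bit sum wraps around on overflow */
  if((x>0)&&(y>0)&&(z<0))z = 32767;    /* positive overflow */
  if((x<0)&&(y<0)&&(z>=0))z = -32768;  /* negative overflow (-32768+-32768 wraps to 0) */
  return(z);
}
void main(void){ short a,b;
  a=add(2000,2000);
  b=0;
  while(1){
    b=add(b,1);
  }
}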

Listing 2-6: Commas separate the parameters of a function
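The overflow test in Listing 2-6 relies on the 16-bit sum wrapping around, which standard C does not guarantee for signed arithmetic. A more portable sketch, assuming short is 16 bits as on the 6811/6812, forms the sum in a long (at least 32 bits in standard C) and then clamps it. This is an alternative technique, not the listing's method.

```c
/* Saturating add: the true sum always fits in a long, so no overflow occurs */
short add(short x, short y){ long z;
  z = (long)x + (long)y;          /* between -65536 and 65534 */
  if(z > 32767)  z = 32767;       /* clamp to the largest 16-bit value */
  if(z < -32768) z = -32768;      /* clamp to the smallest 16-bit value */
  return (short)z;
}
```

The commas play the same role here as in the listing: they separate the two parameters in the definition and the two arguments in every call.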

Lists can also be used in general expressions, where the comma acts as an operator. Sometimes it adds clarity to a program if related variables are modified at the same place. The value of a list of expressions is always the value of the last expression in the list. In the following example, first thetime is incremented, then thedate is decremented, and finally x is set to k+2.

x=(thetime++,--thedate,k+2);
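A runnable version of this statement makes the evaluation order visible. The variable values and the function name CommaDemo are hypothetical.

```c
short thetime = 10, thedate = 5, k = 3, x;

void CommaDemo(void){
  /* evaluated left to right; only the last value (k+2) is assigned to x */
  x = (thetime++, --thedate, k+2);
}
```

After one call, thetime is 11, thedate is 4, and x is 5. The parentheses are required: without them, x=thetime++,--thedate,k+2; would assign thetime (before the increment) to x, because assignment has higher precedence than the comma operator.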

Apostrophes

Apostrophes are used to specify character literals. For more information about character literals see the section on characters in Chapter 3. Assuming the function OutChar will print a single ASCII character, the following example will print the lowercase alphabet:

void Alphabet(void){ unsigned char mych;
  for(mych='a';mych<='z';mych++){
    OutChar(mych); /* Print next letter */
  }     
}

Listing 2-7: Apostrophes are used to specify characters
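To try Listing 2-7 on a PC, OutChar needs some implementation. The sketch below assumes a hypothetical OutChar that appends each character to a buffer instead of sending it to a serial port; Alphabet itself is unchanged. It works because 'a' through 'z' are consecutive ASCII codes (97 to 122).

```c
/* Hypothetical OutChar for testing: stores characters in OutBuffer */
char OutBuffer[27];            /* 26 letters plus a null terminator */
unsigned short OutCount = 0;

void OutChar(unsigned char c){
  if(OutCount < sizeof(OutBuffer)-1){
    OutBuffer[OutCount++] = (char)c;
    OutBuffer[OutCount] = 0;   /* keep the buffer null terminated */
  }
}

void Alphabet(void){ unsigned char mych;
  for(mych='a'; mych<='z'; mych++){
    OutChar(mych);             /* Print next letter */
  }
}
```

After Alphabet() runs, OutBuffer holds the 26 lowercase letters in order.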

Quotation marks

Quotation marks are used to specify string literals. For more information about string literals see the section on strings in Chapter 3. For example:

unsigned const char Msg[12]= "Hello World"; /* Place for 11 characters and termination*/
void PrintHelloWorld(void){
  SCI_OutString("Hello World");
  SCI_OutString(Msg);
}

Listing 2-8: Quotation marks are used to specify strings

The command Letter='A'; places the ASCII code (65) into the variable Letter. The command pt="A"; creates an ASCII string and places a pointer to it into the variable pt.
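The difference between 'A' and "A" can be checked directly. In this sketch (LetterDemo is a hypothetical name), Letter receives a single byte, while pt receives the address of a two-byte string in memory.

```c
unsigned char Letter;
const char *pt;

void LetterDemo(void){
  Letter = 'A';   /* the ASCII code 65 itself */
  pt = "A";       /* address of two bytes: 65, then the 0 terminator */
}
```

After LetterDemo() runs, Letter and pt[0] both equal 65, and pt[1] is the terminating 0.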

Braces

Braces {} are used throughout C programs. The most common application is for creating a compound statement. Each open brace { must be matched with a closing brace }. One approach that helps to match up braces is to use indenting. Each time an open brace is used, the source code is tabbed over. In this way, it is easy to see at a glance the brace pairs. Examples of this approach to tabbing are the Bit_Put function within Listing 2-2 and the median function in Listing 1-4.

Brackets

Square brackets enclose array dimensions (in declarations) and subscripts (in expressions). Thus,

short Fifo[100];

declares an array named Fifo consisting of 100 short integers (16-bit words) numbered from 0 through 99, and

PutPt = &Fifo[0];

assigns the address of the first entry of the array to the variable PutPt.
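Putting the two statements together gives the following sketch; FifoInit is a hypothetical name. In C, the name of an array used in an expression stands for the address of its first element, so &Fifo[0] and Fifo are the same pointer.

```c
short Fifo[100];      /* elements Fifo[0] through Fifo[99] */
short *PutPt;         /* pointer to the next entry in Fifo */

void FifoInit(void){
  PutPt = &Fifo[0];   /* address of the first entry */
}
```

The expression sizeof(Fifo)/sizeof(Fifo[0]) recovers the element count, 100, regardless of how many bytes a short occupies.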

Parentheses

Parentheses enclose argument lists that are associated with function declarations and calls. They are required even if there are no arguments.

As with all programming languages, C uses parentheses to control the order in which expressions are evaluated. Thus, (11+3)/2 yields 7, whereas 11+3/2 yields 12, because division has higher precedence than addition and integer division truncates 3/2 to 1. Parentheses are very important when writing expressions.
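Both expressions can be checked with integer arithmetic; the function names below are hypothetical.

```c
short WithParens(void){    return (11+3)/2; }   /* 14/2 = 7 */
short WithoutParens(void){ return 11+3/2;   }   /* 11+(3/2) = 11+1 = 12 */
```

The second result shows both rules at once: / is applied before +, and the integer quotient 3/2 is truncated to 1.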

 

Operators

The special characters used as expression operators are covered in the operator section in Chapter 5. There are many operators, some of which are single characters:

~  !  %  ^  &  *  -  +  =  |  /  :  ?  <  >  ,

while others require two characters

++  --  <<  >>  <=  >=  +=  -=  *=  /=  ==  |=  %=  &=  ^=  ||  &&  !=  ->

and some even require three characters

<<=  >>=

The multiple-character operators cannot have white space or comments between the characters.
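When operators are written without spaces, the compiler always takes the longest operator it can at each point (often called "maximal munch"). The hypothetical function below shows the effect: x+++y is read as x ++ + y, not x + ++y.

```c
short x = 5, y = 10, z;

void MaximalMunch(void){
  z = x+++y;   /* tokenized as x ++ + y, i.e. z = (x++) + y */
}
```

After one call, z is 15, x has been incremented to 6, and y is still 10. Writing z = x++ + y; with spaces gives the same result but makes the intent obvious.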

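The C syntax can be confusing to the beginning programmer. For example, x+y below is three tokens, while x_y is a single name because the underscore is a legal character in names: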

z = x+y;   /* sets z equal to the sum of x and y */
z = x_y;   /* sets z equal to the value of x_y */

 
