Chapter 8: Arrays and Strings
What's in Chapter 8?
Array Subscripts
Array Declarations
Array References
Pointers and Array Names
Negative Subscripts
Address Arithmetic
String Functions defined in string.h
Fifo Queue Example
An array is a collection of like variables that share a single name. The individual elements of an array are referenced by appending a subscript, in square brackets, behind the name. The subscript itself can be any legitimate C expression that yields an integer value, even a general expression. Although arrays represent one of the simplest data structures, they have widespread usage in embedded systems.
Strings are similar to arrays with just a few differences. Usually, the array size is fixed, while strings can have a variable number of elements. Arrays can contain any data type (char, short, int, even other arrays), while strings are usually ASCII characters terminated with a null (0) character. In general we allow random access to individual array elements. On the other hand, we usually process strings sequentially, character by character, from start to end. Since these differences are a matter of semantics rather than specific limitations imposed by the syntax of the C programming language, the descriptions in this chapter apply equally to data arrays and character strings. String literals were discussed earlier in Chapter 3; in this chapter we will define data structures to hold our strings. In addition, C has a rich set of predefined functions to manipulate strings.
When an array element is referenced, the subscript expression designates the desired element by its position in the data. The first element occupies position zero, the second position one, and so on. It follows that the last element is subscripted by [N-1] where N is the number of elements in the array. The statement
data[9] = 0;
for instance, sets the tenth element of data to zero. The array subscript can be any expression that results in a 16-bit integer. The following for-loop clears 100 elements of the array data to zero.
for(j=0;j<100;j++) data[j] = 0;
If the array has two dimensions, then two subscripts are specified when referencing. As programmers we may assign any logical meaning to the first and second subscripts. For example, we could consider the first subscript as the row and the second as the column. Then, the statement
ThePosition = position[3][5];
copies the information from the 4th row, 6th column into the variable ThePosition. If the array has three dimensions, then three subscripts are specified when referencing. Again we may assign any logical meaning to the various subscripts. For example, we could consider the first subscript as the x coordinate, the second subscript as the y coordinate, and the third subscript as the z coordinate. Then, the statement
humidity[2][3][4]=100;
sets the humidity at point (2,3,4) to 100. Array subscripts are treated as signed 16-bit integers. It is the programmer's responsibility to see that only nonnegative values are produced, since a negative subscript would refer to some point in memory preceding the array. One must be particularly careful not to assume what exists either in front of or behind our arrays in memory.
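To make the subscript arithmetic concrete, the following sketch shows how a two-dimensional subscript maps onto memory. C stores arrays in row-major order, so position[r][c] lies r*COLS+c elements from the start of the array. The dimensions ROWS and COLS and the helper names FlatIndex and MeasuredIndex are assumptions chosen only for this illustration.

```c
#include <stddef.h>

#define ROWS 4   /* hypothetical dimensions, large enough for position[3][5] */
#define COLS 6

short position[ROWS][COLS];

/* Element offset of position[r][c], computed from the row-major rule. */
size_t FlatIndex(size_t r, size_t c){
  return r*COLS + c;     /* each complete row holds COLS elements */
}

/* The same offset, measured from the actual addresses in memory. */
size_t MeasuredIndex(size_t r, size_t c){
  const char *base = (const char *)position;
  const char *elem = (const char *)&position[r][c];
  return (size_t)(elem - base) / sizeof(short);
}
```

Both functions give 23 for the element position[3][5], which is why the fourth row and sixth column is the last element of this 4 by 6 array.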
Just like any variable, arrays must be declared before they can be accessed. The number of elements in an array is determined by its declaration. Appending a constant expression in square brackets to a name in a declaration identifies the name as the name of an array with the number of elements indicated. Multi-dimensional arrays require multiple sets of brackets. The examples in Listing 8-1 are valid declarations.
short data[5];        /* define data, allocate space for 5 16-bit integers */
char string[20];      /* define string, allocate space for 20 8-bit characters */
int time,width[6];    /* define time and width, allocate space for 6 16-bit integers in width */
short xx[10][5];      /* define xx, allocate space for 50 16-bit integers */
short pts[5][5][5];   /* define pts, allocate space for 125 16-bit integers */
extern char buffer[]; /* declare buffer as an external character array */
Notice in the third example that ordinary variables may be declared together with arrays in the same statement. In fact array declarations obey the syntax rules of ordinary declarations, as described in Chapters 4 and 7, except that certain names are designated as arrays by the presence of a dimension expression.
Notice the size of the external array, buffer[], is not given. This leads to an important point about how C deals with array subscripts. The array dimensions are only used to determine how much memory to reserve. It is the programmer's responsibility to stay within the proper bounds. In particular, you must not let the subscript become negative or go above N-1, where N is the size of the array.
Another situation in which an array's size need not be specified is when the array elements are given initial values. As we will see in Chapter 9, the compiler will determine the size of such an array from the number of initial values.
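Because C itself does not check subscripts, a program can do the checking explicitly. The sketch below is one possible approach, not something provided by the compiler: the macro COUNT and the functions SafeStore and PrimeCount are hypothetical names introduced here. COUNT only works on a true array (not on a pointer), and it also shows the compiler deducing a size from the number of initial values.

```c
#include <stddef.h>

/* Number of elements in a true array (does not work on a pointer). */
#define COUNT(a) (sizeof(a)/sizeof((a)[0]))

short data[5];
short primes[] = {2, 3, 5, 7};   /* size deduced from the four initial values */

/* Store value in data[i] only if i is in bounds.
   Returns 1 if the store happened, 0 if i was out of range. */
int SafeStore(long i, short value){
  if(i < 0 || i >= (long)COUNT(data)){
    return 0;
  }
  data[i] = value;
  return 1;
}

/* Number of elements the compiler allocated for primes. */
size_t PrimeCount(void){
  return COUNT(primes);
}
```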
In C we may refer to an array in several ways. Most obviously, we can write subscripted references to array elements, as we have already seen. C interprets an unsubscripted array name as the address of the array. In the following example, the first two lines set x equal to the value of the first element of the array. The third and fourth lines both set pt equal to the address of the array. Chapter 7 introduced the address operator & that yields the address of an object. This operator may also be used with array elements. Thus, the expression &data[3] yields the address of the fourth element. Notice too that &data[0], data+0, and data are all equivalent. It should be clear by analogy that &data[3] and data+3 are also equivalent.
short x,*pt,data[5]; /* a variable, a pointer, and an array */
void Set(void){
x=data[0]; /* set x equal to the first element of data */
x=*data; /* set x equal to the first element of data */
pt=data; /* set pt to the address of data */
pt=&data[0]; /* set pt to the address of data */
x=data[3]; /* set x equal to the fourth element of data */
x=*(data+3); /* set x equal to the fourth element of data */
pt=data+3; /* set pt to the address of the fourth element */
pt=&data[3]; /* set pt to the address of the fourth element */
}
The examples in the previous section suggest that pointers and array names might be used interchangeably. And, in many cases, they may. C will let us subscript pointers and also use array names as addresses. In the following example, the pointer pt contains the address of an array of integers. Notice the expression pt[2] is equivalent to *(pt+2).
short *pt,data[5]; /* a pointer, and an array */
void Set(void){
pt=data; /* set pt to the address of data */
data[2]=5; /* set the third element of data to 5 */
pt[2]=5; /* set the third element of data to 5 */
*(pt+2)=5; /* set the third element of data to 5 */
}
It is important to realize that although C accepts unsubscripted array names as addresses, they are not the same as pointers. In the following example, we cannot place the unsubscripted array name on the left-hand side of an assignment statement.
short buffer[5],data[5]; /* two arrays */
void Set(void){
data=buffer; /* illegal assignment */
}
Since the unsubscripted array name is its address, the statement data=buffer; is an attempt to change its address. What sense would that make? The array, like any object, has a fixed home in memory; therefore, its address cannot be changed. We say that an array name is not a modifiable lvalue; that is, it cannot be used on the left side of an assignment operator (nor may it be operated on by increment or decrement operators). It simply cannot be changed. Not only does this assignment make no sense, it is physically impossible because an array address is not a variable. There is no place reserved in memory for an array's address to reside, only the elements.
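If the intent of data=buffer; was to copy the contents of one array into the other, the elements themselves must be copied. The sketch below shows two common ways to do so; the function names CopyLoop and CopyMem are hypothetical, and memcpy is the standard routine described later in this chapter.

```c
#include <string.h>

short buffer[5], data[5];   /* two arrays of the same size */

/* Copy buffer into data one element at a time. */
void CopyLoop(void){
  int i;
  for(i = 0; i < 5; i++){
    data[i] = buffer[i];
  }
}

/* Same copy with memcpy; sizeof(data) is the size of the whole array in bytes. */
void CopyMem(void){
  memcpy(data, buffer, sizeof(data));
}
```

Note that sizeof(data) gives the size of the entire array only because data is a true array; applied to a pointer it would give the size of the pointer instead.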
Since a pointer may point to any element of an array, not just the first one, it follows that negative subscripts applied to pointers might well yield array references that are in bounds. This sort of thing might be useful in situations where there is a relationship between successive elements in an array and it becomes necessary to reference an element preceding the one being pointed to. In the following example, data is an array containing time-dependent (or space-dependent) information. If pt points to an element in the array, pt[-1] is the previous element and pt[1] is the following one. The function calculates the second derivative using a simple discrete derivative.
short *pt,data[100]; /* a pointer and an array */
void CalcSecond(void){ short d2Vdt2;
for(pt=data+1;pt<data+99;pt++)
d2Vdt2=(pt[-1]-2*pt[0]+pt[1]);
}
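The function above overwrites d2Vdt2 on every pass, so only the last value survives the loop. As a hedged variation, the sketch below keeps every result by storing it in a second array at the same position as the sample it belongs to. The names N, second, and CalcSecondAll are assumptions introduced for this illustration.

```c
#define N 100

short data[N];     /* input samples */
short second[N];   /* second differences; second[0] and second[N-1] are not computed */

void CalcSecondAll(void){
  short *pt;
  for(pt = data+1; pt < data+N-1; pt++){
    /* pt-data is the index of the element pt points to */
    second[pt - data] = (short)(pt[-1] - 2*pt[0] + pt[1]);
  }
}
```

The loop starts at data+1 and stops before data+N-1 so that pt[-1] and pt[1] always stay inside the array.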
As we have seen, addresses (pointers, array names, and values produced by the address operator) may be used freely in expressions. This one fact is responsible for much of the power of C.
As with pointers (Chapter 7), all addresses are treated as unsigned quantities. Therefore, only unsigned operations are performed on them. Of all the arithmetic operations that could be performed on addresses, only two make sense: displacing an address by a positive or negative amount, and taking the difference between two addresses. All others, where a compiler permits them at all, yield meaningless results.
Displacing an address can be done either by means of subscripts or by use of the plus and minus operators, as we saw earlier. These operations should be used only when the original address and the displaced address refer to positions in the same array or data structure. Any other situation would assume a knowledge of how memory is organized and would, therefore, be ill-advised for portability reasons.
As we saw in Chapter 7, taking the difference of two addresses is a special case in which the compiler interprets the result as the number of objects lying between the addresses.
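The following sketch illustrates the difference of two addresses. Subtracting two pointers into the same array yields a count of elements, not bytes; converting both pointers to character pointers first yields the distance in bytes. The function names ElementsBetween and BytesBetween are hypothetical, and both functions assume the two pointers refer to the same array.

```c
#include <stddef.h>

short data[100];

/* Number of elements from first up to, but not including, last. */
ptrdiff_t ElementsBetween(const short *first, const short *last){
  return last - first;
}

/* The same distance measured in bytes. */
ptrdiff_t BytesBetween(const short *first, const short *last){
  return (const char *)last - (const char *)first;
}
```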
ICC11, ICC12, and Metrowerks implement many useful string manipulation functions. Recall that strings are arrays of 8-bit characters with a null termination. The prototypes for these functions can be found in the string.h file. You simply include this file whenever you wish to use any of these routines. The rest of this section explains the functions one by one. ICC11 and ICC12 treat each of the counts as an unsigned 16-bit integer.
typedef unsigned int size_t;
void *memchr(void *, int, size_t);
int memcmp(void *, void *, size_t);
void *memcpy(void *, void *, size_t);
void *memmove(void *, void *, size_t);
void *memset(void *, int, size_t);
char *strcat(char *, const char *);
char *strchr(const char *, int);
int strcmp(const char *, const char *);
int strcoll(const char *, const char *);
char *strcpy(char *, const char *);
size_t strcspn(const char *, const char *);
size_t strlen(const char *);
char *strncat(char *, const char *, size_t);
int strncmp(const char *, const char *, size_t);
char *strncpy(char *, const char *, size_t);
char *strpbrk(const char *, const char *);
char *strrchr(const char *, int);
size_t strspn(const char *, const char *);
char *strstr(const char *, const char *);
The first five functions are general-purpose memory handling routines.
void *memchr(void *p, int c, size_t n);
Starting in memory at address p, memchr will search for the first unsigned 8-bit byte that matches the value in c. At most n bytes are searched. If successful, a pointer to the 8-bit byte is returned, otherwise a NULL pointer is returned.
int memcmp(void *p, void *q, size_t n);
Assuming the two pointers are directed at 8-bit data blocks of size n, memcmp will return a negative value if the block pointed to by p is lexicographically less than the block pointed to by q. The return value will be zero if they match, and positive if the block pointed to by p is lexicographically greater than the block pointed to by q.
void *memcpy(void *dst, void *src, size_t n);
Assuming the two pointers are directed at 8-bit data blocks of size n, memcpy will copy the data pointed to by pointer src, placing it in the memory block pointed to by pointer dst. The pointer dst is returned.
void *memmove(void *dst, void *src, size_t n);
Assuming the two pointers are directed at 8-bit data blocks of size n, memmove will copy the data pointed to by pointer src, placing it in the memory block pointed to by pointer dst. This routine works even if the blocks overlap. The pointer dst is returned.
void *memset(void *p, int c, size_t n);
Starting in memory at address p, memset will set n 8-bit bytes to the 8-bit value in c. The pointer p is returned.
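The difference between memcpy and memmove matters when the source and destination overlap. The sketch below removes the first characters of a string by sliding the remainder to the left within the same buffer, a case where the two blocks overlap and memmove is the appropriate choice. The buffer line and the function DropFront are hypothetical names used only for this illustration.

```c
#include <string.h>

char line[16] = "0123456789";

/* Remove the first k characters of line by sliding the rest,
   including the null termination, toward the start of the buffer. */
void DropFront(size_t k){
  size_t len = strlen(line);
  if(k > len){
    k = len;                           /* cannot drop more than the string holds */
  }
  memmove(line, line + k, len - k + 1);  /* +1 copies the null as well */
}
```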
The remaining functions are string-handling routines.
char *strcat(char *p, const char *q);
Assuming the two pointers are directed at two null-terminated strings, strcat will append a copy of the string pointed to by pointer q, placing it at the end of the string pointed to by pointer p. The pointer p is returned. It is the programmer's responsibility to ensure the destination buffer is large enough.
char *strchr(const char *p, int c);
We assume the pointer is directed at a null-terminated string. Starting in memory at address p, strchr will search for the first unsigned 8-bit byte that matches the value in c. It will search until a match is found or stop at the end of the string. If successful, a pointer to the 8-bit byte is returned, otherwise a NULL pointer is returned.
int strcmp(const char *p, const char *q);
int strcoll(const char *p, const char *q);
Assuming the two pointers are directed at two null-terminated strings, strcmp will return a negative value if the string pointed to by p is lexicographically less than the string pointed to by q. The return value will be zero if they match, and positive if the string pointed to by p is lexicographically greater than the string pointed to by q. In general C allows the comparison rule used in strcoll to depend on the current locale, but in ICC11 and ICC12 strcoll is the same as strcmp.
char *strcpy(char *dst, const char *src);
We assume src points to a null-terminated string and dst points to a memory buffer large enough to hold the string. strcpy will copy the string (including the null) pointed to by src into the buffer pointed to by pointer dst. The pointer dst is returned. It is the programmer's responsibility to ensure the destination buffer is large enough.
size_t strcspn(const char *p, const char *q);
The string function strcspn will compute the length of the maximal initial substring within the string pointed to by p that has no characters in common with the string pointed to by q. For example, the following call returns the value 5.
n=strcspn("label: ldaa 10,x ;comment"," ;:*\r\t\n");
A common application of this routine is parsing for tokens. The first parameter is a line of text and the second parameter is a list of delimiters (e.g., space, semicolon, colon, star, return, tab and linefeed). The function returns the length of the first token (i.e., the size of label).
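As a sketch of this parsing idea, the function below copies the first token of a line into a caller-supplied buffer. It first uses strspn (described later in this section) to skip any leading delimiters and then uses strcspn to measure the token. The function name FirstToken, its parameters, and the exact delimiter list are assumptions made for this illustration; a token longer than the buffer is truncated.

```c
#include <string.h>

/* Copy the first token of line into token, which can hold cap bytes
   including the null. Returns the number of characters stored. */
size_t FirstToken(const char *line, char *token, size_t cap){
  static const char delimiters[] = " ;:*\r\t\n";
  size_t n;
  if(cap == 0){
    return 0;                          /* no room for even the null */
  }
  line += strspn(line, delimiters);    /* skip leading delimiters */
  n = strcspn(line, delimiters);       /* length of the token itself */
  if(n >= cap){
    n = cap - 1;                       /* truncate to fit the buffer */
  }
  memcpy(token, line, n);
  token[n] = 0;                        /* always null-terminate */
  return n;
}
```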
size_t strlen(const char *p);
The string function strlen returns the length of the string pointed to by pointer p. The length is the number of characters in the string not counting the null-termination.
char *strncat(char *p, const char *q, size_t n);
This function is similar to strcat. Assuming the two pointers are directed at two null-terminated strings, strncat will append a copy of the string pointed to by pointer q, placing it at the end of the string pointed to by pointer p. The parameter n limits the number of characters, not including the null, that will be copied. The pointer p is returned. It is the programmer's responsibility to ensure the destination buffer is large enough.
int strncmp(const char *p, const char *q, size_t n);
This function is similar to strcmp. Assuming the two pointers are directed at two null-terminated strings, strncmp will return a negative value if the string pointed to by p is lexicographically less than the string pointed to by q. The return value will be zero if they match, and positive if the string pointed to by p is lexicographically greater than the string pointed to by q. The parameter n limits the number of characters that will be compared. For example, the following function call will return a zero because the first 8 characters are the same:
n=strncmp("MC68HC11A8","MC68HC11E9",8);
The following function is similar to strcpy.
char *strncpy(char *dst, const char *src, size_t n);
We assume src points to a null-terminated string and dst points to a memory buffer of at least n bytes. strncpy will copy the string pointed to by src into the buffer pointed to by pointer dst. The pointer dst is returned. The parameter n limits the number of characters that will be copied. If the length of the string pointed to by src is equal to or larger than n, then the null will not be copied into the buffer pointed to by dst, and the result will not be a properly terminated string. It is the programmer's responsibility to ensure the destination buffer is large enough.
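Because strncpy can leave the destination without a null termination, a common defensive pattern is to copy at most one character less than the buffer size and then write the null explicitly. The following is a minimal sketch of that pattern; the function name CopyBounded and its parameter cap (the total size of the destination buffer in bytes) are assumptions for this illustration.

```c
#include <string.h>

/* Copy src into dst, which holds cap bytes, truncating if necessary.
   The result is always null-terminated. Returns dst. */
char *CopyBounded(char *dst, const char *src, size_t cap){
  if(cap == 0){
    return dst;                 /* nothing can be stored */
  }
  strncpy(dst, src, cap - 1);   /* copy at most cap-1 characters */
  dst[cap - 1] = 0;             /* guarantee the termination */
  return dst;
}
```

With a 5-byte buffer, copying "MC68HC11" this way leaves the string "MC68" in the buffer.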
char *strpbrk(const char *p, const char *q);
This function, strpbrk, is called pointer to break. The function will search the string pointed to by p for the first instance of any of the characters in the string pointed to by q. A pointer to the found character is returned. If the search fails to find any characters of the string pointed to by q in the string pointed to by p, then a null pointer is returned. For example, the following call returns a pointer to the colon.
pt=strpbrk("label: ldaa 10,x ;comment"," ;:*\r\t\n");
This function, like strcspn, can be used for parsing tokens.
char *strrchr(const char *p, int c);
The function strrchr will search the string pointed to by p from the right for the first instance of the character in c. A pointer to the found character is returned. If the search fails to find any characters with the 8-bit value c in the string pointed to by p, then a null pointer is returned. For example, the following calls set pt1 to point to the 'a' in label and pt2 to point to the second 'a' in ldaa.
pt1=strchr("label: ldaa 10,x ;comment",'a');
pt2=strrchr("label: ldaa 10,x ;comment",'a');
Notice that strchr searches from the left while strrchr searches from the right.
size_t strspn(const char *p, const char *q);
The function strspn will return the length of the maximal initial substring in the string pointed to by p that consists entirely of characters in the string pointed to by q. In the following example the second string contains the valid set of hexadecimal digits. The function call will return 6 because there is a valid 6-digit hexadecimal string at the start of the line.
n=strspn("A12F05+12BAD*45","0123456789ABCDEF");
char *strstr(const char *p, const char *q);
The function strstr will search the string pointed to by p from the left for the first instance of the string pointed to by q. A pointer to the found substring within the first string is returned. If the search fails to find a match, then a null pointer is returned. For example, the following call sets pt to point to the 'l' in ldaa.
pt=strstr("label: ldaa 10,x ;comment","ldaa");
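Since strstr returns a pointer into the string being searched, the search can be resumed from just past each match. The sketch below uses this to count non-overlapping occurrences of a word. The function name CountOccurrences is hypothetical, and treating an empty search word as zero matches is a design choice made here, not something the library requires.

```c
#include <string.h>

/* Count the non-overlapping occurrences of word within text. */
size_t CountOccurrences(const char *text, const char *word){
  size_t count = 0;
  size_t step = strlen(word);
  if(step == 0){
    return 0;                  /* an empty word would match everywhere */
  }
  while((text = strstr(text, word)) != NULL){
    count++;
    text += step;              /* resume the search after this match */
  }
  return count;
}
```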
A FIFO Queue Example using indices
Another method to implement a statically allocated first-in-first-out (FIFO) queue is to use indices instead of pointers. This method is necessary for compilers that do not support pointers. The purpose of this example is to illustrate the use of arrays and indices. Just like the previous FIFO, this is used for order-preserving temporary storage. The function Fifo_Put will enter one 8-bit byte into the queue, and Fifo_Get will remove one byte. If you call Fifo_Put while the FIFO is full (Size is equal to FifoSize), the routine will return a zero. Otherwise, Fifo_Put will save the data in the queue and return a nonzero value (-1 in this implementation). The index PutI specifies where to put the next 8-bit data. The routine Fifo_Get actually returns two parameters. The queue status is the regular function return parameter, while the data removed from the queue is returned by reference; that is, the calling routine passes in a pointer, and Fifo_Get stores the removed data at that address. If you call Fifo_Get while the FIFO is empty (Size is equal to zero), the routine will return a zero. Otherwise, Fifo_Get will return the oldest data from the queue and return a nonzero value (-1). The index GetI specifies where to get the next 8-bit data. The following FIFO implementation uses two indices and a counter.
/* Index,counter implementation of the FIFO */
#define FifoSize 10 /* Number of 8 bit data in the Fifo */
unsigned char PutI; /* Index of where to put next */
unsigned char GetI; /* Index of where to get next */
unsigned char Size; /* Number currently in the FIFO */
/* FIFO is empty if Size=0 */
/* FIFO is full if Size=FifoSize */
char Fifo[FifoSize]; /* The statically allocated data */
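void Fifo_Init(void) { unsigned char SaveSP;
asm tpa /* enter critical section: copy CCR into A */
asm staa SaveSP /* save the previous CCR */
asm sei /* disable interrupts */
PutI=GetI=Size=0; /* Empty when Size==0 */
asm ldaa SaveSP /* leave critical section: reload the saved CCR */
asm tap /* restore the previous interrupt mask */
}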
int Fifo_Put (char data) { unsigned char SaveSP;
if (Size == FifoSize )
return(0); /* Failed, fifo was full */
else{
asm tpa /* enter critical section: copy CCR into A */
asm staa SaveSP /* save the previous CCR */
asm sei /* disable interrupts */
Size++;
Fifo[PutI++]=data; /* put data into fifo */
if (PutI == FifoSize) PutI = 0; /* Wrap */
asm ldaa SaveSP /* leave critical section: reload the saved CCR */
asm tap /* restore the previous interrupt mask */
return(-1); /* Successful */
}
}
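int Fifo_Get (char *datapt) { unsigned char SaveSP;
if (Size == 0 )
return(0); /* Empty if Size=0 */
else{
asm tpa /* enter critical section: copy CCR into A */
asm staa SaveSP /* save the previous CCR */
asm sei /* disable interrupts */
*datapt=Fifo[GetI++]; /* get the oldest data from fifo */
Size--;
if (GetI == FifoSize) GetI = 0; /* Wrap */
asm ldaa SaveSP /* leave critical section: reload the saved CCR */
asm tap /* restore the previous interrupt mask */
return(-1); /* Successful */
}
}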
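The listing above depends on the inline assembly syntax of one compiler and on 6811/6812 instructions, so it cannot be tried on a desktop machine. The sketch below restates the same index-and-counter idea in portable C so the queue logic can be tested separately. It is not the author's implementation: the names FIFO_SIZE, Queue_Init, Queue_Put, and Queue_Get are hypothetical, the macros ENTER_CRITICAL and EXIT_CRITICAL are empty placeholders that would have to be replaced with code to save, disable, and restore the interrupt mask on a real target, the full and empty tests are moved inside the critical section, and success is reported as 1 rather than -1.

```c
#define FIFO_SIZE 10   /* number of 8-bit data in the queue */

/* Placeholders, assumed for this sketch. On a real target these would
   save the interrupt mask and disable interrupts, then restore it. */
#define ENTER_CRITICAL()
#define EXIT_CRITICAL()

static unsigned char PutIndex;   /* where to put next */
static unsigned char GetIndex;   /* where to get next */
static unsigned char Count;      /* number currently stored */
static char Queue[FIFO_SIZE];    /* statically allocated data */

void Queue_Init(void){
  ENTER_CRITICAL();
  PutIndex = GetIndex = Count = 0;    /* empty when Count==0 */
  EXIT_CRITICAL();
}

/* Returns 1 if data was stored, 0 if the queue was full. */
int Queue_Put(char data){
  int ok = 0;
  ENTER_CRITICAL();
  if(Count < FIFO_SIZE){
    Queue[PutIndex++] = data;
    if(PutIndex == FIFO_SIZE) PutIndex = 0;   /* wrap */
    Count++;
    ok = 1;
  }
  EXIT_CRITICAL();
  return ok;
}

/* Returns 1 and stores the oldest data at datapt, or 0 if empty. */
int Queue_Get(char *datapt){
  int ok = 0;
  ENTER_CRITICAL();
  if(Count > 0){
    *datapt = Queue[GetIndex++];
    if(GetIndex == FIFO_SIZE) GetIndex = 0;   /* wrap */
    Count--;
    ok = 1;
  }
  EXIT_CRITICAL();
  return ok;
}
```

Testing the counter inside the critical section means an interrupt cannot change Count between the test and the update, at the cost of disabling interrupts even when the call fails.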