Software Development in the Unix Environment

Prof. Brian L. Evans
Department of Electrical and Computer Engineering
The University of Texas at Austin
bevans@ece.utexas.edu

http://www.ece.utexas.edu/~bevans/talks/software_development/

Table of Contents

Books to Help Get Started - Setting Up Your Unix Account - Getting It Done - Making It Portable - More Information - Documentation Extraction Tools

This information serves as a starting point for people who want to develop software for the Unix environment. I present this material regularly in departmental digital signal processing seminars. The first four seminars took place on March 26, 1999; January 23, 1998; February 28, 1997; and October 18, 1996.

Books to Help Get Started

Developing software under Unix can be daunting at first. Unix shell commands are cryptic, as if Unix had been developed by professional programmers for professional programmers. In demystifying Unix, I have found the following textbooks and reference books useful.

High-level compiled languages

C Programming, reference book: Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language, 2nd ed., Prentice Hall, 1988, ISBN 0131103709.
C++ Programming, user's guide: Bjarne Stroustrup, The C++ Programming Language, 3rd ed., Addison-Wesley, 1997, ISBN 0201889544.
C++ Programming, standard library: Nicolai M. Josuttis, The C++ Standard Library, Addison-Wesley, 1999, ISBN 0201379260.

Unix operating system

getting started: Brian W. Kernighan and Rob Pike, The UNIX Programming Environment, Prentice-Hall, 1984, ISBN 0139376992.
software development: Unix Software Development, Unix Press, 1992, ISBN 0130176907.
reference: W. Richard Stevens, Advanced Programming in the UNIX Environment, Addison Wesley, 1992, ISBN 0-201-56317-7.

Setting Up Your Unix Account

It takes some time to set up a Unix account for software development. When you log into a Unix system, your shell executes a personal startup file in your home directory:

.cshrc if you run the C shell
.kshrc if you run the Korn shell

To find out what shell you are running, type finger -l username. You can change your shell by typing chsh.

In the appropriate login file, you will need to define a path and other environment variables. For an example, see my login files, which are available on the machines in the Learning Resource Center cluster in my ~bevans account. Some of these settings are explained below. Note that many shells will also evaluate a second file:

.login if you run the C shell
.profile if you run the Korn shell

but this is not guaranteed from machine to machine. For example, this second login file is not evaluated in Common Desktop Environment terminal windows. Your setup will be more portable if you do not rely on the second login file.

Environment Variables

When you log in, the shell is only guaranteed to define the HOME environment variable, which is set to the full path name of your home directory. The use of ~ to represent your home directory is not supported by every shell; the original Bourne shell, for example, does not expand it. When you type commands in an interactive shell, the shell will generally expand the ~ character properly. When referring to your home directory in your login files or other shell scripts, use $HOME to be portable.

Another useful setting is the name of the user, which is generally held in the LOGNAME environment variable. An alternate way of retrieving this information is to run the whoami program. The whoami program, however, does not work properly in all instances. Some platforms define the USER environment variable to hold the user name.
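The lookup order described above can be sketched in Bourne-shell syntax. The function name current_user is hypothetical, and the last-resort call to id -un is my substitution for whoami, not something these notes prescribe.

```shell
#!/bin/sh
# Hypothetical helper: print the login name, trying each source in turn.
current_user() {
    if [ -n "${LOGNAME:-}" ]; then
        printf '%s\n' "$LOGNAME"
    elif [ -n "${USER:-}" ]; then
        printf '%s\n' "$USER"
    else
        # Assumption: fall back to id -un rather than whoami
        id -un
    fi
}

echo "user: $(current_user)"
# In scripts, refer to the home directory as $HOME rather than ~
echo "home: $HOME"
```

Checking the environment variables first avoids starting an extra process on systems that already define them.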

For software development, you will want the dynamic library path set properly. This path lists the directories to search for dynamic (shared) libraries. The environment variable is

SHLIB_PATH for HPs
LD_LIBRARY_PATH for Suns

Shared libraries are used for X window routines and system calls. On the Solaris machines in the Learning Resource Center, for example, I set LD_LIBRARY_PATH to

/usr/openwin/lib:/usr/local/X11/lib:/usr/lib:/usr/ucblib:/usr/local/gnu/lib

When you distribute binary programs that use shared libraries, you will want to tell the other users how to set these environment variables properly.
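A script shipped with a binary distribution could set the library path for the user. The following is a minimal sketch in Bourne-shell syntax; the function names libpath_var and prepend_dir are hypothetical, and falling back to LD_LIBRARY_PATH on platforms other than HP-UX is an assumption that goes beyond the two platforms listed above.

```shell
#!/bin/sh
# Name of the library path variable for the operating system named in $1.
# HP-UX uses SHLIB_PATH; Suns use LD_LIBRARY_PATH (assumed for all others).
libpath_var() {
    case "$1" in
        HP-UX) echo SHLIB_PATH ;;
        *)     echo LD_LIBRARY_PATH ;;
    esac
}

# Prepend directory $2 to the colon-separated list $1 unless it is
# already present.
prepend_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${2}${1:+:$1}" ;;
    esac
}

var=$(libpath_var "$(uname -s)")
# Read the current value of the variable whose name is stored in $var
eval "current=\${$var:-}"
echo "$var=$(prepend_dir "$current" /usr/local/gnu/lib)"
```

The script only prints the setting it would use; a real installation script would assign and export the variable instead.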

Path Setting

The path setting, which lists the directories to search for executable programs, is machine dependent. For software development, you will want the directories of the GNU tools (usually /usr/local/bin and /usr/local/gnu/bin) at the beginning of the path in your login file, as shown below:

# Part of the path setting independent of specific tools and operating systems
set basicpath = (/usr/ucb /usr/sbin /usr/bin /bin)
set iopath = (/usr/dt/bin)
set localpath = (/usr/local/bin /usr/local/gnu/bin /usr/local/packages/mh)
set path = ($localpath $basicpath $iopath)
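The example above is written for the C shell. For the Korn shell or another Bourne-family shell, a rough equivalent might look like the sketch below. The function name build_path is hypothetical, and skipping directories that do not exist is my addition so that one login file can serve several machines.

```shell
#!/bin/sh
# Join the directories that exist on this machine into a colon-separated path.
build_path() {
    result=
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        result="${result:+$result:}$dir"
    done
    printf '%s\n' "$result"
}

# Same ordering as the C shell example: local (GNU) tools come first
newpath=$(build_path /usr/local/bin /usr/local/gnu/bin /usr/local/packages/mh \
                     /usr/ucb /usr/sbin /usr/bin /bin /usr/dt/bin)
echo "$newpath"
# In a login file you would follow this with:
#   PATH=$newpath; export PATH
```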

Getting It Done

Software development in the Unix environment is greatly simplified by a variety of freely distributable tools from the Free Software Foundation:

Emacs (emacs)
C and C++ compilers (gcc, g++)
Source code debuggers (gdb)
Make
Parser and lexical analyzer generators (bison, flex)
Source code control (rcs)

These tools behave consistently across platforms. Complementing these tools are

commercial debugging tools such as Purify and Insure++
profiling tools such as Quantify
code coverage tools such as Pure Coverage and CodeWizard

For the 1997-1998 academic year, we have a departmental license for the Parasoft toolset (Insure++, CodeWizard) installed on the Solaris machines in the Learning Resource Center cluster. The Parasoft tools run on Solaris, HP, Linux, and AIX operating systems, among others. During the 1997-1998 academic year, we had a departmental license for the Rational toolset (Purify, Quantify, Pure Coverage). Demonstration versions of these tools may be downloaded.

Compilation

A simple example using the GNU C compiler to compile x.c into x:

gcc -o x x.c

As the program calls more libraries, each one is added to the command line:

gcc -o x x.c -lm -lbsd

We can automate the commands by using makefiles.

Makefiles

Makefiles have two parts, definitions and commands. A few basics:

Makefiles are named Makefile or makefile.
Makefiles are evaluated by running the make program.
When make runs, all environment variables become make variables.
Make variables are referenced using $(VAR).

Here is a simple makefile to build a C file x.c into an object module x.o and an executable program x.

# x.o must be rebuilt if x.c changes
# note that the white space in front of gcc is a TAB
x.o: x.c
	gcc -c x.c

# x must be rebuilt if x.o changes
x: x.o
	gcc -o x x.o

Note that the white space before the gcc is a TAB. Using spaces instead of a TAB will result in an error. To build (make) the executable x, type

make x

which will execute the following commands:

gcc -c x.c
gcc -o x x.o

If you make the executable again,

unix> make x
make: `x' is up to date.

No commands were executed because neither x.c nor x.o changed.
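The decision make takes here comes down to comparing modification times. The sketch below imitates that rule in Bourne-shell syntax. The function name needs_rebuild is hypothetical, and this is only an illustration of the timestamp comparison, not how make is implemented.

```shell
#!/bin/sh
# Succeed (exit status 0) when target $1 is missing or is older than
# any of the prerequisites named in the remaining arguments.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # find prints the prerequisite only if it is newer than the target
        if [ -n "$(find "$dep" -newer "$target" 2>/dev/null)" ]; then
            return 0
        fi
    done
    return 1
}

if needs_rebuild x.o x.c; then
    echo "x.o must be rebuilt"
else
    echo "x.o is up to date"
fi
```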

The above makefile is not very general. We have hard-coded the compiler, which varies from machine to machine. Here is a more flexible makefile.

# Definitions
CC = gcc
LINKER = gcc
CFLAGS = -c
OBJS = x.o
 
# Commands

# GNU implicit make rule using pattern matching
# defines how to convert a C file into an object module
%.o:	%.c
	$(CC) $(CFLAGS) $<
 
# Explicit make rule for x
x:	$(OBJS)
	$(LINKER) $(OBJS) -o x

Now, we can configure the values of CC and CFLAGS based on the machine and operating system we are using.
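One way to supply those values is on the make command line, since variables given there take precedence over definitions inside the makefile. The sketch below picks the first compiler found on the path. The function name pick_cc and the candidate list are hypothetical, and the script only prints the make command rather than running it.

```shell
#!/bin/sh
# Print the first of the named programs that exists on the path.
pick_cc() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Assumption: prefer gcc and fall back to the vendor compiler cc
if cc=$(pick_cc gcc cc); then
    echo "make CC=$cc LINKER=$cc x"
else
    echo "no C compiler found on the path"
fi
```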

Debuggers

The GNU debugger is useful for tracking down run-time errors. It can also be used in conjunction with Purify to investigate run-time Purify warnings and errors. The GNU debugger is perhaps most commonly used to track down the cause of segmentation faults and bus errors in programs. Follow this procedure:

compile your C/C++ code with the -g option, which will build the symbol table of function names into the binary
run the program that produces the core dump.
in Unix, examine the core dump by typing
gdb -c core progname
in gdb, type
backtrace

The backtrace will list the call stack, which is the list of all of the functions (in order) that were called leading up to the core dump.

If the faulty program produces a segmentation fault or a bus error but does not produce a file called 'core', then your login scripts have probably told Unix not to produce a core file. Under Solaris, you can undo this setting in the C shell by typing unlimit coredumpsize.
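The unlimit command belongs to the C shell. In the Korn shell and other Bourne-family shells, the corresponding builtin is ulimit -c. The sketch below checks the current limit; the function name limit_allows_core is hypothetical, and progname and core in the final comment are placeholders.

```shell
#!/bin/sh
# Succeed when the core size limit in $1 allows a core file to be written.
# ulimit -c prints the limit in blocks, or the word "unlimited".
limit_allows_core() {
    [ "$1" != "0" ]
}

limit=$(ulimit -c)
if limit_allows_core "$limit"; then
    echo "core files are enabled (limit: $limit)"
else
    echo "core files are disabled; try: ulimit -c unlimited"
fi

# Once a core file exists, gdb can print the backtrace without an
# interactive session:
#   gdb -batch -ex backtrace progname core
```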

Automated Debuggers: Purify

Purify is an automated run-time debugger in that it tracks and reports common programming errors. Specifically, Purify detects

memory leaks
reading or writing beyond the bounds of an array
reading or writing freed memory
freeing memory multiple times
reading and using uninitialized memory
reading or writing through null pointers
overflowing the stack by recursive function calls
reading or writing to/from the first page of memory
mismatched allocation and release, such as using delete on malloc memory or free on new memory

The GNU make rule to build Purify into your program follows. For an executable named program, it will produce a Purify-instrumented copy called program.purify:

%.purify: %.o $(PT_DEPEND) $(VERSION)
	$(PURIFY) $(LINKER) $(LINKFLAGS_D) $< $(OBJFILES) $(LIBS) -o $(@F)

Here, you must define what linker you are using. To make sure that you are running GNU make, type which make; it should return a path in either /usr/local/bin or /usr/local/gnu/bin. Prof. Craig Chase has developed an alternate makefile for his EE380L course to handle Purify.
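The location of make is only a hint. A more direct check is to ask make for its version banner, which begins with "GNU Make" for the GNU implementation. A minimal sketch, with the hypothetical function name is_gnu_make:

```shell
#!/bin/sh
# Succeed when the version banner in $1 comes from GNU make.
is_gnu_make() {
    case "$1" in
        "GNU Make"*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Other make implementations may reject --version; errors are discarded
banner=$(make --version 2>/dev/null | head -n 1)
if is_gnu_make "$banner"; then
    echo "GNU make found: $banner"
else
    echo "make is missing or is not GNU make"
fi
```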

For more information about Purify, please see the

Guide for software developers at
http://www.ece.utexas.edu/~bevans/talks/software_development/developer.html.
the man page for purify, and
the Purify Web pages at http://www.pureatria.com/.

Managing Versions of Source Code

Keeping track of different versions of your code is critical for its long-term use and maintainability. Many source code management systems exist on Unix. These are useful for keeping track of changes made to any text file, e.g. C, LaTeX, HTML, and makefiles. The two most common systems are:

Source Code Control System (SCCS): commercial system available on Sun and HP computers
Revision Control System (RCS): free system available for all Unix machines from FSF

These two systems have similar functionality, but RCS is more flexible. Yes, this means that the freely distributable tool is better than the commercial tool. This happens often in the Unix world.

To get started, create a directory called SCCS to store the versions of the files (create an RCS directory for RCS). Next, create a text file such as the x.c C source file below. The source file contains tags that the source code management system will replace with the file name, version number, and date last modified.

/* SCCS Version: %W% %G% */
/* RCS Version: $Id$ */
 
#include <stdio.h>
#include <stdlib.h>   /* malloc, exit */
#define BLOCK_SIZE 64
 
int main(void) {
   char* mem;
 
   /* Allocate a block of memory BLOCK_SIZE bytes long */
   mem = malloc(BLOCK_SIZE);
 
   /* Error: index out of range */
   printf("%c\n", mem[BLOCK_SIZE]);

   /* Error: reading uninitialized memory */
   printf("%c\n", mem[BLOCK_SIZE - 1]);
 
   exit(0);
}

Source code control systems act like a library. Once you check out a file, no one else can write to it. This allows multiple people to develop the same source code without clobbering each other's changes. Here are several useful commands:

Function	SCCS	RCS
Initialize an entry	sccs create -fi x.c	ci -i x.c
Check out a file for editing	sccs edit x.c	co -l x.c
Check in a file	sccs delget x.c	ci -u x.c
To see what changes have been made	sccs prs -e x.c	rlog x.c
List changes since the last version	sccs diffs x.c	rcsdiff x.c
List all files that are checked out	sccs info	see below
List the files that you have checked out	sccs tell -u	n/a

To list all of the files that are checked out, use sccs info for SCCS, and for RCS, use

rlog -L -R RCS/* | sed s/,v// | sed s+RCS/++
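The pipeline can be wrapped in a pair of shell functions for everyday use. This is a sketch with hypothetical function names. It anchors the two sed patterns so that only the RCS/ prefix and the ,v suffix are removed, which is slightly stricter than the one-liner above.

```shell
#!/bin/sh
# Turn RCS archive names (RCS/x.c,v) into working file names (x.c).
strip_rcs_names() {
    sed -e 's/,v$//' -e 's+^RCS/++'
}

# rlog -L skips archives that have no locks set, and -R prints only the
# archive name, so the output is the list of checked-out files.
list_checked_out() {
    rlog -L -R RCS/* 2>/dev/null | strip_rcs_names
}

# Only query RCS when it is installed and an RCS directory is present
if command -v rlog >/dev/null 2>&1 && [ -d RCS ]; then
    list_checked_out
fi
```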

If you are using SCCS and you want to back out of the changes you've made to a file you have checked out, then use

sccs unedit x.c

One advantage to using GNU make in conjunction with source code control systems is that GNU make will automatically check out files under source code control if they are newer than the corresponding files in the current directory. This feature guarantees that your code is always up-to-date. In addition, Emacs works seamlessly with RCS, since both are developed by the Free Software Foundation. You can use a single Meta command in Emacs to check files in and out of RCS.

Making It Portable

Differences in processor architecture, operating system, X Window System version, and compiler are captured in three places:

Include files: define constants, data structures, and function prototypes
Libraries: contain object code for functions
Makefiles: specify where to find include files and libraries on a particular system

A makefile can include other makefiles. This allows the programmer to write general-purpose makefiles, with the machine-dependent make definitions provided by make include files. In the Ptolemy Project, we define a series of make configuration files named config-$PTARCH.mk, where PTARCH is the environment variable set to the name of the computer architecture you are using (e.g. sol2 for Solaris 2.4 machines). A collection of these make include files is available. An alternative is to use the GNU autoconf utility.

It turns out that developing software using Microsoft tools can easily make the code unportable. As Prof. Michael Ogg points out:

Visual C++, while regarded as something of a standard for the NT world, is having more "vendor-specific" features put into it. e.g. with the evolution from DCOM to COM+ all the COM+ features will be automagically generated by the (Visual C++) compiler. You could argue that this is a "good thing" because it hides the mess from the programmer. I would argue this is a "bad thing" because it makes code less and less portable. For a vendor-neutral (and free) C++, there is always the Win32 port of g++ from Cygnus. Some of the dept's C++ courses have been using g++ on Unix platforms, so this would give a cross-platform commonality.

Prof. Ogg notes similar problems with Microsoft's Java language and development tools.

Management of object code

Put the source code and object code in separate parallel directory trees. There should be one object directory tree per platform on which you will compile. For example, if you have a src/common directory for your source code, then you might have an obj.sol2/common directory to contain the Solaris 2.4 object files. In the object directory, you should create a symbolic link to the makefile in the equivalent source directory:

cd obj.sol2/common
ln -s ../../src/common/makefile .

The idea is to separate source code from object code so that you can support multiple platforms. In order to determine the architecture, one can parse the string returned by uname. The uname program is portable among Unix operating systems. For example, the ptarch script uses uname and returns sol2 for Solaris 2.4, sol2.5 for Solaris 2.5, hppa for HP-UX 10, and so forth.
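The two steps, naming the architecture and creating the matching object directory, can be sketched as follows. This is not the ptarch script itself: arch_name and make_objdir are hypothetical names, and only the three platforms mentioned above are mapped. The release strings assume that Solaris 2.x reports itself through uname as SunOS 5.x and that HP-UX 10 reports a release such as B.10.20.

```shell
#!/bin/sh
# Map the output of `uname -s` ($1) and `uname -r` ($2) to an
# architecture name. Hypothetical stand-in for ptarch.
arch_name() {
    case "$1:$2" in
        SunOS:5.4*)   echo sol2 ;;
        SunOS:5.5*)   echo sol2.5 ;;
        HP-UX:*.10.*) echo hppa ;;
        *)            echo unknown ;;
    esac
}

# Create obj.<arch>/<dir> and link its makefile to the one in src/<dir>.
make_objdir() {
    arch=$1
    dir=$2
    mkdir -p "obj.$arch/$dir" || return 1
    # One ".." per component of <dir>, plus one for the obj.<arch> level
    up=$(printf '%s\n' "$dir" | sed -e 's+[^/][^/]*+..+g')
    [ -h "obj.$arch/$dir/makefile" ] ||
        ln -s "../$up/src/$dir/makefile" "obj.$arch/$dir/makefile"
}

echo "architecture: $(arch_name "$(uname -s)" "$(uname -r)")"
# Example call, run from the directory that contains src:
#   make_objdir sol2 common
```

Computing the relative link from the depth of the directory keeps the whole tree relocatable, since no absolute path is stored in the links.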

More Information

For more information, see the Guide for software developers at http://www.ece.utexas.edu/~bevans/talks/software_development/developer.html. Also, see Programming hints at http://www.ece.utexas.edu/~bevans/talks/software_development/Programming.html.

Documentation Extraction Tools

Documentation extraction tools will extract comments, function prototypes, and class definitions from code to produce a programmer's reference manual. By extracting the documentation from the code, one guarantees that the documentation is up-to-date. Several documentation extraction tools are available:

Ext generates HTML documentation for C code and packages that follow a specific naming convention for functions
Cce generates HTML documentation for arbitrary C++ code
doc++ creates LaTeX and HTML documentation from C++ and Java code

These tools have been installed on the Solaris and AIX machines on the Learning Resource Center cluster.

Last Updated 03/09/03.