CSCI 221 Fall 2020

Lab 05: linked lists


This repo contains the llist library code that we looked at in lecture. Today you’ll be working with me and with a partner on completing the code for four linked list methods, namely, length, lastOf, deleteEnd, and remove.

I’ll put the exercises first just below. Below that, I have documentation about separate compilation, the make facility, and command-line debuggers. These extra materials are the same as what I shared in Lecture 05-1.


Exercises

Modify the source code for the file llist.cc and complete the code for the four linked list functions. These should act as follows:

int length(llist* list): Return the number of nodes in a given linked list. To do this, just initialize a count variable to 0, and travers the list’s nodes with a loop, starting with list->first. For each node you touch during that traversal, increment that count. When the loop is done, return that count.

Look at the code for several of the linked list functions, many of which using some variant of a linked list traversal. You’ll want to do the same, or similar. This code should work for any length list, including lists of length 0 and length 1. You’ll want to test the code in all these cases.

int lastOf(llist* list): Assume that the list it is given is not empty. It should dig through the list (traverse its nodes) looking for the last node in the list. It should then return its data value.

void deleteEnd(llist* list: This removes the last node in the list, should the list not be empty. You’ll probably want to handle two cases here. (1) if the list contains only one item, then you can just deleteFront. (2) Otherwise, we look for the penultimate node and set its next to nullptr. In both cases, we want to give back that last node’s storage to the heap with delete. If you look at my code for deleteFront, you’ll see my scheme for doing that.

We’ll discuss a solution of this together during lab, after you make your own attempt. The solution I use relies on a “follower” pointer. I sometimes call the traversal pointer the “scout” or the “lead”, because it precedes/leads the follower.

void remove(llist* list, int value): Search through the nodes of the given linked list, looking for the first occurrence of a node whose data component matches value. When you find that node, excise it from the list and call delete on its pointer. The code can be written to assume that the list contains value in some node.

For remove, you’ll want to write code that is a hybrid of the function contains, the function deleteFront, and the function deleteEnd. This is because, as you’ll discover you’ll want to handle two cases: the case where value equals the integer stored at list->first->data, and the case where its in one of the subsequent nodes. In the former case, you can just rely on deleteFront. In the case where value sits in a node beyond the first one, you’ll want to traverse down the list with leader and follower pointers, just like deleteEnd does, and look for whether leader->data holds value (somewhat similar to what needs to be done in contains).

But then, unlike any code we’ve written so far, you’ll want to change follower->next so that it skips over the leader node, and links one node beyond. You’ll then want to call delete on the node where value was found.


Supplemental Materials

This folder contains an implementation of a linked list library, as implemented in the two source files llist.hh and llist.cc. Neither of these files have a main in them. Instead, they can be compiled with other program source code that needs to use a linked list to structure its data. (This is often called the client code to the llist library we’ve invented.) This folder also contains a sample client program, the source file test_llist.cc that is simply a “driver” program that can be used to test the linked list code.

Running the code

The code in this repo can be compiled with the unix command line

g++ --std=c++11 -g -o test_llist test_llist.c llist.cc

and then run with the unix command line

./test_llist <values>

where <values> is either nothing, or a space-separated list of integers that get used as the contents of a linked list, first built by the program.

Once running, the program asks for a series of commands, ones that perform a suite of operations on that linked list. For example, here is an interaction made possible by the program:

$ ./test_llist 5 7 3
Your list is [5, 7, 3].
Enter a list command: insert end 42
Done.
Your list is [5, 7, 3, 42].
Enter a list command: delete front
Done.
Your list is [7, 3, 42].
Enter a list command: search 5
Not found.
Your list is [7, 3, 42].
Enter a list command: quit
Bye!
$

The set of commands it accepts is reported by the help command and is the following:

insert front <number>
insert end <number>
delete front
delete end
search <number>
help
quit

Debugging linked list code

There are several tricky things about writing C++ code that operates on with pointers. The most frequent bug in your code that you’ll encounter is a Segmentation fault. This arises when you attempt to access the data or next field of a node*, say for example, with expressions like list->first->data or current->next. In each (see the llist code) the expressions list->first and current are each of type node*, so they could each be pointing to a valid struct of type node. Or it’s possible that each could be a nullptr.

I’ve trained myself, when writing linked list code, to worry quite a bit any time my code accesses one of these fields. When I type ->data or ->next an alarm goes off in my coding mind, asking the question “Could this pointer be null??” I then work hard to either (a) convince myself that the line of code cannot face that situation, that I’ve eliminated the nullptr scenario by handling such cases with conditional statements (or, because the condition on my loop prevents that scenario within its body). Or (b) I realize that a nullptr is actually possible at that moment, and develop more complex code that handles that case.

Nevertheless, I still make mistakes, and so there are a few strategies I use for figuring out the cause of a segmentation fault. One useful strategy is to add output lines before and after lines of code that are suspect. You can, for example, output the value of a pointer using std::cout << just before a line that relies on a pointer being valid. If the output is the nullptr, with hexadecimal address 0x0, then I’ve found my bug.

Alternatively, most of you will have installed C++ and command-line console tools that provide a debugger. On most people’s systems, the debugger tool is called gdb. This is the debugger provided with the Gnu compiler suite (which includes g++). On many Mac OSX systems, the debugger is instead provided as part of the LLVM compiler tools, and the command is named lldb.

I’ll work to show these commands’ use in lecture and in lab, At a minimum, you can use a debugger in a limited way to at least identify the line of code where your segmentation fault is occurring. Below shows an interaction that results from me testing my llist library when it had a bug in my deleteEnd procedure. (I had forgotten to check if the list only had one item in it.)

Here is that interaction:

$ lldb test_llist
(lldb) target create "test_llist"
Current executable set to 'test_llist' (x86_64).
(lldb) run 5 7 3
Process 4586 launched: '/Users/jimfix/git/ReedCS2-S20/Lec03-3-examples/test_llist' (x86_64)
Your list is [5, 7, 3].
Enter a list command: delete end
Done.
Your list is [5, 7].
Enter a list command: delete end
Done.
Your list is [5].
Enter a list command: delete end
Process 4586 stopped
* thread #1: tid = 0x23d84, 
0x0000000100007ac8 test_llist`llist::deleteEnd(list=0x00000001003000b0) + 56 at llist.cc:76,
queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x0000000100007ac8
test_llist`llist::deleteEnd(list=0x00000001003000b0) + 56 at llist.cc:76
   73
   74      node* follower = list->first;
   75      node* leader = list->first->next;
-> 76      while (leader->next != nullptr) {
   77        follower = leader;
   78        leader = leader->next;
   79      }
(lldb) print leader
(llist::node *) $0 = 0x0000000000000000
(lldb) quit
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] Y
$
 

In the above, I loaded the program into the debugger and ran it with run 5 7 3. This is equivalent to just running the program in the console with this command line

$ ./test_llist 5 7 3

except it gets watched over by the debugger. After a few delete end commands, making the list have only one element, my code then crashed in the middle of deleteEnd at line 76 in llist.cc. I next printed the value of the pointer leader and discovered that it was a null pointer.

You can do similar work while developing your solutions. The key thing that we did to enable this debugging was to compile the code with the debug flag -g. This produces a different executable, one that’s filled with extra code information that both gdb and lldb can use to give better feedback on your running program.


Separate compilation: “header files” versus source files

Take a careful look through the starter code in the three files llist.cc, llist.hh, and test_llist.cc. What you are seeing is a more elaborate, and more careful, modularization of a program that interacts with its user, manipulating a linked list data structure. Rather than have one large program source file, I’ve instead broken up the C++ code into the two files llist.cc and test_llist.cc. I’ve also then written a header file llist.hh which defines the linked list data structure and its operations.

This header file is needed for several reasons, but overall because of the way the C++ compiler works. It compiles each .cc file on its own, and then combines that separately compiled code (the combining of the code is called linking) into a single executable. I’ll describe more how the header file is used, but the short of it is that this header file contains information about the llist structs and functions that are needed during the compilation of both llist.cc and test_llist.cc.

The idea here is that the llist.hh and llist.cc files could be combined with any program that needs a linked list, not just the tester code. They are not specially written just for test_llist and can be seen as a library that the tester relies on.

You’ll notice, further, that my library code in llist.cc and llist.hh define a namespace named llist that serves as the prefix for all the type and function names it defines. So that means that the test program defines a variable like so:

llist::llist* theList;

to talk about a pointer to a llist struct named theList. And it calls functions like so

llist::insertAtFront(theList,what);

to invoke the ones named and defined in llist.cc.

The llist::llist and llist::node structs are defined separately in the header file named llist.hh and the test program and the implementation code each need to use those definitions, the files test_llist.cc and llist.cc each have a line on the top reading

#include "llist.hh"

This has the effect of asking the compiler to load in those definitions when it is compiling that C++ source code. That is, when the compiler is looking through the code in llist.cc, it needs to know the definitions that are provided in llist.hh, and so they are #include-ed at the top. And when the compiler is reading the code for test_llist.cc, it needs those definitions from llist.hh as well, and so they get stitched in by the #include directive at the top of its file.

Finally, you’ll notice that the llist.hh file has the three special #ifndef, #define, and #endif directives surrounding the struct and function declarations. This are needed due to some technical idiosyncracies of how C++ compilation works, inherited from C. They are used to make sure that no compilation #includes this information more than once.

We’ll continue to write C++ source code in this modular fashion.

Makefiles

Speaking of separate compilation, maybe now is the best time to introduce a useful tool for performing compilation. I suppose it’s possible, at this point in the course, that you are getting tired of typing lines like

g++ -std=c++11 -g -o pgm src1.cc src2.cc

or else tired of pressing the UP ARROW to find that long command that you’ve been tired of typing. You might also wonder: What if I was part of a much larger project and it had tons of files that needed to be compiled? Your answer might be Oh I bet they use a fancy IDE that does everything for them. and that may even be correct, but that IDE has to be configured for the project, and for many programmers this aspect of project construction remains a mystery. (IDE means Integrated Development Environment, an application that has editing, compiling, debugging, testing, building, etc. wrapped up into one Swiss-army-knife like tool.)

There is a command-line tool that is used by lots of developers, called make, and using it lays bare the things that an IDE has to manage, or how an IDE gets configured. The tool is used, typically, two ways. When it’s set up right, and you are working on a program, you type the command

make

and the program you’re working on gets built. Or, if you are working on a larger project and it has several targets (i.e. several programs or libraries that can be built with the code), you might instead type

make test_llist

to specifically build that target executable.

Now, this may seem like some sort of magic, but there is some method behind it, namely, make relies on a configuration file, called a make file or makefile, that contains a specification of how each target gets built. That file normally just sits in the same folder as the source (though it doesn’t actually have to) and is normally named Makefile (though it doesn’t have to be named that). A makefile is just a text file that gives the set of commands that need to be issued in order to construct the target programs of a project.

For example, for this source we could have the following makefile lines

test_llist: test_llist.cc llist.hh llist.cc
        g++ -g -std=c++11 -o test_llist test_llist.cc llist.cc

These lines describe a target program: test_llist. It sits at the far left of a line, immediately followed by a colon character test_llist:. That is how you tell make that you’re describing the rules for constructing that target file. Then, on the next line (or even lines) you have a tabbed line (a line that starts with an actual '\t' character resulting from pressing the [Tab] key on the keyboard) with the console command that produces that target.

Here is a different example that I could have used instead

test_llist: test_llist.cc llist.cc llist.hh
        g++ -g -std=c++11 -c llist.cc
        g++ -g -std=c++11 -c test_llist.cc
        g++ -g -std=c++11 -o test_llist test_llist.o llist.o
    rm test_llist.o
    rm llist.o

Though we haven’t talked about this much in class, it’s possible to produce intermediate compilation files, called dot oh files or object files, or simply program objects. These are the genuine result of honest-to-goodness “separate compilation” (truth be told: my section above avoided talking about separate compilation directly). You can compile a program’s C++ source file into its own object file like so

g++ -c src.cc

and this will produce its object file src.o. And then, once you’ve got a bunch of object files, you can link them together with the command

g++ -o pgm src1.o src2.o src3.o etc.o

and this will create your executable ./pgm. And then, once that program executable is built, these object files can be deleted (their actual contents live, with all the other object files’ contents, within the contents of pgm). So it’s okay to then type

rm etc.o

and remove those files.

This all, then, is the full explanation for the longer makefile entry I just showed you above. When someone types the command make test_llist, then each of those command lines gets entered in turn. (It’s even a little better than that: if an error happens due to some line’s command failing, the subsequent lines won’t get run.)

I haven’t quite demystified the makefile entries completely. It’s possible to also specify a targets dependencies. These are the files that are needed to build the target, the files that the target depends on. If someone were to change llist.hh, for example, the make system knows that it needs to rebuild test_llist because of that change. So the line

test_llist: test_llist.cc llist.hh llist.cc

tells make that test_llist relies on those three files for it to be built. It also means that, if you change any one of those files, that it needs to be rebuilt (and, of course, the line below tells it how to build it). It knows this by the time stamp of each of the files. If test_llist is older than llist.cc, for example, then typing

make test_llist

will get make re-make that target. If instead the target has the latest time stamp of all those files, then typing make test_llist leads to no action on make’s part:

$ make test_llist
make: 'test_llist' is up to date.

This, finally, is enough set-up to describe the structure of many typical Makefiles. Here is a third way of describing a target, as a series of entries:

test_llist.o: test_llist.cc llist.hh
        g++ -g -std=c++11 -c test_llist.cc
    
llist.o: llist.cc llist.hh
        g++ -g -std=c++11 -c llist.cc

test_llist: test_llist.o llist.o
        g++ -g -std=c++11 -o test_llist test_llist.o llist.o

What do these lines tell make? These say that if a header file, or the specific C++ source file that uses it, changes, then that object file needs to be rebuilt. And it says that, should any of the object files be rebuilt, than the executable that is made up of them needs to be relinked with their latest versions. That way, if we changellist.hh, both.ofiles need to be recompiled. But, if we just changellist.cc, then onlyllist.oneeds to be recompiled. Then, in both scenarios, thetest_llist` executable needs to be re-constructed with any rebuilt objects.

Lastly Now, if you look at the Makefile I’ve actually provided, there are a few other features (and in later projects I will show you even more). It turns out, for example, if you type just make then the tool will scan Makefile from the top and look for the first target entry. That is treated as the default target. So people often put a first entry like

all: test_llist

This will have no command lines underneath, and doesn’t actually describe the name of a target file. Nevertheless, if you type

make all

or just make (with all being the first/default), then the tool will do the work of building those three programs (because after all we told make that the target all depends on them) and looks for their target entries, builds each of them accordingly.

And then, typically, at the bottom of a makefile people include the lines

clean:
       rm *.o *~ test_llist

so that they can type make clean to clear out all the cruft of editing and compiling (the automatically saved ~ files, the object files, the compiled executable) to have a fresh compilation space.

There are probably lots of resources and examples for make and Makefile construction available on-line. The description and examples above are from my own experience, and also from the tutorial by Bruce Maxwell at Colby College.