17. Linked lists

On of the more interesting qualities of an object is that an object can contain a reference to another object of the same type. There is a common data structure, the linked list, that takes advantage of this feature.

Linked lists are made up of nodes, where each node contains a pointer to the next node in the list. In addition, each node contains a unit of data we will call the cargo. In our first example, the cargo will be a single integer, but later we will write a generic data structure that can contain objects of any type.

17.1. Revenge of the Node

As usual when we write a new object, we’ll start with the instance variables, one or two constructors and to_str so that we can test the basic mechanisms of creating and displaying the new type.

struct Node {
    int cargo;
    Node* next;

    Node() {
        cargo = 0;
        next = nullptr;
    }

    Node(int cargo) {
        this->cargo = cargo;
        next = nullptr;
    }

    Node(int cargo, Node* next) {
        this->cargo = cargo;
        this->next = next;
    }

    string to_str() const {
        return to_string(cargo);
    }
};

The declarations of the instance variables follow naturally from the specification, and the rest follows machanically from the instance variables. We should also note that we are intentionally using struct instead of class, so that our instance variables are public by default (see What is a class?).

To test the implementation so far, we would put something like this in main:

Node* node1 = new Node(1, NULL);
cout << node1->to_str() << endl;

The result is simply:

1

To make it interesting, we need a list with more than one node!

Node* node1 = new Node(1, NULL);
Node* node2 = new Node(2, NULL);
Node* node3 = new Node(3, NULL);

This code creates three nodes, but we don’t have a list yet because the nodes are not linked. The state diagram looks like this:

Unlinked nodes

To link up the nodes, we have to make the first node refer to the second, and the second node refer to the third.

node1->next = node2;
node2->next = node3;

The next variable of node3 is still NULL, which indicates it is the end of the list. Now the state diagram looks like:

Linked nodes

Now we know how to create nodes and link them into lists. What might be less clear at this point is why. We’ll take that up in the next section.

17.2. Lists as collections

The thing that makes lists useful is that they are a way of assembling multiple objects into a single entity, sometimes called a collection. In this example, the first node of the list serves as a pointer to the entire list.

If we want to pass the list as a parameter, all we have to pass is a pointer to the first node, as in the following function:

void print_list(Node* list) {
    Node* node = list;
    while (node != NULL) {
        cout << node->cargo;
        node = node->next;
        if (node != NULL) cout << ", ";
    }
}

We can now call this function, passing it a pointer to the first node in our list:

print_list(node1);

Inside print_list we have a pointer to the first node of the list, but there is no variable that refers to the other nodes. We have to use the next value from each node to get to the next node.

This diagram shows the value of list and the values that node takes on:

Traversing nodes

This way of moving through a list is called a traversal, just like the similar pattern of moving through the elements of a vector. It is common to use a loop variable like node to point to each of the nodes in the list in succession.

The output of this function is:

1, 2, 3

17.3. Lists and recursion

Recursion and lists go together like fava beans and a nice Chianti. For example here is a recursive algorithm for printing a list backwards:

  1. Separate the list into two pieces: the first node (called the head) and the rest (called the tail).

  2. Print the tail backwards.

  3. Print the head.

Of course, step 2, the recursive call, assumes that we have a way of printing a list backwards. But if we assume that the recursive call works – the leap of faith – then we can convince ourselves that this algorithm works.

All we need is a base case, and a way of proving that for any list we will eventually get to the base case. A natural choice for the base case is a list with a single element, but an even better choice is the empty list, represented by NULL.

void print_backward(Node* list) {
    if (list == NULL) return;

    Node* head = list;
    Node* tail = list->next;

    print_backward(tail);
    if (head->next != NULL) cout << ", ";
    cout << head->cargo;
}

The first line handles the base case by doing nothing. The next two lines split the list into head and tail. The last three lines print the list. Take a minute to think about why we have the if statement that prints ", " before the line that prints cargo. If you understand this, you are well on your way to developing your recursive intution.

We call this function exactly as we did print_list:

print_backward(node1);

The result is a backward list:

3, 2, 1

Can we prove that this function will always terminate? In other words, will it always reach the base case? In fact, the answer is no. There are some lists that will make this function crash.

17.4. Infinite lists

There is nothing to prevent a node from pointing back to an earlier node in the list, including itself. For example, this figure shows a list with two nodes, one of which points to itself.

Infinite list

If we call either print_list with this list as an argument, it will loop forever. If we call print_backward it will recurse infinitely. This sort of behavior makes infinite lists difficult to work with.

Nevertheless, they are sometimes very useful. For example, we might represent a number as a list of digits and use an infinite list to represent a repeating fraction.

Regardless, it is problematic that we cannot prove that print_list and print_backward terminate. The best we can do is the hypothetical statement, “If the list contains no loops, then these functions will terminate.” This is another example of a precondition, to which we were introduced in the Preconditions section of an early chapter. As with the earlier example, it imposes a constraint on one of the parameters and describes the behavior of the function if the constraint is satisfied. We will see more examples of this soon.

17.5. The fundamental ambiguity theorem

There is a part of print_backward that might have raised an eyebrow:

Node* head = list;
Node* tail = list->next;

After the first assignment, head and list have the same type and the same value. So why did we create a new variable?

The reason is that the two variables play different roles. We think of head as a pointer to a single node, and we think of list as a pointer to the whole list. These “roles” are not part of the program; they are in the mind of the programmer.

This ambiguity is useful, but it can make programs with lists difficult to read. It is a good practice to use variable names like node and list to communicate how we intend to use them. Such self-documenting code is easier to read and easier to maintain.

We could have written print_backward without head and tail, but it makes it harder to understand:

void print_backward(Node* list) {
    if (list == NULL) return;

    print_backward(list->next);
    cout << list->cargo;
}

Looking at the two function calls, we have to remember that print_backward treats its argument as a list and cout treats it as a single node.

Always keep in mind the fundamental ambiguity theorem:

A variable that points to a node might treat the node as a single object or as the first in a list of nodes.

17.6. Member functions for nodes

You might have wondered why we didn’t write print_list and print_backward as member functions of Node. We said back in Objects and functions that any free-standing function taking an object as an argument could also be written as a member function of that object. It’s just a question of which form is cleaner.

In this case there is a legitimate reason to choose free-standing functions. It is legal to send a NULL as an argument to a free-standing function, but it is not legal to invoke a member function on a NULL object.

Node* node = NULL;
print_list(node);    // legal
node.print_list();   // WRONG! - Segmentation fault (core dumped)

This limitation makes it awkward to write list-manipulating code in a clean, object-oriented style. A little later we will see a way to get around this.

17.7. Modifying linked lists

One obvious way to modify a linked list is to change the cargo of one of its nodes, but the more interesting operations are the ones that add, remove or reorder the nodes.

As an example, we’ll write a function that removes the second node in the list and returns a pointer to it.

Node* remove_second(Node* list) {
    Node* first = list;
    Node* second = list->next;

    // make the first node point to the third
    first->next = second->next;

    // remove the second node from the list and return a pointer to it
    second->next = NULL;
    return second;
}

We are again using temporary variables to make the code more readable.

Note

Here be dragons!

Care must be taken here not to leak memory, as we discussed in the Dynamic memory allocation and memory leaks section. If the return value from this function is not assigned to a variable by the caller, the memory used for the second node will be leaked.

Adding these lines to main will enable us to test it:

print_list(node1);
Node* second_node = remove_second(node1);
print_list(second_node);
print_list(node1);
cout << endl;

Giving us the output:

1, 2, 3
2
1, 3

Here is a state diagram showing the effect of this operation.

Remove second node from list

What happens if we call this function with a list with only one element (a singleton)? What happens if we pass it an empty list? Should there be a precondition for this function? You’ll be asked to handle these two cases as an exercise.

17.8. Wrappers, helpers, and function pointers

For some operations it is useful to divide the labor into two functions. For example, we can print a linked list forward with print_list, and backward with print_backward, and in each case we may want to “pretty print” the list in the conventional format, enclosed in parenthesis, giving us (1, 2, 3) and (3, 2, 1).

Rather than violating the DRY principle by implementing this pretty printing twice, let’s create a wrapper function named pretty_print that has a list and a function pointer as parameters.

void pretty_print(Node* list, void (*list_printer)(Node*)) {
    cout << "(";
    list_printer(list);
    cout << ")" << endl;
}

The syntax for creating a pointer to a function is:

RETURN_TYPE (*FUNCTION_POINTER_VARIABLE_NAME)(FUNCTION_PARAMETERS)

This wrapper function is an example of what is called a helper function, since breaks down the parts of a computation (in this case, printing parenthesis and printing the list) into separate functions which can be reused.

We can now call pretty_print with the addresses of the functions for printing forward and backward as arguments.

pretty_print(node1, &print_list);
pretty_print(node1, &print_backward);

When we run this we get:

(1, 2, 3)
(3, 2, 1)

17.9. The LinkedList class

There are a number of subtle problems with the way we have been implementing linked lists. The biggest one is the problem we encountered in Member functions for nodes when we considered adding the print functions as member functions to Node.

We can solve this problem by wrapping Nodes in a LinkedList class.

class LinkedList
{
  public:
    int num_nodes;
    Node* head;

    LinkedList() {
        num_nodes = 0;
        head = NULL;
    }
}

The LinkedList class gives us a natural place to put member functions, like the following, that adds a new Node to the front of the list:

void LinkedList::insert_at_front(int cargo) {
    Node* front = new Node(cargo, head);
    head = front;
    num_nodes++;
}

You will be asked to develop this LinkedList class further in the exercises, but first we want to generalize it so that it can be used with different types of data.

17.10. Templates

One drawback of our LinkedList implementation is that the data (stored in the cargo instance variable of Nodes) is limited to a single type, int in our present case, even though it probably occurred to you that we might want linked lists with various types of data.

While we could implement a different linked list class for each data type, this would be tedious, and it would violate the DRY principle we discussed earlier.

Fortunately, C++ provides class templates, which give us a solution to this problem.

Templates provide a way to specify linked lists as a generic data structure, meaning they can hold a variety of data types. Using templates on our Node class looks like this:

template <class T>
class Node
{
  public:
    T cargo;
    Node<T>* next;

    Node(T cargo, Node<T>* next) {
        this->cargo = cargo;
        this->next = next;
    }
};

We begin with a new keyword, template, followed by <class T>, where T represents the datatype that will be used to instantiate the templated class. So our instance variable, cargo is given datatype T, and the next pointer is declared to point to a Node<T>.

We can repeat the process of creating an explicitely linking together three nodes and than printing out their contents, this time with nodes containing string data, in a file named test_templated_nodes.cpp:

#include <iostream>
#include <string>
#include "LinkedList.h"
using namespace std;

int main()
{
    Node<string>* node1 = new Node<string>("I", NULL);
    Node<string>* node2 = new Node<string>("love", NULL);
    Node<string>* node3 = new Node<string>("templates!", NULL);

    node1->next = node2;
    node2->next = node3;

    Node<string>* node = node1;

    while (node != NULL) {
        cout << node->cargo << endl;
        node = node->next;
    }

    return 0;
}

The next step is to use templating for our new LinkedList class.

template <class T>
class LinkedList
{
  public:
    int num_nodes;
    Node<T>* head;

    // constructor
    LinkedList() {
        num_nodes = 0;
        head = NULL;
    }

    // modifiers
    void insert_at_front(T cargo) {
        Node<T>* front = new Node<T>(cargo, head);
        head = front;
        num_nodes++;
    }

    T remove_from_front() {
        if (head == NULL)
            throw runtime_error("Can't remove from empty list!");
        T cargo = head->cargo;
        Node<T>* front = head;
        head = head->next;
        delete front;
        num_nodes--;
        return cargo;
    }
};

The instance variable head now points to the templated node, Node<T>*, as do the nodes we create in insert_at_front and remove_from_front. We are careful to free up the released memory in remove_from_front using the delete operator, as we discussed in Dynamic memory allocation and memory leaks. To do this we need to make use of a temporary variable, front. It points to the old head after we advance head to next, so we can then delete it.

We also have to decide what to do when remove_from_front is called on an empty list. For now, we are using the C++ throw command to report an error. This will cause the program to halt, so client code should take pains to avoid calling remove_from_front on an empty list. We’ll discuss this further in the Exceptions section next chapter.

A more complete listing of code for this class is given in Appendix A: Code Source, including another constructor, and member functions to return its length and render it as a string. You will be asked to add several additional member functions to it in the exercises.

17.11. Invariants

Some linked lists are “well-formed,” others are not. For example, if a list contains a loop, it will cause many of our functions to crash, so we might want to require that lists contain no loops. Another requirement is that the num_nodes value in a LinkedList object should be equal to the actual number of nodes in the list.

Requirements like these are called invariants because, ideally, they should be true of every object all the time. Specifying invariants for objects is a useful programming practice because it makes it easier to prove the correctness of code, check the integrity of data structures, and detect errors.

One thing that is sometimes confusing about invariants is that there are some times when they are violated. For example, in the middle of insert_at_front, after we have added the node, but before we have incremented num_nodes, the invariant is violated. This kind of violation is acceptable; in fact, it is often impossible to modify an object without violating an invariant for at least a little while. Normally the requirement is that every function that violates an invariant must restore the invariant before it finishes.

17.12. Glossary

linked list

A data structure that implements a collection using a sequence of linked nodes.

node

An element of a list, usually implemented as an object that contains a pointer to another object of the same type.

cargo

An item of data contained in a node.

A pointer to a node embedded in a node.

generic data structure

A kind of data structure that can contain data of any type.

17.13. Exercises