More on Input Top Recursion Contents


Recall that what we are calling arrays are called lists by the Python community25. Now we will introduce true lists; these lists will be our first attempt at building a data structure. A data structure is simply a bundling of pieces of data together in a single entity along with a set of operations (implemented using functions) to manipulate such entities. Common operations for data structures include putting data into the structure and taking data out. Moreover, the data structure and the functions which implement the operations are constructed so that the operations do not take an inordinate amount of time as more and more data is placed into the structure.

The Node Data Structure

Our lists will use an intermediate data structure known as a node. A node is a simple structure that holds two pieces of data. The first data item is some arbitrary value, while the second is another node. Graphically, nodes are represented as a box with two arrows. The downward pointing arrow typically points to the value while the rightward pointing arrow points to other node.

In actuality, the data items are the addresses in memory where the values and nodes reside; such memory addresses are known as pointers. Thus, the graphical representation of the node is quite accurate, with the arrows "pointing" to the data items. The downward arrow is known as the value pointer and the rightward arrow is known as the next node pointer or simply next pointer.

All data structures are built upon other data structures, so there must be some low-level data structure that is built into the language. Python has two such built-in structures, arrays and objects26. Objects would be a good choice as a structure upon which to build nodes, but you don't know about objects yet, so we will use arrays instead. You will learn more about objects in a later chapter and in subsequent classes.

Each of our nodes will actually be an array of length two. The first slot in the array will hold the value pointer and the second will hold the next pointer. Now that we have our structure, we will need to define a set of operations for nodes. The first one we define is a function to create a node. Such a function is known as a constructor:

    def NodeCreate(value,next):     #this is the constructor
        return [value,next]

Note that the constructor takes two arguments, the value and the next pointer and simply returns an array holding those two values. Next, we define two operations for retrieving the data items stored in the node:

    def NodeValue(n):
        return n[0]

    def NodeNext(n):
        return n[1]

The node from which we wish to extract information is passed as an argument in calls to these two functions. The definitions rely on the fact that the value is stored in the first slot and the next pointer is stored in the second slot of the array that constitutes a node. Typically, functions that retrieve values or other information from a data structure are known as accessors or getters.

Finally, we define operations for changing the data in a node:

    def NodeSetValue(n,value):
        n[0] = value

    def NodeSetNext(n,next):
        n[1] = next

Such functions are typically known as mutators or setters, since the change the data inside the structure. Note that they implement the procedure pattern, as they do not compute any new values.

With these definitions in place, we can now make nodes and change their values. We will use the value None to indicated that the next pointer of a node should not be followed. Such pointers-which-must-not-be-followed are generically known as null pointers.

    a = NodeCreate(3,None)
    b = NodeCreate(4,None)
    print("a's value is",NodeValue(a))
    print("b's value is",NodeValue(b))
    print("a's value now is",NodeValue(a))

Executing this code yields the following output:

    a's value is 3
    b's value is 4
    a's value now is two

Continuing on from the previous bit of code, we can even join nodes together:

    NodeSetNext(a,b)        # link!
    print("b's value, through a is",NodeValue(NodeNext(a)))
    print("a is",a)

The first statement sets the next pointer of a to b; in other words, the null pointer of node a was replaced by a pointer to node b. Afterwards this statement, a and b are said to be linked together, as in a chain.

Indeed, a group of nodes linked together via next pointers is known as a linked list. The next pointer of the last node in a list of nodes is always null. The output of the first print statement conforms to expectation; we see the value of the node a's next pointer points to, namely b, is 4:

    b's value, through a is 4

The output of the second print statement reveals that nodes are indeed arrays:

    a is [2, [4, None]]

In fact, if we replace b's next pointer, currently None, with a new node and then print a again:

    print("a is",a)

we start to see why nodes can be used to form lists:

    a is [2, [4, [6, None]]]

Note that the value 6 can only be reached through a or through b. There is no variable like a or b that points directly to that node.

The Linked-List Data Structure

Now we could stop at this point and just use nodes and their operations to make true lists, but note that the node operators freed us from knowing that nodes were based upon arrays. If you look closely at the above code where we tested our node operations, you will see that there is absolutely no clue that nodes are built from arrays. In fact, we only got an inkling of the internal structure of a node when we printed a node directly using print, which is not a node operator. Likewise, we can build a list data structure based upon nodes and be freed from both knowing lists are built from nodes and that nodes are built from arrays.

The first list operation we will define is the constructor:

    def ListCreate(*args):
        items = None
        for i in range(len(args)-1,-1,-1): # work from right to left
            items = join(args[i],items)
        return items

This function has a bit of Python voodoo in it. The first bit is the asterisk before the single formal parameter, arg. This means that any number of arguments can be passed to ListCreate and all of them will be bundled up into an array that is bound to the variable args. The second bit is the funny looking range(len(args)-1,-1,-1), which produces the indices of all the elements in args in reverse order. This causes the elements of args to be joined up in reverse. The reversal is necessary because the first value that is joined ends up at the back of items27.

When ListCreate is called with no arguments, the constructor returns an empty list:

    >>> ListCreate()

Thus an empty list is represented by None. If we pass a number of arguments, we see output similar to when we joined a bunch of nodes together:

    >>> ListCreate(2,"six",False)
    [2, ['six', [False, None]]]

Therefore, ListCreate can take any number of values to populate a newly created list28.

The last bit of new stuff in the body of the ListCreate function is the reference to a join function:

    def join(value,list):
        return NodeCreate(value,list)

The join function takes a value and a list as arguments and returns a new node that glues the two arguments together. The result is a list one item larger than the list that is passed in.

We can see from the definition of ListCreate that a list either has the value None, if there were no arguments, or the result of join, if there were. Since join returns a node, then a list is actually a node (or None if the list is empty). You may be wondering at this point why we bother to distinguish lists and nodes. One reason is that lists in other programming languages aren't quite so easy to implement as in Python and, in those languages, nodes are quite distinct from lists. Thus, we want you to be ready for those languages. Another reason is it is important to practice the concept of abstraction. Modern software systems exhibit a large number of abstraction layers. With our lists, we can see three layers of abstractions: arrays, which abstract some underlying machine code, nodes, which abstract arrays of length two, and lists, which abstracts a chain of nodes. Each level of abstraction frees us from thinking about the details of the underlying layers.

Let us now look at the join operation more closely. Here is a rewrite that uses a temporary local variable, n, to hold the new node that will join the given value and list together:

    def join(value,list):
        n = NodeCreate(value,None)
        return n

First, let us create a list to which we will join a value:

    a = ListCreate(42)

Graphically, the situation looks like this:

Now, we call join to join the value of 13 to the list a.

    a = join(13,a)

At the point of the #before comment in join, the formal parameters value and list have been set to 13 and a, respectively. The situation looks like this:

The variables list and value, since they are local to the function join, are shown in blue29. After the step n = NodeCreate(value,None), marked with the comment #during, a new node with the given value is created and bound to the variable n. The situation changes to:

The next step sets the next pointer of n to point to the given list, via the statement NodeSetNext(n,list):

Finally, the value of n is returned and a is assigned this new value:

From the drawing, we can see that list a now includes the value 13 at the front of the list, as intended. Of course, at this point the variables local to join, shown in blue, are no longer in scope.

Adding values to the front of a list is known as prepending, so join30 prepends a value to a list. Instead of adding new values to the front, we can also add values to the back of the list. Such a procedure is termed appending, but we will restrict appending to lists proper. In other words, we will only append a list to an existing list:

    def append(listA,listB): #listA length must be > 0
        node = listA
        while (NodeNext(node) != None):
            node = NodeNext(node)

Here, we set the variable node to the beginning of listA. Then, as long as node's next pointer points to a node and not None31, we set node to the next node in the list. We repeat this process until we reach the last node in the list, whose next pointer does indeed point to None. At this point, we drop out of the loop and set the next pointer of this last node to listB32. Generally, you can quickly tell if a function is destructive or non-destructive by looking at the return value. If the function implements the procedure pattern, it is likely destructive. If it implements the function pattern, it is likely non-destructive33.

Here is an example:

    >>> a = ListCreate(2,4,6)
    >>> b = ListCreate(1,3,5)
    >>> append(a,b)
    [2, [4, [6, [1, [3, [5, None]]]]]]

    >>> NodeNext(a)
    [4, [6, [1, [3, [5, None]]]]]

If we wish to append a single value to a list, we must turn the single value into a list first:

    >>> a = ListCreate(2,4,6)

    >>> append(a,ListCreate(8))
    [2, [4, [6, [8, None]]]]

The process of moving from one node to the next in a list is known as walking the list. Obviously, the longer the list, the longer it takes to walk it. Thus, appending to a list can take much longer than prepending to the same list, since prepending takes the same amount of time regardless of how long the list is. However, as you will learn in your data structures class, there is a much faster way to append to a list. If you cannot wait to find out this clever implementation of append, search for the phrase "linked list tail pointer" on the interwebs.

Once we are able to insert values into a list, we now need functions to look at the values in the list. Two common operations of a list are head, which retrieves the first item in the list, and tail, which returns a the list made up of the all the nodes in the original list except the first node:

    def head(items):
        return NodeValue(items)

    def tail(items):
        return NodeNext(items)

As you can see, these functions are wrappers to the underlying node functions, but are used frequently because they are smaller names with generally universally understood semantics. Often, wrappers to the functions that change node values and pointers are defined as well:

    def setHead(items,value):

    def setTail(items,next):

Here is an example:

    >>> a = ListCreate(1,8,7,2)

    >>> a
    [1, [8, [7, [2, None]]]]
    >>> head(a)
    >>> tail(a)
    [8, [7, [2, None]]]
    >>> head(a)
    >>> head(tail(a))
    >>> setHead(tail(a),0)
    >>> a
    [1, [0, [7, [2, None]]]]

Note that taking the tail of a list does not alter the original list in any way34.

Finally, the last two operations we will discuss retrieve and set the values of nodes anywhere in the list. To specify which node is of interest, the caller of these functions will supply an index. This index will function exactly like the zero-based index of arrays; index zero will refer to the first value, index one will refer to the second value, and so on. The function ListIndex gets a value at the given index and ListSetIndex replaces the old value at the given index with the given value. Both of these functions will use a helper function named ListIndexNode, which will return the node at a given index35:

    def ListIndexNode(items,index):
        node = items
        for i in range(0,index,1):
            node = tail(node)
        return node

    def ListIndex(items,index):
        node = ListIndexNode(items,index)
        return head(node)

    def ListSetIndex(items,index,value):
        node = ListIndexNode(items,index)

Note that the ListIndexNode function walks the list for ListIndex and ListSetIndex. Note further that this implementation does no error checking. What happens if the index is greater than or equal to the number of nodes in the list?

Rather than have you type all this code in, the above node and list operations are bundled into a single module, named You can get the module with this command:


on Linux and this command:

    curl -O

on a Mac. For Windows, browse to:

and save the page.

These versions of generates better error messages for ListIndex and ListSetIndex.

More on Walking lists

This idea of walking lists is an extremely important concept; so important that we will review the two "walks" that are commonly needed when manipulating lists. The first is walking to a certain location in list. Usually, one walks to index (covered previously) or to a value. Of course, walking to a value is simply the search pattern, implemented for lists. Here is a search function that works on lists:

    def find(value,items):
        spot = items;
        while (spot != None and head(spot) != value):
            spot = tail(spot)
        return spot

Note that this version returns the node containing the value if the value is found and None otherwise. Because of the possibility that a value may not be in the list, we must add the condition spot != None to the while loop test. If we didn't, then spot would eventually reach None and the head operation on None would generate an error. Sometimes, find would be written with a simpler loop test, but complicated by a return from the body of the loop:

    def find(value,items):
        spot = items;
        while (spot != None):
            if (head(spot) == value): return spot
            spot = tail(spot)
        return None

These two implementations are semantically equivalent; the former is usually preferred stylistically, but the latter is likely more common.

The second walk one is likely to encounter is a walk to the next-to-the-last node in a list, which is useful in many situations: removing the last item, adding a value to the end of a list quickly (again, search for "linked list tail pointer" for details), and so on.

One must walk the list to get to the penultimate node. Here is one attempt; it keeps two pointers when walking the list, a leading pointer and a trailing pointer that stays one node behind the leader. The walk ends when the next pointer of the leading node becomes null. We call this pattern the trailing value pattern:

    def getPenultimateNode(items):
        trailing = None;
        leading = items;
        while (tail(leading) != None): #when to stop walking
            trailing = leading 
            leading = tail(leading)
        return trailing

If we walked until the lead node became null, then the trailing node would be the last node in the list, not the next-to-the-last node. However, checking the next pointer of a node is a dangerous proposition. What happens in the case of an empty list? How many nodes must be in a list for this function to work properly? Highlight the following line to see the answers:

[ When passed an empty list, the tail operation in the loop test will generate an error. Therefore, there must be at least one node in the list.]

The above approach can be simplified by checking if the trailing pointer's next pointer points to a node whose next pointer is null. Although the check is a bit more complicated, it obviates the need for the leading node:

    def getPenultimateNode(items):
        trailing = items;
        while (tail(tail(trailing)) != None):
            trailing = tail(trailing)
        return trailing

The two versions of getPenultimateNode are similar, but not exactly equivalent. How many nodes must be in a list for the second version to work properly? Highlight the following line to see the answers:

[ There must be at least two nodes in the list.]

A Walking Application

An application of the trailing value pattern is to insert a value into an ordered list so that the list remains ordered36. For example, if the ordered list is:

           ^  ^

and we wish to insert 10 into the list, we must place the 10 between the 8 and the 13 so that the list values remain in increasing order (from left to right). The ^ marks show where the trailing and leading pointers should end up when the walk is finished. If the 10 is inserted between the trailing and leading pointers, we end up with:


as desired. Using getPenultimateNode as a starting point, we have:

    def insertInOrder(value,items):
        trailing = None;
        leading = items;
        while (...): #when to stop walking
            trailing = leading 
            leading = tail(leading)
        # insert new value in between trailing and leading
        return "done"

The ellipses mark where changes to the trailing value pattern need to be made. The first ellipsis (the while test) should force the walk to stop when the correct insertion point is found. Clearly, that is when leading value is greater than the value to be inserted. The second ellipsis concerns how the actual insertion is to be performed. We know we will need to:

All that's left then is to fill in the blanks. Here is the result:

    def insertInOrder(value,items):
        trailing = None;
        leading = items;
        while (head(leading) < value): #when to stop walking
            trailing = leading 
            leading = tail(leading)
        # insert new value in between trailing and leading
        n = NodeCreate(value,None)
        return "done"

Note that we wanted to stop when the leading value is greater than the value to be inserted. Since we are working with a while loop, we need to reverse the logic so that the walk continues as long as the leading value is less than the new value37.

Testing our code with the above example yields:

    >>> a = ListCreate(1,3,8,13,14,17)
    >>> insertInOrder(10,a)
    >>> a

It appears our code works correctly! Very exciting! Unfortunately, the excitement of getting code to work seduces both novice and experienced Computer Scientists alike into thinking the task is finished. Many times, including this case in particular, initial success only means the basic idea is correct, not that the code is correct, for all possible inputs. Indeed, for our implementation of insertInOrder, there are edge cases for which this code fails. An edge case is a set of inputs that force the code to do as much (or as little) work as possible. For insertInOrder, what inputs causes the function to do as much work as possible? The only place where a variable amount of work is performed is the loop that walks the list. In order to make the loop go as many times as possible, it is clear that we would need to insert a value at the very end of the list. Doing so yields:

    >>> a = ListCreate(1,3,8,13,14,17)
    >>> insertInOrder(20,a)
    Traceback (most recent call last):
      File "", line 15, in <module>
        print("inserting 18:",insertInOrder(20,a))
      File "", line 6, in insertInOrder
        while (head(leading) < value): #when to stop walking
      File "/home/lusth/l1/book/", line 34, in head
        return NodeValue(items)
      File "/home/lusth/l1/book/", line 12, in NodeValue
        def NodeValue(n): return n[0]
    TypeError: 'NoneType' object is not subscriptable

Likewise, making the loop do as little work as possible yields errors in the code. What inputs would cause the loop run very little or not at all? Obviously, if the value to be inserted is smaller than any value in the list, the loop does not need to be run. Less obviously, what if the given list is empty? Fixing insertInOrder to handle these edge cases is left as an exercise.

Processing Lists versus Arrays

Lists can be processed with loops, much like arrays. In fact, there is a direct translation from an array loop to a list loop. Suppose you have the following array loop:

    for i in range(0,len(items),1):     # items is an array
        # loop body here

The corresponding list loop would be:

    spot = items                        # items is an list
    while (spot != None):
        # loop body here
        spot = tail(spot)

In addition, the empty array [] is replaced by the empty list None and the array reference items[i] is replaced by head(items). Finally, references to the append method of arrays is replaced by calls to the join function. For example:


is replaced by:

    result = join(head(spot),results)

Let's look a some example translations. The counting pattern is very important for lists, since there is no built-in function like len for determining the length of a list. The array version of this pattern is:

    count = 0
    for i in range(0,len(items),1):
         count += 1

Translating the code to work with arrays using the above substitutions yields:

    count = 0
    spot = items                        #typical list walking initialization
    while (spot != None):               #typical list walking condition
        count += 1
        spot = tail(spot)               #typical list walking update

As with arrays, each "step" taken results in a counter being incremented. For another example, let's look at a filtered accumulation. First, the array pattern:

    def sumEvens(items):
        total = 0
        for i in range(0,len(items),1):
            if (items[i] % 2 == 0):
                total += items[i]
        return total

Now the list translation:

    def sumEvens(items):
        total = 0
        spot = items
        while (spot != None):
            if (head(spot) % 2 == 0):
                total += head(spot)
            spot = tail(spot)
        return total

Finally, let's look at the extreme index pattern, since it adds a slight wrinkle. Here is the array version:

    ilargest = 0
    for i in range(1,len(items),1):
        if (items[i] > items[ilargest]):
            ilargest = i

Note that we have no instructions for translating items[ilargest]. We can, however, save both the largest value seen so far as well as the current index:

    index = 0
    spot = items
    largest = head(spot)
    ilargest = 0
    while (spot != None):
        if (head(spot) > largest):
            ilargest = index
            largest = head(spot)
        index += 1
        spot = tail(spot)

In general, any processing that requires finding an index will be more complicated with a list.

The filter and map patterns

Filtering and mapping lists with loops is problematic, since using join instead of append ends up reversing the elements in the resulting list. This is because join puts an element at the beginning and thus the last element joined ends up at the front of the list and the next-to-the-last element ends up in the second position, and so on. Still, if the order of the result doesn't matter, then lists and loops go well together for this kind of processing. Here is an array loop that filters out the odd elements, leaving the even ones in the result:

    def extractEvens(items):
        evens = []
        for i in range(0,len(items),1):
            if (isEven(items[i])):
        return evens

Translating this to list notation yields:

    def extractEvens(items):
        evens = None
        spot = items
        while (spot != None):
            if (isEven(head(spot))):
                evens = join(head(spot),evens)
            spot = tail(spot)
        return evens

Because of the reversal:




The shuffle and merge patterns

As with arrays, shuffling and merging lists using loops is rather complicated. So much so that we will not even bother discussing loop-with-list versions. However, using recursion with lists greatly simplifies this type of processing, as we shall see in the next chapter.

The fossilized pattern

The primary reason for accidentally implementing the fossilized pattern is to forget the while loop update. For example, while:

    spot = items
    while (spot != None):
        # loop body here
        spot = tail(spot)

may have been intended, what gets coded is:

    spot = items
    while (spot != None):
        # loop body here

Since spot is never updated, it never becomes None (unless, of course, it was None to begin with). This leads to an infinite loop.

The wrong-spot pattern

The wrong-spot pattern is another common error. Here instead of referring to spot, one mistakenly refers to items instead. An example is this attempt at finding the smallest value in a list:

    smallest = head(items)
    spot = items
    while (spot != None):
        if (head(spot) < smallest):
            smallest = head(items)  # ERROR: should use spot here
        spot = tail(spot)

This loop never sets smallest to anything except the first value in items.

Why Lists?

You may be asking, why go through all this trouble to implement lists when arrays seem to do every thing lists can do. The reason comes down to the fact that no data structure does all things well. For example, a data structure may be very fast when putting values in, but very slow in taking values out. Another kind of data structure may be very slow putting values in but very fast taking values out. This is why there are quite a number of different data structures, each with their own strengths.

In the case of lists and arrays, adding an item to a list can be much quicker than adding an item to an array. Consider four scenarios:

  1. prepending a value to a list containing one item
  2. prepending a value to a list containing one million items
  3. prepending a value to an array containing one item
  4. prepending a value to an array containing one million items

Scenarios 1, 2, and 3 all take about the same amount of time, whereas scenario 4 takes about a million times longer than the other scenarios. This is because when a value is joined to an array, a new array is created and all the elements in the original array are copied into the new array. If the old array has one value, then one value is copied. If the old array has a million values, then a million values are copied.

Another reason is that, in other programming languages besides Python, arrays have a fixed size and you cannot put more values in the array than there is room. A list, on the other hand, can grow without bound, subject to the memory available to the program. That said, sometimes arrays are the better data structure to use. Therefore, the List module contains two functions for converting between arrays and lists:

    ArrayToList(items)  # where items is an array; a list is returned
    ListToArray(items)  # where items is a list; an array is returned

Use these functions when the data is in one form, but you need it in another.

We will use lists heavily in the next chapter on recursion.


  1. Assume a list holds the values 3, 8 and 10. Draw a box and pointer diagram of this list.
  2. Suppose the loop in ListCreate is replaced with:
        for i in args: #an alternate loop
            items = join(i,items)

    Explain what happens and why when more than one argument is passed to ListCreate.

  3. Assume a list holds the values 3 and 8. Consider appending the value 5. Draw a set of box and pointer diagrams, similar to those used in the discussion of join, that illustrate the action of append. You should have a drawing of the situation before the while loop begins, one for each update of the variable node while the loop is running, one after the new node is created, and one showing the list just before the function returns. Label your drawings appropriately.
  4. Define a function named appendValue, which takes a list and a value and appends the value to the list. It should do so by setting the next pointer of the last node in the first list to a new node containing the given value. Your function should look very similar to append.
  5. The module exhibits an important principle in Computer Science, abstraction. The list operations treat nodes as abstract objects; that is to say, lists do not know nor care about how nodes are constructed. Verify this is true by modifying the node operations so that value pointer is stored in the second slot of the underlying two-element array and the next pointer is stored in the first slot. Now test the list operations to see that they work exactly as before.
  6. The list operation ListIndexNode exhibits another important concept in Computer Science, generalization. Since both ListIndex and ListSetIndex need to walk the list, that list-waking code was generalized into the function ListIndexNode. Modify the module so that ListIndex and ListSetIndex each incorporate the code of ListNodeIndex. Then comment out ListIndexNode. Both operations should function as before, but you should notice how similar they look. When you see such similarity, the similar code is a candidate for generalization.
  7. Define a function named length, which takes a list as an argument and returns the number of items in the list.
  8. Define a function named reverse, which takes a list as an argument and returns a new list with the same items as the given list, but in the opposite order. For example (reverse (ListCreate 1 2 3 4)) should return the list [4, [3, [2, [1, None]]]].
  9. Define a function named chop, which takes a list as an argument and destructively removes the last element of a list. Your function should look similar to getPenultimateNode.
  10. Define a function named equals, which takes two lists as arguments and returns whether or not the two lists have the same elements in the same order.
  11. The linked lists developed in this chapter are known as singly-linked lists, a class of lists with a single pointer links a node to the next node in the list. Another class of lists, doubly-linked lists, provide an additional pointer from a node to its predecessor. Define a constructor for nodes upon which doubly-linked lists could be based.
  12. Fix the given version of insertInOrder so that it correctly handles edge cases.
  13. Consider the version of insertInOrder given in this chapter. If a value to be inserted matches a value already in the list, does the new value end up before or after the old value? How would you change the code to get the opposite behavior?


For Linux and Mac users, the code from this chapter can be found here.

For Windows users, look here (you will need to save the file as

More on Input Top Recursion Contents