
More on Input

Now that we have learned how to loop, we can perform more sophisticated types of input.

Converting command-line arguments en masse

Suppose all the command-line arguments are integers that need to be converted from the string versions stored in sys.argv. We can use a loop and the accumulation pattern to accumulate the converted elements:

    import sys                           # sys.argv holds the arguments as strings

    def convertArgsToNumbers():
        result = []
        # start at 1 to skip over program file name
        for i in range(1,len(sys.argv),1):
            num = int(sys.argv[i])       # convert the string to an integer
            result.append(num)
        return result

The accumulator, result, starts out as the empty array. For each element of sys.argv beyond the program file name, we convert it and store the result in num. We then append that number to the growing array.
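Because convertArgsToNumbers reads sys.argv directly, it is awkward to try out without typing command-line arguments. A minimal sketch, assuming we are free to pass the argument list in as a parameter (the name convertToNumbers is hypothetical), keeps the same accumulation pattern and lets convertArgsToNumbers simply hand it everything after the program file name:

```python
import sys

def convertToNumbers(args):
    """Convert every string in args to an int (hypothetical helper)."""
    result = []                          # the accumulator starts out empty
    for i in range(0, len(args), 1):
        result.append(int(args[i]))      # convert, then accumulate
    return result

def convertArgsToNumbers():
    # sys.argv[0] is the program file name, so skip it
    return convertToNumbers(sys.argv[1:])

print(convertToNumbers(["1", "34", "-2"]))   # prints [1, 34, -2]
```

Passing the array in also lets the same function convert strings that come from somewhere other than the command line, such as tokens read from a file.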

With a program file named convert.py as follows:

    import sys

    def main():
        ints = convertArgsToNumbers()
        print("original args are",sys.argv[1:])
        print("converted args are",ints)
        return 0

    def convertArgsToNumbers():
        ...

    main()

we get the following behavior:

    $ python convert.py 1 34 -2
    original args are ['1', '34', '-2']
    converted args are [1, 34, -2]

Note the absence of quotation marks in the converted array, signifying that the elements are indeed numbers.

Reading individual items from files

Instead of reading all of the file at once using the read function, we can read it one item at a time. When we read an item at a time, we always follow this pattern:

    open the file
    read the first item
    while the read was good
        process the item
        read the next item
    close the file

In Python, we tell if the read was good by checking the value that was read. Usually, the empty string indicates that the read failed; for example, a file's readline method returns the empty string once the end of the file has been reached.
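The empty string works as a failure signal for lines because a blank line in a file is not empty: readline returns it as "\n", newline included. The small check below uses io.StringIO to stand in for an open file (an assumption made only so the example needs no file on disk) and follows the reading pattern exactly:

```python
import io

f = io.StringIO("first\n\nthird\n")   # open: three lines, the middle one blank
reads = []
line = f.readline()                   # read the first item
while line != "":                     # "" means the read failed (end of file)
    reads.append(line)                # process the item
    line = f.readline()               # read the next item
f.close()                             # close the file
print(reads)   # prints ['first\n', '\n', 'third\n']
```

The blank middle line is read as "\n", so the loop does not stop early.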

Processing files a line at a time

Here is another version of the copyFile function from the chapter on input and output. This version reads and writes one line at a time. In addition, the function returns the number of lines processed:

    def copyFile(inFile,outFile):
        source = open(inFile,"r")
        out = open(outFile,"w")
        count = 0
        line = source.readline()         # read the first line
        while (line != ""):              # "" means the end of the file
            out.write(line)
            count += 1
            line = source.readline()     # read the next line
        source.close()
        out.close()
        return count

Notice we used the counting pattern, augmented by writing the current line to the output file every time the count was incremented.
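Python also offers two idioms that shorten this loop: a with statement closes its files automatically, even if an error occurs partway through, and a for loop over a file object yields one line at a time until the end of the file. The sketch below swaps those idioms in for the explicit readline loop (copyFileWith is a hypothetical name) and should return the same count as copyFile. The demo creates its own temporary input file, so the file names are assumptions.

```python
import os
import tempfile

def copyFileWith(inFile, outFile):
    """Copy inFile to outFile line by line; return the number of lines."""
    count = 0
    with open(inFile, "r") as source, open(outFile, "w") as dest:
        for line in source:              # iteration stops at end of file
            dest.write(line)
            count += 1
    return count                         # both files are closed by now

# Demo: write a small input file in a fresh temporary directory.
folder = tempfile.mkdtemp()
src = os.path.join(folder, "in.txt")
dst = os.path.join(folder, "out.txt")
with open(src, "w") as f:
    f.write("one\ntwo\nthree\n")
print(copyFileWith(src, dst))   # prints 3
```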

Using a Scanner

A scanner is a reading subsystem that allows you to read whitespace-delimited tokens from a file. To get a scanner for Python, issue this command:

    wget troll.cs.ua.edu/cs150/book/scanner.py

To use a scanner, you will need to import it into your program:

    from scanner import *

Typically, a scanner is used with a loop. Suppose we wish to count the number of short tokens (a token is a series of characters surrounded by whitespace) in a file. Let's assume a short token is one whose length is less than or equal to some limit, stored in a global constant named SHORT_LIMIT. Here is a loop that does that:

    def countShortTokens(fileName):
        s = Scanner(fileName)              #create the scanner
        count = 0
        token = s.readtoken()              #read the first token
        while (token != ""):               #check if the read was good
            if (len(token) <= SHORT_LIMIT):
                count += 1
            token = s.readtoken()          #read the next token
        s.close()                          #always close the scanner when done
        return count

Note that the use of the scanner follows the standard reading pattern: opening (creating the scanner), making the first read, testing if the read was good, processing the item read (by counting it), reading the next item, and finally closing the file (by closing the scanner) after the loop terminates. Using a scanner always means performing the five steps as given in the comments. This code also incorporates the filtered-counting pattern, as expected.
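If scanner.py is not at hand, the same reading pattern can still be tried out. The class below is a hypothetical stand-in, not the real scanner module: it supports only readtoken and close, splits the whole file on whitespace with str.split, and does not handle quoted strings. SHORT_LIMIT is given an arbitrary value of 3 for the demo.

```python
import os
import tempfile

SHORT_LIMIT = 3   # arbitrary limit chosen for this sketch

class StandInScanner:
    """Hypothetical minimal scanner: whitespace-delimited tokens only."""
    def __init__(self, fileName):
        with open(fileName) as f:
            self.tokens = f.read().split()   # split on any whitespace
        self.pos = 0

    def readtoken(self):
        if self.pos >= len(self.tokens):
            return ""                        # signal a failed read
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def close(self):
        pass                                 # the file was already closed

def countShortTokens(fileName):
    s = StandInScanner(fileName)             # create the scanner
    count = 0
    token = s.readtoken()                    # read the first token
    while token != "":                       # check if the read was good
        if len(token) <= SHORT_LIMIT:
            count += 1
        token = s.readtoken()                # read the next token
    s.close()                                # close the scanner when done
    return count

path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as f:
    f.write("a cat sat\non the welcome mat\n")
print(countShortTokens(path))   # prints 6
```

Every token except "welcome" has at most three characters, so six of the seven tokens are counted.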

Reading Tokens into an Array

Note that the countShortTokens function is doing two things: reading the tokens and counting the number of short tokens. We say that this function has two concerns, reading and counting. A fundamental principle of Computer Science is the separation of concerns. To separate the concerns, we have one function read the tokens and store them in an array (reading and storing is considered to be a single concern), and another function count the short tokens in that array. Here is the reading (and storing) function, which implements the accumulation pattern:

    def readTokens(fileName):
        s = Scanner(fileName)              #create the scanner
        items = []
        token = s.readtoken()              #read the first token
        while (token != ""):               #check if the read was good
            items.append(token)            #add the token to the items array
            token = s.readtoken()          #read the next token
        s.close()                          #always close the scanner when done
        return items

Next, we implement the filtered-counting function. Instead of passing the file name, as before, we pass the array of tokens that were read:

    def countShortTokens(items):
        count = 0
        for i in range(0,len(items),1):
            if (len(items[i]) <= SHORT_LIMIT):
                count += 1
        return count

Each function is now simpler than the original function. This makes it easier to fix any errors in a function since you can concentrate on the single concern implemented by that function.
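A practical payoff of the separation is that the counting function no longer needs a file at all, so it can be checked directly against a literal array. A sketch, with SHORT_LIMIT again set to an arbitrary 3 for illustration:

```python
SHORT_LIMIT = 3   # arbitrary limit chosen for this sketch

def countShortTokens(items):
    count = 0
    for i in range(0, len(items), 1):
        if len(items[i]) <= SHORT_LIMIT:   # filter: only short tokens count
            count += 1
    return count

print(countShortTokens(["a", "cat", "welcome", "mat"]))   # prints 3
print(countShortTokens([]))                               # prints 0
```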

Reading Records into an Array

Often, data in a file is organized as records, where a record is just a collection of consecutive tokens. Each token in a record is known as a field. Suppose every four tokens in a file comprise a record (the quoted name counts as a single token):

    "Amber Smith"       President   32   87000.05
    "Thad Jones"        Assistant   15   99000.42
    "Ellen Thompson"    Hacker       2  147000.99

Typically, we define a function to read one collection of tokens at a time. Here is a function that reads a single record:

    def readRecord(s):                   # we pass the scanner in
        name = s.readstring()
        if (name == ""):
            return ""                    # no record, returning the empty string
        name = name[1:-1]                # strip the quotes from the name
        title = s.readtoken()
        years = s.readint()
        salary = s.readfloat()
        return [name,title,years,salary]

Note that we return either a record as an array or the empty string if no record was read. Since the name is a quoted string, years of service is an integer, and salary is a real number, we read them with the readstring, readint, and readfloat methods, respectively; the title is a plain token, so we read it with readtoken.
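Outside the scanner, Python's standard shlex.split offers a similar service for quoted names: it splits a line on whitespace but keeps a quoted phrase together and removes the quotes. The sketch below swaps that technique in, and it assumes each record sits on its own line, which the scanner version does not require; parseRecord is a hypothetical name.

```python
import shlex

def parseRecord(line):
    """Turn a line like '"Amber Smith" President 32 87000.05' into a record."""
    fields = shlex.split(line)       # quoted name stays one field, quotes removed
    if len(fields) == 0:
        return ""                    # blank line: no record
    name, title, years, salary = fields
    return [name, title, int(years), float(salary)]

print(parseRecord('"Amber Smith"       President   32   87000.05'))
# prints ['Amber Smith', 'President', 32, 87000.05]
```

As with readRecord, the empty string signals that no record was found, so the caller's loop can test for it in the same way.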

To total up all the salaries, for example, we can use an accumulation loop (assuming the salary data resides in a file named salaries). We do so by repeatedly calling readRecord:

    def totalPay(fileName):
        s = Scanner(fileName)
        total = 0
        record = readRecord(s)
        while (record != ""):
            total += record[3]
            record = readRecord(s)
        s.close()            
        return total

Note that it is the job of the caller of readRecord to create the scanner, repeatedly send the scanner to readRecord, and close the scanner when done. Also note that we tell if the read was good by checking whether readRecord returned the empty string.

The above function has two stylistic flaws. First, it uses the magic numbers we read about in the chapter on assignment: it is not clear from the code that the field at index three is the salary. Second, the function has two concerns (reading and accumulating). We will fix the magic number problem first. To make the code more readable, we can set up some "constants" in the global scope (so that they will be visible everywhere):

    NAME = 0
    TITLE = 1
    SERVICE = 2
    SALARY = 3

Our accumulation loop now becomes:

    total = 0
    record = readRecord(s)
    while (record != ""):
        total += record[SALARY]
        record = readRecord(s)

We can also rewrite our readRecord function to use the constants, so that it only needs to know the number of fields, not their order:

    def readRecord(s):                   # we pass the scanner in
        name = s.readstring()
        if (name == ""):
            return ""                    # no record, returning the empty string
        name = name[1:-1]                # strip the quotes from the name
        title = s.readtoken()
        years = s.readint()
        salary = s.readfloat()

        # create an empty record

        result = [0,0,0,0]               

        # fill out the elements

        result[NAME] = name
        result[TITLE] = title
        result[SERVICE] = years
        result[SALARY] = salary

        return result

Even if someone changes the constants to:

    NAME = 3
    TITLE = 2
    SERVICE = 1
    SALARY = 0

the code still works correctly. The salary now resides at index 0, yet the accumulation loop still accumulates the salary because it uses the constant to access that field.
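To check this claim, the sketch below builds the same record under both layouts. It uses a hypothetical makeRecord, which fills in the fields exactly as the revised readRecord does but takes the values as parameters instead of reading them, and a hypothetical totalSalaries that walks an array of records. Both functions look up the constants when they are called, so reassigning the constants changes the layout while the total stays the same:

```python
def makeRecord(name, title, years, salary):
    """Build a record whose layout is decided only by the constants."""
    result = [0, 0, 0, 0]            # create an empty record
    result[NAME] = name
    result[TITLE] = title
    result[SERVICE] = years
    result[SALARY] = salary
    return result

def totalSalaries(table):
    total = 0
    for i in range(0, len(table), 1):
        total += table[i][SALARY]    # the field is named, not numbered
    return total

NAME, TITLE, SERVICE, SALARY = 0, 1, 2, 3        # original layout
r = makeRecord("Thad Jones", "Assistant", 15, 99000.42)
print(r, totalSalaries([r]))
# prints ['Thad Jones', 'Assistant', 15, 99000.42] 99000.42

NAME, TITLE, SERVICE, SALARY = 3, 2, 1, 0        # reordered layout
r = makeRecord("Thad Jones", "Assistant", 15, 99000.42)
print(r, totalSalaries([r]))
# prints [99000.42, 15, 'Assistant', 'Thad Jones'] 99000.42
```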

Creating an Array of Records

We can separate the two concerns of the totalPay function by having one function read the records into an array and having another total up the salaries. An array of records is known as a table. Creating the table is just like accumulating the salary, but instead we accumulate the entire record into an array:

    def readTable(fileName):
        s = Scanner(fileName)
        table = []
        record = readRecord(s)
        while (record != ""):
            table.append(record)
            record = readRecord(s)
        s.close()            
        return table

Now the table holds all the records in the file. Note that append adds the whole record as a single element of the growing table. If we instead accumulated with concatenation, as in table = table + [record], we would need to enclose the record in square brackets. The superior student will try the concatenation without the brackets and ascertain the difference.
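The difference between appending a record and concatenating it can be seen in a few lines. With append, the record is added as a single element; with the + operator, the record must first be wrapped in square brackets, or its fields are spliced into the table one by one:

```python
record = ["Amber Smith", "President", 32, 87000.05]

appended = []
appended.append(record)              # the whole record becomes one element

wrapped = []
wrapped = wrapped + [record]         # concatenation needs the brackets

spliced = []
spliced = spliced + record           # no brackets: the fields are spliced in

print(len(appended), len(wrapped), len(spliced))   # prints 1 1 4
```

Only appended and wrapped are tables of records; spliced is just a copy of the record's four fields.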

The accumulation function is straightforward:

    def totalPay(fileName):
        table = readTable(fileName)
        total = 0
        for i in range(0,len(table),1):
            record = table[i]
            total += record[SALARY]
        return total

We can simplify this function by removing the temporary variable record:

    def totalPay(fileName):
        table = readTable(fileName)
        total = 0
        for i in range(0,len(table),1):
            total += table[i][SALARY]
        return total

Since a table is just an array, we can walk it, accumulate items in each record (as we just did with salary), filter it, and so on.
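As one example of filtering, the sketch below extracts the records of employees whose years of service meet some minimum. The function name longServing and the in-memory table (copied from the sample file) are hypothetical; the constants are the ones defined earlier in this chapter.

```python
NAME = 0
TITLE = 1
SERVICE = 2
SALARY = 3

def longServing(table, minYears):
    """Return a new table holding only records with enough years of service."""
    result = []
    for i in range(0, len(table), 1):
        if table[i][SERVICE] >= minYears:   # the filter condition
            result.append(table[i])
    return result

table = [
    ["Amber Smith", "President", 32, 87000.05],
    ["Thad Jones", "Assistant", 15, 99000.42],
    ["Ellen Thompson", "Hacker", 2, 147000.99],
]
veterans = longServing(table, 10)
for i in range(0, len(veterans), 1):
    print(veterans[i][NAME])   # prints Amber Smith, then Thad Jones
```

The result is itself a table, so it can be handed to any function that expects one, such as a salary-totalling loop.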

Other Scanner Methods

A scanner object has other methods for reading. They are:

readline(): reads a line from the file, like Python's readline
readchar(): reads the next non-whitespace character
readrawchar(): reads the next character, whether or not it is whitespace

You can also use a scanner to read from the keyboard. Simply pass an empty string as the file name:

    s = Scanner("")

Finally, you can scan tokens and such from a string by first creating a keyboard scanner, and then setting the input to the string you wish to scan:

    s = Scanner("")
    s.fromstring(text)                   # text is the string you wish to scan

lusth@cs.ua.edu

