Reading the data

This section uses the same data file that we just saw.

The file holds a lot of data, and we can’t tell how many lines it contains just by looking at it. Since we don’t know the number of lines ahead of time, we won’t use a counting for loop to process it. Instead, we will use a while loop that keeps reading lines until there aren’t any more. We can tell when that happens because readline() returns an empty string once there is nothing left to read.
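To see why an empty string works as the stopping signal, here is a minimal sketch. It writes a tiny made-up file (the name sample_lines.txt and its contents are assumptions for illustration, not the real data) and then reads past the end of it. A blank line in the middle of the file still contains a newline character, so it does not stop the loop; only the true end of the file gives back an empty string.

```python
import os
import tempfile

# write a tiny made-up file so the example runs on its own
# (sample_lines.txt and its contents are invented for illustration)
path = os.path.join(tempfile.gettempdir(), "sample_lines.txt")
out_file = open(path, "w")
out_file.write("first line\n")
out_file.write("\n")              # a blank line in the middle of the file
out_file.write("last line\n")
out_file.close()

# read every line with a while loop, keeping each one in a list
in_file = open(path, "r")
results = []
line = in_file.readline()
while line:
    results.append(line)
    line = in_file.readline()

# reading again after the end of the file gives back an empty string
after_end = in_file.readline()
in_file.close()

print(results)
print(repr(after_end))
```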

We process the data by opening the file and then reading it one line at a time. As long as the line we just read is not empty, we split it into its pieces, process them, and read the next line. (The values on each line are separated by a : instead of a ,.)
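To make the split step concrete, here is a small sketch on a single made-up line. The city name and the numbers are invented; only the colon-separated layout is taken from the code in this section. Note that readline() keeps the newline character at the end of each line, so the last piece still carries it unless we strip it off first.

```python
# a made-up line in the colon-separated layout (not real data)
line = "Sampletown, ST:21:9.5\n"

# splitting right away leaves the newline attached to the last piece
raw_values = line.split(":")

# stripping first removes the newline before the split
values = line.strip().split(":")

city = values[0]
first_value = float(values[1])    # convert the text to a number
second_value = float(values[2])

print(raw_values)
print(values)
print(city, first_value, second_value)
```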

In Python, we must open files before we can use them and close them when we are done with them. Opening a file returns a Python object that has predefined functions and procedures, just like the turtle objects we have seen before. Table 1 shows the functions and procedures that can be used to open and close files.

| Method Name | Use | Explanation |
|---|---|---|
| open | open(filename,'r') | Open a file called filename and use it for reading. This will return a reference to a file object. |
| open | open(filename,'w') | Open a file called filename and use it for writing. This will also return a reference to a file object. |
| close | file_object.close() | Close the file after you are finished with it. Here file_object is the reference that open returned, not the name of the file. |
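The calls in the table can be tried out with a short sketch like the one below. It is only an illustration: the file name open_close_demo.txt and the line written to it are made up, and the file is placed in the system's temporary folder so the example does not touch any real data.

```python
import os
import tempfile

# a made-up file in the temporary folder (hypothetical name)
path = os.path.join(tempfile.gettempdir(), "open_close_demo.txt")

# open for writing: creates the file, or erases it if it already exists
out_file = open(path, "w")
out_file.write("Sampletown, ST:21:9.5\n")
out_file.close()

# open the same file for reading and read back the first line
in_file = open(path, "r")
first_line = in_file.readline()
in_file.close()

print(first_line)
print(in_file.closed)   # True once close() has been called
```

In both cases close() is called on the object that open returned (out_file or in_file), not on the file name.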

The code below will print out the pollution information for all the cities that start with "A". It will print the city, state on one line and the two pollution values for that city on another line. It stops looping when there aren’t any more lines to be read (when the returned line is an empty string).

 
# open the file for reading
in_file = open("uspoll.txt", "r")

# read a line from the file
line = in_file.readline()

# while there is another line
while line:

    # create a list by splitting at the :
    values = line.split(":")

    # get the city from the list
    city = values[0]

    # if the city starts with an A print the info
    if city.find("A") == 0:
        print('City: ', city)
        print("Pollution values:", values[1], values[2])

    # read the next line
    line = in_file.readline()

# close the file
in_file.close()


The following program prints the pollution information for all cities that start with a "D", but the blocks of code are mixed up. Put the blocks in the right order and give each one the right indentation.

# read the next line
line = in_file.readline()

# split at the :
v = line.split(":")
# get the city
city = v[0]

# open the file for reading
in_file = open("uspoll.txt","r")
# read a line from the file
line = in_file.readline()

# close the file
in_file.close()

# while there is another line
while line:

# if city starts with a D print info
if (city.find("D") == 0):
    print('City: ', city)
    print("Pollution values:",v[1],v[2])

There is actually a way of using a for loop to read in a file. We can read the whole thing into one giant list, and then use a for loop to process each line in the list.

# open the file, read the lines into a list, and close the file
in_file = open("uspoll.txt", "r")
lines = in_file.readlines()
in_file.close()

# loop through the lines list
for line in lines:

    # split at :
    values = line.split(":")

    # get the city
    city = values[0]

    # if city starts with A print the info
    if city.find("A") == 0:
        print('City: ', city)
        print("Pollution values:", values[1], values[2])

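Python file objects can also be used directly in a for loop, which hands back one line on each pass without first building a list of every line. Below is a minimal sketch of that variation, run on a small made-up file; the file name sample_cities.txt and its three lines are assumptions for illustration, not the real data.

```python
import os
import tempfile

# build a small made-up data file (hypothetical name and contents)
path = os.path.join(tempfile.gettempdir(), "sample_cities.txt")
out_file = open(path, "w")
out_file.write("Alphaville, ST:10:4\n")
out_file.write("Betatown, ST:30:12\n")
out_file.write("Anytown, ST:25:8\n")
out_file.close()

# loop over the file object itself, one line at a time
matches = []
in_file = open(path, "r")
for line in in_file:
    values = line.strip().split(":")
    city = values[0]

    # keep the cities that start with an A
    if city.find("A") == 0:
        matches.append(city)
in_file.close()

print(matches)
```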

What is the disadvantage of the second program, with a for loop?




Write code that will read the input file uspoll.txt. It should print the city name and the pollution values for every city that has a PM 10 pollution of 20 or more.


# open the file for reading and read the first line
in_file = open("uspoll.txt", "r")
line = in_file.readline()

# loop while there is another line
while line:

    # split at :
    values = line.split(":")

    # get the PM 10 pollution
    pollution = float(values[1])

    # print the info if the PM 10 pollution is 20 or more
    if pollution >= 20:
        print('City: ', values[0])
        print("Pollution values:", values[1], values[2])

    # read the next line
    line = in_file.readline()

# close the file
in_file.close()

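One way to check a solution like this without the real data is to wrap the loop in a function and run it on a few made-up lines. The sketch below does that. The function name cities_at_or_above, the file name sample_poll.txt, and the sample values are all invented for illustration, and it assumes (as in the code above) that the PM 10 value is the second item on each line. The sample includes a value of exactly 20 and a value of 19.5 to test the "20 or more" boundary.

```python
import os
import tempfile

def cities_at_or_above(path, threshold):
    """Return the cities whose second item (assumed PM 10) is at least threshold."""
    matches = []
    in_file = open(path, "r")
    line = in_file.readline()
    while line:
        values = line.strip().split(":")
        if float(values[1]) >= threshold:
            matches.append(values[0])
        line = in_file.readline()
    in_file.close()
    return matches

# made-up test data (not taken from the real file)
path = os.path.join(tempfile.gettempdir(), "sample_poll.txt")
out_file = open(path, "w")
out_file.write("Lowtown, ST:19.5:7\n")
out_file.write("Edgeville, ST:20:8\n")
out_file.write("Hightown, ST:31:12\n")
out_file.close()

result = cities_at_or_above(path, 20)
print(result)
```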
