7. Strings and things

7.1. Containers for strings

We have seen five types of values - booleans, characters, integers, floating-point numbers, and strings - but only four types of variables - bool, char, int, and double. So far we have no way to store a string in a variable or perform operations on strings.

There are actually two standard ways to store string values in C++. One is a part of the language that has been around since the early days of C++’s predecessor language, C, and is sometimes called “a native C string.” The syntax for C strings is a bit ugly, and using them requires some concepts we have not introduced yet, so for the most part we are going to avoid them.

We are going to use the more modern C++ string type that is part of C++’s standard library. This string type also requires concepts we have not introduced yet, and we will begin to introduce them now using strings as our first example.

The strings we will use are objects. An object is a collection of data, equivalent to the variables we have already seen, together with member functions, which are functions built into the object.

To use these string objects, we need to include the required header file:

#include <string>

Here is a full program using strings:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str1;
    str1 = "Hello, ";
    string str2 = "strings!";
    cout << str1 << str2 << endl;
    return 0;
}

The first line in the body of main creates a string without giving it a value. The second line assigns it the string value "Hello, ". The third line initializes a new string variable, str2, to the value "strings!".

We can output strings in the usual way, as we do here in the fourth line of main.

7.2. Extracting characters from a string

Strings are called “strings” because they are made up of a sequence, or string, of characters. The first operation we are going to perform on a string is to extract one of the characters. C++ uses square brackets ([ and ]) for this operation:

string fruit = "banana";
char letter = fruit[1];
cout << letter << endl;

The expression fruit[1] indicates that we want character number 1 from the string named fruit. The result is stored in a char variable named letter. When we output the value of letter, we get a surprise:

a

a is not the first letter of "banana", unless you are a computer scientist. For perverse reasons, computer scientists always start counting from zero. The 0th letter (“zeroth”) of "banana" is b. The 1th letter (“oneth”) is a and the 2th (“twoeth”) letter is n.

If you want the zeroth letter of a string, you have to put a zero in the square brackets:

char letter = fruit[0];

7.3. Length

To find the length (number of characters) of a string, we can use the length member function. The syntax for calling a member function is different from what we’ve seen before:

int size;
size = fruit.length();

To describe this function call, we say we are invoking the length function on the string named fruit. The length function of strings returns the number of characters in the string (its length), which is assigned here to the int variable size.

This vocabulary may seem strange, but we will see many more examples where we invoke a function on an object. The syntax for function invocation is called dot notation, because the dot (period) separates the name of the object, fruit, from the name of the function, length.

length takes no arguments, as indicated by the empty parentheses (), and it returns an integer equal to the number of characters in the string, 6 in this case.

To find the last letter of a string, you might be tempted to try something like this:

int size = fruit.length();
char last = fruit[size];    // WRONG!

That won’t work. The reason is that there is no letter in “banana” with index 6. The indexes of the letters in the string “banana” look like this:

index:   0  1  2  3  4  5
letter:  b  a  n  a  n  a

Since we started counting at 0, the 6 letters are numbered from 0 to 5. To get the last character, you have to subtract 1 from size.

int size = fruit.length();
char last = fruit[size-1];
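
One case the subtraction does not cover is the empty string, whose length is 0, so size-1 is not a valid index. The following is a minimal sketch of one way to guard against that. The function name last_char and its fallback parameter are our own inventions for illustration, not part of the string class.

```cpp
#include <string>
using namespace std;

// Hypothetical helper (not a standard function): returns the last
// character of s, or the given fallback character if s is empty.
char last_char(string s, char fallback)
{
    int size = s.length();
    if (size == 0) {
        return fallback;    // an empty string has no valid index
    }
    return s[size - 1];     // indexes run from 0 to size-1
}
```

With this helper, last_char("banana", '?') gives a, and last_char("", '?') gives ?.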

7.4. Traversal

A common thing to do with a string is start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. A natural way to encode a traversal is with a while statement:

int index = 0;
while (index < fruit.length()) {
    char letter = fruit[index];
    cout << letter << endl;
    index++;
}

This loop traverses the string and outputs each letter on a line by itself. Notice that the condition is index < fruit.length(), which means that when index is equal to the length of the string, the condition is false and the body of the loop is not executed. The last character we access is the one with the index fruit.length()-1.

The name of the loop variable is index. An index is a variable or value used to specify one member of an ordered set, in this case the set of characters in the string. The index indicates (hence the name) which one you want. The set has to be ordered so that each letter has an index and each index refers to a single character.
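
A traversal does not have to run from front to back. As a sketch, the function below starts at the last index and counts down to 0, collecting the characters as it goes. The name backwards is made up for this example, and the += operator, which we assume here appends a single character to a string, has not been introduced in the text yet.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: traverses s from the last index down to 0
// and collects the characters in that order.
string backwards(string s)
{
    string result;              // starts out empty
    int size = s.length();
    int index = size - 1;       // index of the last character
    while (index >= 0) {
        result += s[index];     // append one character to result
        index--;
    }
    return result;
}
```

Note that the condition here is index >= 0, because 0 is the first valid index, just as length()-1 is the last.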

7.5. The find function

The string class provides several other member functions you can invoke on strings. The find function is the opposite of the [] operator. Instead of taking an index and extracting the character at that index, find takes a character and finds the index where that character appears.

string fruit = "banana";
int index = fruit.find('a');
cout << "Index of a in banana: " << index << endl;

This example finds the index of the letter 'a' in the string. In this case, the letter appears three times, so it is not obvious what find should do. According to the documentation, it returns the index of the first appearance, so the result is 1. If the given letter does not appear in the string, find returns the special value string::npos, which shows up as -1 when we store it in an int, as we do here.

int index = fruit.find('x');
cout << "Index of x in banana: " << index << endl;
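
The "not found" result of find has a name in the string class, string::npos. Comparing against that name is a little more robust than relying on a particular number. Here is a small sketch; has_char is a hypothetical helper of our own, not a member of the string class.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: reports whether ch appears anywhere in s.
bool has_char(string s, char ch)
{
    // string::npos is the value find returns when there is no match
    return s.find(ch) != string::npos;
}
```

For example, has_char("banana", 'a') is true and has_char("banana", 'x') is false.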

In addition, there is a version of find that takes another string as an argument and finds the index where the substring appears in the string.

int index = fruit.find("nana");
cout << "Index of nana in banana: " << index << endl;

This example returns the value 2.

You should remember from the Overloading section that there can be more than one function with the same name, as long as each version takes a different number of parameters or parameters of different types. In this case, C++ knows which version of find to invoke by looking at the type of the argument we provide.

If we are looking for a letter in a string, we may not want to start at the beginning of the string. There is another version of find that takes an additional argument, the index where we should start looking.

index = fruit.find('a', 2);
cout << "Index of a in banana starting at 2: " << index << endl;

We will get 3 for this one, since the first occurrence of 'a' in "banana" starting at index 2 is the one at index 3.
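
The starting index makes it possible to visit every occurrence of a letter: search once, then search again starting one position past the previous match. The sketch below counts matches this way. The name count_with_find is ours, and, like the examples above, it stores the result of find in an int and treats -1 as "not found".

```cpp
#include <string>
using namespace std;

// Hypothetical helper: counts how many times ch appears in s by
// calling find repeatedly, each time starting just past the last match.
int count_with_find(string s, char ch)
{
    int count = 0;
    int index = s.find(ch);             // first search starts at index 0
    while (index != -1) {               // -1 here means "not found"
        count++;
        index = s.find(ch, index + 1);  // resume one position later
    }
    return count;
}
```

Calling count_with_find("banana", 'a') visits indexes 1, 3, and 5, so it returns 3.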

7.6. Our own find function

You are encouraged to make use of functions from the standard library whenever they are available. That said, as a learner of C++ it is a useful exercise to write some of these functions yourself. So let’s write our own version of find, starting with a first draft.

int find(string s, char ch) {
    for (int i = 0; i < s.length(); i++) {
        if (s[i] == ch)
            return i;
    }
    return -1;
}

This version works as expected, returning the index of the first occurrence of ch in s, or -1 if ch is not there.

Now we want to add the additional option of beginning the search at a specified position in the string. C++ supports two different ways we can do this. The first is to overload find with a new version that has an additional argument.

int find(string s, char ch, int start) {
    for (int i = start; i < s.length(); i++) {
        if (s[i] == ch)
            return i;
    }
    return -1;
}

This will work, but it seems to violate one of the principles of good software design, the don’t repeat yourself (DRY) principle.

Rather than implementing two almost identical find functions, modern C++ offers us default arguments.

int find(string s, char ch, int start = 0) {
    for (int i = start; i < s.length(); i++) {
        if (s[i] == ch)
            return i;
    }
    return -1;
}

We now need only one version of find, which can accept either two or three arguments. When the function is called with only two arguments, the third, start, is given the default value of 0.

Only trailing parameters can have default arguments: once a parameter is given a default, every parameter after it must have one too. You will explore this further in the exercises.
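
To illustrate the rule, here is a sketch of a variation with two default arguments at the end of the parameter list. The function find_between and its stop parameter are hypothetical, and using -1 to mean "search to the end of the string" is simply a convention chosen for this example.

```cpp
#include <string>
using namespace std;

// Hypothetical variation on find: search s for ch among the indexes
// from start up to (but not including) stop.
// Assumes start is not negative. A stop of -1 means "to the end".
int find_between(string s, char ch, int start = 0, int stop = -1)
{
    int size = s.length();
    if (stop == -1 || stop > size) {
        stop = size;            // never look past the end of the string
    }
    for (int i = start; i < stop; i++) {
        if (s[i] == ch)
            return i;
    }
    return -1;
}
```

This one function can be called with two, three, or four arguments. Arguments are matched from left to right, so there is no way to supply stop while leaving out start.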

7.7. Looping and counting

The following program counts the number of times the letter 'i' appears in a string:

string state = "Mississippi";
int count = 0;
int index = 0;

while (index < state.length()) {
    if (state[index] == 'i') {
        count = count + 1;
    }
    index++;
}
cout << count << endl;

This program demonstrates a common idiom, called a counter. The variable count is initialized to zero and then incremented each time we find an 'i'. (To increment is to increase by one; it is the opposite of decrement, and unrelated to excrement, which is a noun.) When we exit the loop, count contains the result: the total number of i’s.
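
The same idiom can be wrapped in a function so that it works for any string and any letter. This is only a sketch, and the name count_letter is our own choice.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: the counter idiom wrapped in a function.
int count_letter(string s, char ch)
{
    int count = 0;                  // the counter starts at zero
    int size = s.length();
    for (int index = 0; index < size; index++) {
        if (s[index] == ch) {
            count = count + 1;      // one more match found
        }
    }
    return count;
}
```

With this function, count_letter("Mississippi", 'i') returns 4, the same result the program above prints.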

7.8. String concatenation

Interestingly, the + operator can be used on strings; it performs string concatenation. To concatenate means to join the two operands end to end. For example:

string fruit = "banana";
string baked_good = " nut bread";
string dessert = fruit + baked_good;
cout << dessert << endl;

The output of this program is banana nut bread.

Unfortunately, the + operator does not work on native C strings, so you cannot write something like

string dessert = "banana" + " nut bread";   // WRONG!

because both operands are C strings. As long as one of the operands is an instance of the string class, though, C++ will automatically convert the other.
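
One way to make the line above legal is to turn one of the operands into a string object explicitly. The sketch below does that by writing string("banana"); the function make_dessert exists only to package the example.

```cpp
#include <string>
using namespace std;

// Hypothetical example function: concatenates two pieces of literal text.
string make_dessert()
{
    // Wrapping the first literal in string(...) makes the left operand
    // a string object, so + performs string concatenation.
    string dessert = string("banana") + " nut bread";
    return dessert;
}
```

The result is the same banana nut bread as before.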

It is also possible to concatenate a character onto the beginning or end of a string. In the following example, we will use concatenation and character arithmetic to output an abecedarian series.

“Abecedarian” refers to a series or list in which the elements appear in alphabetical order. For example, in Robert McCloskey’s book *Make Way for Ducklings*, the names of the ducklings are Jack, Kack, Lack, Mack, Nack, Ouack, Pack, and Quack. Here is a loop that outputs these names in order:

string suffix = "ack";

char letter = 'J';
while (letter <= 'Q') {
    cout << letter + suffix << endl;
    letter++;
}

The output of this program is:

Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack

Of course, that’s not quite right because we’ve misspelled “Ouack” and “Quack”. We’ll let you fix that as an exercise.

Again, be careful to use string concatenation only with string objects and not native C strings. Unfortunately, an expression like letter + "ack" is syntactically legal in C++, although it produces a strange result.
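
If you do need to join a char and a native C string, one option is to build a one-character string object first. The sketch below uses the form string(1, letter), which creates a string holding one copy of letter. The function name add_ack is made up for this example.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: puts letter in front of the text "ack".
string add_ack(char letter)
{
    // string(1, letter) is a string object, so the + below is
    // string concatenation rather than arithmetic on a C string.
    return string(1, letter) + "ack";
}
```

Here add_ack('J') produces Jack, as intended.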

7.9. strings are mutable

You can change the letters in a string one at a time using the [] operator on the left side of an assignment. For example,

string greeting = "Hello, world!";
greeting[0] = 'J';
cout << greeting << endl;

produces the output, Jello, world!.
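
Combining this with a traversal lets us change many letters at once. As a sketch, the hypothetical function replace_char below swaps every occurrence of one character for another. Because the parameter s is a copy of the argument, the caller's original string is left unchanged.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: returns a copy of s in which every
// occurrence of from has been replaced by to.
string replace_char(string s, char from, char to)
{
    int size = s.length();
    for (int i = 0; i < size; i++) {
        if (s[i] == from) {
            s[i] = to;      // [] on the left side changes the string
        }
    }
    return s;
}
```

For example, replace_char("banana", 'a', 'o') returns bonono.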

7.10. Getting user input

The programs we have written so far are pretty predictable; they do the same thing every time they run. Most of the time, though, we want programs that take input from the user and respond accordingly.

There are many ways to get input, including keyboard input, mouse movements and button clicks, as well as more exotic mechanisms like voice control and retinal scanning. In this text we will consider only keyboard input.

In the header file iostream, C++ defines an object named cin that handles input in much the same way that cout handles output. To get an integer value from the user:

int x;
cin >> x;

The >> operator causes the program to stop executing and wait for the user to type something. If the user types a valid sequence of digits, the program converts it into an integer value and stores it in x.

If the user types something other than an integer, C++ doesn’t report an error, or anything sensible like that. Instead, it leaves a meaningless value in x (0 under C++11 and later; with older compilers, whatever x happened to contain) and continues.

Fortunately, there is a way to check and see if an input statement succeeds. We can invoke the good function on cin to check what is called the stream state. good returns a bool: if true, then the last input statement succeeded. If not, we know that some previous operation failed, and also that the next operation will fail.

Note

There is a technical term for the “meaningless value” in a computer program. It is called garbage, and there is a well known expression, garbage in, garbage out, that you should remember.

To use good in our programs, we might write something like this:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    int x;

    // prompt the user for input
    cout << "Enter an integer: ";

    // get input
    cin >> x;

    // check and see if the input statement succeeded
    if (cin.good() == false) {
        cout << "That was not an integer." << endl;
        return -1;
    }

    // print the value we got from the user
    cout << x << endl;
    return 0;
}

cin can also be used to input a string:

string name;

cout << "What is your name? ";
cin >> name;
cout << name << endl;

Unfortunately, this statement only takes the first word of input, and leaves the rest for the next input statement. So, if you run this program and type your full name, it will only output your first name.

Because of these problems (inability to handle errors and funny behavior), we will avoid using the >> operator altogether, unless we are reading data from a source that is known to be error-free.

Instead, we will use a function called getline, which is declared in the string header.

string name;

cout << "Enter your full name: ";
getline(cin, name);
cout << "Your full name is: " << name << endl;

The first argument to getline is cin, which is where the input is coming from. The second argument is the name of the string where you want the result to be stored.

getline reads the entire line until the user hits Return (Enter). This is useful for inputting strings that contain spaces.

In fact, getline is generally useful for getting input of any kind. For example, if you wanted the user to type an integer, you could input a string and then check to see if it is a valid integer (it contains only decimal digit characters). If so, you can convert it to an integer value. If not, you can print an error message and ask the user to try again. We will see how to convert a string of digits to an int later.
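
The checking step might look something like the sketch below. The function is_all_digits is hypothetical, and it takes the simplest possible view of what counts as an integer: at least one character, all of them between '0' and '9'. A leading minus sign or surrounding spaces would be rejected.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: true if s is non-empty and every character
// is one of the decimal digits '0' through '9'.
bool is_all_digits(string s)
{
    int size = s.length();
    if (size == 0) {
        return false;               // an empty line is not a number
    }
    for (int i = 0; i < size; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return false;           // found a non-digit character
        }
    }
    return true;
}
```

After reading a line with getline, a program could call is_all_digits on it and ask the user to try again whenever the result is false.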

7.11. strings are comparable

All comparison operators that work on ints and doubles also work on strings. For example, if you want to know if two strings are equal:

if (word == "banana") {
    cout << "Yes, we have no bananas!" << endl;
}

The other comparison operations are useful for putting words in alphabetical order.

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string word;

    cout << "Enter your word: ";
    getline(cin, word);

    if (word < "banana") {
        cout << "Your word, " << word << ", comes before banana." << endl;
    } else if (word > "banana") {
        cout << "Your word, " << word << ", comes after banana." << endl;
    } else {
        cout << "Yes, we have no bananas!" << endl;
    }

    return 0;
}

You should be aware, though, that the string class does not handle upper and lower case letters the same way that people do. All the upper case letters come before all the lower case letters. As a result, if you enter Zebra, the output is:

Your word, Zebra, comes before banana.

A common way to address this problem is to convert strings to a standard format, like all lower case, before performing the comparison. The next section explains how. We will not address the more difficult problem, which is making the program realize that zebras are not fruit.

7.12. Character classification

It is often useful to examine a character and test whether it is upper or lower case, or whether it is a letter or a digit. C++ provides a library of functions that perform this kind of character classification. In order to use these functions, you have to include the header file cctype.

char letter = 'a';
if (isalpha(letter)) {
    cout << "The character " << letter << " is a letter." << endl;
    letter = toupper(letter);
    cout << "In upper case it is: " << letter << "." << endl;
}

You might expect the return value from isalpha to be a bool, but for reasons we don’t even want to think about, it is actually an integer that is 0 if the argument is not a letter, and some non-zero value if it is.

This oddity is not as inconvenient as it seems, because it is legal to use this kind of integer in a conditional, as shown in the example. The value 0 is treated as false, and all non-zero values are treated as true.

Technically, this sort of thing should not be allowed - integers are not the same thing as boolean values. Nevertheless, the C++ habit of converting automatically between types can be useful.

Other character classification functions include isdigit, which identifies the digit characters 0 through 9, and isspace, which identifies all kinds of whitespace, including spaces, tabs, newlines, and a few others. There are also isupper and islower, which distinguish upper and lower case letters.

Finally, the last two we will discuss, toupper and tolower, convert letters from one case to the other. Both take a single character as an argument and return a (possibly converted) character.

char letter = 'a';
letter = toupper(letter);
cout << letter << endl;

The output of this code is A.
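
Applying tolower to every character in a traversal gives us the all-lower-case conversion suggested earlier for comparing words. The function below is a minimal sketch under the assumption that the string contains only plain ASCII characters. The name to_lower is our own and is not part of the standard library.

```cpp
#include <cctype>
#include <string>
using namespace std;

// Hypothetical helper: returns a copy of s with every upper case
// letter converted to lower case. Assumes plain ASCII text.
string to_lower(string s)
{
    int size = s.length();
    for (int i = 0; i < size; i++) {
        s[i] = tolower(s[i]);   // non-letters are returned unchanged
    }
    return s;
}
```

Comparing to_lower(word) against "banana" instead of comparing word itself puts Zebra after banana, which is where most people would expect it.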

7.13. Glossary

concatenate

To join two operands end-to-end.

counter

A variable used to count something, usually initialized to zero and then incremented.

decrement

Decrease the value of a variable by one. The decrement operator in C++ is --.

dot notation

The syntax for invoking (calling) a member function on an object. It consists of the name of the object, followed by a dot (.), followed by the name of the function, followed by a (possibly empty) sequence of arguments enclosed in parentheses.

increment

Increase the value of a variable by one. The increment operator in C++ is ++. In fact, that’s why C++ is called C++, because it is meant to be one better than C.

index

A variable or value used to select one of the members of an ordered set, like a character from a string.

member function

A function that is part of an object. Member functions are invoked on instances of an object using dot notation. An example of a member function is the length function of string objects.

object

A collection of related data that comes with a set of functions that operate on it. The objects we have used so far are the cout object and strings.

traverse

To iterate through all the elements of a set performing a similar operation on each.

whitespace

Any character or series of characters that represent horizontal or vertical space.

7.14. Exercises