7. Strings and things
7.1. Containers for strings
We have seen five types of values (booleans, characters, integers, floating-point numbers, and strings) but only four types of variables: bool, char, int, and double. So far we have no way to store a string in a variable or perform operations on strings.
There are actually two standard ways to store string values in C++. One is a part of the language that has been around since the early days of C++’s predecessor language, C, and is sometimes called “a native C string.” The syntax for C strings is a bit ugly, and using them requires some concepts we have not introduced yet, so for the most part we are going to avoid them.
We are going to use the more modern C++ string type that is part of C++’s standard library. This string type also requires concepts we have not introduced yet, and we will begin to introduce them now using strings as our first example.
The strings we will use are objects. An object is a value that can contain both data, which behaves much like the variables we have already seen, and member functions, which are functions built into the object.
To use these string objects, we need to include the required header file:
#include <string>
Here is a full program using strings:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string str1;
str1 = "Hello, ";
string str2 = "strings!";
cout << str1 << str2 << endl;
return 0;
}
The first line in the body of main creates a string variable, str1, without giving it an explicit value (a string created this way starts out empty). The second line assigns it the string value "Hello, ". The third line declares a new string variable, str2, and initializes it to the value "strings!".
We can output strings in the usual way, as we do here in the fourth line of main.
7.2. Extracting characters from a string
Strings are called “strings” because they are made up of a sequence, or string, of characters. The first operation we are going to perform on a string is to extract one of the characters. C++ uses square brackets ([ and ]) for this operation:
string fruit = "banana";
char letter = fruit[1];
cout << letter << endl;
The expression fruit[1] indicates that we want character number 1 from the string named fruit. The result is stored in a char variable named letter. When we output the value of letter, we get a surprise:
a
a is not the first letter of "banana", unless you are a computer scientist. For perverse reasons, computer scientists always start counting from zero. The 0th letter (“zeroth”) of "banana" is b. The 1th letter (“oneth”) is a and the 2th (“twoeth”) letter is n.
If you want the zeroth letter of a string, you have to put a zero in the square brackets:
char letter = fruit[0];
7.3. Length
To find the length (number of characters) of a string, we can use the length member function. The syntax for calling a member function is different from what we’ve seen before:
int size;
size = fruit.length();
To describe this function call, we say we are invoking the length function on the string named fruit. The length function of strings returns the number of characters in the string (its length), which is here assigned to the int variable size.
This vocabulary may seem strange, but we will see many more examples where we invoke a function on an object. The syntax for function invocation is called dot notation, because the dot (period) separates the name of the object, fruit, from the name of the function, length.
length takes no arguments, as indicated by the empty parentheses (), and it returns an integer equal to the number of characters in the string, 6 in this case.
To find the last letter of a string, you might be tempted to try something like this:
int size = fruit.length();
char last = fruit[size]; // WRONG!
That won’t work. There is no letter in “banana” with index 6, so fruit[size] does not give you the last letter. The indexes of the letters in the string “banana” look like this:
index:  0 1 2 3 4 5
letter: b a n a n a
Since we started counting at 0, the 6 letters are numbered from 0 to 5. To get the last character, you have to subtract 1 from size.
int size = fruit.length();
char last = fruit[size-1];
7.4. Traversal
A common thing to do with a string is start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. A natural way to encode a traversal is with a while statement:
int index = 0;
while (index < fruit.length()) {
    char letter = fruit[index];
    cout << letter << endl;
    index++;
}
This loop traverses the string and outputs each letter on a line by itself. Notice that the condition is index < fruit.length(), which means that when index is equal to the length of the string, the condition is false and the body of the loop is not executed. The last character we access is the one with the index fruit.length()-1.
The name of the loop variable is index. An index is a variable or value used to specify one member of an ordered set, in this case the set of characters in the string. The index indicates (hence the name) which one you want. The set has to be ordered so that each letter has an index and each index refers to a single character.
7.5. The find function
The string class provides several other member functions you can invoke on strings. The find function is the opposite of the [] operator. Instead of taking an index and extracting the character at that index, find takes a character and finds the index where that character appears.
string fruit = "banana";
int index = fruit.find('a');
cout << "Index of a in banana: " << index << endl;
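This example finds the index of the letter 'a' in the string. In this case, the letter appears three times, so it is not obvious what find should do. According to the documentation, it returns the index of the first appearance, so the result is 1. If the given letter does not appear in the string, find returns a special value called string::npos. Strictly speaking, find returns an unsigned integer type (size_t), and string::npos is the largest value of that type; when it is stored in an int, as in these examples, it ends up as -1 on typical systems.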
index = fruit.find('x');
cout << "Index of x in banana: " << index << endl;
In addition, there is a version of find that takes another string as an argument and finds the index where the substring appears in the string.
index = fruit.find("nana");
cout << "Index of nana in banana: " << index << endl;
In this example, find returns 2.
You should remember from the Overloading section that there can be more than one function with the same name, as long as each one takes a different number or different types of parameters. In this case, C++ knows which version of find to invoke by looking at the type of the argument we provide.
If we are looking for a letter in a string, we may not want to start at the beginning of the string. There is another version of find that takes an additional argument, the index where we should start looking.
index = fruit.find('a', 2);
cout << "Index of a in banana starting at 2: " << index << endl;
We will get 3 for this one, since the first occurrence of 'a' in "banana" starting at index 2 is the one at index 3.
7.6. Our own find function
You are encouraged to make use of functions from the standard library whenever they are available. That said, as a learner of C++ it is a useful exercise to write some of these functions yourself. So let’s write our own version of find, starting with a first draft.
int find(string s, char ch) {
for (int i = 0; i < s.length(); i++) {
if (s[i] == ch)
return i;
}
return -1;
}
This version works as expected, returning the index of the first occurrence of ch in s, or -1 if ch is not there.
Now we want to add the additional option of beginning the search at a specified position in the string. C++ supports two different ways we can do this. The first is to overload find with a new version that has an additional parameter.
int find(string s, char ch, int start) {
for (int i = start; i < s.length(); i++) {
if (s[i] == ch)
return i;
}
return -1;
}
This will work, but it seems to violate one of the principles of good software design, the don’t repeat yourself (DRY) principle.
Rather than implementing two almost identical find functions, we can use another C++ feature, default arguments.
int find(string s, char ch, int start = 0) {
for (int i = start; i < s.length(); i++) {
if (s[i] == ch)
return i;
}
return -1;
}
We now need only one version of find, which can accept either two or three arguments. When the function is called with only two arguments, the third, start, is given the default value of 0.
Only the last parameters in a function can have default arguments: once a parameter has a default value, every parameter after it must have one too. You will explore this further in the exercises.
7.7. Looping and counting
The following program counts the number of times the letter 'i' appears in a string:
string state = "Mississippi";
int count = 0;
int index = 0;
while (index < state.length()) {
if (state[index] == 'i') {
count = count + 1;
}
index++;
}
cout << count << endl;
This program demonstrates a common idiom, called a counter. The variable count is initialized to zero and then incremented each time we find an 'i'. (To increment is to increase by one; it is the opposite of decrement, and unrelated to excrement, which is a noun.) When we exit the loop, count contains the result: the total number of i’s.
7.8. String concatenation
Interestingly, the + operator can be used on strings; it performs string concatenation. To concatenate means to join the two operands end to end. For example:
string fruit = "banana";
string baked_good = " nut bread";
string dessert = fruit + baked_good;
cout << dessert << endl;
The output of this program is banana nut bread.
Unfortunately, the + operator does not work on native C strings (string literals like "banana" are C strings), so you cannot write something like
string dessert = "banana" + " nut bread"; // WRONG!
because both operands are C strings; the compiler rejects it. As long as one of the operands is an instance of the string class, though, C++ will automatically convert the other.
It is also possible to concatenate a character onto the beginning or end of a string. In the following example, we will use concatenation and character arithmetic to output an abecedarian series.
“Abecedarian” refers to a series or list in which the elements appear in alphabetical order. For example, in Robert McCloskey’s book *Make Way for Ducklings*, the names of the ducklings are Jack, Kack, Lack, Mack, Nack, Ouack, Pack, and Quack. Here is a loop that outputs these names in order:
string suffix = "ack";
char letter = 'J';
while (letter <= 'Q') {
cout << letter + suffix << endl;
letter++;
}
The output of this program is:
Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack
Of course, that’s not quite right because we’ve misspelled “Ouack” and “Quack”. We’ll let you fix that as an exercise.
Again, be careful to use string concatenation only with string objects and not native C strings. Unfortunately, an expression like letter + "ack" is syntactically legal in C++: the character is treated as a number and added to the address of the C string "ack", so the result is meaningless rather than the string "Jack".
7.9. strings are mutable
You can change the letters in a string one at a time using the [] operator on the left side of an assignment. For example,
string greeting = "Hello, world!";
greeting[0] = 'J';
cout << greeting << endl;
produces the output: Jello, world!
7.10. Getting user input
The programs we have written so far are pretty predictable; they do the same thing every time they run. Most of the time, though, we want programs that take input from the user and respond accordingly.
There are many ways to get input, including keyboard input, mouse movements and button clicks, as well as more exotic mechanisms like voice control and retinal scanning. In this text we will consider only keyboard input.
In the header file <iostream>, C++ defines an object named cin that handles input in much the same way that cout handles output. To get an integer value from the user:
int x;
cin >> x;
The >> operator causes the program to stop executing and wait for the user to type something. If the user types a valid sequence of digits, the program converts it into an integer value and stores it in x.
If the user types something other than an integer, C++ doesn’t report an error, or anything sensible like that. Instead, it leaves a value in x that has nothing to do with what the user typed (in C++11 and later, 0) and continues.
Fortunately, there is a way to check and see if an input statement succeeds. We can invoke the good function on cin to check what is called the stream state. good returns a bool: if true, then the last input statement succeeded. If not, we know that some previous operation failed, and also that the next operation will fail.
Note
There is a technical term for the “meaningless value” in a computer program. It is called garbage, and there is a well known expression, garbage in, garbage out, that you should remember.
To use good in our programs, we might write something like this:
#include <iostream>
#include <string>
using namespace std;
int main()
{
int x;
// prompt the user for input
cout << "Enter an integer: ";
// get input
cin >> x;
// check and see if the input statement succeeded
if (cin.good() == false) {
cout << "That was not an integer." << endl;
return -1;
}
// print the value we got from the user
cout << x << endl;
return 0;
}
cin can also be used to input a string:
string name;
cout << "What is your name? ";
cin >> name;
cout << name << endl;
Unfortunately, this statement only takes the first word of input, and leaves the rest for the next input statement. So, if you run this program and type your full name, it will only output your first name.
Because of these problems (inability to handle errors and funny behavior), we will avoid using the >> operator altogether, unless we are reading data from a source that is known to be error-free.
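Instead, we will use a function from the string library called getline. (It is declared in the <string> header, but it is not a member function of the string class, so we call it without dot notation.)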
string name;
cout << "Enter your full name: ";
getline(cin, name);
cout << "Your full name is: " << name << endl;
The first argument to getline is cin, which is where the input is coming from. The second argument is the name of the string where you want the result to be stored.
getline reads the entire line until the user hits Return (Enter); the newline character that ends the line is read but not stored in the string. This is useful for inputting strings that contain spaces.
In fact, getline is generally useful for getting input of any kind. For example, if you wanted the user to type an integer, you could input a string and then check to see if it is a valid integer (it contains only decimal digit characters). If so, you can convert it to an integer value. If not, you can print an error message and ask the user to try again. We will see how to convert a string of digits to an int later.
7.11. strings are comparable
All comparison operators that work on ints and doubles also work on strings. For example, if you want to know if two strings are equal:
if (word == "banana") {
cout << "Yes, we have no bananas!" << endl;
}
The other comparison operators are useful for putting words in alphabetical order.
#include <iostream>
#include <string>
using namespace std;
int main()
{
string word;
cout << "Enter your word: ";
getline(cin, word);
if (word < "banana") {
cout << "Your word, " << word << ", comes before banana." << endl;
} else if (word > "banana") {
cout << "Your word, " << word << ", comes after banana." << endl;
} else {
cout << "Yes, we have no bananas!" << endl;
}
return 0;
}
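You should be aware, though, that the string class does not handle upper and lower case letters the same way that people do. Strings are compared character by character using the characters' numeric codes, and in the usual (ASCII) encoding all the upper case letters come before all the lower case letters. As a result, if the user enters Zebra, the program prints: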
Your word, Zebra, comes before banana.
A common way to address this problem is to convert strings to a standard format, like all lower case, before performing the comparison. The next section explains how. We will not address the more difficult problem, which is making the program realize that zebras are not fruit.
7.12. Character classification
It is often useful to examine a character and test whether it is upper or lower case, or whether it is a letter or a digit. C++ provides a library of functions that perform this kind of character classification. In order to use these functions, you have to include the header file <cctype>.
char letter = 'a';
if (isalpha(letter)) {
cout << "The character " << letter << " is a letter." << endl;
letter = toupper(letter);
cout << "In upper case it is: " << letter << "." << endl;
}
You might expect the return value from isalpha to be a bool, but for reasons we don’t even want to think about, it is actually an integer that is 0 if the argument is not a letter, and some non-zero value if it is.
This oddity is not as inconvenient as it seems, because it is legal to use this kind of integer in a conditional, as shown in the example. The value 0 is treated as false, and all non-zero values are treated as true.
Technically, this sort of thing should not be allowed: integers are not the same thing as boolean values. Nevertheless, the C++ habit of converting automatically between types can be useful.
Other character classification functions include isdigit, which identifies the digit characters 0 through 9, and isspace, which identifies all kinds of whitespace, including spaces, tabs, newlines, and a few others. There are also isupper and islower, which distinguish upper and lower case letters.
Finally, the last two we will discuss, toupper and tolower, convert letters from one case to the other. Both take a single character as an argument and return the (possibly converted) character as an int, which can be assigned back to a char variable:
char letter = 'a';
letter = toupper(letter);
cout << letter << endl;
The output of this code is A.
7.13. Glossary
- concatenate
To join two operands end-to-end.
- counter
A variable used to count something, usually initialized to zero and then incremented.
- decrement
Decrease the value of a variable by one. The decrement operator in C++ is --.
- dot notation
The syntax for invoking (calling) a member function on an object. It consists of the name of the object, followed by a dot (.), followed by the name of the function, followed by a (possibly empty) sequence of arguments enclosed in parentheses.
- increment
Increase the value of a variable by one. The increment operator in C++ is ++. In fact, that’s why C++ is called C++, because it is meant to be one better than C.
- index
A variable or value used to select one of the members of an ordered set, like a character from a string.
- member function
A function that is part of an object. Member functions are invoked on objects using dot notation. An example of a member function is the length function of string objects.
- object
A collection of related data that comes with a set of functions that operate on it. The objects we have used so far are cout, cin, and strings.
- traverse
To iterate through all the elements of a set performing a similar operation on each.
- whitespace
Any character or series of characters that represent horizontal or vertical space.