# 10. Vectors¶

A **vector** is a set of values where each value is identified by a number
(called an index). A `string`

is similar to a vector, since it is made up
of an indexed set of characters. The nice thing about vectors is that they can
be made up of any type of element, including basic types like `int`

and
`double`

, and user-defined types like `Point`

and `Time`

.

In order to use `vector`

/s, you need to include the header file at the top
of your program:

```
#include <vector>
```

You can create a vector similarly to how you create other variable types:

```
vector<int> count;
vector<double> double_vector;
```

The type that makes up the vector appears in angle brackets (`<`

and `>`

).
The first line creates a vector of integers named `count`

; the second
creates a vector of doubles. Although these statements are legal, they are not
very useful because they create vectors that have no elments (their size is
zero). It is more common to specify the size of the vector in parentheses:

```
vector<int> count(4);
```

The syntax here is a little odd. It looks like a combination of a variable
declaration and a function call. In fact, that’s exactly what it is. The
function we are invoking is a `vector`

constructor. A **constructor** is a
special function that creates new objects and initializes their instance
variables. In this case, the constructor takes a single argument, which is
the size of the new vector.

The following figure shows how vectors are represented in state diagrams:

The large circles inside the boxes are the **elements** of the vector. Each
element is accessed using the *address* of its box, beginning with 0. When you
allocate a new vector, the elements are not initialized. They could contain any
values.

There is another constructor for `vectors`

that takes two parameters. The
second is a **fill value**, the value that will be assigned to each of the
elements.

```
vector<int> count(4, 0);
```

This statement creates a vector of four elements and initializes all of them to zero.

## 10.1. Accessing elements¶

The `[]`

operator reads and writes elements of a vector in much the same
way it accesses the characters in a string. As with `string`

s, the indices
start at zero, so `count[0]`

refers to the “zeroth” element of the vector,
and `count[1]`

refers to the “oneth” element. You can use the `[]`

operator anywhere in an expression.

```
count[0] = 7;
count[1] = count[0] * 2;
count[2]++;
count[3] -= 60;
```

All of these are legal assignment statements. Here is the effect of this code fragment:

Since elements of this vector are numbered from 0 to 3, there is no element with the index 4.

You can use any expression as an index, as long as it has type `int`

. One
of the most common ways to index a vector is with a loop variable. For
example:

```
int i = 0;
while (i < 4) {
cout << count[i] << endl;
i++;
}
```

This while loop counts from 0 to 4. When the loop variable `i`

is 4, the
condition fails and the loop terminates. Thus, the body of the loop is only
executed when `i`

is 0, 1, 2, and 3.

Each time through the loop we use `i`

as an index into the vector, outputting
the ith element. This type of vector traversal is very common. Vectors and
loops go together like fava beans and a nice Chianti.

## 10.2. Copying vectors¶

There is one more constructor for vectors, which is called a copy constructor because it takes one vector as an argument and creates a new vector that is the same size, with the same elements.

```
vector<int> copy(count);
```

Although this syntax is legal, it is almost never used for vectors because there is a better alternative:

```
vector<int> copy = count;
```

The `=`

operator works on vectors pretty much the way you would expect.

## 10.3. `for`

loops¶

The loops we have written so far have a number of elements in common. All of them start by initializing a variable; they have a test, or condition, that depends on that variable; and inside the loop they do something to that variable, like increment it.

This type of loop is so common that there is an alternative loop statement,
called `for`

, that expresses it more concisely. The general syntax looks
like this:

```
for (INITIALIZER; CONDITION; INCREMENTOR) {
BODY
}
```

This statement is exactly equivalent to

```
INITIALIZER
while (CONDITION) {
BODY
INCREMENTOR
}
```

except that is is more concise and, since it puts all the loop related statements in one place, it is easier to read. For example:

```
for (int i = 0; i < 4; i++) {
cout << count[i] << endl;
}
```

is equivalent to

```
int i = 0;
while (i < 4) {
cout << count[i] << endl;
i++;
}
```

Notice that it is legal to declare a variable inside a `for`

statement.
This syntax is convenient, but you should be aware that a variable declared
inside a loop only exists inside the loop. If you try to refer to `i`

later,
you will get a compiler error.

## 10.4. Vector size¶

There are only a couple of functions you can invoke on a `vector`

. One of
them is very useful though: `size`

. Not surprisingly, it returns the size of
the vector (the number of elements).

It is a good idea to use this value as the upper bound of a loop, rather than a constant. That way, if the size of the vector changes, you won’t have to go through the program changing all the loops; they will work correctly for any size vector.

```
for (int i = 0; i < count.size(); i++) {
cout << count[i] << endl;
}
```

## 10.5. Random numbers¶

Most computer programs do the same thing every time they are executed, so they
are said to be **deterministic**. Usually, determinism is a good thing, since
we expect the same calculation to yield the same result. For some applications,
though, we would like the computer to be upredictable. Games are an obvious
example.

Making a program truly *nondeterministic* turns out to be not so easy, but
there are ways to make it at least seem nondeterministic. One of them is to
generate **pseudorandom numbers** and use them to determine the outcome of the
program. Pseudorandom numbers are not truly random in the mathematical sense,
but for our purposes, they will do.

C++ provides a function called `rand`

that generates pseudorandom numbers.
It is declared in the header file `cstdlib`

, which contains a variety of
*standard library* functions, hence its name.

The return value from `rand`

is an integer between 0 and `RAND_MAX`

, where
`RAND_MAX`

is a large number (about 2 billion on our system) also defined in
the header file. Each time you call `rand`

you get a different pseudorandom
number. To `rand`

in action, compile and run this program:

```
#include <cstdlib>
#include <iostream>
using namespace std;
int main()
{
cout << "RAND_MAX is: " << RAND_MAX << endl;
cout << "Let's generate 10 random numbers." << endl;
for (int i = 1; i < 11; i++) {
cout << "Random number " << i << ": " << rand() << endl;
}
return 0;
}
```

Now try running the program again several times. What do you notice? You get the same set of 10 random numbers each time!

If we want our numbers to vary on program runs, we can set a **random seed**
for the pseudorandom number function `rand`

by calling `srand`

(for
“seed rand”). The `srand`

function takes an integer argument. With a
different argument to `srand`

, `rand`

generates a different sequence of
numbers.

A common way to give `srand`

a “random” seed is to use the `time`

function,
which returns the current time in seconds.

Note

The `time`

function returns
Unix time, which is the number
of seconds elapsed since January 1, 1970.

To use `srand`

this way, we include the `ctime`

header file and seed it
with the current time:

```
srand((unsigned) time(NULL));
```

which will cause `rand`

to generate a different sequence of numbers.

Of course, we don’t usually want to work with gigantic integers. More often we want to generate integers between some lower bound and some upper bound. We will leave this as an exercise for you to do this.

## 10.6. Statistics¶

The numbers generated by `rand`

are supposed to be distributed uniformly.
That means that each value in the range should be equally likely. If we count
the number of times each value appears, it should be roughly the same for all
values, provided that we generate a large number of values.

In the next few sections, we will write programs that generate a sequence of random numbers and check whether this property holds true.

## 10.7. Vector of random numbers¶

The first step is to generate a large number of random values and store them in a vector. By “large number”, of course, we mean 20. It’s always a good idea to start with a manageable number, to help with debugging, and then increase it later.

The following function takes a single argument, the size of the vector. It
allocates a new vector of `int`

s, and fills it with random values between
0 and `upper_bound - 1`

.

```
vector<int> random_vector(int n, int upper_bound)
{
vector<int> vec(n);
for (int i = 0; i < vec.size(); i++) {
vec[i] = rand() % upper_bound;
}
return vec;
}
```

The return type is `vector<int>`

, which means that this function returns a
vector of integers. To test this function, it is convenient to have a function
that outputs the contents of a vector.

```
void print_vector(const vector<int>& vec)
{
for (int i = 0; i < vec.size(); i++) {
cout << vec[i] << " ";
}
}
```

Notice that it is legal to pass `vector`

s by reference. In fact it is quite
common, since it makes it unnecessary to copy the vector. Since `print_vector`

does not modify the vector, we declare the parameter `const`

.

The following code generates a vector and outputs it:

```
int num_values = 20;
int upper_bound = 10;
vector<int> vector = random_vector(num_values, upper_bound);
print_vector(vector);
```

On our machine we got this as output:

```
3 6 7 5 3 5 6 2 9 1 2 7 0 9 3 6 0 6 2 6
```

which is pretty random-looking.

If these numbers are really random, we expect each digit to appear the same number of times - twice each. In fact, the number 6 appears five times, and the numbers 4 and 8 never appear at all.

Do these results mean the values are not really uniform? It’s hard to tell. With so few values, the chances are slim that we would get exactly what we expect. But as the number of values increases, the outcome should be more predictable.

To test this theory, we’ll write some programs that count the number of times each value appears, and then see what happens when we increase the number of values.

## 10.8. Counting¶

A good approach to problems like this is to think of simple functions that are
easy to write, and that might turn out to be useful. Then you can combine them
into a solution. This approach is sometimes called **bottom-up design**. Of
course it is not easy to know ahead of time which functions are likely to be
useful, but as you gain experience you will have a better idea.

Also, it is not always obvious what sort of things are easy to write, but a good approach is to look for subproblems that fit a pattern you have seen before.

Back in the Traversal section of the
Strings and things chapter we looked a loop that traversed a
string and counted the number of times a given letter appeared. You can think
of this program as an example of a pattern called *traverse and count*. The
elements of this pattern are:

A set or container that be traversed, like a string or a vector.

A test that you can apply to each element in the container.

A counter that keeps track of how many elements pass the test.

In this case, we have a function in mind called `how_many`

that counts the
number of elements in a vector that equal a given value. The parameters are the
vector and the integer value we are looking for. The return value is the number
of times the value appears.

```
int how_many(const vector<int>& vec, int value)
{
int count = 0;
for (int i = 0; i < vec.size(); i++) {
if (vec[i] == value) count++;
}
return count;
}
```

`how_many`

only counts the occurances of a particular value, and we are
interested in seeing how many times each value appears. We can solve this
problem with a loop:

```
int num_values = 20;
int upper_bound = 10;
vector<int> vector = random_vector(num_values, upper_bound);
cout << "value\thow_many" << endl;
for (int i = 0; i < upper_bound; i++) {
cout << i << '\t' << how_many(vector, i) << endl;
}
```

This code uses the loop variable as an argument to `how_many`

, in order to
check each value between 0 and 9, in order. The result is:

```
value how_many
0 2
1 1
2 3
3 3
4 0
5 2
6 5
7 2
8 0
9 2
```

Again, it is hard to tell if the digits are really appearing equally often. If
we increase `num_values`

to 100,000 we get the following:

```
value how_many
0 10130
1 10072
2 9990
3 9842
4 10174
5 9930
6 10059
7 9954
8 9891
9 9958
```

In each case, the number of appearances is within about 1% of the expected value (10,000), so we conclude that the random numbers are probably uniform.

## 10.9. A histogram¶

It is often useful to take the data from the previous tables and store them for
later access, rather than just print them. What we need is a way to store 10
integers. We could create 10 integer variables with names like `how_manyOnes`

,
`how_manyTwos`

, etc. But that would require a lot of typing, and it would be a
real pain later if we decided to change the range of values.

A better solution is to use a vector with size 10. That way we can create all ten storage locations at once and we can access them using indices, rather than ten different names. Here’s how:

```
int num_values = 100000;
int upper_bound = 10;
vector<int> vector = random_vector(num_values, upper_bound);
vector<int> histogram(upper_bound);
for (int i = 0; i < upper_bound; i++) {
int count = how_many(vector, i);
histogram[i] = count;
}
```

We called the vector `histogram`

because that’s a statistical term for a
vector of numbers that counts the number of appearances of a range of values.

The trick thing here is that we are using the loop variable in two different
ways. First, it is an argument to `how_many`

, specifying which value we are
interested in. Second, it is an index into `histogram`

, specifying which
location the result should be stored in.

Although this code works, it is not as efficient as it could be. Every time it
calls `how_many`

, it traverses the entire vector. In this example we have to
traverse the vector ten times!

It would be better to make a single pass through the vector. For each value in the vector we could find the corresponding count and increment it. In other words, we could use the value from the vector as an index into the histogram. Here’s what that looks like:

```
vector<int> histogram(upper_bound, 0);
for (int i = 0; i < num_values; i++) {
int index = vector[i];
histogram[index]++;
}
```

The first line intializes the elements of the histogram to zeros. That way,
when we use the increment operator (`++`

) inside the loop, we know we are
starting from zero. Forgetting to initialize counters is a common error.

## 10.10. Glossary¶

- vector
A named collection of values, where all the values have the same type, and each value is identified by an index.

- element
One of the values in a vector. The

`[]`

operator selects elements of a vector.- constructor
A special function that creates a new object and initializes its instance variables.

- deterministic
A program that does the same thing every time it is run.

- pseudorandom
A sequence of numbers that appear to be random, but which are actually the result of a deterministic computation.

- seed
A value used to initialize a pseudorandom number sequence. Using the same seed should yield the same sequence of values.

- bottom-up design
A method of program development that starts by writing small, useful functions and then assembling them into larger solutions.

- histogram
A vector of integers where each integer counts the number of values that fall into a certain range.

## 10.11. Exercises¶

Write a program to generate 10 random numbers between 1 and 10.

Rerun the

`how_many`

experiment from the Counting section with larger values for`num_values`

and describe your results. Try calling`srand`

seeded with the current time. Repeat this several times, documenting your results.Encapsulate the single pass histogram from the A histogram section in a function named

`histogram`

that takes a vector and a range of values in the vector (which was 0 through 9 in the example), and returns a histogram of the values in the vector.