11. Pointers and arrays
Many parts of the C++ language are inherited directly from the programming language C, from which it was originally derived. C has often been described as a “middle-level programming language”, since its abstractions lie between low-level machine language and assembly language, and high-level languages like Python and Ruby.
C++ retains the “close to the machine” language features from C, though we have been intentionally avoiding them so far in this book. It’s time now to look deeper into what is going on inside our computer when we make use of them in C++.
11.1. Variables and values revisited
We were first introduced to values and variables back in the Variables, types and expressions chapter. We defined a variable as “a named location that stores a value.” That definition hides a subtlety. In the following code:
int n = 3;
n = n + 2;
the two occurrences of the variable n in the second line mean very different things. On the left hand side of the assignment operator, n represents the address in memory where n is located. In the expression on the right hand side, n represents the value currently stored in that memory location, which in this example is the integer 3. The second line can be read as “take the value stored in the location referenced by the name n, add 2 to it, and then store it back in that same location.”
The two sides of an assignment statement in C++ are given special names, lvalue and rvalue, to reflect this important distinction. An lvalue must refer to a storage location, and a plain data value such as 1 does not, which is why
1 = n // WRONG!
will cause a compiler error.
11.2. References revisited
In Pass by reference we introduced the & symbol in parameter declarations such as char& a, where it marks a reference. Used in front of a variable in an expression, the same symbol is the address-of operator. As its name indicates, this operator yields the memory address of the variable it is applied to.
Let’s look at a program that tests two functions, wont_swap and will_swap, to see how the address-of operator works. Note that both use a type cast, long(), like we did way back in the Converting between types section, to display addresses in decimal notation. We need long() rather than int() here, since addresses on most machines these days are 64 bits.
#include <iostream>
using namespace std;

void wont_swap(char a, char b)
{
    cout << "a: " << a << " b: " << b << endl;
    cout << "address of a: " << long(&a) << endl;
    cout << "address of b: " << long(&b) << endl;
    char temp = a;
    a = b;
    b = temp;
    cout << "a: " << a << " b: " << b << endl;
}

void will_swap(char& a, char& b)
{
    cout << "a: " << a << " b: " << b << endl;
    cout << "address of a: " << long(&a) << endl;
    cout << "address of b: " << long(&b) << endl;
    char temp = a;
    a = b;
    b = temp;
    cout << "a: " << a << " b: " << b << endl;
}

int main()
{
    char x = 'x';
    char y = 'y';
    cout << "x: " << x << " y: " << y << endl;
    cout << "address of x: " << long(&x) << endl;
    cout << "address of y: " << long(&y) << endl;
    cout << "Calling wont_swap..." << endl;
    wont_swap(x, y);
    cout << "x: " << x << " y: " << y << endl;
    cout << "Calling will_swap..." << endl;
    will_swap(x, y);
    cout << "x: " << x << " y: " << y << endl;
    return 0;
}
When we ran this program, we got:
x: x y: y
address of x: 140730371504847
address of y: 140730371504846
Calling wont_swap...
a: x b: y
address of a: 140730371504796
address of b: 140730371504792
a: y b: x
x: x y: y
Calling will_swap...
a: x b: y
address of a: 140730371504847
address of b: 140730371504846
a: y b: x
x: y y: x
The address values will almost certainly be different on your computer, but the important point here is the relationship between the addresses of x and a, and of y and b. In wont_swap they are different, since the arguments are passed by value and a and b are copies. In will_swap they are the same, since a and b are references to x and y.
11.3. Pointers
We turn now to pointers.
A pointer is a variable that stores a memory address. Pointers “point to” a storage location that holds an object of another type. So we have pointers to char, pointers to int, pointers to double, etc. The following creates a variable cp that points to a character variable:
char letter = 'a';
char* cp = &letter;
cout << "letter: " << letter << endl;
cout << "address of letter: " << long(&letter) << endl;
cout << "value of pointer to letter: " << long(cp) << endl;
cout << "address of pointer to letter: " << long(&cp) << endl;
cout << "dereferencing the pointer cp gives: " << *cp << endl;
Running this will yield something like this:
letter: a
address of letter: 140725200837471
value of pointer to letter: 140725200837471
address of pointer to letter: 140725200837472
dereferencing the pointer cp gives: a
In the first line we create a variable of type char named letter and assign it the value 'a'. The second line creates a pointer to char variable named cp and initializes it to the address of letter. Using the first three cout statements, we confirm that letter does indeed store the character 'a', and that cp stores the address of letter. The next cout statement shows where cp itself is stored, and the last cout statement introduces the dereference operator, *, which returns the value at the memory location to which the pointer points.
The use of the * symbol in two different ways here is confusing. When it appears after a type name in a declaration, as in:
int* ip;
it means “pointer to” the type to which it is applied. When it appears in front of a pointer in an expression, it is the dereference operator, which accesses the value referenced by the pointer.
Here is another working version of our swap function that uses pointers:
void will_swap_with_pointers(char* a, char* b)
{
    cout << "a: " << *a << " b: " << *b << endl;
    cout << "address of a: " << long(a) << endl;
    cout << "address of b: " << long(b) << endl;
    char temp = *a;
    *a = *b;
    *b = temp;
    cout << "a: " << *a << " b: " << *b << endl;
}
and a few more lines to call this new version in main:
char c = 'c';
char d = 'd';
cout << "Calling will_swap_with_pointers..." << endl;
will_swap_with_pointers(&c, &d);
Notice that since the type of the two parameters in this function is pointer to char, we need to send it the addresses of c and d using the address-of operator.
11.4. The arrow operator -> and this
We created Points back in Point objects:
struct Point {
    double x, y;
};
We can now create a pointer to a Point with:
Point p = {3, 4};
Point* pp = &p;
The question is, can we access the instance variables x and y from our pointer pp? It turns out we can, but we need a new operator, ->, called the arrow operator:
pp->x = 5; // p.x is now 5
The arrow operator accesses the instance variables and member functions of an object through a pointer to that object.
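If we want to turn our Point structure into an object with constructors and member functions, we might want a constructor that takes the values of x and y as arguments and initializes the instance variables. Inside a member function, this is a pointer to the object the function was called on. Since the parameters here have the same names as the instance variables, we write this->x to refer to the instance variable and plain x to refer to the parameter: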
struct Point {
    double x, y;

    Point(double x, double y) {
        this->x = x;
        this->y = y;
    }
};
11.5. Dynamic memory allocation and memory leaks
Using the address-of operator is not the only way to assign a value to a pointer. The new operator takes a type, allocates a block of memory large enough to store a value of that type, and returns the address of that block.
int* ip = new int(42);
cout << "ip: " << long(ip) << endl;
cout << "dereferencing the pointer ip gives: " << *ip << endl;
Running this will yield something like:
ip: 94572116348592
dereferencing the pointer ip gives: 42
This dynamic memory allocation can be useful, but it requires us to take responsibility for memory management ourselves, and it comes with a danger: we can leak memory! A memory leak occurs when we dynamically allocate memory, store its address in a pointer, and then reassign that pointer to point somewhere else without first freeing the memory.
ip = new int(5);
cout << "ip is now pointing to: " << long(ip) << endl;
cout << "dereferencing the pointer ip gives: " << *ip << endl;
cout << "But we leaked memory!" << endl;
which yields something like:
ip is now pointing to: 94572116349664
dereferencing the pointer ip gives: 5
But we leaked memory!
The problem is that the memory located at address 94572116348592 is still marked as in use, but it is no longer available to our program, since no pointer holds its address and we have no way to reach it.
When allocating our own memory using new, we need to explicitly free it with delete before we lose track of its address:
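int* ip = new int(42);
int* tip = ip;      // keep a second pointer to the first block
ip = new int(5);    // ip now points to a new block
delete tip;         // free the first block (the 42)
delete ip;          // free the second block (the 5) once we are done with it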
11.6. Arrays
So far, we have been using C++’s vector class whenever we wanted a contiguous sequence of values of the same type. There is a more primitive version of this kind of sequence that C++ inherits from C, the array. An array is a fixed-size sequential collection of elements of the same type. Each element in the array is addressed by its index, beginning with 0.
We declare an array by putting its size in square brackets next to the name of the array. Each element is then accessed using the name of the array followed by its index in square brackets.
The following code creates an array of 10 ints, with indices 0 through 9, and then initializes each element in a for loop to the values 0, 1, 4, …, 81.
int square_numbers[10];
for (int i = 0; i < 10; i++) {
    square_numbers[i] = i * i;
}
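[Figure: state diagram of the square_numbers array, with indices 0 through 9 holding the values 0, 1, 4, 9, 16, 25, 36, 49, 64 and 81.]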
Running:
cout << "square_numbers[7] is " << square_numbers[7] << "." << endl;
will produce:
square_numbers[7] is 49.
Accessing array elements is done the same way as we saw with vectors back in Accessing elements.
Back then we avoided discussing what happens when we run something like the following, which will give us a surprise!
cout << "square_numbers[12] is " << square_numbers[12] << "." << endl;
Indexing with square brackets is memory unsafe for both vectors and arrays in C++, meaning you can use it to access places in memory, like that addressed by square_numbers[12], which are not within the boundaries of your array.
C++ will typically compile this statement without objection. The language does not define what happens when you run it. Most often you will get an arbitrary value, whatever happens to be stored in that memory location. The technical term computer scientists use for arbitrary data like this is garbage, and it can cause havoc in your programs. The expression “garbage in, garbage out” captures the problem. Any computation that uses a garbage value will itself become garbage, making programs very hard to debug.
11.7. Pointers and arrays
Arrays and pointers in C++ are closely related. To quote Brian Kernighan and Dennis Ritchie from The C Programming Language, “In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays should be discussed simultaneously.”
To see this relationship in action, add the following lines after the square_numbers initialization loop in the previous section:
int* iptr = square_numbers;
cout << "The 8th of our square numbers is " << *(iptr + 7) << "." << endl;
The array variable square_numbers, used without square brackets, is converted to a pointer to the first integer in the array, as we can see here, where it is assigned to the pointer variable iptr. Incrementing the pointer variable makes it point to the next element in the array. Adding 7 to it gives a pointer to the 8th element of the array, which we then dereference.
11.8. Dynamic memory allocation with arrays
There are two dynamic memory operators for arrays: new[], which allocates memory for an array, and delete[], which frees it.
11.9. C Strings
At the beginning of the Strings and things chapter, we mentioned native C strings, stating that we lacked concepts needed to introduce them. We now do.
A string in the C programming language is an array of characters (char) ending in a special character called the null character, which is a char with a numeric value of zero ('\0').
11.10. Dynamic memory allocation
11.11. Glossary
- lvalue
A reference to a storage location in memory. When a variable appears on the left hand side of an assignment statement, it refers to the memory location where the rvalue will be stored.
- pointer variable
A variable that holds an address in memory. Pointers can be assigned a value after they are created, and can be reassigned to different address values.
- reference variable
A variable that acts as another name for an existing variable and shares its address. References must be initialized when they are created and cannot be made to refer to a different variable afterward.
- rvalue
A value appearing on the right hand side of an assignment statement. The rvalue is stored in the location referenced by the lvalue.