Iker Hurtado's pro blog
Developer | Entrepreneur | Investor
Software engineer (entrepreneur and investor at times). These days doing performant frontend and graphics on the web platform at Barcelona Supercomputing Center

Relearning C++: Notes on arrays and strings

5 Mar 2015   |   iker hurtado  
In this post I write down some notes from my study of C++ basics: arrays (built-in and library) and strings.

Built-in arrays

An array is a series of elements of the same type placed in contiguous memory locations. They can be individually referenced by adding an index (between brackets) to a unique identifier.

When subscripting an array, it is good practice to give the index variable the type size_t (defined in the cstddef header). It is a machine-specific unsigned integer type that is guaranteed to be large enough to hold the size of any object in memory.

The number of elements is part of the array's type, so the dimension must be known at compile time: it must be a constant expression.
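A minimal sketch of both points: the dimension as a constant expression and size_t as the index type. The names kSize and sumAll are made up for this note.

```cpp
#include <cstddef>

constexpr std::size_t kSize = 4;   // a constant expression, usable as a dimension

// The dimension is part of the parameter's type: only arrays of kSize ints are accepted.
int sumAll(const int (&values)[kSize]) {
    int total = 0;
    for (std::size_t i = 0; i < kSize; ++i) {   // size_t as the index type
        total += values[i];
    }
    return total;
}
```

Calling sumAll with an int[5] would not compile, because int[4] and int[5] are different types.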

Arrays hold objects. Thus, there are no arrays of references.

C and C++ do not perform array bounds checking. If an index goes beyond the array's bounds, neither the compiler nor the runtime is required to report an error; the behavior is simply undefined.

Initialization

An array of built-in type that is defined inside a function without an initializer is default-initialized, so its elements have indeterminate values. Static arrays, and those declared directly in a namespace (outside any function), are always initialized: if no explicit initializer is specified, elements of fundamental type are initialized to zero.
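A small sketch of these initialization rules, assuming arrays of three ints. The helper names allZero and demoInitialization are mine, just for illustration.

```cpp
#include <cstddef>

int globalCounts[3];                 // namespace scope: zero-initialized

// Returns true if every element of a 3-int array is zero.
bool allZero(const int (&a)[3]) {
    for (std::size_t i = 0; i < 3; ++i) {
        if (a[i] != 0) return false;
    }
    return true;
}

bool demoInitialization() {
    static int staticCounts[3];      // static storage: zero-initialized
    int partial[3] = {7};            // explicit initializer: {7, 0, 0}
    int zeroed[3] = {};              // empty braces: all elements are zero
    // int garbage[3];               // no initializer: indeterminate values, do not read them
    return allZero(globalCounts) && allZero(staticCounts) && allZero(zeroed)
        && partial[0] == 7 && partial[1] == 0 && partial[2] == 0;
}
```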

It's not possible to initialize an array as a copy of another array, nor is it legal to assign one array to another.
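Since arrays cannot be copied as a whole, the elements have to be copied one by one. One possible way, sketched here with std::copy from the algorithm header (copyArray is a hypothetical helper, fixed to arrays of four ints):

```cpp
#include <algorithm>
#include <iterator>

// Copies src into dst element by element; both must be arrays of four ints.
void copyArray(const int (&src)[4], int (&dst)[4]) {
    // int other[4] = src;   // error: an array cannot be initialized from another array
    // dst = src;            // error: arrays are not assignable
    std::copy(std::begin(src), std::end(src), std::begin(dst));
}
```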

Some complicated array declarations:

int arr[7];                  //  array of seven ints, used by the declarations below
int *ptrs[7];                //  array of seven pointers to int
int (*arrayPtr)[7] = &arr;   //  arrayPtr points to an array of seven ints
int (&arrayRef)[7] = arr;    //  arrayRef refers to an array of seven ints

Pointers and arrays

Pointers and arrays are closely related. In most expressions where we use an array, the compiler converts it to a pointer to its first element.

So, an array can be implicitly converted to a pointer to its element type:

int array [20];
int * arrayPtr;
arrayPtr = array;  // This assignment operation is valid

After that, arrayPtr and array can be used in much the same way. The difference is that arrayPtr can be assigned a different address, whereas array can never be assigned anything and will always represent the same block of 20 elements.
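One more difference worth seeing in code: the array still knows how many elements it has, while the pointer does not. A sketch, where elementCount and demoDecay are names invented for this note:

```cpp
#include <cstddef>

// Number of elements of a built-in int array; works only while the type is still an array.
template <std::size_t N>
std::size_t elementCount(const int (&)[N]) {
    return N;
}

bool demoDecay() {
    int array[20] = {};
    int *arrayPtr = array;                                  // implicit conversion to &array[0]
    bool sameAddress = (arrayPtr == &array[0]);
    bool sizeKept = (sizeof(array) == 20 * sizeof(int));    // the array type carries its size
    // sizeof(arrayPtr) is only the size of a pointer: the element count is lost.
    return sameAddress && sizeKept && elementCount(array) == 20;
}
```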

It's possible to use the subscript operator on any pointer, even with a negative index, as long as the resulting element lies within the same array. The following code is valid:

int *arrayPtr = &array[2]; // arrayPtr points to the element indexed by 2
int num = arrayPtr[1]; // arrayPtr[1] is equivalent to *(arrayPtr + 1) and array[3]
int num2 = arrayPtr[-2]; // arrayPtr[-2] is the same element as array[0]

Ultimately, the brackets are a dereferencing operator, known as the offset operator. They dereference the variable they follow just as * does, but they first add the number between the brackets to the address being dereferenced.

To make pointers easier and safer to use, and to keep semantics similar to those of library containers, the C++11 library includes two functions named begin and end (defined in the iterator header, under namespace std). Usage examples:

int array[] = {0,1,2,3,4,5}; 
int *beg = begin(array); // pointer to the first element 
int *last = end(array);  // pointer one past the last element
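These two pointers are typically used as the limits of a loop. A sketch, assuming a hypothetical helper called sumRange:

```cpp
#include <cstddef>
#include <iterator>

// Sums a built-in int array by walking from begin(array) to end(array).
template <std::size_t N>
int sumRange(const int (&array)[N]) {
    int total = 0;
    for (const int *p = std::begin(array); p != std::end(array); ++p) {
        total += *p;   // p never reaches end(array) inside the loop body
    }
    return total;
}
```

Subtracting the two pointers, end(array) - begin(array), gives the number of elements.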

C-Style strings

A C-style string is a sequence of chars interpreted as a piece of text. It is implemented as a char array that has a null terminator (the '\0' character) marking the end of the string.

Sequences of characters enclosed in double quotes (") are string literals; their type is a null-terminated array of const char.

Some examples of C-style string declarations (and initializations):

char str[256]; // Can hold a C-String of up to 255 characters terminated by '\0'
char str1[] = "Hello"; // Declare and initialize with a "string literal"
char str1char[] = {'H', 'e', 'l', 'l', 'o', '\0'};  // Same as above
char str2[256] = "Hello";  // Length of array is 256, keeping a smaller string.
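It helps to keep apart the size of the array and the length of the text stored in it. A quick check of the declarations above (the function name demoCStringSizes is mine):

```cpp
#include <cstddef>
#include <cstring>

bool demoCStringSizes() {
    char str1[] = "Hello";         // 5 characters plus '\0': the array has 6 elements
    char str2[256] = "Hello";      // 256 elements; the unused ones are set to '\0'
    return sizeof(str1) == 6
        && std::strlen(str1) == 5  // strlen counts the characters before the terminator
        && sizeof(str2) == 256
        && std::strlen(str2) == 5
        && str2[255] == '\0';
}
```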

Because C-style strings are regular arrays, they have the same restrictions: once declared, the array cannot be assigned a new value as a whole, although its individual elements can be modified.

str1 = "Hello";   // not valid: an array cannot be assigned as a whole
str1[0] = 'h';    // valid: individual elements can be modified

Although C++ supports C-style strings, C++ programs should generally avoid them. For most applications, library strings are safer and easier to use, and usually more efficient as well.

The cstring header (ported from C's string.h) contains commonly used functions to operate on C-strings, such as strlen, strcmp and strcpy. Also, the cstdlib header contains functions to convert C-strings to fundamental types, and the cctype header contains character-handling functions.
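A short tour of the three headers, as a sketch. The buffer size and the helper names upperCase and demoCStringFunctions are chosen only for this example.

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>

// Upper-cases a C-string in place using std::toupper from <cctype>.
void upperCase(char *text) {
    for (; *text != '\0'; ++text) {
        *text = static_cast<char>(std::toupper(static_cast<unsigned char>(*text)));
    }
}

bool demoCStringFunctions() {
    char buffer[16];
    std::strcpy(buffer, "abc");     // copy; nothing checks that the buffer is big enough
    std::strcat(buffer, "42");      // append: "abc42"
    upperCase(buffer);              // "ABC42"
    long number = std::strtol(buffer + 3, nullptr, 10);   // parse the "42" part
    return std::strcmp(buffer, "ABC42") == 0
        && std::strlen(buffer) == 5
        && number == 42;
}
```

Note that strcpy and strcat trust the caller about the buffer size, which is one reason library strings are considered safer.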

The standard library array

Built-in arrays are implemented directly as a language feature (inherited from the C language), but they have some drawbacks. To address them, the C++ standard library provides an alternative array type as a standard container. It is a class template defined in the array header.

It is as efficient in terms of storage size as an ordinary array. This class merely adds a layer of member and global functions to it, so that arrays can be used as standard containers.

The main differences: these arrays can be copied (an expensive operation, since every element is copied), they know their own size, and they decay into pointers only when explicitly asked to (by means of the data member function).
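The three differences in a few lines. This is only a sketch; sumLegacy stands in for any hypothetical function that expects a pointer and a length.

```cpp
#include <array>
#include <cstddef>

// Legacy-style function that expects a pointer and a length.
int sumLegacy(const int *data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

bool demoStdArray() {
    std::array<int, 4> a = {{1, 2, 3, 4}};
    std::array<int, 4> b = a;           // copying is allowed (every element is copied)
    b[0] = 10;                          // changing the copy does not affect a
    // const int *p = a;                // error: no implicit decay to a pointer
    return a.size() == 4 && a[0] == 1 && b[0] == 10
        && sumLegacy(a.data(), a.size()) == 10;   // explicit pointer through data()
}
```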

For more detail go to std::array - cppreference.com

The string type

The string class, in the string header and under namespace std, models character sequences with an interface similar to that of a standard container, adding specific features to operate on strings of single-byte characters. It is a typedef for the basic_string class template instantiated with the char type.

It's worth noting that, unlike strings in some other languages, a C++ string is not immutable.

This class handles bytes independently of the encoding used: if it is used to handle sequences of multi-byte or variable-length characters (such as UTF-8), all members of this class (such as length or size), as well as its iterators, will still operate in terms of bytes (not actual encoded characters).

For reference information on the type, see: string - C++ Reference
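A sketch of mutability and byte counting together. The UTF-8 bytes of the accented letter are written out explicitly so the example does not depend on the source file encoding; demoStringBytes is a made-up name.

```cpp
#include <string>

bool demoStringBytes() {
    std::string word = "cat";
    word[0] = 'b';                 // strings are mutable: "bat"
    word += "s";                   // and they can grow: "bats"

    // "niño" encoded as UTF-8: the letter ñ takes two bytes (0xC3 0xB1).
    std::string accented = "ni\xC3\xB1" "o";
    return word == "bats"
        && accented.size() == 5;   // 5 bytes, although a reader sees 4 characters
}
```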

The string::size_type type

The string class defines several companion types; size_type is one of them. These companion types make it possible to use the library types in a machine-independent manner.

size_type is an unsigned integer type big enough to hold the size of any string. Any variable used to store the result of the string size operation should be of type string::size_type. With auto, the compiler deduces that type for us:

auto len = line.size();

You can avoid problems caused by conversions between unsigned and signed integers by not using ints in expressions that use string.size().
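The kind of problem meant here, as a sketch: a negative int compared with size() is converted to a huge unsigned value. The helper longerThan is hypothetical and shows one possible way to handle it.

```cpp
#include <string>

// Returns true if line has more than limit characters.
// A negative limit is handled explicitly, because comparing it directly with
// size() would convert it to a very large unsigned value.
bool longerThan(const std::string &line, int limit) {
    if (limit < 0) return true;
    return line.size() > static_cast<std::string::size_type>(limit);
}

bool demoSizeType() {
    std::string line = "hello";
    std::string::size_type len = line.size();   // the same type auto would deduce
    int n = -1;
    // This is what line.size() < n does implicitly: n becomes a huge unsigned number.
    bool surprising = line.size() < static_cast<std::string::size_type>(n);
    return len == 5 && surprising && longerThan(line, n) && !longerThan(line, 5);
}
```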

Relational operators

Relational operators on strings use the same strategy as a (case-sensitive) dictionary (their implementation uses string::compare for the comparisons):

1. If two strings have different lengths and if every character in the shorter string is equal to the corresponding character of the longer string, then the shorter string is less than the longer one.

2. If any characters at corresponding positions in the two strings differ, then the result of the comparison is the result of comparing the first character at which the strings differ.

In addition, the string library provides a set of compare functions that are similar to the C library strcmp function.
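Both rules, plus the compare member function, in a small sketch. It assumes an ASCII-compatible character set, where uppercase letters sort before lowercase ones; demoStringComparison is my own name.

```cpp
#include <string>

bool demoStringComparison() {
    std::string hello = "Hello";
    std::string helloWorld = "Hello World";
    std::string hiya = "Hiya";
    return hello < helloWorld          // rule 1: a prefix is less than the longer string
        && hiya > hello                // rule 2: 'i' > 'e' at the first differing position
        && hiya > helloWorld           // rule 2 again: the lengths do not matter here
        && std::string("Zebra") < std::string("apple")   // case-sensitive: 'Z' sorts before 'a'
        && hello.compare(helloWorld) < 0                 // compare: negative, zero or positive
        && hello.compare("Hello") == 0;
}
```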

string type, string literals and C-style strings

The string library lets us convert both character literals and string literals to strings. Also, many string member functions accept C-style strings as arguments (const char* s). These lines are valid:

string s3 = s1 + ", " + s2 + '\n';
str1.compare(c_style_string);
It is important to remember that string literals are not standard library strings but 'an array of n const char'. When we mix strings with string literals or character literals, at least one operand of each + operator must be of string type.
Good info about string literals: Constants - C++ Tutorials
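A sketch of the rule for the + operator, with the invalid line left as a comment. The function joinNames is invented for this note.

```cpp
#include <string>

std::string joinNames(const std::string &s1, const std::string &s2) {
    std::string s3 = s1 + ", " + s2 + '\n';      // ok: every + has a string operand
    // std::string bad = "hello" + ", " + s2;    // error: the first + adds two literals
    std::string ok = std::string("hello") + ", " + s2;   // fix: convert one literal first
    return s3 + ok;
}
```

The operators are evaluated from left to right, so it is enough that the first + involves a string: its result is a string too.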

Functions to convert between string and numbers

The C++11 standard introduced several very useful functions to convert between numeric data and library strings:

string s = to_string(i);  // converts the int i to its character representation
int d = stoi(s);          // converts the string s to int
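stoi throws when the conversion fails, so real code usually needs some error handling. A minimal sketch, where parseOr and its fallback parameter are my own invention:

```cpp
#include <stdexcept>
#include <string>

// Parses text as an int; returns fallback if it is not a number or does not fit.
int parseOr(const std::string &text, int fallback) {
    try {
        return std::stoi(text);
    } catch (const std::invalid_argument &) {   // no digits to convert
        return fallback;
    } catch (const std::out_of_range &) {       // the value does not fit in an int
        return fallback;
    }
}
```

There are sibling functions for other types as well, such as stol, stoul and stod.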
