Data Types
Every piece of data belongs to some basic category. Consider a simple real-life example: every number is of a particular type. The number 5 is a natural number (it can also be called a whole number), while 6.5 is a real number (it has a decimal point). Similarly, in programming we have what are called data types. When a variable is declared, the programmer has to specify which data type it belongs to. Only then will the compiler know how many bytes it should allocate for that particular variable. In other words, each data type occupies a particular amount of memory, and a variable declared as belonging to one data type cannot be assigned a value of a different data type. In simpler terms, suppose the variable ‘x’ is declared such that it can hold only whole numbers; then it cannot (and should not) be assigned a letter of the alphabet.
There are two categories of data types: fundamental data types and user-defined data types. The second category of data types will be dealt with later.
The fundamental (or built-in or primitive) data types are integer, floating point, character, double and bool.
The first three data types: integer, floating point and character are used frequently.
An integer can contain only the digits 0 to 9, optionally preceded by a sign. Examples of integers are 0, 10, 345 and -23.
It includes positive and negative numbers, but the numbers have to be whole numbers. It does not accept the decimal point. Hence numbers such as 3.5, 4.8 and 0.23 are not integer data types.
These numbers come under the second category (floating point type) and not under integers. If the program has to accept such values from the user do not declare the variable as an integer. If a variable is declared as an integer and the user enters a value of 2.3, the program will assign 2 as the value for that integer variable. Similarly, if the user enters 3.2, the program will assign 3 to the integer variable.
Remember: Once a variable is declared as an integer, it will only store whole numbers (if the user types a value with the decimal point, the program will ignore everything that is typed after the decimal point).
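The truncation described above can be tried out with a minimal sketch. The function names `toWholeNumber` and `showTruncation` are hypothetical (they are not part of C++); the sketch only illustrates that a value with a decimal point loses its fractional part when it ends up in an `int`.

```cpp
#include <iostream>

// Hypothetical helper: converting a value with a decimal point to int
// discards everything after the decimal point (no rounding takes place).
int toWholeNumber(double value) {
    return static_cast<int>(value);
}

// Prints the results for the two values used in the text.
void showTruncation() {
    std::cout << toWholeNumber(2.3) << '\n';  // prints 2
    std::cout << toWholeNumber(3.2) << '\n';  // prints 3
}
```

Calling `showTruncation()` from `main` displays 2 and 3, matching the behaviour described above.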
How to declare a variable as belonging to the type integer? The syntax is:
int variable-name;
Each data type occupies a certain amount of memory space. In this discussion an integer is taken to occupy 2 bytes of memory (which means 16 bits), as it does on 16-bit compilers. From this it is possible to calculate the maximum and minimum values that an integer can store. 2^16 = 65536 (hence 65536 different combinations of 16 bits are possible). Divide this by 2, because integers (by default) cover both negative and positive values: 32,768 combinations are available for each side. Zero takes one of the combinations on the positive side, so the largest positive value is 32,768 - 1 = 32,767. Hence an integer can take values from –32,768 up to +32,767 (a total of 65536 different values).
A natural question springs to mind: "What would happen if a value greater than 32,767 is given?" Since this value cannot be accommodated within the allocated two bytes, the program cannot store it correctly; only the bits that fit are kept, so the stored value is something quite different. The program might be given 123456 as the integer value, but a two-byte integer will typically store it as –7616. Whenever you use variables, ensure that you have declared them as belonging to the correct data type.
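The range calculation and the effect of an out-of-range value can be checked with a short sketch. It assumes the compiler provides the fixed-width type `std::int16_t` from `<cstdint>` (most do) to stand in for a 2-byte integer; the function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Number of distinct bit patterns in 16 bits: 2^16.
long patternCount16() {
    return 1L << 16;  // 65536
}

// Smallest and largest values of a 16-bit signed integer.
int min16() { return std::numeric_limits<std::int16_t>::min(); }  // -32768
int max16() { return std::numeric_limits<std::int16_t>::max(); }  // 32767

// Forcing an out-of-range value into 16 bits keeps only the low 16 bits.
// (Before C++20 the exact result is implementation-defined; on common
// two's complement compilers 123456 becomes -7616.)
int squeezeInto16Bits(long value) {
    return static_cast<std::int16_t>(value);
}
```

The wrapped value follows from the arithmetic: 123456 - 65536 = 57920, which is still above 32,767, and 57920 - 65536 = -7616.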
This restriction on maximum range might seem to be a problem. In C++, ‘qualifiers’ can be used to vary the range of fundamental data types. Qualifiers are only supplements to the basic data types; they are meant to be used together with a basic (or fundamental) data type (if the basic type is left out, int is assumed). The 4 qualifiers available in C++ are short, long, signed and unsigned.
Signed and unsigned integers were discussed in the first chapter. When an integer is specified as signed, the most significant bit of the number is automatically used as a sign bit (to denote the sign of the number). Hence it can be used if the programmer needs both positive and negative values for the variable. By default an integer is a signed integer, so a variable declared as a plain integer can already hold both positive and negative values. In other words,
int variable-name;
is the same as
signed int variable-name;
In the second form, ‘signed’ is the qualifier and it is used to explicitly state that the variable is a signed integer. For an unsigned integer the syntax will be:
unsigned int variable-name;
An unsigned integer can hold a value up to 65,535 (a signed integer can hold only up to 32,767). Of course, an unsigned integer cannot hold a negative value; its range is from 0 to 65,535. To go beyond 65,535 and make use of both positive and negative values as well, the qualifier long should be used.
long int variable-name;
Long integers occupy at least 4 bytes of memory (32 bits); 4 bytes is the size on the compilers discussed here, although some newer compilers use 8. Remember, long int actually means signed long int (you can give positive and negative values).
If you specify
unsigned long int variable-name;
you can only assign non-negative values (zero and positive values) to the variable. Thus, two qualifiers can be used together with a basic data type.
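The declarations discussed so far can be put together in a small sketch. The variable and function names are hypothetical and chosen only for illustration; `std::numeric_limits` from `<limits>` reports the limits of a type on the compiler being used.

```cpp
#include <limits>

// The qualifiers combine with int; these are all valid declarations.
signed int temperature = -5;          // same as: int temperature
unsigned int itemCount = 5;           // no negative values
long int balance = -100000L;          // larger range, still signed
unsigned long int population = 100000UL;

// The smallest value an unsigned int can hold is always 0.
unsigned int smallestUnsigned() {
    return std::numeric_limits<unsigned int>::min();
}

// An unsigned type reaches higher than its signed partner,
// because no bit is reserved for the sign.
bool unsignedReachesHigher() {
    return std::numeric_limits<unsigned int>::max() >
           static_cast<unsigned int>(std::numeric_limits<int>::max());
}
```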
What about the ‘short’ qualifier? A short integer is a signed integer that occupies two bytes. On a compiler where a normal integer also occupies two bytes, it has exactly the same range of positive and negative values as the normal integer.
int x;
is usually the same as
short int x;
Depending on the compiler (and the operating system it targets), ‘int’ is treated as either a ‘long int’ or a ‘short int’. VC++ (since it works in the Windows OS) allocates 4 bytes to an ‘int’ variable, the same size as its ‘long int’. Turbo C++ (which is a DOS based compiler) allocates 2 bytes, the same size as its ‘short int’. Thus the statement:
int var;
will allocate ‘var’ 4 bytes if you are using VC++ but the same statement will allocate 2 bytes if you are using Turbo C++ compiler.
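Programmers sometimes prefer to explicitly state what type of integer they want to use by making use of the ‘short’ and ‘long’ qualifiers. On both of the compilers mentioned above, a ‘short int’ occupies 2 bytes and a ‘long int’ occupies 4 bytes (irrespective of whether the OS is Windows or DOS). The C++ language itself only guarantees minimums: at least 16 bits for a ‘short int’ and at least 32 bits for a ‘long int’.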
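Since the sizes differ between compilers, it can help to ask the compiler directly. The following sketch uses the `sizeof` operator for this; the function names are hypothetical, and the numbers printed depend on the compiler you are using.

```cpp
#include <iostream>

// Prints the size in bytes of each integer type on the current compiler.
void printIntegerSizes() {
    std::cout << "short int: " << sizeof(short int) << " bytes\n";
    std::cout << "int:       " << sizeof(int) << " bytes\n";
    std::cout << "long int:  " << sizeof(long int) << " bytes\n";
}

// The C++ language guarantees this ordering of the sizes.
bool sizesAreOrdered() {
    return sizeof(short int) <= sizeof(int) && sizeof(int) <= sizeof(long int);
}
```

On a 16-bit DOS compiler one would expect 2, 2 and 4; on VC++ one would expect 2, 4 and 4.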
Two qualifiers can be used together, but do not try using:
short long int variable-name;
This will cause a compile-time error, so be careful with the qualifiers you use. And remember that a plain int is signed by default; whether it has the size of a short or a long integer depends on the compiler.
Floating type data include integers as well as numbers with a decimal point. It can also have an exponent. An exponent means 10 raised to the power of some integer value (whole number). For example, 20000 = 2 x 10^4 = 2e4 = 2E4.
If you specify decimal numbers, the floating point data type holds about 6 to 7 significant digits, and by default values are displayed rounded to 6 significant digits. Suppose 0.1234567 is assigned to a floating-point variable; the value displayed would be 0.123457 (it is rounded at the sixth digit after the decimal point). Valid floating-point numbers are, for example, 3.14, -0.5 and 2e4.
Do not use an exponent with a decimal point. For example, 2e2.2 is an invalid floating-point number because the exponent has to be an integer. Floating point numbers use 4 bytes of memory and have a much greater range than integers because of the use of exponents. They can have values up to about 10^38 (in the positive and negative directions). Unlike integers, floating point numbers do not take the signed, unsigned and short qualifiers; only long can be applied, and only to the double type (giving long double). To declare a floating variable, the syntax is:
float variable-name;
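The rounding to six digits can be observed with a small sketch. The helper `defaultFormat` is hypothetical; it writes a float into a string stream, which formats numbers the same way `std::cout` does when the precision has not been changed.

```cpp
#include <sstream>
#include <string>

// Formats a float the way std::cout does by default
// (6 significant digits unless the precision is changed).
std::string defaultFormat(float value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

For example, `defaultFormat(0.1234567f)` gives the text 0.123457, and `defaultFormat(2e4f)` gives 20000.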
The double data type is similar to the floating-point data type, but it has an even greater range, extending up to 10^308, and a higher precision. The syntax to declare a variable of type double is:
double variable-name;
Beware: in C++ a number written with a decimal point, such as 31.54, is treated as a ‘double’ by default and not as a ‘float’. Suppose we type:
float x=31.54;
you will get a warning message in Visual C++ (VC++) saying that a ‘double’ (i.e. 31.54) is being converted into a ‘float’. It is just to warn you that you are using a ‘float’ and not a ‘double’. (Even with the warning, there won’t be any problem in running your program.) Writing the value with an f suffix, as in 31.54f, makes it a ‘float’ and avoids the warning.
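A minimal sketch of the difference between the two kinds of literal is given below. It assumes a compiler that supports C++11 (for `decltype` and `static_assert`); the variable names are only for illustration.

```cpp
#include <type_traits>

float x = 31.54f;   // the f suffix makes the literal a float
double y = 31.54;   // a literal without a suffix is a double

// Compile-time checks of the literal types.
static_assert(std::is_same<decltype(31.54f), float>::value, "31.54f is a float");
static_assert(std::is_same<decltype(31.54), double>::value, "31.54 is a double");
```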
A character uses just one byte of memory. It can store any single character present on the keyboard (letters, digits and symbols). When it holds a digit, it can hold only one digit from 0 to 9. Valid characters are, for example, ‘a’, ‘Z’, ‘5’ and ‘$’.
If the number 13 is entered as the value for a character, the program will only store 1 (i.e. it will store the first character that it encounters and leave out the rest). A character is stored in one byte (as a binary number). Thus whatever the user enters is converted into a binary number, using some character set to perform this conversion. Almost all computers make use of ASCII (American Standard Code for Information Interchange). For example, according to the ASCII coding, the letter ‘A’ has a decimal value of 65 and the letter ‘a’ has a value of 97.
There is another form of coding called EBCDIC (Extended Binary Coded Decimal Interchange Code), which was developed by IBM and used in IBM computers. However, ASCII remains the most widely used code. The table at the end of the book gives a listing of the ASCII values and their equivalent characters. The syntax to declare a variable which can hold a character is:
char variable-name;
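The link between a character and its numeric code can be seen with a short sketch. The function names `codeOf` and `charOf` are hypothetical, and the values in the comments assume an ASCII-based system.

```cpp
// Returns the numeric code stored for a character
// (65 for 'A' and 97 for 'a' on ASCII-based systems).
int codeOf(char c) {
    return static_cast<int>(c);
}

// Returns the character stored for a numeric code.
char charOf(int code) {
    return static_cast<char>(code);
}
```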
The bool data type will only accept two values: true or false. In C++, ‘true’ and ‘false’ are keywords. A value of true corresponds to 1 (any non-zero value is treated as true) and a value of false corresponds to 0.
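The correspondence between true/false and 1/0 can be sketched as follows. The function names are hypothetical; they only wrap the conversions that C++ performs between `bool` and `int`.

```cpp
// true converts to 1 and false converts to 0.
int asNumber(bool flag) {
    return flag;
}

// Any non-zero integer converts to true; only 0 converts to false.
bool asBool(int value) {
    return static_cast<bool>(value);
}
```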
Remember: The sizes of data types given above are a general case. Data type sizes depend on the compiler and the operating system, so they may vary from system to system.
| Name | Bytes (Compiler dependent) | Description | Range (depends on number of bytes) |
|---|---|---|---|
| char | 1 | Character or 8 bit integer. | signed: -128 to 127 |
| bool | 1 | Boolean type. It takes only two values. | true or false |
| short | 2 | 16 bit integer. | signed: -32768 to 32767 |
| long | 4 | 32 bit integer. | signed: -2147483648 to 2147483647 |
| int | 2/4 | Integer. Length depends on the size of a ‘word’ used by the Operating system. In MSDOS a word is 2 bytes and so an integer is also 2 bytes. | Depends on whether 2/4 bytes. |
| float | 4 | Floating point number. | 3.4e±38 (7 digits) |
| double | 8 | Double precision floating point number. | 1.7e±308 |
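Remember: On the compilers discussed here, a ‘short int’ occupies 2 bytes and a ‘long int’ occupies 4 bytes (irrespective of whether the OS is Windows or DOS); the language itself only guarantees at least 16 bits and at least 32 bits respectively.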
Determining the range:
You might be wondering why the ranges are from -128 to 127 or from -32768 to 32767. Why not from –128 to 128? We’ll take the case of signed characters to understand the range. A character occupies one byte (8 bits). In a signed character the MSB (i.e. the 7th bit, counting from 0) is used to denote the sign; if the MSB is 1 then the number is negative, else it is positive. The binary number 0111 1111 represents +127 in decimal value (we’ve seen conversion from binary to decimal numbers in the first chapter). The number 0000 0000 represents 0 in decimal. But what about 1000 0000? Since the MSB denotes a negative number, does this stand for –0? Since there is no point in having a –0 and a +0, computers will take 1000 0000 as –128. 0000 0000 is taken to be zero. Thus on the negative side we have a least value of –128, while on the positive side we can have a maximum of only +127.
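The same reasoning gives the range for any number of bits, which can be written as a small sketch. The function names `minForBits` and `maxForBits` are hypothetical, and the sketch assumes `bits` is smaller than the number of bits in a `long`.

```cpp
// Smallest value of a signed type that is `bits` wide: -(2^(bits-1)).
long minForBits(int bits) {
    return -(1L << (bits - 1));
}

// Largest value: 2^(bits-1) - 1, because one of the patterns with a
// 0 in the sign bit is used up by zero itself.
long maxForBits(int bits) {
    return (1L << (bits - 1)) - 1;
}
```

With 8 bits this gives -128 and 127; with 16 bits it gives -32768 and 32767, matching the table.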
Everything in the computer is stored in binary format. Even if we ask the computer to store an octal number or a hexadecimal number in a variable, it will still be stored in binary format (we’ll see this later) because computers only understand 0s and 1s. How does a computer perform calculations? Yes, it has to do it in binary format. Let’s take an example of binary addition:
  0 1 1 0 1 0 1 0    (106 in decimal)
+ 0 0 1 1 0 1 0 0    (52 in decimal)
-----------------------------------------------------------------
  1 0 0 1 1 1 1 0
First of all, what 2 decimal numbers are we adding? Convert the binary numbers to decimal and you’ll get 106 and 52. When the computer has to add 106 and 52, it would in effect be adding the 2 binary numbers as shown above. Binary arithmetic is simple:
0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
1 + 1 = 0 and a carry of 1
Use this and try out the addition of 106 and 52. Voila! You’ll get the answer of 158.
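The result can be double-checked with a short sketch that lets the computer show the sum in binary. The function name `addAsBinary` is hypothetical; `std::bitset` from the standard library is used only to display a number as 0s and 1s.

```cpp
#include <bitset>
#include <string>

// Adds two 8-bit values and returns the sum as a string of 0s and 1s.
std::string addAsBinary(unsigned a, unsigned b) {
    std::bitset<8> sum((a + b) & 0xFFu);  // keep only the lowest 8 bits
    return sum.to_string();
}
```

Here `addAsBinary(106, 52)` gives 10011110, the same bit pattern obtained by hand.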
Proceeding further, we need to investigate as to how negative numbers are really stored in computers. We discussed earlier that if +10 is represented as 0000 1010 then -10 would be represented as 1000 1010 (since the MSB is considered as the sign bit). Right? Let’s check it out.
If we add -10 to 10 then we should get the answer as zero.
0 0 0 0 1 0 1 0
1 0 0 0 1 0 1 0
-----------------------------------------------------------------
1 0 0 1 0 1 0 0
And the answer is? -20.
So, where did we go wrong? One question you might have is why we added the sign bit as well. The computer isn’t smart enough (or rather it doesn’t have the circuitry) to separate the sign bit from the rest of the number. The 2 numbers that you want to add are fed as input to an adder circuit, which will blindly add up both the numbers and give the result. To overcome this problem, we can make use of 2’s complement.
For example: read as a sign bit followed by a magnitude, 1000 0001 stands for –1, but if this number is stored in 2’s complement format then it stands for –127. Similarly 1000 0010 would appear to be –2 but is actually –126. To understand 2’s complement you should know about 1’s complement. Let us say we have a signed binary number:
1 0 0 0 0 0 0 1
The 1’s complement of this number is
1 1 1 1 1 1 1 0
To find the 1’s complement of a binary number just change the 1s to 0s and 0s to 1s but do not change the sign bit.
Next, to find the 2’s complement just add 1 to the 1’s complement.
1 1 1 1 1 1 1 0
1
-----------------------------------------------------------------
1 1 1 1 1 1 1 1
Hence the 2’s complement of 1000 0001 (–1) is 1111 1111 (which, read as a sign bit followed by a magnitude, looks like –127). In the same way the 2’s complement of 1111 1111 (–127) is 1000 0001 (which looks like –1). Thus the computer, instead of storing 1111 1111 (for –127), will store the 2’s complement of the number (i.e. it will store 1000 0001). Why? Let’s go back to our initial problem of adding +10 and -10. Now let’s assume that the computer stores -10 in 2’s complement format, i.e. as 1111 0110 instead of 1000 1010. The addition will now be:
0 0 0 0 1 0 1 0 (+10)
1 1 1 1 0 1 1 0 (-10)
-----------------------------------------------------------------
0 0 0 0 0 0 0 0
Voila! The answer is 0. But you may ask ‘what about the carry of 1?’. Well, since it is an extra bit, it overflows (which means it is lost and we needn’t worry about it). I’m not going to get into more details of 2’s complement since this should be sufficient for learning C++. To learn more on this subject, you can check out books on digital systems or computer system architecture.
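The whole exercise can be repeated in code with a minimal sketch. It assumes the fixed-width type `std::uint8_t` from `<cstdint>` is available to model an 8-bit register, and the function names are hypothetical. Note that it uses the more common way of stating the rule: start from the positive value, invert every bit and add 1. This produces the same bit pattern as the procedure described above.

```cpp
#include <cstdint>

// Two's complement of an 8-bit value: invert every bit, then add 1.
std::uint8_t negate8(std::uint8_t value) {
    return static_cast<std::uint8_t>(~value + 1u);
}

// Adds two 8-bit patterns the way an 8-bit adder would:
// a carry out of the top bit is simply lost.
std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}
```

Here `negate8(0x0A)` gives 0xF6 (1111 0110), and `add8(0x0A, 0xF6)` gives 0. By contrast, `add8(0x0A, 0x8A)`, which uses the sign-bit form of -10, gives 0x94 (1001 0100), the wrong answer seen earlier.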
Note: When you perform binary addition, 1 + 1 is equal to 0 and a carry of 1 (because in the binary system we only have 2 states: 1 and 0; so we can’t have 1 + 1 = 2 in binary).
Remember: When converting a number to its 2’s complement its sign bit is unchanged. Numbers are stored in 2’s complement format only if they are negative. Positive numbers are stored in the normal format.
Copyright © 2004 Sethu Subramanian All rights reserved.