Files and Streams (sstutor)

Streams and Files - III

The following topics are covered in this section:

Writing structures to files
Padding/packing of structures
Detecting end of file

Writing a structure to a file

The read ( ) and write ( ) functions read and write blocks of binary data. They are particularly useful for writing structures to files and reading them back.
The two functions have very similar syntax, though the arguments can look a bit hard to understand at first. The syntax is as follows:

objectname.read(char * buf, int n);
objectname.write(const char * buf, int n);

The syntax might appear a bit weird, so let's go through it piece by piece. First of all, read ( ) and write ( ) are member functions, so they have to be called on a stream object (read ( ) on an input stream, write ( ) on an output stream). You can't call them without an object.

The read ( ) function reads ‘n’ characters (or ‘n’ bytes) from the invoking stream and puts them in the buffer pointed to by ‘buf’. The write ( ) function writes ‘n’ characters (or ‘n’ bytes) from the buffer (buf) to the invoking stream. ‘n’ is an integer that gives the size in bytes.

(char *) buffer : This tells the function the starting memory location. For write ( ), it is where the data to be copied into the file begins; for read ( ), it is where the data coming from the file is stored. The function needs the first argument to be a pointer to a character, and that’s why we use casting (casting is discussed in a later chapter). Basically, whatever you pass as the first argument to read ( ) or write ( ) has to be cast so that it looks like a pointer to a character.

int n : Here we specify the number of bytes we want to write or read. Usually we use the sizeof operator to determine this, because the programmer can’t be expected to remember the size of every type. (In standard C++ the type of this parameter is streamsize, a signed integer type.)

Beware: The read and write( ) functions write and read binary data (not text format).

Remember: These two functions are very useful in reading/writing an entire array in binary format to a file.

// Program to create a file, write a structure to it and then display the contents of the file

#include <iostream>
#include <fstream>
#include <iomanip>               // for setw
using namespace std;

struct email
{
    char name[20];
    char id[20];
};

int main ( )
{
    email user;
    email check;                                   // user, check are structure variables.
    cout<<"Enter a name: ";
    cin>>setw(20)>>user.name;                      // setw(20) keeps the input within the 20-character array
    cout<<"Enter the email address : ";
    cin>>setw(20)>>user.id;                        // get values for elements of user
    ofstream out("c:/email.txt", ios::out | ios::binary);      // open the file email.txt for writing
    out.write( (char *) &user, sizeof (struct email) );
    out.close( );

    cout<<endl<<"Contents of file are : ";

    ifstream in ("c:/email.txt", ios::in | ios::binary);       // open the same file for reading
    in.read((char *) &check, sizeof(struct email));            // read the structure
    cout<<endl<<check.name;
    cout<<endl<<check.id;
    in.close( );
    return 0;
}

The output is:

Enter a name: ajay
Enter the email address : ajay@yahoo.com
Contents of file are :
ajay
ajay@yahoo.com

Up to the call to write ( ), the program is ordinary. We open a stream called ‘out’ for writing the structure to the file.

out.write((char *) &user, sizeof (struct email));

Compare the above line with the syntax of the write ( ) function. You'll notice that instead of ‘buf’ we've used ‘&user’. This is because we want to save the structure variable ‘user’ into the file. So we point to the address of ‘user’. Then instead of ‘n’ we've used:

sizeof (struct email)

In the general syntax, ‘n’ refers to the number of bytes you want to write. In this case we want to write as many bytes as the structure will occupy. Instead of specifying some fixed number, it's better to use the ‘sizeof’ operator to find the number of bytes. The name of the structure is ‘email’. So by saying

sizeof (struct email)

the compiler will find out how many bytes the structure ‘email’ occupies. This value is the same as sizeof (user) (since ‘user’ is a variable of structure type ‘email’).

Next we create a stream to read the contents of the file email.txt (just to see whether the structure was saved in the file).

in.read((char *) &check, sizeof(struct email));

The syntax is the same as that of the write ( ) function, except that we've made use of another structure variable, namely ‘check’. This is also a variable of structure type ‘email’, except that we haven't obtained any values for check.name and check.id.

Now, we read ‘sizeof (struct email)’ bytes, store them in the structure variable ‘check’ and then print the values of check.name and check.id. Instead of using ‘check’, you could also reuse the structure variable ‘user’. A different structure variable is used for the two purposes to demonstrate clearly that we are really writing to and reading from the file.

Remember: It is always a good idea to store structures in binary-format rather than text-format (since they contain a mixture of data types and you wouldn’t want conversion of data).

Padding/Packing of Structures

What do you think is the space occupied by the structure variable ‘t’ below:

struct test
{
long int num; // long int is 4 bytes (on the compilers discussed here; some 64-bit systems use 8)
char a; //char is 1 byte
}t;

The members add up to 5 bytes, but the compiler might pad the structure and make its size 8 bytes. Why? Most compilers place each member at an address that is a multiple of its alignment (for simple types, usually its size) and round the total size of the structure up to a multiple of the largest member alignment. In the above structure ‘test’ the compiler notes that ‘long int’ is the largest member (4 bytes) and so it allocates memory space in increments of 4 bytes. Thus if ‘num’ occupies memory from byte number 1 to byte 4, then ‘char a’ occupies byte 5 and bytes 6 to 8 are unused padding (a ‘char’ requires only 1 byte). In this case we say that the packing is 4 bytes (because the structure is laid out in units of 4 bytes). Why does the compiler do packing?

Consider the structure:

struct test
{
short int s;         // 2 bytes
char c;             // 1 byte
double d;         // 8 bytes
long int i;         // 4 bytes
};

The structure ‘test’ will now have packing of 8 bytes (because a ‘double’ occupies 8 bytes). Let us assume that the compiler allocates memory starting from byte 0.

The allocation will be: [2+1+(5)] + [8] + [4 + (4)] = 24 bytes

The number within ( ) denotes the number of extra bytes added for the sake of uniformity (the padded bytes).

‘s’ and ‘c’ (both combined occupy less than 8 bytes) are put together in one field. Since ‘d’ cannot be accommodated within the same 8-byte field, five bytes are padded to the first field. ‘d’ will occupy the next 8-byte field. No padding of bytes is required here. The last field is occupied by the integer ‘i’ and it occupies only 4 bytes. So another 4 bytes are padded here to make this field 8 bytes long.

Thus the starting location of each member (in terms of bytes) when 8-byte packing is used is: 0, 2, 8, 16 (a total memory space of 24 bytes is needed).

If we didn’t use padding (i.e. if the packing were 1 byte), the positions would be: 0, 2, 3, 11 (a total of 15 bytes)

If we use packing of 4 bytes, the positions will be: 0, 2, 4, 12 (a total of 16 bytes)

As you might have noticed, when we use packing, members are placed at multiples of their own size, up to the packing value (so with 8-byte packing ‘d’ starts at byte 8 and ‘i’ at byte 16), and such an alignment is faster to access. A 32-bit processor, for instance, can access members at 32-bit boundaries faster. If there is no packing, the members sit at varying positions (i.e. they are not uniformly placed in memory). Though padding occupies more space, it can improve the speed of the program. To take advantage of this fact, some compilers pad structures by default.

There is no need to worry about padding if you are developing software entirely with a single compiler. The problem arises when you write two programs (one for writing structures and another for reading the file) with two different compilers. If both compilers use the same packing, there won’t be a problem. But, for example, Turbo C++ does not use padding while VC++ does. So, if you write structures to a file from a VC++ program, the padding bytes are stored in the file along with the data. A TC++ program reading that file does not expect the padding bytes, so when you use the read ( ) function you will get strange results. An example to highlight this problem is given below.

//Program to write structures to a file using a compiler which implements padding
#include <iostream>
#include <fstream>
using namespace std;

struct test
{
    char a,b,c;
    long int num;
}t; //by default each member is packed to 4 bytes.

int main( )
{
    t.a='a';
    t.b='b';
    t.c='c';
    t.num=456789;
    ofstream str("c:\\text.txt", ios::out | ios::binary); //binary mode, so the bytes are written unchanged
    str.write((char *)&t, sizeof(t)); //it will write a structure of 8 bytes in the file
    str.close( );
    return 0;
}

This program writes the structure to a file (and it uses padding). Let us suppose that we write another program to read this file in a compiler which doesn’t support padding.

//Program to read structures using a compiler which does not know padding

#include <iostream>
#include <fstream>
using namespace std;

struct test
{
    char a,b,c;
    long int num;
}t; //compiler assumes ‘t’ as occupying 7 bytes

int main( )
{
    ifstream str("c:\\text.txt", ios::in | ios::binary); //binary mode, so the bytes are read unchanged
    str.read((char *)&t, sizeof(t)); //it reads only 7 bytes.
    cout<<t.a<<t.b<<t.c<<t.num;
    str.close( );
    return 0;
}

The output will be:

abc116937984

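The values actually written to the file were ‘a’, ‘b’, ‘c’ and 456789 (along with one padding byte). Because the reading program does not take the padding into account, it reads the wrong bytes for ‘num’: the padding byte is taken as the first byte of the number and the last byte of the number is never read.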

To avoid such situations many compilers provide a directive called ‘#pragma pack( )’ (it is a compiler extension, not part of standard C++). Using this directive we can specify the packing, in bytes, that we want the compiler to use. The syntax is:

#pragma pack(n)

where ‘n’ represents the number of bytes for packing.

For example:

#pragma pack(8)

will align structure members on boundaries of up to 8 bytes while

#pragma pack(1)

is the same as using no padding.

Remember: Compilers that do not use padding (like TC++) will not support the #pragma pack ( ) directive. These compilers always use a packing of 1 and this cannot be changed.

Thus if programmers feel that they will be working across different compilers, they prefer to specify the directive:

#pragma pack(1)

which instructs the compiler to treat the members just as they are (no padding). This directive has to appear before the definition of the structures it should affect, so it is usually placed along with the #include directives (outside the main ( ) function).

Thus:

#pragma pack(1)
struct test
{
short int s;         // 2 bytes
char c;             // 1 byte
double d;         // 8 bytes
long int i;         // 4 bytes
};

will cause any variable of type ‘test’ to occupy only 15 bytes.

Remember: Padding is a problem mainly when you are working with storing data in files and when you are using two compilers.

End of File

There exists a member function that you can use to identify whether the end of file (EOF) has been reached or not.

bool eof( ) const;

This function returns true once a read operation has tried to go past the end of the file.

You could test for EOF using something similar to the following:

while( !in.eof( ) )
{
//body of while loop
}

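where ‘in’ is an input stream. Keep in mind that eof ( ) becomes true only after a read has already failed at the end of the file, so inside the loop you should check that each read succeeded before using the data.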
