Streams and Files - V


The following topics are covered in this section:

Random Access of Files (File Pointers)
Sequential and Random Access Files
Command Line Arguments


Random Access of Files (File Pointers)

Using file streams, we can access binary files randomly. Random access means that you can go to any position in the file (instead of reading sequentially from the first character to the last). Earlier in this chapter we discussed a bookmarker that keeps moving as you read a file. This bookmarker normally moves sequentially, but you can also move it to any position using some functions. Technically this bookmarker is a file pointer and it determines where the next character will be written (or from where the next character will be read).

We have seen that file streams can be created for input (ifstream) or for output (ofstream). For ifstream the pointer is called the ‘get’ pointer and for ofstream it is called the ‘put’ pointer. fstream can perform both input and output operations and hence it has one ‘get’ pointer and one ‘put’ pointer. The ‘get’ pointer indicates the byte number in the file from where the next input will occur. The ‘put’ pointer indicates the byte number in the file where the next output will be made. Byte numbers are counted from 0. There are two functions that let you move these pointers to wherever you want in a file:

seekg( ) - moves the ‘get’ pointer (available on input streams such as ifstream)
seekp( ) - moves the ‘put’ pointer (available on output streams such as ofstream)

We’ll write a program to copy the string "Hi this is a test file" into a file called mydoc.txt. Then we’ll attempt to read the file starting from byte position 8 (using the seekg( ) function). Since positions are counted from 0, this is the 9th character in the file.

Strings are character arrays terminated by a null character (‘\0’). If you want to copy a string of text into a character array, you should make use of the function:

strcpy (character-array, text);

to copy the text into the character array (even blank spaces will be copied into the character array). To make use of this function you might need to include the string.h header file.

#include <iostream>
#include <fstream>
#include <string.h>
using namespace std;

int main( )
{
    // Write the test string to the file
    ofstream out("c:/mydoc.txt",ios::binary);
    char text[80];
    strcpy(text,"Hi this is a test file");
    out<<text;
    out.close( );

    // Reopen the file for reading and jump to byte position 8
    ifstream in("c:/mydoc.txt",ios::binary);
    in.seekg(8);
    cout<<endl<<"Starting from position 8 the contents are:"<<endl;

    while ( !in.eof( ) )
    {
        char ch;
        in.get(ch);
        if ( !in.eof( ) )
        {
            cout<<ch;
        }
    }

    in.close( );
    return 0;
}

The output is:

Starting from position 8 the contents are:
is a test file

As you can see, the output doesn’t display "Hi this " because those are the first 8 characters present in the file (byte positions 0 to 7). We’ve asked the program to display from position 8 onwards using the seekg( ) function.

in.seekg(8);

will move the bookmarker to byte position 8 in the file. So when you read the file, you will start reading from position 8 onwards.

The following fragment of code is interesting:

while ( !in.eof( ) )
{
char ch;
in.get(ch);
if ( !in.eof( ) )
    {
    cout<<ch;
    }
}

You might be wondering why we need to check for EOF again using an ‘if’ statement. To understand the reason, try the program after removing the ‘if’ statement. The result will be surprising and interesting. Think it over and you will be able to figure out the logic.
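A hint, in case you get stuck: eof( ) becomes true only after a read has already failed, so without the inner ‘if’ the loop body runs one extra time. A common way to avoid the double check is to use the read itself as the loop condition. The following is only a sketch of that idea; the function name read_from is made up for this illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns everything in the file from byte
// position 'start' onwards.
std::string read_from(const char* filename, long start)
{
    std::ifstream in(filename, std::ios::binary);
    std::string result;
    if (!in)
        return result;          // the file could not be opened

    in.seekg(start);            // move the 'get' pointer
    char ch;
    while (in.get(ch))          // the condition is false once get( ) fails
    {
        result += ch;           // runs only when a character was really read
    }
    return result;
}
```

Because the loop body runs only after a successful get( ), there is no need for a second EOF check inside the loop.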

The syntax for seekg( ) or seekp( ) is:

seekg(position, ios::beg)
seekg(position, ios::cur)
seekg(position, ios::end)

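If you leave out the second argument (i.e. you don’t specify ‘beg’, ‘cur’ or ‘end’), the position is measured from the beginning of the file, just as with ios::beg. With ios::cur the position is measured from the current location of the bookmarker, and with ios::end it is measured from the end of the file (so you will normally use a negative value with ios::end).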
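To see the three forms side by side, here is a small sketch that reads one character after each kind of seek. The function name sample_positions and the offsets 3, 2 and -4 are arbitrary choices made for this illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads one character from three different places
// in the file, one for each seek direction.
std::string sample_positions(const char* filename)
{
    std::ifstream in(filename, std::ios::binary);
    std::string result;
    char ch;
    if (!in)
        return result;

    in.seekg(3, std::ios::beg);     // 3 bytes after the beginning
    if (in.get(ch)) result += ch;

    in.seekg(2, std::ios::cur);     // 2 bytes further on from where we are now
    if (in.get(ch)) result += ch;

    in.seekg(-4, std::ios::end);    // 4 bytes before the end of the file
    if (in.get(ch)) result += ch;

    return result;
}
```

If the file contains "Hi this is a test file", the three characters read are ‘t’ (position 3), ‘s’ (position 6, because the first get( ) moved the bookmarker to position 4) and ‘f’ (position 18, since the file is 22 bytes long).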

Just as we have two functions to move the bookmarker to different places in the file, we have two more functions that return the present position of the bookmarker in the file: tellg( ) for the ‘get’ pointer and tellp( ) for the ‘put’ pointer.

You would think that the values returned by tellg( ) and tellp( ) are integers. They behave like integers but they aren’t. The actual syntax for these functions is:

streampos tellg ( );

where streampos is a type defined by the standard library (it is actually a typedef). It is not a built-in integer type, but it can be converted to an integer.

Of course you can say:

int position = tellg ( );

Now, the variable ‘position’ will have the location of the bookmarker. But you can also say:

streampos position = tellg( );

This will also give the same result. ‘streampos’ is defined by the standard library specifically for stream positions, and it is the safer choice for very large files.

Similarly, the syntax of seekg ( ) and seekp ( ) was mentioned as:

seekg(position, ios::beg)

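Here too ‘position’ is not a plain integer. The one-argument form seekg(position) takes a ‘streampos’, while the two-argument form shown above takes an offset of the closely related type ‘streamoff’.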
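A common use of seekg( ) and tellg( ) together is to find the size of a file: move the bookmarker to the end and ask where it is. This is a minimal sketch; the function name file_size and the return value of -1 for a file that cannot be opened are assumptions made for this example.

```cpp
#include <fstream>

// Hypothetical helper: returns the size of a file in bytes, or -1 if
// the file cannot be opened.
long file_size(const char* filename)
{
    std::ifstream in(filename, std::ios::binary);
    if (!in)
        return -1;

    in.seekg(0, std::ios::end);         // move the 'get' pointer to the end
    std::streampos end = in.tellg();    // its position is the number of bytes
    return static_cast<long>(end);      // convert the streampos to an integer
}
```

For the mydoc.txt file created earlier this would return 22, since "Hi this is a test file" is 22 characters long.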


Sequential and Random Access Files

Basically variables are used for temporary storage and files are used for permanent storage of data. Based on how files are accessed, they can be divided into sequential and random access files. Actually this division depends only on how we read from and write to the files (physically, every file is stored as a sequence of bytes on the disk).

All data is represented in the form of bits. A set of 8 bits (or a byte) can be used to represent one character. A set of related bytes will form a ‘field’. A set of related fields will form a record and a file consists of a set of related records. Let us suppose that a University maintains a database consisting of its students’ details.

Fields and records are the terms used when we deal with files. Records are equivalent to ‘structures’ or ‘objects’ in C++.

Usually when storing such data (like a student record as shown above), the programmer will use one unique ‘key field’. A ‘key field’ is the field which can be used to identify or locate a particular record in the file. For example in the above diagram, the student ID number will be the key field (thus if we want to access the details about the student Ajay then we can just refer to student ID number 1). The key field should be something unique (i.e. no two records should have the same key field).

Usually we write and read records from a file. Sequential access files are the simplest way of organizing a file. In sequential access files we write the variables continuously one after the other. The length of each record isn’t fixed and can vary (i.e. each record needn’t occupy the same amount of space). The advantage of this is that we do not waste any disk space. The disadvantage is that if you want to access the 3rd record stored in the file, you will have to read the first two records before accessing the third (i.e. you cannot directly jump to the third record). This is because in sequential access files the record length is not fixed and you cannot predict where the third record is stored.

This leads to a few other problems. It is not possible to directly insert new data in the middle of the file. If a new record has to be inserted, the old records have to be copied into a new file (up to the point where you want to insert), then the new entry has to be added to the new file, and then the remaining records from the old file have to be copied to the new one. You also cannot directly update/modify a record in sequential access files. Let us suppose that we have a disk file containing the data:

This file has been stored sequentially and suppose the name Ajay needs to be changed to Williams. If you attempt to overwrite the existing record, the result will be:

1 Williams2 Suresh 88

Because data is stored continuously in a sequential file, if the modified entry you make is longer in length than the existing entry then the neighbouring field will get overwritten.
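The effect can be reproduced with a short sketch. It assumes the file originally held the text "1 Ajay 75 2 Suresh 88" (the marks 75 for Ajay are made up for this illustration) and overwrites the name starting at byte position 2. The function name overwrite_at is also hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: overwrites bytes in an existing file starting at
// 'offset' and returns the new contents of the whole file.
std::string overwrite_at(const char* filename, long offset, const std::string& text)
{
    std::fstream file(filename, std::ios::in | std::ios::out | std::ios::binary);
    if (!file)
        return "";

    file.seekp(offset);                 // move the 'put' pointer
    // The new bytes replace the old ones; nothing is shifted to the right.
    file.write(text.data(), static_cast<std::streamsize>(text.size()));

    file.seekg(0);                      // go back to the start to read
    std::string contents;
    char ch;
    while (file.get(ch))
        contents += ch;
    return contents;
}
```

With the assumed contents, overwrite_at(filename, 2, "Williams") replaces the 8 bytes "Ajay 75 " and the file then reads "1 Williams2 Suresh 88": the marks and the space that separated the two records are gone.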

Random access files overcome this problem since they have fixed length records. The drawback is that even a small record still occupies the entire fixed record length, which wastes some disk space. For example, if we are using 10 bytes to store a complete sentence in the file, then even a single letter (like ‘a’) will use up 10 bytes. But even though some space is wasted, this method speeds up access because we know where each record is stored. If the record length is fixed at 10 bytes, the records start at byte numbers 0, 10, 20 and so on, so the fifth record starts at byte number 40 and we can jump directly to that location instead of reading the first four records.

Word processing programs usually store files in a sequential format while database management programs store files in a random access format. A simple real life analogy: Audio tapes are accessed sequentially while audio CDs (Compact Discs) are accessed randomly.

So, how do we create sequential and random access files in C++? Actually we have already covered both these topics without explicitly using the terms sequential and random access. Whenever you make use of the ‘read’ and ‘write’ functions to write structures/objects to a file, you are actually creating a random access file (because every record will have the size of the structure). Whenever you use the << and >> operators to read and write disk files, you are accessing the file sequentially (this was the first example program). Whenever you write to a stream (or a file) using the << operator, you are writing varying length records to the file. For example: You might first write a string of 10 characters followed by an integer. Then you may write another string of 20 characters followed by a ‘double’. Thus the records are all of varying lengths.
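The difference in record length is easy to measure. The sketch below writes the same integer once with << and once with write( ) and counts the bytes produced. It uses an in-memory ostringstream instead of a disk file to keep the example short (an ofstream behaves the same way here); the function names are made up for this illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Number of bytes produced when 'value' is written with the << operator.
std::size_t bytes_as_text(int value)
{
    std::ostringstream out;
    out << value;                   // formatted output: one character per digit
    return out.str().size();
}

// Number of bytes produced when 'value' is written with write( ).
std::size_t bytes_as_binary(int value)
{
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return out.str().size();        // always sizeof(int), whatever the value
}
```

Written with <<, the value 7 takes 1 byte and 1234567 takes 7 bytes, whereas write( ) uses sizeof(int) bytes for both. That fixed size is what makes it possible to calculate where a record starts.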

To use random access files effectively we make use of the seekp( ) and seekg( ) functions. Though these functions can be used on sequential access files, they are not very useful there (because when you are searching for a piece of data you are forced to read each and every character/byte, whereas in random access files you can jump to the particular record that you are interested in).
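Here is a sketch of how fixed length records and the seek functions work together. The Student structure, its field sizes and all the function names are assumptions made for this example; they are not taken from an earlier program. Records are numbered from 1, so record n starts at byte (n - 1) * sizeof(Student).

```cpp
#include <cstring>
#include <fstream>

// Hypothetical fixed length record: every Student occupies
// sizeof(Student) bytes in the file.
struct Student
{
    int  id;
    char name[20];
    int  marks;
};

// Builds a zero-filled record so that unused bytes have a known value.
Student make_student(int id, const char* name, int marks)
{
    Student s = Student();
    s.id = id;
    std::strncpy(s.name, name, sizeof(s.name) - 1);
    s.marks = marks;
    return s;
}

// Byte number at which record 'n' starts (records are numbered from 1).
long record_offset(int n)
{
    return static_cast<long>(n - 1) * static_cast<long>(sizeof(Student));
}

// Creates (or truncates) the file and writes 'count' records one after the other.
bool save_all(const char* filename, const Student* list, int count)
{
    std::ofstream out(filename, std::ios::binary);
    if (!out)
        return false;
    out.write(reinterpret_cast<const char*>(list),
              static_cast<std::streamsize>(count * sizeof(Student)));
    return !out.fail();
}

// Jumps straight to record 'n' and reads it, without touching the others.
bool read_record(const char* filename, int n, Student& s)
{
    std::ifstream in(filename, std::ios::binary);
    if (!in)
        return false;
    in.seekg(record_offset(n));
    in.read(reinterpret_cast<char*>(&s), sizeof(Student));
    return !in.fail();
}

// Overwrites record 'n' in place; the neighbouring records are not disturbed
// because the new record has exactly the same length as the old one.
bool write_record(const char* filename, int n, const Student& s)
{
    std::fstream file(filename, std::ios::in | std::ios::out | std::ios::binary);
    if (!file)
        return false;
    file.seekp(record_offset(n));
    file.write(reinterpret_cast<const char*>(&s), sizeof(Student));
    return !file.fail();
}
```

After saving three records with save_all( ), a call such as read_record(filename, 3, s) fetches the third record directly, and write_record(filename, 2, ...) can store a longer name in the second record without damaging the third.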


Command Line Arguments

You know that functions can have arguments. You also know that main ( ) is a function. In this section we'll take a look at how to pass arguments to the main function. Usually filenames are passed to the program.

First of all, let us suppose that we have a file by the name marks.cpp. From this file we make an exe file called marks.exe. This is an executable file and you can run it from the command prompt. The command prompt shows the drive and directory you are currently in. Under Windows it appears as the MS-DOS prompt:

C:\WINDOWS>

This denotes that you are currently in C drive and in the directory named Windows. (By the way, if you want to go to the MS DOS command prompt from Windows, just go to "Start" and click on "Run". Type "command" in the text box and click "Ok").

Your marks.exe program is in this directory (let us assume it is here. If it isn’t in this directory then you have to change to that particular directory). To run the program you will type:

C:\WINDOWS> marks name result

You might have expected that we would type only the name of the program. In this case the C++ program that you wrote is assumed to have arguments for the main( ) function (i.e. in the marks.cpp file you have provided arguments for the main( ) function):

int main (int argc, char * argv[ ] )

argc (the first argument - argument counter) stands for the number of arguments passed from the command line.
argv (argument vector) is an array of character pointers (char *), each of which points to one of the command line arguments.
In our example, the value of ‘argc’ is 3 (marks, name, result). Hence for ‘argv’ we have an array of 3 elements. They are:

argv[0] which is marks
argv[1] which is name
argv[2] which is result

Note: argv[0] will be the name that invokes the program (i.e. it is the name of the program that you have written).

If this section seems a little vague, don't worry. In the next section we'll take a look at a simple program.

A program using Command Line Arguments

// This file is named test.cpp

#include <iostream>
using namespace std;

int main ( int argc, char * argv[ ] )
{
    cout<<"The value of argument counter (argc) is: "<<argc;
    int i;
    // Print every argument on its own line, starting with the program name
    for ( i = 0 ; i<argc ; i ++ )
    {
        cout<<endl<<argv[i];
    }
    return 0;
}

Save the file as test.cpp. Compile it and then make the executable file (test.exe). If you run test.exe from Windows (i.e. by just double clicking on the file), the output will be as follows:

The value of argument counter (argc) is: 1
c:\windows\test.exe

This will be the output since you didn't specify any arguments. To specify the arguments you have to go to the DOS prompt. From there type:

c:\windows>test one two three

You have to go to the folder in which you have the test.exe file (I assume that your program is in the windows directory in C drive).

The output will be:

The value of argument counter (argc) is: 4
c:\windows\test.exe
one
two
three

There are numerous things you can do with this. You can pass the names of files upon which you want the C++ program to operate, or you could pass the name of a file that you want your program to create, etc. Depending on your application, you can make use of the arguments. For example: if you write a program for zipping files, then the arguments can be used to specify the files that you want to zip.
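As an illustration of passing a filename, here is a sketch that expects one argument (the name of a file) and reports its size. The function name report_file_size, its messages and its return codes are all assumptions made for this example.

```cpp
#include <fstream>
#include <iostream>

// Hypothetical helper that main( ) can hand its arguments to.
// Returns 0 on success, 1 if the arguments are wrong and 2 if the
// file cannot be opened.
int report_file_size(int argc, char* argv[])
{
    if (argc != 2)      // the program name plus exactly one filename
    {
        std::cout << "Usage: " << argv[0] << " filename" << std::endl;
        return 1;
    }

    std::ifstream in(argv[1], std::ios::binary);
    if (!in)
    {
        std::cout << "Cannot open " << argv[1] << std::endl;
        return 2;
    }

    in.seekg(0, std::ios::end);     // the position at the end is the size
    std::cout << argv[1] << " contains "
              << static_cast<long>(in.tellg()) << " bytes" << std::endl;
    return 0;
}
```

A main( ) function declared as int main (int argc, char * argv[ ] ) would simply contain the statement return report_file_size(argc, argv); and the program could then be run from the prompt with a filename typed after the program name.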


Recap




Copyright © 2004 Sethu Subramanian All rights reserved.