KRs 13-18: MVS, PDS compression, index scans and NMIS

You are here: Home > Knowledge Refreshers

KRC++

KR 13: MVS

In MVS we have roughly 3 types of files:

Ordinary files (called sequential files or physical sequential datasets

These are like text files. They basically consist of a group of records. These are stored in the physical storage device in the form of blocks. Each block will contain 1 or more records without any spacing in between 2 records. A file will contain 1 or more blocks with spacing between the blocks.

2. Libraries (called PDS- partitioned datasets)

A PDS is a group of ordinary files clubbed together for convenience (the ordinary files in a PDS are called members). Every library will have a directory which contains the names of all its members and also pointers to those members.

3. VSAM files

You might have heard a lot about catalogs.

Catalog is used to store the name and location of every dataset present on the computer. For the location, it will contain information about which volume (tape cartridge/ disk pack etc.) the dataset is stored in.

So when MVS has to search for a dataset it would go to the catalog, get the volume info and if it is a tape then it would check the label on the tape for that dataset. If the catalog points to a diskpack (DASD), MVS would go to that volume and check the VTOC (volume table of contents) of the DASD. Every diskpack has a VTOC (which is like the table of contents of a book – it contains the names of all datasets stored in it along with the track information). So MVS will get the track of the dataset from the VTOC and then access the dataset.

We have 2 types of catalogs in MVS:

System Catalog
User Catalog.

The system catalog will contain all info related to the operating system (MVS), procedure libraries, OS files etc.

KR 15: PDS compression

We've done compression a lot of times (if we find shortage of space we compress the PDS). But how is it that MVS creates space when it actually says that there is no space? In this edition we’ll take a look at some facts about PDS (also called library) – some of it was covered in an earlier edition:

Members of a PDS are arranged alphabetically.
Generally, all catalogued procedures and load modules are members of PDS.
Though a PDS appears like a Windows folder, the size of a PDS is constant. By adding more members we cannot increase the size of the PDS.
When a member of a PDS is deleted, only an entry is removed from the directory (every PDS has a directory which contains the names of all its members). So, MVS will only delete the entry for this member from the directory. The space occupied by the member is not available for use till we compress the PDS.
Similarly, when we replace a member, the replacing member will be placed at the first available position at the end of the PDS and the space of the replaced member will be unusable till the PDS is compressed.

Did you know that SMS stands for Storage Management Subsytem (more on this in later editions).

KR 16: Index scans

Well, we’ll get back to our topic of indexes. (Note: you’ll understand this edition better if you refresh yourself on the index structure – regd. root pages and leaf pages).

In types of indexed access we’ve already seen the direct look-up method (this is used by our optimizer when we specify the entire index key). It will just start from the root page and go down to the leaf page, retrieve the Record ID (RID) and then retrieve the correct row from the tablespace.

What happens when we don’t specify the entire index key?

Let’s say we have a table with a 3 column index (Project code, Activity code and date). This example table might seem a bit silly.

If we write an SQL which has only the project no. and activity no. then the optimizer cannot make use of the direct look-up method (because the full key is not present). In this case it does a matching index scan (MIS). In this DB2 will try to locate the index leaf page containing rows with the col.1 and column2 combination of our SQL. For this particular combination there could be many rows in the leaf page and DB2 will have to retrieve all such rows.

For ex.: SELECT * from PROJ_T where project_code=’A001’ and act_code=’CODING'

DB2 will start at the root index page, go down to the leaf page containing this combination and then retrieve all rows containing this combination (since this combination could have multiple dates).

For a matching index scan, the optimizer requires a starting point in the root page (i.e. you should specify the higher order column in the index key for the optimizer to use MIS). Why? Without this (in our example the higher-order key is the project_code) the optimizer won’t have a starting point in the root page.

What if we don’t use the higher order column in our SQL?

Then DB2 will go for a non-matching index scan. More on this in the next edition.

KR 17: NMIS

Continuing where we left off…..What if we don’t specify the higher order column of the index in our SQL?

DB2 will go for a non-matching index scan (NMIS) since it doesn’t have any starting point in the index structure (i.e. it can’t go from the root page to the leaf page). In NMIS, the optimizer will decide to scan thro the leaf pages one by one (note: this is part of the index structure. It is not scanning thro the tablespace).

You might wonder what’s the advantage of scanning thro index leaf pages instead of the tablespace itself. Before going into that there are a couple of points to note. There are 2 types of I/Os that are involved in DB2:

1.)     Index I/O (where DB2 has to retrieve index pages).

2.)     Table I/O (where DB2 will access the table to retrieve pages from the tablespace).

I/O operations in general consume time. So, any method which can reduce the number of I/Os is always preferred.

NMIS is sometimes useful:

1.)     If the data we want is part of the index then DB2 doesn’t need to even access the tablespace. For ex.: consider a table that has an index on the columns YEAR, MONTH, DAY. If our SQL were:

            SELECT YEAR from CALENDAR_T where MONTH='02' and DAY='29'

            then DB2 would use NMIS (since we haven’t specified YEAR in our where clause). Now DB2 will scan thro the index leaf pages. But when it finds the matching rows in the index pages, it doesn’t need to do table I/Os to retrieve the whole record (because all we’ve asked for is just the YEAR which is stored in the index itself). Scanning index pages is also faster than scanning thro tablespace pages because the index pages will be smaller than the table pages (this is called index-only access).

2.) If we have a clustered index then NMIS would result in lower number of tablespace I/Os.

Hmm….if you’re wondering what a clustered index means, we’ll take a look at that in our next edition.

Go back to the main contents page