
KR-9: Clarifications and SQL

 

There were some doubts about partitioned tablespaces and segmented tablespaces. This is discussed below: 

Clarification (this is a preliminary discussion – I'll add more on it after a little more research). 

Basically, a partitioned tablespace is used when your table has a very large number of records (millions of rows). In this case the entire tablespace will hold only rows belonging to this one huge table. The tablespace itself is divided into partitions. Let's say that we have a table containing info about all the customers within a state. If our applications are going to work on the data on a city-by-city basis, then each partition will contain records for a particular city within the state (i.e. partition 1 might have all records of customers in Pune while partition 2 may have all Mumbai records). The bottom line is that there must be some key column whose value decides which partition a record is stored in. 
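As a rough sketch, the partitions are laid out when the tablespace and its partitioning index are created. All the names below (CUSTDB, CUSTTS, CUSTIX and the CUSTOMER columns) are made up for illustration, and the syntax is the older index-controlled style: 

    -- Tablespace with 2 partitions
    CREATE TABLESPACE CUSTTS IN CUSTDB
        NUMPARTS 2;

    CREATE TABLE CUSTOMER
        (CUST_ID   INTEGER  NOT NULL,
         CUST_NAME CHAR(30) NOT NULL,
         CITY      CHAR(20) NOT NULL)
        IN CUSTDB.CUSTTS;

    -- Partitioning (clustered) index: each VALUES clause gives the
    -- highest key that partition will hold ('MUMBAI' sorts before 'PUNE')
    CREATE INDEX CUSTIX ON CUSTOMER (CITY)
        CLUSTER (PART 1 VALUES('MUMBAI'),
                 PART 2 VALUES('PUNE'));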

A segmented tablespace is used for medium-sized tables. The tablespace is divided into segments and each segment will contain rows belonging to ONE table (a segment cannot have rows from 2 tables – that would make it a simple tablespace). But segment number 1 might have rows from Table 1 while segment number 2 can have rows from Table 2. 
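For comparison, a segmented tablespace just needs a SEGSIZE (the number of pages per segment – a multiple of 4, up to 64). The names here are again made up: 

    -- Several tables can live in this tablespace, but any one
    -- segment of 32 pages will hold rows of only one table
    CREATE TABLESPACE ORDTS IN CUSTDB
        SEGSIZE 32;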

To summarize: Every segment will contain rows of 1 table only. If the table is large it could occupy many segments. Different segments can hold different tables. 

An entire partitioned tablespace will hold only 1 table (every partition in the tablespace will have rows of the same table but the entire table is divided into partitions based on some criteria). 

Note: Partitioned tablespaces need a clustered index (I’ll discuss this in a later edition). 

Before continuing with the discussion on optimizer, it would be good to know some interesting points about SQLs: 

  • In an SQL ensure that you use equivalent data types (do not compare a char(6) column with a char(5) column). Do not use a host variable whose precision is different from the precision of the column you are comparing it with. Whenever DB2 has to convert data it will not use the index. 
  • Do not assume the order of the output records returned by an SQL. Depending on the access path chosen by the optimizer, the order could vary. Use the ORDER BY clause if you want the output ordered. 
  • BETWEEN is more efficient than using the combination of >= and <= since the optimizer chooses an efficient path when it encounters a BETWEEN.     
  • If possible use IN instead of LIKE. 
  • If a % or _ appears at the beginning, avoid using LIKE (because in this case DB2 will not use the index). For ex: avoid this statement - SELECT * from CUSTOMER where cust_name like '%on'. 
  • Avoid using the <> (NOT operator) if you can make use of a >= or a <= operator. 

    For ex: Instead of SELECT * from STUDENTS where average <> 0 

    use SELECT * from STUDENTS where average > 0 

Of course this can be used if you are really sure that no student will have a negative average. It is important to know your data before using such statements. 
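Putting a couple of these tips side by side (CUSTOMER and cust_name are from the example above; cust_id is a made-up column): 

    -- BETWEEN lets the optimizer pick an efficient path:
    SELECT * from CUSTOMER where cust_id BETWEEN 1000 AND 2000
    -- rather than:  where cust_id >= 1000 AND cust_id <= 2000

    -- A trailing % still lets DB2 use an index on cust_name:
    SELECT * from CUSTOMER where cust_name like 'RAM%'
    -- but a leading % ('%on') forces DB2 to scan the table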


KR-10: More SQLs



Some more stuff on SQLs: 
  • Code the most restrictive predicate first in your SQL (i.e. the condition which you feel will filter out the most rows).

        Are lots of predicates good for the optimizer?
  • It might appear that many predicates are good, but this need not be the case. If you lower the number of predicates then the bind time can be reduced; the work of the optimizer might get reduced since it needn't worry about too many conditions. But removing predicates may also alter the data access path (and it could lead to inefficient queries). The key is to strike a balance. Do not use predicates unnecessarily: if a predicate is not going to filter out rows then don't use it. Again, it is important that you know your data so that you can use that knowledge to optimize the use of predicates.
  • Avoid using arithmetic expressions in predicates because DB2 may not use indexes. Do the arithmetic before the SQL, store the value in a host variable and then use the host variable within the SQL (see the sketch after this list). 
  • If you can predict the number of rows that will be returned by a cursor, then use the clause OPTIMIZE FOR n ROWS within your cursor (this will help DB2 take a better access path). But if the cursor returns far more or far fewer rows than n, the cursor will become inefficient (use it only if you have an idea of what you'll get). 
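A minimal sketch of those last two points; the table, column and host-variable names (EMPLOYEE, salary, WS-LIMIT, WS-CITY and so on) are made up: 

    -- Avoid:  WHERE salary > :BONUS * 12   (DB2 may not use the index)
    -- Instead compute WS-LIMIT = BONUS * 12 in the program, then:
    SELECT emp_name
      FROM EMPLOYEE
     WHERE salary > :WS-LIMIT

    -- Cursor expected to fetch around 20 rows:
    DECLARE CSR1 CURSOR FOR
        SELECT cust_name
          FROM CUSTOMER
         WHERE city = :WS-CITY
        OPTIMIZE FOR 20 ROWS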

Joins: 

  • Use predicates to reduce the number of rows being joined. 
  • Try to join based on the index columns. 
  • When using ORDER BY in a join, limit the columns used in the order by clause to columns of one table (this will reduce the burden of sorting on DB2). 
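A small example combining these three points (ORDERS and its columns are made up; assume cust_id is indexed on both tables): 

    SELECT C.cust_name, O.order_dt
      FROM CUSTOMER C, ORDERS O
     WHERE C.cust_id = O.cust_id        -- join on indexed columns
       AND C.city = 'PUNE'              -- predicate cuts down the rows joined
     ORDER BY C.cust_name               -- ORDER BY columns of one table only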

Some more info on partitioned tablespaces: 

  • They can store more than 64 GB of data (64GB is the restriction on segmented tablespaces). 
  • You can start and stop the partitions individually (this increases availability/concurrency). 
  • In a partitioned tablespace scan, entire partitions can be skipped based on the predicate (ex: if you have partitioned your tablespace such that customers of each city are stored in one partition, and you then query based on a particular city, DB2 can skip the other partitions). 
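For instance, stopping and restarting just one partition might look like this (using the made-up database/tablespace names from the earlier sketch); the other partitions stay available to applications in the meantime: 

    -STOP  DATABASE(CUSTDB) SPACENAM(CUSTTS) PART(2)
    -START DATABASE(CUSTDB) SPACENAM(CUSTTS) PART(2)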

KR-11: binding, load modules etc.



I was thinking of changing the topic but found something interesting (regarding compiling, binding etc.)… 

The process of executing a COBOL-DB2 program is as follows (let's assume that we have a COBOL program containing SQLs): 
  1. Precompiler: The first step is to pass the code through the DB2 precompiler. This will comment out the SQLs, put in some DB2-invoking statements and send the code to the COBOL compiler (this is called the modified source code). In this stage the DCLGENs and the use of host variables will be checked (if you specify DCLGENs). But the precompiler will not check whether the tables you have used exist or whether you have used the columns correctly (i.e. the precompiler does not check with DB2). 
  2. DBRM: The precompiler will also produce a DBRM (Database Request Module) which will contain info about how your program is going to use DB2. The precompiler then puts a timestamp in the modified source code and in the DBRM. 
  3. Load Module: The source code has to be compiled and linked to produce the load module (i.e. the executable file). The same process happens with our modified source code. But this load module cannot be executed without being combined with the DBRM, because the modified source code has DB2 calls which haven't been resolved (the SQL stuff is still in the DBRM). This process of combining your load module and DBRM is called binding. 
  4. Binding: Authority is checked (whether you have permission to use the tables), and the existence of the tables and columns is also checked here (if you make a mistake in your SQL, you might have noticed that we get a bind error indicating the incorrect part of the SQL). An efficient way to access the data is also decided at bind time.
  5. While binding, DB2 will also expect the timestamp in the load module and the DBRM to match (because the binder has to bind the correct DBRM with the correct source code). Once this is done, the execution plan is created. Now our load module can be executed using this plan. 
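As an illustration, the bind itself is usually just a few DSN subcommands like these (the subsystem, plan and DBRM member names – DSN1, CUSTPLAN, CUSTPGM – are made up): 

    DSN SYSTEM(DSN1)
    BIND PLAN(CUSTPLAN) MEMBER(CUSTPGM) ACTION(REPLACE) VALIDATE(BIND) ISOLATION(CS)
    END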

Have you ever wondered what IKJEFT01, IKJEFT1B, PC.SYSLIB, PC.SYSIN, COB.SYSLIB and DSN are? (Answers in the next edition!) 


KR-12: IKJEFT01, IGYCRCTL etc.


This edition is all about IKJEFT01, IKJEFT1B and all those strange words used in our COBOL JCLs… 

IKJEFT01 is a generic program name for the TSO Terminal Monitor Program (called the TMP). All DB2 batch jobs should run in the TMP. This program doesn't return codes for system errors and user abends. IKJEFT1B is similar to IKJEFT01 but does return codes, which can be checked in the JES spool. 

To run DB2 commands in the TMP we need to issue the DSN command (this command starts the DSN command processor). Once the DSN command processor has been started we can issue subcommands like BIND PLAN, RUN etc. We can also specify some other BIND options here. To stop the DSN processor we use the command END. (You would have noticed this in our JCLs, where we first give DSN, followed by some BIND PLAN statement, followed by RUN and finally the END.) 
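A typical batch run step then looks something like this (the subsystem, program, plan and load-library names – DSN1, CUSTPGM, CUSTPLAN, TEST.LOADLIB – are made up). The TSO commands go in under SYSTSIN, and the TMP writes its output to SYSTSPRT: 

    //RUNPGM   EXEC PGM=IKJEFT01
    //SYSTSPRT DD SYSOUT=*
    //SYSTSIN  DD *
      DSN SYSTEM(DSN1)
      RUN PROGRAM(CUSTPGM) PLAN(CUSTPLAN) LIB('TEST.LOADLIB')
      END
    /*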

DSNHCOB2 is a catalogued procedure to precompile, compile and link the COBOL code. This proc uses 3 programs: DSNHPC (for precompiling), IGYCRCTL (for compiling) and IEWL (for linking). The proc DSNHCOB2 is present in SYS1.PROCLIB. On some systems the proc may not be used; instead the 3 programs are invoked directly within the JCL to perform the same set of functions. 

What about PC.SYSLIB, PC.DBRMLIB etc…? The figure below should explain that: 


[Figure: the precompile, compile and link-edit steps of DSNHCOB2 with the DD names each uses, e.g. PC.SYSIN, PC.SYSLIB, PC.DBRMLIB, COB.SYSLIB]

What does the above diagram mean? For the precompiler step, the DCLGENs should be present in the dataset specified by the DD name PC.SYSLIB, and the source code should be present in PC.SYSIN (in the JCL, PC is the name of the precompile step within the PROC, and SYSLIB, SYSIN etc. are DD names within that step). 
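So when we execute the proc we point those DD names at our own datasets. A sketch with made-up dataset names (MY.COBOL.SOURCE and friends): 

    //COMPILE  EXEC DSNHCOB2
    //PC.SYSIN   DD DSN=MY.COBOL.SOURCE(CUSTPGM),DISP=SHR
    //PC.SYSLIB  DD DSN=MY.DCLGEN.LIB,DISP=SHR
    //PC.DBRMLIB DD DSN=MY.DBRMLIB(CUSTPGM),DISP=SHR
    //COB.SYSLIB DD DSN=MY.COPYBOOK.LIB,DISP=SHR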
