Wednesday, May 02, 2007

Changes to DDL Models in PeopleTools 8.48

A recent thread on Oracle-L led me to look at how Oracle has changed the way that PeopleTools 8.48 collects Oracle Cost-Based Optimiser statistics. It now uses DBMS_STATS instead of the ANALYZE command. This has also caused me to reconsider some options for managing statistics for a PeopleSoft system.

Application Engine programs can collect Cost-Based Optimiser statistics on specific tables by calling the %UpdateStats(record, [high/low]) PeopleCode macro. This uses one of two DDL models, depending on whether the high or low option is specified. However, these DDL models are only delivered for Oracle and DB2/MVS; on other platforms %UpdateStats() does nothing.

This was PeopleSoft's platform-generic solution (devised before their takeover by Oracle, and before Dynamic Sampling was available in the Oracle database) to a very real problem: when a working storage or reporting table is emptied, repopulated and used during a batch process, the statistics on it no longer accurately describe its content, and the optimiser can choose an inappropriate execution plan as a result. PeopleSoft therefore provided a way to refresh the statistics during the process, and introduced new DDL models because each database platform has its own command. However, this approach relies upon developers adding the %UpdateStats() macro at every point where the data has changed sufficiently to require refreshing the statistics, and developers are not always best placed to make that decision. There are still plenty of places in delivered PeopleSoft code where this macro could usefully be added.
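As a sketch of the intended pattern, a batch process might contain three Application Engine SQL steps like the following. The record DMK_WRK_TBL and the populating statement are purely illustrative:

-- Step 1 (SQL): empty the working storage table
%TruncateTable(%Table(DMK_WRK_TBL))

-- Step 2 (SQL): repopulate it
INSERT INTO %Table(DMK_WRK_TBL) (EMPLID)
SELECT EMPLID FROM PS_JOB

-- Step 3 (SQL): refresh the optimiser statistics before the table is used in later steps
%UpdateStats(DMK_WRK_TBL, LOW)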

Up to PeopleTools 8.47, PeopleSoft delivered two DDL models that used the ANALYZE command. %UpdateStats(,high) ran a full compute of statistics on the table:

ANALYZE TABLE [TBNAME] COMPUTE STATISTICS;

While %UpdateStats(,low) estimated statistics with the default sample size:

ANALYZE TABLE [TBNAME] ESTIMATE STATISTICS;

From PeopleTools 8.48, these DDL models now call the Oracle-supplied PL/SQL package DBMS_STATS. The high option still performs a full compute of statistics:

DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=>1, method_opt=> 'FOR ALL COLUMNS SIZE 1',cascade=>TRUE);

While the low option estimates statistics with the sample size determined by the pseudo-variable DBMS_STATS.AUTO_SAMPLE_SIZE:

DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1',cascade=>TRUE);

So it would appear that PeopleSoft now follow the recommendations that Oracle have been making since version 8i of the database to use DBMS_STATS instead of the ANALYZE command. This is certainly a step in the right direction. It also makes good sense to use the automatic sample size. ESTIMATE_PERCENT defaults to DBMS_STATS.AUTO_SAMPLE_SIZE from Oracle 10g. Previously it was NULL, which caused a full compute.
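On 10g you can confirm these defaults for yourself with the DBMS_STATS.GET_PARAM function, which returns the current database-wide default for the named parameter:

SELECT DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT') FROM DUAL;
SELECT DBMS_STATS.GET_PARAM('METHOD_OPT') FROM DUAL;

On an out-of-the-box 10g database I would expect the first query to return DBMS_STATS.AUTO_SAMPLE_SIZE and the second FOR ALL COLUMNS SIZE AUTO.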

However, there is a problem.

PeopleSoft have chosen to specify the METHOD_OPT as FOR ALL INDEXED COLUMNS SIZE 1. SIZE 1 means a single bucket, in other words no histogram, so if you have created histograms on any of your columns, or generated them automatically with DBMS_STATS, this command will remove them from the indexed columns, and will leave any histograms on unindexed columns unchanged, and therefore potentially out of date.
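You can see which columns on a table currently have histograms, and so would be affected, with a dictionary query such as this (PS_JOB is merely an example, and the HISTOGRAM column of this view is only available from 10g):

SELECT column_name, num_distinct, num_buckets, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'PS_JOB'
ORDER BY column_name;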

The default in Oracle 9i is FOR ALL COLUMNS SIZE 1. This removes all histograms on all columns, although this is at least the same behaviour as the ANALYZE command.

In Oracle 10g, METHOD_OPT defaults to FOR ALL COLUMNS SIZE AUTO. The Oracle manual states that the database ‘…determines the columns to collect histograms based on data distribution and the workload of the columns’. So, Oracle may remove histograms if it judges that they are not necessary.

I have no hesitation in recommending that the METHOD_OPT value in the delivered DDL models should be changed. I would suggest using FOR ALL COLUMNS SIZE REPEAT.
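So, for example, the low model might read as follows (my suggested variant, not what PeopleSoft deliver). SIZE REPEAT collects histograms only on columns that already have them, so histograms that have been deliberately created are refreshed rather than silently removed:

DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE REPEAT',cascade=>TRUE);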

Addendum 12.6.2007: I have noticed that the DDL models changed again in PT8.48.07. The full compute is now FOR ALL COLUMNS, but the estimate is still FOR ALL INDEXED COLUMNS! Closer, but still no cigar!

2 comments:

David Kurtz said...

I am feeling a bit stupid tonight. Having written this blog entry and a corresponding article, I completely missed the fact that PeopleSoft are delivering the DDL model for %UpdateStats(,high) with a sample size of just 1%. I am sure that this is a bug, and that estimate_percent should really be set to 100, but I can't find anything on Customer Connection.
My thanks to Neil Jarvis for pointing this out.

David Kurtz said...

This is very embarrassing. It is over 7 months since I posted this entry, and only now have I noticed that Oracle have mixed up the estimate and compute DDL models.

Model 4 is used by %UpdateStats(<table>,LOW);
Model 5 is used by %UpdateStats(<table>,HIGH);
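You can inspect what is installed in your own system by querying the DDL model table directly (platform 2 is Oracle, as can be seen in the ddlora.dms extract below):

SELECT statement_type, model_statement
FROM   psddlmodel
WHERE  platformid = 2
AND    statement_type IN (4, 5);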

Here is an extract from the trace of a very simple AE program:

-- 22.07.03 .(DMK.MAIN.Step01) (SQL)
RECSTATS PSLOCK LOW
/
-- Row(s) affected: 1
/
/
Restart Data CheckPointed
/
COMMIT
/

-- 22.07.06 .(DMK.MAIN.Step02) (SQL)
RECSTATS PSVERSION HIGH
/
-- Row(s) affected: 1
/
/
Restart Data CheckPointed
/
COMMIT
/

In the corresponding Oracle trace file I can find:

BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSLOCK',
estimate_percent=>1, method_opt=> 'FOR ALL COLUMNS SIZE 1',cascade=>TRUE);
END;

BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSVERSION',
estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL
INDEXED COLUMNS SIZE 1',cascade=>TRUE); END;

So it would make sense to change the script that delivers the DDL models, ddlora.dms, to read as follows:

...
INSERT INTO PSDDLMODEL (
STATEMENT_TYPE,
PLATFORMID,
SIZING_SET,
PARMCOUNT,
MODEL_STATEMENT)
VALUES(
:1,
:2,
:3,
:4,
:5)
\
$DATATYPES NUMERIC,NUMERIC,NUMERIC,NUMERIC,CHARACTER
1,2,0,6,$long
CREATE TABLE [TBNAME] ([TBCOLLIST]) TABLESPACE [TBSPCNAME] STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PCTUSED **PCTUSED**;
//
2,2,0,7,$long
CREATE [UNIQUE] **BITMAP** INDEX [IDXNAME] ON [TBNAME] ([IDXCOLLIST]) TABLESPACE **INDEXSPC** STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PARALLEL NOLOGGING;
//
3,2,0,6,$long
CREATE TABLESPACE [TBSPCNAME] DATAFILE '**DIR**[TBSPCNAME].DBF' SIZE **SIZE** DEFAULT STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**);
//
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=>dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE AUTO',cascade=>TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=>1, method_opt=> 'FOR ALL COLUMNS SIZE AUTO',cascade=>TRUE);
//
/
...

I have kept the 1% sample size for the compute model, but there is no reason why you could not choose a larger value, or even 100%. It is really a matter of how long you want to spend analysing tables. The automatic sample size will usually be a very small value, often less than 1%. A higher sample size will only rarely produce statistics that will lead to a better execution plan.

You can see the value that Oracle calculates for AUTO_SAMPLE_SIZE by tracing the DBMS_STATS call and looking for the SAMPLE clause in the dynamic SQL, e.g.

create global temporary table ...
from "SYSADM"."PS_SET_CNTRL_REC" sample ( .6141997991) t where 1 = 2