Timing: rowsource statistics. Part 2: Overhead and inconsistent time
Timing sampling frequency:
The number of calls that get a timestamp depends on the parameter _rowsource_statistics_sampfreq (default is 128).
- If this parameter is set to 0, there are no time calculations in rowsource statistics: the functions qerstSnapStats()/qerstUpdateStats() do not get a timestamp.
- If this parameter is set to 1, time is always calculated: every pair of qerstSnapStats()/qerstUpdateStats() calls gets a timestamp (this is the case in the excerpts above).
- If this parameter is set to N (such as the default value 128), a timestamp is taken every N tuples, i.e. only every Nth call of qerstSnapStats()/qerstUpdateStats() at a given rowsource level gets a timestamp.
- There is a special value, 3, which enables gathering of rowcounts only.
Pay particular attention to the case when _rowsource_statistics_sampfreq = N.
If this parameter is set to N (such as the default value 128), a timestamp is taken every N tuples, so only every Nth call of qerstSnapStats()/qerstUpdateStats() at a given rowsource level gets a timestamp. Subsequent calls (for example, the next 127) of qerstSnapStats()/qerstUpdateStats() reuse that calculated time. In this case the reported time of some rowsources is not real but extrapolated. As a result we can find inconsistent times for some steps of an execution plan, where the reported time of child steps is greater than the time of their parent steps.
On the other hand, the most accurate setting (_rowsource_statistics_sampfreq = 1) leads to an increased number of additional calls to the function that gets the timestamp and, as a result, to increased query execution time.
These are examples of the Digger’s output for the same query
select --+ index(tbl)
 count(pad) from tbl;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TBL  |    50 |
|   3 |    INDEX FULL SCAN           | IDX  |    50 |
-----------------------------------------------------
but with different _rowsource_statistics_sampfreq: 0 (timing disabled), 1 (the most accurate timing), 2, 4, 8, 16, 128
special case: _rowsource_statistics_sampfreq = 3 (only rowcounts)
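The outputs above were produced with the Digger; a rough way to reproduce a similar comparison without it, just with dbms_xplan, is sketched below. The object names TBL/IDX are the ones from the plan above, and the exact figures will of course differ:

select --+ index(tbl) gather_plan_statistics
 count(pad) from tbl;

-- A-Time / A-Rows per step for the last execution
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- repeat after changing the sampling frequency and compare the A-Time column
alter session set "_rowsource_statistics_sampfreq" = 1;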
Thus, we can distinguish three levels of detail of rowsource statistics, managed by the parameter _rowsource_statistics_sampfreq:
1: the number of rows returned by each rowsource
2: (1) + statistics such as consistent gets, current gets, starts (the number of times an execution plan step has been executed), etc.
3: (1) + (2) + timing
As we know, starting with 10g there are three ways to enable gathering of rowsource statistics:
1) set the parameter statistics_level to “all” at session or system level
alter session set statistics_level=all
This sets another hidden parameter, _rowsource_execution_statistics, to TRUE, which enables gathering of rowsource statistics.
- statistics_level=all also sets the parameter timed_os_statistics.
- This is the only way to enable rowsource statistics gathering in 9i (apart from actual row counts, which are populated by SQL trace even without statistics_level=all).
Setting the parameter without SQL trace enables gathering of rowsource statistics only for the CBO (optimizer modes all_rows/first_rows), and the only way to get the statistics is to query V$SQL_PLAN_STATISTICS[_ALL].
Setting the parameter together with SQL trace enables gathering of rowsource statistics for queries with any optimizer mode. In this case a subset of the statistics is also printed in the trace file. SQL trace without statistics_level=all gathers only rowcounts.
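For example, a minimal sketch of pulling the statistics straight out of V$SQL_PLAN_STATISTICS_ALL (TBL is the demo table from the plan above, and &sql_id is a placeholder for the SQL_ID of the statement, which can be found in v$sql):

alter session set statistics_level = all;

select count(pad) from tbl;

-- actual starts, rows, time and buffer gets per plan line for the last execution
select id, operation, options, last_starts, last_output_rows,
       last_elapsed_time, last_cr_buffer_gets
  from v$sql_plan_statistics_all
 where sql_id = '&sql_id'
 order by child_number, id;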
When rowsource statistics are enabled this way, _rowsource_statistics_sampfreq is honoured only if the parameter has been set explicitly.
Otherwise the value actually used is 1.
SQL> select ksppinm name,
  2         ksppstdvl value,
  3         ksppstdf IsDefault,
  4         decode(bitand(ksppstvf, 5), 1, 'TRUE', 0, 'FALSE') IsSetExplicitly
  5    from x$ksppi x, x$ksppcv y
  6   where (x.indx = y.indx)
  7     and ksppinm = '_rowsource_statistics_sampfreq';

NAME                                VALUE      ISDEFAULT  ISSETEXPLICITLY
----------------------------------- ---------- ---------- --------------------
_rowsource_statistics_sampfreq      128        TRUE       FALSE

SQL> select count(*) from v$parameter where name ='_rowsource_statistics_sampfreq';

  COUNT(*)
----------
         0
Here, although the value of the parameter is still 128, the value actually used as the frequency of getting timestamps will be 1.
If _rowsource_statistics_sampfreq is set explicitly, then the value used is the one that was set.
SQL> alter session set "_rowsource_statistics_sampfreq"=128;

Session altered.

SQL> select ksppinm name,
  2         ksppstdvl value,
  3         ksppstdf IsDefault,
  4         decode(bitand(ksppstvf, 5), 1, 'TRUE', 0, 'FALSE') IsSetExplicitly
  5    from x$ksppi x, x$ksppcv y
  6   where (x.indx = y.indx)
  7     and ksppinm = '_rowsource_statistics_sampfreq';

NAME                                VALUE      ISDEFAULT  ISSETEXPLICITLY
----------------------------------- ---------- ---------- --------------------
_rowsource_statistics_sampfreq      128        TRUE       TRUE

SQL> select count(*) from v$parameter where name ='_rowsource_statistics_sampfreq';

  COUNT(*)
----------
         1
In this case the value used for getting timestamps will be 128.
- Notice that hidden parameters that are set explicitly appear in v$parameter.
It is also possible to set _rowsource_execution_statistics=true manually inside a session with statistics_level=typical.
In this case (and in the cases below) the real frequency of getting timestamps is equal to _rowsource_statistics_sampfreq, regardless of how it was set (explicitly or not).
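A minimal sketch of that manual setup (these are hidden parameters, as discussed above, so treat this as an illustration rather than a recommendation):

alter session set statistics_level = typical;
-- enable rowsource statistics without statistics_level=all
alter session set "_rowsource_execution_statistics" = true;
-- optionally control the sampling frequency explicitly
alter session set "_rowsource_statistics_sampfreq" = 1;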
2) hint gather_plan_statistics
It enables rowsource statistics for a single query.
In this case the value of _rowsource_statistics_sampfreq is used as-is.
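A minimal usage sketch (again with the TBL table from the example above):

select /*+ gather_plan_statistics */ count(pad) from tbl;

-- actual rows, starts and time per plan step for the last execution
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));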
3) sql trace (10046 event)
Here, too, the value of _rowsource_statistics_sampfreq is used as-is.
Keep in mind that in this case rowsource statistics can be seen not only in the trace file or a tkprof report, but also via dbms_xplan.display_cursor, which is much more convenient.
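For reference, a sketch of one common way to switch the 10046 event on and off at session level (level 8 adds wait events; TBL is the demo table from above):

alter session set events '10046 trace name context forever, level 8';

select count(pad) from tbl;

alter session set events '10046 trace name context off';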
- In each of these three cases the optimizer environment (v$[sql/ses/sys]_optimizer_env) property sqlstat_enabled is set to true.
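This is easy to verify for the current session (a quick sanity-check sketch against v$ses_optimizer_env):

select name, value
  from v$ses_optimizer_env
 where sid = to_number(sys_context('userenv', 'sid'))
   and name = 'sqlstat_enabled';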
Timing: overhead
Let’s see how and why different precision of timing calculation (_rowsource_statistics_sampfreq) affects the CPU consumption and execution time of a query.
I will run a query with different values of _rowsource_statistics_sampfreq: default, 0, 128, 16, 8, 4, 2, 1.
In the test cases below I will use a simple table with 10 million rows and an index
create table test(id not null, pad) as
select rownum as id, '*' as pad from dual connect by level <= 10000000;

create index idx on test(id);

SQL> exec dbms_stats.gather_table_stats(user, 'TEST', estimate_percent => 100, cascade => true);

PL/SQL procedure successfully completed
and two queries over the table: one with a Full Table Scan and another with an Index Full Scan.
All runs are done on two versions – 10g and 11g.
In all cases I use SQL trace, because it gives easy access to consumed CPU time instead of just relying on elapsed time.
In 10g SQL trace is enabled for the whole session; in 11g SQL trace is enabled for the exact SQL statement:
alter session set events 'sql_trace[sql:9pzmsyh5t14bb]';
After all the queries had been run and traced, I ran the same queries again, this time measuring the number of timer calls (gethrtime) during each run. I did this in a separate run in order to avoid measurement (dtrace) overhead, so the CPU time in the trace results in the table below does not include dtrace overhead.
So, the first query with Full Table Scan:
select count(pad) from test;

--------------------------------------------
| Id  | Operation          | Name | Rows  |
--------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |
|   1 |  SORT AGGREGATE    |      |     1 |
|   2 |   TABLE ACCESS FULL| TEST |   10M |
--------------------------------------------
The results of the test are below (all tkprof reports are made with the option aggregate=no):
10g Full Table Scan: full script, output, raw trace, tkprof report
11g Full Table Scan: full script, output, raw trace, tkprof report
- Notice an interesting detail: this form of SQL tracing:
alter session set events 'sql_trace[sql:9pzmsyh5t14bb]'
does not trace parse calls
| freq          | 10g cpu (s) | 10g timer calls | 11g cpu (s) | 11g timer calls |
|---------------|-------------|-----------------|-------------|-----------------|
| default (128) | 0,66        | 234 392         | 0,65        | 156 262         |
| 0             | 0,59        | 5               | 0,57        | 5               |
| 128           | 0,66        | 234 392         | 0,64        | 156 262         |
| 16            | 1,13        | 1 875 017       | 1,02        | 1 250 015       |
| 8             | 1,64        | 3 750 020       | 1,32        | 2 500 017       |
| 4             | 2,50        | 7 500 019       | 2,02        | 5 000 019       |
| 2             | 4,44        | 15 000 026      | 3,41        | 10 000 025      |
| 1             | 8,18        | 30 000 027      | 6,28        | 20 000 033      |
And the same actions for the modified query with an Index Full Scan:
select --+ index(test)
 count(pad) from test;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |   10M |
|   3 |    INDEX FULL SCAN           | IDX  |   10M |
-----------------------------------------------------
10g Index Full Scan: full script, output, raw trace, tkprof report
11g Index Full Scan: full script, output, raw trace, tkprof report
| freq          | 10g cpu (s) | 10g timer calls | 11g cpu (s) | 11g timer calls |
|---------------|-------------|-----------------|-------------|-----------------|
| default (128) | 3,49        | 937 531         | 2,78        | 312 519         |
| 0             | 3,28        | 5               | 2,62        | 7               |
| 128           | 3,52        | 937 534         | 2,69        | 312 519         |
| 16            | 5,24        | 7 500 036       | 3,32        | 2 500 021       |
| 8             | 7,18        | 15 000 038      | 4,31        | 5 000 025       |
| 4             | 11,25       | 30 000 044      | 5,28        | 10 000 029      |
| 2             | 19,01       | 60 000 056      | 7,92        | 20 000 041      |
| 1             | 34,03       | 120 000 067     | 13,06       | 40 000 057      |
We can see that lowering _rowsource_statistics_sampfreq increases the number of additional timer calls, which obviously increases CPU consumption.
We can also see that the default value of 128 does not lead to significant overhead.
Also pay attention to the following:
– in the Full Table Scan case the number of additional timer calls in 10g is 1.5 times higher than in 11g (for _rowsource_statistics_sampfreq = 1, roughly 30M vs 20M calls for 10M rows, i.e. 3 vs 2 calls per row). We saw the reason above in the section “How it works”: qerstUpdateStats() calls gethrtime() twice.
– in the Index Full Scan case the number of additional timer calls in 10g is 3(!) times higher than in 11g. Why?
Let’s take a look at where – in which call stacks – gethrtime() was called:
dtrace -p <PID> -n 'pid$target:libc.so.1:gethrtime:entry{@tim[ustack()]=count();}' > report.txt
Excerpts:
10g:
...

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x2fd
oracle`qerstFetch+0x152
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
oracle`0xdba2dc
9999999

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x2fd
oracle`qerstRowP+0x3d
oracle`qerixtFetch+0x47d
oracle`qerstFetch+0xf0
oracle`qerpfFetch+0x102
oracle`qerstFetch+0x12a
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
9999999

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x2fd
oracle`qerstRowP+0x3d
oracle`qertbFetchByRowID+0x88f
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
oracle`0xdba2dc
9999999

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x31e
oracle`qerstFetch+0x152
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
oracle`0xdba2dc
9999999

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x31e
oracle`qerstRowP+0x3d
oracle`qerixtFetch+0x47d
oracle`qerstFetch+0xf0
oracle`qerpfFetch+0x102
oracle`qerstFetch+0x12a
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
9999999

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x31e
oracle`qerstRowP+0x3d
oracle`qertbFetchByRowID+0x88f
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
oracle`0xdba2dc
9999999

libc.so.1`gethrtime
oracle`qerstSnapStats+0x44
oracle`qerstRowP+0x29b
oracle`qerixtFetch+0x47d
oracle`qerstFetch+0xf0
oracle`qerpfFetch+0x102
oracle`qerstFetch+0x12a
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
10000000

libc.so.1`gethrtime
oracle`qerstSnapStats+0x44
oracle`qerstRowP+0x29b
oracle`qertbFetchByRowID+0x88f
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
oracle`0xdba2dc
10000000

libc.so.1`gethrtime
oracle`qerstSnapStats+0x44
oracle`qerstFetch+0x48
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
oracle`0xdba2dc
10000001

libc.so.1`gethrtime
oracle`qerstSnapStats+0x44
oracle`qerstFetch+0x48
oracle`qerpfFetch+0x102
oracle`qerstFetch+0x12a
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
10000001

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x134
oracle`qerstFetch+0x152
oracle`qerpfFetch+0x102
oracle`qerstFetch+0x12a
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
10000001

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x155
oracle`qerstFetch+0x152
oracle`qerpfFetch+0x102
oracle`qerstFetch+0x12a
oracle`qertbFetchByRowID+0x271
oracle`qerstFetch+0xf0
oracle`qergsFetch+0x16c
oracle`qerstFetch+0xf0
oracle`opifch2+0xa4e
oracle`kpoal8+0xdb7
oracle`opiodr+0x433
oracle`ttcpip+0x46a
oracle`opitsk+0x52d
oracle`opiino+0x3f0
oracle`opiodr+0x433
oracle`opidrv+0x2f1
oracle`sou2o+0x5b
oracle`opimai_real+0x84
oracle`main+0x64
10000001
11g:
...

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x206
oracle`qerstRowP+0x4e
oracle`qertbFetchByRowID+0xa4e
oracle`qerstFetch+0x3ea
oracle`qergsFetch+0x7c3
oracle`qerstFetch+0x3ea
oracle`opifch2+0x8a4
oracle`kpoal8+0xfb9
oracle`opiodr+0x3a7
oracle`ttcpip+0x4f7
oracle`opitsk+0x608
oracle`opiino+0x335
oracle`opiodr+0x3a7
oracle`opidrv+0x2e6
oracle`sou2o+0x58
oracle`opimai_real+0x20a
oracle`ssthrdmain+0x125
oracle`main+0xd4
oracle`0x15d18bc
9999999

libc.so.1`gethrtime
oracle`qerstUpdateStats+0x206
oracle`qerstFetch+0x1a2
oracle`qertbFetchByRowID+0x4c7
oracle`qerstFetch+0x3ea
oracle`qergsFetch+0x7c3
oracle`qerstFetch+0x3ea
oracle`opifch2+0x8a4
oracle`kpoal8+0xfb9
oracle`opiodr+0x3a7
oracle`ttcpip+0x4f7
oracle`opitsk+0x608
oracle`opiino+0x335
oracle`opiodr+0x3a7
oracle`opidrv+0x2e6
oracle`sou2o+0x58
oracle`opimai_real+0x20a
oracle`ssthrdmain+0x125
oracle`main+0xd4
oracle`0x15d18bc
9999999

libc.so.1`gethrtime
oracle`qerstRowP+0x25d
oracle`qertbFetchByRowID+0xa4e
oracle`qerstFetch+0x3ea
oracle`qergsFetch+0x7c3
oracle`qerstFetch+0x3ea
oracle`opifch2+0x8a4
oracle`kpoal8+0xfb9
oracle`opiodr+0x3a7
oracle`ttcpip+0x4f7
oracle`opitsk+0x608
oracle`opiino+0x335
oracle`opiodr+0x3a7
oracle`opidrv+0x2e6
oracle`sou2o+0x58
oracle`opimai_real+0x20a
oracle`ssthrdmain+0x125
oracle`main+0xd4
oracle`0x15d18bc
10000000

libc.so.1`gethrtime
oracle`qerstFetch+0x234
oracle`qertbFetchByRowID+0x4c7
oracle`qerstFetch+0x3ea
oracle`qergsFetch+0x7c3
oracle`qerstFetch+0x3ea
oracle`opifch2+0x8a4
oracle`kpoal8+0xfb9
oracle`opiodr+0x3a7
oracle`ttcpip+0x4f7
oracle`opitsk+0x608
oracle`opiino+0x335
oracle`opiodr+0x3a7
oracle`opidrv+0x2e6
oracle`sou2o+0x58
oracle`opimai_real+0x20a
oracle`ssthrdmain+0x125
oracle`main+0xd4
oracle`0x15d18bc
10000001
As we can see, in 10g an additional rowsource level appears – qerpfFetch (prefetch) – which is also wrapped by the qerstSnapStats()/qerstUpdateStats() functions. As a result, the number of calls is three times (1.5 * 2) higher than in 11g (about 120M vs 40M calls in this test).
The amount of overhead depends on the number of these additional calls to the function that gets the timestamp, and also on the implementation of that function – in different operating systems the overhead can be different.
The mentioned calls – gethrtime() on Solaris or gettimeofday() on Linux – are syscalls. But they are a special type of syscall, implemented with minimal overhead. On Solaris such syscalls are called “fast trap syscalls”; on Linux they are vsyscalls/VDSO (64-bit architectures only).
- more about vsyscall/VDSO in Linux:
– vsyscalls and the vDSO,
– gettimeofday speedup
That is why we do not see them as real syscalls. In order to catch these calls we have to trace the library calls that wrap the real syscalls; for example, on Linux it is possible to use ltrace (library call tracing).
Thus, the performance overhead depends on:
– the number of additional timer calls, which itself depends on the number of rows processed in each rowsource and on the number of rowsources;
– the implementation of the function that gets the timestamp, so the overhead can differ between operating systems;
– how exactly that function works inside the OS – even within the same OS it can work in different modes.
Timing: Overhead: VDSO on Linux
- gettimeofday speedup:
A Virtual Dynamic Shared Object (VDSO) is a shared library that allows applications in user space to perform some kernel actions without as much overhead as a system call. The VDSO is often used to provide fast access to the gettimeofday system call data.
Enabling the VDSO instructs the kernel to use its definition of the symbols in the VDSO, rather than the ones found in any user-space shared libraries, particularly the glibc. The effects of enabling the VDSO are system-wide – either all processes use it or none do.
When enabled, the VDSO overrides the glibc definition of gettimeofday with its own. This removes the overhead of a system call, as the call is made directly to the kernel memory, rather than going through the glibc.
The behavior of gettimeofday() can be changed (system-wide):
- The VDSO parameter has three possible values:
0 – Provides the most accurate time intervals at μs (microsecond) resolution, but also produces the highest call overhead, as it uses a regular system call.
1 – Slightly less accurate, although still at μs resolution, with a lower call overhead.
2 – The least accurate, with time intervals at the ms (millisecond) level, but offers the lowest call overhead.
VDSO behavior is enabled by default. The value used to enable the VDSO affects the behavior of gettimeofday. It can be changed by echoing the desired value to /proc/sys/kernel/vsyscall64.
echo 1 > /proc/sys/kernel/vsyscall64
This allows us to compare the performance of gettimeofday() as a VDSO call and as a usual syscall.
Environment: VirtualBox with Oracle 11.2.0.3 on Oracle Linux 6.3 inside
I will change the VDSO behavior and run the same query as above, consecutively in three modes: 2, 1, 0 (in mode 0 gettimeofday() becomes a real syscall).
Again I use SQL trace instead of statistics_level or the gather_plan_statistics hint in order to get easy access to CPU time.
The tracefile: orcl_ora_3017.trc
# echo 2 > /proc/sys/kernel/vsyscall64
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.45 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.45 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |    10M |    10M |00:00:04.54 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |    10M |    10M |00:00:01.74 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665587261584:c=5403178,e=5453897,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789769054152
cpu=5,40
# echo 1 > /proc/sys/kernel/vsyscall64
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.17 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.17 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |    10M |    10M |00:00:04.29 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |    10M |    10M |00:00:01.63 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665578749400:c=5119221,e=5169173,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789789329746
cpu=5,12
# echo 0 > /proc/sys/kernel/vsyscall64
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:11.67 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:11.67 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |    10M |    10M |00:00:09.15 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |    10M |    10M |00:00:03.28 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665585290416:c=11504250,e=11668447,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789819491035
cpu=11,50
Summary
| mode | cpu time (s) |
|------|--------------|
| 0    | 11,50        |
| 1    | 5,12         |
| 2    | 5,40         |
In order to measure the number of times gettimeofday() has been called on Linux, it is possible either to use ltrace (in any mode) or any syscall measurement method (such as strace, or dtrace on OL6) in mode 0 (when gettimeofday() is a real syscall).
I have used dtrace and the simplest script:
# dtrace -p 3017 -n 'syscall::gettimeofday:entry/pid == $target/{@calls=count()}'
dtrace: description 'syscall::gettimeofday:entry' matched 1 probe
^C

  40000640
Thus, the number of timer calls in this case is 40 million – the same number as in Solaris.
Timing: known issues
In some environments it is possible to get execution plans with absolutely crazy timings, as below.
This case is Solaris 10 on VirtualBox:
select count(*)
from (select rownum id from dual connect by level <= 999999) t1,
     (select rownum id from dual connect by level <= 999999) t2
where t1.id = t2.id;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |      |      1 |        |      1 |00:00:02.56 |       |       |          |
|   1 |  SORT AGGREGATE                   |      |      1 |      1 |      1 |00:00:02.56 |       |       |          |
|*  2 |   HASH JOIN                       |      |      1 |      1 |  99999 |07:15:32.57 |  3749K|  1936K| 5594K (0)|
|   3 |    VIEW                           |      |      1 |      1 |  99999 |03:28:07.19 |       |       |          |
|   4 |     COUNT                         |      |      1 |        |  99999 |02:31:21.65 |       |       |          |
|   5 |      CONNECT BY WITHOUT FILTERING |      |      1 |        |  99999 |01:53:30.97 |       |       |          |
|   6 |       FAST DUAL                   |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
|   7 |    VIEW                           |      |      1 |      1 |  99999 |03:28:17.39 |       |       |          |
|   8 |     COUNT                         |      |      1 |        |  99999 |02:50:24.37 |       |       |          |
|   9 |      CONNECT BY WITHOUT FILTERING |      |      1 |        |  99999 |00:56:48.52 |       |       |          |
|  10 |       FAST DUAL                   |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
--------------------------------------------------------------------------------------------------------------------
Notice that the query completed in about 3 seconds, but the reported time of some steps is more than 7 hours.
The reason is a non-monotonic timer: a subsequent timestamp call can return a value less than the previous one (time from the past).
More details about this problem, and a tool that allows checking for it, are described here: http://www.doerzbach.com/index.php?entry=entry120113-124240
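The linked tool checks the timer at the OS level. Purely as an illustration of the same idea, here is a minimal PL/SQL sketch; it relies on SYSTIMESTAMP, which is far coarser than the timer Oracle uses internally, so treat it only as a rough check:

set serveroutput on

declare
  t_prev    timestamp(6) := systimestamp;
  t_cur     timestamp(6);
  backwards pls_integer  := 0;
begin
  for i in 1 .. 1000000 loop
    t_cur := systimestamp;
    if t_cur < t_prev then
      backwards := backwards + 1;  -- the timer went back in time
    end if;
    t_prev := t_cur;
  end loop;
  dbms_output.put_line('timestamps going backwards: ' || backwards);
end;
/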
P.S.
By the way, Real-Time SQL Monitoring works in a completely different way… but that is another story. I am going to explain it in one of the next posts.
P.P.S.
That seems to be all. I hope I didn’t forget anything 😉
Any additions, corrections, questions are welcome!
Hi Alexander,
a great post, as usual. Regarding inconsistent timings: when we see this, can we trust any part of the plan rowsource stats? For example, in this plan with a 7 hour hash join, can we conclude that the statement took 7h15m + 3h28m = 10h43m? Or that it took at least 7h15m? Or should we just discard the timing information altogether?
It’s also annoying the way Oracle fails to properly include rowsource stats for scalar subqueries into those of their parents — in this case it’s even worse because in addition to timings, it also affects LIO. Is there any remedy for that?
Hi Nikolay,
There are two reasons for inconsistent timing in rowsource statistics:
1) the first one is “by design”, due to the mechanics of _rowsource_statistics_sampfreq;
2) the second one is due to errors in the timer’s behavior.
The first reason:
When _rowsource_statistics_sampfreq is greater than 1 – let’s take the default value 128 – Oracle measures the time for one row out of every 128 and uses that measured time for the remaining 127 rows. At this point some error appears, because the time spent on the measured row and the time spent on the remaining 127 rows can be different; at the very least, the measured row involves additional function calls and system calls to get the timestamps, while the remaining 127 rows do not. Roughly speaking, for those 128 rows the reported time becomes about 128 times the duration of the sampled row, rather than the true sum over all 128 rows.
So, when _rowsource_statistics_sampfreq > 1, the reported time is adjusted, approximated.
Let’s take a look at the example where I ran a query with different values of _rowsource_statistics_sampfreq.
This is the SQL*Plus output from the test with the Index Full Scan execution plan on 11g; below is an excerpt from this output:
Here we can see:
– in every case except the last one (_rowsource_statistics_sampfreq = 1), the reported time of the child rowsources (steps 2 and 3) is greater than that of the parent rowsource (step 1 of the execution plan, SORT AGGREGATE);
– the top rowsource of the execution plan, SORT AGGREGATE, always reports the right time (compare it with the elapsed time of the query), because only a few calculations happen there – only two timestamps are needed for the top rowsource.
That is why the reported time for the top rowsource is usually real and true.
Pay attention to an interesting detail here: the reported time of the steps that are wrong when _rowsource_statistics_sampfreq > 1 is close or equal to the time of the same steps when _rowsource_statistics_sampfreq = 1.
Compare the time of steps 2 and 3 in all the execution plans in this excerpt: it stays roughly the same. In all plans except the last one we see the time which we would get if we set _rowsource_statistics_sampfreq = 1.
But this rule does not always hold. In some environments another measurement error appears, and the times of steps 2 and 3 from our example become more accurate only as _rowsource_statistics_sampfreq gets smaller.
The second reason:
this crazy timing:
happens due to issues with the timer in a virtualized environment.
Pay attention that this timing was obtained with _rowsource_statistics_sampfreq = 1.
The problem in this case is that the “next” timestamp can be (not every timestamp, but some of them) less than the “previous” one. It is easy to check with a simple piece of code.
In my case it reported
As I suppose, when Oracle subtracts the previous timestamp from the next one (and the next is less than the previous) using an unsigned variable, it gets huge numbers.
Obviously we cannot trust the reported time in this case, except at the top level (step 1, SORT AGGREGATE), because only a few timestamps are taken at the top level, so the probability of getting a wrong timing there is very small.
Hi Alexander,
thank you for the great explanation. Jonathan Lewis mentioned the role of the timer function some years ago: http://jonathanlewis.wordpress.com/2007/11/25/gather_plan_statistics/. But he did not say much about the details.
Regards
Martin