What You Can Do When Your Database Runs out of Temp Space
When a sort operation is too large to fit in memory, Oracle allocates space in a temporary tablespace in order to write data off to disk. Temporary space is a resource shared by multiple sessions on the database, and quotas cannot be set to limit how much temporary space can be used by any individual user or session. If a user submits a query with an incomplete WHERE clause, an enormous Cartesian product may result. That is all it takes to fill the temporary tablespace and possibly impact many other users on the system. When the temporary tablespace fills, any statement that requires additional temporary space will fail with an “ORA-1652: unable to extend temp segment” error.
In this paper, we will first briefly review how Oracle manages sorting operations. Next we’ll discuss how a database administrator can determine if any statements on the database have failed because the temporary tablespace ran out of space. We’ll also present two techniques a DBA can use to understand how space in the temporary tablespace is being used and how users are being impacted by a full temporary tablespace. The first technique directs Oracle to log every statement that fails for lack of temporary space. The second technique is a set of queries a DBA can run at any time to see how temporary space is currently being used on a per-session or per-statement basis. These techniques can help DBAs address chronic or intermittent shortages of temporary space.
Oracle Sorting Basics
Many different circumstances can cause Oracle to sort data. For example, Oracle sorts data when creating an index and when processing most queries that include an ORDER BY or GROUP BY clause. Oracle sessions begin sorting data in memory. If the amount of data being sorted is small enough, the entire sort will be completed in memory with no intermediate data written to disk. When Oracle needs to store data in a global temporary table or build a hash table for a hash join, Oracle also starts the operation in memory and completes the task without writing to disk if the amount of data involved is small enough. While populating a global temporary table or building a hash is not a sorting operation, we will lump all of these activities together in this paper because they are handled in a similar way by Oracle.
If an operation uses up a threshold amount of memory, then Oracle breaks the operation into smaller ones that can each be performed in memory. Partial results are written to disk in a temporary tablespace. The threshold for how much memory may be used by any one session is controlled by instance parameters. If the workarea_size_policy parameter is set to AUTO, then the pga_aggregate_target parameter indicates how much memory can be used collectively by all sessions for activities such as sorting and hashing. Oracle will automatically assess and decide how much of this memory any individual session should be allowed to use. If the workarea_size_policy parameter is set to MANUAL, then instance parameters such as sort_area_size, hash_area_size, and bitmap_merge_area_size dictate how much memory each session can use for these operations.
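As a quick illustration of the parameters just described, the following sketch lists their current values for the instance. It assumes the session has access to the v$parameter view; the parameter names are the ones mentioned above.

```sql
-- Show the instance parameters that govern work area memory.
-- With workarea_size_policy = AUTO, pga_aggregate_target applies;
-- with MANUAL, the individual *_area_size parameters apply.
SELECT   name, value
FROM     v$parameter
WHERE    name IN ('workarea_size_policy',
                  'pga_aggregate_target',
                  'sort_area_size',
                  'hash_area_size',
                  'bitmap_merge_area_size')
ORDER BY name;
```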
Each database user has a temporary tablespace (or temporary tablespace group in Oracle 10g) designated in their user definition. Whenever a sort operation grows too large to be performed entirely in memory, Oracle will allocate space in the temporary tablespace designated for the user performing the operation. You can see a user’s temporary tablespace designation by querying the dba_users view.
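A minimal sketch of such a query follows. It assumes the session can read dba_users; add a WHERE clause on username to check a single account.

```sql
-- List each database user with its designated temporary tablespace
-- (or temporary tablespace group).
SELECT   username, temporary_tablespace
FROM     dba_users
ORDER BY username;
```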
Temporary segments in temporary tablespaces—which we will call “sort segments”—are owned by the SYS user, not the database user performing a sort operation. There typically is just one sort segment per temporary tablespace, because multiple sessions can share space in one sort segment. Users do not need to have quota on the temporary tablespace in order to perform sorts on disk. In fact, quotas on temporary tablespaces are ignored by Oracle.
Temporary tablespaces can only hold sort segments. Oracle’s internal behavior is optimized for this fact. For example, writes to a sort segment do not generate redo or undo. Also, allocations of sort segment blocks to a specific session do not need to be recorded in the data dictionary or a file allocation bitmap. Why? Because data in a temporary tablespace does not need to persist beyond the life of the database session that created it.
One SQL statement can cause multiple sort operations, and one database session can have multiple SQL statements active at the same time—each potentially with multiple sorts to disk. When the results of a sort to disk are no longer needed, its blocks in the sort segment are marked as no longer in use and can be allocated to another sort operation.
A sort operation will fail if a sort to disk needs more disk space and two conditions hold: there are no unused blocks in the sort segment, and there is no space available in the temporary tablespace for the sort segment to allocate an additional extent. This will most likely cause the statement that prompted the sort to fail with the Oracle error, “ORA-1652: unable to extend temp segment.” This error message also gets logged in the alert log for the instance.
It is important to note that not all ORA-1652 errors indicate temporary tablespace issues. For example, moving a table to a different tablespace with the ALTER TABLE…MOVE statement will cause an ORA-1652 error if the target tablespace does not have enough space for the table.
Identifying SQL Statements that Fail Due to Lack of Temporary Space
It is helpful that Oracle logs ORA-1652 errors to the instance alert log as it informs a database administrator that there is a space issue. The error message includes the name of the tablespace in which the lack of space occurred, and a DBA can use this information to determine if the problem is related to sort segments in a temporary tablespace or if there is a different kind of space allocation problem.
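One way to make that determination is to look up the tablespace named in the error message in dba_tablespaces. The sketch below uses TEMP as a placeholder name; substitute the tablespace reported in your own ORA-1652 message.

```sql
-- CONTENTS is TEMPORARY for a temporary tablespace and PERMANENT or UNDO
-- otherwise. 'TEMP' is a placeholder for the name in the error message.
SELECT tablespace_name, contents
FROM   dba_tablespaces
WHERE  tablespace_name = 'TEMP';
```

If the query reports PERMANENT, the error came from a different kind of space allocation problem, such as a failed ALTER TABLE…MOVE, rather than from sort segment usage.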
Unfortunately, Oracle does not identify the text of the SQL statement that failed. Thus we are informed that a problem has occurred, but we are not given tools with which to identify the cause of the problem or to measure the user impact of the statement failure.
However, Oracle does have a diagnostic event mechanism that can be used to give us more information whenever an ORA-1652 error occurs by causing Oracle server processes to write to a trace file. This trace file will contain a wealth of information, including the exact text of the SQL statement that was being processed at the time that the ORA-1652 error occurred. This diagnostic event imposes very little overhead on the system, because Oracle only writes information to the trace file when an ORA-1652 error occurs.
You can set a diagnostic event for the ORA-1652 error in your individual database session with the following statement:
ALTER SESSION SET EVENTS '1652 trace name errorstack';
You can set the diagnostic event instance-wide with the following statement:
ALTER SYSTEM SET EVENTS '1652 trace name errorstack';
The above statement will affect the current instance only and will not edit the server parameter file. That is to say, if you stop and restart the instance, the diagnostic event setting will no longer be active. I don’t recommend setting this diagnostic event on a permanent basis, but if you want to edit your server parameter file, you could use a statement like the following:
ALTER SYSTEM SET EVENT = '1652 trace name errorstack' SCOPE = SPFILE;
You can also set diagnostic events in another session (without affecting all sessions instance-wide) by using the “oradebug event” command in SQL*Plus.
You can deactivate the ORA-1652 diagnostic event or remove all diagnostic event settings from the server parameter file with statements such as the following:
ALTER SESSION SET EVENTS '1652 trace name context off';
ALTER SYSTEM SET EVENTS '1652 trace name context off';
ALTER SYSTEM RESET EVENT SCOPE = SPFILE SID = '*';
If a SQL statement fails due to lack of space in the temporary tablespace and the ORA-1652 diagnostic event has been activated, then the Oracle server process that encountered the error will write a trace file to the directory specified by the user_dump_dest instance parameter. The entry in the instance alert log that indicates an ORA-1652 error occurred will also indicate that a trace file was written. An entry in the instance alert log will look like this:
Tue Jan 2 17:21:14 2007
Errors in file /u01/app/oracle/admin/rpkprod/udump/rpkprod_ora_10847.trc:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
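If you are unsure where the instance writes these trace files, a query along the following lines shows the current user_dump_dest setting. This is a small sketch that assumes access to v$parameter.

```sql
-- Directory where server processes write trace files such as the one
-- named in the alert log entry.
SELECT value
FROM   v$parameter
WHERE  name = 'user_dump_dest';
```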
The top portion of a sample trace file is as follows:
Oracle Database 10g Release 10.2.0.2.0 - 64bit Production
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_2
System name: SunOS
Node name: rpk
Release: 5.8
Version: Generic_108528-27
Machine: sun4u
Instance name: rpkprod
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 10847, image: oracle@rpk (TNS V1-V3)
*** ACTION NAME:() 2007-01-02 17:21:14.871
*** MODULE NAME:(SQL*Plus) 2007-01-02 17:21:14.871
*** SERVICE NAME:(SYS$USERS) 2007-01-02 17:21:14.871
*** SESSION ID:(130.13512) 2007-01-02 17:21:14.871
*** 2007-01-02 17:21:14.871
ksedmp: internal or fatal error
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
Current SQL statement for this session:
SELECT "A1"."INVOICE_ID", "A1"."INVOICE_NUMBER", "A1"."INVOICE_DAT
E", "A1"."CUSTOMER_ID", "A1"."CUSTOMER_NAME", "A1"."INVOICE_AMOUNT",
"A1"."PAYMENT_TERMS", "A1"."OPEN_STATUS", "A1"."GL_DATE", "A1"."ITE
M_COUNT", "A1"."PAYMENTS_TOTAL" FROM "INVOICE_SUMMARY_VIEW" "A1" ORDER
BY "A1"."CUSTOMER_NAME", "A1"."INVOICE_NUMBER"
----- Call Stack Trace -----
From the trace file you can clearly see the full text of the SQL statement that failed. You can also see when it failed along with attributes of the database session such as module, action, and service name. It is important to note that the statements captured in trace files with this method may not themselves be the cause of space issues in the temporary tablespace. For example, one query could run successfully and consume 99.9% of the temporary tablespace due to a Cartesian product, while a second query fails when trying to allocate just a small amount of sort space. The second query is the one that will get captured in a trace file, while the first query is more likely to be the root cause of the problem.
The trace file will contain additional information, including a call stack trace and a binary stack dump. This information is not likely to be useful, unless perhaps you want to learn more about Oracle internals.
The diagnostic event facility has been built into the Oracle database product for a very long time, but it is not widely documented. Oracle Support’s position appears to be that you should not use this facility unless directed to do so by Oracle Support. There are certain widely-known diagnostic events such as 10046 for extended SQL trace and 10053 for tracing the cost-based optimizer, and there are certain events that can alter Oracle’s behavior significantly. In general, you absolutely should not try setting diagnostic events in a production database unless you have a very good idea of what they do.
Although I am not aware of an Oracle Support document that officially blesses setting diagnostic event 1652 for identifying SQL statements that fail due to lack of sort space, there are bulletins on Metalink that do show how to set events to dump an error stack for basic Oracle errors. Metalink document 217274.1, for example, shows how to set a diagnostic event for the ORA-942 (“table or view does not exist”) error. We are doing the exact same thing here for the ORA-1652 error, and therefore it seems like a relatively safe thing to do.
As with most debugging or diagnostic facilities, you should use the ORA-1652 diagnostic event only to the extent you really need to. If you regularly get ORA-1652 errors in one batch job and you can add an ALTER SESSION statement to the beginning of the batch job, then doing so would be preferable to setting the diagnostic event at the instance level. Typically there shouldn’t be a need to set this diagnostic event at the instance level on a permanent basis or in the server parameter file.
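A session-scoped setup for a batch job might look like the sketch below. The two ALTER SESSION statements are the ones shown earlier in this paper; the comment in the middle stands in for the job's own statements, which are hypothetical here.

```sql
-- Turn on the ORA-1652 error stack dump for this session only.
ALTER SESSION SET EVENTS '1652 trace name errorstack';

-- ... the batch job's own SQL statements run here (placeholder) ...

-- Turn the event off again once the job's work is done.
ALTER SESSION SET EVENTS '1652 trace name context off';
```

Because the event is set at the session level, it ends with the session even if the final statement is never reached.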
Monitoring Temporary Space Usage
Instead of waiting for a temporary tablespace to fill and for statements to fail, you can monitor temporary space usage in the database in real time. At any given time, Oracle can tell you about all of the database’s temporary tablespaces, sort space usage on a session basis, and sort space usage on a statement basis. All of this information is available from v$ views, and the queries shown in this section can be run by any database user with DBA privileges.
The following query displays information about all sort segments in the database. (As a reminder, we use the term “sort segment” to refer to a temporary segment in a temporary tablespace.) Typically, Oracle will create a new sort segment the very first time a sort to disk occurs in a new temporary tablespace. The sort segment will grow as needed, but it will not shrink and will not go away after all sorts to disk are completed. A database with one temporary tablespace will typically have just one sort segment.
SELECT   A.tablespace_name tablespace, D.mb_total,
         SUM (A.used_blocks * D.block_size) / 1024 / 1024 mb_used,
         D.mb_total - SUM (A.used_blocks * D.block_size) / 1024 / 1024 mb_free
FROM     v$sort_segment A,
         (
         SELECT   B.name, C.block_size, SUM (C.bytes) / 1024 / 1024 mb_total
         FROM     v$tablespace B, v$tempfile C
         WHERE    B.ts# = C.ts#
         GROUP BY B.name, C.block_size
         ) D
WHERE    A.tablespace_name = D.name
GROUP BY A.tablespace_name, D.mb_total;
The query displays for each sort segment in the database the tablespace the segment resides in, the size of the tablespace, the amount of space within the sort segment that is currently in use, and the amount of space available. Sample output from this query is as follows:
TABLESPACE                        MB_TOTAL    MB_USED    MB_FREE
------------------------------- ---------- ---------- ----------
TEMP                                 10000          9       9991
This example shows that there is one sort segment in a 10,000 MB tablespace called TEMP. Right now, 9 MB of the sort segment is in use, leaving a total of 9,991 MB available for additional sort operations. (Note that the available space may consist of unused blocks within the sort segment, unallocated extents in the TEMP tablespace, or a combination of the two.)
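Because a sort segment grows as needed but does not shrink, it can also be useful to see how large the segment itself has become and how much of it has ever been in use at once. The following sketch reads those block counts directly from v$sort_segment; values are in blocks, not megabytes.

```sql
-- TOTAL_BLOCKS is the current size of the sort segment, USED_BLOCKS and
-- FREE_BLOCKS split that size by current usage, and MAX_USED_BLOCKS is
-- the most blocks ever in use at one time.
SELECT   tablespace_name, current_users, total_blocks,
         used_blocks, free_blocks, max_used_blocks
FROM     v$sort_segment
ORDER BY tablespace_name;
```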
Sort Space Usage by Session
The following query displays information about each database session that is using space in a sort segment. Although one session may have many sort operations active at once, this query summarizes the information by session. This query will need slight modification to run on Oracle 8i databases, since the dba_tablespaces view did not have a block_size column in Oracle 8i.
SELECT   S.sid || ',' || S.serial# sid_serial, S.username, S.osuser, P.spid,
         S.module, S.program,
         SUM (T.blocks) * TBS.block_size / 1024 / 1024 mb_used, T.tablespace,
         COUNT(*) sort_ops
FROM     v$sort_usage T, v$session S, dba_tablespaces TBS, v$process P
WHERE    T.session_addr = S.saddr
AND      S.paddr = P.addr
AND      T.tablespace = TBS.tablespace_name
GROUP BY S.sid, S.serial#, S.username, S.osuser, P.spid, S.module,
         S.program, TBS.block_size, T.tablespace
ORDER BY sid_serial;
The query displays information about each database session that is using space in a sort segment, along with the amount of sort space and the temporary tablespace being used, and the number of sort operations in that session that are using sort space. Sample output from this query is as follows:
SID_SERIAL USERNAME OSUSER SPID MODULE PROGRAM   MB_USED TABLESPACE SORT_OPS
---------- -------- ------ ---- ------ --------- ------- ---------- --------
33,16998   RPK_APP  rpk    3061 inv    httpd@db1       9 TEMP              2
This example shows that there is one database session using sort segment space. Session 33 with serial number 16998 is connected to the database as the RPK_APP user. The connection was initiated by the httpd@db1 process running under the rpk operating system user, and the Oracle server process has operating system process ID 3061. The application has identified itself to the database as module “inv.” The session has two active sort operations that are using a total of 9 MB of sort segment space in the TEMP tablespace.
Sort Space Usage by Statement
The following query displays information about each statement that is using space in a sort segment. This query will need slight modification to run on Oracle 8i databases, since the dba_tablespaces view did not have a block_size column in Oracle 8i.
SELECT   S.sid || ',' || S.serial# sid_serial, S.username,
         T.blocks * TBS.block_size / 1024 / 1024 mb_used, T.tablespace,
         T.sqladdr address, Q.hash_value, Q.sql_text
FROM     v$sort_usage T, v$session S, v$sqlarea Q, dba_tablespaces TBS
WHERE    T.session_addr = S.saddr
AND      T.sqladdr = Q.address (+)
AND      T.tablespace = TBS.tablespace_name
ORDER BY S.sid;
The query displays information about each statement using space in a sort segment, including information about the database session that issued the statement and the temporary tablespace and amount of sort space being used. Sample output from this query is as follows:
SID_SERIAL USERNAME MB_USED TABLESPACE ADDRESS          HASH_VALUE
---------- -------- ------- ---------- ---------------- ----------
SQL_TEXT
--------------------------------------------------------------------------------
33,16998   RPK_APP        8 TEMP       000000038865B058 3641290170
SELECT * FROM NOTIFY_MESSAGES NM WHERE NM.AWAITING_SENDING = 'y' AND NOT EXISTS
( SELECT 1 FROM NOTIFY_MESSAGE_GROUPS NMG WHERE NMG.MESSAGE_GROUP_ID = NM.MESSAG
E_GROUP_ID AND NMG.INCOMPLETE = 'y' ) ORDER BY NM.NOTIFY_MESSAGE_ID

33,16998   RPK_APP        1 TEMP       00000003839FFE20 1874671316
select * from rpk_stat where sample_group_id = :b1 order by stat#, seq#
This example shows that session 33 with serial number 16998, connected to the database as the RPK_APP user, has two statements currently using sort segment space in the TEMP tablespace. One statement is currently using 8 MB of sort segment space, while the other is using 1 MB. The text of each statement is also displayed, along with its hash value and address in the shared SQL area.
Conclusion
When an operation such as a sort, hash, or global temporary table instantiation is too large to fit in memory, Oracle allocates space in a temporary tablespace for intermediate data to be written to disk. Temporary tablespaces are a shared resource in the database, and you can’t set quotas to limit temporary space used by one session or database user. If a sort operation runs out of space, the statement initiating the sort will fail. It may only take one query missing part of its WHERE clause to fill an entire temporary tablespace and cause many users to encounter failure because the temporary tablespace is full.
It is easy to detect when failures have occurred in the database due to a lack of temporary space. With the setting of a simple diagnostic event, it is also easy to see the exact text of each statement that fails for this reason. There are also v$ views that DBAs can query at any time to monitor temporary tablespace usage in real time. These views make it possible to identify usage at the database, session, and even statement level.
Oracle DBAs can use the techniques outlined in this paper to diagnose temporary tablespace problems and monitor sorting activity in a proactive way. These tactics can be helpful for addressing both chronic and intermittent shortages of temporary space.