Introduction To 600/7445 Internal Error Analysis (Doc ID 390293.1)
Introduction To 600/7445 Internal Error Analysis (Doc ID 390293.1)
1
                                                     PowerView is Off                                                           Consultor IT    (0)   Contact Us   Help
     Dashboard      Knowledge     Service Requests   Patches & Updates                                  Search Knowledge Base                                  Advanced
 Dashboard >
In this Document
    Purpose
    Troubleshooting Steps
    References
APPLIES TO:
   Oracle Database - Enterprise Edition - Version 8.1.7.4 to 11.2.0.3 [Release 8.1.7 to 11.2]
   Information in this document applies to any platform.
   ***Checked for relevance on 29-Apr-2013***
PURPOSE
   The purpose of this troubleshooting article is to provide an insight into key areas of internal
   600/7445 trace files to assist in deriving to a known bug or to highlight what might be needed
   by ORACLE Support to progress an issue further. Whilst the article can be used to some
   extent on 11g it is primarily written for versions <V10204 due to :-
   a) The worked example is based on a bug fixed in V10204 and higher releases
   b) The alert/trace file structure in 11g has changed
   The article will make reference to other notes so that the traces for 11g can also
   be proactively reviewed if wished.
   The nature of Internal errors means no one approach can guarantee getting to a solution but again
   the intention is to provide an indication on notes that support make available to customers and
   a simple workflow process that assists for a vast majority of SRs. A simple example is illustrated
   to generate a known bug that is harmless to the database when generated, that said it
   should ONLY be used on a TEST database as it's not encouraged to willingly raise such errors in
   'LIVE' environments. This example will then be used to highlight various sections and how each
   section can be useful for general 600/7445 analysis and again to provide the best chance of
   identifying a known bugfix should one be available.
   The article also makes links to a number of others for completeness including Note:232963.1
   should a testcase be required as is the ideal case for all internal errors but understandably
   is not always possible dependent on the problem area. Tests were made on SUN but the
   bug is not platform specific, it may however show slightly different messages within the alert
   log and for Windows based platforms some tools/notes might not be applicable. However this
   does not distract from the primary aim of how a trace file can be analyzed.
An Internal Error whether ORA-00600 or ORA-07445 can fall into many categories :-
   These are a simplified subset, depending on root cause there can be many 600/7445 errors.
   Typically the argument(s) of the error when unique should mean each are different problems
   but if occurring on same timestamp or one always soon follows another there might be some
   relationship and this will be for support to determine.
TROUBLESHOOTING STEPS
   Worked Example
   ------------------
   A real life example will follow and a working methodology will then be provided. Conclusions
   will then be made at the end of the analysis and some standard questions will be commented
   on that should always be appropriate to analysis of Internal errors and perhaps any SR raised
   into support. The bug number identified will be reported in a later section so as to show the
   workflow/analysis methodology used without knowing the solution in advance. This article
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473                                1/9
22/08/13                                                                         Document 390293.1
   workflow/analysis methodology used without knowing the solution in advance. This article
   is only suitable for <V10204 databases in terms of the testcase as the bug it relates to
   if fixed in the V10204PSR and higher RDBMS versions. The article is still appropriate as an
   introduction to analysis for 11g RDBMS and we will mention the main differences for when an internal error is seen in 11g. Due to the nature of Internal errors it is
   possible that the error
   is reported differently between versions/PSRs/platforms making analysis more complex. The
   example and trace results for this article came from V10201/64bitSUN but the same principles
   apply to all.
   sqlplus scott/tiger
   drop table a;
   drop table b;
   The UPDATE will fail at session level with ORA-03113 error. A trace file will also be written
   to the UDUMP destination of the database and will now be highlighted, where appropriate
   if a section can be found using keyword searches in the trace it will be mentioned.
   Before any analysis of UDUMP/BDUMP traces take place there should be an understanding of
   how the error(s) are reported in the alert log.
   ALERT LOG
   ---------------
   As this is a worked example we know what to expect in advance but the alert and an understanding
   of what actions take place around an internal error can be fundamental to resolution. It is
   certainly possible for internal errors to be a consequence of a problem rather than the cause.
   The errors continue in the alert and as we can see >1 error is raised but all are reported
   to be within the same tracefile.
     There will be cases where >1 error and >1 trace exist but this is
   beyond the scope of this 'Introduction'.
   Various sections of the trace are now reported based on this example, not all 600/7445 traces
   will allow for each section to be reviewed. Again this is a very simple example and often a
   combination of traces will need to be understood which is beyond the scope of this article.
        xxxx/xxxx_ora_24779.trc
        Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
        With the Partitioning, OLAP and Data Mining options
        ORACLE_HOME = xxxxxx
        System name: SunOS
        Node name: xxxxxx
        Release: 5.8
        Version: Generic_117350-38
        Machine: sun4u
        Instance name: xxxxx
        Redo thread mounted by this instance: 1
        Oracle process number: 15
        Unix process pid: 24779, image: oracle@xxxxx (TNS V1-V3)
        This section gives general information on the machine/instance and version of RDBMS where the
        internal error has been seen, whilst important its certainly does not really contain much to narrow
        bug searches as this info is customer/machine specific where as a bug itself should really hold
        information generic to all customers.
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473                             2/9
22/08/13                                                               Document 390293.1
       This article is not written to show the differences between ORA-00600/ORA-07445 errors in any
       detail.
       Not all internal issues will show a current SQL statement and there is no simple reason why
       this is the case. If this does happen it is recommended to try and use all other sections to
       narrow the search space down. When present in an internal (600/7445) trace file it should
       always be locatable using a search in the trace file of 'Current SQL' and should be the first
       hit found.
       In addition there maybe a 'PLSQL Call Stack' that pinpoints the schema.object and line number
       for an internal issue.
       A 600/7445 trace should always have a 'Call stack Trace' whether it contains function names or not,
       after this follows 'PROCESS STATE' information and in this we can search on keyword 'session'.
       The first hit should take us to the section of the trace file that shows where the error came from
       and the session/user affected (above they have been blanked to xxxxxx). In many cases we will see
       user, terminal and machine information that can often be useful for any issue that needs to be
       reproduced for further investigation. On many occasions an internal error will be reported in alert
       logs and the DBA will not necessarily have been informed by any users of the database/application
       that they encountered a problem.
       This section can often also show 3rd party sessions e.g. TOAD/PLSQL or even where the client is
       another ORACLE product e.g. FORMS/APPS and even if just a client process e.g. EXP/RMAN. This can be
       another powerful method of narrowing a problem down, in the case of a problem coming from 3rd party
       software it is important to determine wherever possible if an issue can be reproduced in SQLPLUS.
       ============
       Plan Table
       ============
       ---------------------------------------+-----------------------------------+
       | Id | Operation | Name | Rows | Bytes | Cost | Time |
       ---------------------------------------+-----------------------------------+
       | 0 | UPDATE STATEMENT | | | | 3 | |
       | 1 | UPDATE | A | | | | |
       | 2 | TABLE ACCESS FULL | A | 2 | 52 | 3 | 00:00:01 |
       | 3 | SORT AGGREGATE | | 1 | 26 | | |
       | 4 | TABLE ACCESS FULL | B | 1 | 26 | 3 | 00:00:01 |
       ---------------------------------------+-----------------------------------+
       This section can be found in the 600/7445 trace (when present) using the case
       sensitive search 'Explain plan' and will be the first hit unless the SQL for
       example contained the same string.
       In this section the 'Plan Table' for the failing SQL will be reported and
       especially for CBO related internal errors might give a pointer to finding
       a workaround. For this worked example nothing really can be used but we might
       for example see the plan using a 'hash join' and if so a possible w/a to
       try would be hash_join_enabled=false. Caution should be used here, permanent
       workarounds should only be set in place if matching to a known bug and the
       implications are understood.
       This can be found using the case sensitive search in the trace file for 'Current cursor:' [do not
       include the quotes], this section will either tie up with the current SQL statement reported in
       the 600/7445 or when there is no SQL statement reported it 'might' assist in further analysis.
       Once the current cursor is known the information on this can be found using the search 'Cursor#X'
       where X is the value shown e.g in this case 4.
       If a SQL statement is using BIND variables it is the cursor information that often proves useful
       at isolating the values used in the SQL which again can be the key to reproducing a problem if
       data specific.
       Now that the sections have been explained we need to ensure we use as systematic a procedure as
       possible to determine if a known issue/bug exists. It should be clear that a 600/7445 trace contains
       far more information but if the above sections together with what follows cannot conclude to a known
       issue it is recommended that a SR is raised to support.
       OERI notes
       ----------
       An 'OERI' article is a document provided where possible to customers that provides a brief meaning
       of the error/impact and known bugs relating to the 600/7445. The quickest way to find this
       information from MOS is to search as :
So if the internal error was 600[12333] the search is:600 12333 oeri
So for this article and worked example the search would be : 7445 memcpy oeri
       Equally the 600/7445 tool as per Note:153788.1 can be used and if the trace file is submitted into
       this tool an OERI note will be reported where published. If the testcase is attempted on a WINDOWS
       platform the error reported in the alert and trace may not be the same and the tool might not return
       any hits. Finding a suitable example for all platforms is beyond the scope of this document.
       So for the latter search we should find Note:310531.1, it should be noted that not all bugs
       associated with an error will be tagged but ideally they would be.
       It should also be clear from the notes seen in this type of search that ORA-600[x]/7445[y] does not
       mean only one bug can attribute to the Internal error.
       BUG.8 notes
       -----------
       If the OERI or if a search gets a hit to a bug there maybe a link to a note where the MOS Noteid is
       <BUGNO>.8
       For this worked example Note:5162852.8 exists, also to illustrate their purpose we can take for
       example Note:4451759.8
These articles are automatically generated when a bug becomes fixed and will typically include :-
       --Product affected
       --Range of releases the bug/error is believe appropriate to
       --Known affected version(s)
       --Fixed version
       Together with the 'symptoms' of the bug and where possible workarounds. This information is also in
       the bug itself but the note provides this into a much clearer form.
       When a PSR is released ORACLE ensures there is an article that summarises all the bugfixes made into
       that PSR, this articles allows a quick search to customers to see if there is an ORA-600/ORA-7445
       error of the same type in the PSR. As mentioned earlier in the OERI section just because there is a
       fix for the same type, it does not guarantee that the problem is the same.
       For example if one of these PSR notes mentions ORA-00600[729] it means that the fix relates to one
       specifc way this error can be encountered, it does not stop 600[729] occurring for other reasons and
       even with the latest PSR applied to a given RDBMS release the internal error could occur due to a
       new unknown bug.
These documents can be found in MOS using a search with keywords like :-
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473   5/9
22/08/13                                                                          Document 390293.1
        <version> patch set bug fixes problem type
        So if we are looking to see what fixes are made into the V92070 PSR we would use
        the six keywords : 9.2.0.7 patch set bug fixes problem type
        MOS will produce many hits but then the internet browsers search function can be used to search on
        the string '9.2.0.7' to quickly isolate the note which in this case is Note:308894.1
        It is recommended to search always on the highest PSR available for the version you are running on
        since the note will then also contain links to the previous PSRs.
        This is the stage of the article where we need to use the research gathered from trace files and
        known MOS notes to see if we can use the MOS knowledge base to determine if the internal error is a
        known bug. It was commented that Section4 of a trace file can be very powerful here but again it is
        impossible to provide one unique way to search the knowledge bases. Instead we will use the worked
        example and will give some recommendations for 'bad' and 'good' search practices.
        a) Using first few calls of stack trace e.g ksedmp ssexhd sigacthandler memcpy
        b) Use of MOS 'Advanced Search' and using the 'using any of the words' option
        The reason this is a poor search method is that it is likely too many hits will be returned
        especially if keywords like 600/7445 are used. The default search functionality is an AND base
        search where all keywords must be present and this provides for better searches and fewer unrelated
        hits.
        c) Using all of the stack trace. As explained above a default MOS search will find hits that contain
        all the keywords and using the whole stack allows a real risk of getting zero hits returned.
        It should be noted that more than one search might actually be needed to get a feeling for what
        bugs/notes are relevant to the customers issue. The scope of what can be searched will depend on if
        a problem is reproducible.
both get very few bug hits and BUG:5162852 is quickly identified.
   The example will not be reported in the same detail but instead we can use
   it to highlight the key differences.
   After the error has been reproduced we need to look at the alert log which
   can be located via :-
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473   6/9
22/08/13                                                                           Document 390293.1
   It is the background and user destinations for the 'trace' directory that we will
   need to look.
   For more details on how critical/internal errors are reported within 11g Note.443536.1
   can be used. For further analysis of the ORA-00600 we are interested in the 'Incident Trace'
   and the full path to this file is still provided by the alert log. Such traces can also
   be located via 'ADRCI' but is beyond the scope of this article.
   If we look at the appropriate incident trace file we will see very similar traces to
   the 10.2 example and the same analytical approach to these traces can be followed.
   To provide 11g files to support it is vital to provide the appropriate incident traces
   and Note.443529.1/Note.411.1 can be reviewed but again is outside of the scope
   of how to try and diagnose the internal error itself.
   If functionality that was working has just stopped it would tend to suggest either some
   issue of data within one or more objects or some more significant change. If a 600/7445
   is seen within Development particularly on higher PSRs/releases it could well be that
   you have found a new bug. Again depending on reproducibility a testcase ideally would
   be required.
Has any new change been made recently e.g new code/new PSR or DB version/higher DB load?
   Rarely do 600/7445 errors just start to occur due to a database being open for a certain
   amount of time. Even a small change of init/spfile parameters can influence how the database operates
   so if any change is known it is better to mention asap no matter how small.
   On many occasions a 600/7445 will be seen as a oneoff and not to be seen again. If pre-analysis
   cannot be matched to a known issue and if in any doubt about an error a SR should always
   be raised. It is however necessary to be aware that if the error is a oneoff or sporadic there is a
   tradeoff between impact of error and further analysis, as covered below.
   On other occasions we might see the error on the hour or some consistent period of time, in
   these cases root cause might be some job that runs or linked to an ORACLE process that
   gets invoked at certain times. Finding some pattern can sometimes be vital to solving an issue
   and knowing when an issue truly started is very important to analysis.
   Of course any 600/7445 [or any issue for all product support] should be raised as a SR to
   ORACLE if there is any concern. Sometimes an Internal errors impact is very clear in that
   we might see an error in a controlled manner only when a session logsoff the database or the
   database closes down. It is equally possible that a database may seem to be running stable
   with no users reporting problems and only by proactively looking at the alert can errors be seen.
   In all cases if a SR is raised to Support we will need to try and assess the impact of a given
   error and to make a confirmation asap if linked to something serious within the database eg corruption.
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473   7/9
22/08/13                                                                               Document 390293.1
   In many cases of analysis there soon becomes a tradeoff between seeing the error in the alert
   and the further diagnostics needed to try and get to the root cause. For this reason if an error
   is a oneoff or very infrequent and determined not to visibly affect users a decision may need to
   be made on how far to go especially if running on a desupported release or not on the latest PSR.
The above questions relate to Support asking this information from the customer, one of the key questions asked by customers to Support is :-
'Why am I being asked to go to the latest PSR when a bug has not been identified?'
   a) Note:209768.1 explains the backport policy for new bugs. So if Support could not match analysis to a known issue, a new bug can be filed. Sometimes
   Development can release 'diagnostics patches' to better analyze new issue but it's recommended to use it on latest patchset. Also if on older PSRs it is far more
   likely for the customer to be running with >1 'oneoff' fix placing the customer into a configuration that is probably not seen with many other customers.
   b) Perhaps more importantly each PSR contains many fixes and can solve many issues without
   Support being able to determine why. For this reason the latest PSR is seen as a statistically faster
   option to solve a problem when an error cannot be matched to a known bug.
Of course if considering the latest PSR appropriate backups and testing need to be made in advance.
   In conclusion 600/7445 errors are not hopefully a common event to be seen by DBA's and
   analysis of these errors is not something that would be expected by themselves. However it is often of
   interest to many customers as to what is looked at within support and how we make a formal analysis.
   Therefore the articles purpose is to highlight some of these processes with the hope that some
   self analysis can be made and to assist customer/support interaction on these issues.
REFERENCES
   BUG:5162852 - UPDATE USING AVG FUNCTION FAILS WITH ORA-07445[KGIDMP]/[KXSSQL] AND [SUBSR1]
   BUG:8 -
      Related
      Products
               Oracle Database Products > Oracle Database > Oracle Database > Oracle Database - Enterprise Edition > RDBMS > ORA-600 by dbase unrelated to another product or component
      Keywords
           DYNAMIC ADV DIAGNOSTIC TOOLS; TROUBLESHOOT
     Back to Top
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473                                                8/9
22/08/13                                                                                             Document 390293.1
      Back to Top
  Copy right (c) 2013, Oracle. A ll rights reserv ed.   Legal Notices and Terms of Use   Priv acy Statement
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=999858894285166&id=390293.1&_afrWindowMode=0&_adf.ctrl-state=csbhabf62_473 9/9