Questions to ask apps user during Performance Issue

Questions to ask application user for better understanding of the Performance Issue.

There are is no step by step process to fix the performance issues. But if we have some idea on the current issues & environments and previous executions, we can try to support and solve the issues.

Below are the questions for the application  user who raises Database slowness issue.

It is always good practice sending these questions via email or asking them in the call for better understanding of the Performance Issue.

questions

  1. Is the performance  issue related to overall server slowness or a particular module/jobs/query?
  2. What is the normal run time of the query ?
  3. Has the query run before successfully? If so, how often does it run?
  4. Can sql_id of this query be provided from your end ?
  5. Was the query performance acceptable in the past?
  6. Are there any metrics on how long the query has run when successful?
  7. How much data is typically returned from the query?
  8. When was the last time statistics were gathered on the objects referenced in the query?
  9. When has the issue been started or initially monitored ?
  10. Is the problem consistent, or does it only run slow at certain times of the day, or under certain loads ?
  11. Please indicate exact date and/or time of last incident ( When problem with slow performance started and when it finished .)
  12. What has recently been changed ?
  13. Has the load undergone a major increase (such as a new application, or part of an application being implemented) ?
  14. Through which user this query got executed ?

 

In Detail:  Why we need answers for these questions?

  • Is the performance  issue related to overall server slowness or a particular module/jobs/query?

If the apps team is not able to provide more details about a particular module/jobs/query , we can verify overall  DB performance  and if no issue found in DB,  then we can include/involve other teams such as platform team / network team / storage team etc.

  • What is the normal run time of the query ?

Troubleshooting queries on  Oracle Database

If the current  execution is very time consuming than the usual, there are chances of killing the process after taking the approval. Most of the time killing and rerunning/restarting the query/job helps a lot.

Script to get CPU intensive queries in oracle

  • Has the query run before successfully? If so, how often does it run?

Sometimes users run the query very first time with large datasets directly in production from tools like Toad and SQL developer and complaints about slowness . We can suggest them to in the non-prod with same size of datasets for their better understanding the comparison. Here we can stop the unnecessarily logged users in the prod environment and direct them to non-prod.

  • Can sql_id of this query be provided from your end ?

we can analyze the previous executions and try to verify the execution plans. Follow below links

  • Was the query performance acceptable in the past?

If  acceptable ,we can compare the execution based available stats. If not acceptable , we can guide the users to do performance  tests or load test in the non-prod environment to do code review  and modification of the

  • Are there any metrics on how long the query has run when successful?

Steps to generate good plan from nonPROD to PROD

We can compare with current execution and verify the load and increased data in the tables.

  • How much data is typically returned from the query?

Need to know the no.of rows returned by the query or volume of the data by job/module.

  • When was the last time statistics were gathered on the objects referenced in the query?

Stale stats is the most observed cause for 99%  issues. Slowness of jobs/queries starts due to lack of stats . Gathering stats solves the problem immediately.

  • When has the issue been started or initially monitored ?

There will be unattended jobs/queries running many hours, after asking this question to  users, they  request to kill those processes.

  • Is the problem consistent, or does it only run slow at certain times of the day, or under certain loads ?

We can compare the stats/queries with AWRDIFF reports and provide additional queries during the time of issue.

  • Please indicate exact date and/or time of last incident ( When problem with slow performance started and when it finished .)

This is useful for comparing issue with the AWR reports

  • What has recently been changed ?

Most of the issues come because of recent changes, should have complete change tracking in the entire environment

  • Has the load undergone a major increase (such as a new application, or part of an application being implemented) ?

Need to verify weekly capacity reports to show the difference if the load has been increased.

  • Through which user this query got executed ?

With this we can get the idea on gravity of issue and we can  give priority  for the service accounts versus  normal apps users versus read-only users and notify unnecessary users to logoff and kill if they connect. More controlled user environment means less chances of  issues.

 

See also