Performance tuning is one area where most Junior DBAs face those tough-to-break problems. The kind of problems which require deep understanding of the concepts before you can even point what is going wrong in the system. We will be going through the basics knowledge in the Performance Tuning area in series of posts. This is the first post in the series and in this post we will focus on understanding Selectivity and Cardinality.
=> It represents the fraction of rows filtered by an operation so you can say it is a measure of uniqueness.
=> Its value is between 0 and 1
=> If your query returned 100 rows and then you put a filter (example “where” condition) which made final result to only 10 then your selectivity is 0.1 (10/100) or you can also say your selectivity is 10%
=> In another example if your query returned 200 rows and even after putting filter the final result is also 200 then your selectivity is 1 or 100%. This is known as bad Selectivity. It is “BAD” because as such all records are coming.
=> A column is highly selective if a SQL return a small number of duplicate rows. This is known as GOOD Selectivity.
=> A columns is least selective if a SQL returns all or large number of rows.This is known as BAD Selectivity.
When you run SELECT * FROM EMP and do not give any filters then selectivity will be automatically 1 as all rows will be returned.
Adding a composite Index is the best way to make a BAD selectivity become GOOD selectivity. Using more than one column makes the Index more unique which in turns improves Index selectivity.
The number of rows returned by an operation is the cardinality. The relationship between selectivity and cardinality is below:
cardinality = selectivity × number of input rows
So say your query got 200 records from database and by putting some filters etc you made the final number of rows to be 50, so your selectivity becomes 0.25 (50/200) and you cardinality is 50 (200 * 0.25).
Sometimes the Oracle optimizer is not able to predict the number of rows that a given operator will return (because of reasons like missing table statistics). This can prevent Oracle from estimating the cost of a query plan correctly, which can lead to the selection of a suboptimal plan. Cardinality estimation errors can cause slow running queries.
Let us look at some examples now:
1) Say you have a table named EMP which has 10 records. You want to know the biggest EMP_NUMBER out of this table.
SELECT MAX(EMP_NUMBER) from EMP;
selectivity = number of rows accessed/total number of rows = 10/10 = 1 (100% of the rows were accessed)
cardinality = number of rows accessed = 10
2) Now let us say you put the filter in your SQL query on the LAST_NAME column
SELECT MAX(EMP_NUMBER) from EMP where LAST_NAME='SMITH';
Assuming that there are only 4 Employee with last name as ‘SMITH’
selectivity = number of rows accessed/total number of rows = 4/10 = 0.4 (40% of the rows were accessed)
cardinality = number of rows accessed = 4
This is one of the post in Performance Tuning Fundamentals Series. Click on below links to read more posts from the series:
[catlist name="Performance Tuning Series"]
[Post Views: 477]
You can connect with me on LinkedIn.
Latest posts by Brijesh Gogia (see all)
- Cassandra – 4 –Cassandra Architecture terminology - March 30, 2019
- Cassandra – 3 – Related Terms : ACID, BASE,CAP Theorem - March 15, 2019
- Cassandra – 2 – Basics of Cassandra - March 9, 2019