I was sent a CREATE INDEX script by a third party today – it was non-clustered. Whilst reviewing the index definition I noticed that the Clustered Index key had been added as an include column.
This started me thinking – given that all non-clustered index pages must contain a pointer to the clustered index key anyway, what does the act of including the clustered index key actually do?
Let’s set up some data. We can then directly compare the data pages for the two indexes and check what difference including the id column makes! For now I’ll simply add the id column as a key column in the non-clustered index.
CREATE TABLE t1 (id INT IDENTITY (1,1) PRIMARY KEY CLUSTERED, txt VARCHAR(100)) GO CREATE NONCLUSTERED INDEX t1Index ON t1 (txt); GO CREATE TABLE t2 (id INT IDENTITY (1,1) PRIMARY KEY CLUSTERED, txt VARCHAR(100)) GO CREATE NONCLUSTERED INDEX t2Index ON t2 (txt,id); GO INSERT INTO t1 SELECT 'TestText' FROM sys.[objects] o1 CROSS JOIN sys.[objects] o2; INSERT INTO t2 SELECT 'TestText' FROM sys.[objects] o1 CROSS JOIN sys.[objects] o2;
On my system with a fresh database the above code inserted 9025 rows into each of the above tables.
We can use DBCC IND to interrogate what pages have been created
DBCC IND ([TestDb],t1,-1)
The above produces a result set similar to the below
What the data above means is out of the scope of this article, but note the different page types (10 = IAM, 1 = Data Page, 2 = Index Page) and the different Index levels (0 at the leaf level, 1 at the root level in my case).
We can use the above data along with DBCC PAGE to check the contents of our row values. Trace Flag 3604 must be enabled for DBCC to return results.
Firstly we will examine an index page for the indexes we created above (IndexId 2) for the leaf level pages (IndexLevel 0)
DBCC PAGE ([TestDb],1,630184,3) WITH TABLERESULTS
The above dumps a whole bunch of information, but the stuff we want is the index data itself under each slot.
The above data was taken from the t1 table – the index without the id column included in the index.
Reading the data in the above format from the record itself (the right hand column) we can see the first two values are 36 – this is the page status byte that tells us the status of the page – after this our fixed data starts. In our case the value below represents our Clustered Index key value stored in Little-endian (We need to read each byte left-right, but the whole string right-left). The value 00000b82 represents our clustered index value stored in hexadecimal – the value itself being 2946 for this example.
So we can see that regardless of the Clustered Index not being included in the non-clustered index definition itself, the Clustering key is still stored on the data page.
Comparing this to a page from t2, we can see that there is no difference in what is being stored on the page, regardless of the difference in the definition of the indexes.
What about included columns? Does the Id column get added to intermediary nodes on the b-tree?
Replicating all of the above, but changing the t1 index definition to the below gives us our answer
CREATE NONCLUSTERED INDEX t2Index ON t2 (txt) INCLUDE (id);
Dumping the pages for IndexLevel 1 we get the following results for t1 and t2 respectively
Again we can see that the first byte contains our page status, and the 4 following bytes refer to the Clustered Index Key, regardless of whether id is included in the index definition or not.
Rather confusingly, you can both add the clustered index key to a non-clustered index definition and script out that index definition with the clustered index key included, but the clustered index is always included regardless. This means that including the clustered index key columns in a non-clustered index makes no difference whatsoever to the underlying data that is actually stored.