Understanding SQL Server Storage Structures - Chapter 11


    To get the best performance from SQL Server, you need a good understanding of the internal storage structures and of how SQL Server stores and manages objects and data within a database.
  1. SQL Server Storage Structures
SQL Server doesn't see data and storage in exactly the same way a DBA or end user does. A DBA sees initialized devices, device fragments allocated to databases, segments defined within databases, tables defined within segments, and rows stored in tables. SQL Server views storage at a lower level: device fragments allocated to databases, pages allocated to tables and indexes within the database, and information stored on pages (see Figure 11.1, Page 321).

There are two basic types of storage structures in a database: linked data pages and index trees.

All information in SQL Server is stored at the page level. When a database is created, all space allocated to it is divided into a number of pages, each 2KB in size. There are five types of pages within SQL Server:
1. Data and log pages
2. Index pages
3. Text/image pages
4. Allocation pages
5. Distribution pages

All pages in SQL Server contain a page header. The page header is 32 bytes in size and contains the logical page number, the next and previous logical page numbers in the page linkage, the object_id of the object to which the page belongs, the minimum row size, the next available row number within the page, and the byte location of the start of the free space on the page.

The contents of a page header can be examined by using the dbcc page command. You must be logged in as sa to run the dbcc page command. The syntax for the dbcc page command is as follows:

dbcc page (dbid, page_no [, 0 | 1 | 2])

See the example on pages 321-322. SQL Server keeps track of which object a page belongs to, if any. The allocation of pages within SQL Server is managed through the use of allocation units and allocation pages.
  2. Allocation Pages
Space is allocated to a SQL Server database by the create database and alter database commands. The space allocated to a database is divided into a number of 2KB pages. Each page is assigned a logical page number, starting at page 0 and increasing sequentially. The pages are then divided into allocation units of 256 contiguous 2KB pages, or 512KB (1/2 MB) each. The first page of each allocation unit is an allocation page, which controls the allocation of all pages within the allocation unit to tables and indexes within the database.

Pages are allocated in contiguous blocks of eight pages called extents. The minimum unit of allocation within a database is an extent. When a table is created, it is initially assigned a single extent, or 16KB of space, even if the table contains no rows. There are 32 extents within an allocation unit (256/8). An allocation page contains 32 extent structures, one for each extent within that allocation unit. Each extent structure is 16 bytes and contains the following information:
1. Object ID of the object to which the extent is allocated
2. Next extent ID in the chain
3. Previous extent ID in the chain
4. Allocation bitmap
5. Deallocation bitmap
6. Index ID (if any) to which the extent is allocated
7. Status

The allocation bitmap for each extent structure indicates which pages within the allocated extent are in use by the table. The deallocation bitmap is used to identify pages that have become empty during a transaction that has not yet been completed. The actual marking of a page as unused does not occur until the transaction is committed, to prevent another transaction from allocating the page before the first transaction is complete.
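The page-to-allocation-unit and page-to-extent arithmetic described above (256-page allocation units, 8-page extents) can be sketched in a few lines. This is purely an illustration of the layout math; the function name is ours, not part of SQL Server:

```python
# Layout constants from the chapter: 2KB pages, 256-page allocation
# units, 8-page extents.
PAGES_PER_ALLOC_UNIT = 256
PAGES_PER_EXTENT = 8

def locate_page(logical_page_no):
    """Return the allocation page controlling this logical page, and
    the first page of the extent that contains it."""
    alloc_page = (logical_page_no // PAGES_PER_ALLOC_UNIT) * PAGES_PER_ALLOC_UNIT
    extent_start = (logical_page_no // PAGES_PER_EXTENT) * PAGES_PER_EXTENT
    return alloc_page, extent_start

# Page 600 falls in the allocation unit controlled by allocation
# page 512, in the extent beginning at page 600.
print(locate_page(600))  # (512, 600)
```

Because the allocation page is always the first page of its 256-page unit, allocation pages land at logical pages 0, 256, 512, and so on.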
  3. Data Pages
A data page is the basic unit of storage within SQL Server. All the other types of pages within a database are essentially variations of the data page (see page 328). All data pages contain a 32-byte header, as described earlier. With a 2KB page (2048 bytes), this leaves 2016 bytes for storing data within the data page. In SQL Server, data rows cannot cross page boundaries. The maximum size of a single row is 1962 bytes, including row overhead.

Data pages are linked to one another by using the page pointers (prevpg, nextpg) contained in the page header. This page linkage enables SQL Server to locate all rows in a table by scanning all pages in the link. Data page linkage can be thought of as a two-way linked list. This enables SQL Server to easily link new pages into, or unlink pages from, the page linkage by adjusting the page pointers (see page 329).

In addition to the page header, each data page also contains data rows and a row offset table. The row offset table grows backward from the end of the page and contains the location of each row on the data page. Each entry is 2 bytes wide.
  4. Data Rows
Data is stored on data pages in data rows. The size of each data row is the sum of the sizes of its columns plus the row overhead. Each record in a data page is assigned a row number. A single byte is used within each row to store the row number. Therefore, SQL Server has a maximum limit of 256 rows per page (row numbers 0 through 255), because 255 is the largest value that can be stored in a single byte.

For a data row containing all fixed-length columns, there are 4 bytes of overhead per row:
- 1 byte to store the number of variable-length columns (in this case, 0)
- 1 byte to store the row number
- 2 bytes in the row offset table at the end of the page to store the location of the row on the page

If a data row contains variable-length columns, there is additional overhead per row. A data row is variable in size if any column is defined as varchar or varbinary, or allows null values. In addition to the 4 bytes of overhead described previously, the following bytes are required to store the actual row width and the location of columns within the data row:
- 2 bytes to store the total row width
- 1 byte per variable-length column to store the starting location of the column within the row
- 1 byte for the column offset table
- 1 additional byte for each 256-byte boundary passed (the adjust table)

Within each row containing variable-length columns, SQL Server builds a column offset table backward from the end of the row, with one entry for each variable-length column in the row. Because only 1 byte is used for each column offset, with a maximum value of 255, an adjust byte must be created for each 256-byte boundary crossed, as an additional offset. Variable-length columns are always stored after all fixed-length columns, regardless of the order of the columns in the table definition.
Fixed-length Row:

create table t1 (col_a char(10), col_b char(25), col_c int)

[Diagram of a row with all fixed-length fields; byte offsets 0, 1, 2, 12, 37, 41]
- No variable-length fields (byte 0 = 0)
- Row number = 10 (stored in byte 1)
- No row length stored
- Data row is 41 bytes wide

Variable-length Row:

create table t2 (cola char(10), colb varchar(25), colc int, cold varchar(10))

[Diagram of a row with variable-length fields; byte offsets 0, 1, 2, 12, 16, 18, 38, 43, 44, 45, 46, 47]
- Two variable-length fields (stored in byte 0)
- Row number = 12 (stored in byte 1)
- Data row is 47 bytes wide (stored in bytes 16 & 17)
- Adjust byte located at byte 43 (stored in byte 44)
- Two bytes at the end of the record to store the locations of the variable-length columns (cold and colb)
  5. Estimating Row and Table Sizes
Knowing the size of a data row and the corresponding overhead per row helps you determine the number of rows that can be stored per page. The number of rows per page affects system performance. A greater number of rows per page can help query performance by reducing the number of pages that need to be read to satisfy the query. Conversely, fewer rows per page help improve performance for concurrent transactions by reducing the chances of two or more users accessing rows on the same page that may be locked. Let's take a look at how you can estimate row and table sizes.

For rows containing fixed-length fields with no null values:

Sum of column widths + 4 bytes of overhead:
- 1 byte to store the row number
- 1 byte to store the number of variable-length columns
- 2 bytes per row in the row offset table

For example, t1 (fixed-length fields): (10 + 25 + 4) + 4 = 43 bytes per row

For rows containing variable-length fields or nullable columns:

Sum of all fixed column widths
+ 4 bytes of overhead (as above)
+ sum of the average sizes of the variable-length columns
+ 1 byte per variable-length column (column offset entries)
+ 1 byte (for the location of the column offset table within the row)
+ 2 bytes (for the row length)
= subtotal
+ subtotal / 256, rounded up to the next integer (adjust-table bytes)
= average row size

Each row containing fixed-length columns has a minimum of 4 bytes of overhead, plus a minimum of 5 additional bytes of overhead if it contains at least one nullable or variable-length field.

Table t2:
14 (10 + 4 for fixed fields)
+ 4 (for overhead)
+ 18 (13 + 5 for sum of average sizes of variable fields)
+ 2 (1 byte per variable field)
+ 1 (for the column offset table)
+ 2 (for the row length)
= 41 (subtotal)
+ 1 (41/256 rounded up, for adjust-table bytes)
= 42 bytes (average row size)

When you know the average data row size, you can determine the number of rows per page by dividing the row size into the available space on the data page, 2016 bytes. For example, if your average row size is 42 bytes, the average number of rows per page is the following:

2016 / 42 = 48 rows per page (round down any fraction)

When you know the average number of rows per page, you can calculate the number of pages required to store the data by dividing the total number of rows in the table by the number of rows per page. For example, if you have 100,000 rows in the table, you need the following number of pages to store the data:

100,000 / 48 = 2083.33, rounded up to 2084 pages
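The row-size and table-size formulas worked through above can be captured as a small calculator. This is an illustrative sketch (Python, with names of our own choosing), not anything SQL Server provides:

```python
import math

DATA_SPACE_PER_PAGE = 2016  # 2048-byte page minus 32-byte header

def fixed_row_size(fixed_cols):
    # Sum of column widths + 4 bytes overhead (row number,
    # variable-column count, 2 bytes in the row offset table).
    return sum(fixed_cols) + 4

def variable_row_size(fixed_cols, avg_var_cols):
    subtotal = (sum(fixed_cols) + 4       # fixed data + base overhead
                + sum(avg_var_cols)       # average variable-column data
                + len(avg_var_cols)       # 1 byte per variable column
                + 1                       # column offset table location
                + 2)                      # row length
    return subtotal + math.ceil(subtotal / 256)  # adjust-table bytes

def pages_for_table(row_size, total_rows):
    rows_per_page = DATA_SPACE_PER_PAGE // row_size  # round down
    return math.ceil(total_rows / rows_per_page)     # round up

# Table t2: char(10) and int fixed; varchar(25) and varchar(10)
# averaging 13 and 5 bytes respectively.
row = variable_row_size([10, 4], [13, 5])
print(row)                             # 42
print(pages_for_table(row, 100_000))   # 2084
```

Running the t1 definition through fixed_row_size([10, 25, 4]) likewise reproduces the 43-byte figure from the text.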
  6. The Row Offset Table The location of a row within a page is determined by using the row offset table at the end of the page. To find a specific row within the page, SQL Server looks in the row offset table for the starting byte address within the data page for that row ID. Note that SQL Server keeps all free space at the end of the data page, shifting rows up to fill in where a previous row was deleted and ensuring no space fragmentation within the page. If the offset table contains a zero value for a row ID, that indicates that the row has been deleted. See Figure 11.11 (Without clustered index) and Figure 11.12 (with clustered index) Pages 334 & 335.
  7. Index and B-Tree Structure
To this point, we have examined only the table structure. A table with no clustered index is stored as a heap structure: all data is added at the end of the table. You can think of it as a single file where all new records are merely added to the bottom of the file. This structure is the fastest way of adding data to the table. When there is no clustered index on the table, there is a row in sysindexes with an indid of 0. The root column of this row points to the last page in the table, where inserts are to occur. SQL Server adds records to this page until the page is full and then links a new page into the table. Needless to say, this is not a very efficient method for retrieving data.

Indexes are storage structures separate from the data pages of the table itself. The primary functions of indexes are to provide faster access to the data and to provide a means for enforcing the uniqueness of your data rows. All SQL Server indexes are B-Trees. There is a single root page at the top of the tree, branching out into N pages at each intermediate level until the bottom, or leaf level, of the index is reached. The index tree is traversed by following pointers from the upper-level pages down through the lower-level pages. In addition, each index level is a separate page chain.

There may be many intermediate levels in an index. The number of levels depends on the index key width, the type of index, and the number of rows and/or pages in the table. The number of levels is important in relation to index performance.
  8. Clustered Index When a clustered index is created on a table, the data in the table is physically stored in clustered index key order. SQL Server allows only one clustered index per table.
  9. Estimating Clustered Index Size
The formula for determining the size of a clustered index row with all fixed-length fields is the following:

Sum of fixed-length fields
+ 1 byte (number of variable columns)
+ 4 bytes (page pointer)

The formula for determining the size of a clustered index row with variable-length fields is the following:

Sum of fixed-length fields
+ 1 byte (number of variable columns)
+ 4 bytes (page pointer)
+ 2 bytes (index row width)
+ sum of average sizes of variable-length columns
+ 1 byte (adjust table)
+ 1 byte (location of offset table)
+ 1 byte per variable-length column

Assume you have a clustered index on a char(10) column that doesn't allow null values. Your index row size would be the following:

10 bytes + 1 byte (number of variable columns = 0 in this case) + 4 bytes (page pointer) = 15 bytes per index row

Because the clustered index contains a pointer to each page in the table, the number of rows at the bottom level of the index is equal to the number of pages in the table. If your data row size is 42 bytes, you can get 48 (2016/42) data rows per data page. If you have 100,000 rows in the table, that works out to 2084 pages in the table. Therefore, you have 2084 rows in the bottom level of the clustered index.

Each index page, like a data page, has 2016 bytes available for storing index row entries. To determine the number of clustered index rows per page, divide the index row size into 2016 bytes and multiply by the fill factor. For this example, assume that the default fill factor is being applied, which fills clustered index pages about 75 percent full:

2016 / 15 = 134.4, rounded down to 134 rows per page; 134 * 0.75 = 100 rows per page

At 100 rows per page, you need 2084 / 100 = 20.84, rounded up to 21 pages, to store all the rows at the bottom level of the index. The top level of the index must be a single root page, so you need to build levels on top of one another in the index until you reach a single root page.
To determine the total number of levels in the index, use the following algorithm:

N = 0
Divide number of data pages by number of index rows per page
While number of index pages at level N > 1
Begin
    Divide number of pages at level N by number of rows per page
    N = N + 1
End

Applying the algorithm:
Level 0: 2084 / 100 = 21 pages
Level 1: 21 / 100 = 1 page

Thus, the index contains two levels, for a total size of 22 pages, or 44KB. The I/O cost of retrieving a single row by using the index is the number of levels in the index plus a single data page:

2 index levels + 1 data page = 3 page I/Os
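The level-counting algorithm above translates directly into code. Here is a sketch in Python (the function name is ours), rounding up at each level as the text does:

```python
import math

def index_levels(data_pages, index_rows_per_page):
    """Count index levels and total index pages for a clustered index,
    walking up from the bottom level until a single root page remains."""
    levels, total_pages = 0, 0
    pages = math.ceil(data_pages / index_rows_per_page)  # level 0
    while True:
        levels += 1
        total_pages += pages
        if pages == 1:  # reached the single root page
            break
        pages = math.ceil(pages / index_rows_per_page)
    return levels, total_pages

# The chapter's example: 2084 data pages, 100 index rows per page
print(index_levels(2084, 100))  # (2, 22)
```

Adding one data page I/O to the level count gives the per-row retrieval cost: 2 + 1 = 3 page I/Os here.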
  10. Nonclustered Index
A nonclustered index is a separate index structure, independent of the physical sort order of the data in the table. SQL Server allows up to 249 nonclustered indexes per table. The nonclustered index tree contains the index keys in sorted order, with the leaf level of the index containing a pointer to the data page and the row number within the page where the index key can be found. There is a row in the leaf level for every data row in the table. The intermediate- and root-level pages contain the entire leaf-level row information plus an additional page pointer to the page at the next level down that contains the key value in its first row (see page 341, Figure 11.18).

Because nonclustered indexes contain page and row pointers, the size and number of levels in a nonclustered index depend on the width of the index key and the number of rows in the table. The nonleaf rows of a nonclustered index contain the entire leaf-row information (index key plus page and row pointer) and an additional page pointer pointing to the page at the next lower level. The nonleaf levels of a nonclustered index are thus constructed similarly to a clustered index.
  11. Estimating Nonclustered Index Size
The formula for determining the size of a nonclustered index row with all fixed-length fields is the following:

Sum of fixed-length fields
+ 1 byte (number of variable columns)
+ 6 bytes (page and row pointer)

The formula for determining the size of a nonclustered index row with variable-length fields is the following:

Sum of fixed-length fields
+ 1 byte (number of variable columns)
+ 6 bytes (page and row pointer)
+ 2 bytes (index row width)
+ sum of average sizes of variable-length columns
+ 1 byte (adjust table)
+ 1 byte (location of offset table)
+ 1 byte per variable-length column

Assume you have a nonclustered index on a char(10) column that doesn't allow null values. Your index row size would be the following:

10 bytes + 1 byte (number of variable columns = 0 in this case) + 6 bytes (page and row pointer) = 17 bytes per leaf index row

Because the leaf level of the nonclustered index contains a pointer to each row in the table, the number of rows at the leaf level of the index is equal to the number of rows in the table. If you have 100,000 rows in the table, you will have 100,000 rows in the nonclustered leaf level.

Each nonclustered index page, like a data page, has 2016 bytes available for storing index row entries. To determine the number of nonclustered leaf index rows per page, divide the leaf index row size into 2016 bytes and multiply by the fill factor:

2016 / 17 = 118.6, rounded down to 118 rows per page; 118 * 0.75 = 88 rows per page

At 88 rows per page, you need 100,000 / 88 = 1136.4, rounded up to 1137 pages, to store all the rows at the leaf level of the index. The nonleaf rows of a nonclustered index are like clustered index rows in that they contain pointers to all pages at the next level down in the index. Each nonleaf row contains the full index row from the leaf level plus an additional page pointer, 4 bytes in size.
Therefore, the size of a nonleaf row equals this:

Size of leaf row + 4 bytes (page pointer)

For the example, the nonleaf rows are the following:

17 bytes (leaf index row size) + 4 bytes (page pointer) = 21 bytes per nonleaf index row

To determine the number of nonclustered nonleaf index rows per page, divide the nonleaf index row size into 2016 bytes and multiply by the fill factor. For this example, we use a fill factor of 75 percent:

2016 / 21 = 96 rows per page; 96 * 0.75 = 72 nonleaf rows per page

Like the clustered index, the top level of the index must be a single root page. To determine the total number of levels in the nonclustered index, use the following algorithm:

N = 0
Divide number of data rows by number of leaf index rows per page
While number of index pages at level N > 1
Begin
    Divide number of pages at level N by number of nonleaf rows per page
    N = N + 1
End

Applying the algorithm:
Level 0: 100,000 / 88 = 1137 pages
Level 1: 1137 / 72 = 16 pages
Level 2: 16 / 72 = 1 page

Thus, the nonclustered index contains three levels, for a total size of 1154 pages, or 2308KB. The I/O cost of retrieving a single row by using the index is the number of levels in the index plus a single data page:

3 index levels + 1 data page = 4 page I/Os
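The whole nonclustered estimate can be rolled into one routine. This is an illustrative sketch (Python, names ours) of the arithmetic above, assuming the leaf and nonleaf levels both honor the fill factor, as in the worked example:

```python
import math

DATA_SPACE_PER_PAGE = 2016  # bytes available per index page

def nonclustered_size(leaf_row_size, total_rows, fill_factor=0.75):
    """Estimate (levels, total pages) for a nonclustered index."""
    # Leaf level: one index row per data row in the table.
    leaf_rows_per_page = int(DATA_SPACE_PER_PAGE // leaf_row_size * fill_factor)
    pages = math.ceil(total_rows / leaf_rows_per_page)
    levels, total_pages = 1, pages
    # Nonleaf rows carry the full leaf row plus a 4-byte page pointer.
    nonleaf_rows_per_page = int(
        DATA_SPACE_PER_PAGE // (leaf_row_size + 4) * fill_factor)
    while pages > 1:
        pages = math.ceil(pages / nonleaf_rows_per_page)
        levels += 1
        total_pages += pages
    return levels, total_pages

# char(10) key: 17-byte leaf rows, 100,000 rows in the table
print(nonclustered_size(17, 100_000))  # (3, 1154)
```

The result matches the text: three levels and 1154 pages, so a single-row lookup costs 3 index page I/Os plus 1 data page I/O.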
  12. Indexes and Performance Indexes can help speed up data retrieval by giving a direct path to the desired data and avoiding a costly table scan. However, indexes have an adverse impact on update performance because the indexes need to be maintained "on-the-fly" by SQL Server to keep the proper sort order within the index and table. With or without a clustered index on the table, every time you insert, delete, or update a row, causing the row to move within the table, all nonclustered indexes on the table need to be updated to reflect the new location. With a large number of indexes on a table, the overhead during data modification may become excessive. For tables involved in Online Transaction Processing (OLTP) types of applications, you should try to keep the number of indexes to fewer than five.
  13. SQL Server Index Maintenance SQL Server indexes are self-maintaining structures. That is, they allocate additional space as needed and deallocate space as rows are deleted, while maintaining the sort order of the index tree. There is typically no need to perform a "reorg" on SQL indexes.
  14. 50-50 Page Splits
Normally, when inserting data into a table with a clustered index, the row is inserted into the appropriate position within the appropriate data and/or index page. If the data or index page is full, SQL Server performs a 50-50 page split: it links a new page into the page chain and moves roughly half the rows from the full page onto the new page, leaving both pages about half full so the new row can be inserted in sort order.
  15. 100-0 (Monotonic) Page Splits
If the table has no clustered index, or the clustered index is on a sequential key and data rows are inserted in sequential key order, all new rows are added to the end of the last page in the table. When a data or index page is full and a new row needs to be added to the end of the page, SQL Server performs a monotonic page split: the full page is left as is, and the new row becomes the first row on a newly linked page. A monotonic page split involves less index maintenance and overhead than a 50-50 split. The downside is that free space on previous pages isn't reused, because all new rows are added at the end of the page chain. The only way to recover this "lost" space is to rebuild the clustered index and reapply a new fill factor.
  16. Overflow Pages Let's examine what happens if you have a nonunique clustered index that allows duplicate rows and you need to add a duplicate value to the end of the page. An overflow page is generated only if the duplicate key value matches the last data row on the page and no more room exists to store additional rows on that page. The clustered index still points to the original data page, and the data page points to the overflow page.
  17. Page Merges
As you delete data rows from pages, the remaining rows are shuffled upward on the page to keep all free space at the end of the page. When deletions leave an index page nearly empty, SQL Server merges the remaining index rows into an adjacent index page at the same level and frees the empty page. This behavior helps to keep the index tree smaller and more efficient.
  18. The Fill Factor
The fill factor is a percentage specifying how full you want your leaf index and/or data pages to be when the index is created. A lower fill factor has the effect of spreading the data and leaf index rows across more pages by leaving more free space in each page. This reduces page splitting and dynamic reorganization, which can improve performance in environments where there are a lot of inserts and updates to the data. A higher fill factor has the effect of packing more data and index rows per page by leaving less free space in the pages. This is useful in environments where the data is relatively static, because it reduces the number of pages required for storing the data and its indexes, and it helps improve performance for queries by reducing the number of pages that need to be accessed.

Providing a fill factor when creating a clustered index applies the fill factor to the rows on the data pages for the table. A fill factor on a nonclustered index affects only the leaf index rows and doesn't affect the data pages. Also, the fill factor is applied only at index creation time and is not maintained by SQL Server. Once you begin updating and inserting data, the fill factor eventually is lost. Therefore, it is only useful to specify a fill factor if the table already contains data.
  19. Setting the Fill Factor
The default fill factor is set at the server level, but it is typically provided at index creation time. The default fill factor set at the server level is 0. This setting minimizes the data storage requirements but leaves some free space within the index tree to prevent excessive page splits within the nonleaf pages. If you want to change the server-wide default for the fill factor, use sp_configure:

sp_configure 'fill factor', N

To specify a fill factor for an individual index, provide it at index creation time:

create index idx_name on table (column) with fillfactor = N

N can be any integer between 0 and 100.
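To see concretely how the fill factor trades storage for free space, here is a hypothetical calculation (Python, illustrative only) of rows stored per page at different fill factors, using the 42-byte average row from the earlier example:

```python
DATA_SPACE_PER_PAGE = 2016  # bytes available per 2KB page

def rows_per_page(row_size, fill_factor_pct):
    """Rows stored per page when pages are filled to the given percentage."""
    return int(DATA_SPACE_PER_PAGE // row_size * (fill_factor_pct / 100))

for ff in (50, 75, 100):
    print(ff, rows_per_page(42, ff))
# 50 -> 24 rows, 75 -> 36 rows, 100 -> 48 rows per page
```

Halving the fill factor halves the rows per page, which doubles the pages (and disk) the table needs, but leaves room to absorb inserts without page splits.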
  20. Reapplying the Fill Factor To reinforce the fill factor on a table or index in SQL Server, you can drop and create the index, specifying the new fill factor, or use the dbcc reindex command.
  21. Updates and Performance
All updates within SQL Server are essentially a delete followed by an insert, unless performed as a direct update in place. With a clustered index on the table, the row is reinserted at the appropriate location on the appropriate data page, relative to its physical sort order. Without a clustered index, the row is inserted into one of three locations:
1. The same physical location on the same data page
2. The same data page, if there is room
3. The last page in the heap

SQL Server performs two types of updates:
1. Deferred updates
2. Direct updates

Direct updates can occur as in-place or not-in-place updates.
  22. Deferred Updates A deferred update is a multi-step process that occurs when the conditions are not met for a direct update to occur. Deferred updates are always required for the following: 1. Updates that include a join 2. Updates to columns used for referential integrity 3. Updates in which the data modification statement can have a cascading effect
  23. Direct Updates Direct updates can be performed either in-place or not-in-place. A direct update not-in-place is still a delete followed by an insert, but the updates can be applied in a single pass.

Last update February 23, 1998.