03 July, 2019

Internal Structure of Data File

Data files contain data and objects such as tables, indexes, stored procedures, and views.
Storage space allocated to a database is divided into pages that are contiguously numbered from 0 to N. That means data file is a large array of pages.
When Microsoft SQL Server starts expanding the database file from its default size, the newly created pages are numbered starting from the (last highest page number in the file)+1.
Similarly, when SQL server shrinks the file it removes pages descendingly (starting with the highest page number) from the database file.
Disk I/O operations (read/write) are performed at the page level.

Basically, a SQL Server data file has the following basic structure.
SQL server refer the first page in the data file as page number 0.In all the data file first 9  pages (till page number 8) are in same order as shown below.In the primary data file, the 10th page (Page number 9) will be the boot page which store the metadata about the database.







File Header:

All the files have a header into the page number 0. This header stores metadata about that particular file and is not recoverable by checkdb. In case of damage, you must restore the entire file.
There will be only one header page for each data file and that reside in the 0th position.
You can explore the header of some database using the following DBCC command:
DBCC fileheader [( {‘dbname’ | dbid} [, fileid])
Ex: DBCC fileheader ( 'EMP',2)
You will find useful information on RedoStartLSN, BindingId, SectorSize, Status, and Growth records.

How SQL server tracking free space-->Page Free Space:

The PFS pages are used to track allocation status of pages and determines the amount of free space.
While allocating pages , database engine identify the extent with free pages using the GAM and SGAM.
Once the database engine  found extent with free page, it use the PFS page to identify the unallocated page in that extent and amount of free space in the page. The  SQL Server allocates a new extent only when it cannot find a page with sufficient space to hold the data.
Free space is only tracked for pages storing LOB values  (ie  text/image, varchar(max),nvarchar(max),varbinary(max) ,row overflow data) and heap data pages.
By default LOB data are stored in a separate page by keeping only a pointer along with record stored in the data pages. These are the only pages where data can be stored in the available free pages.
For index pages, the data should be stored in the order of index and there is no options in the insertion point of the records.So it will be unnecessary overhead of tracking free space in the index pages.
PFS page keep one byte for each page in the PFS interval. and cover a range of 8,088 pages in a file.
The basic PFS structure is:
Bit 1: Indicates whether the page is allocated or not.
Bit 2: Indicates if the page is from a mixed extent.
Bit 3: Indicates that this page is an IAM page.
Bit 4: Indicates that this page contains ghost records
Bits 5 to 7: A combined three-bit value, which indicate the page fullness as follows:
0: The page is empty
1: The page is 1–50% full
2: The page is 51–80% full
3: The page is 81–95% full
4: The page is 96–100% full
The first page of each file is a PFS, and another PFS is allocated for every 8,000 pages after the first PFS.

How SQL server Managing Extent Allocations

SQL Server uses two special types of pages to record which extents have been allocated and for which type of use (mixed or uniform) the extent is available:

Global Allocation Map (GAM) pages: 

SQL Server uses Global Allocation Map (GAM)  pages to keep track of all free extents in a database file. A GAM has a bit for each extent in the interval this covers.
If the bit is 1, the corresponding extent is free.
If the bit is 0, the corresponding extent is in use as uniform or mixed extent.
A GAM can cover about 64,000 extents, or almost 4 GB of data. So, there is one GAM page for every 4 GB of file size.

Shared Global Allocation Map (SGAM) pages: 

SGAM pages track what extents are currently being used as mixed extent and also have at least one unused page.The SGAM has a bit for each extent in the interval it covers.
If the bit is 1, the corresponding extent is used as a mixed extent and has at least one page free to allocate.
If the bit is 0, the extent is either not used as a mixed extent or it is mixed extent and with all its pages being used.
A SGAM can cover about 64,000 extents, or almost 4 GB of data. So, there is one SGAM page for every 4 GB of file size.











To allocate a uniform extent, the SQL Server Database Engine searches the GAM for a 1 bit and sets it to 0.
To allocate a mixed extent, the SQL Server Database Engine searches the GAM for a 1 bit, sets it to 0, and then also sets the corresponding bit in the SGAM to 1.
To find a mixed extent with free pages, the SQL Server Database Engine searches the SGAM for a 1 bit.
To deallocate an extent, the SQL Server Database Engine makes sure that the GAM bit is set to 1 and the SGAM bit is set to 0.

The fifth and sixth page, page number 4 and 5 are not used in the current architecture of the SQL server.
The page type of these pages is 0. DBCC page for these pages will print the header part and it will end with invalid page type error.

How SQL server Tracking Modified Extents:

SQL Server uses two internal data structures (DCM &BCM) to track extents modified since the last full backup and extents modified by bulk copy operations .
These data structures greatly speed up differential backups. They also speed up the logging of bulk copy operations when a database is using the bulk-logged recovery model.

Differential Changed Map (DCM):

This page is used to keeps track of which extents in a file have been modified since the last full database backup.
If the bit for an extent is 1, the extent has been modified since the last BACKUP DATABASE statement.
If the bit is 0, the extent has not been modified.
A DCM can cover about 64,000 extents, or almost 4 GB of data. So, there is one DCM page for every 4 GB of file size.

While taking differential backup, SQL server read just the DCM pages to determine which extents have been modified and  backup only extents marked as changed in the DCM pages.
It helps SQL server to speed up the differential backup process with out scanning through all pages to check it is modified or not after the last full backup.

IF DCM page not inplace, SQL Server stores the last LSN that changed  In the page header of each page. (in the m_lsn field).It also stores the last LSN number included in the last full backup.
That means, SQL Server could read each page, compare the two LSN values and copy the page to the backup file, if its LSN is larger than the full-backup-LSN.
However, that would require every page to be read into memory, quite an expensive operation for large databases.
The length of time that a differential backup runs is proportional to the number of extents modified since the last BACKUP DATABASE statement and not the overall size of the database.
Note:
Differential backup will not clear the DCM page and differential backup contain all the changes happened from the last full backup. While taking full backup, SQL server reset the bits in DCM page to mark it is not changed.

Bulk Changed Map (BCM):

This page is used to keep tracks the extents that have been modified by bulk logged operations since the last BACKUP LOG statement.
If the bit for an extent is 1, the extent has been modified by a bulk logged operation after the last BACKUP LOG statement.
If the bit is 0, the extent has not been modified by bulk logged operations.
A BCM can cover about 64,000 extents, or almost 4 GB of data. So, there is one BCM page for every 4 GB of file size.

Although BCM pages appear in all databases, they are only relevant when the database is using the bulk-logged recovery model.
In this recovery model, while taking log backup, SQL server scans the BCM pages for extents that have been modified and include the extents that are marked as changed in the log backup along with transaction log.
Observetion:
This helps SQL server to make bulk logged operation to be recoverable if the database is restored from database backup and a sequence of transaction log backups.
In the bulk logged recovery model, bulk logged operation like BCP,Bulkinsert,Select Into are minimally logged in the transaction log and modified extents are tracked through the BCM pages.This helps SQL server to have optimized performance for bulk operations.

BCM pages are not relevant in a database that is using the simple recovery model, because no bulk logged operations are logged.
They are not relevant in a database that is using the full recovery model, because that recovery model treats bulk logged operations as fully logged operations.

Note that BCM pages are not cleared by taking a full backup or differential backup.The base of the transaction log backups is the previous transaction log backup.
SQL server should know what changes are happened between two transnational log backup.If a differential or full backup cleared the BCM page,next transaction log backup can not track/recover the changes between the previous log backup and full/differential backup.
After the log backup only, the bits in the BCM pages are flipped to mark there is no extent changed due to the bulk logged operation.
In short a full backup or differential backup will not break the transaction log backup chain.

How SQL server Managing space used by objects:Index Allocation Map (IAM):

This page is used to keeps track of the extents that belong to specific allocation unit of a table\object.
The IAM page also has a large bitmap in which each bit represents one extent. The first bit in the map represents the first extent in the range, the second bit represents the second extent, and so on.
If the bit is 1, the extent it represents is allocated to the allocation unit owning the IAM page.
If a bit is 0, the extent it represents is not allocated to the allocation unit owning the IAM.
This kind of page covers a 4-GB range, belongs to a GAM interval. So For each 4-GB range of data, partition, and allocation unit type, an IAM page is required to track.
For example, a table on four partitions that has all three types of data (in-row, LOB, and row-overflow) has at least 12 IAM pages.

When the SQL Server Database Engine has to insert a new row and no space is available in the current page, it uses the IAM and PFS pages to find a page to allocate, or, for a heap or a Text/Image page, a page with sufficient space to hold the row.
The SQL Server Database Engine uses the IAM pages to find the extents allocated to the allocation unit. For each extent, the SQL Server Database Engine searches the PFS pages to see if there is a page that can be used.

Boot Page:

The base metadata about the whole database is stored in this page.
Boot page is available as 9th page only in the primary data file. Boot page will not be available in the secondary data file.We can see the content of this page using the DBCC Page command and some of the values stores in this page are self explanatory.If this page is corrupted for some reason, it is not possible to recover the database using DBCC CheckDb. The page restore also will not help in this situation. The only possible way to recover the database is restore from last good backup.

From the 11th page on wards, you can see mix different types pages like data page,index pages,row-overflow pages and LOB pages.

1.Data Page: It holds data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set  to ON.

2.Text/Image Page/LOB Page: It stores LOB ( Large Object data) like text, ntext, varchar(max), nvarchar(max),  varbinary(max), image and xml data. LOB pages are used to store the large objects which are not stored as part of row data.

3.Row overflow data pages: When the total row size of all fixed and variable columns in a table exceeds the 8,060-byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA pages , starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8,060-byte limit.

4.Index Page: This pages are used to stores the all index entries.

No comments:

Post a Comment