SQL Server

Microsoft SQL Server (MSSQL) is a commercial DBMS supporting both structured and unstructured data sources, scaling to petabytes of data. While highly optimized primarily for OLTP use cases (such as in-memory Hekaton engine), it can also be configured to work as an OLAP DBMS.

History

Microsoft, Ashton-Tate (the creator of dBASE) and Sybase (currently SAP) started a partnership to build DBMS in 1988, resulting in SQL Server 1.0 (Filipi) in 1989 working under OS/2 (cooperated work between Microsoft and IBM). After ending relationship with Ashton-Tate, SQL Server 1.1 (Pietro) supported in Windows 3.0 and OS/2 has released in 1991.

In 1993, due to different pursue of marketing schemes and design, the partnership between Sybase and Microsoft ended, Microsoft created SQL Server 4.21a(SQLNT) available in Windows NT. Microsoft independently updated and released SQL Server 6.0(SQL95) supported in Windows 95 and the main feature was replication.

In 1998, SQL Server 7.0 (Sphinx) has dramatic changes compared to other previous update versions. Its core database engine is no longer based on Sybase engine written in C but independently written C++.

Foreign Keys

Supported

Both Primary and Foreign Key constraints are supported. Declaring a column as Primary key automatically results in a unique index built on it. There are some limits on foreign key references:

  1. Maximum number of other tables and columns that can be referenced as a foreign key: 253
  2. SQL Server 2016 onwards, the number of maximum incoming foreign key references has been increased to: 10000(from 253).

Storage Model

N-ary Storage Model (Row/Record) Custom

By default SQL Server is a NSM(row-store). When In-Memory OLTP via Hekaton is enabled, an in-memory buffer pool for hot-data stores incoming rows. Over time the data is moved to the cold traditional disk based storage. Optionally column stores on specific columns of specific tables can be also enabled when expecting OLAP style usage by declaring a Clustered Columnstore index on that column.

Query Compilation

Code Generation Stored Procedure Compilation

Yes, as mentioned in their technical whitepaper, In-Memory OLTP, queries and procedures stored as T-SQL are aggressively converted to machine code. Memory optimized tables support natively compiled UDFs and stored procedures.

Concurrency Control

Multi-version Concurrency Control (MVCC) Two-Phase Locking (Deadlock Prevention)

MS SQL Server supports transactions. The Concurrency control scheme it uses by default is 2 Phase Locking. However it is possible to enable MVCC style row-versioning to ensure readers dont block writers. MVCC can be enabled by using the Isolation Levels - Read Committed with Row Versioning(Read Committed Snapshot) or by Snapshot Isolation. The Hekaton In-memory optimized engine which can be enabled by declaring a table as Memory Optimized also supports MVCC.

Query Interface

SQL

MS SQL Server supports an extension of SQL known as Transact-SQL (T-SQL).

Views

Virtual Views Materialized Views

MS SQL Server supports many different types of views.

  1. Standard User Defined Views: This is the standard View normally supported by DBMSs.
  2. Indexed View: Just another name for a Materialized View.
  3. Partitioned View: A Partitioned view is one which contains data joined from tables over the same(locally partitioned) or different SQL server instances.
  4. System View: This can be used to view and analyze catalog metadata such as information about user defined databases.

As is obvious from the views described above, the complexity of queries supported for views is pretty high - i.e. views involving joins not only over this SQL server instance but also between instances seems to be supported.

Query Execution

Tuple-at-a-Time Model

MS-SQL Server 2017 uses the Volcano-style query processing model.(Note: Based on Slides from Class) Intraquery parallelism within SQL Server works as follows - the data to be operated on is partitioned between threads running on different CPUs. Then each thread operates on the data assigned to it while using SIMD intrinsics wherever possible. It should be noted that Intraquery parallelism should only be turned on for OLAP queries and would result in overhead/be inappropriate for OLTP workloads. The linked reference goes into a lot more detail and lists some useful knobs associated with this option.

Isolation Levels

Read Uncommitted Read Committed Serializable Snapshot Isolation Repeatable Read

The Isolation levels supported are:

  1. Serializable: Implemented using Range-locks. This prevents other transactions from updating or writing other rows in the range.
  2. Snapshot: Ensures that statements of a transaction reads a consistent value during transaction duration. This is ensured via MVCC schemes where multiple versions of the data are maintained(thus ensuring readers dont block writers) and transactions read data consistent corresponding to their own timestamp.
  3. Repeatable Read: Write locks are used to enforce the behavior that uncommitted writes cannot be read. Even shared locks are issued and all these are held till the end of the transaction.
  4. Read Committed(Default): The behavior of this option depends on what READ_COMMITTED_SNAPSHOT is set to. If set to OFF(the default), this isolation level is guaranteed using read(shared) locks to prevent data being read by one(or a group of txns) to be written into(by another txn). If set to ON, row-versioning for maintaining the isolation level.
  5. Read Uncommitted: No read locks or exclusive locks are issued.

System Architecture

Shared-Disk

By default MS SQL Server clusters require shared disks. There do seem to be some unconventional and proprietary solutions which seem to be available for having completely shared nothing architectures.

Joins

Nested Loop Join Hash Join Sort-Merge Join

MS SQL Server 2017 supports Hash joins, Merge Joins and Nested Loop joins. For Hash Joins, the system initially defaults to an in-memory hash join. But as soon as the memory requirements for the build phase become too large, the system switches to a Grace Hash Join and Recursive Hash Join depending on the input size and characteristics. If the build phase memory needs are only slightly larger than the available memory, SQL Server will apply a hybrid of In-memory and Grace hash join. They support an additional feature called the Batch Mode Adaptive Join Operator. Basically what this does is to scan the first join input and based on its characteristics decides which join would be better. This feature is only supported for selecting between Hash and Nested Loop joins.

Logging

Physical Logging

MS SQL Server supports physical logging via a feature they advertise as the Transaction Log. Their logging scheme allows for individual or incomplete transaction recovery as well as rollbacks of the database to ensure durability. The logs are managed carefully by spilling over to multiple files when exceeding a tunable threshold. The system truncates log files as much as it can to prevent overly huge log file sizes.

Storage Architecture

Hybrid

SQL Server supports In-Memory OLTP transactions via Hekaton. However if the memory needs of transactions are much more than memory can handle it is better to not use memory optimized tables, or alternatively use it for specific performance critical tables. OOM issues are definitely a possibility in the worst cast - SQL Server has setup an entire Wiki page to help customers resolve their Out of Memory issues. In a scenario where the working set may not fit in-memory, it is possible to use the original setup of SQL server without Hekaton by not declaring tables as memory optimized.

Stored Procedures

Supported

Yes, stored procedures are supported and can be written using the Transact-SQL Syntax or using the Microsoft .NET Framework common runtime language (CLR) method.

Checkpoints

Non-Blocking Consistent Fuzzy

According to the SQL Server documentation, the following types of checkpoints are supported:

  1. Automatic Checkpoints: These resemble fuzzy checkpoints with the addition that they can be tuned to automatically be taken depending on the number of changes in the database over time. Naturally they end up having some additional overhead during recovery.
  2. Indirect Checkpoints: These more resemble consistent checkpoints as they limit the number of dirty pages allowed while checkpointing is ongoing. As a result it is advised that performance degradation might be observed. The main benefit is that they have lower recovery time.

Indexes

B+Tree Hash Table

Yes, both Primary key and secondary key indexes are supported. For both of these the index types(ref to definitions) supported are:

  1. Clustered:
  2. Unclustered
  3. Unique
  4. Filtered
  5. Hash
  6. Columnstore
  7. Spatial
  8. XML
  9. Full-Text

The internal data structures used for these indexes are BW-Trees primarily. BW-Trees are an in-memory optimized lock-free variant of the simple B+ Tree used in traditional DBMS. In-Memory hash table based Hash Indexes are also supported.