Oracle RDBMS

Oracle database is a relational DBMS that has extended the relational model to an object-relational model, to store business models in an RDBMS. Database Schema: Database schema is a collection of logical data structures Table: It represents the real-world entity and can have integrity constraints on columns Data Access: Structured Query Language (SQL): Oracle RDBMS uses PL/SQL which is an extension which helps store application logic in the DB itself Transaction Management: Oracle RDBMS supports multiuser concurrency. Data Concurrency: Oracle RDBMS enforces a statement-level and transaction-level read consistency. This is done to avoid the dirty read problem and based on the level of consistency, DBMS guarantees data returning from the single or multiple queries is consistent and committed. Oracle RDBMS is a set of physical structures like files and applications inside a single physical database can interact with multiple logical databases. Oracle RDBMS has multitenant architecture Sharding Architecture: Partition horizontally across multiple physical Oracle RDBMS. Useful for OLTP applications. After sharding, every database(shard) has a dedicated server and resources - CPU, flash, disk and memory and together they make up a single logical database.

History

In 1977, Larry Ellison, Robert Miner, and Ed Oates founded Software Development Laboratories, which was hired by the United States Central Intelligence Agency (CIA) in order to write a new database system based upon SQL. This system came to be known as Oracle. The company changed its name to Relational Software, Inc. in 1979 and then to Oracle Systems Corporation in 1982.

Query Interface

SQL

Oracle follows the industry-accepted standards for SQL. Oracle SQL has many extensions to standard SQL language to provide additional statements. Oracle RDBMS processes SQL statements using a query optimizer which generates execution plans based on access paths and statistics. Optimization The optimizer generates most of the possible ways to process a query and assigns a cost to each step and finally takes the plan with the lowest cost. The main components are •Query Transformer: changes the form of a query to generate the execution plan •Estimator: estimates the cost of a particular execution plan •Plan Generator: Generates different possible plans, sub plans for nested queries. It uses an adaptive query optimizations feature which changes the plans based on the statistics collected during the statement execution. This optimization uses a dynamic programming plan. Optimizer statistics describe the details of data storage and distribution. It includes table statistics, column statistics, index statistics, system statistics.

Concurrency Control

Multi-version Concurrency Control (MVCC)

Oracle employs MVCC for concurrency control. Data Concurrency and Consistency Oracle uses serializability for transaction isolation. It maintains data consistency by having a multi-version consistency model and the usage of locks and transactions. Oracle can prevent dirty reads, non-repeatable reads, phantom reads as it provides: 1. read committed: The default isolation level where query executed by transaction sees the data committed before the given query and thus avoids reading the commits happening while in the transaction. Even though it provides the read consistency(row, when reread, is the same as before if not committed by other transactions), there is a conflicting write issue when two concurrent transactions try to modify the same item. 2. serializable isolation: In this, the transaction views the changes committed before the transaction began and not just before the given query. So any commits to the item by another concurrent transaction say t2 is not reflected in the reread of the item by t1 implying read consistency. Oracle allows modifying a row if the changes made to it by another transaction are already committed before this transaction started. Otherwise, it generates an error read-only isolation level: Similar to serializable isolation but data is not permitted to be modified in the transaction. Read consistency is achieved by reconstructing the data from the undo statements. Locking Mechanism To prevent the incorrect updates of shared data between the concurrent transactions. Oracle has two types of locks (shared and exclusive) that provide data consistency, concurrency, and integrity. Oracle rules for taking locks : 1. When in the process of modifying a row, it is locked 2. When a row is being written by writer A, it blocks the concurrent writer B on the same row. 3. A reader can never block a writer. 4. A writer can never block a reader. Oracle automatically uses the lowest application-level lock to provide less restriction so more data is available to be accessed by others. Oracle also allows performs the lock conversions in order to maintain the consistencies. It never escalates locks, which is done to promote the lock level to a higher level when most of the locks are at a given granularity level to decrease the number of locks. Never escalation is done to avoid the probability of deadlocks. Also, locks are automatically released when transaction no longer needs them. In the case of deadlocks, oracle resolves them by rolling back the statements involved in the deadlock and releasing locks. Types of locks provided by oracle : 1. DML Locks: 1. Row Locks (TX) 2. Table Locks(TM) : 3. Row Share (RS) 4. Row Exclusive Table Lock (RX) 5. Share Table Lock (S) 6. Share Row Exclusive Table Lock (SRX) 7. Exclusive Table Lock (X) 2. DDL Locks 3. System Locks 1. Latches 2. Mutexes 3. Internal Locks

Joins

Nested Loop Join Hash Join Sort-Merge Join

Oracle provides 3 types of join methods: hash join, nested loop join, and sort-merge join. Based on the cost estimation, the optimizer picks the relevant plan for the query statement. Nested loop joins: They are useful when the DB joins the small subset of data or when it is efficient to access the inner table. It works well for the small table with the indexes on the join conditions. . They are equivalent to the for loops. Optimizer determines the driving row source and labels it as the outer loop and the other row source as the inner loop. Then for every request from the client, DBMS fetches a row from the outer row source, proves the inner row source to find the rows which matches the predicates and repeat this for every outer rows. Oracle DBMS 11g introduced it to reduce the I/O latency. DBMS optimizes and avoids the physical I/O if the table block or index needed for the join is in the memory (Buffer Pool). DBMS uses the batch I/O requests and sends an array of row ids instead of one. Hash Join: Used to join two larger data sets and when the join is equijoin. Oracle optimizer uses the smaller of the data sets to build the hash table on the join key in buffer pool(memory) PGA. It is most effective when the hash table completely fits in the memory. Deterministic hash function is used for specifying location in hash table for storing each row. DBMS generates the hash values between 1 and the size of the table. Input values to the hash table are the join keys and the outputs are the slots in the array. When a query comes, DBMS scans the larger data set and probs the hash table to get the rows matching the join predicate criteria. In case of match, DBMS reports the rows or pass them to the next step in the plan Hash tables are stored in PGAs so DBMS can access the rows without latching and thus reduces I/O latencies. If the hash table doesn’t fit in the memory, DBMS partitions the data and joins each partitions. The hash buckets are stored in both PGA and disk. This uses a lot of sort area memory and the I/O but still is cost effective as oracle uses parallel query servers. Sort Merge Joins: DBMS sorts the data sets which is the sort join operations. And in the merge join operations, DBMS probes the second data set for every row in the first data set to get the rows matching the predicate. It is preferred for the large data sets where the join is not equijoin(using inequality condition) and when the optimizer finds the join operation cheaper with the sorting. DBMS always sort the second data set. Sorting of first data set can be avoided in cases when it has index. Like hash joins, DBMS access rows in the PGA memory which reduces I/O and avoids latching. It is preferred over hash joins when the hash table does not fit in the memory. Oracle supports inner joins, outer joins, semi joins, anti joins, cartesian joins. It also supports the join optimizations like Bloom filters, partition-wise joins, and in-memory join groups.

Storage Model

N-ary Storage Model (Row/Record)

Database Storage Structures : Physical Storage Structures: Files storing data on the disk 1. Data files: Oracle RDBMS stores data for logical database structures on physical files 2. Control files: Contains metadata about the physical structure of the database like file locations 3. Online redo log files: consists of redo entries that record changes made to data Local Storage Structures : 1. Data blocks: it represents the number of bytes on disk 2. Extent: number of continuous logical data blocks in a single allocation 3. Segments: number of extents allocated for an object like table or index 4. Tablespaces: a logical container for segments Oracle RDBMS Processes: Client processes, background processes, server processes

Isolation Levels

Read Committed Serializable

Oracle supports isolation levels of read committed as well as serializable, defaulting with the latter. There is an additional mode available, "read only", which is not part of the SQL standard. read committed: The default isolation level where query executed by transaction sees the data committed before the given query and thus avoids reading the commits happening while in the transaction. Even though it provides the read consistency(row, when reread, is the same as before if not committed by other transactions), there is a conflicting write issue when two concurrent transactions try to modify the same item. serializable isolation: In this, the transaction views the changes committed before the transaction began and not just before the given query. So any commits to the item by another concurrent transaction say t2 is not reflected in the reread of the item by t1 implying read consistency. Oracle allows modifying a row if the changes made to it by another transaction are already committed before this transaction started. Otherwise, it generates an error read-only isolation level: Similar to serializable isolation but data is not permitted to be modified in the transaction. Read consistency is achieved by reconstructing the data from the undo statements.

Query Execution

Materialized Model

An execution plan is the sequence of operations run by DBMS to execute a query statement Execution plan consists of 1. Row source tree : 1. Ordering of the table referenced by the query 2. The access method for each table in the query statement 3. Join method where a join is present between the tables 4. Operations like sort, filter, aggregation 2. Optimization which includes cost or each operation 3. Partitioning 4. Parallel execution EXPLAIN PLAN is used to check the execution plan used for a query statement Execution plans are bound to change due to 1. Different Schemas 2. Different costs Oracle has an adaptive query plan optimizer which optimizes the query plan based on the statistics on run time

Indexes

B+Tree BitMap

The additional data structure which is associated with the table and table cluster to speed up data access(rows). Primary keys and unique keys (keys: expression or the set of columns on which index is built), already have indexes Oracle provides a composite index that is on multiple columns in a given table. Oracle also provides multiple indexes on the same table provided different index types or different partition schemes or different uniqueness properties. Oracle provides B+ Tree indexing and various others. RDBMS automatically reflects the indexes, the data changes made to the tables. Index Scan: DB retrieves a row by traversing through the index. Basic principles oracle uses is if an SQL query needs only the indexed columns, then DB reads the value from the index but if some access to the non-indexed columns is required then DB uses the row ids to get the rows. Oracle provides : •Full index scan: DB reads entire index in order •Fast full index scan: DB access the data in the index without accessing the table •Index range scan: ordered index scan where 1 or more columns are specified in conditions •Index Unique scan: similar to index range scan but have 0 or 1-row id associated with the index key •Index Skip Scan: uses logical subindex of a composite index Variations of B+ Trees index : •Reverse Key Indexes •Ascending and Descending Indexes Other indexes provided : •Bitmap index: DB stores a bitmap for each index key and each index stores pointers to multiple rows. •Function-Based Indexes: it can be either B+ trees or the bitmap index, where the index computes the function or expression that involves multiple columns to store in the index. •Application Domain Indexes: customized according to the application.

Foreign Keys

Supported

Foreign Key Constraints: two tables containing common column(s), Oracle can enforce a foreign key constraint between them

Data Model

Relational Key/Value Document / XML Graph

Oracle was originally designed as a relational DBMS. It now also supports a variety of data models for storage. Schema objects: Each user account on oracle RDBMS has a single schema which can contain many data structures called schema objects which are of the type : Table Index Partition Views Oracle RDBMS handles schema object dependency to keep DBMS up to date. Table Overview : Oracle RDBMS has relational tables and object tables. Relational tables are stored as : Heap organized table - rows not stored in any particular order Index organized table - rows ordered according to primary key values. External table - only metadata store in DB but data stored outside. The table can consist of different types of columns like - Virtual Columns - columns which don’t consume disk space Invisible Columns - values are viewable only when the column is specified by name Table Storage : Table data is held in the data segment in tablespace. The table is heap organized so unordered rows. So when a row is inserted in the DB, the first free available space in the data segment is used. Row Storage : Rows are stored in the data blocks as a one-row piece(if possible). For a heap organized table, a row has a unique row id that maps to the physical location of the row. In the case of the table clusters where rows from the different tables are stored in the same data block, have the same id. These ids are used for indexing like B-tree indexing to provide fast access to a row in a single I/O. Table compression Basic table compression: intended for bulk operation Advanced row compression: intended for an OLTP application The compressed rows are stored in a row-major format where all columns of a particular row are stored together. Also, the info needed to re-create the uncompressed data from the compressed one is stored in the data block itself. Hybrid Columnar Compression: stores the same column for a group of rows. Data is stored as a combination of row and columnar storage(stores column data together). Table Clusters: It is a group of tables which share columns and store the related data in same data blocks So, a single data block contains rows from multiple tables. This reduces the disk I/O for the joins. The cluster key is the common column between the tables. For, eg student and class tables share student_id column. In oracle RDBMS, data with the same cluster key values(eg, student_id = 20) are physically stored together. Indexed Clusters Oracle provides a clustered index of the B+-tree index on the cluster key before inserting any rows. This is done to use the index and locate the data. In other words, the RDBMS stores the rows in the heap and uses the index to locate them. Hash Cluster Oracle RDBMS also allows to create of a hash cluster and locates the rows using the key values(cluster key) stored in a separate index. It retrieves the data block(hash value). Oracle also provides the sorted hash cluster which stores rows efficiently to return them in sorted order. In case of collision, oracle links the filled block to a new overflow block, and now retrieving will take two I/Os. Attribute-Clustered Tables: Oracle allows a heap -organized table to store data in proximity based on the user-specified clustering directives (on single/multiple tables). Directives: clustering by linear order: divides rows into ranges based on user-specified attributes in a given order interleaved order: used for dimensional hierarchy This reduces table scans and thus I/0 and CPU cost. Zone Maps Oracle has zones which are a set of contiguous data blocks that store min and max values of the columns. It uses the predicate values in the SQL query to determine the zones to read and skip the unnecessary ones. Oracle uses zone maps which are an access structure to divide the data blocks into zones and automatically creates them when the clustering is specified on clustering columns.

System Architecture

Shared-Everything

Compression

Naïve (Page-Level) Naïve (Record-Level) Bit Packing / Mostly Encoding

Oracle supports compression at multiple levels within the data, including by row, block, and index. It also supports network compression designed to reduce bandwidth usage and increase network throughput. Table compression Basic table compression: intended for bulk operation Advanced row compression: intended for an OLTP application The compressed rows are stored in a row-major format where all columns of a particular row are stored together. Also, the info needed to re-create the uncompressed data from the compressed one is stored in the data block itself. Hybrid Columnar Compression: stores the same column for a group of rows. Data is stored as a combination of row and columnar storage(stores column data together).

Storage Organization

Heaps

Oracle RDBMS Logo
Website

http://www.oracle.com/us/products/database/

Tech Docs

https://docs.oracle.com/en/database/oracle/oracle-database/index.html

Developer

Oracle

Country of Origin

US

Start Year

1977

Project Type

Commercial

Written in

C, C++

Supported languages

C, C++, COBOL, Java, PHP, PL/SQL, Python, R, SQL, Visual Basic

Operating Systems

AIX, HP-UX, Linux, Solaris, Windows

Licenses

Proprietary

Wikipedia

https://en.wikipedia.org/wiki/Oracle_Database