Oracle database is a relational DBMS that has extended the relational model to an object-relational model, to store business models in an RDBMS.
Database Schema: Database schema is a collection of logical data structures
Table: It represents the real-world entity and can have integrity constraints on columns
Structured Query Language (SQL): Oracle RDBMS uses PL/SQL which is an extension which helps store application logic in the DB itself
Transaction Management: Oracle RDBMS supports multiuser concurrency.
Data Concurrency: Oracle RDBMS enforces a statement-level and transaction-level read consistency. This is done to avoid the dirty read problem and based on the level of consistency, DBMS guarantees data returning from the single or multiple queries is consistent and committed.
Oracle RDBMS is a set of physical structures like files and applications inside a single physical database can interact with multiple logical databases.
Oracle RDBMS has multitenant architecture
Sharding Architecture: Partition horizontally across multiple physical Oracle RDBMS. Useful for OLTP applications. After sharding, every database(shard) has a dedicated server and resources - CPU, flash, disk and memory and together they make up a single logical database.
This report has been created from the summarized of the Oracle Database documentation.
In 1977, Larry Ellison, Robert Miner, and Ed Oates founded Software Development Laboratories, which was hired by the United States Central Intelligence Agency (CIA) in order to write a new database system based upon SQL. This system came to be known as Oracle. The company changed its name to Relational Software, Inc. in 1979 and then to Oracle Systems Corporation in 1982.
Oracle follows the industry-accepted standards for SQL. Oracle SQL has many extensions to standard SQL language to provide additional statements. Oracle RDBMS processes SQL statements using a query optimizer which generates execution plans based on access paths and statistics.
The optimizer generates almost all the possible ways to process a query and assigns a cost to each step and finally takes the plan with the lowest cost. The main components are
•Query Transformer: changes the form of a query to generate the execution plan •Estimator: estimates the cost of a particular execution plan •Plan Generator: Generates different possible plans, sub plans for nested queries. It uses an adaptive query optimizations feature which changes the plans based on the statistics collected during the statement execution. This optimization uses a dynamic programming plan. Optimizer statistics describe the details of data storage and distribution. It includes table statistics, column statistics, index statistics, system statistics.
Oracle employs MVCC for concurrency control.
Data Concurrency and Consistency
Oracle uses serializability for transaction isolation. It maintains data consistency by having a multi-version consistency model and the usage of locks and transactions. Oracle can prevent dirty reads, non-repeatable reads, phantom reads as it provides:
read committed: The default isolation level where query executed by transaction sees the data committed before the given query statement and thus avoids reading the commits happening while in the transaction. Even though it provides the read consistency(row, when reread, is the same as before if not committed by other transactions), there is a conflicting write issue when two concurrent transactions try to modify the same item.
serializable isolation: In this, the transaction views the changes committed before the transaction began and not just before the given query. So any commits to the item by another concurrent transaction say t2 is not reflected in the reread of the item by t1 implying read consistency. Oracle allows modifying a row if the changes made to it by another transaction are already committed before this transaction started. Otherwise, it generates an error read-only isolation level: Similar to serializable isolation, but data is not permitted to be modified in the transaction. Read consistency is achieved by reconstructing the data from the undo statements.
Locking Mechanism To prevent the incorrect updates of shared data between the concurrent transactions. Oracle has two types of locks (shared and exclusive) that provide data consistency, concurrency, and integrity. Oracle rules for taking locks :
Types of locks provided by oracle :
Oracle provides 3 types of join methods: hash join, nested loop join, and sort-merge join. Based on the cost estimation, the optimizer picks the relevant plan for the query statement.
Nested loop joins:
They are useful when the DB joins the small subset of data or when it is efficient to access the inner table. It works well for the small table with the indexes on the join conditions. They are equivalent to the for loops. Out of the two row sources, the optimizer decides the driving row source and labels it as the outer loop and the other row source as the inner loop. Then for every request from the client, DBMS fetches a row from the outer row source, proves the inner row source to find the rows which match the predicates, and repeats this for every outer row. Oracle DBMS 11g introduced it to reduce the I/O latency. DBMS optimizes and avoids the physical I/O if the table block or index needed for the join is in the memory (Buffer Pool). DBMS uses the batch I/O requests and sends an array of row ids instead of one.
Used to join two larger data sets and when the join is equijoin. Oracle optimizer uses the smaller of the data sets to build the hash table on the join key in the buffer pool(memory) PGA. It is most effective when the hash table completely fits in the memory. A deterministic hash function is used for specifying the location in the hash table for storing each row. DBMS generates the hash values between 1 and the size of the table. Input values to the hash table are the join keys, and the outputs are the slots in the array. When a query comes, DBMS scans the larger data set and probes the hash table to get the rows matching the join predicate criteria. In case of a match, DBMS reports the rows or pass them to the next step in the plan
Hash tables are stored in PGAs, so DBMS can access the rows without latching and thus reduces I/O latencies. If the hash table doesn’t fit in the memory, DBMS partitions the data and joins each partition. The hash buckets are stored in both PGA and disk. This uses a lot of sort area memory and the I/O but still is cost-effective as oracle uses parallel query servers.
Sort Merge Joins:
DBMS sorts the data sets, which is the sort join operations. And in the merge join operations, DBMS probes the second data set for every row in the first data set to get the rows matching the predicate. It is preferred for the large data sets where the join is not equijoin(using inequality condition) and when the optimizer finds the join operation cheaper with the sorting. DBMS always sort the second data set. Sorting of the first data set can be avoided in cases when it has an index. Like hash joins, DBMS access rows in the PGA memory, which reduces I/O and avoids latching. It is preferred over hash joins when the hash table does not fit in the memory.
Oracle supports inner joins, outer joins, semi joins, anti joins, cartesian joins. It also supports the join optimizations like Bloom filters, partition-wise joins, and in-memory join groups.
Database Storage Structures :
Physical Storage Structures: Files storing data on the disk
1. Data files: Oracle RDBMS stores data for logical database structures on physical files 2. Control files: Contains metadata about the physical structure of the database like file locations 3. Online redo log files: consists of redo entries that record changes made to data
Local Storage Structures :
1. Data blocks: it represents the number of bytes on disk 2. Extent: number of continuous logical data blocks in a single allocation 3. Segments: number of extents allocated for an object like table or index 4. Tablespaces: a logical container for segments
Oracle RDBMS Processes: Client processes, background processes, server processes
Oracle has isolation levels of read committed and serializable. The default one is serializable. There is an additional mode available, "read-only", which is not part of the SQL standard.
The default isolation level where query executed by transaction sees the data committed before the given query and thus avoids reading the commits happening while in the transaction. Even though it provides the read consistency(row, when reread, is the same as before if not committed by other transactions), there is a conflicting write issue when two concurrent transactions try to modify the same item.
In this, the transaction views the changes committed before the transaction began and not just before the given query. So any commits to the item by another concurrent transaction say t2 is not reflected in the reread of the item by t1 implying read consistency. Oracle allows modifying a row if the changes made to it by another transaction are already committed before this transaction started. Otherwise, it generates an error.
Read-Only isolation level
Similar to serializable isolation, but data is not permitted to be modified in the transaction. Read consistency is achieved by reconstructing the data from the undo statements.
It is the sequence of operations run by DBMS to execute a query statement Execution plan consists of
Row source tree :
Optimization which includes cost or each operation
EXPLAIN PLAN is used to check the execution plan used for a query statement
Execution plans are bound to change due to
Oracle has an adaptive query plan optimizer which optimizes the query plan based on the statistics on run time
Stored procedures are supported by Oracle.
The additional data structure which is associated with the table and table cluster to speed up data access(rows). Primary keys and unique keys (keys: expression or the set of columns on which index is built), already have indexes.
Oracle provides a composite index that is on multiple columns in a given table. Oracle also offers various indexes on the same table provided different index types or various partition schemes or different uniqueness properties.
Oracle provides B+ Tree indexing and various others. RDBMS automatically reflects the indexes; the data changes made to the tables. Index Scan: DB retrieves a row by traversing through the index. Basic principles oracle uses if an SQL query needs only the indexed columns, then DB reads the value from the index, but if some access to the non-indexed columns is required, then DB uses the row ids to get the rows. Oracle provides :
•Full index scan: DB reads entire index in order •Fast full index scan: DB access the data in the index without accessing the table •Index range scan: ordered index scan where one or more columns are specified in conditions •Index Unique scan: similar to index range scan but have 0 or 1-row id associated with the index key •Index Skip Scan: uses logical subindex of a composite index
Variations of B+ Trees index :
•Reverse Key Indexes •Ascending and Descending Indexes
Other indexes provided :
•Bitmap index: DB stores a bitmap for each index key, and each index stores pointers to multiple rows. •Function-Based Indexes: it can be either B+ trees or the bitmap index, where the index computes the function or expression that involves multiple columns to store in the index. •Application Domain Indexes: customized according to the application.
Foreign Key Constraints
Two tables containing standard column(s), Oracle can enforce a foreign key constraint between them
Oracle was initially designed as a relational DBMS. It now also supports a variety of data models for storage. Schema objects: Each user account on oracle RDBMS has a single schema which can contain many data structures called schema objects which are of the type : Table Index Partition Views Oracle RDBMS handles schema object dependency to keep DBMS up to date.
Table Overview : Oracle RDBMS has relational tables and object tables. Relational tables are stored as : Heap organized table - rows not stored in any particular order Index organized table - rows ordered according to primary key values. External table - only metadata store in DB but data stored outside.
The table can consist of different types of columns - Virtual Columns - columns which don’t consume disk space Invisible Columns - values are viewable only when the column is specified by name
Table Storage : Table data is held in the data segment in tablespace. The table is heap organized such unordered rows. So when a row is inserted in the DB, the first free available space in the data segment is used.
Row Storage : In data blocks, rows are sorted as a one-row piece(if possible). For a heap organized table, a row has a unique row id that maps to the physical location of the row. In the case of the table clusters where rows from the different tables are stored in the same data block, have the same id. These ids are used for indexing like B-tree indexing to provide fast access to a row in a single I/O.
Table compression Basic table compression: intended for bulk operation Advanced row compression: designed for an OLTP application
The compressed rows are stored in a row-major format where all columns of a particular row are stored together. Also, the info needed to re-create the uncompressed data from the compressed one is stored in the data block itself.
Hybrid Columnar Compression: stores the same column for a group of rows. Data is stored as a combination of row and columnar storage(stores column data together).
Table Clusters: It is a group of tables which share columns and store the related data in the same data blocks So, a single data block contains rows from multiple tables. This reduces the disk I/O for the joins. The cluster key is the common column between the tables. For, e.g. student and class tables share student_id column. In oracle RDBMS, data with the same cluster key values(e.g., student_id = 20) are physically stored together.
Indexed Clusters Oracle provides a clustered index of the B+-tree index on the cluster key before inserting any rows. This is done to use the index and locate the data. In other words, the RDBMS stores the rows in a heap and uses the index to determine them.
Hash Cluster Oracle RDBMS also allows to create of a hash cluster and locates the rows using the key values(cluster key) stored in a separate index. It retrieves the data block(hash value). Oracle also provides the sorted hash cluster, which stores rows efficiently to return them in sorted order. In case of collision, oracle links the filled block to a new overflow block, and now retrieving will take two I/Os.
Attribute-Clustered Tables: Oracle allows a heap -organized table to store data in proximity based on the user-specified clustering directives (on single/multiple tables). Directives: clustering by linear order: divides rows into ranges based on user-specified attributes in a given order interleaved order: used for dimensional hierarchy This reduces table scans and thus I/0 and CPU cost.
Zone Maps Oracle has zones, which are a set of contiguous data blocks that store min and max values of the columns. It uses the predicate values in the SQL query to determine the zones to read and skip the unnecessary ones. Oracle uses zone maps, which are an access structure to divide the data blocks into zones and automatically creates them when the clustering is specified on clustering columns.
Oracle provides compression at multiple levels, e.g. at the row level, block-level, and index level. It also provides network compression, which reduces bandwidth usage and increases network throughput.
Basic table compression This type of compression mechanism is intended for bulk operation.
Advanced row compression
This compression mechanism is designed for an OLTP application. The rows, when compressed, are stored in a row-major format. All columns of a particular row are stored together. Also, the information needed to re-create the uncompressed data is stored in the data block.
Hybrid Columnar Compression This type of compression mechanism stores the same column for a group of rows. Data is stored as a combination of row and columnar storage i.e.; it stores column data together.