Revision as of 06:54, 16 March 2013 editBeland (talk | contribs)Autopatrolled, Administrators236,622 edits merge intro to Database← Previous edit |
Latest revision as of 15:20, 1 February 2023 edit undoQwerfjkl (talk | contribs)Extended confirmed users, Page movers, Rollbackers212,880 editsm Converting {{Wikidata redirect}} to {{R with Wikidata item}}. {{Wikidata redirect}} should only be used on soft redirects.Tag: PAWS [2.1] |
(13 intermediate revisions by 9 users not shown) |
|
|
#REDIRECT ] |
|
{{under construction|comment=Please see ] for coverage of this topic; merge in progress.}} |
|
|
|
|
|
|
|
{{Redirect category shell|1= |
|
==Functionality provided== |
|
|
|
{{R from merge}} |
|
|
|
|
|
{{R to section}} |
|
Features commonly offered by database management systems include: |
|
|
|
{{R with Wikidata item}} |
|
|
|
|
|
}} |
|
;Backup and replication : Copies of attributes need to be made regularly in case primary disks or other equipment fail. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. A DBMS usually provides utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers so that the information remains consistent throughout the database system, and users cannot tell which server in the DBMS they are using, the system is said to exhibit replication transparency.
|
|
|
|
|
;Rule enforcement : Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign. |
|
|
|
|
|
; Computation : Common computations requested on attributes are counting, summing, averaging, sorting, grouping, cross-referencing, and so on. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations. |
|
|
|
|
|
; Change and access logging : This describes who accessed which attributes, what was changed, and when it was changed. Logging services allow for a ] later by keeping a record of access occurrences and changes. Sometimes application-level code is used to record changes rather than leaving this to the database. |
|
|
|
|
|
; Development and monitoring support : A DBMS typically aims to provide a convenient environment in which to develop and later maintain an application built around its respective database type. A DBMS either provides such tools or allows integration with external tools. Examples of such tools relate to database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.
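
Three of the features listed above (rule enforcement, computation, and change logging) can be illustrated with a minimal sketch. It assumes Python's standard <code>sqlite3</code> module as a stand-in DBMS; the table, index, and trigger names are hypothetical and chosen only to mirror the car and engine example. The rule is expressed as a separate unique index so that it can later be dropped without redesigning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE engines (
        engine_number TEXT PRIMARY KEY,
        car_id        INTEGER NOT NULL,
        power_kw      REAL NOT NULL
    );
    -- Rule enforcement: at most one engine per car (hypothetical rule name).
    CREATE UNIQUE INDEX one_engine_per_car ON engines (car_id);

    CREATE TABLE change_log (
        logged_at     TEXT DEFAULT CURRENT_TIMESTAMP,
        engine_number TEXT,
        old_power_kw  REAL,
        new_power_kw  REAL
    );
    -- Change logging: record what was changed and when.
    CREATE TRIGGER log_power_change AFTER UPDATE OF power_kw ON engines
    BEGIN
        INSERT INTO change_log (engine_number, old_power_kw, new_power_kw)
        VALUES (OLD.engine_number, OLD.power_kw, NEW.power_kw);
    END;
""")

conn.executemany(
    "INSERT INTO engines VALUES (?, ?, ?)",
    [("E-100", 1, 90.0), ("E-200", 2, 110.0)],
)

# A second engine for car 1 violates the rule, so the DBMS rejects it.
try:
    conn.execute("INSERT INTO engines VALUES ('E-300', 1, 75.0)")
    second_engine_rejected = False
except sqlite3.IntegrityError:
    second_engine_rejected = True

# Computation: counting and averaging are delegated to the DBMS.
engine_count, average_power = conn.execute(
    "SELECT COUNT(*), AVG(power_kw) FROM engines"
).fetchone()

# This update fires the trigger, which writes one row to change_log.
conn.execute("UPDATE engines SET power_kw = 95.0 WHERE engine_number = 'E-100'")
logged_changes = conn.execute(
    "SELECT engine_number, old_power_kw, new_power_kw FROM change_log"
).fetchall()

# Hybrid cars arrive: the rule is removed without changing the table layout.
conn.execute("DROP INDEX one_engine_per_car")
conn.execute("INSERT INTO engines VALUES ('E-300', 1, 75.0)")
engines_for_car_1 = conn.execute(
    "SELECT COUNT(*) FROM engines WHERE car_id = 1"
).fetchone()[0]
```

The trigger here records what changed and when, but not who made the change; SQLite has no notion of user accounts, so that part of an audit record would have to come from the application or from a DBMS that tracks sessions.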
|
|
|
|
|
===Components=== |
|
|
|
|
|
DBMS architecture specifies its components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are major DBMS components:
|
|
|
|
|
:An external interface can be either a '']'' (e.g., typically for a database administrator), or an '']'' (API) used for communication between an application program and the DBMS. |
|
|
*'''Database language engines''' (or '''processors''') - Most operations upon databases are performed through expressions in database languages (see above). Languages exist for data definition, data manipulation and queries (e.g., SQL), as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS through proper interfaces. A language engine processes the language expressions (by a compiler or language interpreter) to extract the intended database operations from the expressions in a form that the DBMS can execute.
|
|
|
|
|
*''']''' - Performs ] on every query to choose for it the most efficient '']'' (a partial order (tree) of operations) to be executed to compute the query result. |
|
|
*''']''' - Performs the received database operations on the database objects, typically at their higher-level representation. |
|
|
*'''Storage engine''' - Translates the operations to low-level operations on the storage ]s. In some references the storage engine is viewed as part of the database engine.
|
|
*'''DBMS management and operation component''' - Comprises many components that deal with all the DBMS management and operational aspects like performance monitoring and tuning, backup and restore, recovery from failure, security management and monitoring, change management, database storage allocation and database storage layout monitoring, etc. |
|
|
* '''Application generation subsystem''' contains facilities to help users develop transaction-intensive applications. It usually requires that the user perform a detailed series of tasks to process a transaction. It facilitates easy-to-use data entry screens, programming languages, and interfaces. |
|
|
|
|
|
* ]s |
|
|
* Graphics component for producing graphs and charts. |
|
|
|
|
|
==Database storage== |
|
|
{{Main|Computer data storage}} |
|
|
|
|
|
Database storage is the container of the physical materialization of a database. It comprises the ''internal'' (physical) ''level'' in the database architecture. It also contains all the information needed (e.g., ], "data about the data", and internal ]s) to reconstruct the ''conceptual level'' and ''external level'' from the internal level when needed. It is generally the responsibility of the ]. Though typically accessed by a DBMS through the underlying ] (and often utilizing the operating system's ]s as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in storage in structures that look completely different from the way the data look at the conceptual and external levels, but in ways that attempt to optimize the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).
|
|
|
|
|
In principle the database storage can be viewed as a ] ], where every bit of data has its unique address in this address space. In practice, only a very small percentage of addresses are kept as initial reference points (which also requires storage); most data is accessed by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations. |
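
The displacement calculation described above can be sketched as follows. This is an illustration only: it assumes a hypothetical fixed-size record format and a 16-unit header, and it measures displacement in bytes rather than bits for simplicity. Only the base address is kept as a reference point; the address of every record is computed from it.

```python
import struct

# Hypothetical fixed-size record: 4-byte integer id plus 8-byte float balance.
RECORD = struct.Struct("<id")   # "<" disables padding, so the size is 12 bytes
HEADER_SIZE = 16                # assumed size of a header preceding the records


def record_offset(base: int, index: int) -> int:
    """Address of a record = reference point + displacement."""
    return base + index * RECORD.size


def read_record(storage, base: int, index: int):
    """Fetch one record by computing its address instead of storing it."""
    return RECORD.unpack_from(storage, record_offset(base, index))


# Lay the records out contiguously after the header.
storage = bytearray(HEADER_SIZE)
for record in [(1, 10.5), (2, 20.0), (3, 99.25)]:
    storage += RECORD.pack(*record)

third_record = read_record(storage, HEADER_SIZE, 2)
```

Variable-length records cannot be located by multiplication alone, which is one reason pointer-based access paths are used alongside displacement calculations.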
|
|
|
|
|
Some DBMSs support specifying which ] was used to store data, so multiple encodings can be used in the same database.
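
As a rough illustration of why the encoding must be recorded with the data, the following sketch stores the same text under two encodings. The <code>StoredText</code> type and the <code>store</code> and <code>load</code> helpers are hypothetical and do not correspond to any particular DBMS.

```python
from typing import NamedTuple


class StoredText(NamedTuple):
    """Raw bytes together with the name of the encoding that produced them."""
    encoding: str
    raw: bytes


def store(text: str, encoding: str) -> StoredText:
    return StoredText(encoding, text.encode(encoding))


def load(stored: StoredText) -> str:
    # Decoding is only reliable because the encoding was kept with the data.
    return stored.raw.decode(stored.encoding)


name = "Zoë"
as_utf8 = store(name, "utf-8")      # "ë" takes two bytes
as_latin1 = store(name, "latin-1")  # "ë" takes one byte

# Decoding with the wrong encoding yields different text.
misread = as_utf8.raw.decode("latin-1")
```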
|
|
|
|
|
===Database storage layout=== |
|
|
|
|
|
Database bits are laid out in storage in data structures and groupings that can take advantage of both known effective algorithms to retrieve and manipulate them and the storage's own properties. Typically the storage itself is designed to meet the requirements of various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout methods.
|
|
|
|
|
====Database storage hierarchy==== |
|
|
|
|
|
A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers, most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) that are being processed or manipulated reside inside a processor, possibly in ]. These data are read from and written to memory, typically through a computer ] (so far typically volatile storage components). Computer memory exchanges data with external storage, typically through standard storage interfaces or networks (e.g., ], ]). A ], a common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) ], which is connected (again via standard interfaces) to drives, possibly with different speeds, like ]s{{disambiguation needed|date=February 2012}} and magnetic ]s (non-volatile). The drives may be connected to ]s, on which typically the least active parts of a large database, or database backup generations, may reside.
|
|
|
|
|
Typically, storage speed correlates with price, and the faster storage is usually volatile.
|
|
|
|
|
====Data structures==== |
|
|
{{Main|Database storage structures}} |
|
|
{{Expand section|date=June 2011}} |
|
|
|
|
|
A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated efficiently. Data manipulation may include insertion, deletion, updating and retrieval in various modes. A given data structure type may be very effective for certain operations and very ineffective for others. A data structure type is selected during DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In some DBMSs database administrators have the flexibility to select among options of data structures to contain user data for performance reasons. Sometimes the data structures have selectable parameters to tune the database performance.
|
|
|
|
|
Databases may store data in many data structure types.<ref name="Physical Database Design">{{harvnb|Lightstone|Teorey|Nadeau|2007}}</ref> Common examples are the following: |
|
|
|
|
|
* ordered/unordered ] |
|
|
* ]s |
|
|
* ]s |
|
|
* ] |
|
|
* ] |
|
|
|
|
|
In contrast to conventional row-orientation, relational databases can also be ] or ] in the way they store data in these structures. |
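
The difference between row orientation and one alternative can be sketched in a few lines. The example assumes column orientation as the alternative and uses hypothetical employee records; it is meant only to show how the same data can be grouped differently, not how any particular DBMS stores it.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"id": 1, "name": "Ada", "salary": 5200},
    {"id": 2, "name": "Grace", "salary": 6100},
    {"id": 3, "name": "Edgar", "salary": 4800},
]


def to_columns(records):
    """Regroup the same data so that all values of one field are adjacent."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Summing one field must visit every record in the row layout...
total_row_oriented = sum(row["salary"] for row in rows)
# ...but reads a single contiguous list in the column layout.
total_column_oriented = sum(columns["salary"])
```

Both layouts give the same answer. What differs is how much unrelated data has to be read to compute it, which is why the choice depends on the expected operations.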
|
|
|
|
|
====Database indexing==== |
|
|
{{Main|Index (database)}} |
|
|
|
|
|
] is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time or cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a ] with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (for example, an employee database could be indexed by last name and by hire date).
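
The simplest form of index described above can be sketched as a sorted list of (value, position) pairs searched by bisection. The employee records and helper names below are hypothetical, and Python's standard <code>bisect</code> module is assumed as the search routine.

```python
from bisect import bisect_left

# Hypothetical employee records: (last_name, hire_date), in arrival order.
employees = [
    ("Moreno", "2019-03-04"),
    ("Adams", "2021-11-15"),
    ("Keller", "2017-06-30"),
    ("Adams", "2015-01-12"),
]


def build_index(records, field):
    """Sorted list of (value, position) pairs for one field."""
    return sorted((record[field], pos) for pos, record in enumerate(records))


def lookup(index, key):
    """Return positions of all records whose indexed value equals key."""
    i = bisect_left(index, (key,))   # first entry whose value is >= key
    positions = []
    while i < len(index) and index[i][0] == key:
        positions.append(index[i][1])
        i += 1
    return positions


# The same data can carry several indexes.
by_last_name = build_index(employees, 0)
by_hire_date = build_index(employees, 1)

adams_positions = lookup(by_last_name, "Adams")
hired_2017 = lookup(by_hire_date, "2017-06-30")
```

Each lookup inspects a logarithmic number of index entries rather than every record. Production systems more commonly use tree-based or hash-based structures built on the same idea.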
|
|
|
|
|
Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves. |
|
|
|
|
|
Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data. |
|
|
|
|
|
Indexes can speed up data access, but they consume space in the database and must be updated each time the data is altered, which slows data maintenance. The balance between these two effects determines whether a given index is worth its cost.
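
The claim that indexes affect performance but not results can be checked with a small experiment. The sketch below assumes Python's standard <code>sqlite3</code> module and a hypothetical <code>employees</code> table. It asks SQLite's optimizer for its plan before and after an index is added; the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is examined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees (last_name, hire_date) VALUES (?, ?)",
    [(f"name{i}", "2020-01-01") for i in range(1000)],
)

QUERY = "SELECT id, hire_date FROM employees WHERE last_name = ?"


def plan(connection, sql, params):
    """Return the optimizer's chosen strategy as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column holds the detail


plan_without_index = plan(conn, QUERY, ("name42",))
result_without_index = conn.execute(QUERY, ("name42",)).fetchall()

# Adding the index requires no change to the query text.
conn.execute("CREATE INDEX idx_employees_last_name ON employees (last_name)")

plan_with_index = plan(conn, QUERY, ("name42",))
result_with_index = conn.execute(QUERY, ("name42",)).fetchall()
```

The two result lists are identical, while the second plan refers to the new index instead of scanning the whole table.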
|
|
|
|
|
====Database data clustering==== |
|
|
|
|
|
In many cases a substantial performance improvement is gained if different types of database objects that are usually utilized together are laid out in storage in proximity, that is, ''clustered''. This usually allows the needed related objects to be retrieved from storage in a minimum number of input operations (each sometimes substantially time consuming). Even for in-memory databases, clustering provides a performance advantage due to the common utilization of large caches for input-output operations in memory, with similar resulting behavior.
|
|
|
|
|
For example, it may be beneficial to cluster a record of an ''item'' in stock with all its respective ''order'' records. The decision of whether to cluster certain objects depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc.
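
The effect of clustering on the number of input operations can be estimated with a toy model. The sketch below assumes storage is read in fixed-size pages of a hypothetical capacity of four records, and counts how many pages must be read to fetch one item together with its orders.

```python
PAGE_CAPACITY = 4   # hypothetical number of records per storage page


def paginate(records, capacity=PAGE_CAPACITY):
    """Split a sequence of records into consecutive pages."""
    return [records[i:i + capacity] for i in range(0, len(records), capacity)]


def pages_touched(pages, item_id):
    """Number of pages that hold at least one record for the given item."""
    return sum(1 for page in pages if any(rec[1] == item_id for rec in page))


# Records are (kind, item_id): four items, each with three orders.
items = [("item", i) for i in range(4)]
orders = [("order", i % 4) for i in range(12)]

# Unclustered: records are stored in arrival order.
unclustered = paginate(items + orders)
# Clustered: each item is stored next to its own orders.
clustered = paginate(sorted(items + orders, key=lambda rec: rec[1]))

reads_unclustered = pages_touched(unclustered, 2)
reads_clustered = pages_touched(clustered, 2)
```

In this model, fetching item 2 with its orders reads four pages from the unclustered layout and one page from the clustered layout. Real gains depend on the factors listed above, such as object sizes and cache sizes.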
|
|
|
|
|
====Database materialized views==== |
|
|
{{Main|Materialized view}} |
|
|
|
|
|
Often storage redundancy is employed to increase performance. A common example is storing '']s'', which consist of frequently needed ''external views'' or query results. Storing such views saves the expense of recomputing them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with their original updated database data, and the cost of storage redundancy.
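
The trade-off can be sketched with a stored query result that is refreshed by hand. SQLite, used here through Python's standard <code>sqlite3</code> module, has no built-in materialized views, so the sketch simulates one with an ordinary table; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (item TEXT, quantity INTEGER);
    INSERT INTO orders VALUES ('bolt', 10), ('bolt', 5), ('nut', 7);
""")


def refresh_item_totals(connection):
    """Recompute the stored query result from the base table."""
    connection.executescript("""
        DROP TABLE IF EXISTS item_totals;
        CREATE TABLE item_totals AS
            SELECT item, SUM(quantity) AS total FROM orders GROUP BY item;
    """)


def stored_total(connection, item):
    """Read the precomputed total instead of re-running the aggregation."""
    return connection.execute(
        "SELECT total FROM item_totals WHERE item = ?", (item,)
    ).fetchone()[0]


refresh_item_totals(conn)
total_before = stored_total(conn, "bolt")

# The base data changes, but the stored result is not updated automatically.
conn.execute("INSERT INTO orders VALUES ('bolt', 20)")
conn.commit()
stale_total = stored_total(conn, "bolt")

# Synchronizing the stored result costs another full recomputation.
refresh_item_totals(conn)
fresh_total = stored_total(conn, "bolt")
```

Between the insert and the refresh, readers of the stored result see an outdated total. Systems with native materialized views differ in when and how they perform the refresh.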
|
|
|
|
|
====Database and database object replication==== |
|
|
{{Main|Database replication}} |
|
|
:See also '']'' below |
|
|
|
|
|
Occasionally a database employs storage redundancy by database object replication (with one or more copies) to increase data availability, both to improve the performance of simultaneous accesses by multiple end users to the same database object and to provide resiliency in case of partial failure of a distributed database. Updates of a replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.
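
The requirement that updates be synchronized across copies can be illustrated with a deliberately simplified model. The <code>ReplicatedObject</code> class and site names below are hypothetical; the sketch assumes synchronous propagation within a single process and ignores networking, conflicts, and recovery.

```python
class ReplicatedObject:
    """Keeps identical copies of one database object at several sites."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Synchronous update: every copy is changed before the write returns.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key, failed_sites=()):
        # Any surviving copy can serve the read.
        for site, copy in self.copies.items():
            if site not in failed_sites:
                return copy[key]
        raise RuntimeError("no replica available")


stock = ReplicatedObject(["site_a", "site_b", "site_c"])
stock.write("bolt", 120)

# Two of the three sites fail; the remaining copy still answers.
value_during_failure = stock.read("bolt", failed_sites={"site_a", "site_b"})
```

Because every write touches every copy, availability for reads is bought at the price of slower writes. Real systems often relax the synchronization to reduce that cost.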
|