Database management system: Difference between revisions

Revision as of 06:54, 16 March 2013 editBeland (talk \| contribs)Autopatrolled, Administrators236,622 edits merge intro to Database ← Previous edit		Revision as of 07:11, 16 March 2013 edit undoBeland (talk \| contribs)Autopatrolled, Administrators236,622 edits →Database storage: merge to Database engine Next edit →
Line 32:		Line 32:

	==Database storage==		==Database storage==
	{{Main\|Computer data storage}}		{{Main\|Computer data storage\|Database engine}}

	Database storage is the container of the physical materialization of a database. It comprises the ''internal'' (physical) ''level'' in the database architecture. It also contains all the information needed (e.g., [[metadata]], "data about the data", and internal [[data structure]]s) to reconstruct the ''conceptual level'' and ''external level'' from the internal level when needed. It is generally the responsibility of the [[database engine]]. Though typically accessed by a DBMS through the underlying [[operating system]] (and often utilizing the operating systems' [[file system]]s as intermediates for storage layout), storage properties and configuration setting are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize (the best possible) these levels' reconstruction when needed by users and programs, as well as for computing additional types of needed information from the data (e.g., when querying the database).		Database storage is the container of the physical materialization of a database. It comprises the ''internal'' (physical) ''level'' in the database architecture. It also contains all the information needed (e.g., [[metadata]], "data about the data", and internal [[data structure]]s) to reconstruct the ''conceptual level'' and ''external level'' from the internal level when needed. Putting data into permanent storage is generally the responsibility of the [[database engine]] a.k.a. "storage engine". 
Though typically accessed by a DBMS through the underlying [[operating system]] (and often utilizing the operating systems' [[file system]]s as intermediates for storage layout), storage properties and configuration setting are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize (the best possible) these levels' reconstruction when needed by users and programs, as well as for computing additional types of needed information from the data (e.g., when querying the database).

	In principle the database storage can be viewed as a ] ], where every bit of data has its unique address in this address space. In practice, only a very small percentage of addresses are kept as initial reference points (which also requires storage); most data is accessed by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.
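The displacement idea can be illustrated with a minimal Python sketch. It assumes a hypothetical page with a 16-byte header and fixed-width records, and it counts displacements in bytes rather than bits for convenience; the names `RECORD`, `HEADER_SIZE`, and `record_offset` are invented for illustration and do not come from any particular DBMS.

```python
import struct

# Hypothetical fixed-width record: 4-byte id, 8-byte balance, little-endian, no padding.
RECORD = struct.Struct("<iq")
HEADER_SIZE = 16  # assumed size of the page header that precedes the records


def record_offset(slot: int) -> int:
    """Displacement of a record from the start of the page (the reference point)."""
    return HEADER_SIZE + slot * RECORD.size


def write_record(page: bytearray, slot: int, rec_id: int, balance: int) -> None:
    """Store a record at the position computed from its slot number."""
    RECORD.pack_into(page, record_offset(slot), rec_id, balance)


def read_record(page: bytearray, slot: int):
    """Fetch a record by computing its displacement; no per-record address is stored."""
    return RECORD.unpack_from(page, record_offset(slot))


page = bytearray(4096)  # only the page start needs to be remembered
write_record(page, 0, 101, 2500)
write_record(page, 3, 104, 990)
```

Only the page's starting address has to be kept as a reference point; every record position follows from arithmetic on the slot number.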

	Some DBMS support specifying which [[character encoding]] was used to store data, so multiple encodings can be used in the same database.		Some DBMS support specifying which [[character encoding]] was used to store data, so multiple encodings can be used in the same database.

			Various low-level [[database storage structures]] are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as [[indexing]] may be used to improve performance. Conventional storage is row-oriented, but there are also [[column-oriented]] and [[correlation database]]s.
	===Database storage layout===

	Database bits are laid out in storage in data structures and groupings that can take advantage of both known effective algorithms to retrieve and manipulate them and the storage's own properties. Typically the storage itself is designed to meet the requirements of various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout methods.

	====Database storage hierarchy====

	A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) that are being processed/manipulated reside inside a processor, possibly in ]. These data are being read from/written to memory, typically through a computer ] (so far typically volatile storage components). Computer memory is communicating data (transferred to/from) external storage, typically through standard storage interfaces or networks (e.g., ], ]). A ], a common external storage unit, typically has storage hierarchy of it own, from a fast cache, typically consisting of (volatile and fast) ], which is connected (again via standard interfaces) to drives, possibly with different speeds, like ]s{{disambiguation needed\|date=February 2012}} and magnetic ]s (non-volatile). The drives may be connected to ]s, on which typically the least active parts of a large database may reside, or database backup generations.

	Currently, storage speed typically correlates with price, and the faster storage is typically volatile.

	====Data structures====
	{{Main\|Database storage structures}}
	{{Expand section\|date=June 2011}}

	A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated efficiently. Data manipulation may include insertion, deletion, updating, and retrieval in various modes. A given data structure type may be very effective for certain operations and very ineffective for others. A data structure type is selected during DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In some DBMSs, database administrators have the flexibility to select among data structure options to contain user data for performance reasons. Sometimes the data structures have selectable parameters to tune database performance.

	Databases may store data in many data structure types.<ref name="Physical Database Design">{{harvnb\|Lightstone\|Teorey\|Nadeau\|2007}}</ref> Common examples are the following:

	* ordered/unordered ]
	* ]s
	* ]s
	* ]
	* ]

	In contrast to conventional row-orientation, relational databases can also be [[column-oriented]] or [[correlation database|correlational]] in the way they store data in these structures.

	====Database indexing====
	{{Main\|Index (database)}}

	[[Index (database)|Indexing]] is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a [[binary search]] with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by last name and hire date).
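The "sorted list with an adjacent reference" form of index can be sketched in a few lines of Python. This is an illustration only: the `employees` rows are invented, list positions stand in for storage locations, and `build_index` and `lookup` are hypothetical helper names.

```python
from bisect import bisect_left

# Hypothetical table: the list position plays the role of a storage location.
employees = [
    {"last_name": "Ng", "hired": "2011-04-01"},
    {"last_name": "Adams", "hired": "2009-09-15"},
    {"last_name": "Silva", "hired": "2012-01-20"},
    {"last_name": "Adams", "hired": "2012-06-30"},
]


def build_index(rows, column):
    """Return a sorted list of (value, row position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def lookup(index, value):
    """Binary-search the index and return the positions of all rows with this value."""
    i = bisect_left(index, (value,))  # first entry whose value is >= the search value
    positions = []
    while i < len(index) and index[i][0] == value:
        positions.append(index[i][1])
        i += 1
    return positions


# The same data can carry several indexes, here one per column.
by_last_name = build_index(employees, "last_name")
by_hired = build_index(employees, "hired")
```

A lookup such as `lookup(by_last_name, "Adams")` inspects only a logarithmic number of index entries before reading the matching rows, instead of scanning the whole table.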

	Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves.

	Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data.

	Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost.
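The claim that indexes change performance but not results can be checked with a small experiment. The sketch below uses Python's standard `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` statement as one concrete DBMS; the table, index name, and data are invented for illustration, and the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, hired TEXT)")
conn.executemany(
    "INSERT INTO employee (last_name, hired) VALUES (?, ?)",
    [("Ng", "2011-04-01"), ("Adams", "2009-09-15"), ("Silva", "2012-01-20")],
)

QUERY = "SELECT id, last_name, hired FROM employee WHERE last_name = ?"


def plan(sql, params):
    """Return the optimizer's chosen plan as text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, ("Adams",))
rows_before = conn.execute(QUERY, ("Adams",)).fetchall()

# Adding the index requires no change to the query text used by the application.
conn.execute("CREATE INDEX idx_employee_last_name ON employee (last_name)")

plan_after = plan(QUERY, ("Adams",))
rows_after = conn.execute(QUERY, ("Adams",)).fetchall()

uses_index_before = "idx_employee_last_name" in plan_before
uses_index_after = "idx_employee_last_name" in plan_after
```

The query returns the same rows before and after; only the plan chosen by the optimizer differs. The cost side of the trade-off is that every later insert or update on `employee` must also maintain the index.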

	====Database data clustering====

	In many cases a substantial performance improvement is gained if different types of database objects that are usually used together are laid out in storage in proximity, that is, ''clustered''. This usually allows the needed related objects to be retrieved from storage in a minimum number of input operations (each sometimes substantially time-consuming). Even for in-memory databases, clustering provides a performance advantage due to the common use of large caches for input-output operations in memory, with similar resulting behavior.

	For example, it may be beneficial to cluster a record of an ''item'' in stock with all its respective ''order'' records. The decision of whether to cluster certain objects depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc.
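The effect of clustering on the number of input operations can be illustrated with a toy model. The sketch assumes a hypothetical page size of four records and counts how many distinct pages must be read to fetch one item together with its orders; all record names are invented.

```python
PAGE_SIZE = 4  # assumed number of records per storage page


def pages_touched(layout, wanted):
    """Count the distinct pages that must be read to fetch every record in `wanted`."""
    return len({pos // PAGE_SIZE for pos, rec in enumerate(layout) if rec in wanted})


# Hypothetical records: one item and its three orders, among unrelated records.
related = {"item:7", "order:7a", "order:7b", "order:7c"}
filler = [f"x{i}" for i in range(12)]

# Unclustered: the related records are scattered across storage.
scattered = filler[:]
for offset, rec in zip((0, 5, 10, 15), sorted(related)):
    scattered.insert(offset, rec)

# Clustered: the related records are stored next to each other.
clustered = sorted(related) + filler

reads_scattered = pages_touched(scattered, related)
reads_clustered = pages_touched(clustered, related)
```

In this model the scattered layout touches one page per related record, while the clustered layout needs a single page read for the item and all of its orders.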

	====Database materialized views====		====Database materialized views====
Line 90:		Line 47:
	====Database and database object replication====		====Database and database object replication====
	{{Main\|Database replication}}		{{Main\|Database replication}}
	:See also '']'' below

	Occasionally a database employs storage redundancy by database objects replication (with one or more copies) to increase data availability (both to improve performance of simultaneous multiple end-user accesses to a same database object, and to provide resiliency in a case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.		Occasionally a database employs storage redundancy by database objects replication (with one or more copies) to increase data availability (both to improve performance of simultaneous multiple end-user accesses to a same database object, and to provide resiliency in a case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.

Revision as of 07:11, 16 March 2013

Contributor note: Please see Database for coverage of this topic; merge in progress.

Functionality provided

Features commonly offered by database management systems include:

Backup and replication: Copies of attributes need to be made regularly in case primary disks or other equipment fail. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers so that the information remains consistent throughout the database system, and users cannot tell which server in the DBMS they are using, the system is said to exhibit replication transparency.

Rule enforcement: Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, a rule may state that each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, the DBMS should deny the request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally, such rules can be added and removed as needed without significant data layout redesign.

Computation: Common computations requested on attributes are counting, summing, averaging, sorting, grouping, cross-referencing, and so on. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations.

Change and access logging: This describes who accessed which attributes, what was changed, and when it was changed. Logging services allow for a database audit later by keeping a record of access occurrences and changes. Sometimes application-level code is used to record changes rather than leaving this to the database.

Development and monitoring support: A DBMS typically aims to provide a convenient environment in which to develop, and later maintain, an application built around its respective database type. A DBMS either provides such tools or allows integration with external ones. Examples of such tools cover database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.
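Three of the features listed above (rule enforcement, computation, and change logging) can be sketched together using Python's standard `sqlite3` module as one example DBMS. The schema, the trigger name, and the data are invented for illustration; the one-engine-per-car rule from the text is expressed here as a `UNIQUE` constraint, which is only one possible way to state it. The log in this sketch records changes only, not who made them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE engine (
        engine_number TEXT PRIMARY KEY,
        car_id        INTEGER NOT NULL UNIQUE,  -- rule: at most one engine per car
        power_kw      INTEGER NOT NULL
    );
    CREATE TABLE change_log (
        changed_at    TEXT DEFAULT CURRENT_TIMESTAMP,
        engine_number TEXT,
        action        TEXT
    );
    -- Change logging: record every successful insert into engine.
    CREATE TRIGGER log_engine_insert AFTER INSERT ON engine
    BEGIN
        INSERT INTO change_log (engine_number, action)
        VALUES (NEW.engine_number, 'insert');
    END;
""")

conn.execute("INSERT INTO engine VALUES ('E-100', 1, 90)")
conn.execute("INSERT INTO engine VALUES ('E-200', 2, 110)")

# Rule enforcement: a second engine for car 1 is rejected by the DBMS itself.
try:
    conn.execute("INSERT INTO engine VALUES ('E-300', 1, 75)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Computation: counting and averaging are delegated to the DBMS.
count, avg_power = conn.execute(
    "SELECT COUNT(*), AVG(power_kw) FROM engine"
).fetchone()

logged = [
    row[0]
    for row in conn.execute("SELECT engine_number FROM change_log ORDER BY engine_number")
]
```

If the rule later has to change, for instance to allow two engines for hybrid cars, the constraint is a schema-level declaration and the application code that inserts rows can stay as it is.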

Components

DBMS architecture specifies its components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are major DBMS components:

An external interface can be either a user interface (e.g., typically for a database administrator), or an application programming interface (API) used for communication between an application program and the DBMS.

Database language engines (or processors) - Most operations upon databases are performed through expressions in database languages (see above). Languages exist for data definition, data manipulation, and queries (e.g., SQL), as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS through proper interfaces. A language engine processes the language expressions (by a compiler or language interpreter) to extract the intended database operations from the expression in a form that can be executed by the DBMS.

Query optimizer - Performs query optimization on every query to choose the most efficient query plan (a partial order (tree) of operations) to be executed to compute the query result.
Database engine - Performs the received database operations on the database objects, typically at their higher-level representation.
Storage engine - Translates the operations to low-level operations on the storage bits. In some references the storage engine is viewed as part of the database engine.
DBMS management and operation component - Comprises many components that deal with all the DBMS management and operational aspects, such as performance monitoring and tuning, backup and restore, recovery from failure, security management and monitoring, change management, database storage allocation, and database storage layout monitoring.
Application generation subsystem - Contains facilities to help users develop transaction-intensive applications. It usually requires that the user perform a detailed series of tasks to process a transaction. It facilitates easy-to-use data entry screens, programming languages, and interfaces.

Database logs
Graphics component for producing graphs and charts.

Database storage

Main articles: Computer data storage and Database engine

Database storage is the container of the physical materialization of a database. It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the conceptual level and external level from the internal level when needed. Putting data into permanent storage is generally the responsibility of the database engine, also known as the "storage engine". Though storage is typically accessed by a DBMS through the underlying operating system (often using the operating system's file systems as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize, as far as possible, the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).

Some DBMSs support specifying which character encoding was used to store data, so multiple encodings can be used in the same database.

Various low-level database storage structures are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as indexing may be used to improve performance. Conventional storage is row-oriented, but there are also column-oriented and correlation databases.
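The difference between row-oriented and column-oriented serialization can be shown with a toy table. This is a simplified sketch: the three-row table is invented, and a flat Python list stands in for the storage medium.

```python
# Hypothetical three-row table with columns (id, name, salary).
rows = [
    (1, "Adams", 5200),
    (2, "Ng", 6100),
    (3, "Silva", 4800),
]


def to_row_oriented(table):
    """Serialize record by record: all fields of row 1, then row 2, and so on."""
    return [field for record in table for field in record]


def to_column_oriented(table):
    """Serialize column by column: all ids, then all names, then all salaries."""
    return [field for column in zip(*table) for field in column]


row_layout = to_row_oriented(rows)
column_layout = to_column_oriented(rows)

# In the column layout the salaries are contiguous, so an aggregate over one
# column reads a single slice instead of touching every record.
n = len(rows)
total_salary = sum(column_layout[2 * n : 3 * n])
```

The row layout keeps each record together, which suits reading or writing whole records; the column layout keeps each attribute together, which suits scans and aggregates over a few columns.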

Database materialized views

Main article: Materialized view

Often storage redundancy is employed to increase performance. A common example is storing materialized views, which consist of frequently needed external views or query results. Storing such views avoids the expensive work of recomputing them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with the underlying database data, and the cost of storage redundancy.
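The trade-off described above can be sketched with Python's standard `sqlite3` module. SQLite is used here only as a convenient example and, to the best of our knowledge, has no built-in materialized views, so the sketch emulates one with an ordinary table that stores a query result and is refreshed explicitly. The table names and the `refresh` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, amount INTEGER);
    INSERT INTO sale VALUES ('north', 100), ('north', 50), ('south', 70);
    -- The stored query result plays the role of a materialized view.
    CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sale GROUP BY region;
""")


def refresh(conn):
    """Recompute the stored result from the base table (a full refresh)."""
    with conn:
        conn.execute("DELETE FROM sales_by_region")
        conn.execute(
            "INSERT INTO sales_by_region "
            "SELECT region, SUM(amount) FROM sale GROUP BY region"
        )


def totals(conn):
    """Read the stored totals; no aggregation is performed at read time."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))


before = totals(conn)
conn.execute("INSERT INTO sale VALUES ('south', 30)")
stale = totals(conn)   # the stored copy has not been synchronized yet
refresh(conn)
fresh = totals(conn)
```

Reads from `sales_by_region` are cheap because the sums are already stored, but the copy is out of date until `refresh` runs, which is the synchronization overhead the paragraph mentions.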

Database and database object replication

Main article: Database replication

Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability (both to improve the performance of simultaneous multiple end-user accesses to the same database object, and to provide resiliency in the case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.
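A minimal sketch of object replication follows, assuming synchronous updates and that every replica is reachable when a write occurs. The classes `Replica` and `ReplicatedStore` are hypothetical and ignore concerns a real system must handle, such as conflicting writes and recovery of a failed copy.

```python
class Replica:
    """One copy of the replicated objects, held as a key-value mapping."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True


class ReplicatedStore:
    """Hypothetical coordinator that keeps all copies synchronized."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key, value):
        # Synchronous replication: the update is applied to every copy.
        # Assumption: all replicas are reachable at write time.
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key):
        # Any available copy can serve the read.
        for replica in self.replicas:
            if replica.available:
                return replica.data[key]
        raise RuntimeError("no replica available")


primary, standby = Replica("primary"), Replica("standby")
store = ReplicatedStore([primary, standby])
store.write("item:7", 12)

primary.available = False  # simulate a partial failure
value_after_failure = store.read("item:7")
```

Because the write reached both copies, the read still succeeds after one copy becomes unavailable, which is the resiliency benefit described above. The cost is that every update is performed once per copy.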