|
|
|
{{refimprove|date=February 2013}} |
|
|
|
|
|
|
|
|
A '''database management system''' ('''DBMS''') is a set of programs that enables storing, modifying, and extracting information from a ]. It also provides users with tools to add, delete, access, modify, and analyze data stored in one location. A group of users can access the data with query and reporting tools that are part of the DBMS, or with application programs written specifically to access the data. DBMSs also provide methods for maintaining the integrity of stored data, managing security and user access, and recovering information if the system fails. The information in a database can be presented in a variety of formats. Most DBMSs include a report writer program that can output data in the form of a report, and many also include a graphics component that can output information as graphs and charts. Because databases and database management systems are essential to all areas of business, they must be carefully managed.
|
|
|
|
There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes. The following are examples of database applications: computerized library systems, flight reservation systems, and computerized parts inventory systems. |
|
|
|
|
|
|
|
|
|
A DBMS typically supports ]s, which are in effect high-level programming languages: dedicated database languages that considerably simplify writing database application programs. Database languages also simplify organizing the database, as well as retrieving and presenting information from it. A DBMS provides facilities for controlling ], enforcing ], managing ], and ] the database after failures and restoring it from backup files, as well as maintaining database ].
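The idea of a dedicated database language combining data definition, manipulation, and querying can be sketched with SQL, here driven from Python's built-in sqlite3 module (chosen only because it is self-contained; the table and column names are illustrative):

```python
import sqlite3

# In-memory database; sqlite3 ships with the Python standard library.
conn = sqlite3.connect(":memory:")

# Data definition: declare a table structure.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# Data manipulation: insert and update rows.
conn.execute("INSERT INTO employee (name, salary) VALUES ('Ada', 52000)")
conn.execute("UPDATE employee SET salary = 55000 WHERE name = 'Ada'")

# Query: retrieve rows declaratively, without specifying an access path.
rows = conn.execute("SELECT name, salary FROM employee").fetchall()
print(rows)  # [('Ada', 55000.0)]
```

The SELECT states *what* data is wanted; the DBMS decides *how* to locate it.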
|
|
|
|
|
|
|
DBMSs can be categorized according to the ](s) that they support, such as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the ](s) that access the database, such as ] or ]; and performance trade-offs, such as maximum scale or maximum speed. Some DBMSs cover more than one entry in these categories, e.g., supporting multiple query languages. Database software typically supports the ] (ODBC) standard, which allows the database to integrate (to some extent) with other databases.
|
|
|
|
|
==Overview and terminology== |
|
|
|
|
|
A ] is an organised pool of logically related data. Data is stored within the data structures of the database. A DBMS is a suite of computer software providing the interface between users and a database or databases. A DBMS is a shell which surrounds a database or series of databases and through which all interactions with the database take place. The interactions catered for by most existing DBMSs fall into four main groups:
|
|
*Data definition. Defining new data structures for a database, removing data structures from the database, modifying the structure of existing data. |
|
|
*Data maintenance. Inserting new data into existing data structures, updating data in existing data structures, deleting data from existing data structures. |
|
|
*Data retrieval. Querying existing data by end-users and extracting data for use by application programs. |
|
|
*Data control. Creating and monitoring users of the database, restricting access to data in the database and monitoring the performance of databases. |
|
|
|
|
|
Both a database and its DBMS conform to the principles of a particular ].<ref>Tsitchizris, D. C. and F. H. Lochovsky (1982). ''Data Models.'' Englewood-Cliffs, Prentice-Hall.</ref> |
|
|
|
|
|
'''Database system''' refers collectively to the database model, database management system, and database.<ref>Beynon-Davies P. (2004). ''Database Systems'' 3rd Edition. Palgrave, Basingstoke, UK. ISBN 1-4039-1601-2</ref> |
|
|
|
|
|
Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually ] computers, with generous memory and ] disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most ]s. DBMSs may be built around a custom ] ] with built-in ] support, but modern DBMSs typically rely on a standard ] to provide these functions. {{Citation needed|date=April 2010}} |
|
|
|
|
|
Many databases have ] that accesses the database on behalf of end-users, without exposing the DBMS interface directly. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications' databases, and thus need additional knowledge and understanding of how DBMSs operate, as well as of the DBMSs' external interfaces and tuning parameters.
|
|
|
|
|
The development of a mature general-purpose DBMS typically takes several years and many person-years. DBMS developers typically update their products to follow and take advantage of progress in computer and storage technologies. Several DBMS products have been in ongoing development since the 1970s and 1980s. Since DBMSs comprise a significant ] ], computer and storage vendors often take DBMS requirements into account in their own development plans.
|
|
|
|
|
==Functionality provided== |
|
|
|
|
|
Features commonly offered by database management systems include: |
|
|
|
|
|
;Query ability : Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data, and update it according to the user's privileges on the data.
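The example question above can be posed as a declarative query; a minimal sketch using Python's bundled sqlite3 module (the table layout and sample data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (id INTEGER PRIMARY KEY, doors INTEGER, state TEXT, color TEXT)")
conn.executemany("INSERT INTO car (doors, state, color) VALUES (?, ?, ?)", [
    (2, "TX", "green"),
    (4, "TX", "green"),
    (2, "TX", "red"),
    (2, "CA", "green"),
    (2, "TX", "green"),
])

# "How many 2-door cars in Texas are green?" as a single declarative query:
(count,) = conn.execute(
    "SELECT COUNT(*) FROM car WHERE doors = 2 AND state = 'TX' AND color = 'green'"
).fetchone()
print(count)  # 2
```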
|
|
|
|
|
;Backup and replication : Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.
|
|
|
|
|
;Rule enforcement : Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign. |
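A minimal sketch of such rule enforcement, using a uniqueness constraint in SQLite (all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "each car has at most one engine" expressed as a uniqueness constraint.
conn.execute("CREATE TABLE engine (engine_number TEXT PRIMARY KEY, car_id INTEGER)")
conn.execute("CREATE UNIQUE INDEX one_engine_per_car ON engine (car_id)")

conn.execute("INSERT INTO engine VALUES ('E-100', 1)")
rejected = False
try:
    # A second engine for car 1 violates the rule; the DBMS denies the request.
    conn.execute("INSERT INTO engine VALUES ('E-200', 1)")
except sqlite3.IntegrityError:
    rejected = True

# When the rule changes (e.g., hybrid gas-electric cars), the constraint can
# be dropped without any redesign of the data layout:
conn.execute("DROP INDEX one_engine_per_car")
conn.execute("INSERT INTO engine VALUES ('E-200', 1)")  # now accepted
print(rejected)  # True
```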
|
|
|
|
|
; Security : For security reasons, it is desirable to limit who can see or change specific attributes or groups of attributes. This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschema. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data. If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. Application-level code may be needed to record enough information to do a ] later. |
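The subschema idea can be sketched with a view. SQLite, used below for convenience, has no user accounts, so this shows only the restricted-column view; in a multi-user DBMS the view would additionally be granted to, say, a payroll group (all names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY, name TEXT, salary REAL, medical_notes TEXT)""")
conn.execute(
    "INSERT INTO employee (name, salary, medical_notes) VALUES ('Ada', 52000, 'confidential')"
)

# A view acts as a subschema: payroll staff see pay data but not medical data.
conn.execute("CREATE VIEW payroll AS SELECT id, name, salary FROM employee")

row = conn.execute("SELECT * FROM payroll").fetchone()
print(row)  # (1, 'Ada', 52000.0) -- medical_notes is not exposed
```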
|
|
|
|
|
; Computation : Common computations requested on attributes are counting, summing, averaging, sorting, grouping, cross-referencing, and so on. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations. |
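A sketch of DBMS-supplied computation — counting, summing, averaging, and grouping performed in one query rather than in application code (sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("north", 10.0), ("north", 30.0), ("south", 5.0)])

# Counting, summing, averaging, and grouping are supplied by the DBMS itself:
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), AVG(amount) "
    "FROM sale GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 2, 40.0, 20.0), ('south', 1, 5.0, 5.0)]
```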
|
|
|
|
|
; Change and access logging : This describes who accessed which attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes. |
|
|
|
|
|
; Automated optimization : For frequently occurring usage patterns or requests, some DBMS can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected. |
|
|
|
|
|
==History== |
|
|
|
|
|
With the progress in technology in the areas of ], ], ] and ], the sizes, capabilities, and performance of databases and their respective DBMSs have grown by orders of magnitude.
|
|
|
|
|
The development of database technology can be divided into three eras based on ] or structure: ],<ref>{{citation | author = C. W. Bachmann | title = The Programmer as Navigator}}</ref> ]/], and post-relational. The two main early navigational data models were the ], epitomized by IBM's IMS system, and the ] model (]), implemented in a number of products such as ]. |
|
|
|
|
|
The ], first proposed in 1970 by ], departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model is made up of ledger-style tables, each used for a different type of entity. It was not until the mid-1980s that computing hardware became powerful enough to allow relational systems (DBMSs plus applications) to be widely deployed. By the early 1990s, however, relational systems were dominant for all large-scale data processing applications, and they remain dominant today (2012) except in niche areas. The dominant database language is the standard SQL for the relational model, which has influenced database languages for other data models.{{facts|date=March 2013}}
|
|
|
|
|
]s were invented in the 1980s to overcome the inconvenience of ], which led to the coining of the term "post-relational" but also development of hybrid ]s. |
|
|
|
|
|
The next generation of post-relational databases in the 2000s became known as ] databases, introducing fast ]s and ]s. A competing "next generation" known as ] databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs. |
|
|
|
|
|
===1960s Navigational DBMS=== |
|
|
] database model.]] |
|
|
{{further|Navigational database}} |
|
|
|
|
|
The introduction of the term ''database'' coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing. The ] cites{{citation needed|date=November 2011}} a 1962 technical report as the first to use the term "data-base." |
|
|
|
|
|
As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and ], author of one such product, the ] (IDS), founded the "Database Task Group" within ], the group responsible for the creation and standardization of ]. In 1971 they delivered their standard, which generally became known as the "Codasyl approach", and soon a number of commercial products based on this approach were made available.
|
|
|
|
|
The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network. When the database was first opened, the program was handed back a link to the first ] in the database, which also contained ]s to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the program to walk the entire data set and collect the matching results one by one. There was, essentially, no concept of "find" or "search". This may sound like a serious limitation today, but in an era when most data was stored on ] such operations were too expensive to contemplate anyway. Solutions were found to many of these problems. Prime Computer created a CODASYL-compliant DBMS based entirely on B-Trees that circumvented the record-by-record problem by providing alternate access paths. They also added a query language that was very straightforward. Further, there is no reason that relational normalization concepts cannot be applied to CODASYL databases; in the final tally, however, CODASYL was very complex and required significant training and effort to produce useful applications.
|
|
|
|
|
] also had their own DBMS system in 1968, known as ''IMS''. ] was a development of software written for the ] on the ]. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as ]s due to the way data was accessed, and Bachman's 1973 ] presentation was ''The Programmer as Navigator''. IMS is classified as a ]. ] and CINCOM's TOTAL database are classified as ]. |
|
|
|
|
|
===1970s relational DBMS=== |
|
|
] worked at ] in ], in one of their offshoot offices that was primarily involved in the development of ] systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking ''A Relational Model of Data for Large Shared Data Banks''.<ref>Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". In: ''Communications of the ACM'' 13 (6): 377–387.</ref>
|
|
|
|
|
In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of ] of free-form records as in Codasyl, Codd's idea was to use a "]" of fixed-length records, with each table used for a different type of entity. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables (or ''relations''), with optional elements being moved out of the main table to where they would take up room only if needed. Data may be freely inserted, deleted and edited in these tables, with the DBMS doing whatever maintenance needed to present a table view to the application/user. |
|
|
|
|
|
], related records are linked together with a "key"]] |
|
|
|
|
|
The relational model also allowed the content of the database to evolve without constant rewriting of links and pointers. The relational part comes from entities referencing other entities in what is known as a one-to-many relationship, like a traditional hierarchical model, and a many-to-many relationship, like a navigational (network) model. Thus, a relational model can express both hierarchical and navigational models, as well as its native tabular model, allowing for pure or combined modeling in terms of these three models, as the application requires.
|
|
|
|
|
For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be ''normalized'' into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided. |
|
|
|
|
|
Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "]", uniquely defining a particular record. When information was being collected about a user, information stored in the optional tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This simple "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for. |
|
|
|
|
|
Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any ''one'' record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous ]. Using a branch of mathematics known as ], he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning ''sets'' of data in a single operation. |
|
|
|
|
|
Codd's paper was picked up by two people at Berkeley, ] and ]. They started a project known as ] using funding that had already been allocated for a geographical database project and student programmers to produce code. Beginning in 1973, INGRES delivered its first test products which were generally ready for widespread use in 1979. INGRES was similar to ] in a number of ways, including the use of a "language" for data access, known as ]. Over time, INGRES moved to the emerging SQL standard. |
|
|
|
|
|
IBM itself did one test implementation of the relational model, ], and a production one, ], both now discontinued. ] wrote ] for ], and now there are two new implementations: ] and ]. Most other DBMS implementations usually called ''relational'' are actually SQL DBMSs. |
|
|
|
|
|
In 1970, the University of Michigan began development of the ]<ref name=Hershey1972>William Hershey and Carol Easthope, , Spring Joint Computer Conference, May 1972 in ''ACM SIGIR Forum'', Volume 7, Issue 4 (December 1972), pp. 45-55, DOI=</ref> based on D.L. Childs' Set-Theoretic Data model.<ref name=North2010>Ken North, , ''Dr. Dobb's'', 10 March 2010</ref><ref>, D. L. Childs, 1968, Technical Report 3 of the CONCOMP (Research in Conversational Use of Computers) Project, University of Michigan, Ann Arbor, Michigan, USA</ref><ref>, D. L. Childs, 1968, Technical Report 6 of the CONCOMP (Research in Conversational Use of Computers) Project, University of Michigan, Ann Arbor, Michigan, USA</ref> Micro was used to manage very large data sets by the ], the ], and researchers from the ], the ], and ]. It ran on IBM mainframe computers using the ].<ref name=MICROManual1977>, M.A. Kahn, D.L. Rumelhart, and B.L. Bronson, October 1977, Institute of Labor and Industrial Relations (ILIR), University of Michigan and Wayne State University</ref> The system remained in production until 1998. |
|
|
|
|
|
===Late-1970s SQL DBMS=== |
|
|
IBM started working on a prototype system loosely based on Codd's concepts as '']'' in the early 1970s. The first version was ready in 1974/5, and work then started on multi-table systems in which the data could be split so that all of the data for a record (some of which is optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized ] – ]{{Citation needed|reason=First version of SQL standard was SQL-86 adopted in 1986|date=May 2012}} – had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as ''SQL/DS'', and, later, ''Database 2'' (]). |
|
|
|
|
|
Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work but with an SQL interface. ], ], ] and eventually ] itself were all being sold as offshoots to the original INGRES product in the 1980s. Even ] is actually a re-built version of Sybase, and thus, INGRES. Only ]'s ] started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978. |
|
|
|
|
|
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as ]. PostgreSQL is often used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions). |
|
|
|
|
|
In Sweden, Codd's paper was also read, and ] was developed from the mid-1970s at ]. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.
|
|
|
|
|
Another data model, the ], emerged in 1976 and gained popularity for ] as it emphasized a more familiar description than the earlier relational model. Later on, entity-relationship constructs were retrofitted as a data modeling construct for the relational model, and the difference between the two has become irrelevant.{{fact|date=March 2013}}
|
|
|
|
|
===1980s object-oriented databases=== |
|
|
The 1980s, along with a rise in ], saw a growth in how data in various databases were handled. Programmers and designers began to treat the data in their databases as objects. That is to say that if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows for relations between data to be relations to objects and their attributes and not to individual fields.<ref>Development of an object-oriented DBMS; Portland, Oregon, United States; Pages: 472–482; 1986; ISBN 0-89791-204-7</ref> The term "]" described the inconvenience of translating between programmed objects and database tables. ]s and ]s attempt to solve this problem by providing an object-oriented language (sometimes as extensions to SQL) that programmers can use as an alternative to purely relational SQL. On the programming side, libraries known as ]s (ORMs) attempt to solve the same problem.
|
|
|
|
|
Another big game changer for databases in the 1980s was the focus on increasing reliability and access speeds. In 1989, two professors from the University of Wisconsin at Madison published an article at an ]-associated conference outlining their methods for increasing database performance. The idea was to replicate specific important, often-queried information and store it in a smaller temporary database that linked these key features back to the main database. This meant that a query could search the smaller database much more quickly, rather than searching the entire dataset.<ref>Performance enhancement through replication in an object-oriented DBMS; Pages 325–336; ISBN 0-89791-317-5</ref> This eventually led to the practice of ], which is used by almost every operating system from Windows to the system that operates Apple iPod devices.
|
|
|
|
|
===Database machines and appliances=== |
|
|
{{Main|Database machine}} |
|
|
|
|
|
In the 1970s and 1980s attempts were made to build database systems with integrated hardware and software. The underlying philosophy was that such integration would provide higher performance at lower cost. Examples were IBM ], the early offering of ], and the ] database machine. |
|
|
|
|
|
Another approach to hardware support for database management was ]'s ] accelerator, a hardware disk controller with programmable search capabilities. In the long term, these efforts were generally unsuccessful because specialized database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus most database systems nowadays are software systems running on general-purpose hardware, using general-purpose computer data storage. However this idea is still pursued for certain applications by some companies like ] and ] (]). |
|
|
|
|
|
===2000s NoSQL and NewSQL databases=== |
|
|
{{main|NoSQL}} |
|
|
|
|
|
The next generation of post-relational databases in the 2000s became known as ] databases, including fast ]s and ]s. ] are a type of structured document-oriented database that allows querying based on ] document attributes. |
|
|
|
|
|
NoSQL databases are often very fast, do not require fixed table schemas, avoid join operations by storing ] data, and are designed to ]. |
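As a toy illustration (not modeled on any particular product), a key–value store can be sketched as a map from keys to denormalized JSON documents, so that a lookup needs no join:

```python
import json

# A toy illustration (not a real product): a key-value store holds one
# denormalized JSON document per key, so a lookup needs no joins.
store = {}

def put(key, doc):
    store[key] = json.dumps(doc)

def get(key):
    return json.loads(store[key])

# All of a user's data travels together in one document:
put("user:codd", {"name": "Edgar", "phones": ["555-0100", "555-0101"]})
first_phone = get("user:codd")["phones"][0]
print(first_phone)  # 555-0100
```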
|
|
|
|
|
In recent years there has been a strong demand for massively distributed databases with high partition tolerance, but according to the ] it is impossible for a ] to simultaneously provide ], ], and ] guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason, many NoSQL databases use what is called ] to provide both availability and partition-tolerance guarantees, with the highest level of data consistency achievable under those constraints.
|
|
|
|
|
The most popular NoSQL systems include ], ], ], ], ], ] and ],<ref>{{cite web|url=http://db-engines.com/en/ranking |title=DB-Engines Ranking |date=January 2013 |accessdate=22 January 2013}}</ref> all of which are ] products.
|
|
|
|
|
A number of new relational databases continuing use of SQL but aiming for performance comparable to NoSQL are known as ]. |
|
|
|
|
|
==Components== |
|
|
|
|
|
{{unreferenced-section|date=March 2013}} |
|
|
|
|
|
A DBMS's ] specifies its components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are the major DBMS components:
|
|
|
|
|
*'''DBMS external ]s''' - They are the means to communicate with the DBMS (both ways, to and from the DBMS) to perform all the operations needed for the DBMS. These can be operations on a database, or operations to operate and manage the DBMS. For example: |
|
|
::- Direct database operations: defining data types, assigning security levels, updating data, querying the database, etc. |
|
|
::- Operations related to DBMS operation and management: backup and restore, database recovery, security monitoring, database storage allocation and database layout configuration monitoring, performance monitoring and tuning, etc. |
|
|
:An external interface can be either a '']'' (e.g., typically for a database administrator), or an '']'' (API) used for communication between an application program and the DBMS. |
|
|
*'''Database language engines''' (or '''processors''') - Most operations upon databases are performed through expressions in database languages (see above). Languages exist for data definition, data manipulation, and queries (e.g., SQL), as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS through proper interfaces. A language engine processes the language expressions (by a compiler or language interpreter) to extract the intended database operations from the expression in a form that can be executed by the DBMS.
|
|
*''']''' - Performs ] on every query to choose the most efficient '']'' (a partial order (tree) of operations) to be executed to compute the query result.
|
|
*''']''' - Performs the received database operations on the database objects, typically at their higher-level representation. |
|
|
*'''Storage engine''' - translates the operations to low-level operations on the storage ]s. In some references the Storage engine is viewed as part of the database engine. |
|
|
*'''] engine''' - for correctness and reliability purposes most DBMS internal operations are performed encapsulated in transactions (see below). Transactions can also be specified externally to the DBMS to encapsulate a group of operations. The transaction engine tracks all the transactions and manages their execution according to the transaction rules (e.g., proper concurrency control, and proper ''commit'' or ''abort'' for each). |
|
|
*'''DBMS management and operation component''' - Comprises many components that deal with all the DBMS management and operational aspects like performance monitoring and tuning, backup and restore, recovery from failure, security management and monitoring, database storage allocation and database storage layout monitoring, etc. |
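The query optimizer's choice of an efficient query plan, mentioned in the component list above, can be observed in SQLite (used here only as a convenient self-contained example; the exact plan text varies between versions, so the outputs shown are indicative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

# Without an index, the optimizer has no choice but a full table scan...
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = 'x'").fetchone()[3]
print(plan)  # e.g. "SCAN t"

# ...after an index is added, the same query gets a cheaper plan.
conn.execute("CREATE INDEX t_v ON t (v)")
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = 'x'").fetchone()[3]
print(plan)  # e.g. "SEARCH t USING INDEX t_v (v=?)"
```

The application's SQL is unchanged; only the DBMS-internal plan differs.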
|
|
|
|
|
{{cleanup-merge|date=March 2013}} |
|
|
* '''Data definition subsystem''' helps the user create and maintain the data dictionary and define the structure of the file in a database. |
|
|
* '''Data manipulation subsystem''' helps the user to add, change, and delete information in a database and query it for valuable information. Software tools within the data manipulation subsystem are most often the primary interface between the user and the information contained in a database. It allows the user to specify their logical information requirements.
|
|
* '''Application generation subsystem''' contains facilities to help users develop transaction-intensive applications. Processing a transaction usually requires the user to perform a detailed series of tasks, and the subsystem provides easy-to-use data entry screens, programming languages, and interfaces to support this.
|
|
* '''Data administration subsystem''' helps users manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management. |
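The transaction encapsulation handled by the transaction engine described above — commit on success, abort leaving no trace on failure — can be sketched with Python's sqlite3 module (the failure here is simulated; account names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

# A transfer is grouped into one transaction: either both updates commit,
# or the transaction aborts and neither update is visible.
try:
    with conn:  # the context manager commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'a'")
        conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'b'")
        raise ValueError("transfer rejected: insufficient funds")  # simulated failure
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'a': 100, 'b': 0} -- the aborted transaction left no trace
```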
|
|
|
|
|
==Data structures (models)== |
|
|
] |
|
|
{{main|Database model}} |
|
|
A ] is a type of ] that determines the logical structure of a ] and fundamentally determines in which manner ] can be stored, organized, and manipulated. The most popular example of a database model is the ]. |
|
|
|
|
|
Common ]s for databases include: |
|
|
|
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
|
|
|
An ] combines the two related structures. Various ]s can implement any given logical model, and different models have different, application-specific performance characteristics. Most database management systems are built around one particular data model, although it is possible for products to offer support for more than one model. |
|
|
|
|
|
===In practice=== |
|
|
{{sync|Database model}} |
|
|
The dominant model in use today is the ad hoc one embedded in ], despite the objections of purists who believe this model is a corruption of the relational model since it violates several fundamental principles for the sake of practicality and performance. Many DBMSs also support the ] ], which provides a standard way for programmers to access the DBMS.
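As a minimal sketch of SQL's declarative style, using Python's built-in <code>sqlite3</code> module (the table, columns, and data here are illustrative, not from any particular system):

```python
import sqlite3

# An in-memory database; the employee table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Alice", 52000.0), ("Bob", 48000.0)])

# A declarative SQL query: it states *what* data is wanted,
# not how the DBMS should locate it.
rows = conn.execute(
    "SELECT name FROM employee WHERE salary > 50000 ORDER BY name"
).fetchall()
print(rows)  # → [('Alice',)]
```

The same query could be issued through ODBC from any language with an ODBC driver; the SQL text itself would be unchanged.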
|
|
|
|
|
Before the database management approach, organizations relied on file processing systems to organize, store, and process data files. End users criticized file processing because the data was stored in many different files, each organized in a different way and each specialized for use with a specific application. File processing was bulky, costly, and inflexible when it came to supplying needed data accurately and promptly. Data redundancy was a problem because the independent data files produced duplicate data, so each separate file had to be updated whenever a change was needed. Another issue was the lack of data integration, since data in one file often depended on data organized and stored elsewhere. Lastly, there was no consistency or standardization of the data in a file processing system, which made maintenance difficult. For these reasons, the database management approach was developed.
|
|
|
|
|
==Database query language== |
|
|
A ] and report writer allows users to interactively interrogate the database, analyze its data, and update it according to the ] on data. It also controls the security of the database. ] prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or to subsets of it called ''subschemas''. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access only to work history and medical data.
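The subschema idea can be sketched with a view over a wider table. SQLite (used here via Python's <code>sqlite3</code>) has no user accounts or GRANT statements, so this only illustrates the restricted-window concept; a full DBMS would pair such a view with per-user permissions. All names and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL, medical TEXT)")
conn.execute("INSERT INTO employee VALUES ('Alice', 52000.0, 'none')")

# A view acting as a payroll subschema: it exposes name and salary
# but hides the medical column from its users.
conn.execute("CREATE VIEW payroll AS SELECT name, salary FROM employee")
row = conn.execute("SELECT * FROM payroll").fetchone()
print(row)  # → ('Alice', 52000.0)
```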
|
|
|
|
|
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs are customized for each data entry and updating function. |
|
|
A data structure is a logical representation of the relationships between individual data elements.
|
|
|
|
|
==Database storage== |
|
|
{{Main|Computer data storage}} |
|
|
|
|
|
Database storage is the container of the physical materialization of a database. It comprises the ''internal'' (physical) ''level'' in the database architecture. It also contains all the information needed (e.g., ], "data about the data", and internal ]s) to reconstruct the ''conceptual level'' and ''external level'' from the internal level when needed. It is generally the responsibility of the ]. Though typically accessed by a DBMS through the underlying ] (and often utilizing the operating system's ]s as intermediaries for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize, as far as possible, the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).
|
|
|
|
|
In principle, database storage can be viewed as a ] ], where every bit of data has its unique address in this address space. In practice, only a very small percentage of addresses are kept as initial reference points (which themselves require storage); most of the database data is accessed by indirection, using displacement calculations (distances in bits from the reference points) and data structures that define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.
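The displacement arithmetic described above can be sketched with fixed-width records laid out in a flat byte space; the record format and field names are illustrative assumptions, not any DBMS's actual layout:

```python
import struct

RECORD_FMT = "<10sI"  # hypothetical layout: 10-byte name, 4-byte unsigned id
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 14 bytes, no alignment padding

# A flat "address space" of three packed records.
storage = b"".join(struct.pack(RECORD_FMT, name.encode(), i)
                   for i, name in enumerate(["alice", "bob", "carol"]))

def read_record(buf, n):
    # Record n's address is a displacement from the start of the space:
    # no per-record pointer is stored, only the base and the record size.
    name, ident = struct.unpack_from(RECORD_FMT, buf, n * RECORD_SIZE)
    return name.rstrip(b"\x00").decode(), ident

print(read_record(storage, 1))  # → ('bob', 1)
```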
|
|
|
|
|
===General considerations=== |
|
|
|
|
|
Data is ] by assigning a bit pattern to each ], ], or ] object. Many standards exist for encoding (e.g., ], ], ]). Adding bits to each encoded unit introduces redundancy that allows errors in the coded data both to be detected and, using mathematical algorithms, corrected. Errors occur regularly, with low probability, due to ] bit value flipping, to "physical bit fatigue" (the loss of a physical bit's ability to maintain a distinguishable value, 0 or 1), or to errors in inter- or intra-computer communication. A random bit flip (e.g., due to random ]) is typically corrected upon detection. A malfunctioning physical bit, or group of bits (the specific defective bit is not always known; the group definition depends on the specific storage device), is typically automatically fenced out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The ] (CRC) method is typically used in storage for ]. Some DBMSs support specifying which ] was used to store data, so multiple encodings can be used in the same database.
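Error ''detection'' by checksum (as distinct from correction) can be sketched with the CRC-32 function from Python's standard <code>zlib</code> module; the payload is illustrative:

```python
import zlib

payload = b"database record"
checksum = zlib.crc32(payload)  # stored alongside the data when written

# Flip one bit of the stored bytes to simulate "physical bit fatigue".
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

# On read, the CRC is recomputed: intact data matches the stored
# checksum, corrupted data does not (detection only, no correction).
print(zlib.crc32(payload) == checksum)    # → True
print(zlib.crc32(corrupted) == checksum)  # → False
```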
|
|
|
|
|
] methods often make it possible to represent a string of bits by a shorter bit string ("compress") and to reconstruct the original string ("decompress") when needed. This can reduce storage requirements substantially (often by tens of percent) for many types of data, at the cost of extra computation (compressing and decompressing when needed). The trade-off between the storage cost saved and the cost of the related computation and possible delays in data availability is analyzed before deciding whether to keep certain data in a database compressed. Data compression is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
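The lossless compress/decompress round trip can be sketched with Python's standard <code>zlib</code> module; the repetitive sample data is chosen only to make the size saving obvious:

```python
import zlib

data = b"AAAA" * 2500            # 10,000 bytes of highly repetitive data
packed = zlib.compress(data)     # costs CPU time, saves storage

# Repetitive data shrinks dramatically; the exact ratio depends on content.
print(len(data), len(packed))

# Lossless: decompression restores the original bytes exactly.
restored = zlib.decompress(packed)
print(restored == data)  # → True
```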
|
|
|
|
|
For security reasons, certain types of data (e.g., credit-card information) may be kept ] in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via unforeseen vulnerabilities in the DBMS or, more likely, by bypassing it). Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
|
|
|
|
|
===Data storage types=== |
|
|
|
|
|
This collection of bits describes both the contained database data and its related metadata (i.e., data that describes the contained data and allows computer programs to manipulate the database data correctly). The size of a database can be tens of ]s, where a ] is eight bits. The physical materialization of a bit can employ various existing technologies, while new and improved technologies are constantly under development. Common examples are: |
|
|
*''Magnetic medium'' (e.g., in ]) - orientation of ] in magnetic regions on a surface of material (two orientation directions, for 0 and 1). |
|
|
*'']'' (DRAM) - State of a miniature ] consisting of a few ]s (among the millions on a chip nowadays) in an ] (two states for 0 and 1).
|
|
These two examples are respectively for two major storage types: |
|
|
*''Nonvolatile storage'' can maintain its bit states (0s and 1s) without electrical power supply, or when power supply is interrupted; |
|
|
*'']'' loses its bit values when power supply is interrupted (i.e., its content is erased). |
|
|
|
|
|
Sophisticated storage units, which can in fact be effective dedicated parallel computers supporting a large amount of nonvolatile storage, typically must also include components with volatile storage. Some such units employ ] that can provide power for several hours in case of external power interruption (e.g., see the ]) and thus keep the content of the volatile storage parts intact. Just before such a device's batteries lose their power, the device typically automatically backs up its volatile content (into nonvolatile storage) and shuts off to protect its data.
|
|
|
|
|
Databases are usually too valuable (in terms of the importance of their content and the resources, e.g., time and money, invested in building them) to be lost in a power interruption. Thus, at any point in time most of their content resides in nonvolatile storage. Even if, for operational reasons, very large portions of them reside in volatile storage (e.g., tens of ]s in volatile memory, for in-memory databases), most of this is backed up in nonvolatile storage. The relatively small portion that temporarily may not have a nonvolatile backup can be reconstructed by proper automatic database recovery procedures after a loss of volatile storage content.
|
|
|
|
|
More examples of storage types: |
|
|
*Volatile storage can be found in processors, computer memory (e.g., DRAM), etc. |
|
|
*Non-volatile storage types include ], ], ]s, ] and ]s, ]s, etc. |
|
|
|
|
|
====Storage metrics==== |
|
|
{{Expand section|date=July 2011}} |
|
|
A database always uses several types of storage when operational (and, by implication, several when idle). Different types may differ significantly in their properties, and the optimal mix of storage types is determined by the types and quantities of operations that each storage type needs to perform, as well as considerations like physical space and energy consumption and dissipation (which may become critical for a large database). Storage types can be categorized by the following attributes:
|
|
*Volatile/nonvolatile. |
|
|
*Cost of the medium (e.g., per megabyte), cost to operate (cost of energy consumed per unit time). |
|
|
*Access speed (e.g., bytes per second). |
|
|
*Granularity — from fine to coarse (e.g., size in bytes of access operation). |
|
|
*Reliability (the probability of spontaneous bit value change under various conditions). |
|
|
*Maximal possible number of writes (of any specific bit or specific group of bits). This may be constrained by the technology used (e.g., "write once" or "write twice"), or result from "physical bit fatigue": the loss of the ability to distinguish between the 0 and 1 states after many state changes (e.g., in flash memory).
|
|
*Power needed to operate (Energy per time; energy per byte accessed), Energy efficiency, Heat to dissipate. |
|
|
*Packaging density (e.g., realistic number of bytes per volume unit) |
|
|
|
|
|
====Protecting storage device content: Device mirroring (replication) and RAID==== |
|
|
{{Main|Disk mirroring|RAID}} |
|
|
:See also ] |
|
|
|
|
|
While the malfunction of a group of bits may be resolved by error detection and correction mechanisms (see above), a storage device malfunction requires different solutions. The following solutions are commonly used and are valid for most storage devices:
|
|
* '''Device ] (replication)''' - A common solution is to constantly maintain an identical copy of the device's content on another device (typically of the same type). The downside is that this doubles the storage, and both devices (copies) need to be updated simultaneously, with some overhead and possibly some delays. The upside is that concurrent reads of the same data group by two independent processes become possible, which increases performance. When one of the replicated devices is detected to be defective, the other copy is still operational and is utilized to generate a new copy on another device (usually one kept operational in a pool of stand-by devices for this purpose).
|
|
* '''Redundant array of independent disks''' (''']''') - This method generalizes the device mirroring above by allowing one device in a group of N devices to fail and have its content restored on a replacement (device mirroring is RAID with N=2). RAID groups of N=5 or N=6 are common. N>2 saves storage compared with N=2, at the cost of more processing during both regular operation (often with reduced performance) and defective device replacement.
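The reconstruction idea behind parity-based RAID can be sketched with plain XOR over byte strings; the tiny two-byte "disks" are illustrative, and a real array would stripe parity across the devices (as in RAID 5):

```python
# Three "data disks" of equal size, holding arbitrary illustrative bytes.
disks = [b"\x01\x02", b"\x10\x20", b"\x0f\x0e"]

# The parity "disk" holds the XOR of the data disks, byte by byte.
parity = bytes(a ^ b ^ c for a, b, c in zip(*disks))

# If disk 1 fails, XOR-ing the surviving disks with the parity
# restores its content exactly.
lost = disks[1]
rebuilt = bytes(a ^ c ^ p for a, c, p in zip(disks[0], disks[2], parity))
print(rebuilt == lost)  # → True
```

This is why a parity group tolerates exactly one device failure: with two disks missing, the XOR equation has two unknowns and cannot be solved.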
|
|
|
|
|
Device mirroring and typical RAID are designed to handle a single device failure in the RAID group of devices. However, if a second failure occurs before the RAID group is completely repaired from the first, data can be lost. The probability of a single failure is typically small, so the probability of two failures in the same RAID group in close time proximity is much smaller (approximately the probability squared, i.e., multiplied by itself). If a database cannot tolerate even this smaller probability of data loss, the RAID group itself is replicated (mirrored). In many cases such mirroring is done geographically remotely, in a different storage array, to also handle recovery from disasters (see disaster recovery above).
|
|
|
|
|
===Database storage layout=== |
|
|
|
|
|
Database bits are laid out in storage in data structures and groupings that can take advantage both of known effective algorithms to retrieve and manipulate them and of the storage's own properties. Typically the storage itself is designed to meet the requirements of the various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout methods.
|
|
|
|
|
====Database storage hierarchy==== |
|
|
|
|
|
A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers, most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) being processed/manipulated reside inside a processor, possibly in ]. These data are read from/written to memory, typically through a computer ] (so far typically volatile storage components). Computer memory exchanges data with external storage, typically through standard storage interfaces or networks (e.g., ], ]). A ], a common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) ], which is connected (again via standard interfaces) to drives, possibly of different speeds, like ]s{{disambiguation needed|date=February 2012}} and magnetic ]s (non-volatile). The drives may be connected to ]s, on which the least active parts of a large database, or database backup generations, typically reside.
|
|
|
|
|
A correlation currently exists between storage speed and price, and the faster storage is typically volatile.
|
|
|
|
|
====Data structures==== |
|
|
{{Main|Database storage structures}} |
|
|
{{Expand section|date=June 2011}} |
|
|
|
|
|
A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated efficiently. The data manipulation may include data insertion, deletion, updating, and retrieval in various modes. A given data structure type may be very effective for certain operations and very ineffective for others. A data structure type is selected during DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of the storage chunk accessed, etc.). In some DBMSs, database administrators have the flexibility to select among data structure options to contain user data, for performance reasons. Sometimes the data structures have selectable parameters for tuning database performance.
|
|
|
|
|
Databases may store data in many data structure types.<ref name="Physical Database Design">{{harvnb|Lightstone|Teorey|Nadeau|2007}}</ref> Common examples are the following: |
|
|
|
|
|
* ordered/unordered ] |
|
|
* ]s |
|
|
* ]s |
|
|
* ] |
|
|
* ] |
|
|
|
|
|
In contrast to conventional row-orientation, relational databases can also be ] or ] in the way they store data in these structures. |
|
|
|
|
|
====Application data and DBMS data==== |
|
|
|
|
|
A typical DBMS cannot store the data of the application it serves on its own. To handle the application data, the DBMS needs to store this data in data structures that themselves comprise specific data. In addition, the DBMS needs its own data structures and many types of bookkeeping data, such as indexes and ]s. The DBMS data is an integral part of the database and may comprise a substantial portion of it.
|
|
|
|
|
====Database indexing==== |
|
|
{{Main|Index (database)}} |
|
|
|
|
|
] is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a ], with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (for example, an employee database could be indexed by both last name and hire date).
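The "sorted list plus binary search" form of index described above can be sketched with Python's standard <code>bisect</code> module; the (key, location) pairs are illustrative:

```python
import bisect

# A sorted index over last names; each entry pairs a key with the
# location of the full record, as in the back-of-a-book analogy.
index = [("adams", 17), ("baker", 3), ("chen", 42), ("diaz", 8)]
keys = [k for k, _ in index]

def lookup(name):
    # Binary search examines O(log n) entries instead of all of them.
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return index[i][1]  # location of the record, not the record itself
    return None

print(lookup("chen"))  # → 42
```

Note that the index returns only a location; a second access then fetches the actual record, which is why index lookups pay off most when scanning all records would be expensive.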
|
|
|
|
|
Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves. |
|
|
|
|
|
Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data. |
|
|
|
|
|
Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost. |
|
|
|
|
|
====Database data clustering==== |
|
|
|
|
|
In many cases, substantial performance improvement is gained if different types of database objects that are usually utilized together are laid out in storage in proximity, i.e., ''clustered''. This usually allows the needed related objects to be retrieved from storage in a minimum number of input operations (each of which can be substantially time consuming). Even for in-memory databases, clustering provides a performance advantage due to the common utilization of large caches for input-output operations in memory, with similar resulting behavior.
|
|
|
|
|
For example, it may be beneficial to cluster a record of an ''item'' in stock with all its respective ''order'' records. The decision of whether to cluster certain objects depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc.
|
|
|
|
|
====Database materialized views==== |
|
|
{{Main|Materialized view}} |
|
|
|
|
|
Often storage redundancy is employed to increase performance. A common example is storing '']s'', which consist of frequently needed ''external views'' or query results. Storing such views saves the expensive computation of them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with their original updated database data, and the cost of storage redundancy.
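The idea can be sketched by storing a query result as a table. SQLite (used here via Python's <code>sqlite3</code>) has no <code>MATERIALIZED VIEW</code> statement, so a plain table stands in for one; the trade-off shows directly, since the stored result must be refreshed by hand when the base table changes. Names and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("east", 10), ("east", 20), ("west", 5)])

# Store the aggregate once, instead of recomputing it on every read.
conn.execute("CREATE TABLE sales_by_region AS "
             "SELECT region, sum(amount) AS total FROM sale GROUP BY region")

totals = conn.execute(
    "SELECT * FROM sales_by_region ORDER BY region").fetchall()
print(totals)  # → [('east', 30), ('west', 5)]
```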
|
|
|
|
|
====Database and database object replication==== |
|
|
{{Main|Database replication}} |
|
|
:See also '']'' below |
|
|
|
|
|
Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability (both to improve the performance of simultaneous multiple end-user accesses to the same database object, and to provide resiliency in the case of a partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object's copies. In many cases the entire database is replicated.
|
|
|
|
|
==Database transactions== |
|
|
{{Main|Database transaction}} |
|
|
As with every software system, a DBMS that operates in a faulty computing environment is prone to failures of many kinds. A failure can corrupt the respective database unless special measures are taken to prevent this. A DBMS achieves certain levels of ] by encapsulating operations within transactions. The concept of a ] (or ''atomic transaction'') evolved in order to ensure ]: the data should be in a coherent state after recovery from a crash. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring ], etc.), an abstraction supported in database systems and elsewhere. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands).
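Atomicity, the "all or nothing" property, can be sketched with Python's built-in <code>sqlite3</code> module, whose connection context manager commits the transaction on success and rolls it back on an exception. The account table and the simulated mid-transfer failure are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("a", 100), ("b", 100)])
conn.commit()

try:
    with conn:  # one transaction: the debit and credit commit together or not at all
        conn.execute(
            "UPDATE account SET balance = balance - 50 WHERE name = 'a'")
        # Simulated crash before the matching credit is applied.
        raise RuntimeError("crash mid-transfer")
except RuntimeError:
    pass

# The partial debit was rolled back; no money vanished.
print(conn.execute(
    "SELECT balance FROM account WHERE name = 'a'").fetchone())  # → (100,)
```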
|
|
|
|
|
The acronym ] describes some ideal properties of a database transaction: ], ], ], and ]. |
|
|
|
|
|
{{further|Concurrency control}} |
|
|
|
|
|
==Query optimization== |
|
|
{{Main|Query optimization|Query optimizer}} |
|
|
|
|
|
A query is a request for information from a database. It can be as simple as "find the address of the person with SS# 123-45-6789," or more complex, like "find the average salary of all the employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing the relevant database data and manipulating it in a way that yields the requested information. Since database structures are complex, in most cases, and especially for non-trivial queries, the data needed for a query can be collected from the database by accessing it in different ways, through different data structures, and in different orders. Each way typically requires different processing time, and processing times for the same query may vary widely, from a fraction of a second to hours, depending on the way chosen. The purpose of query optimization, an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal execution plan, among all possibilities, is typically very complex, time consuming in itself, possibly too costly, and often practically impossible. Thus query optimization typically approximates the optimum by comparing several common-sense alternatives, in order to provide in a reasonable time a "good enough" plan that typically does not deviate much from the best possible result.
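Most SQL DBMSs let you ask the optimizer which strategy it chose without running the query. A sketch using SQLite's <code>EXPLAIN QUERY PLAN</code> statement (via Python's <code>sqlite3</code>); the table, index, and query echo the salary example above and are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, state TEXT, salary REAL)")
conn.execute("CREATE INDEX emp_state ON emp(state)")

# Ask the optimizer how it would evaluate the query, without executing it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT avg(salary) FROM emp WHERE state = 'CA'"
).fetchall()
for row in plan:
    print(row)  # the chosen access path mentions the emp_state index
```

With the index present the planner searches <code>emp_state</code> rather than scanning the whole table; dropping the index would change the reported plan to a full scan, illustrating how the optimizer picks among alternative access paths.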
|
|
|
|
|
==Development and monitoring support== |
|
|
{{Expand section|date=May 2011}} |
|
|
A DBMS typically aims to provide a convenient environment in which to develop and later maintain applications built around its respective database type. A DBMS either provides such tools itself or allows integration with external tools. Examples include tools for database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and its related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.
|
|
|
|
|
==Distributed DBMS== |
|
|
A ] (DDBMS) manages a collection of data that logically belongs to the same system but is spread over the sites of a computer network. The two defining aspects of a distributed database are distribution and logical correlation:
|
|
*Distribution: The fact that the data are not resident at the same site, so that we can distinguish a distributed database from a single, centralized database. |
|
|
*Logical Correlation: The fact that the data have some properties which tie them together, so that we can distinguish a distributed database from a set of local databases or files which are resident at different sites of a computer network. |
|
|
|
|
|
==See also== |
|
|
{{multicol}} |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
|
|
|
==References== |
|
|
{{Reflist}} |
|
|
|
|
|
==Further reading== |
|
|
* ], Henry F. Korth, S. Sudarshan, '''' |
|
|
* ] and ], '''' |
|
|
|
|
|
{{Databases}} |
|
|
{{Database}} |
|
|
{{Database models}} |
|
|
|
|
|
] |
|
|
|
|
|
] |
|
|
] |
|
|
] |
|