History of Databases
Databases have evolved in four major phases, each of which has overlapped with at least one later phase (many Phase 1 databases are still in use):
Phase 1 – The first interactive databases, running on mainframes. Required program code to be written to extract information. Tree-like in structure, they had to be traversed to reach a desired piece of data, which could require intensive processing. Data structures were defined by computer engineering needs.
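The traversal cost described for Phase 1 can be illustrated with a minimal Python sketch. It is not modelled on any specific mainframe product; the Record class, the keys and the sample data are all hypothetical, chosen only to show that reaching one record means walking the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """One node in a hypothetical tree-shaped (hierarchical) database."""
    key: str
    value: Optional[str] = None
    children: List["Record"] = field(default_factory=list)


def find(node: Record, key: str, visited: List[str]) -> Optional[Record]:
    """Depth-first search; `visited` collects every node touched on the way."""
    visited.append(node.key)
    if node.key == key:
        return node
    for child in node.children:
        hit = find(child, key, visited)
        if hit is not None:
            return hit
    return None


# Hypothetical sample data: a company with two departments.
root = Record("company", children=[
    Record("dept:sales", children=[
        Record("emp:1", "Alice"),
        Record("emp:2", "Bob"),
    ]),
    Record("dept:it", children=[
        Record("emp:3", "Carol"),
    ]),
])

visited: List[str] = []
match = find(root, "emp:3", visited)
print(match.value if match else None)
print(len(visited), "nodes visited:", visited)
```

In this sketch the lookup touches every node before the one it wants, which is the kind of processing overhead the paragraph above refers to.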
Phase 2 – Relational Databases have data split into tables with relations defined between them. They
use the standard SQL language to both read and write data. The paradigm initially supported both
transaction processing and information generation. Some deficiencies with the latter led to an
extension of the concept to better support information needs via technologies such as Data
Warehouses and OLAP (the latter itself sometimes being a multidimensional database). Data
structures are similar to actual business entities and transactions. This approach scales by using larger
computers, or by employing parallel processing (cf. Data Warehouse Appliances). Relational
Databases are typically used by a wide variety of business and technical staff.
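As a rough illustration of Phase 2, the sketch below uses Python's built-in sqlite3 module to split data into two related tables and then uses SQL both to write rows and to read summarised information back. The table names, column names and sample rows are assumptions made up for the example.

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Two tables, with a relation from purchase.customer_id to customer.id.
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL
    );
""")

# Writing data with SQL (the transaction-processing side).
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)
conn.commit()

# Reading data with SQL (the information-generation side):
# join the two tables and total the purchases per customer.
totals = conn.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
conn.close()
```

The tables mirror business entities (customers and their purchases), and the same language handles both the inserts and the reporting query.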
Phase 3 – NoSQL and Big Data technologies evolved from web-based businesses needing to store such
vast quantities of information (multiple petabytes, where 1 PB = 10^15 bytes), so big that it had to be distributed
across many machines. These were developed to sift through large data sets searching for patterns. They are
now often also applied to sensor-generated information (e.g. from jet engines). A large library of open source
statistical tools is available. Data is not structured when initially stored; a structure is applied when tools read the
database. Here scaling is by adding more (commodity) computers to the grid. Big Data is typically used by
specialist staff with a background in both technology and statistics; these are known as Data Scientists.
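The idea that structure is applied only when the data is read can be sketched in a few lines of Python. This is an illustrative toy, not how any particular Big Data product works; the raw_store list, the field names and the read_temperatures function are hypothetical.

```python
import json

# Raw records are stored exactly as they arrive; nothing is validated on write.
# (Hypothetical sensor readings, including incomplete and malformed entries.)
raw_store = [
    '{"engine": "E1", "temp_c": 612.5, "ts": 1}',
    '{"engine": "E2", "temp_c": "598.0"}',
    '{"engine": "E1", "vibration": 0.02, "ts": 2}',
    'not json at all',
]


def read_temperatures(lines):
    """Apply a structure (engine name, temperature as float) at read time."""
    for line in lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip records that cannot be parsed
        if not isinstance(doc, dict) or "engine" not in doc or "temp_c" not in doc:
            continue  # skip records that do not fit this particular view
        yield doc["engine"], float(doc["temp_c"])


readings = list(read_temperatures(raw_store))
print(readings)
```

A different reader could impose a different structure on the same raw records, for example one that extracts only the vibration values.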
Phase 4 – Extension of the distributed NoSQL paradigm to SQL databases. A new class of technology, with SAP HANA as the most mature offering.
Some databases from both Phase 3 and Phase 4 are now held in memory (as opposed to on disk), which makes data access much faster.
Obviously, the data still needs to be stored on disk at some point; it needs to be loaded into memory from somewhere and changes need to be saved.