This is the first in a two-part series.
The further we get from the origin of a piece of technology, the easier it is for us to take for granted the incredible amount of work and huge leaps in innovation that were required to get us to where we are today. Databases are one such piece of technology. Many of us were born decades after the first database was implemented, and while we know the technology is old, we have no understanding of the path taken to get where we are today.
To understand how data storage and organization evolved, we first need to understand how computers were used at the dawn of computing. There wasn’t “data storage” as we know it today, just big boxes of punch cards. There was no real storage built into the machine itself, no operating systems, no nothing. This was mostly fine for the way people were doing things. Computers could only run one task at a time, so you’d show up for your appointment with the computer, run your punch cards, take your printed output and leave. You couldn’t interact with the computer otherwise. They were more like huge, advanced calculators.
In 1951, the Univac I computer was released, and with it the first magnetic tape storage drive. This allowed for much faster writes, hundreds of records per second, but tape is slow when you need to find a specific record: it can only be read sequentially, so reaching data near the end of a reel means winding past everything before it.
That changed just a few short years later, in 1956, when IBM introduced disk storage with the 305 RAMAC. Unlike magnetic tape, data stored on disk could be accessed randomly, which sped up both reads and writes. We’d only been accessing data and executing programs sequentially until then, so conceptually, this was a pretty huge jump. But random access alone wasn’t the boon it could have been; what was still missing was a system to make organizing and accessing all that data easier.
In 1961, Charles Bachman began work at General Electric on the first database management system, the Integrated Data Store, or IDS, and it opened the door to a lot of new technology. Architecturally it was a masterpiece, and there are IDS-type databases still in use today. For certain applications, their performance simply cannot be matched by a relational database.
A few years later, with other general-purpose database systems entering the market but no real standards set for interacting with them, the Committee on Data Systems Languages (CODASYL), the same group that had produced COBOL back in 1959, formed a Database Task Group, with Bachman as a member, to standardize a data model and language based on his navigational approach.
In 1966, IBM began developing another navigational database that would alter the course of history: its Information Management System, built on the incredible IBM System/360 for the Apollo program. Nothing we have today, from a computing perspective, would be possible without the System/360 and the things that were built for it. Countless innovations in computing, from virtualization to data storage, were pioneered at IBM for the System/360 mainframe. In this case, IMS sent us to the moon by managing the inventory for the bill of materials of the cool rocket ship, the Saturn V. IBM calls IMS a hierarchical database, but both IDS and IMS are examples of the earliest navigational databases.
In the 1970s, the collars got wider and the databases became relational. Edgar Codd was working on hard disks at IBM at the time, and he was pretty frustrated with the CODASYL approach: since everything was functionally a linked list, there was no practical way to search by content. You had to know the path to a record to find it.
In his paper “A Relational Model of Data for Large Shared Data Banks,” he described an alternative model that would organize data into a group of tables, with each table containing a different category of data. Within the table, data would be organized into a fixed number of columns, with one column containing a unique identifier for that particular item and the remaining columns containing the item’s attributes. From this model, he described queries that joined tables based on the relationships between those unique keys to return your result. Sounds familiar, yes?
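Codd’s model maps directly onto the SQL databases that descended from it. Here’s a minimal sketch using Python’s built-in sqlite3 module; the table names and data are invented for illustration:

```python
import sqlite3

# An in-memory database is enough to demonstrate the model.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each table holds one category of data; the first column is the unique identifier.
cur.execute("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")

cur.execute("INSERT INTO authors VALUES (1, 'E. F. Codd')")
cur.execute("INSERT INTO books VALUES (10, 'A Relational Model of Data', 1)")

# The query joins the two tables on the shared key instead of following pointers.
cur.execute("""
    SELECT books.title, authors.name
    FROM books JOIN authors ON books.author_id = authors.author_id
""")
print(cur.fetchall())  # → [('A Relational Model of Data', 'E. F. Codd')]
conn.close()
```

The crucial difference from the navigational model is that the relationship lives in the data itself (the shared `author_id` value), not in a physical pointer chain the program has to traverse.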
Originally, Codd described this model using mathematical terms. Instead of tables, rows and columns, he used relations, tuples and domains. The name of the model itself, “relational database,” comes from the mathematical concept of a relation, a set of tuples, and the operations that allow for joins are built on the relational algebra and relational calculus Codd defined. He was, allegedly, not the biggest fan of people standardizing on tables, rows and columns rather than the mathematical terms he described.
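In those terms, the correspondence is direct. A rough sketch of the definitions behind the familiar vocabulary (notation simplified; the shared key k here is an assumption for illustration):

```latex
% A relation R over domains D_1, ..., D_n (the columns) is a set of tuples (the rows):
R \subseteq D_1 \times D_2 \times \cdots \times D_n

% A join pairs up tuples from two relations that agree on a shared key k:
R \bowtie_k S = \{ (r, s) \mid r \in R,\; s \in S,\; r.k = s.k \}
```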
Starting in 1974, IBM was also developing a prototype based on Codd’s paper. System R was the first implementation of the SQL we know and love today, the first proof that the relational model could provide solid performance and the algorithmic influence for many systems that came later. IBM fussed around with it until 1979 before realizing it needed a production version, which eventually became Db2. IBM’s papers about System R were also the basis for Larry Ellison’s Oracle Database, which ultimately beat IBM to the market with the public release of Oracle v2 in 1979.
1979 also brought the public release of INGRES, built by Eugene Wong and Michael Stonebraker and based on Codd’s ideas. It originally used a query language called QUEL, but it eventually moved to SQL once it became clear that was where the standards were headed. While INGRES itself did not stick around long term, the lessons learned from it did: Stonebraker went on to build its successor, Postgres, which nearly 20 years later had evolved into a database we all know and love, PostgreSQL.
Up until this point, the evolution of databases had largely been driven by the changing needs of enterprises and enterprise hardware. Computers didn’t yet exist in a form factor and price point that made them accessible to regular people, whether at home or at the office. We needed another leap, another shift in the way people fundamentally use and think about computers, to see databases evolve once again. To find out more, stay tuned for Part 2 in this series.