The 1995 SQL Reunion: People, Projects, and Politics

Prehistory

Mike Blasgen: So now we have a discussion about how it all began and how it proceeded. I have a timeline - some of you have seen it because I sent out one version of it - which acts to make me remember how to prompt people and also help me remember stuff that I remember myself. So I will do this. The earliest I remember is I was at [The University of California at] Berkeley and I remember a sign on the wall somewhere in the 2nd or 4th floor [of Cory Hall] saying that there were some interesting things going on in San Jose. I was still a student, so this would have been in 1968, roughly. So already San Jose was doing work in database. I don't think it was called that, then. It was called data management or file systems, or - I don't remember what it was called. But it had to do with work that Mike Senko was leading. And of course the research laboratory itself was always associated with data because the original development of the disk drive occurred there in the early fifties. So already by the late sixties there was a focus on software for the management of data. And I'm not familiar with that at all, nor was I involved in any of the work prior to the Phase Zero prototype of SEQUEL. But there was much work that went on in the company.

Irv, what led to Codd's paper[4], which was published in 1970?

Irv Traiger: I honestly don't know. There were two departments back then, the Systems Department under Jim Eaton and later Glenn Bacon, and another one - I think it was called Information Systems or something like that - under Senko, and they were very different worlds. People might play Ping-Pong together at lunch - there was a lot of Ping-Pong then - but essentially no technical interaction. You'd hear about things over there. In fact at one point there was a big project called DIAM[5][6] with a very complex structure, a complex query language. And we knew that this man was over there named Ted Codd and that there were some disagreements, but I really don't know what led to what. At one point, Ted Codd suddenly showed up in the Systems Department and after some delay he built up a small group of people - it was actually three people originally: Dines Bjørner, Ken Deckert, and me. We began to work on a project called GAMMA-0, and I brought the GAMMA-0 paper[7] with me.

Mike Blasgen: Oh, really? Is it on the artifact table?

Irv Traiger: Not yet; it will be there. GAMMA-0 was meant to be the lowest-level thing that anybody would get value from, and even then there was the notion of supporting multiple things on top, which would happen again in System R and in Eagle, the big project at Santa Teresa. Nevertheless, what kicked off this work was a key paper by Ted Codd - was it published in 1970 in CACM?

Mike Blasgen: Yes.

Irv Traiger: A couple of us from the Systems Department had tried to read it - couldn't make heads nor tails out of it. [laughter] At least back then, it seemed like a very badly written paper: some industrial motivation, and then right into the math. [laughter]

Bob Yost: I went over there with several other people - I was in the Advanced Systems Development Division - I remember going over there in about 1970 to see this because we were working with the IMS[8] guys at the time. We couldn't believe it; we thought it's going to take at least ten years before there's going to be anything. And it was ten years. [laughter]

Irv Traiger: So we had this 1970 paper; there were a couple of other papers that Ted had written after that; one on a language called DSL/Alpha[9], which was based on the predicate calculus. Glenn Bacon, who had the Systems Department, used to wonder how Ted could justify that everybody would be able to write this language that was based on mathematical predicate calculus, with universal quantifiers and existential quantifiers and variables and really, really hairy stuff.

Somehow, again, I don't know how, there grew up around IBM a bunch of pockets of activity. There was a project in the Peterlee Science Center in England of all places. Peterlee was a manufactured town. The English government was trying to seed industry and business in different parts of the UK and they invented Peterlee and IBM said, "Sure, we'll put a lab there." There was a person - was it Terry Borden? - Terry Rogers who was heading up this project based on the relational algebra - a very weird language that occasionally gets used nowadays as an intermediate layer in a system. There was a project in Hursley (kind of interesting how much activity in England) called the Hursley Prototype - was that Peter King?

Raymond Lorie: Peter Tilman.

Irv Traiger: OK, Tilman. There was a project at the Cambridge, Massachusetts, Scientific Center. Raymond Lorie, Andrew Symonds, and others, were doing that[10]. And there was a predecessor project[11] that had been done at MIT Lincoln Laboratory by Paul Rovner (who went to school with Mike and Jim Gray and Mario [Schkolnick] and me at Berkeley) and Jerry Feldman, who later became a Stanford professor and is now the head of ICSI[12] at Berkeley. So there were these pockets, and so Ted Codd wanted to establish his own pocket, and that turned into this GAMMA-0 project.

At one point Codd decided to set up a symposium at Yorktown - you know, the seat of power in the Research Division - and it was to basically have a scan of all the activity across IBM related to his relational ideas. We went through that, with the various labs being represented, and a bunch of others, and somehow or other a few months later this project happened. It was to be in San Jose; it was to have an infusion of people from Yorktown; and we didn't know what that would be like, but it wasn't a problem. People like Frank King and Don Chamberlin and Ray Boyce were certainly aware of the fact that they were the incoming horde, but they were very sensitive about it and they tried very, very hard to involve the San Jose people. Mike Senko and his department were merged into the Systems Department, which was renamed Computer Science, under Leonard Liu. Glenn Bacon went off to SSD, or what's now called SSD[13]. Mike Senko went back east, stayed in IBM, and died not too long after that, I think in Europe on a business trip. Frank King kept us kind of in task force mode for quite a few months, trying all kinds of crazy management schemes, like mentors, and inner circles, and teams. Out of that grew System R. That's kind of the long story. I don't want to steal the whole stage here. That's kind of the vague memory of how it all began.

Mike Blasgen: That's great. So actually you mentioned a lot of the points in my list here: I have Mike Senko, the Ted Codd paper, PRTV[14], Cambridge, ... So now, how did the Codd-Bachman thing come about? How did that fight come about? Is that related to DBTG?

Irv Traiger: Yeah, there was this standard going on. It was organized by the Database Task Group and it was called CODASYL[15]: Common Data something - Systems Language - how does that sound? It's kind of deja vu because you hear today about how important it is to follow standards, and if we had done it back then none of this stuff would have happened because DBTG was richer than IMS[16]; it was a network, which certainly includes a hierarchy; and for that matter, if you wanted flat files, you basically had that in DBTG. You could just omit the named relationships. What's the big deal, right? You want a good language, we'll give you a language. The technical community, which was kind of small then for database, had its own SIG and I don't remember what it was called. SIGMOD was new.

Raymond Lorie: SIGFIDET.

Irv Traiger: SIGFIDET. SIGMOD was the kind of grass roots, revolutionary, not taken seriously bunch and SIGFIDET and CODASYL just sort of ran the whole game, and Bachman was Mr. CODASYL[17]. On several occasions, and I don't remember them all, maybe one at an early SIGMOD conference, these people would go at each other, I mean just hurling thunderbolts, about better and worse, complicated and simple, and mathematical foundations, and who cares.

Mike Blasgen: One of those debates was published and widely circulated[18].

C. Mohan: NCC panel, I think. National Computer Conference.

Don Chamberlin: There was one at the SIGFIDET conference in Ann Arbor, Michigan in 1974.

Franco Putzolu: I think for a while people who eventually worked on System R worked on design techniques for DBTG databases. Also there was a project I remember in Yorktown in 1972-73 on how to design DBTG databases.

Don Chamberlin: I was working on that. I was recruited by Leonard Liu in Yorktown in 1971 to work on an operating system project called System A. Leonard Liu was a first-level manager in those days and I worked for Leonard for a year or so, until the System A project broke up in 1972. It seemed like every time there was an upheaval, Leonard got promoted and that was what happened in 1972. [laughter] Leonard got promoted to be a second-level manager and I started working for Frank King. We were in kind of a state of chaos in Yorktown in 1972 because our operating system project had broken up and we didn't have anything to do. Leonard was pretty astute politically and he thought that database was an important field to get into, so he kind of organized us into study group mode to try and figure out what needed to be done in databases. I got a particular job in this. I thought it was a plum of a job. My job was to study this CODASYL DBTG proposal and learn about it and give presentations on it and figure out what needed to be done to it and things like that. So I became an expert on DBTG and I just loved it and thought it was neat. It had all sorts of real complicated pointers and set-oriented selection rules and you could just study it all day. It was a real puzzle. I was kind of a programmer type; I really grooved on that and gave a lot of talks on it and things like that. I was the CODASYL expert in our group; other people studied other things: CICS[19] and IMS and different things like that.

We knew sort of peripherally that there was some work going on in the provinces, in San Jose. There was this guy Ted Codd who had some kind of strange mathematical notation, but nobody took it very seriously. Ray Boyce was hired at about this time, and we kind of got into this game called the Query Game where we were thinking of ways to express complicated queries. But actually before the Query Game started, I had a conversion experience, and I still remember this. Ted Codd came to visit Yorktown, I think it might have been at this symposium that Irv alluded to. He gave a seminar and a lot of us went to listen to him. This was as I say a revelation for me because Codd had a bunch of queries that were fairly complicated queries and since I'd been studying CODASYL, I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff. Codd would sort of write them down as one-liners. These would be queries like, "Find the employees who earn more than their managers." [laughter] He just whacked them out and you could sort of read them, and they weren't complicated at all, and I said, "Wow." This was kind of a conversion experience for me, that I understood what the relational thing was about after that.
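[Editorial illustration, not part of the discussion: the query Chamberlin quotes, "Find the employees who earn more than their managers," can be sketched declaratively in a few lines. This is a minimal sketch in Python rather than in any of Codd's notations; the `Employee` record, the sample data, and the function name are all hypothetical and chosen only for illustration.]

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    # Hypothetical record layout, assumed for this illustration only.
    name: str
    salary: int
    manager: Optional[str]  # manager's name, or None for the top of the hierarchy


# Made-up sample data.
EMPLOYEES = [
    Employee("Alice", 120, None),
    Employee("Bob", 100, "Alice"),
    Employee("Carol", 130, "Alice"),
    Employee("Dave", 90, "Bob"),
    Employee("Eve", 105, "Bob"),
]


def earn_more_than_manager(employees):
    """Return the names of employees whose salary exceeds their manager's."""
    by_name = {e.name: e for e in employees}
    # The condition is stated once, as a predicate over (employee, manager)
    # pairs; no pointer chasing or record-at-a-time navigation is spelled out.
    return [
        e.name
        for e in employees
        if e.manager is not None and e.salary > by_name[e.manager].salary
    ]


print(earn_more_than_manager(EMPLOYEES))
```

[The point of the sketch is only the contrast Chamberlin describes: the whole question is one expression over the data, rather than a program that navigates from record to record.]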

Ray Boyce had just been hired at that time, and we organized between the two of us this game that we called the Query Game, where we'd think of different questions that needed to be expressed and we'd try to find out syntax to express them in. These are some original foils from back in those days that we put together to try and convince people of things. We called the notation SQUARE; it stands for Specifying Queries as Relational Expressions. We had this idea, that Codd had developed two languages, called the relational algebra and the relational calculus. In the relational algebra, the basic objects were tables, and you combined these tables with operations like joins and projections and things like that. The relational calculus was a kind of a strange mathematical notation with a lot of quantifiers in it. We thought that what we needed was a language that was different from either one of those, in which the basic objects that you worked on were sets of values, and the things you did to those sets of values were you mapped one set of values into another using some kind of a table. So we had the usual database of sales and departments and items being located on different floors and we would take a value like two and map it through this notation into the departments that were on that floor, and then we'd map it again into the items that were sold by those departments. We would try to show that this mapping notation was simpler than some of the complex ways that you'd have to express this query in relational calculus, or of course far worse, using something like CODASYL.
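[Editorial illustration, not part of the discussion: the mapping idea described above, taking a set of values and mapping it through a table into another set of values, can be sketched as follows. This is a minimal sketch in Python and does not reproduce actual SQUARE syntax; the table names `LOC` and `SALES`, their columns, the sample rows, and the helper `map_through` are all hypothetical.]

```python
def map_through(table, source, target, values):
    """Map a set of values through a table.

    For every row whose `source` column is in `values`, collect the value of
    its `target` column. The result is again a set of values, so mappings
    can be chained.
    """
    return {row[target] for row in table if row[source] in values}


# Made-up sample tables: where departments are located, and what they sell.
LOC = [
    {"dept": "Toys", "floor": 2},
    {"dept": "Shoes", "floor": 2},
    {"dept": "Books", "floor": 1},
]
SALES = [
    {"dept": "Toys", "item": "Kite"},
    {"dept": "Toys", "item": "Puzzle"},
    {"dept": "Shoes", "item": "Boots"},
    {"dept": "Books", "item": "Atlas"},
]

# Start from the value 2, map it to the departments on that floor,
# then map those departments to the items they sell.
depts_on_floor_2 = map_through(LOC, "floor", "dept", {2})
items_on_floor_2 = map_through(SALES, "dept", "item", depts_on_floor_2)

print(sorted(depts_on_floor_2))
print(sorted(items_on_floor_2))
```

[Because each mapping takes a set and returns a set, the two steps compose directly, which is the property the paragraph above attributes to the notation.]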

So that was where this idea called SQUARE came from, and that was what Ray and I were working on when we transferred to San Jose in 1973, along with Leonard and Frank and Vera Watson and Robin Williams, who all came to San Jose at the same time. Jim Gray had come out the year earlier because he liked it on the west coast. Franco and Mike followed, I believe, in the following year, in 1974. So that was what was happening in Yorktown during the same period of time that Irv was working with Ted Codd at San Jose.

Mike Blasgen: That's great; I'm learning all kinds of things I didn't know.

Something that Irv mentioned was that there was a number of us who had an association with the University of California at Berkeley, and it is an amazingly large number. You wouldn't guess it - well, maybe it's because of geography. It's Irv, and Bruce [Lindsay], and Paul [McJones], and me, and Mario [Schkolnick], and Bob Selinger later, Bob Yost, and of course Jim Gray, who's actually a McKay fellow at the University of California at Berkeley right as we speak, is that right?

Jim Gray: As we speak, until midnight. [laughter]

Mike Blasgen: May 31 is his last day.

In case anyone is interested, here is the 1968 General Catalog for the University of California at Berkeley. That happened to be the year I taught at Berkeley. My name's not in here. Butler Lampson's name is in here, as teaching a course in operating systems.

Bruce Lindsay: I took that course.

Mario Schkolnick: I have heard rumors that you could flunk this course just by having grammatical typos in your reports. I was very sensitive to this, having just arrived from Chile to study at Berkeley.

Franco Putzolu: Do you know when INGRES started?

Mike Blasgen: I actually have that here, but I don't know the answer: about the same time. I went to Berkeley at the beginning of 1975. Gene Wong was my advisor when I was at Berkeley; Wong was one of the developers. Wong had a particular optimization procedure that he was advocating, and INGRES implemented it. Stonebraker had developed QUEL. So QUEL was mapped to this trick which I don't actually remember and which is not the fundamental contribution that INGRES made to the world.

Irv Traiger: It was to optimize based on how the query was doing dynamically, right?

Mike Blasgen: Well, it was a specific technique ...

Raymond Lorie: Single-variable query.

Mike Blasgen: That's right, it was a single-variable trick. I went to see that in 1975 and it was running. You could type QUEL into a UFI-like thing. They supported only query - there was no possibility of update. I guess you could have multiusers given that it was a timesharing system. It ran on a PDP-11/45.

Jim Gray: In about 1972 Stonebraker got a grant to do a geo-query database system. It was going to be used for studies of urban planning. The project did do some geographic database stuff, but fairly quickly it gravitated to building a relational database system. The result was the INGRES system[20]. INGRES started in about 1972 and a whole series of things spun off from that: Ingres[21], Britton-Lee, and Sybase.

Hostility developed between the San Jose IBM group and the Berkeley group because they were working on very, very similar things and had very, very similar ideas. Almost everybody was young and insecure (untenured), so there was a lot of concern about the priority of publishing. As a consequence we came to the conclusion that the best thing was not to talk to each other. Every time we talked, papers would appear that reflected the conversations without attribution. Occasionally people would go back and forth; Randy Katz was in both camps. We occasionally had summer students come to IBM and occasionally we would all give talks but always very carefully. In the chron file there are letters from Stonebraker saying, "Thanks for pointing out that in paragraph so-and-so of paper such-and-such we forget to cite ???". Of course this was not one-sided. The Berkeley folks thought the IBM guys were ripping off ideas from the INGRES project. We had a strained relationship[22].

Mike Blasgen: I actually personally have fairly fond memories of the relationship. But I know that lots of others like Frank and many others have bad feelings about it because apparently ideas were being taken from us and used by them without any credit.

Jim Gray: And conversely.

Franco Putzolu: Vice versa.

Mike Blasgen: OK, and vice versa. But I always heard the accusation the other way. [laughter]

But I personally had only good interactions with - well Gene Wong was my research advisor and was one of the key players in this thing. John Paul Jacob organized an event at the Catholic University in Rio in 1975 I would guess, the summer of 1975: it might have been the summer of 1976. Sharon and I went down to Rio, which was a really nice trip, we stopped in other places in South America. At that thing was Mike Stonebraker staying there for a month, Dennis Tsichritzis and his wife from the University of Toronto, Sharon and I, and others. I don't remember who else from IBM was there; was anybody in this room there? Jim wasn't there. I was in Rio for maybe two weeks: one week by myself giving lectures at this conference they had, and one week with Sharon just fooling around and giving more lectures. We were kind of stuck there, the five of us: Dennis and his wife, Sharon and me, and Mike Stonebraker (who was single). And so we palled around together. And so I got to be like a friend of Mike's because I was stuck in this place far away where you had nothing to do except go drink, which we did a lot of. So I got very close personally with Mike; Mike has always treated me, I always thought, very nicely. 'Course I don't know: maybe he talks behind my back.

Jim Gray: The good news was you worked on B-trees; they didn't do B-trees. [laughter] I worked on locks and they didn't do locks, so I was also OK.

[4] E.F. Codd. "A Relational Model of Data for Large Shared Data Banks" CACM 13, 6 (June 1970) pages 377-387.

[5] M.M. Astrahan, E.B. Altman, P.L. Fehder, and M.E. Senko. "Concepts of a Data Independent Access Model" 1972 ACM SIGFIDET Workshop Report, pages 349-362.

[6] E.B. Altman, M.M. Astrahan, P.L. Fehder, and M.E. Senko. "Specifications in a Data Independent Access Model" 1972 ACM SIGFIDET Workshop Report, pages 363-376.

[7] D. Bjørner, E.F. Codd, K.L. Deckert, and I.L. Traiger. The GAMMA-0 n-ary Relational Data Base Interface: Specification of Objects and Operations. IBM Research Report RJ1200. San Jose, California (April 1973).

[8] IMS stands for Information Management System, IBM's first database management system.

[9] E.F. Codd. A database sublanguage founded on the relational calculus. Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, California (November 1971) pages 35-68.

[10] The RM (Relational Memory) system supported binary relations; see:

A.J. Symonds and R.A. Lorie. "A schema for describing a relational data base" Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, (November 1970) pages 201-229.

R.A. Lorie and A.J. Symonds. "A Relational Access Method for Interactive Applications." Courant Computer Science Symposia, Vol. 6: Data Base Systems. Prentice-Hall, Englewood Cliffs, New Jersey (1971).

The successor XRM (Extended Relational Memory) system supported n-ary relations; see:

R.A. Lorie. XRM--An Extended (N-ary) Relational Memory. IBM Technical Report G320-2096. Cambridge Scientific Center, Cambridge, Mass. (January 1974).

[11] J.A. Feldman and P.D. Rovner. "An Algol-Based Associative Language" CACM 12, 8 (August 1969) pages 439-449.

[12] International Computer Science Institute.

[13] SSD stands for Storage Systems Division.

[14] PRTV stands for Peterlee Relational Test Vehicle. See:

Stephen Todd. "PRTV, an efficient implementation for large relational data bases" Proc. VLDB, Florence, Italy (1975), pages 554-556.

[15] Actually, CODASYL stands for Conference on Data Systems Languages, which was formed in 1959 to design the business data processing language COBOL. CODASYL's Data Base Task Group defined what has become known as the DBTG database model:

CODASYL Data Base Task Group. Report of the CODASYL Data Base Task Group. ACM (April 1971).

R.W. Taylor and R.L. Frank. "CODASYL Data-Base Management Systems" ACM Computing Surveys 8, 1 (March 1976) pages 67-103.

[16] IMS is hierarchical.

[17] C. Bachman. "The programmer as navigator" (Turing Award lecture) CACM 16, 11 (November 1973) pages 653-658.

[18] "Data Models: Data Structure Set versus Relational" Supplement to Proc. ACM SIGMOD Workshop on Data Description, Access and Control, Ann Arbor, Michigan (May 1974).

[19] CICS stands for Customer Information Control System, IBM's TP monitor, or framework for writing online transaction-processing applications.

[20] M. Stonebraker, E. Wong, P. Kreps, and G. Held. "The Design and Implementation of INGRES" ACM TODS 1, 3 (September 1976) pages 189-222.

[21] The company was first called Relational Technology Inc., and was then renamed Ingres Corporation. ASK bought Ingres, and was itself bought by Computer Associates International, Inc.

[22] The 1988 ACM Software System Award was shared by System R (Donald Chamberlin, James Gray, Raymond Lorie, Gianfranco Putzolu, Patricia Selinger and Irving Traiger) and INGRES (Gerald Held, Michael Stonebraker and Eugene Wong).