CAL SNOBOL archive

CAL SNOBOL was created by Charles Simonyi and Paul McJones at the Computer Center of the University of California, Berkeley between 1968 and 1971. This archive was created by Paul McJones in July 2021 and last updated 18 February 2024.

Introduction

In early 1968 the U.C. Berkeley Computer Center operated a Control Data Corporation 6400 computer. Freshman Charles Simonyi approached Chief Programmer Gene Albright to ask about part-time work. Albright proposed that Simonyi add an "include" directive to the CDC ALGOL compiler. Taking up the task, Simonyi was surprised to find that the internals of the compiler were a direct copy of the GIER ALGOL compiler that he had studied at Regnecentralen in Denmark. With the successful completion of this project, Albright proposed that Simonyi implement the new language SNOBOL4 from Bell Telephone Laboratories. While an implementation of the Bell Labs portable SNOBOL4 was available for the 6400, it was not efficient enough in space or execution time to run the student jobs. Simonyi studied the available documentation [Griswold et al. 1967] and agreed to take on the job. Around this time, another freshman, Paul McJones, who had been working at the Computer Center since the previous fall, became interested in SNOBOL3. His supervisor, Dave Hussey, suggested he speak to Simonyi; Simonyi welcomed McJones into the project.

Original implementation, 1968 – 1969

Simonyi began by writing an ALGOL program to work out the details of the SNOBOL4 pattern matching algorithm. Then he began a design for the new system, which was to be written in the COMPASS assembly language of the 6400. The design consisted of a compiler that translated the SNOBOL source code into "micro instructions" for a stack-oriented virtual machine. The compiler had several passes, and was patterned after the GIER ALGOL compiler. The runtime was based on value descriptors each occupying one 60-bit machine word and containing a type code and one or more pointers or other data fields.
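The one-word descriptor idea can be illustrated with a small sketch. The source states only that each descriptor occupied one 60-bit word holding a type code and one or more pointers or other data fields; the layout below (a 6-bit type code followed by three 18-bit fields, 18 bits being the width of a CDC 6000-series address register) is an assumption chosen because it fills the word exactly, not the documented CAL SNOBOL format. The names `pack_descriptor` and `unpack_descriptor` are hypothetical.

```python
# Hypothetical descriptor layout, NOT the documented CAL SNOBOL format:
#   [ 6-bit type code | 18-bit field | 18-bit field | 18-bit field ] = 60 bits
WORD_BITS = 60
TYPE_BITS = 6
FIELD_BITS = 18
FIELD_COUNT = 3

TYPE_MASK = (1 << TYPE_BITS) - 1
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack_descriptor(type_code, fields):
    """Pack a type code and three fields into one 60-bit integer."""
    if not 0 <= type_code <= TYPE_MASK:
        raise ValueError("type code does not fit in 6 bits")
    if len(fields) != FIELD_COUNT:
        raise ValueError("expected exactly three fields")
    word = type_code
    for field in fields:
        if not 0 <= field <= FIELD_MASK:
            raise ValueError("field does not fit in 18 bits")
        # Shift earlier contents left and append the next field.
        word = (word << FIELD_BITS) | field
    return word


def unpack_descriptor(word):
    """Split a 60-bit integer back into (type_code, (f0, f1, f2))."""
    fields = []
    for _ in range(FIELD_COUNT):
        # Fields come off the low end in reverse order.
        fields.append(word & FIELD_MASK)
        word >>= FIELD_BITS
    fields.reverse()
    return word & TYPE_MASK, tuple(fields)
```

For example, `unpack_descriptor(pack_descriptor(5, (1, 2, 3)))` gives back `(5, (1, 2, 3))`. Keeping every value in a single word means a runtime can copy or push a descriptor with one memory operation, which is one plausible reason for such a design.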

This was the original design document:

A prospectus written at that time gave an overview of the planned system:

Simonyi implemented the compiler's syntactic and semantic phases, the interpreter, heap storage, pattern matching, and many built-in functions. McJones implemented the compiler's lexical analyzer, the operating system interface and I/O buffering, and a few of the built-in functions. Without classes to take that summer, they made good progress, running the first complete job before the fall quarter began:

As the time to release the system approached, a short document was written. It described CAL SNOBOL mostly in terms of differences from the Preliminary Report on the SNOBOL4 Programming Language from Bell Labs:

By around January 1969 the system was stable enough that classes began using it [citation?]. To announce availability of the system to other CDC sites, McJones wrote a letter that appeared on pages 8-10 of Issue 6 (June 1969) of the SNOBOL Bulletin [Bulletin 6].

A short document was written to accompany distributed copies of the system:

A longer document was prepared by Marianne Bentley, a Computer Center technical writer; this is an early snapshot:

To announce availability of a manual (presumably the one above), an update appeared on pages 40-41 of Issue 8 (April 1970) of the SNOBOL Bulletin [Bulletin 8]. William Waite, the editor, noted:

"We have been evaluating CAL SNOBOL4 here at the University of Colorado, and it appears to live up to its authors' claims of speed and compact size. (They say it is 5 times as fast as the IDA release and 1/7th as large.) Our major difficulty in this evaluation has been the fact that it would only run one of our existing programs! All of the others, written by diverse users, fell foul of the restriction on unevaluated expressions. This seems the most serious limitation of the system, and should be corrected if possible. We have found that unevaluated expressions can be avoided if the programmer conciously desires to do so, but this leads to awkward constructions in many cases."

Issue 9 (December 1970) of the SNOBOL Bulletin [Bulletin 9] mentioned this:

"CAL SNOBOL, an alternate version of SNOBOL4 for the CDC 6000 series, has been discussed in earlier versions of this Bulletin. Comparisons of CAL SNOBOL and SNOBOL4, version 3.4 have been carried out at Colorado by Mr. Coleman. They indicate that since version 3.4 is just slightly faster (see below) and smaller than Version 2.0, it is still worthwhile to live with the restrictions imposed by CAL SNOBOL for production-type programs. This is especially true for I/O bound programs. While CAL SNOBOL I/O is a bit slower than assembly language programs, it is far faster than FORTRAN or FORTRAN-dependent Version 3.4 routines."

Neither of the authors saved copies of the machine-readable source code or an assembly listing from this era, but a copy of the source code was discovered in 2007 in the SNOBOL archive created by Ralph Griswold at the University of Arizona:

In 2007, with the help of the ControlFreaks group, McJones was able to run a job under the Desktop CYBER emulator to assemble and run this version; here is the listing:

Several lists of bugs and fixed bugs were maintained, although they did not include dates so it's difficult to correlate them with the known development history:

The final mention of CAL SNOBOL in the SNOBOL Bulletin was in Issue 10 (June 1971) [Bulletin 10]. After a discussion of the poor performance of the Fortran I/O used in standard SNOBOL (but not in CAL SNOBOL), editor Waite continued:

"We should now attempt to spotlight other problem areas. The existence of high-speed versions like CAL SNOBOL might lead us to suspect that the distributed macros are too dependent upon the structure of System/360. Some attempt should be made to either substantiate or refute this suspicion by making comprehensive measurements on implementations for other machines. It does little good to theorize - hard facts are necessary."

Concurrent with CAL SNOBOL, the Computer Center was carrying out a much larger project: the development of the CAL Timesharing system for a second CDC 6400. CAL TSS could emulate CDC's SCOPE operating system, and timesharing users found it convenient to interactively edit their SNOBOL programs and then execute them. One or two additional standard procedures were added to CAL SNOBOL to make interactive input/output more convenient.

Revised version, 1971 – 1972

By 1971 there was pent-up demand for improvements to the system. Simonyi wrote a proposal including improved storage management, compiler listings, program tracing, "introspection" of run-time values, and also internal documentation of all the data structures:

Over the spring and summer of 1971 this work was carried out. This memo documents the changes:

There was an extended testing period, with the new "experimental" version available under a different command name. In early January 1972 the new version was released:

Simonyi preserved a listing from this time. McJones didn't preserve a listing, but in 1980 was able to obtain a listing from a friend still at the Computer Center:

In 2012, McJones discovered a copy of the revised source code via the ControlFreaks organization. It seems to correspond fairly closely to the listing:

The internals documentation provided by Simonyi was typeset on a line printer by a Computer Center staff member. A June 1972 version was cited in [Griswold and Griswold, 1989]:

Although Simonyi and McJones never produced a stand-alone reference manual for their non-standard implementation of SNOBOL4, they were the recipients of good fortune in the form of a book about using SNOBOL4 to analyze texts in the humanities. The authors, assistant professor Laura Gould and graduate student Robert Gaskins, Jr., based their book on CAL SNOBOL, and took care to describe all the differences between it and the Bell Labs standard. They noted, "Mr. McJones has reviewed our work as it has progressed, and has made many helpful suggestions":

CAL SNOBOL escapes Berkeley

McJones graduated in December 1971, leaving the Computer Center for a project elsewhere on the Berkeley campus. Simonyi focussed on other employment (Berkeley Computer Corporation) and graduated a year later; neither worked on CAL SNOBOL again. But the system stayed in use at Berkeley until the CDC 6400 was retired in August 1982. And copies and derivatives of the system continued to run at other Control Data sites around the country. Willie Sue Haugeland (now Orr) maintained the system at Berkeley for several years and handled distribution to other sites.

Andrew Mickel at the University of Minnesota maintained a version that was ported to the CDC Kronos operating system (see Appendix 4).

Here are two documents from this era:

Appendix 1: Other SNOBOL implementations at Berkeley

SNOBOL3

SNOBOL3 for the IBM 7090 from Bell Labs was the original implementation. Berkeley had the correct hardware, but it's not known if SNOBOL3 was ever run there.

Around 1965 Butler Lampson implemented a version of SNOBOL3 for the Project Genie Timesharing System on the SDS 940.

SNOBOL4

The Bell Labs implementation of the new language SNOBOL4 used a set of macros designed for portability. Their original version ran on the IBM 360. It was quickly ported to the CDC 6000 series and became available at Berkeley in the spring of 1968.

As mentioned in the Introduction, the IDA implementation required too much memory, compile time, and execution time to be used for teaching students, thus leading to the CAL SNOBOL project.

Around the same time that Simonyi and McJones began work on CAL SNOBOL, Roger Sturgeon and Eric R. Anderson, two students under the guidance of Butler Lampson, began a SNOBOL4 implementation for the SDS 940.

Lampson notes:

"Snobol (1965-69): I designed two Snobol systems for the SDS 940, one for Snobol 3 (which I implemented), and the other for Snobol 4 without user data types (implemented by two students). Both were interactive and received considerable use at various 940 installations. Snobol 4 had a large (slow) workspace and did incremental compilation at the statement level."

SNOBOL4 for the SDS 940 included a set of functions for controlling a pseudo-teletype. This was based on the Conditional Command Processor Language designed by Charles Grant:

Around 1969, Professor Ward Douglas Maurer of the Electrical Engineering and Computer Science Department began working with graduate student Paul Santos on an approach for compiling SNOBOL4 to native assembly language, with an appropriate run-time library. This led to FASBOL, the subject of Santos's PhD thesis. The initial FASBOL ran on a Univac 1108. Santos designed a follow-on, FASBOL II, for the DEC PDP-10. Subsequently a master's student, Richard Strauss, designed a version of FASBOL for the CDC 6000 series.

Appendix 2: SNOBOL4 documents from Ralph Griswold

Appendix 3: SNOBOL Bulletin

TODO: Add remaining SNOBOL Bulletins?

Appendix 4: University of Minnesota publications

Copies of the following documents were a gift from Andrew Mickel.

Appendix 5: Other SNOBOL implementations and resources