The interview date is unknown
Aho: I have a question that I have - maybe you've already uncovered the answer to this - and that is... How did it get started? Do you have, at this point, a feeling of, "What was it that sparked the idea, in Ken's mind?"
MSM: I have a general idea what sparked Ken's mind. I want to know more about it. In order to know more about it, I have to know more about the circumstances surrounding Multics. My sense of it is that Ken liked working on CTSS, liked working on Multics. He liked the machine on which he could continue to do that. And the question was how does one go about getting hold of that machine and developing that system.
Aho: I had joined the Labs in, I guess, early 1967, and Thompson and Ritchie had their office right across the hall from mine. There was another graduate student I had met at Princeton in the registration line, a person by the name of Jeff Ullman, and we got to know one another quite well at Princeton. And then, upon graduation, he came to Bell Labs; I came to Bell Labs. We were working on various aspects of formal language theory and automata theory - theoretical computer science at that time. I used to go up into the attic, onto the sixth floor, where there was this machine that Thompson, and then shortly a few others, were working on, and I was interested in writing papers. So, very quickly I discovered that this was a much nicer way to write papers than the old traditional way we used to have of giving handwritten notes to the secretary.
MSM: Let me back you up a bit. You got your Ph.D., you finished your dissertation in '67?
MSM: On indexed grammars?
MSM: And who's your dissertation advisor at Princeton?
Aho: That was interesting; I was an undergraduate at the University of Toronto, and I had done my senior thesis in the area of minimization of Boolean functions. I did a lot of reading of research papers, and I noticed there was a person by the name of McCluskey whose name was on a number of these papers. And then, when it became time to think of what graduate school to go to... I didn't know where I really wanted to go, so I thought that MIT might be a safe place to apply. And then I saw this person at Princeton. So I said, well, I might try applying to Princeton. No, I had sort of tentatively accepted MIT. Except McCluskey kept sending me these very warm personal letters, saying, "Wouldn't you like to come to Princeton and work on a small campus? We can do great things together." And all I ever got from MIT were form letters. So after a while I said, "This is crazy! Why do I want to go to an impersonal school when I can get all this personal attention from this professor at Princeton?" Well, shortly after I arrived at Princeton, McCluskey said, "I am going off to Stanford." But it was a fair exchange - Stanford sent a young assistant professor, by the name of John Hopcroft, to Princeton, and he inherited McCluskey's students. There were two students that McCluskey had from the year I had come. And, fortunately, Hopcroft only had one Ph.D. research problem, so he gave it to this other student. That problem is still open; it hasn't been solved. And he left me to my own devices. So, I spent a lot of time groping around for what would be an interesting thing to write a thesis on; but, actually, that groping around was, I think, a very valuable experience, 'cause once you learn how to find problems, that's a very valuable skill for later in life. After you graduate, nobody gives you problems.
MSM: Now when you went to Princeton's electrical engineering...computer science...was it called that then, or was it still...
Aho: It was just called electrical engineering. But there were several options that you had in the department - one of them was the computer science option. There was also a solid-state electronics option and a communications theory option - those were the three main things that students took. You had to specialize in one of these areas. And then, at that time - I don't know if they still do it in the department - for the general examinations, you had to take one major area and two minor areas. An option for one of the minor areas was mathematics. So, a lot of students took computer science as a major, and communications theory and mathematics as minors.
MSM: Is that what you did?
MSM: What did you go to get from the mathematicians?
Aho: My graduate education at Princeton consisted of a great number of courses from the math department. The electrical engineering department didn't have very many computer courses at that time. So...
MSM: I was going to say, "How did you know what computer science was? What was computer science then?"
Aho: There was hardly any. Probably the most influential thing that took place was, one summer Jeff Ullman went and worked for Seymour Ginsburg out in California, and Ginsburg was very active in formal language theory at the time. So when Ullman came back, he essentially taught Hopcroft, and me, formal language theory. And that's how I slowly started getting interested in this area of language theory. At that time, the Chomsky hierarchy of grammars was very popular. People had the type zero to type three grammars. I had some interest in compilers at that time, and I noticed that with the context-free grammars - the type two grammars in the Chomsky hierarchy - you couldn't express certain constructs of programming languages. So I asked: what do you really need to add to a grammar to get constructs like, "You have a string, and then you have a repetition of that string" - so you can, say, declare an integer 'string', and then use 'string' as an integer variable in some expression? That repetition construct is one of the classical constructs that people used to show that languages of that form cannot be generated by a context-free grammar. So the idea of indexed grammars came out of, "What kind of mechanism would you need to be able to express constructs of this form?" One way of viewing indexed grammars is that they are context-free grammars with certain symbol table information attached to the non-terminals of the grammar, and then it's very easy to generate constructs of the form "ww". What was kind of interesting was that, in this work, I discovered - and I didn't discover it at the time I was working on my thesis, but shortly thereafter - that indexed languages - those languages generated by indexed grammars - were the first in an infinite hierarchy of languages that exist between the context-free and the context-sensitive.
You start off with the regular sets, you have the context-free languages, you have the indexed languages, and then the other languages in this hierarchy are obtained by essentially putting a "duplicate" operator on the memory. Just as with context-free languages there's an equivalent way of defining the same class of languages with push-down automata - any language that's recognizable by a push-down automaton can be described by a context-free grammar, and vice versa - there is this duality likewise with indexed grammars: for any language that is generated by an indexed grammar, you can define an automaton, called a nested stack automaton, which is capable of recognizing that language, and vice versa. This is one of the key results of the thesis. One way of viewing a nested stack automaton is that it is a push-down automaton with a duplicate operator: you can duplicate the stack, and then start working on the duplicated stack, and then you can duplicate the duplicated stack, and so on.
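The grammar-automaton duality Aho describes here has a familiar small-scale analogue. A minimal sketch in Python (the language choice and names are mine, not from the interview): the context-free language {aⁿbⁿ}, generated by the grammar S → aSb | ε, is recognized by a push-down automaton whose only memory is a stack.

```python
def accepts_anbn(s: str) -> bool:
    """Push-down recognizer for the context-free language { a^n b^n : n >= 0 }.

    The list plays the role of the PDA's push-down store: each 'a' is
    pushed, each 'b' pops one 'a'; acceptance is an empty stack at the end.
    """
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' can never be accepted
                return False
            stack.append("a")
        elif ch == "b":
            seen_b = True
            if not stack:       # more b's than a's
                return False
            stack.pop()
        else:
            return False        # alphabet is {a, b}
    return not stack
```

A nested stack automaton adds to this the ability to duplicate the whole stack and work on the copy, which is what lifts the recognizing power up to the indexed languages.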
MSM: In the beginning it is almost a power set of push-down automata.
Aho: Well, you can get a stack of stacks, basically. And the nested stack automaton has an efficient way of implementing the stack of stacks; you can think of it as almost like a cactus. That's why some people were calling them cactus automata at the time. But then, going up to the next level of generalization, you had these nested stack languages. You can then duplicate a stack of stacks in one operation, so you can have stacks of stacks, and now you can see what the generalization is going up in this hierarchy. The interesting part of it was that all the languages generated by these mechanisms had the same kind of decidability and closure properties. And, a little later, the work in Europe in particular, in language theory, went to biologically motivated examples of languages - people were looking at what phenomena could account for the striations on snail shells. They looked at grammatical mechanisms for being able to generate patterns of this form. These grammatical mechanisms were called Lindenmayer systems, and it turned out that there was a close relationship between certain kinds of Lindenmayer systems and indexed grammars. So, for a while, this was very popular as a certain sub-field of language theory. So, it was a very interesting part of my life, a very interesting experience. And in fact there are still people who write papers once in a while on various aspects of indexed grammars and languages. And, this family of languages was shown to be an example of what was called a "super AFL" - some work that Ginsburg and Greibach had done subsequently, looking at automata theory for various classes of automata in a very general, very algebraic setting. It's a very elegant, beautiful theory, and it was a nice way to get started on a scientific career.
MSM: Okay! Well that leads obviously to the next question - which is, what was the appeal of Bell Labs to someone whose work was of such a decidedly theoretical...
Aho: Well, I mean Bell Labs had a great reputation. And, also, McCluskey had consulted for Bell Laboratories, and he had brought some of his graduate students on tour to Bell Labs, so we saw some of the work that was going on. Even when I was at Toronto - I took engineering physics as an undergraduate - in our senior year we had a field trip where we would go and visit various American research institutions. So, I went on - in fact I organized - this American research visit in my senior year. We visited, amongst other places, Bell Laboratories. We also went to Brookhaven and GE, just to see what an industrial research lab was like. And, certainly, if one looked at the scientific literature, there were a number of papers - key papers - that had been co-authored by Bell Labs scientists. So it had a good reputation as a place to go and do research. The other attractive part of it was that Jeff Ullman had just joined Bell Labs, too, and John Hopcroft had a close connection - he was working here for the summer at that time - so it wasn't that much different going from Princeton to Bell Labs in terms of the people that I knew. Also, when I came interviewing, I was very impressed with Doug McIlroy, who was my boss for many years. He was just a peach of a person to work for. He understood everything that I was working on from just the most fragmentary descriptions. And he essentially taught me to write, too; I think he's one of the finest technical writers that I know of. He has a flair for language, and a flair for economy of expression, that is remarkable.
MSM: So you came in '67... At that time, if I understand correctly, Doug and Ken and Dennis and Joe Ossanna were working on the Multics project.
MSM: And what was... if I understand the way things work here, you were told to look around and find something to work on...
MSM: How'd you go about that process?
Aho: What Doug McIlroy did in terms of introducing me to the people in this department... he just said, "Oh, here is Jeff Ullman sitting across the hall from you." And, Jeff and I worked very closely together for the first few years that I was at Bell Laboratories. We were writing four or five papers a year on various aspects of language theory and automata theory.
MSM: So essentially laying down the content of the books...
Aho: What we were doing was trying to develop a theory that would account for certain kinds of computational behavior. We were exploring what kind of power various kinds of automata had as representational devices. We were also very interested in algorithms, and in studying algorithms from the point of view of, "How fast can you make them? What are upper and lower bounds on the speed with which certain problems can be solved using various models of computation?" And, more interestingly, we were also trying to see if there were some general classes of techniques that could be used to solve problems. There were sort of two prongs to the work initially - there was the work in formal language theory - and we had a certain view that a number of these techniques could be applied to compiler design. So, in the early '70s, we wrote this two-volume sequence, The Theory of Parsing, Translation, and Compiling, that attempted to distill the essence of automata and language theory with applications to the process of compilation. And, I think this was one of the interesting demonstrations of putting a theoretical foundation under an important practical area of computer science - programming languages and compilers are, I think, an indigenous part of computer science. What language and automata theory did was provide a theoretical foundation that could be used to design algorithms for the translation process. And, more importantly, people could then construct tools to help build components of compilers. A few years later, in the early '70s, I had a very interesting experience with another colleague, who was in the computing science research center at that time. This person had gotten his Ph.D. from Columbia under Eilenberg; he was a category theorist. His name was Steve Johnson. But he had come to Bell Laboratories to work; he had developed a fascination with computers and computing.
One time I remember he came into my office waving a book written by Elgot and Eilenberg, entitled Recursiveness, and what he said to me was, "I understand you know what the theorems in this book mean, but you don't understand the notation. I understand the notation but I don't know what the theorems mean - why don't we read this book together?" So we very quickly read the book, and discovered that it said well-known things in a well-known notation - namely, theorems of recursive function theory, couched in category theory. But the more interesting part of that interaction was that Steve asked me, "Well, what are you working on?" And I was saying, "Well, there are these things called grammars, and there are these things called automata, parsing methods for various classes of languages." He was interested in writing a C compiler at that time. And he said, "Well, could you make a parser for C for me, using these techniques?" And I said, "Why, sure!" And, I foolishly decided to construct the parser by hand, implementing one of these LR parser construction techniques. I had a huge sheet of paper - a big piece of cardboard - on which I carried out the sets-of-items construction, usually while I was watching television, 'cause it is such a trivial and mind-numbing task. And of course I didn't get it right. I would give this piece of cardboard to Steve, and Steve would then encode it into his computer program. After a while he became so frustrated with me, that I couldn't get it a hundred percent right, that he wrote a program that automated this parser construction technique. And that's how the tool yacc was born. And yacc had an interesting reception in those early days: there were the people who understood how to write compilers - people like Dennis and Ken.
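The sets-of-items construction Aho carried out by hand on cardboard - and that yacc automated - can be sketched compactly. This is a toy LR(0) version for an invented two-production grammar, not the actual yacc code:

```python
from collections import deque

# A toy grammar, augmented for LR parsing: S' -> S ;  S -> ( S ) | x
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("(", "S", ")"), ("x",)],
}

def closure(items):
    """LR(0) closure: whenever a dot stands before a nonterminal B,
    add the items B -> .body for every production of B."""
    items = set(items)
    work = deque(items)
    while work:
        head, body, dot = work.popleft()
        if dot < len(body) and body[dot] in GRAMMAR:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(items, sym):
    """Move the dot over sym in every item that allows it, then close."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == sym}
    return closure(moved) if moved else None

def canonical_collection():
    """BFS over states: the sets-of-items construction itself."""
    start = closure({("S'", ("S",), 0)})
    states, work = {start}, deque([start])
    while work:
        st = work.popleft()
        for sym in {b[d] for (_, b, d) in st if d < len(b)}:
            nxt = goto(st, sym)
            if nxt and nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states
```

For this grammar the construction yields six item sets; for a real language like C, done by hand, the table is exactly the mind-numbing cardboard exercise described above.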
I guess back in the '50s there was a feeling that the real programmer knows where every bit is, and there was some eschewing of higher level languages, because you couldn't get the same kind of efficiency. On the other hand, people who were newer to the game, and also people who weren't so concerned with extracting every iota of efficiency out of the machine, gravitated to higher level languages - FORTRAN, in particular - in the late '50s and early '60s. The same kind of experience took place with yacc: people quickly discovered that you could very easily construct a parser for a language by writing down the grammar, and then yacc would translate the grammar into a parser. A few years earlier, writing a parser for a language was considered worthy of a Ph.D. thesis. So here came along this device that automatically did what you used to be able to get a Ph.D. thesis for. And there were some people who recognized the potential of constructing special purpose languages for various application areas. One of them in particular was Brian Kernighan. Brian's office was right next to mine. He had been interested in computer typesetting. This is a field that Joe Ossanna had pioneered. Joe had written this text formatter which got to be known as troff. There was an earlier version of it, roff, prior to that. Brian wanted to extend the capabilities of troff to handle other specialized constructs - equations, and, later, various kinds of graphical constructs. And I kept saying to him, "Well, it is very easy to write a grammar that describes the syntactic structure of mathematics as it appears in documents. So, why not write a yacc-generated parser for this?" So then he and Lorinda Cherry developed this troff preprocessor that got to be known as eqn. And this became again one of the standard tools in our typesetting arsenal, since a lot of people around here write mathematically oriented papers.
And eqn made the typesetting of mathematics, or the printing of mathematics, just as easy as speaking mathematics. Kernighan had this philosophy that a language should be like what a professional uses to communicate a description of those objects over a telephone. So you could write "'a' sub 'i'" - that's a way of describing the mathematical expression "a" with the subscript "i." And also, looking at some of these application areas... In mathematics, if you write "'a' sub 'i' squared" or "'a' sub 'i' sup 2," this can be typeset in three different forms: You could have "a" with the subscript "i," squared. You could have "a" sub "i" raised to the power "2." And many people who do professional typesetting often like to put the superscript "2" directly above the subscript "i." At that time, Johnson, Ullman, and I were looking at techniques for being able to parse ambiguous grammars. And, it turned out that having the ability to use ambiguous grammars allowed one to put in syntactic constructs for special cases. These syntactic constructs for special cases would make the grammar ambiguous, because there was already a general mechanism in the grammar for specifying the construct, and you added another production which specified the special case. So we added a parsing-action conflict rule to yacc saying that, if you have the choice of reducing by several productions, choose the one that you have designated as the special case. So you could automatically optimize certain kinds of language processing by first very quickly writing a grammar for the general case, and then, as you saw special cases that needed to be handled, you could insert productions into the specification, along with the semantic rules for processing them. As a consequence, the human interface to a number of the special purpose languages improved, because there were constructs for dealing with some of these special cases. And you got the kind of output that you wanted.
And this was a development that took place in the early '70s. So many people became quite devoted to the various compiler construction tools. There was another colleague of mine at that time, by the name of Mike Lesk, who had a great deal of interest in special purpose languages, as well as various kinds of information processing and information retrieval mechanisms. In a compiler, one of the first tasks that you perform is to take the sequence of input characters and chop it up into words - the lexical units that make sense. So if you write an identifier like "floors," those letters are to be grouped together and treated as a lexical unit. The part of the compiler that combines sequences of characters into lexical units is often called a lexical analyzer, or scanner. Since we had developed a parser generator, and I had been interested in pattern matching techniques of various kinds, Mike came over to me and asked, "Could I use one of your finite state machine approaches to doing lexical analysis?" And I was amazed at how quickly Mike could program any application. Within a period of a few weeks, he had created the first version of lex, which took one of these pattern matching algorithms that I had used in some other UNIX programs, namely egrep, and encapsulated it into a lex program which could be used in a very convenient way with yacc. So the automation of the front end of the compiler - the analysis part - was now virtually complete. You could specify the lexical constructs of a language using the notation of regular expressions - another formalism from formal language theory. You could specify the syntactic structure using a context-free grammar. By the combination of these, you could build a front end that would map a stream of symbols into the intermediate representation, from which the code generation work could then take place.
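The division of labor Aho describes - regular expressions for lexical structure, a grammar for syntax - survives in miniature in every scanner. A sketch of the lex idea using Python's re module (the token classes here are invented for illustration):

```python
import re

# A toy lexical analyzer in the spirit of lex: each token class is a regular
# expression; the scanner chops the character stream into lexical units.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# One master pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (token_class, lexeme) pairs, dropping whitespace."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
```

A real lex specification also attaches an action to each pattern and compiles the whole set into a single deterministic automaton, but the shape of the specification is the same.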
The work on pattern matching...
MSM: Can I interrupt for a second?
MSM: This all fits together so well, and I recognize what you're doing, 'cause I have used lex and yacc in code generation. I sat in on Ravi Sethi's compiler course, at Princeton, in '83, and have done that little PASCAL exercise at the end of the Dragon Book. So, it is interesting to hear you talking about these tools. And, it all fits together so nicely that it seems almost the obvious way to go. But it wasn't at the time, when you came. It wasn't the way people went about things. And what I hear as you describe it is... you used a phrase earlier, "provide a theoretical foundation for compiler construction." And there are two ways in which one provides theoretical foundations for things. One is to come in and watch the way in which people are doing things, and then say, "Okay, let's formalize what you are doing." And another is to come in and watch the way people are doing things, and say, "No, there is another way to do this that will put you on firmer theoretical foundations, and in the end make your job easier." How clear was it that this would make the job... There is an attitude of almost meta-programming that seems to be part of UNIX. And, I am beginning to wonder whether that grew up here first - whether it is something that emerged here at Bell Labs, or something that people brought with them and consciously tried to institute, or implement. Am I making...
Aho: Yup. No work occurs in a vacuum. And, as I had mentioned, when I first came, I was quite theoretically oriented, but the work was motivated by various kinds of language processing. What Chomsky defined is families of grammars. His initial motivation was to find some kind of syntactic representation for natural languages - although there is some debate at this point about whether he really wanted to do that. And when I was at Princeton, I had taken a course from... I believe it was the physiology department... it is tough to know where some of these professors are actually housed...
MSM: And that's the nice thing about Princeton...
Aho: But, we talked about Chomsky's work - transformational grammars. The first two programming languages that I learned were SNOBOL4 and IPL-V. And IPL-V was taught by this person from... I believe it was the psychology department...
MSM: Who's in it?
Aho: I don't remember the name at the moment. It goes so far back. But, when I came to Bell Labs, the structure of languages became much clearer to me, and one of the things that I worked on was various formalisms for translation - syntax-directed translation schemes of various kinds. This was again work that I had done jointly with Jeff Ullman. And, by studying some of these devices, you could see the power of certain formalisms for being able to specify a translation. Now, if you look at a compiler, one way of abstracting it is that it's just an input-output device - you put in a source program at one end, and what comes out at the other end is a target language program. If you open the lid of a compiler and look in, you see that the compiler consists of a composition of phases. And each phase in itself is a translation of a certain kind. There is a translation that takes place in the lexical analyzer, going from the input - a sequence of characters - into a stream of tokens; then the parser takes that stream of tokens, checks to see that it has a certain syntactic structure, and produces some kind of intermediate representation as output. And then the subsequent phases take that intermediate representation and map it into object code. So, one of the things that we noticed was that a translation mechanism based on regular expressions or finite state machines could be used for the lexical analysis - modeling what a lexical analyzer did. What the syntax analyzer did could be modeled by a syntax-directed translation, where you write a grammar and associate with each production a semantic rule. Well, that notion of syntax-directed translation was known. But the key insights that we obtained were how to make this process go efficiently. In the mid '60s, Knuth had published a paper that essentially defined a large class of grammars called the LR(k) grammars.
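The notion of a syntax-directed translation - a grammar with a semantic rule attached to each production - can be illustrated with a tiny translator. This hypothetical sketch parses the grammar E → T { (+|-) T }, T → digit by recursive descent, and emits postfix code as the semantic action of each production:

```python
def to_postfix(expr: str) -> str:
    """Syntax-directed translation of single-digit infix expressions
    (grammar: E -> T { (+|-) T }, T -> digit) into postfix notation."""
    tokens = list(expr.replace(" ", ""))
    out = []
    pos = 0

    def term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            out.append(tokens[pos])   # semantic rule for T -> digit: emit the digit
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    def expr_rule():
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            term()
            out.append(op)            # semantic rule: emit operator after its operands

    expr_rule()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return " ".join(out)
```

Here parsing and translation happen in one pass: the output program is a side effect of recognizing the input's syntactic structure, which is exactly the "each phase is a translation" picture.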
What people had done prior to that was define various kinds of parsing strategies - people had defined various precedence schemes: operator precedence, weak precedence, mixed-strategy precedence, and so on. And then there were certain kinds of bounded-context parsing strategies. But Knuth, I think, in a stroke of genius, said, "This is the natural class of grammars that you've all been groping for, and here is its definition. Not only that, but if you restrict yourself to this class of grammars, then here is the procedure for being able to construct a parser from that grammar." Now, the problem with Knuth's paper was that the parsers that you got out of it were enormous. And then work took place in the late '60s and early '70s on how to make that process more efficient. One of my colleagues at Princeton had done work on a certain restricted class of LR(k) grammars. In fact, these were the simple LL grammars. This is work that he had done jointly with Hopcroft.
MSM: Who is this?
Aho: Al Korenjak.
Aho: So, the talk of parsing methods, and the search for a class of grammars that was big enough to describe the syntactic constructs that you were interested in, but yet restricted enough so that you could construct efficient parsers from it, was the order of the day. The methods that are used in yacc were a refinement of the methods that had come out from Knuth's paper onward. There was a student at MIT, by the name of DeRemer, who defined the simple LR grammars. And one of the things that we discovered when we wrote the theory of parsing book was that the literature - the scientific literature, if you could call it that - was very inaccurate. These objects were so new, and the techniques for dealing with them were so incomplete, that many of the claims that were made and published about various parsing methods were just wrong. The proofs did not hold. So one of the things that Jeff and I did was to attempt to understand what kind of techniques could be used to analyze these formalisms, and also to put a clear picture into the scientific literature - what was true and what wasn't. So that was a very interesting project, writing that book. And I discovered subsequently that that book came to be one of the hundred most referenced books in computer science and mathematics - although we certainly did not write it from that point of view at the time. But it did give, at least me, a good foundation for talking about algorithms - reasonable algorithms that could be used in devices like lex and yacc. There was another event that took place in the early '70s... I had mentioned I was working on algorithms. Hopcroft and Ullman and I entered into a second writing project. Hopcroft and Ullman had written a language theory book earlier, and I had written this book with Ullman on the theory of parsing, so we decided that we'd enlarge the class of co-authors and write a book on the design and analysis of computer algorithms.
I was very interested in algorithm design techniques at that particular time. And, one time, I happened to be giving a talk on techniques that could be used for constructing algorithms. Before, people tended to look at problems in isolation, but the same technique could be used for many different problems. And it became convenient to say, "Well, there's a general algorithmic design technique - divide and conquer: take a big problem, break it up into chunks, try to find solutions for the chunks, and then combine these solutions into a solution to the problem as a whole." It's a technique that has been used since antiquity. But just having the name "divide and conquer" is good, because I can say to you, "Have you tried 'divide and conquer' on your problem?" It saves an enormous amount of conversation. Likewise, in pattern matching - algorithms in the area of looking for patterns in text - you could construct very efficient algorithms by using certain kinds of automata. When I joined the Labs, as I mentioned, Doug McIlroy was my first boss. He had a passion for words, and we'd often talk about very unusual words, like, "What's the longest word in English that has no repeated letter?" Or, "What's the longest word in which the letters go in alphabetically increasing order?" With certain formal methods, it was trivial to specify these kinds of patterns. And, I guess I was also interested in crossword puzzles at that time, so I decided to implement one technique for constructing a regular expression pattern matcher that would very quickly look for regular expression patterns. The reason this is particularly effective for crossword puzzles is that, if you know that the third, the fifth, and the seventh letters of the word are such and so, then with regular expressions you can very quickly search the dictionary and find all words that have that combination of letters.
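The crossword-puzzle search Aho mentions is easy to reproduce. A sketch (the word list below is a hypothetical stand-in for a dictionary file such as the system word list): find seven-letter words whose third, fifth, and seventh letters are 't', 'e', and 'n':

```python
import re

# Crossword-style lookup: the known letters pin down positions 3, 5, and 7;
# the dots match the unknown squares. WORDS stands in for a real dictionary.
WORDS = ["pattern", "lattice", "matters", "saunter", "cat"]

pat = re.compile(r"^..t.e.n$")          # 7 letters: _ _ t _ e _ n
hits = [w for w in WORDS if pat.match(w)]
```

Against a real dictionary, the same one-line pattern scans tens of thousands of words; the speed of the scan is exactly where the automaton-based matching pays off.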
Regular expressions also allow you a much richer class of patterns than those. But that seemed to be one good motivation for it. And my idea was to use a certain technique for constructing a deterministic finite automaton from the regular expression, and then use that deterministic automaton to do the pattern matching. It was very similar in style to what yacc had done: yacc takes a description, in this case a context-free grammar, and transforms it into an automaton that does the processing. So, this two-step style of pattern matching - you take the pattern, you construct from the pattern a recognizer, then use the recognizer to do the analysis - is a very potent one.
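The two-step style - compile the pattern into a deterministic automaton, then run the automaton over the text - can be seen in a hand-built example. This DFA for the regular expression ab*c is written out directly as a transition table rather than constructed from the expression, just to show the shape of the recognizer:

```python
# Hand-built DFA for the regular expression  a b* c
# state 0: start; state 1: after 'a' (looping on 'b'); state 2: after 'c' (accept)
DFA = {
    0: {"a": 1},
    1: {"b": 1, "c": 2},
    2: {},
}
ACCEPT = {2}

def matches(s: str) -> bool:
    """Run the precompiled automaton over the input: one table lookup per
    character, no backtracking."""
    state = 0
    for ch in s:
        state = DFA.get(state, {}).get(ch)
        if state is None:       # no transition: the input cannot match
            return False
    return state in ACCEPT
```

The construction step (regular expression to DFA, e.g. by subset construction) is done once per pattern; the matching step is then linear in the text, which is what made the compiled matcher beat the hand-written machine-language one.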
MSM: And this work was done after yacc?
Aho: It was contemporaneous, because this is basically the idea in lex. And, I am very poor at remembering the exact dates at which things took place, and I really didn't keep a log in those days of when I did these things. So, I don't know exactly when these ideas occurred. But when I did the first version of this regular expression pattern matching program... Doug McIlroy, in the previous year, had had a summer student who had taken a classical algorithm for constructing a deterministic automaton from a regular expression and used it to look for these patterns. And it was written in machine language. I was astonished to discover that the program that I had written ran several times faster than what this machine language program did. And, in fact, this sort of became an avocation of mine for a number of years subsequently, and the improvement in performance of that program has gone up by almost two orders of magnitude. Most recently, Andrew Hume put a technique called the Boyer-Moore algorithm into the grep programs. And now we can search an 800-page book, looking for any key word or phrase, in a fraction of a second, with these fast pattern matching techniques. The early versions of these programs would have taken almost a hundred times longer to look for these patterns. And, now that we have the new machines, the combination of better algorithms and faster machines has been just dramatic. But I was going to mention a story... When I was working on the algorithms book, I was giving a talk on algorithm design techniques. There was a woman in the audience who worked for the technical information libraries, and she had written a bibliographic search program. They used to get a tape from the government that had various citations on it, and bibliographers could then specify some kind of search prescription consisting of Boolean functions of key words and phrases. And then the program would look for all citations that satisfied that search.
Sort of the night before, some gung-ho bibliographer had a search prescription that had over a hundred key words and phrases, and there was a six hundred dollar limit on the program of how much time it could consume. So, after using up that six hundred dollars, it had not yet finished printing out all the titles that satisfied the search. So she mentioned this story to me, and I said, "Oh, why don't you, with your set of key words, construct a pattern matching machine, and then the pattern matching machine will look for all of these key words in parallel. And by the way, here is an efficient technique for constructing this pattern matching machine." So, a month or two later, she came back to me and said, "Remember that program I had written? Now, it costs twenty-five dollars to do the search. In fact, every search costs twenty-five dollars - that's the cost of reading the tape." So what had been an I/O bound...er, process bound problem, had become I/O bound, the way it should be. And also, at that time, I took that algorithm and implemented the UNIX program fgrep with it. And, there was a lot of interest in words. Lorinda Cherry was interested in writing, and I think this interest came from Doug McIlroy, who treasured good writing, good diction. So, she wrote a program called diction which would look for instances of questionable words and phrases in one's writing. If you used constructs like "irregardless" or "emotional feeling," it would flag every sentence that contained an instance of one of these questionable dictions. The fgrep algorithm was ideally suited to that kind of application, as well, where you could take a set of key words and look for instances of these key words. So, this kind of technology that was based on language and automata theory had a variety of applications.
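[The keyword-matching machine Aho describes here is what became known as the Aho-Corasick algorithm, the engine behind fgrep. The following is a modern illustrative sketch only - the names and code are the editor's, not the original fgrep implementation: it builds a trie of the key words, computes the failure function by breadth-first traversal, and then scans the text once, reporting all key words in parallel.]

```python
from collections import deque

def build_machine(keywords):
    """Build the goto, failure, and output functions for a keyword set."""
    goto = [{}]           # state -> {char -> next state}; state 0 is the root
    output = [set()]      # state -> keywords recognized on entering the state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)
    # Failure function, computed breadth-first: fail[s] points to the state
    # spelling the longest proper suffix of s's string that is also a prefix
    # of some keyword.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f][ch] if ch in goto[f] else 0
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, keywords):
    """Report (position, keyword) for every occurrence, in one pass."""
    goto, fail, output = build_machine(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # slide back on mismatch, never rescan
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

[So `search("ushers", ["he", "she", "his", "hers"])` finds "she", "he", and "hers" in a single left-to-right scan; the failure function is the "slight technical detail" Aho mentions later in the interview.]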
And, these applications are still unfolding - the most recent application of these string pattern matching ideas took place in a program that was written by a summer student who came here from Stanford. As I had mentioned, back in, I think it was '84, I had spent a sabbatical at Stanford, and had taught the course on advanced compilers, and there was this bright graduate student in the audience, by the name of Steve Tjiang, who subsequently came to Bell Labs for a summer. And, during the summer period, he developed a program called twig, which did for the back end of the compiler what lex and yacc did for the front end. With twig one could essentially generalize the notion of the syntactic units to be trees, and one could view the process of code generation as taking an intermediate representation in the form of a tree, and then wallpapering it with little templates that correspond to machine instructions, or sequences of machine instructions. And what twig did was take an intermediate representation, and see if it could decompose this intermediate representation into a collection of subtrees. And it used a dynamic programming code generation algorithm that Steve Johnson and I had developed in the mid-'70s. This is an algorithm that was the foundation of one of the PCC compilers. And, in fact, if I may take a digression on that... The portability of UNIX stems from several factors: one is that it was soon written in a high level language, C, which Dennis Ritchie had devised. I had mentioned Steve Johnson had this interest in C compilers, and then tools for automating construction of compilers. One of the things that Steve and I did, after we had done yacc, was to look at algorithms for code generation. And we had developed a dynamic programming technique which would allow one to take a tree representation and map it into an optimal sequence of output code for a machine.
And we developed the theory of, under what conditions would the output code be optimal. And we tried to extend the class of machines for which this kind of code generation technique could be used. And it turned out that it fit a large class of machine architectures.
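[The dynamic programming idea can be suggested in a few lines. This is purely a toy illustration - the IR encoding, templates, and costs below are invented by the editor, not Aho and Johnson's actual tables: for each subtree we compute the cheapest instruction template covering it, so that, for instance, a constant operand can be folded into an immediate-mode instruction instead of being loaded separately.]

```python
def best_cost(node):
    """Return (min_cost, template) for a tiny tuple-encoded expression tree.
    Leaves: ('reg',) or ('const', n).  Interior: ('add', l, r), ('mul', l, r)."""
    op = node[0]
    if op == 'reg':
        return 0, 'in-register'
    if op == 'const':
        return 1, 'load-immediate'        # a bare constant must be loaded first
    left, right = node[1], node[2]
    lcost, _ = best_cost(left)
    rcost, _ = best_cost(right)
    # Template 1: generic register-register instruction, both operands computed.
    best = (1 + lcost + rcost, f'{op} r,r')
    # Template 2: immediate form, when the right operand is a bare constant -
    # the template absorbs the constant, so no load-immediate is emitted.
    if right[0] == 'const':
        candidate = (1 + lcost, f'{op} r,imm')
        if candidate[0] < best[0]:
            best = candidate
    return best
```

[Retargeting then amounts to supplying a different table of templates and costs - which, as Aho explains next, is why a compiler built this way could be moved to a new architecture so quickly.]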
MSM: Is this the reason for Steve's interest in a C machine? Joe Condon had been telling me that Steve had been looking at architectures for a machine optimized for C.
Aho: Well, it's again a question of what you could do once you had these technological tools available. We had published this paper on code generation - optimal code generation - I guess at one of the theoretical conferences, and then it was published in the JACM. But most importantly, the PCC 2 compiler had used this code generation mechanism, and it became possible to re-target this compiler to other machines fairly easily - all you had to do was specify new code generation tables that corresponded to the new architecture, and then you would have a compiler that would translate C into that new architecture. Since the UNIX operating system was written in C, it could then be moved very quickly to this new architecture. And both Dennis Ritchie and Steve Johnson were very interested in portability, at that time. Well, now that we had this tool of being able to re-target a compiler to a new architecture, people in the mid '70s started looking at the question of, "Well, what would a good machine be for executing C programs?" I think this was a question that Sandy Fraser had also been interested in initially. Steve Johnson looked at it in depth. And, there was a young researcher here at the time, by the name of Dave Ditzel, who was consumed with this question of what would a good architecture for running C programs be like. So what people did was they would search a space of architectures for one that would be effective in running C programs. And they were able to do this in a much more scientific way by taking a C compiler, re-targeting it to this hypothetical architecture, and since we had a body of real C programs we could look at - these were the ones in /usr/bin - we could take those programs to see how well they would run on that new architecture. So they repeated this process of looking at an architecture, seeing how good it was, then iterating. And the reduced instruction set computer that we called CRISP was born in this particular fashion.
So by the time CRISP came out in the early '80s, it was well-honed for running C programs. At least in the AT&T environment, we had an enormous investment in C programs, at that time. So, it was a particularly effective engine for running C programs. And, now, as you are well aware, the RISC machines have produced a dramatic increase in price-performance, in terms of what one can buy. And, this work arose out of compiler technology, to some...to a great degree. When I was at Stanford, there was a professor there, by the name of John Hennessy, who was very interested in compilation techniques...to be able to take advantage of RISC architectures, and he had been involved with the MIPS company that makes one of the very efficient RISC chips at this time. So again, all of that work I view as stemming from the availability of a certain kind of technology. And this technology is in some sense rooted in a formalism that gives it a degree of efficiency that you might not have with more ad hoc methods. Well, with ad hoc methods you can do anything, but...at least my point of view is that, if you have a scientific base on which you can measure performance, and you can iterate and improve your algorithms with this scientific understanding, and then build an engineering design theory on that, you are going to be unassailable in the work. Some of the compiler construction tools and pattern matching algorithms seem to give some evidence of this. The other aspect of this is that it improves software quality in a significant way, and productivity in a significant way, 'cause you can write a compiler much more quickly using these tools than if you had to do it from scratch - none of these stories of the first FORTRAN compiler taking dozens of staff years to produce. Whereas now you could construct a compiler - a significant compiler - as part of a classroom project in an undergraduate computer science course.
MSM: Of course you are writing it on a machine that has, to all intents and purposes, unlimited memory. The first FORTRAN compiler had to go into a fairly small...
Aho: That's true, and it is still true that constructing compilers - good compilers - for languages like ADA are substantial efforts. But on the other hand, coming up with a preprocessor, or, a little language, is an activity that in many cases can be done over the course of a weekend.
MSM: 'Cause one of the things that struck me right from the beginning, when I first learned something about UNIX and its relationship to C, was that you can stay out of assembler...stay as independent of the machine as you possibly can. Learning that UNIX was running on VAXes - or that two-thirds of the VAXes out there were running UNIX. And then taking up the VAX architecture handbook. Because even though we weren't supposed to write in assembler, it was a good idea to at least know what assembler was about. And seeing this instruction set on the VAX-11 - it was an assembly programmer's dream. Evaluate a polynomial: that's an assembly language instruction. And I thought, "Wait a minute... It doesn't fit!" And then there's the discussion about the push toward RISC, and the way in which RISC architectures had turned out to be the ones that work best with C. It seemed to make sense: just let the compiler do the work.
Aho: Well, when I was doing this work with Steve Johnson, Johnson used to go around saying you should get the language designer, the compiler writer, and the machine designer into the room at the same time, because then the system will be so much better. The tradition, up to that point, had always been: some engineer went off and designed the machine, and then some committee went off and designed a programming language, and then there was this poor compiler writer who had the enormously frustrating experience of mapping that programming language onto that machine architecture. Certainly if one looks at the popularity, or lack thereof, of ADA, there are constructs of ADA which are very difficult to compile. If one had done a prototype implementation at the time the language was being defined, then the language would perhaps have been different in certain respects. FORTRAN...the lexical structure of FORTRAN is very difficult to analyze using lex, or any of these automated tools. The newer languages are much easier to analyze lexically and syntactically. So, from one point of view, the technology has shaped the structure of today's languages to some degree. PASCAL was defined so that it could be easily compiled. In fact, I think one of the reasons for the enormous popularity of PASCAL was its availability. It was easy to construct compilers and to move compilers to a lot of machines. So the ubiquity of the language stems from the fact that you were able to obtain compilers for it.
MSM: So what you want to say is that language design ought indeed to keep implementation in mind. 'Cause there was a school of thought in the '60s, especially among European designers, that you design the language and not worry about the implementation.
Aho: That's very true. And some of that philosophy still persists in the specification of protocols these days. That protocols are being designed by committees without keeping an eye out for performance or implementation considerations. And, that may be too much of a purist attitude to get the kind of efficiency or perhaps interoperability that one would like to have. One of the great things about... Well, you talked about UNIX being a spirit. The one way that I view it is that there's a great deal of Darwinism in UNIX. If one looked at how certain commands and languages came into being, it was because someone had an idea. Say, like Kernighan and Cherry with this language for typesetting equations. They got a rudimentary form of the language processor up and running, and then they let their friends use it. Then the language evolved on the basis of usage patterns. That, as users gained more experience with the language, they would be able to say, "I'd like to have these additional features." Or, "These are some awkwardnesses in the language." So there was a Darwinistic evolution of the language, and, in fact, of the UNIX system itself. That it satisfied a certain user's needs, and there was enough time to refine the system so that it satisfied those needs - I thought, quite efficiently and quite elegantly. There is this "European approach," if you want to use that term, or this more dogmatic approach to language design, where you have some august committee that meets for a period of several years to come up with a language specification. They write a document. Compiler writers work off that document for several years to produce a compiler, only to discover that there may be some infelicities in the language design. And, the process is much more cumbersome. Natural languages evolve, and I think, "Why shouldn't programming languages?"
MSM: But two things...there are two things that I want to pursue further. Back up a little bit, and then I want to get back to this business of languages... I want to confirm whether I heard correctly. Essentially, when you arrived during those early years starting in '67, you established your presence here... You and Jeff Ullman established your presence. How long was Jeff here?
Aho: He was here for three years.
MSM: So you were both around when UNIX got started. At least for that first year you were both around. What I hear is that you have people like Ken and Dennis who are interested in this new operating system. Dennis is interested in languages, interested in getting this into a high level language. You are a resource for a body of theory, of which people are becoming ever more aware. And, is that how the connection is made? That is, Dennis and Ken hadn't brought anything like this theoretical background to their work, out of their own background?
Aho: No...well, Dennis had done his doctoral dissertation with Albert Meyer, who was a distinguished theorist at MIT. So he certainly had an appreciation for, and understood, theory. My initial interactions with UNIX were those of a user, and I really didn't interact with Ken and Dennis on a technical front. Except I would always be in the UNIX room using the system, and I would watch their process of development. There was a great spirit in that UNIX room, in the early days, of a lot of camaraderie and discussion. The real catalyst, I think, for injecting these theoretical ideas into UNIX came out of my interactions with Steve Johnson, and also, I suspect, the insights that I gained from Doug McIlroy, who used some of the early versions of the programs that I had written - I was interested in experimenting with some of these pattern matching ideas. So, he had been very much an early user of these programs as I was developing them. If I had appreciated the significance of what yacc could do, I would have written the program back in the late '60s, rather than waiting for Steve to do it in the '70s. Because it made an enormous difference.
MSM: But you can...that was a difference after the fact.
Aho: Yes. And, I really didn't appreciate the significance of what you could do with it, at the time, in the '60s. And I don't think anyone did, because what made a lot of this philosophy...a lot of these tools go was the framework that UNIX provided. That you could have pipes on which you could take the output of one program, and transmit it as input to another program. So the notion of filters was something that evolved during the '70s. And, as I said, I was very interested in formalisms for translation. What you could then do was compose these translations.
MSM: The pipes were really an implementation of that notion of translations...
Aho: Yes. Doug McIlroy, though, I think is probably the author of pipes. He had written, I think, this unpublished paper when he was at Oxford back in the '60s. I don't know whether you have seen it or not.
MSM: No I haven't. Because I have talked to him about pipes, but he didn't mention a paper. I'll have to get it from him.
Aho: You should read this paper because it's UNIX pipes. One of the interesting things about Doug is that he has had these great, seminal ideas which not everyone knows about. And whether his standards are so high that he doesn't publish them...or what? But it's remarkable, the...
MSM: He's emerging as the gray eminence.
Aho: Yup. No, I...UNIX would not be the way it is today without Doug. And, also, I don't think the work that I did would have had the encouragement, had it not been for someone like Doug. For example, this algorithm for bibliographic search... If I remember, on some Friday afternoon, I went into Doug's office, and I said, "Here's something that looks kind of cute." And I only had about two minutes in which to describe the essence of how you construct this pattern matching automaton from the set of key words. There is a slight technical detail that you have to appreciate in this, the construction of the failure function - which has some, actually, deep mathematical implications for properties of patterns and strings. But, I just mentioned it in the most cursory fashion imaginable. I discovered a week or so later that Doug had given a talk on the basis of this two-minute discussion. And he had the most wonderful set of viewgraphs constructed for this talk - which I subsequently used. So, Doug's abilities to comprehend work on a broad front, and also to appreciate the economy of expression that theoretical models gave to the work, I think were very supportive.
MSM: I think that's one of the continuing themes. It's manifest in the range of tools that are in UNIX, the order in which the tools were developed. But it is also manifested in the way people think. Is this notion a language? Sure - when Brian was describing...talking about eqn, he said that mathematical typesetting had really been an exercise in finding the right language. And I had always thought of it as a problem with graphics on the page. And Lorinda said, "No, no. The graphics is easy. It was just a question of language." You said, earlier on...I talked about Brian wanting to design it the way people talk. And, you talked about the continuing interest in the printed word, or, the lexical word - orthography in language as written down. It's a view of the world - a linguistic view of the world - which is quite different from the view of the world where we talk about mathematicians not talking to one another, but pointing to things, or moving things, or visual images. Has UNIX - I don't want to say ignored, but... Has UNIX tended to emphasize that linguistic turn and leave aside questions of more visual patterns of thinking? Has this been conscious among people, or...?
Aho: Well, I think you have to bear in mind, when UNIX was born, the input device was a teletype machine. And what people were searching for were dictions by which one could use that kind of human interface to the machine. So, taking abstractions that represent various aspects of human endeavor, and translating those into ASCII strings, was the question of the day. When one entered a document, one had to take two-dimensional notations like that of mathematics and find a linearization for it. The common denominator, in UNIX, for information is the file - a sequence of bytes. And, the byte stream has, as Thompson puts it, the nice property that you can compose functions on a byte stream. Take a byte stream as input, and produce a byte stream as output. It's very easy to construct a filter that is just a sequence of byte stream to byte stream mappings. Um, well, people have searched for what is the two-dimensional analog of the byte stream. And, I think a Turing Award could be won for getting the right abstraction - something that is as ubiquitous as the byte stream for one-dimensional input - what should that two-dimensional representation be? And, what seems to be happening is that there are many two-dimensional representations if you go off into the area of CAD, the design of tools for VLSI design. The circuit can be looked at at many different levels of abstraction. You might be interested in the logical properties of a circuit, or the electrical properties of a circuit, or the layout properties of a circuit. In each case, you want to have different dictions for talking about these different levels of abstraction. It is the common denominator that links them. Files on UNIX are universal, but they might not be the most efficient representation. So what we are seeing is, I think, a proliferation of abstractions for dealing with different application areas.
But the abstractions that are used to talk about electrical circuits are quite different than the abstractions that are used to talk about chemical structure diagrams, which are quite different from the abstractions that people use to talk about music. And, it may be the case, and, in fact, I feel it's the case, that what we are seeing with computers is just a mirror of human activity, and that information comes in many different sizes and shapes. These sizes and shapes represent the different aspects of human endeavor.
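[The composability Aho attributes to Thompson's byte-stream model can be sketched directly. This is an illustrative model only - not how the kernel implements pipes: each filter is a function from a byte stream to a byte stream, and a pipeline is just function composition, which is what the shell's "|" expresses.]

```python
def upper(stream: bytes) -> bytes:
    """A filter: map every byte to upper case (in the spirit of tr a-z A-Z)."""
    return stream.upper()

def keep_lines_with(word: bytes):
    """A filter factory: keep only lines containing word (in the spirit of grep)."""
    def f(stream: bytes) -> bytes:
        lines = stream.split(b'\n')
        return b'\n'.join(line for line in lines if word in line)
    return f

def pipeline(*filters):
    """Compose byte-stream filters left to right, as '|' does in the shell."""
    def run(stream: bytes) -> bytes:
        for f in filters:
            stream = f(stream)
        return stream
    return run

# Equivalent in spirit to:  grep unix | tr a-z A-Z
shout_unix = pipeline(keep_lines_with(b'unix'), upper)
```

[Because every filter has the same type - byte stream in, byte stream out - any two of them compose, which is the property that made the filter notion evolve so naturally during the '70s.]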
MSM: So UNIX began as...with the teletype as the major interface, with the file as a stream of bytes. You used a Darwinian image before. Is UNIX wedded...so wedded to these forms of data that it won't...it can't adapt to, let us say, a more visual way?
Aho: No, in fact, far from it. I think UNIX has an enormous amount of flexibility. I don't know whether you've talked to Dennis about the evolution of the UNIX system, per se, but version 10 UNIX that we run today is significantly different from the early versions of UNIX. What became very clear, very early in the game, is that computing is a worldwide fellowship. Computing is not done in isolation. We like to have access to other people's ideas, programs, information. We like to have it conveniently accessible from whatever appears on your desktop, or whatever you have on your phone. And, at least my view is that you'd like to have what appears on your desktop be very much like the telephone, in that you have access to the same services no matter what telephone you use, and that it has a human interface that's easy to understand, and gives you the connectivity that you'd like, to both other people and to information and computing resources, and whatever other kinds of information resources you want. There is a lively debate going on about what kind of graphical user interface people want. And, it may be the case that there is no single graphical user interface that's going to be good for everyone. On the other hand, I suspect there are going to be common elements that everybody should understand so that, at least with the basic functions, people can use this tool productively in their lives. Text is going to be around for the foreseeable future. So no matter what kind of graphical user interface you're dealing with, you are going to have to be able to deal with the word as it appears - certainly as we know it today, as it appears in books and papers. And UNIX is very well suited for that. So you are going to need that kind of mechanism, which is, I think, well represented in UNIX.
But, I suspect we are going to have many different user interfaces that one can overlay on the basic computational model that UNIX presents, depending on what kind of application you are interested in. Certainly, people are visually oriented, and I think that being able to manipulate computation and resources in visual forms is going to be important. But, from another perspective, these are just manifestations of language - more complex languages than we've had in the past. We are dealing with different kinds of syntax; but there is a syntax there, and there is a semantics there. The basic principles that we have understood in terms of syntax and semantics still apply, except that we have to use more powerful mechanisms to describe them. And, the semantics are more complicated, especially when you are now dealing with issues such as concurrency - objects that can appear at different times, or simultaneously, arriving at the terminal - and you have to have some mechanisms for being able to cope with that. From one perspective, that's just a mirror of society: society has set up certain protocols for guiding interactions. As we are having this conversation, there's a certain implicit protocol that we follow. If there were five people in this room, we'd exercise a certain protocol.
MSM: Yeah, there are certainly protocols that we follow. And there are the exchanges on the one hand - the overt, manifest exchanges, the exchange of words. We also, of course, give a lot of signals off to one another. We have both been maintaining fairly intense eyeball to eyeball communication here, watching one another's gestures, and that's been part of the conversation. How we steer the conversation, of course, is not being caught on that tape at all, and therefore will never be caught by the transcript, and, yet, we'll be [MISSING SEGMENT] a full understanding of it. I am not suggesting that...you know...obviously...we have the faintest idea of how one captures that...
MSM: ...record of language, record of that language, or whether it's...one wants to try to get computers to do that.
Aho: Well, I think the great challenge today is to improve the ease of use, and the effectiveness of use of the machine. And, if one can find successful ways of harnessing two-dimensional inputs and, in fact, other forms of input - why can't we combine voice?
MSM: That's been a longstanding dream of the Labs, hasn't it? You know we can talk to these things, and get them to talk back.
Aho: Yup. And, in fact, if you look at some of the work in Jim Flanagan's lab, it's remarkable what progress has been made in terms of speech synthesis and recognition. The big trick... And there is also another development that is taking place, and this is optical character recognition. We can put paper into scanners, and have that automatically be translated into forms that can be manipulated by the machine. We can deduce the lexical structure of what's on a sheet of paper. But the thing that's really required to make both speech understanding and OCR fly is to harness semantics, and that's going to be a long time in coming. If you want to resolve the ambiguities that are present in our language and interactions, which humans do naturally as a matter of course, you have to understand what's being said. And I suspect...you know, it's rather remarkable that all I have to do is say a few sounds, and you immediately recognize my voice. Why is that?
MSM: I thought you were about to say that all it takes is a few words and we communicate. But the thing is that the two of us walk into this room, with the world, each with our own experience of the world. And it's an experience that at various points is intersected. Not that we've intersected one another, but, rather, that the world we've experienced has been in some cases the same world - reading the same books, meeting the same people, being in the same places. That has also aided communication. Both aids it and also can block it - lack of communication because of the false cognates.
Aho: I think of it as a symbol table. We have a large shared symbol table.
Aho: Did...it might be interesting, in fact, this is a view that I have, and it's clearly shaped by what my background was. What is the view that you've gotten from the other people who were involved in UNIX in the early days?
MSM: Well, what I'm getting is the shared view - that people do tend to think in terms of languages. They do tend to think in terms of tools. There is this property of UNIX that struck me from the first time I encountered it. As someone interested in the origins of the so-called software crisis, I knew the problems of programming; and I had been pulled back to the subject by Joe Weizenbaum's book, and the marvelous view of a hacker, and this whole notion of people sitting and hammering away at this machine, and writing code that works almost as a black box, and is a black box even for the person that wrote it. Now you encounter something like UNIX, which on the one hand represents really extremely sophisticated and clever programming. And then, on the other hand, it is, if not theory driven, at least rooted in theory, and sensitive to theory. A programmers' development tool, yeah, but, also, it seemed to me a reflection of explorations in applied computer science. What I've been trying to get at in these interviews, among the many things that I've been trying to get at, is how that came to be. The problems with programming elsewhere make it clear that it's not natural. This is the sort of thing that may just happen, it may be managed, it could be a question of place and time, and so on. I'm getting glimpses at it, but the phenomenon seems to be clearly there.
Aho: I'll certainly be interested in your insights as you digest more and more of this material and accumulate more and more of this material. Could UNIX have happened at some other place...? Or did it take a place like Bell Labs, which had a tradition of hiring people who have both breadth and depth, and a mathematical proclivity?
MSM: That's a good question, and it's one of the ones that I have to answer. There are some obvious contrasts, as I have been thinking about what form all of this will take. One of them is prompted by the recent book, Fumbling the Future, which is about the failure of Xerox to go with the Alto. If one wanted to think about comparisons of environments - research environments that were supposed to be unhurried: "Let's get a lot of bright people together and give them time to figure out what they'd like to do, and see what comes out of it." Xerox PARC is the obvious place for the comparison that I'm about to make. And I had an interesting conversation with the person who did the first paintbrush program. He was demonstrating it at SIGGRAPH, at a reception at the Computer Museum.
[BREAK IN THE TAPE]
Aho: The math center has long had a mission of doing work in fundamental areas, starting off with the work of Shannon. And there's also a certain consultative aspect to the work that the math center used to do - we would be treated as consultants by the rest of the company, who could call upon us to help out, to help understand phenomena that were important to the telephone company. But, they were given a charter in which there were no holds barred in how they solved those particular problems, and some very innovative solutions came out of that. I think that kind of tradition was inherited by the computing science research center. And I think it's a good tradition to have. There were also very high standards for the work and the people: no matter who came in, you were expected to be at the forefront of your field, and be able to interact with the forefront of your field. There was probably also an implication that the real contribution was not just writing the paper - in fact, in many cases papers were never written. The real contributions were the ideas, and the refinement of the ideas, and showing people how to use these ideas to solve problems of interest. I think that attitude still persists: people are interested in solving problems, but solving them in ways that no one ever thought of before.
MSM: And in sharing the solutions with each other. A fairly small group that has a remarkable longevity.
Aho: Yes, um...
MSM: One doesn't see that kind of stability outside of an academic department. Think of the people who were involved. Here I am, twenty years after the event, coming in and working my way down a hallway, and getting most of the actors who were involved in it - and have been involved with it over the intervening twenty years. I can't think very easily of another situation in which that would be the case.
Aho: I think it's the freedom, the focus, and the stability, and also, I guess, the funding that supports this kind of activity; there's a catalyst that takes these ingredients together and, once in a while, produces remarkable innovation. Datakit, I think, is also an example: the fundamental idea embodied in Datakit is virtual circuit packet switching. This is what Sandy Fraser has his fundamental patent on. It was an idea, and then there was some chance to show what you could do with that idea. And Karmarkar's algorithm, I guess, is another example of, "Here's an idea, and then there's an opportunity to show what you can do with that idea." First and foremost, you need the idea.
MSM: But there's also...it seems to me there are certain ideas that emerge in this environment, and those that are not likely to do so - because of an internalization of the institution's mission and goals.
Aho: That's undoubtedly true. And, we're certainly not going to produce innovations in Chaucerian poetry.
MSM: Also...is there...I think Lorinda noted that people here are not particularly enthusiastic about AI either.
Aho: I guess I have two views of that. A lot of the work that we do, and a lot of the work that Lorinda has done, many people, and many people in the AI community, would also call AI. But, there's more of a bent that we are interested in solving problems than, perhaps, putting the fancy buzzwords on what we're really doing. And I guess, also, people here tend to be somewhat jaundiced in terms of making claims before the experiment that demonstrates the results has been done. I think there's a great tragedy in a lot of work in certain areas where people make claims that can't be justified by scientific experiment. I give AI people a great deal of credit for attacking problems that most scientists just wouldn't go near. But, I also feel that the field may have done itself a disservice by making unsubstantiated claims, which one wouldn't do in a more scientific tradition. One of the publications, I think it was, perhaps, the American Math Monthly, talked about, "Was it necessary for AI to be able to make these outrageous claims for the survival of the field?"
MSM: I want to see that...
Aho: Umm...and, there you go... Sure, find it in the library. I found it a rather interesting perspective.
MSM: Well, there's...there's an article in an issue of Daedalus, which is the journal of the American Association for the Advancement of Science...no, no...I'm...I am sorry, that's wrong. What do they call themselves, that outfit in Boston? Anyway, it's about artificial intelligence...well...the state of the field, the debate going on. An article by Papert, about, in essence...