The Moby Address(c) Scheme and its application to AI databases. Submitted to AAAI-86, Science Track, AI Architectures and Languages Abstract: A method is presented by which the virtual address space of a computer system, previously considered one of the most important system hardware architectural parameters, can instead be made a software parameter, thus allowing the implementation of arbitrarily large virtual address spaces without hardware changes. Beyond simply allowing the execution of very large programs, this scheme is expected to have signification implications for the implementation and maintaince of large non-volatile pointer-oriented data bases, which frequently arise in AI applications. The scheme is fundamentally a run-time mapping operation on pointers. Additionally, certain non-uniform extensions to mapping process are discussed which bear on the problem of applying a number of computers effectively to a single problem involving symbolic computation. Keywords: Introduction: One of the most important characteristics of a computer system is its "address space". The address space might be thought of as the limit in size to which a program can dynamically grow on a particular computer architecture without discontinuous changes to its logical structure. Until now address space size has been thought of as a hardware determined parameter. For example, the address space of the LMI LAMBDA processor has been said to be 2**25. of 32. bit words, and this number is determined by the width of certain hardware data paths, the size of certain mapping memories, etc. Although there have been various attempts to run larger programs than the basic address space of the computer involved, such efforts have usually proved to be seriously flawwed. "Overlaying", for example, is one such well known technique, and it has often proved much less than desirable, particularily for heavily pointer oriented applications such as LISP. The MOBY ADDRESS(c) technique is a fundamental advance which allows the logical address space (now identified as the MOBY space) to be decoupled from the underlying hardware address space (now identified as the LOCAL space.) Once decoupled from hardware, only software considerations limit the size of the MOBY space. The baseline scheme we consider here, for example, provides 50 bits of address. The feasibility of such large address spaces makes fundamental differences possible in the way addresses are treated. For example, with such a large address space it becomes possible to make every address which ever existed or will exist running LISP machine environment unique. This in turn, allows machines to exchange symbolic information without converting pointers to "flat" form via operations such as PRINT and INTERN. Although some background details of the Moby Address technique itself are presented in the present paper, an effort is made to focus on the application of this technique to a problem which is somewhat AI specific, namely the maintainence of very large, non-volatile, pointer oriented symbolic data bases. As AI applications begin to effectively conceptualize and store their experiences over significant lengths of time, the size of the data bases they will accumulate will become extraordinarily large. In this environment, present techniques for encoding pointer oriented data bases as a sequence of integers stored in conventional "flat object" file systems [FASL reference], will become grossly inadequate, and, we believe, techniques similar to those presented here will become necessary. The idea of introducing a mapping between the actual computation and the very large address space is very attractive because it allows the actual computation to take place on pointers of modest width, which saves hardware and increases efficiency. Moreover, a large and mature base of hardware and software becomes immediately available experimentation in a broad range of additional areas, including effectively employing multiple processors on tasks in symbolic computation. The concept of "address space" is one of the very most fundamental concepts, underlying nearly all modern computers [Vn]. Historically, this parameter has been chosen very early in computer system design, and its unfortuately low value has probably been responsible for the obsolesence of more computer designs than any other cause. In early computers, the choice of address space size was probably guided by an estimation of the maximum amount of physical memory which was likely to be attached to the computer in question. Later, with the introduction of virtual memory systems, the address space size may have been chosen in comparison with the disk storage availability. However, the desirability of implementing an address space comparable in size to fundamental physical limiting values (for example, the number of microseconds in a thousand years, approximately 2**55) has been reconized previously. Multics [ML] and the Domain Network of Apollo [DN] represent two of the best known such "wide pointer" systems. Typically, such wide pointer systems require a data path two to four times the width usual in non wide-pointer systems. This represents a significant, but not completely unacceptable, cost in hardware, which the designers of the aforementioned systems chose to pay by designing "outboard" extensions to their central processing units, which were previously existing designs. The address spaces in most modern computers are volatile in the sense that they are frequently discarded and reconstructed (i.e., whenever the computer is "booted"). However, not coincidentally, the wide pointer systems mentioned above are non-volatile in that the address space itself persists across system reloads. Even in these systems, attempts to implement data bases of the type considered here have not been reported to date. The Apollo system (at least in its usual implementation) is unsuitable in that the minimum "object" size is 32K bytes, which allows for only 512 objects simultaneously in the active address space of the commerically available CPU which is used. The LISP systems usually employed on the Apollo do not support the wide pointers. Of the existing systems surveyed here, Multics would seem to be the most suitable vehicle for the implementation of a non-volatile data base of the type we are discussing. A LISP implementation for MULTICS supporting wide pointers does exist [GB]. However, possibly due to certain historical factors and limited availability, extending the (fairly elaborate) non-symbolic oriented pointer system provided by the Multics into a truly symbolic oriented configuration does not seem to have been attempted. Below, a case will be made that, lacking the ability to insert functionality at the layer represented by reconcilation (or mapping) layer, the attempt would have been difficult. One previously proposed design, the Ooze system of Xerox Parc [ref?], is similar to the present design in that it also is a software oriented mapping operation, although it was proposed to operate on object numbers, not general purpose addresses. Apparently, the design was not completed and its goals were considerably more restricted than those of the present design. Comparison to previously work on very large non-volatile AI databases: As discussed above, large non-volatile symbolic data bases of the kind we are considering are rarely, if ever, employed today. Instead, the data is resident in a volatile address space during active computation. Periodically, writeouts are made to conventional, non-volatile, file systems which are capable of storing only "flat" object such as integers and incapable of storing pointers. However, widely used techniques exist which convert a pointer data base to a sequence of integers [FASL]. While actually resident in the conventional file system, the data base is inert and can not be referenced or updated directly. However, the file can be loaded (as a whole) back into an active address space for further use. Since the non-volatile form of the data does not contain pointers (or particularily pointers across file boundaries) many of the problems considered by this paper do not arise. However, this state of bliss comes at a high, and in the long run simply unacceptable, price. During the loading process Few existing papers Garbage collection in large address spaces. Bishop. Hewitt and Liberman.. [Li] Lieberman, H and Hewitt, C, A real time garbage collector based on the lifetimes of Objects [Bs] Bishop, P. Computer systems with a very large address space and garbage collection. Tech. Rept. TR-178, MIT Lab for Computer Science, Cambridge, Mass, May 1977. [Wh] Memory management in a gigantic LISP environment, or GC considered harmful. In Proc. 1980 Lisp Conf., Stanford, Calif. [Vn]