The Moby Address(c) Scheme and its application to AI databases.

Submitted to AAAI-86, Science Track, AI Architectures and Languages


Abstract: 
    A method is presented by which the virtual address space
of a computer system, previously considered one of the most important
system hardware architectural parameters, can instead be made a software
parameter, thus allowing the implementation of arbitrarily large virtual
address spaces without hardware changes.  Beyond simply allowing the
execution of very large programs, this scheme is expected to have
signification implications for the implementation and maintaince of
large non-volatile pointer-oriented data bases, which frequently arise
in AI applications.  The scheme is fundamentally a run-time mapping
operation on pointers.  Additionally, certain non-uniform extensions to
mapping process are discussed which bear on the problem of applying a
number of computers effectively to a single problem involving symbolic
computation.


Keywords:


Introduction:

  One of the most important characteristics of a computer system is its
"address space".  The address space might be thought of as the limit in
size to which a program can dynamically grow on a particular computer
architecture without discontinuous changes to its logical structure.
Until now address space size has been thought of as a hardware
determined parameter.  For example, the address space of the LMI LAMBDA
processor has been said to be 2**25.  of 32. bit words, and this number
is determined by the width of certain hardware data paths, the size of
certain mapping memories, etc.  Although there have been various
attempts to run larger programs than the basic address space of the
computer involved, such efforts have usually proved to be seriously
flawwed.  "Overlaying", for example, is one such well known technique,
and it has often proved much less than desirable, particularily for
heavily pointer oriented applications such as LISP.  The MOBY ADDRESS(c)
technique is a fundamental advance which allows the logical address
space (now identified as the MOBY space) to be decoupled from the
underlying hardware address space (now identified as the LOCAL space.)
Once decoupled from hardware, only software considerations limit the
size of the MOBY space.  The baseline scheme we consider here, for
example, provides 50 bits of address.  The feasibility of such large
address spaces makes fundamental differences possible in the way
addresses are treated.  For example, with such a large address space it
becomes possible to make every address which ever existed or will exist
running LISP machine environment unique.  This in turn, allows machines
to exchange symbolic information without converting pointers to "flat"
form via operations such as PRINT and INTERN.

  Although some background details of the Moby Address technique itself
are presented in the present paper, an effort is made to focus on the
application of this technique to a problem which is somewhat AI
specific, namely the maintainence of very large, non-volatile, pointer
oriented symbolic data bases.  As AI applications begin to effectively
conceptualize and store their experiences over significant lengths of
time, the size of the data bases they will accumulate will become
extraordinarily large.  In this environment, present techniques for
encoding pointer oriented data bases as a sequence of integers stored in
conventional "flat object" file systems [FASL reference], will become
grossly inadequate, and, we believe, techniques similar to those
presented here will become necessary.

  The idea of introducing a mapping between the actual computation and
the very large address space is very attractive because it allows the
actual computation to take place on pointers of modest width, which
saves hardware and increases efficiency.  Moreover, a large and mature
base of hardware and software becomes immediately available
experimentation in a broad range of additional areas, including
effectively employing multiple processors on tasks in symbolic
computation.

  The concept of "address space" is one of the very most fundamental
concepts, underlying nearly all modern computers [Vn].  Historically,
this parameter has been chosen very early in computer system design, and
its unfortuately low value has probably been responsible for the
obsolesence of more computer designs than any other cause.  In early
computers, the choice of address space size was probably guided by an
estimation of the maximum amount of physical memory which was likely to
be attached to the computer in question.  Later, with the introduction
of virtual memory systems, the address space size may have been chosen
in comparison with the disk storage availability.  

  However, the desirability of implementing an address space comparable
in size to fundamental physical limiting values (for example, the number
of microseconds in a thousand years, approximately 2**55) has been
reconized previously.  Multics [ML] and the Domain Network of Apollo
[DN] represent two of the best known such "wide pointer" systems.
Typically, such wide pointer systems require a data path two to four
times the width usual in non wide-pointer systems.  This represents a
significant, but not completely unacceptable, cost in hardware, which
the designers of the aforementioned systems chose to pay by designing
"outboard" extensions to their central processing units, which were
previously existing designs.

  The address spaces in most modern computers are volatile in the sense
that they are frequently discarded and reconstructed (i.e., whenever the
computer is "booted").  However, not coincidentally, the wide pointer
systems mentioned above are non-volatile in that the address space itself
persists across system reloads.  Even in these systems, attempts to
implement data bases of the type considered here have not been reported
to date.  The Apollo system (at least in its usual implementation) is
unsuitable in that the minimum "object" size is 32K bytes, which allows
for only 512 objects simultaneously in the active address space of the
commerically available CPU which is used.  The LISP systems usually
employed on the Apollo do not support the wide pointers.  Of the
existing systems surveyed here, Multics would seem to be the most
suitable vehicle for the implementation of a non-volatile data base of
the type we are discussing.  A LISP implementation for MULTICS
supporting wide pointers does exist [GB].  However, possibly due to
certain historical factors and limited availability, extending the
(fairly elaborate) non-symbolic oriented pointer system provided by the
Multics into a truly symbolic oriented configuration does not seem to
have been attempted.  Below, a case will be made that, lacking
the ability to insert functionality at the layer represented by
reconcilation (or mapping) layer, the attempt would have been difficult.

  One previously proposed design, the Ooze system of Xerox Parc [ref?],
is similar to the present design in that it also is a software oriented
mapping operation, although it was proposed to operate on object
numbers, not general purpose addresses.  Apparently, the design was not
completed and its goals were considerably more restricted than
those of the present design.


Comparison to previously work on very large non-volatile AI databases:

  As discussed above, large non-volatile symbolic data bases of the kind
we are considering are rarely, if ever, employed today.  Instead, the data
is resident in a volatile address space during active computation.
Periodically, writeouts are made to conventional, non-volatile, file systems
which are capable of storing only "flat" object such as integers
and incapable of storing pointers.  However, widely used techniques exist
which convert a pointer data base to a sequence of integers [FASL].
While actually resident in the conventional file system, the data base
is inert and can not be referenced or updated directly.  However,
the file can be loaded (as a whole) back into an active address space
for further use.  Since the non-volatile form of the data does not contain
pointers (or particularily pointers across file boundaries) many of the
problems considered by this paper do not arise.  However, this state of
bliss comes at a high, and in the long run simply unacceptable, price.

During the loading process

  Few existing papers
Garbage collection in large address spaces.
   Bishop.   Hewitt and Liberman..


[Li]   Lieberman, H and Hewitt, C, A real time garbage collector based on the
	 lifetimes of Objects
[Bs]   Bishop, P. Computer systems with a very large address space and 
	 garbage collection.  Tech. Rept. TR-178, MIT Lab for Computer Science, 
	 Cambridge, Mass, May 1977.
[Wh]   Memory management in a gigantic LISP environment, or GC considered
	 harmful.  In Proc. 1980 Lisp Conf., Stanford, Calif.
[Vn]