Project Description
Motivation
An Example
ASAN Computing Model
ASAN Components
Two major trends provide the context and motivation for the proposed research.
The first is that modern network-based applications must use systems whose
architectures are based on a CPU-centric model using node architectures
optimized for interaction with the memory hierarchies in uniprocessor or
small-scale multiprocessor applications. Consequently, network-based applications
that produce, transport, and process large data sets suffer substantial
losses in performance when these data sets must be moved through the memory
and I/O hierarchies of multiple nodes.
The second major trend is the evolution of computational environments
driven by collaborative large-scale scientific and engineering computations.
The hardware base for such applications is heterogeneous, and at any given
time an application may utilize geographically distributed clusters of
workstations/PCs, high-end graphics engines, specialized multiprocessors,
terabit storage facilities, and various distinct end-user platforms. Both
trends point to several features of data-intensive network applications
that challenge system architects:
- The generation, processing, and display of the data may take place at
  distinct points in a WAN or SAN, and thus data streams must be transmitted
  to and processed at multiple hosts while sustaining a minimum frame rate.
- Processing and display deadlines cannot be met without network service
  guarantees.
- Applications and the requirements they are asked to meet are dynamic, and
  therefore the resources used by the applications must be coordinated and
  managed at run-time.
- Heterogeneous systems will use multiple communication substrates with
  different properties and service guarantees.

Scientific applications such as the interactive, parallel atmospheric
modeling application depicted above form a significant class of motivating
applications. For example, one transformation necessary for visualization
translates data from the spectral to the grid domain, since the model itself
performs its computations in one domain, whereas end users wish to view
data in the other. Other transformations filter the data stream prior to
transmission, while display calculations such as dithering must process
graphical data prior to display. Such applications make effective use of
computational clusters of workstations accessed remotely by end users.
Pipelined solutions to such stream-oriented computations are well known.
The key idea here is that such computations can be performed within the
network interfaces as the data is being transmitted, thereby avoiding costly
traversals of the cache and I/O hierarchy.
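The pipelined, stream-oriented style of computation described above can be illustrated with a minimal sketch. It is written as host-side Python purely for illustration and is not how the NIs are programmed. The stage names (`subsample`, `scale`, `quantize`) and the helper `run_pipeline` are hypothetical stand-ins for the filtering, domain transformation, and display steps; none of them implements the real spectral-to-grid transform or a real dithering algorithm.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]
Stage = Callable[[Iterator[Frame]], Iterator[Frame]]


def subsample(frames: Iterator[Frame], step: int = 2) -> Iterator[Frame]:
    """Hypothetical filter stage: keep every `step`-th sample of each frame."""
    for frame in frames:
        yield frame[::step]


def scale(frames: Iterator[Frame], factor: float = 0.5) -> Iterator[Frame]:
    """Hypothetical stand-in for a per-sample transformation."""
    for frame in frames:
        yield [x * factor for x in frame]


def quantize(frames: Iterator[Frame], levels: int = 4) -> Iterator[List[int]]:
    """Hypothetical display stage: map values in [0, 1] to integer levels."""
    for frame in frames:
        yield [max(0, min(levels - 1, int(x * levels))) for x in frame]


def run_pipeline(source: Iterable[Frame], stages: List[Stage]) -> Iterator:
    """Chain the stages so each frame flows through them one at a time."""
    stream: Iterator = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


frames = [[0.0, 0.5, 1.0, 1.5]]
result = list(run_pipeline(frames, [subsample, scale, quantize]))
print(result)  # [[0, 2]]
```

Because every stage is a generator, no intermediate copy of the whole stream is ever materialized, which mirrors the goal of operating on data while it is in transit rather than after it has been stored.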
The implications of processing data as they pass through the network
interfaces are threefold:
- Speed: The NIs must be capable of processing the data fast enough to
  operate on the stream in real time.
- Control: The NIs must be capable of operating autonomously and concurrently
  with the host CPUs.
- Configurability: The NIs must be reprogrammable so as to enable computations
  to be dynamically placed within the NIs.
We expect to address each of these capabilities as follows.
- Speed: The use of Field Programmable Gate Arrays (FPGAs) in the NIs.
- Control: The use of embedded microprocessors in the NIs.
- Configurability: We are developing a run-time reconfiguration infrastructure
  and associated API to support the dynamic reconfiguration (hardware and
  software) of the NIs.
The ASAN computing model is based on the following operational view.

It is straightforward to envision support for data stream computations
in these active NIs. For example, consider an image stream that is captured
at one host in a network and processed at multiple locations. In general,
the image processing operations may be performed on the source CPU, source
NI, destination NI, or destination CPU, as identified by points 1-4 in the
figure. Inter-processor communication takes place by first establishing
a virtual connection between source and destination. The connection abstraction
provides a convenient means to specify the computation requirements of
the stream and permits the placement of these computations at some point
along the path. This problem is similar to the mapping actions performed
on parallel programs at the time they are initialized. Thus, the establishment
of a connection concerns not just the allocation of communication resources,
but (1) the identification of the computations performed on stream data,
(2) the allocation of appropriate computational resources, in this case
configurable hardware, so that the data may be operated on in real-time,
and (3) the identification of QoS parameters that will be used to guide
the scheduling algorithm in the NIs. When the computations are allocated
to the NIs at connection establishment, the NI configurable hardware must
be programmed to implement the computations as the data is streamed through
the interface.
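The three concerns of connection establishment listed above (the computations, the computational resources, and the QoS parameters) can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the names `Site`, `Computation`, `QoS`, `Connection`, and `establish`, the abstract integer "cost" and "capacity" units, and the first-fit placement rule are hypothetical and are not the project's API or mapping algorithm.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Site(Enum):
    """The four candidate points along the path (points 1-4 in the text)."""
    SOURCE_CPU = 1
    SOURCE_NI = 2
    DEST_NI = 3
    DEST_CPU = 4


PATH = [Site.SOURCE_CPU, Site.SOURCE_NI, Site.DEST_NI, Site.DEST_CPU]


@dataclass
class Computation:
    name: str
    cost: int  # hypothetical abstract resource units


@dataclass
class QoS:
    min_frame_rate: float  # frames per second, passed on to the NI scheduler


@dataclass
class Connection:
    source: str
    destination: str
    computations: List[Computation]  # in the order they apply to the stream
    qos: QoS
    placement: Dict[str, Site] = field(default_factory=dict)


def establish(conn: Connection, capacity: Dict[Site, int]) -> Connection:
    """First-fit placement along the path, never moving backward.

    This is an illustrative stand-in for the mapping step, not the
    project's actual algorithm.
    """
    remaining = dict(capacity)
    cursor = 0
    for comp in conn.computations:
        while cursor < len(PATH) and remaining.get(PATH[cursor], 0) < comp.cost:
            cursor += 1
        if cursor == len(PATH):
            raise RuntimeError(f"no site can host computation {comp.name!r}")
        site = PATH[cursor]
        remaining[site] -= comp.cost
        conn.placement[comp.name] = site
    return conn


conn = Connection(
    source="hostA",
    destination="hostB",
    computations=[Computation("filter", 2), Computation("transform", 2),
                  Computation("dither", 1)],
    qos=QoS(min_frame_rate=30.0),
)
# A host CPU capacity of 0 models a policy of keeping work off the source host.
establish(conn, {Site.SOURCE_CPU: 0, Site.SOURCE_NI: 3,
                 Site.DEST_NI: 3, Site.DEST_CPU: 10})
```

In this example the filter lands on the source NI, and the transform and dither steps land on the destination NI, because the source NI runs out of capacity after the first computation. Keeping the cursor monotonic preserves the order in which computations are applied to the stream.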
The key idea is to operate on data as it "passes through" the interface.
The principal benefits are the ability to perform selected computations
in the NIs "close" to and synchronously with the data streams they affect,
thereby (1) avoiding unnecessary intermediate and host resident storage
of data items, (2) decreasing loads on hosts (CPU, memory, I/O infrastructure)
and/or reducing costly host CPU interruptions, and (3) avoiding disturbances
of host caches. Our current design calls for the use of FPGA devices for
the implementation of the configurable hardware component.
The ASAN Project is exploring several different aspects of SAN functionality.
We are currently developing the following major components.
QUIC
This component is the Quality of Service (QoS) infrastructure that provides
communication scheduling services. The QUIC layer moves scheduling services
into the network interfaces, is dynamically extensible, and can provide
several classes of delivery guarantees. QUIC also supports placing selected
application-specific computations in the NIs.
GRIM
Generally Reliable Inter-processor Messages (GRIM) is a lightweight,
low-level reliable message layer that encompasses optimistic flow control,
reliable multicast, and dynamic reconfiguration in the presence of faults.
RAPID
Reconfigurable Architecture for Processor Interconnect Devices (RAPID)
is a network interface for SANs that combines reconfigurable logic in
the form of field programmable gate arrays (FPGAs) and embedded
microprocessors. The first version of the NI will consist of
Compaq PCI Pamette cards with Myrinet Mezzanine cards. The initial
testbed will use 8 such interfaces in a 16-node quad Pentium Pro
cluster. Experience with this testbed will inform a second-generation
custom RAPID card optimized for the support of streaming
computations.
VCM
The Virtual Communication Machine (VCM) software architecture aims to provide
adequate support for dynamically placing and executing application-specific
code called "extension modules" in the NI. Extension modules can be used
for interfacing to, dynamically programming, and controlling the FPGA resources
from within the RAPID NI rather than from the host CPU. Overall, the VCM
provides an extensible environment with a common set of abstractions for
controlling the FPGA, processor, and memory resources within an NI.
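The idea of dynamically placing extension modules in the NI can be sketched as a small registry that loads, unloads, and dispatches named handlers. This is only an illustration under stated assumptions: the class `ExtensionHost`, its methods, and the byte-in, byte-out handler signature are hypothetical and do not reflect the actual VCM interface.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], bytes]


class ExtensionHost:
    """Hypothetical stand-in for the NI-resident environment hosting modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def load(self, name: str, handler: Handler) -> None:
        """Place an extension module under a unique name."""
        if name in self._modules:
            raise ValueError(f"module {name!r} is already loaded")
        self._modules[name] = handler

    def unload(self, name: str) -> None:
        """Remove a module; unloading an unknown name is a no-op."""
        self._modules.pop(name, None)

    def dispatch(self, name: str, payload: bytes) -> bytes:
        """Run the named module on a payload, or pass it through unchanged."""
        handler = self._modules.get(name)
        if handler is None:
            return payload
        return handler(payload)


host = ExtensionHost()
host.load("invert", lambda p: bytes(255 - b for b in p))
inverted = host.dispatch("invert", b"\x00\xff")
host.unload("invert")
untouched = host.dispatch("invert", b"\x01")
```

Passing unmatched payloads through unchanged is a design choice of this sketch, so that the data path keeps working while modules are being loaded or replaced at run-time.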