The Synthesis of Distributed Real-Time Systems
from Processing Graphs

Steve Goddard
1997

Computer Science & Engineering
University of Nebraska-Lincoln
Lincoln, Nebraska 68588-0115 USA

Much of my research efforts have been driven by open problems that I have encountered in my consulting practice. In fact this research originated with a Bell Labs research and development project involving signal processing applications developed with the U.S. Navy's Processing Graph Method (PGM) [18]. I was asked to create an on-line algorithm that would allocate these real-time applications to processors in a distributed network. Although the problem is NP-hard, I had hoped some simple heuristic techniques would provide a reasonable (albeit not optimal) solution. Unfortunately, I found that there did not even exist a framework from which I could analytically evaluate possible allocations. The state-of-the art was to simulate execution scenarios -- an extremely time consuming and untenable option. Moreover, I found that there was no way to bound latency or memory requirements in the applications. Before I could evaluate heuristic allocation algorithms, a framework for evaluating processor demand, latency, and memory requirements of the applications was needed. This line of research is briefly described below.

The Synthesis of Real-Time Systems from Processing Graphs. Processing graphs are a standard design aid in the development of complex digital signal processing systems. These directed graphs consist of nodes and graph edges depicting the flow of data from one node to the next. When sufficient data arrives, a node executes its function from start to finish in isolation (i.e., without synchronization).

Processing graphs provide a natural description of signal processing applications with each node representing a mathematical function to be performed on a stream of data that flows on the arcs of the graph. The streams of input data are typically generated by sensors sampling the environment at periodic rates and sending the samples to the signal processor via an external channel. The processing graph methodology allows one to easily understand the signal processing performed by graphically depicting the structure of the algorithm. An important advantage of the graphical representation is that portions of the signal processing application (i.e., subgraphs) can be understood in the absence of the rest of the algorithm. Unfortunately, it is difficult to analyze the performance and processing requirements of applications developed with processing graph methodologies.

In my experience with clients, the primary problem in developing embedded signal processing applications with a processing graph methodology is building a predictable real-time system in which one can determine if the graph ``fits'' on a processor. A graph ``fits'' on a processor if enough processing capacity is available to guarantee a temporally correct execution. Typically a processing graph is mapped to a set of tasks accourding to a model of real-time execution (e.g.,[4, 10, 15, 16, 19, 24, 22, 23]). The schedulability conditions for the model are used to see if the graph ``fits'' on the processor by verifying processor capacity. Unfortunately for general processing graph methodologies, such as PGM, there is no obvious task model that efficiently describes the execution characteristics of nodes in the processing graph. Without a real-time task model and its schedulability conditions, the latency and memory requirements of an application cannot be evaluated.

The overall goal of my work is to create a synthesis method for building distributed real-time applications from processing graphs. As discussed below, I have already made significant progress towards this goal.

Mapping Nodes to Real-Time Tasks. The synthesis method begins with the creation of a PGM graph. Once a processing graph has been created, the nodes must be mapped to real-time tasks to provide temporal correctness. Therefore, the first problem to solve is identifying when nodes will execute so that the correct task model can be used. The most natural representation of node execution is achieved using the concept of node execution rates [7]. In [7, 8, 9], new theorems are presented that characterize the node execution rate of every node in a PGM graph. The synthesis of real-time systems from cyclic processing graphs is achieved by mapping the graph to real-time tasks according to the Rate Based Execution (RBE) model developed by Jeffay [11], and the RBE schedulability condition is used to verify processor capacity.

Once the processing graph has been mapped to a set of real-time tasks and a scheduling algorithm selected, issues such as latency and memory requirements can be addressed. Latency is the interval between the time when a signal arrives from the external sensor and the time when the processed signal is available to the output device. The memory requirements of a graph are dependent on the amount of buffering required by the graph edges during graph execution. Although system engineers have been unable to analyze the properties of latency and memory requirements in PGM graphs, we have shown that this is not an intrinsic property of the methodology. Moreover, by applying existing scheduling theory to PGM graphs, we have developed a framework for quantifying and managing both the latency and memory requirements of processing graphs.

Latency Analysis. There has always been a trade-off between latency and throughput in embedded signal processing applications. Prior to [5, 9], however, there has not been an analytical method for quantifying this tradeoff. We have identified two sources of latency in real-time systems created from processing graphs. The first, called inherent latency, is latency defined by the processing graph parameters. The second source of latency, called imposed latency, comes from the scheduling and execution of nodes in the graph. In [9], a framework is developed for evaluating latency using equations that provide upper and lower bounds for both types of latency. The trade-off between latency and throughput can now be quantified by evaluating latency and processor utilization using our equations.

Memory Requirements. Sizing the system to fit memory requirements is another major concern in embedded signal processing applications. An important aspect in controlling memory usage is the node execution schedule since this impacts the buffer requirements of the edges in the graph. The state of the art in minimizing the memory requirements of a signal processing graph is to create a node execution schedule off-line [14, 20, 25, 21, 3]. In contrast, our approach is to use a dynamic, on-line scheduler for graph execution. It is shown in [8] that preemptive earliest deadline first (EDF) scheduling can achieve near optimal memory usage and often requires less memory than static schedules created by off-line schedulers designed to minimize memory usage. Moreover, dynamic, rate-based scheduling has less imposed latency and can provide more control over deterministic response times of selected processing paths than static scheduling.

Applying the Theory. I am interested primarily in the practical application of research. As such, I have been continually applying and evaluating these research results in my consulting work. Moreover, to illustrate results from our research, we have used portions of actual applications to quantify the memory savings achievable and the latency incurred using our synthesis method.

For example, in [7] a Synthetic Aperture Radar (SAR) Application was used as the motivating problem. SAR data can be used to identify man-made objects on the ground or in the air by producing high-resolution, all-weather images in real-time, but it requires a tremendous amount of processing capacity [26]. Using the SAR graph, we have shown that on-line dynamic scheduling can be achieved with minimal imposed latency.

The memory requirements of an International Maritime Satellite (INMARSAT) mobile satellite receiver application was evaluated in [8]. INMARSAT is a global maritime communication and navigational system. For many acoustic signal processing applications, such as the INMARSAT mobile satellite receiver, memory requirements are more of a concern than processor capacity. We have shown that for these applications, dynamic, on-line scheduling can be achieved with near minimal buffer requirements using a fixed buffer for each queue and a simple EDF scheduler. When a shared buffer implementation is used, the buffer requirements of the graph are reduced below those of a comparable implementation scheduled off-line.

I have also evaluated the latency requirements of the distributed Directed Low Frequency Analysis and Recording acoustic signal processing application from the Airborne Low Frequency Sonar subsystem of the Block II upgrade to LAMPS MK III helicopters. This is one of several signal processing applications that must execute on a new distributed signal processing system being developed by General Dynamics. The latency bounds developed in [9] are being used to determine the number of processors and the amount of memory required in the new system for the application to meet its mission requirements.

Continuing Research. The current synthesis method uses uses a work-conserving scheduler -- a scheduler that never idles the processor when nodes are eligible for execution. I am investigating the use of a non-work-conserving scheduler that deliberately idles the processor at times when nodes are eligible for execution. Such a scheduler increases imposed latency, but may provide tighter bounds on the memory requirements of processing graphs. This line of research may also be of use when the operating system's scheduling algorithm cannot be modified since my non-work-conserving scheduling model can be used without changing the existing scheduling algorithm.

There are numerous practical problems still open, such as scheduling the rate based network traffic in a distributed system and the problem of allocating graphs to processors. A related research area that has gained considerable attention in recent years is the analysis and implementation of multimedia systems. Multimedia systems are often described using simple processing graphs [1, 12], and the RBE task model was created to support multimedia computing [11]. Thus, my research also provides a framework for analyzing latency in distributed multimedia systems.

REFERENCES

  1. Anderson, D.P., Tzou, S.Y., Wahbe, R., Govindan, R., Andrews, M., ``Support for Live Digital Audio and Video,'' Proc. of the Tenth International Conference on Distributed Computing Systems, Paris, France, May 1990, 54-61.
  2. Baruah, S., Goddard, S., Jeffay, K., ``Feasibility Concerns in PGM Graphs with Bounded Buffers,'' Proc. of the Third International Conference on Engineering of Complex Computer Systems, Como, Italy, September, 1997, 130-139.
  3. Bhattacharyya, S.S., Murthy, P.K., Lee, E.A., Software Synthesis from Dataflow Graphs, Kluwer Academic Publishers, 1996.
  4. Gerber, R., Seongsoo, H., Saksena, M., ``Guaranteeing Real-Time Requirements with Resource-Based Calibration of Periodic Processes,'' IEEE Transactions on Software Engineering, 21(7), July 1995.
  5. Goddard, S., Jeffay, K. ``Analyzing the Real-Time Properties of a Dataflow Execution Paradigm using a Synthetic Aperture Radar Application,'' Technical Report TR97-007, Department of Computer Science, University of North Carolina at Chapel Hill, April 1997.
  6. Goddard, S., Kumar, S. and Prins, J.F., ``Connected Components Algorithms For Mesh Connected Parallel Computers,'' Parallel Algorithms, Sandeep Bhatt ed., American Mathematical Society, Providence, RI, 1997, 43-58.
  7. Goddard, S., Jeffay, K. ``Analyzing the Real-Time Properties of a Dataflow Execution Paradigm using a Synthetic Aperture Radar Application,'' Proc. of IEEE Real-Time Technology and Applications Symposium, June 1997, 60-71.
  8. Goddard, S., Jeffay, K. ``Managing Memory Requirements in the Synthesis of Real-Time Systems from Processing Graphs,'' to appear in Proc. of IEEE Real-Time Technology and Applications Symposium, Denver, Colorado, June 1998.
  9. Goddard, S., Jeffay, K. ``Analyzing the Real-Time Properties of Processing Graphs Implemented with the Rate Based Execution Model,'' Technical Report TR98-001, Department of Computer Science, University of North Carolina at Chapel Hill, In Submission.
  10. Jeffay, K., ``The Real-Time Producer/Consumer Paradigm: A paradigm for the construction of efficient, predictable real-time systems,'' Proc. of the ACM/SIGAPP Symposium on Applied Computing, Indianapolis, IN, February 1993, 796-804.
  11. Jeffay, K., Bennett, D. ``A Rate-Based Execution Abstraction For Multimedia Computing,'' Proc. of the Fifth Intl. Workshop on Network and Operating System Support for Digital Audio and Video, Durham, NH, April 1995, published in Lecture Notes in Computer Science, T.D.C. Little and R. Gusella editors, Vol. 1018, pp. 65-75, Springer-Verlag, Heidelberg, Germany, 1995.
  12. Jeffay, K., Stone, D.L., Smith, F.D., ``Kernel Support for Live Digital Audio and Video,'' Computer Communications, Vol. 15, No. 6, July/August 1992, 388-395.
  13. Kumar, S., Goddard, S. and Prins, J.F., ``Connected Components Algorithms For Mesh Connected Parallel Computers,'' Proc. of the Third DIMACS Implementation Challenge, October 1994, 37-51.
  14. Lee, E.A., Messerschmitt, D.G., ``Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing,'' IEEE Transactions on Computers, Vol. C-36, No. 1, January 1987, 24-35.
  15. Mok, A.K., Sutanthavibul, S., ``Modeling and Scheduling of Dataflow Real-Time Systems,'' Proc. of the IEEE Real-Time Systems Symposium, San Diego, CA, Dec. 1985, 178-187.
  16. Mok, A. K., et al., ``Synthesis of a Real-Time System with Data-driven Timing Constraints,'' Proc. of the IEEE Real-Time Systems Symposium, San Jose, CA, Dec. 1987, 133-143.
  17. Novak, L.M., Owirka, G.J., Netishen, C.M., ``Performance of a high-resolution polarimetric SAR automatic target recognition system,'' M.I.T. Lincoln Laboratory Journal, Vo. 6, No. 1, 1991, 11-25.
  18. Processing Graph Method Specification, prepared by the Naval Research Laboratory for use by the Navy Standard Signal Processing Program Office (PMS-412), Version 1.0, Dec. 1987.
  19. Ramamritham, K., ``Allocation and Scheduling of Precedence-Related Periodic Tasks,'' IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 4, April 1995, 412-420.
  20. Ritz, S., Meyer, H., ``Exploring the design space of a DSP-based mobile satellite receiver,'' Proc. of ICSPAT 94, Dallas, TX, October 1994.
  21. Ritz, R., Willems, M., Meyer, H., ``Scheduling for Optimum Data Memory Compaction in Block Diagram Oriented Software Synthesis,'' Proc. of ICASSP 95, Detroit, MI, May 1995, 133-143.
  22. Sun, J., Liu, J., ``Synchronization Protocols in Distributed Real-Time Systems,'' Proc. of 16th International Conference on Distributed Computing Systems, May, 1996.
  23. Sun, J., Liu, J., ``Bounding Completion Times of Jobs with Arbitrary Release Times and Variable Execution Times,'' Proc. of the IEEE Real-Time Systems Symposium, Washington, DC, Dec. 1996, 2-12. December, 1996.
  24. Spuri, M., Stankovic, J.A., ``How to Integrate Precedence Constraints and Shared Resources in Real-Time Scheduling,'' IEEE Transactions on Computers, Vol. 43, No. 12, December 1994, 1407-1412.
  25. Zivojnovic', V., Ritz, S., Meyer, H., ``High Performance DSP Software Using Data-Flow Graph Transformations,'' Proc. of ASILOMAR 94, Pacific Grove, November 1994.
  26. Zuerndorfer, B., Shaw, G.A., ``SAR Processing for RASSP Application,'' Proc. of 1st Annual RASSP Conference, Arlington, VA, August 15-18, 1994.