This book provides design methods for Digital Signal Processors (DSPs) and Application-Specific Instruction-set Processors (ASIPs), based on the author's extensive industrial design experience. Top-down and bottom-up design methodologies are presented, providing valuable guidance for both students and practicing design engineers.
Coverage includes the design of internal and external data types, application-specific instruction sets, and microarchitectures (including datapath and control path designs), as well as memory subsystems. Integration and verification of a DSP ASIP are discussed and reinforced with extensive examples.
- Instruction set design for application-specific processors based on fast application profiling
- Microarchitecture design methodology
- Microarchitecture design details based on real examples
- Extendable architecture design protocols
- Design for efficient memory subsystems (minimizing on-chip memory and cost)
- Real example designs based on extensive industrial experience
Table of Contents
- Front Cover
- Embedded DSP Processor Design
- Copyright Page
- Table of Contents
- Preface
- List of Trademarks and Product Names
- Chapter 1. Introduction
  - 1.1 How to Read the Book
  - 1.2 DSP Theory for Hardware Designers
    - 1.2.1 Review of DSP Theory and Fundamentals
    - 1.2.2 ADC and Finite-length Modeling
    - 1.2.3 Digital Filters
    - 1.2.4 Transform
    - 1.2.5 Adaptive Filter and Signal Enhancement
    - 1.2.6 Random Process and Autocorrelation
  - 1.3 Theory, Applications, and Implementations
  - 1.4 DSP Applications
    - 1.4.1 Real-Time Concept
    - 1.4.2 Communication Systems
    - 1.4.3 Multimedia Signal Processing Systems
    - 1.4.4 Review on Applications
  - 1.5 DSP Implementations
    - 1.5.1 DSP Implementation on GPP
    - 1.5.2 DSP Implementation on GP DSP Processors
    - 1.5.3 DSP Implementation on ASIP
    - 1.5.4 DSP Implementation on ASIC
    - 1.5.5 Trade-off and Decision of Implementations
  - 1.6 Review of Processors and Systems
    - 1.6.1 DSP Processor Architecture
    - 1.6.2 DSP Firmware
    - 1.6.3 Embedded System Overview
    - 1.6.4 DSP in an Embedded System
    - 1.6.5 Fundamentals of Embedded Computing
  - 1.7 Design Flow
    - 1.7.1 Hardware Design Flow in General
    - 1.7.2 ASIP Hardware Design Flow
    - 1.7.3 ASIP Design Automation
  - 1.8 Conclusions
  - Exercises
  - References
- Chapter 2. Numerical Representation and Finite-Length DSP
  - 2.1 Fixed-Point Numerical Representation
    - 2.1.1 An Intuitive Example
    - 2.1.2 Fixed-Point Numerical Representation
    - 2.1.3 Fixed-Point Binary Representation
    - 2.1.4 Integer Binary Representation
    - 2.1.5 Fractional Binary Representation
    - 2.1.6 Fixed-Point Operands
    - 2.1.7 Integer or Fractional
    - 2.1.8 Other Binary Data Formats
  - 2.2 Data Quality Measure
    - 2.2.1 Noise, Distortion, Dynamic Range, and Precision
    - 2.2.2 Quantitative Concept of Dynamic Range and Precision
  - 2.3 Floating-Point Numerical Representation
  - 2.4 Block Floating-Point
  - 2.5 DSP Based on Finite Precision
    - 2.5.1 The Way of Quantization: Rounding and Truncation
    - 2.5.2 Overflow Saturation and Guards
    - 2.5.3 Requirements on Guards
    - 2.5.4 Execution Order
  - 2.6 Examples of Corner Cases
  - 2.7 Conclusions
  - Exercises
  - References
- Chapter 3. DSP Architectures
  - 3.1 DSP Subsystem Architecture
  - 3.2 Processor Architecture
    - 3.2.1 Inside a DSP Subsystem
    - 3.2.2 DSP (Memory Bus) Architecture
    - 3.2.3 Functional Description at Top Architecture Level
    - 3.2.4 DSP Architecture Design
  - 3.3 Inside a DSP Core
    - 3.3.1 The Datapath and Register Bus
    - 3.3.2 MAC
    - 3.3.3 ALU
    - 3.3.4 Register File
    - 3.3.5 Control Path
    - 3.3.6 Address Generator (AGU)
  - 3.4 The Difference between GPP and ASIP DSP
    - 3.4.1 The Difference between Designing a GPP and ASIP DSP
    - 3.4.2 Comparing DSP Processors to Other Processors
    - 3.4.3 CISC or RISC
  - 3.5 Advanced DSP Architecture
    - 3.5.1 DSP with Extreme Specification
    - 3.5.2 ILP DSP Processors
    - 3.5.3 Dual MAC and SIMD
    - 3.5.4 VLIW and Superscalar
    - 3.5.5 On-Chip Multicore DSP
  - 3.6 Conclusions
  - Exercises
  - References
- Chapter 4. DSP ASIP Design Flow
  - 4.1 Design and Use of ASIP
    - 4.1.1 What Is ASIP?
    - 4.1.2 DSP ASIP Design Flow
  - 4.2 Understanding Applications through Profiling
  - 4.3 Architecture Selection
    - 4.3.1 General Methodology
    - 4.3.2 Architectures
    - 4.3.3 Quantitative Approach
  - 4.4 Designing Instruction Sets
  - 4.5 Designing the Toolchain
  - 4.6 Microarchitecture Design
  - 4.7 Firmware Design
    - 4.7.1 Real-time Firmware
    - 4.7.2 Firmware with Finite Precision
    - 4.7.3 Firmware Design Flow for One Application
    - 4.7.4 Firmware Design Flow for Multiapplications
  - 4.8 Conclusions
  - Exercises
  - References
- Chapter 5. A Simple DSP Core: The Junior Processor
  - 5.1 Junior: A Simple DSP Processor
  - 5.2 Instruction Set and Operations
    - 5.2.1 Load / Store Instructions
    - 5.2.2 Addressing for Data Memory Access
    - 5.2.3 Instructions for Basic Arithmetic Operations
    - 5.2.4 Logic and Shift Operations
    - 5.2.5 Program Flow Control Instructions
  - 5.3 Assembly Coding
  - 5.4 Assembly Benchmarking
    - 5.4.1 Benchmarking of Block Transfer
    - 5.4.2 Benchmarking of Single-Sample FIR
    - 5.4.3 Benchmarking of Frame FIR
    - 5.4.4 Benchmarking of Single-Sample Biquad IIR
    - 5.4.5 Benchmarking of 16-bit Division
    - 5.4.6 Benchmarking of Vector Maximum Tracking
    - 5.4.7 Benchmarking of 8 × 8 DCT
    - 5.4.8 Benchmarking of 256-point FFT
    - 5.4.9 Benchmarking of Windowing
  - 5.5 Discussion of Junior DSP
  - 5.6 Conclusions
  - Exercises
  - References
- Chapter 6. Code Profiling for ASIP Design
  - 6.1 Source Code Profiling
    - 6.1.1 What Is Source Code Profiling?
    - 6.1.2 Why Profiling?
    - 6.1.3 What to Profile
    - 6.1.4 How to Profile
    - 6.1.5 The Language to Profile
  - 6.2 Static Profiling
    - 6.2.1 Dynamic and Static Profiling
    - 6.2.2 Static Profiling
    - 6.2.3 Fine-grained Static Profiling
    - 6.2.4 Coarse-grained Static Profiling
  - 6.3 Dynamic Profiling
    - 6.3.1 Instrumentation for Coarse-grained Profiling
    - 6.3.2 Instrumentation for Fine-grained Profiling
    - 6.3.3 Implement Instrumentation
  - 6.4 Use of Reference Assembly Codes
    - 6.4.1 Expose Hidden Costs
    - 6.4.2 Understanding Assembly Codes
  - 6.5 Quality Evaluation of Results
    - 6.5.1 Evaluating Results of Source Code Profiling
    - 6.5.2 Using Profiling Results
  - 6.6 Conclusions
  - Exercises
  - References
- Chapter 7. Assembly Instruction Set Design
  - 7.1 Methodology
    - 7.1.1 Opportunities and Constraints
    - 7.1.2 Classification of General Instructions
    - 7.1.3 Design of General RISC Subset Instructions
    - 7.1.4 Specify CISC Instructions
    - 7.1.5 For Undergraduates: From Junior to Senior
  - 7.2 Designing RISC Subset Instructions
    - 7.2.1 Data Access Instructions
    - 7.2.2 Basic Arithmetic Instructions
    - 7.2.3 Unsigned ALU Instructions
    - 7.2.4 Program Flow Control Instructions
  - 7.3 CISC Subset Instructions
    - 7.3.1 MAC and Multiplication Instructions
    - 7.3.2 Double-Precision Arithmetic Instructions
    - 7.3.3 Other CISC Instructions
  - 7.4 Accelerated Extensions
    - 7.4.1 Challenges
    - 7.4.2 Methodology
  - 7.5 Instructions for Instruction Level Parallel (ILP) Architecture
    - 7.5.1 Superscalar
    - 7.5.2 VLIW Instructions
    - 7.5.3 SIMD Instructions
  - 7.6 Memory and Register Addressing
    - 7.6.1 Register Addressing
    - 7.6.2 Data Memory Addressing
    - 7.6.3 Hardware Accelerated Memory Addressing
  - 7.7 Coding
    - 7.7.1 Assembly Encoding
    - 7.7.2 Machine Code Coding
    - 7.7.3 Examples
  - 7.8 Conclusions
  - Exercises
  - References
- Chapter 8. Software Development Toolchain
  - 8.1 What Is Toolchain and IDE?
    - 8.1.1 ASIP User's View on IDE
    - 8.1.2 ASIP Designer's View on IDE
  - 8.2 Code Analysis
    - 8.2.1 Lexical Analysis
    - 8.2.2 Syntax Analysis
    - 8.2.3 Semantic Analysis
  - 8.3 Profiler and WCET Analyzer
  - 8.4 Compiler Overview
    - 8.4.1 Intermediate Code Generation
    - 8.4.2 Code Optimization
    - 8.4.3 Code Generation
    - 8.4.4 Error Handler
    - 8.4.5 Compiler Generator and Verification of a Generated Compiler
  - 8.5 Assembler
  - 8.6 Linker
  - 8.7 Simulator and Debugger Basics
    - 8.7.1 Instruction Set Simulator (ISS)
    - 8.7.2 Processor Simulator
    - 8.7.3 Architecture Simulator
  - 8.8 Debugger and GUI
    - 8.8.1 Debugger
    - 8.8.2 SW Debugging
    - 8.8.3 GUI
  - 8.9 Evaluation of Programming Tools
  - 8.10 Conclusions
  - Exercises
  - References
- Chapter 9. Evaluation of an Instruction Set
  - 9.1 Benchmarking
    - 9.1.1 Benchmarking DSP Kernel Algorithms
    - 9.1.2 Some Benchmarking Examples
  - 9.2 Instruction Use Profiling
  - 9.3 Coverage Analysis
  - 9.4 Conclusions
  - References
- Chapter 10. Design of DSP Microarchitecture
  - 10.1 Introduction to Microarchitecture
    - 10.1.1 Microarchitecture versus Architecture
    - 10.1.2 Microarchitecture Design
  - 10.2 Microarchitecture-level Components
    - 10.2.1 Basic Logic Components
    - 10.2.2 Arithmetic Components
  - 10.3 Hardware Design Fundamentals
    - 10.3.1 Function Partitioning
    - 10.3.2 Function Allocation
    - 10.3.3 HW Multiplexing
    - 10.3.4 Scheduling of Hardware Execution
    - 10.3.5 Modeling and Simulation
  - 10.4 Functional Specification at Microarchitecture Level
    - 10.4.1 Intermodule Block Diagram
    - 10.4.2 Microarchitecture Schematic
    - 10.4.3 Module Functional Flowchart
    - 10.4.4 Finite State Machine
    - 10.4.5 Truth Table for Coding and Decoding
  - 10.5 ASIP Microarchitecture Design Flow
    - 10.5.1 Exposing Microoperations
    - 10.5.2 Allocation and Partitioning of Microoperations
    - 10.5.3 Pipeline Scheduling Microoperations
    - 10.5.4 HW Multiplexing of Microoperations
    - 10.5.5 Microoperations Integration
  - 10.6 Conclusions
  - Exercises
  - References
- Chapter 11. Design of Register File and Register Bus
  - 11.1 Datapath
  - 11.2 Design of Register Files
    - 11.2.1 General Register File
    - 11.2.2 Design of a Simple Register File
    - 11.2.3 Pipeline around Register File
    - 11.2.4 Special Registers in a General Register File
  - 11.3 Design of Advanced Register Files
    - 11.3.1 Register File for Cluster Datapath
    - 11.3.2 Ultra Large Register File
  - 11.4 Conclusions
  - Exercises
  - References
- Chapter 12. ALU HW Implementation
  - 12.1 Arithmetic and Logic Unit (ALU)
  - 12.2 Design of Arithmetic Unit (AU)
    - 12.2.1 Implementation Methodology
    - 12.2.2 Select Kernel Components
    - 12.2.3 Implementing Simple AU Instructions
    - 12.2.4 Implementing Special AU Instructions
  - 12.3 Shift and Rotation
    - 12.3.1 Design a Shifter Using a Shifter Primitive
    - 12.3.2 Design a Shifter Using Truth Tables
    - 12.3.3 Logic Operation and Data Manipulation
  - 12.4 ALU Integration
    - 12.4.1 Preprocessing and Postprocessing
    - 12.4.2 ALU Integration
  - 12.5 Conclusions
  - Exercises
  - References
- Chapter 13. MAC Hardware Implementation
  - 13.1 Introduction
    - 13.1.1 Review of Convolution
    - 13.1.2 MAC Fundamentals
  - 13.2 MAC Implementation
    - 13.2.1 MAC Instructions
    - 13.2.2 Implementing Multiplications
    - 13.2.3 Implementing MAC Instructions
    - 13.2.4 Implementing Double-Precision Instructions
    - 13.2.5 Accessing ACR Context
    - 13.2.6 Flag Operations and Other Postoperations
  - 13.3 A MAC Design Case
  - 13.4 MAC Integrations
    - 13.4.1 Physical Critical-Path
    - 13.4.2 Pipeline in a MAC
  - 13.5 Dual MAC, Multiple MAC, and VLIW
  - 13.6 Conclusions
  - Exercises
  - References
- Chapter 14. Control Path Design
  - 14.1 Control Paths
  - 14.2 Control Path Organization
    - 14.2.1 Pipeline Consideration
    - 14.2.2 Interrupt Management
  - 14.3 Control Path Hardware Design
    - 14.3.1 Top-level Structure
    - 14.3.2 Design of Program Memory and Peripherals
    - 14.3.3 Loading Code
    - 14.3.4 Instruction Flow Controller
    - 14.3.5 Loop Controller
    - 14.3.6 PC Stack
    - 14.3.7 Senior PC FSM Example
  - 14.4 Instruction Decoder
    - 14.4.1 Control Signal Decoding
    - 14.4.2 Decoding Order
    - 14.4.3 Decoding for Exception, Interrupt, Jump, and Conditional Execution
    - 14.4.4 Issues of Multicycle Execution
    - 14.4.5 VLIW Machine Decoding
    - 14.4.6 Decoding for Superscalar
  - 14.5 Conclusions
  - Exercises
  - References
- Chapter 15. Design of Memory Subsystems
  - 15.1 Memory and Peripherals
    - 15.1.1 Memory Modules
    - 15.1.2 Memory Peripheral Circuits
  - 15.2 Design of Memory Addressing Circuitry
    - 15.2.1 General Addressing Circuit
    - 15.2.2 Modulo Addressing Circuit
  - 15.3 Buses
  - 15.4 Memory Hierarchy
    - 15.4.1 Problems
    - 15.4.2 Memory Hierarchy of DSP Processors
  - 15.5 DMA
    - 15.5.1 DMA Concepts
    - 15.5.2 Configuring a Program for a DMA Task
    - 15.5.3 A SoC View
  - 15.6 Conclusions
  - Exercises
  - References
- Chapter 16. DSP Core Peripherals
  - 16.1 Peripherals
  - 16.2 Design a Peripheral Module
    - 16.2.1 Design of a Common Interface in Peripheral Modules
    - 16.2.2 Protocol Design of Peripheral Modules
  - 16.3 Interrupt Handler
    - 16.3.1 Interrupt Basics
    - 16.3.2 Interrupt Sources
    - 16.3.3 Interrupt Requests
    - 16.3.4 Interrupt Handling Process
    - 16.3.5 A Case Study
  - 16.4 Timers
  - 16.5 Direct Memory Access (DMA)
    - 16.5.1 DMA Basics
    - 16.5.2 Design a Simple DMA
    - 16.5.3 Advanced DMA Controller
    - 16.5.4 DMA Benchmarking
  - 16.6 Serial Ports
    - 16.6.1 Bit Synchronization
    - 16.6.2 Packet Synchronization
    - 16.6.3 Arbitration
    - 16.6.4 Control of a Serial Port
  - 16.7 Parallel Ports
  - 16.8 Conclusions
  - Exercises
  - References
- Chapter 17. Design for DSP Functional Acceleration
  - 17.1 Functional Acceleration
    - 17.1.1 Loosely Connected Accelerator
    - 17.1.2 Tightly Connected Accelerator
  - 17.2 Accelerator Specification
    - 17.2.1 Principle
    - 17.2.2 An Accelerator with One Single Instruction
    - 17.2.3 An Accelerator with Multiple Instructions
    - 17.2.4 An Accelerator as a Slave Processor
  - 17.3 Scalable Processor and Accelerator Interface
    - 17.3.1 Configurability and Extendibility
    - 17.3.2 Extendible Hardware Interface
    - 17.3.3 Extendible Programmer Tools
  - 17.4 Accelerator Design Flow
  - 17.5 Conclusions
  - Exercises
  - References
- Chapter 18. Real-time Fixed-point DSP Firmware
  - 18.1 Firmware (FW)
  - 18.2 Application Modeling under HW Constraints
    - 18.2.1 Understanding Applications
    - 18.2.2 Understanding Hardware
    - 18.2.3 Algorithm Selection
    - 18.2.4 Language Selection
    - 18.2.5 Real-time Firmware Implementation
    - 18.2.6 Firmware for Fixed-point Data
  - 18.3 Assembly Implementation
    - 18.3.1 General Flow and C-Compiling
    - 18.3.2 Plan and Specify for Assembly Coding
    - 18.3.3 Fixed-point Assembly Kernels
    - 18.3.4 Low Cycle Cost Assembly Coding
    - 18.3.5 Storage Efficient Assembly Kernels
    - 18.3.6 Function Libraries
    - 18.3.7 Optimize Control Codes
  - 18.4 Assembly-level Integration and Release
  - 18.5 Conclusions
  - References
- Chapter 19. ASIP Integration and Verification
  - 19.1 Integration
    - 19.1.1 HW Integration of an ASIP Core
    - 19.1.2 Integration of a DSP Subsystem and a DSP Processor
    - 19.1.3 HW Integration of a SoC
    - 19.1.4 Integration of SoC Simulator
  - 19.2 Functional Verification
    - 19.2.1 The Basics
    - 19.2.2 Verification Process
    - 19.2.3 Verification Techniques
    - 19.2.4 Speed-up Verification
    - 19.2.5 Simulation or Emulation
    - 19.2.6 Verification of an ASIP
    - 19.2.7 Writing Testbench
  - 19.3 Conclusions
  - Exercises
  - References
- Chapter 20. Parallel Streaming Signal Processing
  - 20.1 Streaming DSP
    - 20.1.1 Streaming Signals
    - 20.1.2 Parallel Streaming DSP Processors
  - 20.2 Parallel Architecture, Divide and Conquer
    - 20.2.1 Review of Parallel Architectures
    - 20.2.2 Divide and Conquer
  - 20.3 Expose Control Complexities
    - 20.3.1 General Control Handling
    - 20.3.2 Exposing Challenges
    - 20.3.3 SIMT Architecture for Low-level Parallel Applications
    - 20.3.4 Design of Multicore DSP Subsystems
  - 20.4 Streaming Data Manipulations
    - 20.4.1 Data Complexity of Streaming DSP
    - 20.4.2 Data Complexity: Case 1: Video
    - 20.4.3 Data Complexity: Case 2: Radio Baseband
  - 20.5 NoC for Parallel Memory Access
    - 20.5.1 Design Methods
    - 20.5.2 Analyses of Parallel Memory Access for NoC Design
  - 20.6 Parallel Memory Architecture
    - 20.6.1 Requirements for Parallel Algorithms
    - 20.6.2 Cache
    - 20.6.3 Ultra-large Register File
  - 20.7 P3RMA for Streaming DSP Processors
    - 20.7.1 Parallel Vector (Scratchpad) Memories
    - 20.7.2 The Memory Subsystem Hardware
    - 20.7.3 Parallel Programming by Hand
    - 20.7.4 Programming Toolchain for P3RMA
  - 20.8 Conclusions
  - References
- Glossary
- Appendix A. Senior Assembly Instruction Set Manual
- Index