
Efficient PIM Architectures for Data-Intensive Applications

Overview

Data-intensive applications have characteristics that general-purpose processors cannot support efficiently. We have found that PIM (Processor-In-Memory) architectures can provide the very large memory bandwidth these applications require, together with a parallel computational environment that the applications can exploit.

However, using PIM modules effectively requires a new computational paradigm. We are investigating architectural techniques that efficiently exploit the memory bandwidth and parallel computation available inside PIM modules.

Problems

Data-intensive applications, such as media and database applications, are placing ever-increasing demands on computing systems. Supporting them is one of the top design considerations for future microprocessors.

However, these applications transfer huge amounts of data, and their operations have characteristics quite different from those a general-purpose processor is designed for.

Problem 1: Memory-Wall Problem

Data-intensive applications perform a tremendous number of data accesses. Because of the memory wall, each access is costly, and the penalty aggregated over so many accesses severely degrades performance.

Problem 2: Special Characteristics

Data-intensive applications have special characteristics to which a general-purpose processor is poorly suited. The table below contrasts some of these characteristics.

Characteristic	Data-Intensive Applications	General-Purpose Processing
Operations	Add, Sub, Abs, And, …	Add, Sub, Mult, Div, …
Operand Size	8-bit data	32- or 64-bit data
Parallelism	Massively data parallel	Limited parallelism supported
Data Amount	Massive amount of data	Very small cache
Execution Style	Stream-based (data-flow)	Control-flow
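
To make the left-hand column concrete, here is a minimal sketch of one kernel with this profile: the sum of absolute differences (SAD) commonly used for block matching in motion estimation. The choice of SAD and the names `sad` and `best_match` are assumptions made for illustration; this page does not specify the kernels or interfaces actually used. The kernel needs only Sub, Abs, and Add on 8-bit pixel values, and every candidate block can be scored independently of the others.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks of 8-bit pixels."""
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)  # Sub, Abs, Add on small integer operands
    return total


def best_match(current, candidates):
    """Return (index, score) of the candidate block with the smallest SAD.

    Each score depends only on its own candidate, so the scores could be
    computed in parallel; here they are computed one after another.
    """
    scores = [sad(current, c) for c in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical 2x2 blocks, small enough to check by hand.
current = [[10, 20], [30, 40]]
candidates = [
    [[12, 18], [33, 40]],  # SAD = 2 + 2 + 3 + 0 = 7
    [[10, 21], [30, 40]],  # SAD = 0 + 1 + 0 + 0 = 1
    [[0, 0], [0, 0]],      # SAD = 10 + 20 + 30 + 40 = 100
]
print(best_match(current, candidates))  # prints (1, 1)
```

No multiplication or division appears, and no operand needs more than a few bits beyond the 8-bit pixel width until the final accumulation.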

Our Approach

Within these data-intensive applications, a small set of core operations consumes the majority of the execution time, up to 90% of the total.

Our approach is a hardware/software co-design that uses PIM (Processor-In-Memory) modules to execute the memory-access-intensive and computation-intensive core operations. Our computing paradigm differs from conventional hardware/software co-design because the operations are performed where the data reside, rather than the data being supplied to a separate hardware module. By organizing the PIM structure efficiently, we can benefit from the much higher memory bandwidth, from drastically reduced data transfers between the processor and memory, and from parallel execution.
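
The reduction in processor-memory traffic can be illustrated with a rough back-of-the-envelope model. Everything in it is an assumption made for illustration, not a measurement or a description of the actual design: pixels are 1 byte, the host has an ideal cache that fetches each pixel exactly once, the PIM module returns only a 4-byte result, and the function names `host_bytes` and `pim_bytes` are hypothetical.

```python
def host_bytes(block, search_range):
    """Bytes read over the memory bus when the host performs block matching.

    Assumed model: 1 byte per pixel, and an ideal cache, so the current
    block and the surrounding search window are each fetched exactly once.
    """
    window = block + 2 * search_range  # side length of the search window
    return block * block + window * window


def pim_bytes(result_bytes=4):
    """Bytes returned when the matching runs inside the memory module.

    Assumed model: only the result (for example a motion vector and its
    score) crosses the bus; command traffic is ignored.
    """
    return result_bytes


h = host_bytes(16, 8)  # 16x16 block, search range of +/-8 pixels
p = pim_bytes()
print(h, p, h // p)    # prints 1280 4 320
```

Even with an ideal cache on the host side, the model moves 1280 bytes per block against 4 bytes for the in-memory case. Real ratios depend on cache behavior, result format, and command overhead, none of which this sketch captures.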

The performance improvement we expect is illustrated in the diagram below.

Publications

An Efficient PIM (Processor-In-Memory) Architecture for Motion Estimation
Jung-Yup Kang, Sandeep Gupta, Saurabh Shah, and Jean-Luc Gaudiot
Proceedings of IEEE 14th International Conference on Application-Specific Systems, Architectures and Processors (ASAP2003), pp. 273-283, The Hague, The Netherlands, June 24-26, 2003

An Efficient PIM (Processor-In-Memory) Architecture for Motion Estimation
Jung-Yup Kang, Saurabh Shah, Sandeep Gupta and Jean-Luc Gaudiot
Technical Report PASCAL-2003-01, University of California at Irvine, February 2003

NSF Interim Report
March 2003.

Disclaimer: This material is based in part upon work supported by the National Science Foundation under award number CCR-0220106. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
