Efficient PIM Architectures for Data-Intensive Applications


Data-Intensive Applications have special characteristics that General-Purpose Processors cannot efficiently support. We have found that PIM architectures can provide the tremendous memory bandwidth that these applications require, as well as a parallel computational environment that they can readily exploit.

However, in order to fully exploit the effectiveness of PIM modules, a new computational paradigm is necessary. We are investigating architectural techniques to efficiently exploit the huge memory bandwidth and parallel computation available inside PIM modules.


Data-Intensive applications, such as media and database applications, are among the most demanding workloads in computing today. Supporting these applications is one of the top design considerations for future microprocessors.

However, these Data-Intensive applications require a huge amount of data transfer and exhibit operation characteristics that are quite different from those expected of a General-Purpose Processor.

Data-Intensive Applications require a tremendous number of data accesses; because of the memory-wall problem and the aggregate penalty incurred by so many accesses, performance is severely degraded.


                  Data-Intensive Applications   General-Purpose Processing
Operations        Add, Sub, Abs, And            Add, Sub, Mult, Div
Operand Size      8-bit data                    32- or 64-bit data
Parallelism       Massively data-parallel       Limited parallelism supported
Data Amount       Massive amount of data        Very small cache
Execution Style   Stream-based (data-flow)      Control-flow

Our Approach

Inside these Data-Intensive applications, there are core operations that consume the majority of the overall execution time; these core operations can account for up to 90% of an application's total execution time.
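Motion estimation, the target of the publications below, is a representative case: its core operation is the sum of absolute differences (SAD) between macroblocks. A minimal sketch of this kernel (illustrative, not the paper's implementation) shows the characteristics from the table above at work: 8-bit operands, only add/subtract/absolute-value arithmetic, and full data parallelism across all 256 pixel pairs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences (SAD) over a 16x16 macroblock.
 * Every one of the 256 pixel pairs is independent, so a PIM
 * module with wide internal memory access could compute them
 * in parallel; a General-Purpose Processor must stream all
 * 512 bytes through its cache hierarchy first. */
uint32_t sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sad;
}
```

In a full-search motion estimator, this kernel is evaluated for every candidate position in the search window of every macroblock, which is why it dominates total execution time.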

Our approach is a Hardware/Software Co-Design approach that uses a PIM (Processor-In-Memory) module to execute the memory-access-intensive and computation-intensive core operations. Our computing paradigm differs from the conventional Hardware/Software Co-Design paradigm in that the operations take place where the data reside, instead of the data being supplied to a hardware module. By organizing the PIM structure efficiently, we can truly benefit from the huge memory bandwidth improvement (and the drastically reduced data transfers between the processor and memory) and from parallel execution.
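The data-transfer reduction can be illustrated with a back-of-the-envelope calculation (the frame size, search range, and motion-vector size below are assumed for illustration, not figures from the report): a conventional processor fetches every candidate block over the processor-memory bus, whereas a PIM module that runs the search in place returns only one motion vector per macroblock.

```c
/* Bytes crossing the processor-memory bus for full-search motion
 * estimation, conventional vs. PIM (illustrative model only). */

/* Conventional: each of the (2r+1)^2 candidate blocks of mb*mb
 * 8-bit pixels is fetched over the bus, for every macroblock. */
long conventional_bytes(long w, long h, long mb, long r)
{
    long mbs   = (w / mb) * (h / mb);
    long cands = (2 * r + 1) * (2 * r + 1);
    return mbs * cands * mb * mb;
}

/* PIM: the frames already reside in memory; only an assumed
 * 4-byte motion vector per macroblock crosses the bus. */
long pim_bytes(long w, long h, long mb)
{
    return (w / mb) * (h / mb) * 4;
}
```

For a CIF frame (352x288) with a +/-16 full search, this model gives about 110 MB of bus traffic per frame conventionally versus under 2 KB with PIM, a reduction of nearly five orders of magnitude under these assumptions.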

Our expected performance improvement is illustrated in the diagram below.


An Efficient PIM (Processor-In-Memory) Architecture for Motion Estimation
Jung-Yup Kang, Sandeep Gupta, Saurabh Shah, and Jean-Luc Gaudiot
Proceedings of IEEE 14th International Conference on Application-Specific Systems, Architectures and Processors (ASAP2003), pp. 273-283, The Hague, The Netherlands, June 24-26, 2003

An Efficient PIM (Processor-In-Memory) Architecture for Motion Estimation
Jung-Yup Kang, Saurabh Shah, Sandeep Gupta and Jean-Luc Gaudiot  
Technical Report PASCAL-2003-01,  University of California at Irvine, February 2003

NSF Interim Report
March 2003.

Disclaimer: This material is based in part upon work supported by the National Science Foundation under award number CCR-0220106. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.