My research topic is the development of a parallel FFT library. The FFT
is used in many important fields, which makes a highly optimized
implementation desirable. By taking advantage of the hierarchical memory
structure of modern computer architectures and of pipelined instruction
execution, it is possible to develop a high-performance, adaptive FFT
implementation. Some inherent characteristics of FFT algorithms are not
amenable to modern architectures, such as the imbalance between additions
and multiplications and the high demand for load and store operations.
We can overcome these difficulties by using composable blocks of codelets,
each computing part of the transform, and selecting the optimal
combination of these codelets at runtime.
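The idea of composing codelets and choosing among the combinations at runtime can be illustrated with a small sketch. It rests on assumptions made only for illustration and is not the design of this library. The "codelets" are a generic radix-r decimation-in-time split (r in 2, 4, 8) plus a direct DFT for small base cases. A "plan" is the sequence of radices applied. The names `dft_codelet`, `execute`, `candidate_plans`, and `plan_fft` are hypothetical. The planner times every candidate plan on sample data and keeps the fastest.

```python
import cmath
import time


def dft_codelet(x):
    """Base codelet (hypothetical): direct O(n^2) DFT for small sizes."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def execute(plan, x):
    """Run a plan: apply one radix split per entry, then the base codelet."""
    n = len(x)
    if not plan:
        return dft_codelet(x)
    r = plan[0]
    m = n // r
    # Transform the r interleaved subsequences x[s], x[s+r], ... recursively.
    subs = [execute(plan[1:], x[s::r]) for s in range(r)]
    out = [0j] * n
    for k in range(m):
        for q in range(r):
            # X[j] = sum_s W_n^(s*j) * Y_s[j mod m], with j = k + q*m.
            j = k + q * m
            out[j] = sum(subs[s][k] * cmath.exp(-2j * cmath.pi * s * j / n)
                         for s in range(r))
    return out


def candidate_plans(n, radices=(2, 4, 8), base_max=8):
    """Enumerate radix sequences whose leftover size fits the base codelet."""
    plans = []

    def rec(size, prefix):
        if size <= base_max:
            plans.append(prefix)
        for r in radices:
            if size % r == 0:
                rec(size // r, prefix + [r])

    rec(n, [])
    return plans


def plan_fft(n, trials=3):
    """Time each candidate plan on sample data and return the fastest one."""
    x = [complex(i % 7, -(i % 3)) for i in range(n)]
    # Fall back to a single direct DFT if no radix decomposition applies.
    plans = candidate_plans(n) or [[]]
    best_plan, best_time = None, float("inf")
    for plan in plans:
        start = time.perf_counter()
        for _ in range(trials):
            execute(plan, x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_plan, best_time = plan, elapsed
    return best_plan


if __name__ == "__main__":
    n = 64
    plan = plan_fft(n)
    x = [complex(i, 0) for i in range(n)]
    y = execute(plan, x)
    ref = dft_codelet(x)
    print("chosen radices:", plan)
    print("max error vs direct DFT:", max(abs(a - b) for a, b in zip(y, ref)))
```

Timing in pure Python mostly measures interpreter overhead, so the plan chosen here says nothing about cache or pipeline behavior. The sketch only shows the structure: enumerate compositions of codelets, measure them on the target machine, and keep the best one.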