WorldCat Identities

Yi, Q.

Overview
Works: 6 works in 6 publications in 1 language and 32 library holdings
Most widely held works by Q Yi
Parameterizing loop fusion for automated empirical tuning

1 edition published in 2005 in English and held by 6 WorldCat member libraries worldwide

Traditional compilers are limited in their ability to optimize applications for different architectures because statically modeling the effect of specific optimizations on different hardware implementations is difficult. Recent research has been addressing this issue through the use of empirical tuning, which uses trial executions to determine the optimization parameters that are most effective on a particular hardware platform. In this paper, we investigate empirical tuning of loop fusion, an important transformation for optimizing a significant class of real-world applications. In spite of its usefulness, fusion has attracted little attention from previous empirical tuning research, partially because it is much harder to configure than transformations like loop blocking and unrolling. This paper presents novel compiler techniques that extend conventional fusion algorithms to parameterize their output when optimizing a computation, thus allowing the compiler to formulate the entire configuration space for loop fusion using a sequence of integer parameters. The compiler can then employ an external empirical search engine to find the optimal operating point within the space of legal fusion configurations and generate the final optimized code using a simple code transformation system. We have implemented our approach within our compiler infrastructure and conducted preliminary experiments using a simple empirical search strategy. Our results convey new insights on the interaction of loop fusion with limited hardware resources, such as available registers, while confirming conventional wisdom about the effectiveness of loop fusion in improving application performance
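
The idea of encoding fusion choices as a sequence of integers and choosing among them by trial execution can be illustrated with a small sketch. This is not the paper's compiler technique. The names `run_with_fusion`, `make_kernel`, `tune`, and `CONFIGS` are hypothetical, the three loop statements are invented for illustration, and every listed grouping is assumed legal because each statement at iteration i reads only values produced at iteration i.

```python
import itertools
import time


def run_with_fusion(n, bodies, groups):
    """Run each body over range(n).

    `groups` holds one integer per body (hypothetical encoding):
    consecutive bodies with the same integer share a single loop.
    """
    pairs = zip(groups, bodies)
    for _, members in itertools.groupby(pairs, key=lambda p: p[0]):
        fused = [body for _, body in members]
        for i in range(n):
            for body in fused:
                body(i)


def make_kernel(n):
    """Build three dependent loop statements and return them with the output."""
    a = [float(i) for i in range(n)]
    b = [0.0] * n
    c = [0.0] * n
    d = [0.0] * n

    def s1(i):
        b[i] = a[i] * 2.0

    def s2(i):
        c[i] = b[i] + 1.0

    def s3(i):
        d[i] = c[i] * c[i]

    return [s1, s2, s3], d


# The fusion configuration space for three adjacent loops.
CONFIGS = [(0, 1, 2), (0, 0, 1), (0, 1, 1), (0, 0, 0)]


def tune(n, configs):
    """Time every configuration once and return (groups, seconds) of the fastest."""
    best = None
    for groups in configs:
        bodies, _ = make_kernel(n)
        start = time.perf_counter()
        run_with_fusion(n, bodies, groups)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[1]:
            best = (groups, elapsed)
    return best
```

Calling `tune(100000, CONFIGS)` returns whichever grouping ran fastest on the current machine. The result varies by platform and by run, which is the reason for searching empirically rather than predicting statically.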
Annotating user-defined abstractions for optimization

1 edition published in 2005 in English and held by 6 WorldCat member libraries worldwide

This paper discusses the features of an annotation language that we believe to be essential for optimizing user-defined abstractions. These features should capture semantics of function, data, and object-oriented abstractions, express abstraction equivalence (e.g., a class represents an array abstraction), and permit extension of traditional compiler optimizations to user-defined abstractions. Our future work will include developing a comprehensive annotation language for describing the semantics of general object-oriented abstractions, as well as automatically verifying and inferring the annotated semantics
Automatic Blocking Of QR and LU Factorizations for Locality

1 edition published in 2004 in English and held by 5 WorldCat member libraries worldwide

QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, automatically generating blocked versions of the computations offers further benefits, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, each using reference BLAS, ATLAS BLAS, and native BLAS specially tuned for the underlying machine architectures.
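
To show what blocking a loop nest means, here is a minimal sketch on matrix multiplication, a much simpler kernel than QR or LU. It is not the dependence hoisting technique from the paper. The function names and the `block` parameter are hypothetical, and pure Python will not show the cache benefit that a compiled version would.

```python
def matmul_naive(a, b):
    """Reference triple loop: c = a * b for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same computation, with each loop split into a tile loop and an
    intra-tile loop so that a block-sized working set is reused."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # min() handles tiles that run past the matrix edge.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```

Both functions accumulate each output element in the same order of `k`, so they return identical results for any positive `block`. Only the order in which memory is visited changes.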
Applying Loop Optimizations to Object-oriented Abstractions Through General Classification of Array Semantics

1 edition published in 2004 in English and held by 5 WorldCat member libraries worldwide

Optimizing compilers have a long history of applying loop transformations to C and Fortran scientific applications. However, such optimizations are rare in compilers for object-oriented languages such as C++ or Java, where loops operating on user-defined types are left unoptimized due to their unknown semantics. Our goal is to reduce the performance penalty of using high-level object-oriented abstractions. We propose an approach that allows explicit communication between programmers and compilers. We have extended the traditional Fortran loop optimizations with an open interface. Through this interface, we have developed techniques to automatically recognize and optimize user-defined array abstractions. In addition, we have developed an adapted constant-propagation algorithm to automatically propagate properties of abstractions. We have implemented these techniques in a C++ source-to-source translator and have applied them to optimize several kernels written using an array-class library. Our experimental results show that, using our approach, applications using high-level abstractions can achieve performance comparable, and in some cases superior, to that achieved by efficient low-level hand-written codes.
Toward the Automated Generation of Components from Existing Source Code

1 edition published in 2004 in English and held by 5 WorldCat member libraries worldwide

A major challenge to achieving widespread use of software component technology in scientific computing is an effective migration strategy for existing, or legacy, source code. This paper describes initial work and challenges in automating the identification and generation of components using the ROSE compiler infrastructure and the Babel language interoperability tool. Babel enables calling interfaces expressed in the Scientific Interface Definition Language (SIDL) to be implemented in, and called from, an arbitrary combination of supported languages. ROSE is used to build specialized source-to-source translators that (1) extract a SIDL interface specification from information implicit in existing C++ source code and (2) transform Babel's output to include dispatches to the legacy code
Semantic-driven Parallelization of Loops Operating on User-defined Containers

1 edition published in 2003 in English and held by 5 WorldCat member libraries worldwide

The authors describe ROSE, a C++ infrastructure for source-to-source translation, that provides an interface for programmers to easily write their own translators for optimizing user-defined high-level abstractions. Utilizing the semantics of these high-level abstractions, they demonstrate the automatic parallelization of loops that iterate over user-defined containers that have interfaces similar to the lists, vectors and sets in the Standard Template Library (STL). The parallelization is realized in two phases. First, they insert OpenMP directives into a serial program, driven by the recognition of the high-level abstractions, containers, that are thread-safe. Then, they translate the OpenMP directives into library routines that explicitly create and manage parallelism. By providing an interface for the programmer to classify the semantics of their abstractions, they are able to automatically parallelize operations on containers, such as linked-lists, without resorting to complex loop dependence analysis techniques. The approach is consistent with general goals within telescoping languages
 
Audience level: 0.80 (from 0.79 for Parameteri ... to 0.80 for Applying L ...)

Languages