Ken Kennedy, Bradley Broom, Arun Chauhan, Rob Fowler, John Garvin, Charles Koelbel, Cheryl McCosh, and John Mellor-Crummey (2005)
Telescoping Languages: A System for Automatic Generation of Domain Languages
Proceedings of the IEEE, 93(3):387–408.
The software gap - the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it - is a serious problem for scientific software. Although users appreciate the convenience (and, thus, improved productivity) of using relatively high-level scripting languages, the slow execution speeds of these languages remain a problem. Lower level languages, such as C and Fortran, provide better performance for production applications, but at the cost of tedious programming and optimization by experts. If applications written in scripting languages could be routinely compiled into highly optimized machine code, a huge productivity advantage would be possible. It is not enough, however, to simply develop excellent compiler technologies for scripting languages (as a number of projects have succeeded in doing for MATLAB). In practice, scientists typically extend these languages with their own domain-centric components, such as the MATLAB signal processing toolbox. Doing so effectively defines a new domain-specific language. If we are to address efficiency problems for such extended languages, we must develop a framework for automatically generating optimizing compilers for them. To accomplish this goal, we have been pursuing an innovative strategy that we call telescoping languages. Our approach calls for using a library-preprocessing phase to extensively analyze and optimize collections of libraries that define an extended language. Results of this analysis are collected into annotated libraries and used to generate a library-aware optimizer. The generated library-aware optimizer uses the knowledge gathered during preprocessing to carry out fast and effective optimization of high-level scripts. This enables script optimization to benefit from the intense analysis performed during preprocessing without repaying its price. 
Since library preprocessing is performed only at infrequent "language-generation" times, its cost is amortized over many compilations of individual scripts that use the library. We call this strategy "telescoping languages" because it merges knowledge of a hierarchy of extended languages into a single library-aware optimizer. We present our vision and plans for compiler frameworks based on telescoping languages and report on the preliminary research that has established the effectiveness of this approach.