TI C6678 DSP evaluation module
In the News
 FLAME Project receives three year NSF Software Infrastructure for Sustained Innovation (SI2) grant
Nov. 14, 2011: Texas Instruments and UT Austin collaborate to deliver linear algebra library on TI's high performance multicore DSPs
 libflame attains full functionality on TI C66x Family of DSPs
"Look Mom, No Fortran!"
libflame can now be compiled without requiring a Fortran compiler.
(Fortran LAPACK interface still supported) 24 Oct. 2011. With the new EVD and SVD solvers, libflame achieves most functionality of LAPACK
Contents
Objective
The objective of the FLAME project is to transform the development of dense linear algebra libraries from an art reserved for experts to a science that can be understood by novice and expert alike. Rather than being only a library, the project encompasses a new notation for expressing algorithms, a methodology for systematic derivation of algorithms, Application Program Interfaces (APIs) for representing the algorithms in code, and tools for mechanical derivation, implementation and analysis of algorithms and implementations.
The Library: libflame
Many visitors of this wiki will be primarily interested in the highperformance dense linear algebra library that has resulted from the FLAME project. Information on how to download and use this library can be found HERE.
libflame contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implementions support every datatype. Also, in many cases, we use a different naming convention for our routine names. The following table summarizes which routines are supported within libflame and also provides their corresponding netlib name for reference.
operation name 
netlib routine name 
libflame routine name 
FLAME/C 
FLASH 
GPU support 
type support 
lapack2flame support 
Elemental 
libflame routine prefix 


FLA_ 
FLASH_* 
FLASH_# 



Level3 BLAS 

general matrixmatrix multiply 
?gemm 
Gemm 
y 
y 
y 
sdcz 
N/A 
y 
hermitian matrixmatrix multiply 
?hemm 
Hemm 
y 
y 
y 
sdcz 
N/A 
y 
hermitian rankk update 
?herk 
Herk 
y 
y 
y 
sdcz 
N/A 
y 
hermitian rank2k update 
?her2k 
Her2k 
y 
y 
y 
sdcz 
N/A 
y 
symmetric matrixmatrix multiply 
?symm 
Symm 
y 
y 
y 
sdcz 
N/A 
y 
symmetric rankk update 
?syrk 
Syrk 
y 
y 
y 
sdcz 
N/A 
y 
symmetric rank2k update 
?syr2k 
Syr2k 
y 
y 
y 
sdcz 
N/A 
y 
triangular matrix multiply 
?trmm 
Trmm 
y 
y 
y 
sdcz 
N/A 
y 
triangular solve with multiple righthand sides 
?trsm 
Trsm 
y 
y 
y 
sdcz 
N/A 
y 
LAPACKlevel 

Cholesky factorization 
?potrf 
Chol 
y 
y 
y 
sdcz 
sdcz 
y 
LU factorization with no pivoting 
~ 
LU_nopiv 
y 
y 
y 


y 
LU factorization with partial pivoting 
?getrf 
LU_piv 
y 
y 
y 
sdcz 
sdcz 
y 
LU factorization with incremental pivoting 
~ 
LU_incpiv 

y 

sdcz 

N/A 
QR factorization (via UT Householder transforms) 
?geqrf 
QR_UT 
y 
y 
y 
sdcz 
sdcz 
y 
QR factorization (via incremental UT Householder transforms) 
~ 
QR_UT_inc 

y 

sdcz 

N/A 
LQ factorization (via UT Householder transforms) 
?gelqf 
LQ_UT 
y 
y 
y 
sdcz 
sdcz 
y 
UpandDowndate Cholesky/QR factor 
~ 
UDdate_UT 
y 


sdcz 


UpandDowndate Cholesky/QR factor (via incremental UT Householderlike transforms) 
~ 
UDdate_UT_inc 

y 

sdcz 

N/A 
Triangular matrix inversion 
?trtri 
Trinv 
y 
y 
y 
sdcz 
sdcz 
y 
Triangulartranspose matrix multiply 
?lauum 
Ttmm 
y 
y 
y 
sdcz 
sdcz 
y 
SPD/HPD inversion 
?potri+ 
SPDinv 
y 
y 
y 
sdcz 
sdcz 
y 
Triangular Sylvester equation solve 
?trsyl^ 
Sylv 
y 
y 
y 
sdcz 
sdcz 

Triangular Lyapunov equation solve 
~ 
Lyap 
y 
y 
y 



Reduction of Hermitianpositive definite eigenproblem to standard form 
[sd]sygst, [cz]hegst 
Eig_gest 
y 
y 
y 
sdcz 
sdcz 
y 
Reduction to upper Hessenberg form 
?gehrd 
Hess_UT 
y 


sdcz 
sdcz 

Reduction to tridiagonal form 
[sd]sytrd, [cz]hetrd 
Tridiag_UT 
y 


sdcz 
sdcz 
y 
Reduction to bidiagonal form 
?gebrd 
Bidiag_UT 
y 


sdcz 
sdcz 
y 
Symmetric/Hermitian Eigenvalue Decomposition 
[sd]syev, [cz]heev 
Hevd 
y 


dz 

y 
Generalized Symmetric/Hermitian Eigenvalue Decomposition 
[sd]sygvx, [cz]hegvx 
Soon! 





y 
Skew Symmetric/Hermitian Eigenvalue Decomposition 
~ 
Soon! 





y 
Singular Value Decomposition 
?gesvd 
Svd 
y 


dz 

y 
Notes:
 y These routines are provided by libflame.
 ? Expands to one of {sdcz}.
 ~ These routines are not provided by LAPACK.
 + The LAPACK routine ?potri() differs from FLA_SPDinv() and FLASH_SPDinv() in that ?potri() require the user to invoke the Cholesky factorization manually and then pass in the result as input, whereas the FLAME implementations perform the Cholesky factorization internally and automatically.
 ^ LAPACK provides only an unblocked implementation of the triangular Sylvester equation solver. The lapack2flame compatibility interface maps invocations of ?trsyl() to the blocked implementation in libflame.
* Invocations of routines with the FLASH_ prefix call SuperMatrix by default. If SuperMatrix was not enabled at configuretime, or it was disabled at runtime with FLASH_Queue_disable(), then FLASH_ routines execute sequentially, though they will still use hierarchical storage.
 # GPU support must be enabled at configuretime and then invoked with FLASH_Queue_enable_gpu().
Related publications
Field G. Van Zee. libflame: The Complete Reference. www.lulu.com, 2010.
Field G. Van Zee, Ernie Chan, Robert van de Geijn, Enrique S. QuintanaOrti, and Gregorio QuintanaOrti. "Introducing: The libflame Library for Dense Matrix Computations." IEEE Computing in Science & Engineering. 11(6):5662, 2009.
Notation
The key insight that enables the FLAME methodology is a new, more stylized notation for expressing loopbased linear algebra algorithms. This notation closely resembles how algorithms are naturally illustrated with pictures.
Related publications
 Paolo Bientinesi, Enrique S. QuintanaOrtí, and Robert van de Geijn. "Representing Linear Algebra Algorithms in Code: The FLAME APIs," TOMS, 31(1):2759, March 2005.
 Robert A. van de Geijn and Enrique S. QuintanaOrtí. The Science of Programming Matrix Computations. www.lulu.com. 2008.
Derivation
The FLAME project promotes the systematic derivation of loopbased algorithms handinhand with the proof of their correctness. Key is the ability to identify the loopinvariant: the state to be maintained before and after each loop iteration, which then prescribes the loopguard, the initialization before the loop, how to progress through the operand(s), and the updates. To derive the shown algorithm for LU factorization one fills in the below "worksheet". In the greyshaded areas predicates appear that ensure the correctness of the algorithm.
Related publications
 John A. Gunnels, Fred G. Gustavson, Greg M. Henry, and Robert A. van de Geijn, "FLAME: Formal Linear Algebra Methods Environment," TOMS, 27(4):422455, December 2001.
 Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. QuintanaOrtí, and Robert van de Geijn, "The Science of Deriving Dense Linear Algebra Algorithms," TOMS, 31(1):126, March 2005.
 Robert A. van de Geijn and Enrique S. QuintanaOrtí. The Science of Programming Matrix Computations. www.lulu.com. 2008.
function [ A_out ] = LU_blk_var1( A, nb_alg ) [ ATL, ATR, ... ABL, ABR ] = FLA_Part_2x2( A, ... 0, 0, 'FLA_TL'); while ( size( ATL, 1 ) < size( A, 1 ) ) b = min( size( ABR, 1 ), nb_alg ); [ A00, A01, A02, ... A10, A11, A12, ... A20, A21, A22 ] = ... FLA_Repart_2x2_to_3x3( ATL, ATR, ... ABL, ABR, b, b, 'FLA_BR'); %% A01 = trilu( A00 ) \ A01; A10 = A10 / triu( A00 ); A11 = A11  A10 * A01; A11 = LU_unb_var1( A11 ); %% [ ATL, ATR, ... ABL, ABR ] = ... FLA_Cont_with_3x3_to_2x2( A00, A01, A02, ... A10, A11, A12, ... A20, A21, A22, ... 'FLA_TL'); end A_out = [ ATL, ATR ABL, ABR ]; return 
APIs
A number of APIs have been defined fo representing the algorithms in different languages. These include
FLAME LaTeX commads for typesetting algorithms and worksheets. 

FLAME Mscript (Matlab/Octave) API. 

FLAME/C 
FLAME API for the C programming language. 
FLASH 
Extension that allows matrices to be viewed (hierarchically) as submatrices 
FLARE 
The Formal Linear Algebra Recovery Environment (FLARE) adds algorithmic faulttolerance to some FLAME implementations. 
Runtime system for scheduling tasks with submatrices to threads 

C++ library for distributed memory architectures by Jack Poulson 
Related publications
 Paolo Bientinesi, Enrique S. QuintanaOrtí, and Robert van de Geijn. "Representing Linear Algebra Algorithms in Code: The FLAME APIs," TOMS, 31(1):2759, March 2005.
 Tze Meng Low, Kent Milfeld, Robert van de Geijn, and Field Van Zee. "Parallelizing FLAME Code with OpenMP Task Queues," FLAME Working Note #15. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR200450.
 John A. Gunnels, Daniel S. Katz, Enrique S. QuintanaOrtí, and Robert van de Geijn. "FaultTolerant HighPerformance MatrixMatrix Multiplication: Theory and Practice," The International Conference for Dependable Systems and Networks (DSN2001), pp. 4756, July 24, 2001.
 Robert A. van de Geijn and Enrique S. QuintanaOrtí. The Science of Programming Matrix Computations. www.lulu.com. 2008.
Downloads
Libraries and materials can be downloaded for those interested in trying the approach and resulting libraries.
Publications and Reference Materials
We maintain a number of documents and web pages.
Robert A. van de Geijn and Enrique S. QuintanaOrti. The Science of Programming Matrix Computations. www.lulu.com, 2008.
Field G. Van Zee. libflame: The Complete Reference. www.lulu.com, 2010.
 FLAME/C online reference materials
A listing of all routines that are part of the libflame library has been automatically compiled with Doxygen.
Information for Developers
libflame and related projects are maintained in a repository managed with subversion.
Contact us
Please email us at flame@cs.utexas.edu , field@cs.utexas.edu , or rvdg@cs.utexas.edu