Welcome to rocSOLVER's documentation!
Legal Disclaimer
The information contained herein is for informational purposes only, and is subject to change without notice. In addition, any stated support is planned and is also subject to change. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.
Contents
rocSOLVER's documentation consists of three main chapters. The User Guide is the starting point for new users of the library, and a basic reference for current users and users of LAPACK. Advanced users and developers who want to further understand or extend the rocSOLVER library may wish to refer to the Library Design Guide. For a list of the currently implemented routines, with a description of each routine's functionality and its input and output parameters, see the rocSOLVER API.
rocSOLVER User Guide
Introduction
Library overview
rocSOLVER is an implementation of LAPACK routines on top of AMD's open-source ROCm platform. rocSOLVER is implemented in the HIP programming language and optimized for AMD's latest discrete GPUs.
Currently implemented functionality
The rocSOLVER library is in the early stages of active development. New features are being continuously added, with new functionality documented at each release of the ROCm platform.
The following tables summarize the LAPACK functionality implemented for the different supported precisions in rocSOLVER's latest release. All LAPACK and LAPACK-like main functions include _batched and _strided_batched versions. For a complete description of the listed routines, please see the rocSOLVER API document.
LAPACK auxiliary functions
[Tables omitted: the function names were lost in extraction. Each table had the columns Function, single, double, single complex, and double complex, with an x marking each supported precision. See the rocSOLVER API document for the per-routine list.]
LAPACK main functions
[Tables omitted: the function names were lost in extraction. Each table had the columns Function, single, double, single complex, and double complex, with an x marking each supported precision. See the rocSOLVER API document for the per-routine list.]
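LAPACK-like functions
[Tables omitted: the function names were lost in extraction. Each table had the columns Function, single, double, single complex, and double complex, with an x marking each supported precision. See the rocSOLVER API document for the per-routine list.]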
Building and Installation
Prerequisites
rocSOLVER requires a ROCm-enabled platform. For more information, see the ROCm install guide.
rocSOLVER also requires a compatible version of rocBLAS installed on the system. For more information, see the rocBLAS install guide.
rocBLAS and rocSOLVER are both still under active development, and it is hard to define minimal compatibility versions. For now, a good rule of thumb is to always use rocSOLVER together with the matching rocBLAS version. For example, if you want to install rocSOLVER from the ROCm 3.3 release, then be sure that the ROCm 3.3 version of rocBLAS is also installed; if you are building the rocSOLVER branch tip, then you will need to build and install the rocBLAS branch tip as well.
Installing from prebuilt packages
If you have added the ROCm repositories to your Linux distribution, the latest release version of rocSOLVER can be installed using a package manager. On Ubuntu, for example, use the commands:
sudo apt-get update
sudo apt-get install rocsolver
Building & installing from source
The rocSOLVER source code is hosted on GitHub. Download the code and check out the desired branch using:
git clone -b <desired_branch_name> https://github.com/ROCmSoftwarePlatform/rocSOLVER.git
cd rocSOLVER
To build from source, some external dependencies such as CMake and Python are required. Additionally, if the library clients are to be built (by default they are not), then LAPACK and GoogleTest will also be required. (The library clients, rocsolver-test and rocsolver-bench, provide the infrastructure for testing and benchmarking rocSOLVER. For more details, see the clients section of this user's guide.)
Using the install.sh script
It is recommended that the provided install.sh script be used to build and install rocSOLVER. The command
./install.sh --help
gives detailed information on how to use this installation script.
Some common use cases are listed next:
./install.sh
This command builds rocSOLVER and puts the generated library files, such as headers and librocsolver.so, in the output directory rocSOLVER/build/release/rocsolver-install. Other output files from the configuration and building process can also be found in the rocSOLVER/build and rocSOLVER/build/release directories. It is assumed that all external library dependencies have been installed. It also assumes that the rocBLAS library is located at /opt/rocm/rocblas.
./install.sh -g
Use the -g flag to build in debug mode. In this case the generated library files will be located at rocSOLVER/build/debug/rocsolver-install. Other output files from the configuration and building process can also be found in the rocSOLVER/build and rocSOLVER/build/debug directories.
./install.sh --lib_dir /home/user/rocsolverlib --build_dir buildoutput
Use --lib_dir and --build_dir to change output directories. In this case, for example, the installer will put the headers and library files in /home/user/rocsolverlib, while the outputs of the configuration and building processes will be in rocSOLVER/buildoutput and rocSOLVER/buildoutput/release. The selected output directories must be local, otherwise the user may require sudo privileges. To install rocSOLVER system-wide, we recommend the use of the -i flag as shown below.
./install.sh --rocblas_dir /alternative/rocblas/location
Use --rocblas_dir to change where the build system will search for the rocBLAS library. In this case, for example, the installer will look for the rocBLAS library at /alternative/rocblas/location.
./install.sh -s
With the -s flag, the installer will generate a static library (librocsolver.a) instead.
./install.sh -d
With the -d flag, the installer will first install all the external dependencies required by the rocSOLVER library in /usr/local. This flag only needs to be used once. For subsequent invocations of install.sh it is not necessary to rebuild the dependencies.
./install.sh -c
With the -c flag, the installer will additionally build the library clients rocsolver-bench and rocsolver-test. The binaries will be located at rocSOLVER/build/release/clients/staging. It is assumed that all external dependencies for the clients have been installed.
./install.sh -dc
By combining the -c and -d flags, the installer will also install all the external dependencies required by the rocSOLVER clients. Again, the -d flag only needs to be used once.
./install.sh -i
With the -i flag, the installer will additionally generate a prebuilt rocSOLVER package and install it, using a suitable package manager, at the standard location: /opt/rocm/rocsolver. This is the preferred approach to install rocSOLVER on a system, as it will allow the library to be safely removed using the package manager.
./install.sh -p
With the -p flag, the installer will also generate the rocSOLVER package, but it will not be installed.
./install.sh -i --install_dir /package/install/path
When generating a package, use --install_dir to change the directory where it will be installed. In this case, for example, the rocSOLVER package will be installed at /package/install/path.
Manual building and installation
Manual installation of all the external dependencies is not an easy task. For more information on how to install each dependency, see the documentation site of that dependency.
Once all dependencies are installed (including ROCm and rocBLAS), rocSOLVER can be manually built using a combination of CMake and Make commands. Using CMake options can provide more flexibility in tailoring the building and installation process. Here we provide a list of examples of common use cases (see the CMake documentation for more information on CMake options).
mkdir -p build/release && cd build/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install ../..
make install
This is equivalent to ./install.sh.
mkdir -p buildoutput/release && cd buildoutput/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=/home/user/rocsolverlib ../..
make install
This is equivalent to ./install.sh --lib_dir /home/user/rocsolverlib --build_dir buildoutput.
mkdir -p build/release && cd build/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install -Drocblas_DIR=/alternative/rocblas/location ../..
make install
This is equivalent to ./install.sh --rocblas_dir /alternative/rocblas/location.
mkdir -p build/debug && cd build/debug
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install -DCMAKE_BUILD_TYPE=Debug ../..
make install
This is equivalent to ./install.sh -g.
mkdir -p build/release && cd build/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install -DBUILD_SHARED_LIBS=OFF ../..
make install
This is equivalent to ./install.sh -s.
mkdir -p build/release && cd build/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_BENCHMARKS=ON ../..
make install
This is equivalent to ./install.sh -c.
mkdir -p build/release && cd build/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install -DCPACK_SET_DESTDIR=OFF -DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm ../..
make install
make package
This is equivalent to ./install.sh -p.
mkdir -p build/release && cd build/release
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=rocsolver-install -DCPACK_SET_DESTDIR=OFF -DCPACK_PACKAGING_INSTALL_PREFIX=/package/install/path ../..
make install
make package
sudo dpkg -i rocsolver[-_]*.deb
On an Ubuntu system, for example, this would be equivalent to ./install.sh -i --install_dir /package/install/path.
Using rocSOLVER
Once installed, rocSOLVER can be used just like any other library with a C API. The header file will need to be included in the user code, and both the rocBLAS and rocSOLVER shared libraries will become link-time and run-time dependencies for the user application.
The examples that follow illustrate the basic use of the rocSOLVER API and the rocSOLVER batched API.
QR factorization of a single matrix
The following code snippet uses rocSOLVER to compute the QR factorization of a general m-by-n real matrix in double precision. For a full description of the rocSOLVER routine used, see the API documentation here: rocsolver_dgeqrf().
#include <hip/hip_runtime_api.h> // for hip functions
#include <rocsolver.h> // for all the rocsolver C interfaces and type declarations
#include <stdio.h> // for printf
#include <stdlib.h> // for malloc

// Example: Compute the QR Factorization of a matrix on the GPU

double *create_example_matrix(rocblas_int *M_out,
                              rocblas_int *N_out,
                              rocblas_int *lda_out) {
  // a *very* small example input; not a very efficient use of the API
  const double A[3][3] = { { 12, 51, 4},
                           { 6, 167, 68},
                           { 4, 24, 41} };
  const rocblas_int M = 3;
  const rocblas_int N = 3;
  const rocblas_int lda = 3;
  *M_out = M;
  *N_out = N;
  *lda_out = lda;
  // note: rocsolver matrices must be stored in column major format,
  // i.e. entry (i,j) should be accessed by hA[i + j*lda]
  double *hA = (double*)malloc(sizeof(double)*lda*N);
  for (size_t i = 0; i < M; ++i) {
    for (size_t j = 0; j < N; ++j) {
      // copy A (2D array) into hA (1D array, column-major)
      hA[i + j*lda] = A[i][j];
    }
  }
  return hA;
}

// We use rocsolver_dgeqrf to factor a real M-by-N matrix, A.
// See https://rocsolver.readthedocs.io/en/latest/api_lapackfunc.html#c.rocsolver_dgeqrf
// and https://www.netlib.org/lapack/explore-html/df/dc5/group__variants_g_ecomputational_ga3766ea903391b5cf9008132f7440ec7b.html
int main() {
  rocblas_int M; // rows
  rocblas_int N; // cols
  rocblas_int lda; // leading dimension
  double *hA = create_example_matrix(&M, &N, &lda); // input matrix on CPU

  // let's print the input matrix, just to see it
  printf("A = [\n");
  for (size_t i = 0; i < M; ++i) {
    printf(" ");
    for (size_t j = 0; j < N; ++j) {
      printf("% .3f ", hA[i + j*lda]);
    }
    printf(";\n");
  }
  printf("]\n");

  // initialization
  rocblas_handle handle;
  rocblas_create_handle(&handle);

  // Some rocsolver functions may trigger rocblas to load its GEMM kernels.
  // You can preload the kernels by explicitly invoking rocblas_initialize
  // (e.g., to exclude one-time initialization overhead from benchmarking).

  // preload rocBLAS GEMM kernels (optional)
  // rocblas_initialize();

  // calculate the sizes of our arrays
  size_t size_A = lda * (size_t)N; // count of elements in matrix A
  size_t size_piv = (M < N) ? M : N; // count of Householder scalars

  // allocate memory on GPU
  double *dA, *dIpiv;
  hipMalloc((void**)&dA, sizeof(double)*size_A);
  hipMalloc((void**)&dIpiv, sizeof(double)*size_piv);

  // copy data to GPU
  hipMemcpy(dA, hA, sizeof(double)*size_A, hipMemcpyHostToDevice);

  // compute the QR factorization on the GPU
  rocsolver_dgeqrf(handle, M, N, dA, lda, dIpiv);

  // copy the results back to CPU
  double *hIpiv = (double*)malloc(sizeof(double)*size_piv); // Householder scalars on CPU
  hipMemcpy(hA, dA, sizeof(double)*size_A, hipMemcpyDeviceToHost);
  hipMemcpy(hIpiv, dIpiv, sizeof(double)*size_piv, hipMemcpyDeviceToHost);

  // the results are now in hA and hIpiv
  // we can print some of the results if we want to see them
  printf("R = [\n");
  for (size_t i = 0; i < M; ++i) {
    printf(" ");
    for (size_t j = 0; j < N; ++j) {
      printf("% .3f ", (i <= j) ? hA[i + j*lda] : 0);
    }
    printf(";\n");
  }
  printf("]\n");

  // clean up
  free(hIpiv);
  hipFree(dA);
  hipFree(dIpiv);
  free(hA);
  rocblas_destroy_handle(handle);
}
The exact command used to compile the example above may vary depending on the system environment, but here is a typical example:
/opt/rocm/bin/hipcc -I/opt/rocm/include -c example.c
/opt/rocm/bin/hipcc -o example -L/opt/rocm/lib -lrocsolver -lrocblas example.o
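The column-major convention noted in the example (entry (i,j) is stored at hA[i + j*lda]) can be tried on the host without a GPU. The following is a minimal sketch in plain C; the helper names col_major_index and to_column_major are illustrative only and are not part of rocSOLVER or rocBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of entry (i, j) in a column-major array with leading dimension lda. */
static size_t col_major_index(size_t i, size_t j, size_t lda) {
    return i + j * lda;
}

/* Copy a rows-by-cols row-major array `src` into the column-major array `dst`.
   `dst` must hold at least lda * cols elements, with lda >= rows.
   (Hypothetical helper, written for illustration.) */
static void to_column_major(const double *src, size_t rows, size_t cols,
                            size_t lda, double *dst) {
    assert(lda >= rows);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[col_major_index(i, j, lda)] = src[i * cols + j];
}
```

With a leading dimension larger than the number of rows, the extra entries at the end of each column are simply left untouched, which is why the allocation size depends on lda rather than on the row count.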
QR factorization of a batch of matrices
One of the advantages of using GPUs is the ability to execute in parallel many operations of the same type but on different data sets. Based on this idea, rocSOLVER and rocBLAS provide a batch version of most of their routines. These batch versions allow the user to execute the same operation on a set of different matrices and/or vectors with a single library call. For more details on the approach to batch functionality followed in rocSOLVER, see Batched rocSOLVER.
Strided_batched version
The following code snippet uses rocSOLVER to compute the QR factorization of a series of general m-by-n real matrices in double precision. The matrices must be stored in contiguous memory locations on the GPU, and are accessed by a pointer to the first matrix and a stride value that gives the separation between one matrix and the next. For a full description of the rocSOLVER routine used, see the API documentation here: rocsolver_dgeqrf_strided_batched().
#include <hip/hip_runtime_api.h> // for hip functions
#include <rocsolver.h> // for all the rocsolver C interfaces and type declarations
#include <stdio.h> // for printf
#include <stdlib.h> // for malloc

// Example: Compute the QR Factorizations of an array of matrices on the GPU

double *create_example_matrices(rocblas_int *M_out,
                                rocblas_int *N_out,
                                rocblas_int *lda_out,
                                rocblas_stride *strideA_out,
                                rocblas_int *batch_count_out) {
  const double A[2][3][3] = {
    // First input matrix
    { { 12, 51, 4},
      { 6, 167, 68},
      { 4, 24, 41} },
    // Second input matrix
    { { 3, 12, 11},
      { 4, 46, 2},
      { 0, 5, 15} } };
  const rocblas_int M = 3;
  const rocblas_int N = 3;
  const rocblas_int lda = 3;
  const rocblas_stride strideA = lda * N;
  const rocblas_int batch_count = 2;
  *M_out = M;
  *N_out = N;
  *lda_out = lda;
  *strideA_out = strideA;
  *batch_count_out = batch_count;
  // allocate space for input matrix data on CPU
  double *hA = (double*)malloc(sizeof(double)*strideA*batch_count);
  // copy A (3D array) into hA (1D array, column-major)
  for (size_t b = 0; b < batch_count; ++b)
    for (size_t i = 0; i < M; ++i)
      for (size_t j = 0; j < N; ++j)
        hA[i + j*lda + b*strideA] = A[b][i][j];
  return hA;
}

// Use rocsolver_dgeqrf_strided_batched to factor an array of real M-by-N matrices.
int main() {
  rocblas_int M; // rows
  rocblas_int N; // cols
  rocblas_int lda; // leading dimension
  rocblas_stride strideA; // stride from start of one matrix to the next
  rocblas_int batch_count; // number of matrices
  double *hA = create_example_matrices(&M, &N, &lda, &strideA, &batch_count);

  // print the input matrices
  for (size_t b = 0; b < batch_count; ++b) {
    printf("A[%zu] = [\n", b);
    for (size_t i = 0; i < M; ++i) {
      printf(" ");
      for (size_t j = 0; j < N; ++j) {
        printf("% 4.f ", hA[i + j*lda + strideA*b]);
      }
      printf(";\n");
    }
    printf("]\n");
  }

  // initialization
  rocblas_handle handle;
  rocblas_create_handle(&handle);

  // preload rocBLAS GEMM kernels (optional)
  // rocblas_initialize();

  // calculate the sizes of our arrays
  size_t size_A = strideA * (size_t)batch_count; // elements in array for matrices
  rocblas_stride strideP = (M < N) ? M : N; // stride of Householder scalar sets
  size_t size_piv = strideP * (size_t)batch_count; // elements in array for Householder scalars

  // allocate memory on GPU
  double *dA, *dIpiv;
  hipMalloc((void**)&dA, sizeof(double)*size_A);
  hipMalloc((void**)&dIpiv, sizeof(double)*size_piv);

  // copy data to GPU
  hipMemcpy(dA, hA, sizeof(double)*size_A, hipMemcpyHostToDevice);

  // compute the QR factorizations on the GPU
  rocsolver_dgeqrf_strided_batched(handle, M, N, dA, lda, strideA, dIpiv, strideP, batch_count);

  // copy the results back to CPU
  double *hIpiv = (double*)malloc(sizeof(double)*size_piv); // Householder scalars on CPU
  hipMemcpy(hA, dA, sizeof(double)*size_A, hipMemcpyDeviceToHost);
  hipMemcpy(hIpiv, dIpiv, sizeof(double)*size_piv, hipMemcpyDeviceToHost);

  // the results are now in hA and hIpiv
  // print some of the results
  for (size_t b = 0; b < batch_count; ++b) {
    printf("R[%zu] = [\n", b);
    for (size_t i = 0; i < M; ++i) {
      printf(" ");
      for (size_t j = 0; j < N; ++j) {
        printf("% 4.f ", (i <= j) ? hA[i + j*lda + strideA*b] : 0);
      }
      printf(";\n");
    }
    printf("]\n");
  }

  // clean up
  free(hIpiv);
  hipFree(dA);
  hipFree(dIpiv);
  free(hA);
  rocblas_destroy_handle(handle);
}
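The addressing used by the strided_batched layout in the example can be isolated into a few small host-side helpers. This is only a sketch: the function names are hypothetical, and the check that the stride is at least lda*n reflects the non-overlapping layout used in the example, not a requirement stated in this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of entry (i, j) of matrix b in a strided_batched, column-major array. */
static size_t strided_index(size_t b, size_t i, size_t j,
                            size_t lda, size_t stride) {
    return b * stride + i + j * lda;
}

/* Number of elements allocated for batch_count matrices with n columns each,
   matching size_A = strideA * batch_count in the example. */
static size_t strided_array_size(size_t n, size_t lda, size_t stride,
                                 size_t batch_count) {
    assert(stride >= lda * n); /* assumption: matrices do not overlap */
    return stride * batch_count;
}

/* Number of Householder scalars per m-by-n matrix: min(m, n). */
static size_t householder_count(size_t m, size_t n) {
    return m < n ? m : n;
}
```

For the 3-by-3 matrices of the example (lda = 3, strideA = 9), entry (2,1) of the second matrix sits at offset 9 + 2 + 3 = 14.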
Batched version
The following code snippet uses rocSOLVER to compute the QR factorization of a series of general m-by-n real matrices in double precision. The matrices do not need to be in contiguous memory locations on the GPU, and will be accessed by an array of pointers. For a full description of the rocSOLVER routine used, see the API documentation here: rocsolver_dgeqrf_batched().
#include <hip/hip_runtime_api.h> // for hip functions
#include <rocsolver.h> // for all the rocsolver C interfaces and type declarations
#include <stdio.h> // for printf
#include <stdlib.h> // for malloc

// Example: Compute the QR Factorizations of a batch of matrices on the GPU

double **create_example_matrices(rocblas_int *M_out,
                                 rocblas_int *N_out,
                                 rocblas_int *lda_out,
                                 rocblas_int *batch_count_out) {
  // a small example input
  const double A[2][3][3] = {
    // First input matrix
    { { 12, 51, 4},
      { 6, 167, 68},
      { 4, 24, 41} },
    // Second input matrix
    { { 3, 12, 11},
      { 4, 46, 2},
      { 0, 5, 15} } };
  const rocblas_int M = 3;
  const rocblas_int N = 3;
  const rocblas_int lda = 3;
  const rocblas_int batch_count = 2;
  *M_out = M;
  *N_out = N;
  *lda_out = lda;
  *batch_count_out = batch_count;
  // allocate space for input matrix data on CPU
  double **hA = (double**)malloc(sizeof(double*)*batch_count);
  hA[0] = (double*)malloc(sizeof(double)*lda*N);
  hA[1] = (double*)malloc(sizeof(double)*lda*N);
  // copy A (3D array) into hA (array of 1D arrays, column-major)
  for (size_t b = 0; b < batch_count; ++b)
    for (size_t i = 0; i < M; ++i)
      for (size_t j = 0; j < N; ++j)
        hA[b][i + j*lda] = A[b][i][j];
  return hA;
}

// Use rocsolver_dgeqrf_batched to factor a batch of real M-by-N matrices.
int main() {
  rocblas_int M; // rows
  rocblas_int N; // cols
  rocblas_int lda; // leading dimension
  rocblas_int batch_count; // number of matrices
  double **hA = create_example_matrices(&M, &N, &lda, &batch_count);

  // print the input matrices
  for (size_t b = 0; b < batch_count; ++b) {
    printf("A[%zu] = [\n", b);
    for (size_t i = 0; i < M; ++i) {
      printf(" ");
      for (size_t j = 0; j < N; ++j) {
        printf("% 4.f ", hA[b][i + j*lda]);
      }
      printf(";\n");
    }
    printf("]\n");
  }

  // initialization
  rocblas_handle handle;
  rocblas_create_handle(&handle);

  // preload rocBLAS GEMM kernels (optional)
  // rocblas_initialize();

  // calculate the sizes of the arrays
  size_t size_A = lda * (size_t)N; // count of elements in each matrix A
  rocblas_stride strideP = (M < N) ? M : N; // stride of Householder scalar sets
  size_t size_piv = strideP * (size_t)batch_count; // elements in array for Householder scalars

  // allocate memory on the CPU for an array of pointers,
  // then allocate memory for each matrix on the GPU.
  double **A = (double**)malloc(sizeof(double*)*batch_count);
  for (rocblas_int b = 0; b < batch_count; ++b)
    hipMalloc((void**)&A[b], sizeof(double)*size_A);

  // allocate memory on GPU for the array of pointers and Householder scalars
  double **dA, *dIpiv;
  hipMalloc((void**)&dA, sizeof(double*)*batch_count);
  hipMalloc((void**)&dIpiv, sizeof(double)*size_piv);

  // copy each matrix to the GPU
  for (rocblas_int b = 0; b < batch_count; ++b)
    hipMemcpy(A[b], hA[b], sizeof(double)*size_A, hipMemcpyHostToDevice);

  // copy the array of pointers to the GPU
  hipMemcpy(dA, A, sizeof(double*)*batch_count, hipMemcpyHostToDevice);

  // compute the QR factorizations on the GPU
  rocsolver_dgeqrf_batched(handle, M, N, dA, lda, dIpiv, strideP, batch_count);

  // copy the results back to CPU
  double *hIpiv = (double*)malloc(sizeof(double)*size_piv); // Householder scalars on CPU
  hipMemcpy(hIpiv, dIpiv, sizeof(double)*size_piv, hipMemcpyDeviceToHost);
  for (rocblas_int b = 0; b < batch_count; ++b)
    hipMemcpy(hA[b], A[b], sizeof(double)*size_A, hipMemcpyDeviceToHost);

  // the results are now in hA and hIpiv
  // print some of the results
  for (size_t b = 0; b < batch_count; ++b) {
    printf("R[%zu] = [\n", b);
    for (size_t i = 0; i < M; ++i) {
      printf(" ");
      for (size_t j = 0; j < N; ++j) {
        printf("% 4.f ", (i <= j) ? hA[b][i + j*lda] : 0);
      }
      printf(";\n");
    }
    printf("]\n");
  }

  // clean up
  free(hIpiv);
  for (rocblas_int b = 0; b < batch_count; ++b)
    free(hA[b]);
  free(hA);
  for (rocblas_int b = 0; b < batch_count; ++b)
    hipFree(A[b]);
  free(A);
  hipFree(dA);
  hipFree(dIpiv);
  rocblas_destroy_handle(handle);
}
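The two batch layouts are closely related: matrices stored contiguously with a fixed stride can also be described by an array of pointers, one per matrix. The host-only sketch below shows that relationship; make_pointer_array is a hypothetical helper, not a rocSOLVER function. For an actual batched call, the pointers would have to refer to device memory and the pointer array itself would have to be copied to the GPU, as done in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Build an array of batch_count pointers into one contiguous buffer, so that
   ptrs[b] points at the first element of matrix b.
   Returns NULL if the allocation fails. The caller frees the returned array;
   the matrices themselves still belong to `base`. */
static double **make_pointer_array(double *base, size_t stride,
                                   size_t batch_count) {
    assert(base != NULL);
    double **ptrs = (double **)malloc(sizeof(double *) * batch_count);
    if (ptrs == NULL)
        return NULL;
    for (size_t b = 0; b < batch_count; ++b)
        ptrs[b] = base + b * stride;
    return ptrs;
}
```

Writing through ptrs[b][i + j*lda] then touches the same element as base[b*stride + i + j*lda].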
Memory Model
Almost all LAPACK and rocSOLVER routines require workspace memory in order to compute their results. In contrast to LAPACK, however, pointers to the workspace are not explicitly passed to rocSOLVER functions as arguments; instead, they are managed behind the scenes using a configurable device memory model.
rocSOLVER makes use of and is integrated with rocBLAS's memory model. Workspace memory, and the scheme used to manage it, is tracked on a per-rocblas_handle basis, and the same functionality that is used to manipulate rocBLAS's workspace memory can and will also affect rocSOLVER's workspace memory.
There are four schemes for device memory management:
Automatic (managed by rocSOLVER/rocBLAS): The default scheme. Device memory persists between function calls and will be automatically reallocated if more memory is required by a function.
User-managed (preallocated): The desired workspace size is specified by the user as an environment variable before handle creation, and cannot be altered after the handle is created.
User-managed (manual): The desired workspace size can be manipulated using rocBLAS helper functions.
User-owned: The user manually allocates device memory and calls a rocBLAS helper function to use it as the workspace.
Automatic workspace
By default, rocSOLVER will automatically allocate device memory to be used as internal workspace using the rocBLAS memory model, and will increase the amount of allocated memory as needed by rocSOLVER functions. If this scheme is in use, the function rocblas_is_managing_device_memory will return true. To re-enable this scheme if it is not in use, a nullptr or zero size can be passed to the helper functions rocblas_set_device_memory_size or rocblas_set_workspace. For more details on these rocBLAS APIs, see the rocBLAS documentation.
This scheme has the disadvantage that automatic reallocation is synchronizing, and the user cannot control when this synchronization happens.
User-managed workspace
Alternatively, the user can manually specify an amount of memory to be allocated by rocSOLVER/rocBLAS. This allows the user to control when and if memory is reallocated and synchronization occurs. However, function calls will fail if there is not enough allocated memory.
Minimum required size
In order to choose an appropriate amount of memory to allocate, rocSOLVER can be queried to determine the minimum amount of memory required for functions to complete. The query can be started by calling rocblas_start_device_memory_size_query, followed by calls to the desired functions with appropriate problem sizes (a null pointer can be passed to the device pointer arguments). A final call to rocblas_stop_device_memory_size_query will return the minimum required size.
For example, the following code snippet will return the memory size required to solve a 1024-by-1024 linear system with 1 right-hand side (involving calls to getrf and getrs):
size_t memory_size;
rocblas_start_device_memory_size_query(handle);
rocsolver_dgetrf(handle, 1024, 1024, nullptr, lda, nullptr, nullptr);
rocsolver_dgetrs(handle, rocblas_operation_none, 1024, 1, nullptr, lda, nullptr, nullptr, ldb);
rocblas_stop_device_memory_size_query(handle, &memory_size);
For more details on the rocBLAS APIs, see the rocBLAS documentation.
Using an environment variable
The desired workspace size can be provided before creation of the rocblas_handle by setting the value of the environment variable ROCBLAS_DEVICE_MEMORY_SIZE. If this variable is unset or its value is 0, then it will be ignored. Note that a workspace size set in this way cannot be changed once the handle has been created.
Using helper functions
Another way to set the desired workspace size is by using the helper function rocblas_set_device_memory_size. This function is called after handle creation and can be called multiple times; however, it is recommended to first synchronize the handle stream if a rocSOLVER or rocBLAS routine has already been called. For example:
hipStream_t stream;
rocblas_get_stream(handle, &stream);
hipStreamSynchronize(stream);
rocblas_set_device_memory_size(handle, memory_size);
For more details on the rocBLAS APIs, see the rocBLAS documentation.
User-owned workspace
Finally, the user may opt to manage the workspace memory manually using HIP. By calling the function rocblas_set_workspace, the user may pass a pointer to device memory to rocBLAS that will be used as the workspace for rocSOLVER. For example:
void* device_memory;
hipMalloc(&device_memory, memory_size);
rocblas_set_workspace(handle, device_memory, memory_size);
// perform computations here
rocblas_set_workspace(handle, nullptr, 0);
hipFree(device_memory);
For more details on the rocBLAS APIs, see the rocBLAS documentation.
Multi-level Logging
Similar to rocBLAS logging, rocSOLVER provides logging facilities that can be used to output information on rocSOLVER function calls. Three modes of logging are supported: trace logging, bench logging, and profile logging.
Note that performance will degrade when logging is enabled.
Logging modes
Trace logging
Trace logging outputs a line each time an internal rocSOLVER or rocBLAS routine is called, giving the function name and the values of its arguments (excluding stride arguments). The maximum depth of nested function calls that can appear in the log is specified by the user.
Bench logging
Bench logging outputs a line each time a public rocSOLVER routine is called (excluding auxiliary library functions). The line can be used with the executable rocsolver-bench to call the function with the same size arguments.
Profile logging
Profile logging, upon calling rocsolver_log_write_profile or rocsolver_log_flush_profile, or terminating the logging session using rocsolver_log_end, will output statistics on each called internal rocSOLVER and rocBLAS routine. These include the number of times each function was called, the total program runtime occupied by the function, and the total program runtime occupied by its nested function calls. As with trace logging, the maximum depth of nested output is specified by the user. Note that, when profile logging is enabled, the stream will be synchronized after every internal function call.
Initialization and setupÂ¶
In order to use rocSOLVERâ€™s logging facilities, the user must first call rocsolver_log_begin
in order to allocate the internal data structures used for logging and begin the logging session.
The user may then specify a layer mode and max level depth, either programmatically using
rocsolver_log_set_layer_mode
, rocsolver_log_set_max_levels
, or by setting the corresponding
environment variables.
The layer mode specifies which logging type(s) are activated, and can be rocblas_layer_mode_none
,
rocblas_layer_mode_log_trace
, rocblas_layer_mode_log_bench
, rocblas_layer_mode_log_profile
,
or a bitwise combination of these. The max level depth specifies the default maximum depth of nested
function calls that may appear in the trace and profile logging.
Both the default layer mode and max level depth can be specified using environment variables.
ROCSOLVER_LAYER
ROCSOLVER_LEVELS
If these variables are not set, the layer mode will default to rocblas_layer_mode_none
and the
max level depth will default to 1. These defaults can be restored by calling the function
rocsolver_log_restore_defaults
.
ROCSOLVER_LAYER is a bitwise OR of zero or more bit masks as follows:
If ROCSOLVER_LAYER is not set, then there is no logging
If (ROCSOLVER_LAYER & 1) != 0, then there is trace logging
If (ROCSOLVER_LAYER & 2) != 0, then there is bench logging
If (ROCSOLVER_LAYER & 4) != 0, then there is profile logging
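The bit masks above can be illustrated with a small decoder. This is only a sketch: LayerFlags, decode_layer and decode_layer_string are hypothetical names that are not part of the rocSOLVER API, and the parsing of the variable's text with std::strtoul is an assumption about how such a value might be read.

```cpp
#include <cstdlib>

// Hypothetical helper type; not part of the rocSOLVER API.
struct LayerFlags
{
    bool trace;
    bool bench;
    bool profile;
};

// Decode a ROCSOLVER_LAYER value using the bit masks 1, 2 and 4.
LayerFlags decode_layer(unsigned long layer)
{
    LayerFlags flags;
    flags.trace   = (layer & 1u) != 0;
    flags.bench   = (layer & 2u) != 0;
    flags.profile = (layer & 4u) != 0;
    return flags;
}

// Decode the textual value of the variable; a null pointer means "not set",
// which corresponds to no logging.
LayerFlags decode_layer_string(const char* value)
{
    if(value == nullptr)
        return decode_layer(0);
    return decode_layer(std::strtoul(value, nullptr, 0));
}
```

For instance, decode_layer_string(std::getenv("ROCSOLVER_LAYER")) with the variable set to 5 would report trace and profile logging, but not bench logging.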
Three environment variables can set the full path name for a log file:
ROCSOLVER_LOG_TRACE_PATH sets the full path name for trace logging
ROCSOLVER_LOG_BENCH_PATH sets the full path name for bench logging
ROCSOLVER_LOG_PROFILE_PATH sets the full path name for profile logging
If one of these environment variables is not set, then ROCSOLVER_LOG_PATH sets the full path
for the corresponding logging, if it is set. If neither the above nor ROCSOLVER_LOG_PATH is
set, then the corresponding logging output is streamed to standard error.
The results of profile logging, if enabled, can be printed using rocsolver_log_write_profile
or rocsolver_log_flush_profile. Once logging facilities are no longer required (e.g. at
program termination), the user must call rocsolver_log_end to free the data structures used
for logging. If the profile log has not been flushed beforehand, then rocsolver_log_end
will also output the results of profile logging.
For more details on the mentioned logging functions, see the Logging functions section of the rocSOLVER API document.
Example code
Code examples that illustrate the use of rocSOLVER's multilevel logging facilities can be found
in this section or in the example_logging.cpp file in the clients/samples directory.
The following example shows some basic use: enabling trace and profile logging, and setting the max depth for their output.
// initialization
rocblas_handle handle;
rocblas_create_handle(&handle);
rocsolver_log_begin();
// begin trace logging and profile logging (max depth = 5)
rocsolver_log_set_layer_mode(rocblas_layer_mode_log_trace | rocblas_layer_mode_log_profile);
rocsolver_log_set_max_levels(5);
// call rocSOLVER functions...
// terminate logging and print profile results
rocsolver_log_flush_profile();
rocsolver_log_end();
rocblas_destroy_handle(handle);
Alternatively, users may control which logging modes are enabled by using environment variables.
The benefit of this approach is that the program does not need to be recompiled if a different
logging environment is desired. This requires that rocsolver_log_set_layer_mode and
rocsolver_log_set_max_levels are not called in the code, for example:
// initialization
rocblas_handle handle;
rocblas_create_handle(&handle);
rocsolver_log_begin();
// call rocSOLVER functions...
// termination
rocsolver_log_end();
rocblas_destroy_handle(handle);
The user may then set the desired logging modes and max depth on the command line as follows:
export ROCSOLVER_LAYER=5
export ROCSOLVER_LEVELS=5
Kernel logging
Kernel launches from within rocSOLVER can be added to the trace and profile logs using an
additional layer mode flag. The flag rocblas_layer_mode_ex_log_kernel can be combined with
rocblas_layer_mode flags and passed to rocsolver_log_set_layer_mode in order to enable
kernel logging. Alternatively, the environment variable ROCSOLVER_LAYER can be set such that
(ROCSOLVER_LAYER & 16) != 0. Kernel calls only appear in a log that is itself enabled:
If (ROCSOLVER_LAYER & 17) == 17, then kernel calls will be added to the trace log
If (ROCSOLVER_LAYER & 20) == 20, then kernel calls will be added to the profile log
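As a sketch of how these conditions combine, the following helpers read them as requiring both the kernel bit (16) and the bit of the corresponding log (1 for trace, 4 for profile). The constant and function names are hypothetical and are not part of the rocSOLVER API.

```cpp
// Bit values taken from the ROCSOLVER_LAYER conditions in the text.
const unsigned layer_trace_bit   = 1;
const unsigned layer_profile_bit = 4;
const unsigned layer_kernel_bit  = 16;

// True when both the kernel bit and the trace bit are set (mask 17).
bool kernels_in_trace_log(unsigned layer)
{
    const unsigned mask = layer_trace_bit | layer_kernel_bit;
    return (layer & mask) == mask;
}

// True when both the kernel bit and the profile bit are set (mask 20).
bool kernels_in_profile_log(unsigned layer)
{
    const unsigned mask = layer_profile_bit | layer_kernel_bit;
    return (layer & mask) == mask;
}
```

With ROCSOLVER_LAYER=21, for example, both helpers return true, since the trace, profile and kernel bits are all set.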
Multiple host threads
The logging facilities for rocSOLVER assume that each rocblas_handle is associated with at
most one host thread. When using rocSOLVER's multilevel logging setup, it is recommended to
create a separate rocblas_handle for each host thread.
The rocsolver_log_* functions are not thread-safe. Calling a log function while any rocSOLVER routine is executing on another host thread will result in undefined behaviour. Once enabled, logging data collection is thread-safe. However, note that trace logging will likely result in garbled trace trees if rocSOLVER routines are called from multiple host threads.
Clients
rocSOLVER has an infrastructure for testing and benchmarking similar to that of rocBLAS, as well as sample code illustrating basic use of the library.
Client binaries are not built by default; they require specific flags to be passed to the install script
or CMake system. If the -c flag is passed to install.sh, the client binaries will be located in the
directory <rocsolverDIR>/build/release/clients/staging. If both the -c and -g flags are passed to
install.sh, the client binaries will be located in <rocsolverDIR>/build/debug/clients/staging.
If the -DBUILD_CLIENTS_TESTS=ON flag, the -DBUILD_CLIENTS_BENCHMARKS=ON flag, and/or the
-DBUILD_CLIENTS_SAMPLES=ON flag are passed to the CMake system, the relevant client binaries will normally
be located in the directory <rocsolverDIR>/build/clients/staging. See the Building and installation
section of the User Guide for more information on building the library and its clients.
Testing rocSOLVER
The rocsolver-test client executes a suite of Google tests (gtest) that
verifies the correct functioning of the library. The results computed by rocSOLVER, given random input data,
are normally compared with the results computed by NETLib LAPACK on the CPU, or tested implicitly
in the context of the solved problem. It will be built if the -c flag is passed to install.sh
or if the -DBUILD_CLIENTS_TESTS=ON flag is passed to the CMake system.
Calling the rocSOLVER gtest client with the --help flag
./rocsolver-test --help
returns information on different flags that control the behavior of the gtests.
One of the most useful flags is the --gtest_filter flag, which allows the user to choose which tests to run
from the suite. For example, the following command will run the tests for only geqrf:
./rocsolver-test --gtest_filter=*GEQRF*
Note that rocSOLVER's tests are divided into two separate groupings: checkin_lapack and daily_lapack.
Tests in the checkin_lapack group are small and quick to execute, and verify basic correctness and error
handling. Tests in the daily_lapack group are large and slower to execute, and verify correctness of
large problem sizes. Users may run one test group or the other using --gtest_filter, e.g.
./rocsolver-test --gtest_filter=*checkin_lapack*
./rocsolver-test --gtest_filter=*daily_lapack*
Benchmarking rocSOLVER
The rocsolver-bench client runs any rocSOLVER function with random data of the specified dimensions. It compares basic
performance information (i.e. execution times) between NETLib LAPACK on the
CPU and rocSOLVER on the GPU. It will be built if the -c flag is passed to install.sh or if the
-DBUILD_CLIENTS_BENCHMARKS=ON flag is passed to the CMake system.
Calling the rocSOLVER bench client with the --help flag
./rocsolver-bench --help
returns information on the different parameters and flags that control the behavior of the benchmark client.
Two of the most important flags for rocsolver-bench are the -f and -r flags. The -f (or
--function) flag allows the user to select which function to benchmark. The -r (or --precision)
flag allows the user to select the data precision for the function, and can be one of s (single precision),
d (double precision), c (single precision complex), or z (double precision complex).
The non-pointer arguments for a function can be passed to rocsolver-bench by using the argument name as
a flag (see the rocSOLVER API document for information on the function arguments and
their names). For example, the function rocsolver_dgeqrf_strided_batched has the following method signature:
rocblas_status
rocsolver_dgeqrf_strided_batched(rocblas_handle handle,
const rocblas_int m,
const rocblas_int n,
double* A,
const rocblas_int lda,
const rocblas_stride strideA,
double* ipiv,
const rocblas_stride strideP,
const rocblas_int batch_count);
A call to rocsolver-bench that runs this function on a batch of one hundred 30x30 matrices could look like this:
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 -n 30 --lda 30 --strideA 900 --strideP 30 --batch_count 100
Generally, rocsolver-bench will attempt to provide or calculate a suitable default value for these arguments,
though at least one size argument must always be specified by the user. Functions that take m and n as arguments
typically require m to be provided, and a square matrix will be assumed. For example, the previous command is
equivalent to:
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100
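To make the equivalence of the two commands concrete, the sketch below fills in the omitted size arguments for geqrf_strided_batched. The defaulting rules used here (n = m, lda = m, strideA = lda*n, strideP = min(m, n)) are assumptions chosen to reproduce the values in the example above; they are not taken from the benchmark client's source, and GeqrfSizes and fill_geqrf_defaults are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the size arguments of geqrf_strided_batched.
struct GeqrfSizes
{
    long m, n, lda, strideA, strideP;
};

// Fill in unspecified sizes (passed as 0). The rules below are assumptions
// that match the 30x30 example in the text.
GeqrfSizes fill_geqrf_defaults(long m, long n = 0, long lda = 0, long strideA = 0, long strideP = 0)
{
    assert(m > 0); // at least one size argument must be given
    GeqrfSizes s;
    s.m       = m;
    s.n       = (n > 0) ? n : m;                             // square matrix assumed
    s.lda     = (lda > 0) ? lda : s.m;                       // tightly packed columns
    s.strideA = (strideA > 0) ? strideA : s.lda * s.n;       // matrices stored back to back
    s.strideP = (strideP > 0) ? strideP : std::min(s.m, s.n); // one scalar per reflector
    return s;
}
```

Calling fill_geqrf_defaults(30) gives n = 30, lda = 30, strideA = 900 and strideP = 30, the values written out explicitly in the longer command.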
Other useful benchmarking options include the --perf flag, which will disable the LAPACK computation and only time and print the rocSOLVER performance result; the -i (or --iters) flag, which indicates the number of times to run the
GPU timing loop (the performance result would be the average of all the runs); and the --profile
flag, which enables profile logging with the given maximum depth of nested output.
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --perf 1
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --iters 20
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --profile 5
In addition to the benchmarking functionality, the rocSOLVER bench client can also provide the norm of the error in the
computations when the -v (or --verify) flag is used, and return the amount of device memory required as workspace for the given function if the
--mem_query flag is passed.
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --verify 1
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --mem_query 1
rocSOLVER sample code
rocSOLVER's sample programs provide illustrative examples of how to work with the rocSOLVER library. They will be
built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_SAMPLES=ON flag is passed to the
CMake system.
Currently, sample code exists to demonstrate the following:
Basic use of rocSOLVER in C, C++, and Fortran, using the example of rocsolver_geqrf;
Use of batched and strided_batched functions, using rocsolver_geqrf_batched and rocsolver_geqrf_strided_batched as examples;
Use of rocSOLVER with the Heterogeneous Memory Management (HMM) model; and
Use of rocSOLVER's multilevel logging functionality.
rocSOLVER Library Design Guide
Tuning rocSOLVER Performance
Some compile-time parameters in rocSOLVER can be modified to tune the performance of the library functions in a given context (e.g., for a particular matrix size or shape). A description of these tunable constants is presented in this section.
To facilitate the description, the constants are grouped by the family of functions they affect. Some aspects of the involved algorithms are also depicted here for the sake of clarity; however, this section is not intended to be a review of the well-known methods for different matrix computations. These constants are specific to the rocSOLVER implementation and are only described within that context.
All described constants can be found in library/src/include/ideal_sizes.hpp.
These are not runtime arguments for the associated API functions. The library must be
rebuilt from source for any change to take effect.
Warning
The effect of changing a tunable constant on the performance of the library is difficult to predict, and such analysis is beyond the scope of this document. Advanced users and developers tuning these values should proceed with caution. New values may improve the performance of the associated functions, worsen it, or leave it unchanged.
geqr2/geqrf and geql2/geqlf functions
The orthogonal factorizations from the left (QR or QL factorizations) are separated into two versions: blocked and unblocked. The unblocked routines GEQR2 and GEQL2 are based on BLAS Level 2 operations and work by applying Householder reflectors one column at a time. The blocked routines GEQRF and GEQLF factorize a block of columns at each step using the unblocked functions (provided the matrix is large enough) and apply the resulting block reflectors to update the rest of the matrix. The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
GEQxF_BLOCKSIZE
Determines the size of the block column factorized at each step in the blocked QR or QL algorithm (GEQRF or GEQLF). It also applies to the corresponding batched and strided-batched routines.
GEQxF_GEQx2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing GEQRF or GEQLF. It also applies to the corresponding batched and strided-batched routines.
GEQRF or GEQLF will factorize blocks of GEQxF_BLOCKSIZE columns at a time until the rest of the matrix has no more than GEQxF_GEQx2_SWITCHSIZE rows or columns; at this point the last block, if any, will be factorized with the unblocked algorithm (GEQR2 or GEQL2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
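The interplay between a block size and a switch size can be sketched with a small planning function. It only counts steps and performs no factorization; BlockPlan and plan_blocked_factorization are hypothetical names, k stands for the smaller dimension of the matrix, and the numeric values in the usage note are illustrative rather than the library's settings.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical summary of how a blocked routine would split its work.
struct BlockPlan
{
    int blocked_steps;  // number of blocks handled by the blocked algorithm
    int unblocked_size; // size of the last block left to the unblocked algorithm
};

// k: smaller dimension of the matrix; block_size and switch_size play the
// roles of a *_BLOCKSIZE and a *_SWITCHSIZE constant, respectively.
BlockPlan plan_blocked_factorization(int k, int block_size, int switch_size)
{
    assert(k >= 0 && block_size > 0 && switch_size >= 0);
    BlockPlan plan = {0, 0};
    int done = 0;
    // Keep taking blocks while the rest of the matrix exceeds the switch size.
    while(k - done > switch_size)
    {
        done += std::min(block_size, k - done);
        ++plan.blocked_steps;
    }
    plan.unblocked_size = k - done; // may be zero
    return plan;
}
```

With k = 100, a block size of 32 and a switch size of 40, two blocks of 32 columns are factorized by the blocked algorithm and the remaining 36 columns are left to the unblocked one.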
gerq2/gerqf and gelq2/gelqf functions
The orthogonal factorizations from the right (RQ or LQ factorizations) are separated into two versions: blocked and unblocked. The unblocked routines GERQ2 and GELQ2 are based on BLAS Level 2 operations and work by applying Householder reflectors one row at a time. The blocked routines GERQF and GELQF factorize a block of rows at each step using the unblocked functions (provided the matrix is large enough) and apply the resulting block reflectors to update the rest of the matrix. The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
GExQF_BLOCKSIZE
Determines the size of the block row factorized at each step in the blocked RQ or LQ algorithm (GERQF or GELQF). It also applies to the corresponding batched and strided-batched routines.
GExQF_GExQ2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing GERQF or GELQF. It also applies to the corresponding batched and strided-batched routines.
GERQF or GELQF will factorize blocks of GExQF_BLOCKSIZE rows at a time until the rest of the matrix has no more than GExQF_GExQ2_SWITCHSIZE rows or columns; at this point the last block, if any, will be factorized with the unblocked algorithm (GERQ2 or GELQ2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
org2r/orgqr, org2l/orgql, ung2r/ungqr and ung2l/ungql functions
The generators of a matrix Q with orthonormal columns (as products of Householder reflectors derived from the QR or QL factorizations) are also separated into blocked and unblocked versions. The unblocked routines ORG2R/UNG2R and ORG2L/UNG2L, based on BLAS Level 2 operations, work by accumulating one Householder reflector at a time. The blocked routines ORGQR/UNGQR and ORGQL/UNGQL multiply a set of reflectors at each step using the unblocked functions (provided there are enough reflectors to accumulate) and apply the resulting block reflector to update Q. The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
xxGQx_BLOCKSIZE
Determines the size of the block reflector that is applied at each step when generating a matrix Q with orthonormal columns with the blocked algorithm (ORGQR/UNGQR or ORGQL/UNGQL).
xxGQx_xxGQx2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing ORGQR/UNGQR or ORGQL/UNGQL.
ORGQR/UNGQR or ORGQL/UNGQL will accumulate xxGQx_BLOCKSIZE reflectors at a time until there are no more than xxGQx_xxGQx2_SWITCHSIZE reflectors left; the remaining reflectors, if any, are applied one by one using the unblocked algorithm (ORG2R/UNG2R or ORG2L/UNG2L).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
orgr2/orgrq, orgl2/orglq, ungr2/ungrq and ungl2/unglq functions
The generators of a matrix Q with orthonormal rows (as products of Householder reflectors derived from the RQ or LQ factorizations) are also separated into blocked and unblocked versions. The unblocked routines ORGR2/UNGR2 and ORGL2/UNGL2, based on BLAS Level 2 operations, work by accumulating one Householder reflector at a time. The blocked routines ORGRQ/UNGRQ and ORGLQ/UNGLQ multiply a set of reflectors at each step using the unblocked functions (provided there are enough reflectors to accumulate) and apply the resulting block reflector to update Q. The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
xxGxQ_BLOCKSIZE
Determines the size of the block reflector that is applied at each step when generating a matrix Q with orthonormal rows with the blocked algorithm (ORGRQ/UNGRQ or ORGLQ/UNGLQ).
xxGxQ_xxGxQ2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing ORGRQ/UNGRQ or ORGLQ/UNGLQ.
ORGRQ/UNGRQ or ORGLQ/UNGLQ will accumulate xxGxQ_BLOCKSIZE reflectors at a time until there are no more than xxGxQ_xxGxQ2_SWITCHSIZE reflectors left; the remaining reflectors, if any, are applied one by one using the unblocked algorithm (ORGR2/UNGR2 or ORGL2/UNGL2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
orm2r/ormqr, orm2l/ormql, unm2r/unmqr and unm2l/unmql functions
As with the generators of orthonormal/unitary matrices, the routines to multiply a general matrix C by a matrix Q with orthonormal columns are separated into blocked and unblocked versions. The unblocked routines ORM2R/UNM2R and ORM2L/UNM2L, based on BLAS Level 2 operations, work by multiplying one Householder reflector at a time, while the blocked routines ORMQR/UNMQR and ORMQL/UNMQL apply a set of reflectors at each step (provided there are enough reflectors to start with). The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
xxMQx_BLOCKSIZE
Determines the size of the block reflector that multiplies the matrix C at each step with the blocked algorithm (ORMQR/UNMQR or ORMQL/UNMQL).
xxMQx_BLOCKSIZE also acts as a switch size; if the total number of reflectors is not greater than xxMQx_BLOCKSIZE (k <= xxMQx_BLOCKSIZE), ORMQR/UNMQR or ORMQL/UNMQL will directly call the unblocked routines (ORM2R/UNM2R or ORM2L/UNM2L). However, when k is not a multiple of xxMQx_BLOCKSIZE, the last block that updates C in the blocked process is allowed to be smaller than xxMQx_BLOCKSIZE.
(As of the current rocSOLVER release, this constant has not been tuned for any specific cases.)
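When a single constant serves as both block size and switch size, the partition of the k reflectors can be sketched as below. The function name reflector_blocks is hypothetical, an empty result stands for "call the unblocked routine directly", and the order in which the library actually applies the blocks is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the sizes of the reflector blocks used by the blocked process, or an
// empty vector when k <= block_size and the unblocked routine is called instead.
std::vector<int> reflector_blocks(int k, int block_size)
{
    assert(k >= 0 && block_size > 0);
    std::vector<int> blocks;
    if(k <= block_size)
        return blocks; // unblocked path
    for(int j = 0; j < k; j += block_size)
        blocks.push_back(std::min(block_size, k - j)); // last block may be smaller
    return blocks;
}
```

For k = 70 and a block size of 32 this yields blocks of 32, 32 and 6 reflectors, while k = 32 falls back to the unblocked routine.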
ormr2/ormrq, orml2/ormlq, unmr2/unmrq and unml2/unmlq functions
As with the generators of orthonormal/unitary matrices, the routines to multiply a general matrix C by a matrix Q with orthonormal rows are separated into blocked and unblocked versions. The unblocked routines ORMR2/UNMR2 and ORML2/UNML2, based on BLAS Level 2 operations, work by multiplying one Householder reflector at a time, while the blocked routines ORMRQ/UNMRQ and ORMLQ/UNMLQ apply a set of reflectors at each step (provided there are enough reflectors to start with). The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
xxMxQ_BLOCKSIZE
Determines the size of the block reflector that multiplies the matrix C at each step with the blocked algorithm (ORMRQ/UNMRQ or ORMLQ/UNMLQ).
xxMxQ_BLOCKSIZE also acts as a switch size; if the total number of reflectors is not greater than xxMxQ_BLOCKSIZE (k <= xxMxQ_BLOCKSIZE), ORMRQ/UNMRQ or ORMLQ/UNMLQ will directly call the unblocked routines (ORMR2/UNMR2 or ORML2/UNML2). However, when k is not a multiple of xxMxQ_BLOCKSIZE, the last block that updates C in the blocked process is allowed to be smaller than xxMxQ_BLOCKSIZE.
(As of the current rocSOLVER release, this constant has not been tuned for any specific cases.)
gebd2/gebrd and labrd functions
The computation of the bidiagonal form of a matrix is separated into blocked and unblocked versions. The unblocked routine GEBD2 (and the auxiliary LABRD), based on BLAS Level 2 operations, apply Householder reflections to one column and row at a time. The blocked routine GEBRD reduces a leading block of rows and columns at each step using the unblocked function LABRD (provided the matrix is large enough), and applies the resulting block reflectors to update the trailing submatrix. The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
GEBRD_BLOCKSIZE
Determines the size of the leading block that is reduced to bidiagonal form at each step when using the blocked algorithm (GEBRD). It also applies to the corresponding batched and strided-batched routines.
GEBRD_GEBD2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing GEBRD. It also applies to the corresponding batched and strided-batched routines.
GEBRD will use LABRD to reduce blocks of GEBRD_BLOCKSIZE rows and columns at a time until the trailing submatrix has no more than GEBRD_GEBD2_SWITCHSIZE rows or columns; at this point the last block, if any, will be reduced with the unblocked algorithm (GEBD2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
gesvd function
The Singular Value Decomposition of a matrix A can be sped up for matrices with sufficiently many more rows than columns (or columns than rows) by starting with a QR factorization (or LQ factorization) of A and working with the triangular factor afterwards.
THIN_SVD_SWITCH
Determines the factor by which one dimension of a matrix should exceed the other dimension for the thin SVD to be computed when executing GESVD. It also applies to the corresponding batched and strided-batched routines.
When an m-by-n matrix A is passed to GESVD, if m >= THIN_SVD_SWITCH*n or n >= THIN_SVD_SWITCH*m, then the thin SVD is computed.
(As of the current rocSOLVER release, this constant has not been tuned for any specific cases.)
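The condition for choosing the thin SVD translates directly into a predicate. In this sketch uses_thin_svd is a hypothetical name, the switch factor is passed as a parameter rather than read from the library, and the factor 1.5 in the usage note is purely illustrative.

```cpp
// Returns true when one dimension exceeds the other by at least the given
// factor, i.e. m >= factor*n or n >= factor*m.
bool uses_thin_svd(int m, int n, double thin_svd_switch)
{
    return m >= thin_svd_switch * n || n >= thin_svd_switch * m;
}
```

With a factor of 1.5, a 150-by-100 or a 100-by-150 matrix would take the thin SVD path, while a 149-by-100 matrix would not.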
sytd2/sytrd, hetd2/hetrd and latrd functions
The computation of the tridiagonal form of a symmetric/Hermitian matrix is separated into blocked and unblocked versions. The unblocked routines SYTD2/HETD2 (and the auxiliary LATRD), based on BLAS Level 2 operations, apply Householder reflections to one column/row at a time. The blocked routines SYTRD/HETRD reduce a block of rows and columns at each step using the unblocked function LATRD (provided the matrix is large enough) and apply the resulting block reflector to update the rest of the matrix. The application of the block reflectors is based on matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
xxTRD_BLOCKSIZE
Determines the size of the leading block that is reduced to tridiagonal form at each step when using the blocked algorithm (SYTRD/HETRD). It also applies to the corresponding batched and strided-batched routines.
xxTRD_xxTD2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing SYTRD/HETRD. It also applies to the corresponding batched and strided-batched routines.
SYTRD/HETRD will use LATRD to reduce blocks of xxTRD_BLOCKSIZE rows and columns at a time until the rest of the matrix has no more than xxTRD_xxTD2_SWITCHSIZE rows or columns; at this point the last block, if any, will be reduced with the unblocked algorithm (SYTD2/HETD2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
sygs2/sygst and hegs2/hegst functions
The reduction of a symmetric/Hermitian-definite generalized eigenproblem to standard form is separated into blocked and unblocked versions. The unblocked routines SYGS2/HEGS2 reduce the matrix A one column/row at a time with vector operations and rank-2 updates (BLAS Level 2). The blocked routines SYGST/HEGST reduce a leading block of A at each step using the unblocked methods (provided A is large enough) and update the trailing matrix with BLAS Level 3 operations (matrix products and rank-2k updates), which, in general, can give better performance on the GPU.
xxGST_BLOCKSIZE
Determines the size of the leading block that is reduced to standard form at each step when using the blocked algorithm (SYGST/HEGST). It also applies to the corresponding batched and strided-batched routines.
xxGST_BLOCKSIZE also acts as a switch size; if the original size of the problem is not larger than xxGST_BLOCKSIZE (n <= xxGST_BLOCKSIZE), SYGST/HEGST will directly call the unblocked routines (SYGS2/HEGS2). However, when n is not a multiple of xxGST_BLOCKSIZE, the last block reduced in the blocked process is allowed to be smaller than xxGST_BLOCKSIZE.
(As of the current rocSOLVER release, this constant has not been tuned for any specific cases.)
syevd, heevd and stedc functions
When running SYEVD/HEEVD (or the corresponding batched and strided-batched routines), the computation of the eigenvectors of the associated tridiagonal matrix can be sped up using a divide-and-conquer approach (implemented in STEDC), provided the size of the independent block is large enough.
STEDC_MIN_DC_SIZE
Determines the minimum size required for the eigenvectors of an independent block of a tridiagonal matrix to be computed using the divide-and-conquer algorithm (STEDC).
If the size of the block is not greater than STEDC_MIN_DC_SIZE (bs <= STEDC_MIN_DC_SIZE), the eigenvectors are computed with the normal QR algorithm.
(As of the current rocSOLVER release, this constant has not been tuned for any specific cases.)
potf2/potrf functions
The Cholesky factorization is separated into blocked (right-looking) and unblocked versions. The unblocked routine POTF2, based on BLAS Level 2 operations, computes one diagonal element at a time and scales the corresponding row/column. The blocked routine POTRF factorizes a leading block of rows/columns at each step using the unblocked algorithm (provided the matrix is large enough) and updates the trailing matrix with BLAS Level 3 operations (symmetric rank-k updates), which, in general, can give better performance on the GPU.
POTRF_BLOCKSIZE
Determines the size of the leading block that is factorized at each step when using the blocked algorithm (POTRF). It also applies to the corresponding batched and strided-batched routines.
POTRF_POTF2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing POTRF. It also applies to the corresponding batched and strided-batched routines.
POTRF will factorize blocks of POTRF_BLOCKSIZE columns at a time until the rest of the matrix has no more than POTRF_POTF2_SWITCHSIZE columns; at this point the last block, if any, will be factorized with the unblocked algorithm (POTF2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
sytf2/sytrf and lasyf functions
The Bunch-Kaufman factorization is separated into blocked and unblocked versions. The unblocked routine SYTF2 generates one 1-by-1 or 2-by-2 diagonal block at a time and applies a rank-1 update. The blocked routine SYTRF executes a partial factorization of a given maximum number of diagonal elements (LASYF) at each step (provided the matrix is large enough), and updates the rest of the matrix with matrix-matrix operations (BLAS Level 3), which, in general, can give better performance on the GPU.
SYTRF_BLOCKSIZE
Determines the maximum size of the partial factorization executed at each step when using the blocked algorithm (SYTRF). It also applies to the corresponding batched and strided-batched routines.
SYTRF_SYTF2_SWITCHSIZE
Determines the size at which rocSOLVER switches from the unblocked to the blocked algorithm when executing SYTRF. It also applies to the corresponding batched and strided-batched routines.
SYTRF will use LASYF to factorize a submatrix of at most SYTRF_BLOCKSIZE columns at a time until the rest of the matrix has no more than SYTRF_SYTF2_SWITCHSIZE columns; at this point the last block, if any, will be factorized with the unblocked algorithm (SYTF2).
(As of the current rocSOLVER release, these constants have not been tuned for any specific cases.)
rocSOLVER API
Types
rocSOLVER uses types and enumerations defined by the rocBLAS API. For more information, see the rocBLAS types documentation. Next we present additional types, only used in rocSOLVER, that extend the rocBLAS API.
Additional types
rocblas_svect
enum rocblas_svect
Used to specify how the singular vectors are to be computed and stored.
Values:
enumerator rocblas_svect_all
The entire associated orthogonal/unitary matrix is computed.
enumerator rocblas_svect_singular
Only the singular vectors are computed and stored in the output array.
enumerator rocblas_svect_overwrite
Only the singular vectors are computed and overwrite the input matrix.
enumerator rocblas_svect_none
No singular vectors are computed.
rocblas_evect
enum rocblas_evect
Used to specify how the eigenvectors are to be computed.
Values:
enumerator rocblas_evect_original
Compute eigenvectors for the original symmetric/Hermitian matrix.
enumerator rocblas_evect_tridiagonal
Compute eigenvectors for the symmetric tridiagonal matrix.
enumerator rocblas_evect_none
No eigenvectors are computed.
rocblas_workmode
enum rocblas_workmode
Used to enable the use of fast algorithms (with out-of-place computations) in some of the routines.
Values:
enumerator rocblas_outofplace
Out-of-place computations are allowed; this requires extra device memory for workspace.
enumerator rocblas_inplace
If not enough memory is available, this forces in-place computations.
LAPACK Auxiliary Functions
These are functions that support more advanced LAPACK routines. The auxiliary functions are divided into the following categories:
Vector and Matrix manipulations. Some basic operations with vectors and matrices that are not part of the BLAS standard.
Householder reflections. Generation and application of Householder matrices.
Bidiagonal forms. Computations specialized in bidiagonal matrices.
Tridiagonal forms. Computations specialized in tridiagonal matrices.
Symmetric matrices. Computations specialized in symmetric matrices.
Orthonormal matrices. Generation and application of orthonormal matrices.
Unitary matrices. Generation and application of unitary matrices.
Note
Throughout the APIs' descriptions, we use the following notations:
x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.
If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X' to indicate X transposed or X conjugate transposed, accordingly.
x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.
Vector and Matrix manipulations
rocsolver_<type>lacgv()Â¶

rocblas_status
rocsolver_zlacgv
(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *x, const rocblas_int incx)Â¶

rocblas_status
rocsolver_clacgv
(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *x, const rocblas_int incx)Â¶ LACGV conjugates the complex vector x.
It conjugates the n entries of a complex vector x with increment incx.
 Parameters
[in] handle
: rocblas_handle.
[in] n
: rocblas_int. n >= 0. The dimension of vector x.
[inout] x
: pointer to type. Array on the GPU of size at least n (size depends on the value of incx). On entry, the vector x. On exit, each entry is overwritten with its conjugate value.
[in] incx
: rocblas_int. incx != 0. The distance between two consecutive elements of x. If incx is negative, the elements of x are indexed in reverse order.
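The effect of LACGV can be illustrated with a CPU reference sketch. This is not the rocSOLVER implementation: lacgv_ref and the cplx struct (a stand-in for rocblas_double_complex) are hypothetical names. It assumes the LAPACK convention for negative incx, under which the same n entries are visited in reverse order, so the result does not depend on the sign of incx.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for rocblas_double_complex in this sketch (assumption). */
typedef struct { double re, im; } cplx;

/* CPU reference sketch of LACGV (hypothetical helper): conjugates the n
   entries of x that are spaced abs(incx) apart. */
void lacgv_ref(int n, cplx *x, int incx)
{
    assert(n >= 0 && incx != 0);
    size_t step = (size_t)abs(incx);
    for (int i = 0; i < n; ++i)
        x[(size_t)i * step].im = -x[(size_t)i * step].im;
}
```

With n = 2 and incx = 2, only the entries at offsets 0 and 2 are conjugated; the entries in between are left untouched.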
rocsolver_<type>laswp()
rocblas_status rocsolver_zlaswp(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_claswp(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_dlaswp(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_slaswp(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
LASWP performs a series of row interchanges on the matrix A.
Row interchanges are done one by one. If \(\text{ipiv}[k_1 + (j - k_1) \cdot \text{abs}(\text{incx})] = r\), then the j-th row of A will be interchanged with the r-th row of A, for \(j = k_1,k_1+1,\dots,k_2\). Indices \(k_1\) and \(k_2\) are 1-based indices.
 Parameters
[in] handle
: rocblas_handle.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix to which the row interchanges will be applied. On exit, the resulting permuted matrix.
[in] lda
: rocblas_int. lda > 0. The leading dimension of the array A.
[in] k1
: rocblas_int. k1 > 0. The k_1 index. It is the first element of ipiv for which a row interchange will be done. This is a 1-based index.
[in] k2
: rocblas_int. k2 > k1 > 0. The k_2 index. k_2 - k_1 + 1 is the number of elements of ipiv for which a row interchange will be done. This is a 1-based index.
[in] ipiv
: pointer to rocblas_int. Array on the GPU of dimension at least k_1 + (k_2 - k_1)*abs(incx). The vector of pivot indices. Only the elements in positions k_1 through k_1 + (k_2 - k_1)*abs(incx) of this vector are accessed. Elements of ipiv are considered 1-based.
[in] incx
: rocblas_int. incx != 0. The distance between successive values of ipiv. If incx is negative, the pivots are applied in reverse order.
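The row-interchange rule described for LASWP can be sketched on the CPU as follows. This is a reference sketch only, assuming real double data in column-major storage; laswp_ref is a hypothetical name, not a rocSOLVER function. Row j is swapped with row ipiv[k1 + (j - k1)*abs(incx)], in increasing order of j when incx > 0 and in decreasing order when incx < 0.

```c
#include <assert.h>
#include <stdlib.h>

/* CPU reference sketch of LASWP for a column-major double matrix
   (hypothetical helper, not the rocSOLVER implementation).
   k1, k2 and the entries of ipiv are 1-based, as in the text. */
void laswp_ref(int n, double *A, int lda, int k1, int k2,
               const int *ipiv, int incx)
{
    assert(n >= 0 && lda > 0 && k1 > 0 && k2 >= k1 && incx != 0);
    int step = abs(incx);
    int count = k2 - k1 + 1;
    for (int t = 0; t < count; ++t) {
        /* Forward order for incx > 0, reverse order for incx < 0. */
        int j = (incx > 0) ? k1 + t : k2 - t;
        /* 1-based position k1 + (j - k1)*abs(incx), shifted to 0-based. */
        int r = ipiv[(k1 - 1) + (j - k1) * step];
        if (r == j)
            continue;
        for (int c = 0; c < n; ++c) {
            double tmp = A[(j - 1) + (size_t)c * lda];
            A[(j - 1) + (size_t)c * lda] = A[(r - 1) + (size_t)c * lda];
            A[(r - 1) + (size_t)c * lda] = tmp;
        }
    }
}
```

Because the interchanges are applied one by one, reversing the order (negative incx) generally produces a different permutation from the same pivot vector.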
Householder reflections
List of Householder functions
rocsolver_<type>larfg()
rocblas_status rocsolver_zlarfg(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *alpha, rocblas_double_complex *x, const rocblas_int incx, rocblas_double_complex *tau)
rocblas_status rocsolver_clarfg(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *alpha, rocblas_float_complex *x, const rocblas_int incx, rocblas_float_complex *tau)
rocblas_status rocsolver_dlarfg(rocblas_handle handle, const rocblas_int n, double *alpha, double *x, const rocblas_int incx, double *tau)
rocblas_status rocsolver_slarfg(rocblas_handle handle, const rocblas_int n, float *alpha, float *x, const rocblas_int incx, float *tau)
LARFG generates a Householder reflector H of order n.
The reflector H is such that
\[\begin{split} H'\left[\begin{array}{c} \text{alpha}\\ x \end{array}\right]=\left[\begin{array}{c} \text{beta}\\ 0 \end{array}\right] \end{split}\]
where x is a vector of size n-1, and alpha and beta are scalars. Matrix H can be generated as
\[\begin{split} H = I - \text{tau}\left[\begin{array}{c} 1\\ v \end{array}\right]\left[\begin{array}{cc} 1 & v' \end{array}\right] \end{split}\]
where v is a vector of size n-1, and tau is a scalar known as the Householder scalar. The vector
\[\begin{split} \bar{v}=\left[\begin{array}{c} 1\\ v \end{array}\right] \end{split}\]
is the Householder vector associated with the reflection.
 Note
The matrix H is orthogonal/unitary (i.e. \(H'H=HH'=I\)). It is symmetric when real (i.e. \(H^T=H\)), but not Hermitian when complex (i.e. \(H^H\neq H\) in general).
 Parameters
[in] handle
: rocblas_handle.
[in] n
: rocblas_int. n >= 0. The order (size) of reflector H.
[inout] alpha
: pointer to type. A scalar on the GPU. On entry, the scalar alpha. On exit, it is overwritten with beta.
[inout] x
: pointer to type. Array on the GPU of size at least n-1 (size depends on the value of incx). On entry, the vector x. On exit, it is overwritten with vector v.
[in] incx
: rocblas_int. incx > 0. The distance between two consecutive elements of x.
[out] tau
: pointer to type. A scalar on the GPU. The Householder scalar tau.
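As a small worked example of the LARFG definitions, take real data with n = 2, alpha = 3 and x = (4). The values below assume the LAPACK convention, in which beta takes the sign opposite to alpha; they are an illustration, not output captured from rocSOLVER.
\[ \text{beta} = -\sqrt{3^2 + 4^2} = -5, \qquad \text{tau} = \frac{\text{beta} - \text{alpha}}{\text{beta}} = \frac{8}{5}, \qquad v = \frac{x}{\text{alpha} - \text{beta}} = \frac{1}{2} \]
\[\begin{split} H = I - \frac{8}{5}\left[\begin{array}{c} 1\\ \frac{1}{2} \end{array}\right]\left[\begin{array}{cc} 1 & \frac{1}{2} \end{array}\right] = \left[\begin{array}{cc} -\frac{3}{5} & -\frac{4}{5}\\ -\frac{4}{5} & \frac{3}{5} \end{array}\right], \qquad H'\left[\begin{array}{c} 3\\ 4 \end{array}\right] = \left[\begin{array}{c} -5\\ 0 \end{array}\right] \end{split}\]
On exit, alpha would hold beta = -5, x would hold v = (1/2), and tau would hold 8/5.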
rocsolver_<type>larft()
rocblas_status rocsolver_zlarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *tau, rocblas_double_complex *T, const rocblas_int ldt)
rocblas_status rocsolver_clarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *tau, rocblas_float_complex *T, const rocblas_int ldt)
rocblas_status rocsolver_dlarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, double *V, const rocblas_int ldv, double *tau, double *T, const rocblas_int ldt)
rocblas_status rocsolver_slarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, float *V, const rocblas_int ldv, float *tau, float *T, const rocblas_int ldt)
LARFT generates the triangular factor T of a block reflector H of order n.
The block reflector H is defined as the product of k Householder matrices
\[\begin{split} \begin{array}{cl} H = H_1H_2\cdots H_k & \: \text{if direct indicates forward direction, or} \\ H = H_k\cdots H_2H_1 & \: \text{if direct indicates backward direction} \end{array} \end{split}\]
The triangular factor T is upper triangular in the forward direction and lower triangular in the backward direction. If storev is column-wise, then
\[ H = I - VTV' \]
where the i-th column of matrix V contains the Householder vector associated with \(H_i\). If storev is row-wise, then
\[ H = I - V'TV \]
where the i-th row of matrix V contains the Householder vector associated with \(H_i\).
 Parameters
[in] handle
: rocblas_handle.
[in] direct
: rocblas_direct. Specifies the direction in which the Householder matrices are applied.
[in] storev
: rocblas_storev. Specifies how the Householder vectors are stored in matrix V.
[in] n
: rocblas_int. n >= 0. The order (size) of the block reflector.
[in] k
: rocblas_int. k >= 1. The number of Householder matrices forming H.
[in] V
: pointer to type. Array on the GPU of size ldv*k if column-wise, or ldv*n if row-wise. The matrix of Householder vectors.
[in] ldv
: rocblas_int. ldv >= n if column-wise, or ldv >= k if row-wise. Leading dimension of V.
[in] tau
: pointer to type. Array of k scalars on the GPU. The vector of all the Householder scalars.
[out] T
: pointer to type. Array on the GPU of dimension ldt*k. The triangular factor. T is upper triangular if direct indicates forward direction, otherwise it is lower triangular. The rest of the array is not used.
[in] ldt
: rocblas_int. ldt >= k. The leading dimension of T.
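The relation H = H_1 H_2 ... H_k = I - VTV' determines T column by column. The following CPU reference sketch covers only the forward, column-wise, real case and is not the rocSOLVER implementation; larft_forward_columnwise_ref is a hypothetical name. It assumes each column of V stores a complete Householder vector of length n, with any leading zeros and the unit entry held explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference sketch of LARFT (forward direction, column-wise storage,
   real data). Hypothetical helper, not the rocSOLVER implementation.
   Assumption: every column of V is a full Householder vector of length n. */
void larft_forward_columnwise_ref(int n, int k, const double *V, int ldv,
                                  const double *tau, double *T, int ldt)
{
    assert(n >= 0 && k >= 1 && ldv >= n && ldt >= k);
    for (int i = 0; i < k; ++i) {
        /* z = -tau[i] * V(:, 0:i-1)' * v_i, stored in T(0:i-1, i) */
        for (int j = 0; j < i; ++j) {
            double dot = 0.0;
            for (int r = 0; r < n; ++r)
                dot += V[r + (size_t)j * ldv] * V[r + (size_t)i * ldv];
            T[j + (size_t)i * ldt] = -tau[i] * dot;
        }
        /* T(0:i-1, i) = T(0:i-1, 0:i-1) * z (upper triangular, in place) */
        for (int j = 0; j < i; ++j) {
            double sum = 0.0;
            for (int c = j; c < i; ++c)
                sum += T[j + (size_t)c * ldt] * T[c + (size_t)i * ldt];
            T[j + (size_t)i * ldt] = sum;
        }
        T[i + (size_t)i * ldt] = tau[i];
    }
}
```

Only the upper triangle of T is written; the strictly lower part is left as it was, in line with the note that the rest of the array is not used.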
rocsolver_<type>larf()
rocblas_status rocsolver_zlarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, rocblas_double_complex *x, const rocblas_int incx, const rocblas_double_complex *alpha, rocblas_double_complex *A, const rocblas_int lda)
rocblas_status rocsolver_clarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, rocblas_float_complex *x, const rocblas_int incx, const rocblas_float_complex *alpha, rocblas_float_complex *A, const rocblas_int lda)
rocblas_status rocsolver_dlarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, double *x, const rocblas_int incx, const double *alpha, double *A, const rocblas_int lda)
rocblas_status rocsolver_slarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, float *x, const rocblas_int incx, const float *alpha, float *A, const rocblas_int lda)
LARF applies a Householder reflector H to a general matrix A.
The Householder reflector H, of order m or n, is to be applied to an m-by-n matrix A from the left or the right, depending on the value of side. H is given by
\[ H = I - \text{alpha}\cdot xx' \]
where alpha is the Householder scalar and x is a Householder vector. H is never actually computed.
 Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Determines whether H is applied from the left or the right.
[in] m
: rocblas_int. m >= 0. Number of rows of A.
[in] n
: rocblas_int. n >= 0. Number of columns of A.
[in] x
: pointer to type. Array on the GPU of size at least 1 + (m-1)*abs(incx) if left side, or at least 1 + (n-1)*abs(incx) if right side. The Householder vector x.
[in] incx
: rocblas_int. incx != 0. Distance between two consecutive elements of x. If incx < 0, the elements of x are indexed in reverse order.
[in] alpha
: pointer to type. A scalar on the GPU. The Householder scalar. If alpha = 0, then H = I (A will remain the same; x is never used).
[inout] A
: pointer to type. Array on the GPU of size lda*n. On entry, the matrix A. On exit, it is overwritten with H*A (or A*H).
[in] lda
: rocblas_int. lda >= m. Leading dimension of A.
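The statement that H is never actually computed can be illustrated with a CPU reference sketch for the left-side, real case. It is not the rocSOLVER implementation; larf_left_ref is a hypothetical name, and the sketch assumes incx = 1 and column-major storage. Each column of A is updated with one dot product and one scaled vector subtraction, so the m-by-m matrix H is never formed.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference sketch of LARF applied from the left to a column-major
   real matrix: A := (I - alpha * x * x') * A. Hypothetical helper, not
   the rocSOLVER implementation. Assumes incx = 1. */
void larf_left_ref(int m, int n, const double *x, double alpha,
                   double *A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    if (alpha == 0.0)
        return; /* H = I: A is unchanged and x is never read */
    for (int j = 0; j < n; ++j) {
        double *col = A + (size_t)j * lda;
        double w = 0.0;
        for (int i = 0; i < m; ++i)
            w += x[i] * col[i];         /* w = x' * A(:, j) */
        for (int i = 0; i < m; ++i)
            col[i] -= alpha * x[i] * w; /* A(:, j) -= alpha * w * x */
    }
}
```

For instance, with x = (1, 0.5) and alpha = 1.6, the single column (3, 4) is mapped to (-5, 0).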
rocsolver_<type>larfb()
rocblas_status rocsolver_zlarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *T, const rocblas_int ldt, rocblas_double_complex *A, const rocblas_int lda)
rocblas_status rocsolver_clarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *T, const rocblas_int ldt, rocblas_float_complex *A, const rocblas_int lda)
rocblas_status rocsolver_dlarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *V, const rocblas_int ldv, double *T, const rocblas_int ldt, double *A, const rocblas_int lda)
rocblas_status rocsolver_slarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *V, const rocblas_int ldv, float *T, const rocblas_int ldt, float *A, const rocblas_int lda)
LARFB applies a block reflector H to a general m-by-n matrix A.
The block reflector H is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} HA & \: \text{(No transpose from the left),}\\ H'A & \: \text{(Transpose or conjugate transpose from the left),}\\ AH & \: \text{(No transpose from the right), or}\\ AH' & \: \text{(Transpose or conjugate transpose from the right).} \end{array} \end{split}\]
The block reflector H is defined as the product of k Householder matrices as
\[\begin{split} \begin{array}{cl} H = H_1H_2\cdots H_k & \: \text{if direct indicates forward direction, or} \\ H = H_k\cdots H_2H_1 & \: \text{if direct indicates backward direction} \end{array} \end{split}\]
H is never stored. It is calculated as
\[ H = I - VTV' \]
where the i-th column of matrix V contains the Householder vector associated with \(H_i\), if storev is column-wise; or
\[ H = I - V'TV \]
where the i-th row of matrix V contains the Householder vector associated with \(H_i\), if storev is row-wise. T is the associated triangular factor as computed by LARFT.
 Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply H.
[in] trans
: rocblas_operation. Specifies whether the block reflector or its transpose/conjugate transpose is to be applied.
[in] direct
: rocblas_direct. Specifies the direction in which the Householder matrices are to be applied to generate H.
[in] storev
: rocblas_storev. Specifies how the Householder vectors are stored in matrix V.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix A.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix A.
[in] k
: rocblas_int. k >= 1. The number of Householder matrices.
[in] V
: pointer to type. Array on the GPU of size ldv*k if column-wise, ldv*n if row-wise and applying from the right, or ldv*m if row-wise and applying from the left. The matrix of Householder vectors.
[in] ldv
: rocblas_int. ldv >= k if row-wise, ldv >= m if column-wise and applying from the left, or ldv >= n if column-wise and applying from the right. Leading dimension of V.
[in] T
: pointer to type. Array on the GPU of dimension ldt*k. The triangular factor of the block reflector.
[in] ldt
: rocblas_int. ldt >= k. The leading dimension of T.
[inout] A
: pointer to type. Array on the GPU of size lda*n. On entry, the matrix A. On exit, it is overwritten with H*A, A*H, H'*A, or A*H'.
[in] lda
: rocblas_int. lda >= m. Leading dimension of A.
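One of the four forms, HA with H = I - VTV', can be sketched on the CPU as follows. The sketch covers only the left-side, no-transpose, forward, column-wise, real case, and is not the rocSOLVER implementation; larfb_left_notrans_ref is a hypothetical name. It assumes V stores complete Householder vectors (leading zeros and unit entries held explicitly) and that T is upper triangular, as it would be in the forward direction.

```c
#include <assert.h>
#include <stdlib.h>

/* CPU reference sketch of LARFB for side = left, no transpose, forward
   direction, column-wise storage, real data: A := A - V * (T * (V' * A)).
   Hypothetical helper, not the rocSOLVER implementation. */
void larfb_left_notrans_ref(int m, int n, int k, const double *V, int ldv,
                            const double *T, int ldt, double *A, int lda)
{
    assert(m >= 0 && n >= 0 && k >= 1 && ldv >= m && ldt >= k && lda >= m);
    double *W = malloc((size_t)k * sizeof *W); /* workspace for one column */
    assert(W != NULL);
    for (int j = 0; j < n; ++j) {
        double *col = A + (size_t)j * lda;
        /* W = V' * A(:, j) */
        for (int p = 0; p < k; ++p) {
            W[p] = 0.0;
            for (int i = 0; i < m; ++i)
                W[p] += V[i + (size_t)p * ldv] * col[i];
        }
        /* W = T * W, using only the upper triangle of T (in place) */
        for (int p = 0; p < k; ++p) {
            double sum = 0.0;
            for (int q = p; q < k; ++q)
                sum += T[p + (size_t)q * ldt] * W[q];
            W[p] = sum;
        }
        /* A(:, j) -= V * W */
        for (int i = 0; i < m; ++i)
            for (int p = 0; p < k; ++p)
                col[i] -= V[i + (size_t)p * ldv] * W[p];
    }
    free(W);
}
```

As with a single reflector, H itself is never formed: the work per column of A is two products with V and one with the k-by-k factor T.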
Bidiagonal forms
List of functions for bidiagonal forms
rocsolver_<type>labrd()
rocblas_status rocsolver_zlabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup, rocblas_double_complex *X, const rocblas_int ldx, rocblas_double_complex *Y, const rocblas_int ldy)
rocblas_status rocsolver_clabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup, rocblas_float_complex *X, const rocblas_int ldx, rocblas_float_complex *Y, const rocblas_int ldy)
rocblas_status rocsolver_dlabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup, double *X, const rocblas_int ldx, double *Y, const rocblas_int ldy)
rocblas_status rocsolver_slabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup, float *X, const rocblas_int ldx, float *Y, const rocblas_int ldy)
LABRD computes the bidiagonal form of the first k rows and columns of a general m-by-n matrix A, as well as the matrices X and Y needed to reduce the remaining part of A.
The reduced form is given by:
\[ B = Q'AP \]
where the leading k-by-k block of B is upper bidiagonal if m >= n, or lower bidiagonal if m < n. Q and P are orthogonal/unitary matrices represented as the product of Householder matrices
\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_k, & \text{and} \\ P = G_1G_2\cdots G_k. \end{array} \end{split}\]
Each Householder matrix \(H_i\) and \(G_i\) is given by
\[\begin{split} \begin{array}{cl} H_i = I - \text{tauq}[i]\cdot v_iv_i', & \text{and} \\ G_i = I - \text{taup}[i]\cdot u_iu_i'. \end{array} \end{split}\]
If m >= n, the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i]=1\); while the first i elements of the Householder vector \(u_i\) are zero, and \(u_i[i+1]=1\). If m < n, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1]=1\); while the first i-1 elements of the Householder vector \(u_i\) are zero, and \(u_i[i]=1\).
The unreduced part of the matrix A can be updated using the block update
\[ A = A - VY' - XU' \]
where V and U are the m-by-k and n-by-k matrices formed with the vectors \(v_i\) and \(u_i\), respectively.
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[in] k
: rocblas_int. min(m,n) >= k >= 0. The number of leading rows and columns of matrix A that will be reduced.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be reduced. On exit, the first k elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n), contain the bidiagonal form B. If m >= n, the elements below the diagonal of the first k columns are the possibly nonzero elements of the Householder vectors associated with Q, while the elements above the superdiagonal of the first k rows are the n - i - 1 possibly nonzero elements of the Householder vectors related to P. If m < n, the elements below the subdiagonal of the first k columns are the m - i - 1 possibly nonzero elements of the Householder vectors related to Q, while the elements above the diagonal of the first k rows are the n - i possibly nonzero elements of the vectors associated with P.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] D
: pointer to real type. Array on the GPU of dimension k. The diagonal elements of B.
[out] E
: pointer to real type. Array on the GPU of dimension k. The off-diagonal elements of B.
[out] tauq
: pointer to type. Array on the GPU of dimension k. The Householder scalars associated with matrix Q.
[out] taup
: pointer to type. Array on the GPU of dimension k. The Householder scalars associated with matrix P.
[out] X
: pointer to type. Array on the GPU of dimension ldx*k. The m-by-k matrix needed to update the unreduced part of A.
[in] ldx
: rocblas_int. ldx >= m. The leading dimension of X.
[out] Y
: pointer to type. Array on the GPU of dimension ldy*k. The n-by-k matrix needed to update the unreduced part of A.
[in] ldy
: rocblas_int. ldy >= n. The leading dimension of Y.
rocsolver_<type>bdsqr()
rocblas_status rocsolver_zbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, double *D, double *E, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *U, const rocblas_int ldu, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_cbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, float *D, float *E, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *U, const rocblas_int ldu, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_dbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, double *D, double *E, double *V, const rocblas_int ldv, double *U, const rocblas_int ldu, double *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_sbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, float *D, float *E, float *V, const rocblas_int ldv, float *U, const rocblas_int ldu, float *C, const rocblas_int ldc, rocblas_int *info)
BDSQR computes the singular value decomposition (SVD) of an n-by-n bidiagonal matrix B, using the implicit QR algorithm.
The SVD of B has the form:
\[ B = QSP' \]
where S is the n-by-n diagonal matrix of singular values of B, the columns of Q are the left singular vectors of B, and the columns of P are its right singular vectors.
The computation of the singular vectors is optional; this function accepts input matrices U (of size nu-by-n) and V (of size n-by-nv) that are overwritten with \(UQ\) and \(P'V\). If nu = 0 no left vectors are computed; if nv = 0 no right vectors are computed.
Optionally, this function can also compute \(Q'C\) for a given n-by-nc input matrix C.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether B is upper or lower bidiagonal.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix B.
[in] nv
: rocblas_int. nv >= 0. The number of columns of matrix V.
[in] nu
: rocblas_int. nu >= 0. The number of rows of matrix U.
[in] nc
: rocblas_int. nc >= 0. The number of columns of matrix C.
[inout] D
: pointer to real type. Array on the GPU of dimension n. On entry, the diagonal elements of B. On exit, if info = 0, the singular values of B in decreasing order; if info > 0, the diagonal elements of a bidiagonal matrix orthogonally equivalent to B.
[inout] E
: pointer to real type. Array on the GPU of dimension n-1. On entry, the off-diagonal elements of B. On exit, if info > 0, the off-diagonal elements of a bidiagonal matrix orthogonally equivalent to B (if info = 0 this matrix converges to zero).
[inout] V
: pointer to type. Array on the GPU of dimension ldv*nv. On entry, the matrix V. On exit, it is overwritten with P'*V. (Not referenced if nv = 0).
[in] ldv
: rocblas_int. ldv >= n if nv > 0, or ldv >= 1 if nv = 0. The leading dimension of V.
[inout] U
: pointer to type. Array on the GPU of dimension ldu*n. On entry, the matrix U. On exit, it is overwritten with U*Q. (Not referenced if nu = 0).
[in] ldu
: rocblas_int. ldu >= nu. The leading dimension of U.
[inout] C
: pointer to type. Array on the GPU of dimension ldc*nc. On entry, the matrix C. On exit, it is overwritten with Q'*C. (Not referenced if nc = 0).
[in] ldc
: rocblas_int. ldc >= n if nc > 0, or ldc >= 1 if nc = 0. The leading dimension of C.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, i elements of E have not converged to zero.
Tridiagonal forms
List of functions for tridiagonal forms
rocsolver_<type>latrd()
rocblas_status rocsolver_zlatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, double *E, rocblas_double_complex *tau, rocblas_double_complex *W, const rocblas_int ldw)
rocblas_status rocsolver_clatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, float *E, rocblas_float_complex *tau, rocblas_float_complex *W, const rocblas_int ldw)
rocblas_status rocsolver_dlatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *E, double *tau, double *W, const rocblas_int ldw)
rocblas_status rocsolver_slatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *E, float *tau, float *W, const rocblas_int ldw)
LATRD computes the tridiagonal form of k rows and columns of a symmetric/Hermitian matrix A, as well as the matrix W needed to update the remaining part of A.
The reduced form is given by:
\[ T = Q'AQ \]
If uplo is lower, the first k rows and columns of T form the tridiagonal block. If uplo is upper, then the last k rows and columns of T form the tridiagonal block. Q is an orthogonal/unitary matrix represented as the product of Householder matrices
\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_k & \text{if uplo indicates lower, or}\\ Q = H_nH_{n-1}\cdots H_{n-k+1} & \text{if uplo is upper}. \end{array} \end{split}\]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{tau}[i]\cdot v_iv_i' \]
where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo is upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
The unreduced part of the matrix A can be updated using a rank update of the form:
\[ A = A - VW' - WV' \]
where V is the n-by-k matrix formed by the vectors \(v_i\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of the matrix A.
[in] k
: rocblas_int. 0 <= k <= n. The number of rows and columns of the matrix A to be reduced.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the n-by-n matrix to be reduced. On exit, if uplo is lower, the first k columns have been reduced to tridiagonal form (given in the diagonal elements of A and the array E), and the elements below the diagonal contain the possibly nonzero entries of the Householder vectors associated with Q, stored as columns. If uplo is upper, the last k columns have been reduced to tridiagonal form (given in the diagonal elements of A and the array E), and the elements above the diagonal contain the possibly nonzero entries of the Householder vectors associated with Q, stored as columns.
[in] lda
: rocblas_int. lda >= n. The leading dimension of A.
[out] E
: pointer to real type. Array on the GPU of dimension n-1. If upper (lower), the last (first) k elements of E are the off-diagonal elements of the computed tridiagonal block.
[out] tau
: pointer to type. Array on the GPU of dimension n-1. If upper (lower), the last (first) k elements of tau are the Householder scalars related to Q.
[out] W
: pointer to type. Array on the GPU of dimension ldw*k. The n-by-k matrix needed to update the unreduced part of A.
[in] ldw
: rocblas_int. ldw >= n. The leading dimension of W.
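The update A = A - VW' - WV' given for LATRD can be written as a plain CPU reference sketch for real data. It is not the rocSOLVER implementation; rank2k_update_ref is a hypothetical name. The sketch updates every entry of whatever square block it is given, in column-major storage, and leaves selecting the unreduced block of A and the matching rows of V and W to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference sketch of the update A := A - V*W' - W*V' for real,
   column-major data. Hypothetical helper, not the rocSOLVER
   implementation. A is n-by-n; V and W are n-by-k. */
void rank2k_update_ref(int n, int k, const double *V, int ldv,
                       const double *W, int ldw, double *A, int lda)
{
    assert(n >= 0 && k >= 0 && ldv >= n && ldw >= n && lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += V[i + (size_t)p * ldv] * W[j + (size_t)p * ldw]
                   + W[i + (size_t)p * ldw] * V[j + (size_t)p * ldv];
            A[i + (size_t)j * lda] -= s;
        }
}
```

The correction VW' + WV' is symmetric, so a symmetric input block stays symmetric after the update.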
rocsolver_<type>sterf()
rocblas_status rocsolver_dsterf(rocblas_handle handle, const rocblas_int n, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_ssterf(rocblas_handle handle, const rocblas_int n, float *D, float *E, rocblas_int *info)
STERF computes the eigenvalues of a symmetric tridiagonal matrix.
The eigenvalues of the symmetric tridiagonal matrix are computed by the Pal-Walker-Kahan variant of the QL/QR algorithm, and returned in increasing order.
The matrix is not represented explicitly, but rather as the array of diagonal elements D and the array of symmetric off-diagonal elements E.
 Parameters
[in] handle
: rocblas_handle.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of the tridiagonal matrix.
[inout] D
: pointer to real type. Array on the GPU of dimension n. On entry, the diagonal elements of the tridiagonal matrix. On exit, if info = 0, the eigenvalues in increasing order. If info > 0, the diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).
[inout] E
: pointer to real type. Array on the GPU of dimension n-1. On entry, the off-diagonal elements of the tridiagonal matrix. On exit, if info = 0, this array converges to zero. If info > 0, the off-diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, STERF did not converge. i elements of E did not converge to zero.
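The D/E representation used by the tridiagonal routines can be made concrete by expanding it into a dense matrix. The following is a minimal CPU sketch for real data in column-major storage; tridiag_to_dense is a hypothetical helper, not a rocSOLVER function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of rocSOLVER): expands the diagonal D
   (n entries) and the off-diagonal E (n-1 entries) into a dense
   column-major symmetric tridiagonal n-by-n matrix A. */
void tridiag_to_dense(int n, const double *D, const double *E,
                      double *A, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            A[i + (size_t)j * lda] = 0.0;
    for (int i = 0; i < n; ++i) {
        A[i + (size_t)i * lda] = D[i];
        if (i + 1 < n) {
            A[(i + 1) + (size_t)i * lda] = E[i]; /* subdiagonal */
            A[i + (size_t)(i + 1) * lda] = E[i]; /* superdiagonal */
        }
    }
}
```

Such an expansion is mainly useful for checking results on small problems; the routines themselves only ever read and write D and E.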
rocsolver_<type>steqr()
rocblas_status rocsolver_zsteqr(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, double *D, double *E, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_csteqr(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, float *D, float *E, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_dsteqr(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, double *D, double *E, double *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_ssteqr(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, float *D, float *E, float *C, const rocblas_int ldc, rocblas_int *info)
STEQR computes the eigenvalues and (optionally) eigenvectors of a symmetric tridiagonal matrix.
The eigenvalues of the symmetric tridiagonal matrix are computed by the implicit QL/QR algorithm, and returned in increasing order.
The matrix is not represented explicitly, but rather as the array of diagonal elements D and the array of symmetric off-diagonal elements E. When D and E correspond to the tridiagonal form of a full symmetric/Hermitian matrix, as returned by, e.g., SYTRD or HETRD, the eigenvectors of the original matrix can also be computed, depending on the value of evect.
 Parameters
[in] handle
: rocblas_handle.
[in] evect
: rocblas_evect. Specifies how the eigenvectors are computed.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of the tridiagonal matrix.
[inout] D
: pointer to real type. Array on the GPU of dimension n. On entry, the diagonal elements of the tridiagonal matrix. On exit, if info = 0, the eigenvalues in increasing order. If info > 0, the diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).
[inout] E
: pointer to real type. Array on the GPU of dimension n-1. On entry, the off-diagonal elements of the tridiagonal matrix. On exit, if info = 0, this array converges to zero. If info > 0, the off-diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).
[inout] C
: pointer to type. Array on the GPU of dimension ldc*n. On entry, if evect is original, the orthogonal/unitary matrix used for the reduction to tridiagonal form as returned by, e.g., ORGTR or UNGTR. On exit, it is overwritten with the eigenvectors of the original symmetric/Hermitian matrix (if evect is original), or the eigenvectors of the tridiagonal matrix (if evect is tridiagonal). (Not referenced if evect is none).
[in] ldc
: rocblas_int. ldc >= n if evect is original or tridiagonal. Specifies the leading dimension of C. (Not referenced if evect is none).
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, STEQR did not converge. i elements of E did not converge to zero.
rocsolver_<type>stedc()
rocblas_status rocsolver_zstedc(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, double *D, double *E, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_cstedc(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, float *D, float *E, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_dstedc(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, double *D, double *E, double *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_sstedc(rocblas_handle handle, const rocblas_evect evect, const rocblas_int n, float *D, float *E, float *C, const rocblas_int ldc, rocblas_int *info)
STEDC computes the eigenvalues and (optionally) eigenvectors of a symmetric tridiagonal matrix.
This function uses the divide and conquer method to compute the eigenvectors. The eigenvalues are returned in increasing order.
The matrix is not represented explicitly, but rather as the array of diagonal elements D and the array of symmetric off-diagonal elements E. When D and E correspond to the tridiagonal form of a full symmetric/Hermitian matrix, as returned by, e.g., SYTRD or HETRD, the eigenvectors of the original matrix can also be computed, depending on the value of evect.
 Parameters
[in] handle
: rocblas_handle.[in] evect
: rocblas_evect.Specifies how the eigenvectors are computed.
[in] n
: rocblas_int. n >= 0.The number of rows and columns of the tridiagonal matrix.
[inout] D
: pointer to real type. Array on the GPU of dimension n.On entry, the diagonal elements of the tridiagonal matrix. On exit, if info = 0, the eigenvalues in increasing order.
[inout] E
: pointer to real type. Array on the GPU of dimension n-1. On entry, the off-diagonal elements of the tridiagonal matrix. On exit, if info = 0, the values of this array are destroyed.
[inout] C
: pointer to type. Array on the GPU of dimension ldc*n.On entry, if evect is original, the orthogonal/unitary matrix used for the reduction to tridiagonal form as returned by, e.g.,
ORGTR or UNGTR. On exit, if info = 0, it is overwritten with the eigenvectors of the original symmetric/Hermitian matrix (if evect is original), or the eigenvectors of the tridiagonal matrix (if evect is tridiagonal). (Not referenced if evect is none).[in] ldc
: rocblas_int. ldc >= n if evect is original or tridiagonal.Specifies the leading dimension of C. (Not referenced if evect is none).
[out] info
: pointer to a rocblas_int on the GPU.If info = 0, successful exit. If info = i > 0, STEDC failed to compute an eigenvalue on the submatrix formed by the rows and columns info/(n+1) through mod(info,n+1).
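Once info has been copied back to the host, a positive value can be unpacked into the row/column range it reports. The following is a minimal sketch in plain C that only applies the two expressions given above, info/(n+1) and mod(info,n+1). The helper names are hypothetical and are not part of rocSOLVER.

```c
#include <assert.h>

/* Hypothetical host-side helpers (not part of rocSOLVER).
   They unpack a positive STEDC info value into the first and last
   row/column of the submatrix on which an eigenvalue was not computed. */

/* First row/column of the failing submatrix: info / (n + 1). */
int stedc_failed_first(int info, int n)
{
    assert(info > 0 && n >= 0);
    return info / (n + 1);
}

/* Last row/column of the failing submatrix: info mod (n + 1). */
int stedc_failed_last(int info, int n)
{
    assert(info > 0 && n >= 0);
    return info % (n + 1);
}
```

For example, with n = 10 a value of info = 40 would point at rows and columns 3 through 7.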
Symmetric matrices
List of functions for symmetric matrices
rocsolver_<type>lasyf()
rocblas_status rocsolver_zlasyf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nb, rocblas_int *kb, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_clasyf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nb, rocblas_int *kb, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dlasyf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nb, rocblas_int *kb, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_slasyf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nb, rocblas_int *kb, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
LASYF computes a partial factorization of a symmetric matrix \(A\) using Bunch-Kaufman diagonal pivoting.
The partial factorization has the form
\[\begin{split} A = \left[ \begin{array}{cc} I & U_{12} \\ 0 & U_{22} \end{array} \right] \left[ \begin{array}{cc} A_{11} & 0 \\ 0 & D \end{array} \right] \left[ \begin{array}{cc} I & 0 \\ U_{12}^T & U_{22}^T \end{array} \right] \end{split}\]
or
\[\begin{split} A = \left[ \begin{array}{cc} L_{11} & 0 \\ L_{21} & I \end{array} \right] \left[ \begin{array}{cc} D & 0 \\ 0 & A_{22} \end{array} \right] \left[ \begin{array}{cc} L_{11}^T & L_{21}^T \\ 0 & I \end{array} \right] \end{split}\]
depending on the value of uplo. The order of the block diagonal matrix \(D\) is either \(nb\) or \(nb-1\), and is returned in the argument \(kb\).
 Parameters
[in] handle
: rocblas_handle.[in] uplo
: rocblas_fill.Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0.The number of rows and columns of the matrix A.
[in] nb
: rocblas_int. 2 <= nb <= n.The number of columns of A to be factored.
[out] kb
: pointer to a rocblas_int on the GPU. The number of columns of A that were actually factored (either nb or nb-1).
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the symmetric matrix A to be factored. On exit, the partially factored matrix.
[in] lda
: rocblas_int. lda >= n.Specifies the leading dimension of A.
[out] ipiv
: pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. If uplo is upper, then only the last kb elements of ipiv will be set. For n - kb < k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0, then rows and columns k-1 and -ipiv[k] were interchanged and D[k-1,k-1] to D[k,k] is a 2-by-2 diagonal block. If uplo is lower, then only the first kb elements of ipiv will be set. For 1 <= k <= kb, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k+1] < 0, then rows and columns k+1 and -ipiv[k] were interchanged and D[k,k] to D[k+1,k+1] is a 2-by-2 diagonal block.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = j > 0, D is singular. D[j,j] is the first diagonal zero.
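The pivot encoding can be walked on the host to recover the block structure of D. The sketch below is plain C and covers only the uplo = lower case, where the first kb entries of ipiv are set and a 2-by-2 block is marked by two equal negative entries. It assumes ipiv has already been copied from the GPU into a host array; the function name is hypothetical and not part of rocSOLVER.

```c
#include <assert.h>

/* Hypothetical host-side helper (not part of rocSOLVER).
   Counts the 2-by-2 diagonal blocks of D described by the first kb
   entries of ipiv, for uplo = lower. The array is a host copy of ipiv;
   it is indexed from 0 in C but holds the 1-based pivot indices. */
int lasyf_count_2x2_blocks_lower(const int *ipiv, int kb)
{
    int blocks = 0;
    int k = 0;
    while (k < kb) {
        if (ipiv[k] > 0) {
            /* Positive entry: a 1-by-1 diagonal block. */
            k += 1;
        } else {
            /* Two equal negative entries: a 2-by-2 diagonal block. */
            assert(k + 1 < kb && ipiv[k + 1] == ipiv[k]);
            blocks += 1;
            k += 2;
        }
    }
    return blocks;
}
```

The number of 1-by-1 blocks then follows as kb minus twice the returned count.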
Orthonormal matrices
List of functions for orthonormal matrices
rocsolver_<type>org2r()
rocblas_status rocsolver_dorg2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorg2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORG2R generates an m-by-n matrix Q with orthonormal columns.
(This is the unblocked version of the algorithm).
The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m
\[ Q = H_1H_2\cdots H_k. \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQRF.
 Parameters
[in] handle
: rocblas_handle.[in] m
: rocblas_int. m >= 0.The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m.The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n.The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the matrix A as returned by
GEQRF, with the Householder vectors in the first k columns. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQRF.
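To make the definition of Q concrete, the following plain C sketch forms the first n columns of \(H_1H_2\cdots H_k\) on the CPU. It is an illustration only, not the rocSOLVER implementation. It assumes the usual LAPACK GEQRF conventions, which this section does not spell out: each reflector is \(H_i = I - \text{ipiv}[i]\, v_i v_i^T\), and \(v_i\) is stored below the diagonal of column i of A with an implicit unit entry on the diagonal. The function name and the separate output array Q are hypothetical.

```c
#include <assert.h>

/* Hypothetical CPU reference (not part of rocSOLVER), column-major storage.
   A holds the Householder vectors below its diagonal (unit diagonal is
   implicit); tau holds the Householder scalars (the ipiv argument above).
   Q (m-by-n, leading dimension ldq) receives the result. */
void org2r_reference(int m, int n, int k, const double *A, int lda,
                     const double *tau, double *Q, int ldq)
{
    assert(0 <= k && k <= n && n <= m);
    assert(lda >= m && ldq >= m);

    /* Start from the first n columns of the m-by-m identity. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            Q[i + j * ldq] = (i == j) ? 1.0 : 0.0;

    /* Q = H_1 H_2 ... H_k E, so H_k is applied first and H_1 last. */
    for (int r = k - 1; r >= 0; --r) {
        for (int j = 0; j < n; ++j) {
            /* w = tau_r * v_r^T Q(:,j), with v_r[r] = 1 and v_r[i] = 0 for i < r. */
            double w = Q[r + j * ldq];
            for (int i = r + 1; i < m; ++i)
                w += A[i + r * lda] * Q[i + j * ldq];
            w *= tau[r];

            /* Q(:,j) = Q(:,j) - w * v_r. */
            Q[r + j * ldq] -= w;
            for (int i = r + 1; i < m; ++i)
                Q[i + j * ldq] -= w * A[i + r * lda];
        }
    }
}
```

A reference of this kind is mainly useful for checking GPU results on small matrices; the library routines overwrite A in place instead of writing to a second array.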
rocsolver_<type>orgqr()
rocblas_status rocsolver_dorgqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORGQR generates an m-by-n matrix Q with orthonormal columns.
(This is the blocked version of the algorithm).
The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m
\[ Q = H_1H_2\cdots H_k \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQRF.
 Parameters
[in] handle
: rocblas_handle.[in] m
: rocblas_int. m >= 0.The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m.The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n.The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the matrix A as returned by
GEQRF, with the Householder vectors in the first k columns. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQRF.
rocsolver_<type>orgl2()
rocblas_status rocsolver_dorgl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORGL2 generates an m-by-n matrix Q with orthonormal rows.
(This is the unblocked version of the algorithm).
The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n
\[ Q = H_kH_{k-1}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GELQF.
 Parameters
[in] handle
: rocblas_handle.[in] m
: rocblas_int. 0 <= m <= n.The number of rows of the matrix Q.
[in] n
: rocblas_int. n >= 0.The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= m.The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the matrix A as returned by
GELQF, with the Householder vectors in the first k rows. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GELQF.
rocsolver_<type>orglq()
rocblas_status rocsolver_dorglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORGLQ generates an m-by-n matrix Q with orthonormal rows.
(This is the blocked version of the algorithm).
The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n
\[ Q = H_kH_{k-1}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GELQF.
 Parameters
[in] handle
: rocblas_handle.[in] m
: rocblas_int. 0 <= m <= n.The number of rows of the matrix Q.
[in] n
: rocblas_int. n >= 0.The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= m.The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the matrix A as returned by
GELQF, with the Householder vectors in the first k rows. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GELQF.
rocsolver_<type>org2l()
rocblas_status rocsolver_dorg2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorg2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORG2L generates an m-by-n matrix Q with orthonormal columns.
(This is the unblocked version of the algorithm).
The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m
\[ Q = H_kH_{k-1}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQLF.
 Parameters
[in] handle
: rocblas_handle.[in] m
: rocblas_int. m >= 0.The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m.The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n.The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the matrix A as returned by
GEQLF, with the Householder vectors in the last k columns. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQLF.
rocsolver_<type>orgql()
rocblas_status rocsolver_dorgql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORGQL generates an m-by-n matrix Q with orthonormal columns.
(This is the blocked version of the algorithm).
The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m
\[ Q = H_kH_{k-1}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQLF.
 Parameters
[in] handle
: rocblas_handle.[in] m
: rocblas_int. m >= 0.The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m.The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n.The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the matrix A as returned by
GEQLF, with the Householder vectors in the last k columns. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQLF.
rocsolver_<type>orgbr()
rocblas_status rocsolver_dorgbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)
ORGBR generates an m-by-n matrix Q with orthonormal rows or columns.
If storev is columnwise, then the matrix Q has orthonormal columns. If m >= k, Q is defined as the first n columns of the product of k Householder reflectors of order m
\[ Q = H_1H_2\cdots H_k \]
If m < k, Q is defined as the product of Householder reflectors of order m
\[ Q = H_1H_2\cdots H_{m-1} \]
On the other hand, if storev is rowwise, then the matrix Q has orthonormal rows. If n > k, Q is defined as the first m rows of the product of k Householder reflectors of order n
\[ Q = H_kH_{k-1}\cdots H_1 \]
If n <= k, Q is defined as the product of Householder reflectors of order n
\[ Q = H_{n-1}H_{n-2}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEBRD in its arguments A and tauq or taup.
 Parameters
[in] handle
: rocblas_handle.[in] storev
: rocblas_storev.Specifies whether to work columnwise or rowwise.
[in] m
: rocblas_int. m >= 0.The number of rows of the matrix Q. If rowwise, then min(n,k) <= m <= n.
[in] n
: rocblas_int. n >= 0.The number of columns of the matrix Q. If columnwise, then min(m,k) <= n <= m.
[in] k
: rocblas_int. k >= 0.The number of columns (if storev is columnwise) or rows (if rowwise) of the original matrix reduced by
GEBRD.[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the Householder vectors as returned by
GEBRD. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= m.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension min(m,k) if columnwise, or min(n,k) if rowwise.The Householder scalars as returned by
GEBRD.
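The four cases in the ORGBR description can be condensed into a small host-side helper that reports how many reflectors make up Q. This is a minimal sketch in plain C that follows the case analysis stated above; the function name is hypothetical and not part of rocSOLVER, and clamping the degenerate result to zero is an assumption added here for empty matrices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of rocSOLVER).
   Number of Householder reflectors in the product that defines Q:
     columnwise: k if m >= k, otherwise m - 1
     rowwise:    k if n >  k, otherwise n - 1 */
int orgbr_num_reflectors(bool columnwise, int m, int n, int k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    int count;
    if (columnwise)
        count = (m >= k) ? k : m - 1;
    else
        count = (n > k) ? k : n - 1;
    /* Assumption: an empty matrix has no reflectors rather than -1. */
    return (count > 0) ? count : 0;
}
```

Note that the two modes differ at the boundary: columnwise uses all k reflectors when m equals k, while rowwise with n equal to k uses only n - 1.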
rocsolver_<type>orgtr()
rocblas_status rocsolver_dorgtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)
ORGTR generates an n-by-n orthogonal matrix Q.
Q is defined as the product of n-1 Householder reflectors of order n. If uplo indicates upper, then Q has the form
\[ Q = H_{n-1}H_{n-2}\cdots H_1 \]
On the other hand, if uplo indicates lower, then Q has the form
\[ Q = H_1H_2\cdots H_{n-1} \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by SYTRD in its arguments A and tau.
 Parameters
[in] handle
: rocblas_handle.[in] uplo
: rocblas_fill.Specifies whether the
SYTRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.[in] n
: rocblas_int. n >= 0.The number of rows and columns of the matrix Q.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n.On entry, the Householder vectors as returned by
SYTRD. On exit, the computed matrix Q.[in] lda
: rocblas_int. lda >= n.Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension n-1. The Householder scalars as returned by
SYTRD.
rocsolver_<type>orm2r()
rocblas_status rocsolver_dorm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORM2R multiplies a matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the unblocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2 \cdots H_k \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.
 Parameters
[in] handle
: rocblas_handle.[in] side
: rocblas_side.Specifies from which side to apply Q.
[in] trans
: rocblas_operation.Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0.Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0.Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k.The Householder vectors as returned by
GEQRF in the first k columns of its argument A.[in] lda
: rocblas_int. lda >= m if side is left, or lda >= n if side is right.Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQRF.[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m.Leading dimension of C.
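The argument constraints above determine how many elements each device array needs before the call. The sketch below is plain C and simply restates the documented bounds (k <= m or k <= n depending on side, lda at least the order of Q, ldc >= m) together with the documented sizes lda*k, k and ldc*n. The function name and the integer flag for side are hypothetical and not part of rocSOLVER; the same bounds are documented for ORMQR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of rocSOLVER).
   Checks the documented ORM2R argument constraints and reports the
   minimum number of elements in the device arrays A, ipiv and C.
   left != 0 means Q is applied from the left. */
void orm2r_array_sizes(int left, int m, int n, int k, int lda, int ldc,
                       size_t *size_a, size_t *size_ipiv, size_t *size_c)
{
    int q = left ? m : n; /* order of Q */

    assert(m >= 0 && n >= 0);
    assert(k >= 0 && k <= q);
    assert(lda >= q);
    assert(ldc >= m);

    *size_a = (size_t)lda * (size_t)k;  /* Householder vectors  */
    *size_ipiv = (size_t)k;             /* Householder scalars  */
    *size_c = (size_t)ldc * (size_t)n;  /* matrix C             */
}
```

Multiplying each count by the size of the element type gives the byte count to allocate on the device.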
rocsolver_<type>ormqr()
rocblas_status rocsolver_dormqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORMQR multiplies a matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the blocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.
 Parameters
[in] handle
: rocblas_handle.[in] side
: rocblas_side.Specifies from which side to apply Q.
[in] trans
: rocblas_operation.Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0.Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0.Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k.The Householder vectors as returned by
GEQRF in the first k columns of its argument A.[in] lda
: rocblas_int. lda >= m if side is left, or lda >= n if side is right.Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQRF.[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m.Leading dimension of C.
rocsolver_<type>orml2()
rocblas_status rocsolver_dorml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORML2 multiplies a matrix Q with orthonormal rows by a general m-by-n matrix C.
(This is the unblocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_kH_{k-1}\cdots H_1 \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.
 Parameters
[in] handle
: rocblas_handle.[in] side
: rocblas_side.Specifies from which side to apply Q.
[in] trans
: rocblas_operation.Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0.Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0.Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.The Householder vectors as returned by
GELQF in the first k rows of its argument A.[in] lda
: rocblas_int. lda >= k.Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GELQF.[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m.Leading dimension of C.
rocsolver_<type>ormlq()
rocblas_status rocsolver_dormlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORMLQ multiplies a matrix Q with orthonormal rows by a general m-by-n matrix C.
(This is the blocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_kH_{k-1}\cdots H_1 \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.
 Parameters
[in] handle
: rocblas_handle.[in] side
: rocblas_side.Specifies from which side to apply Q.
[in] trans
: rocblas_operation.Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0.Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0.Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.The Householder vectors as returned by
GELQF in the first k rows of its argument A.[in] lda
: rocblas_int. lda >= k.Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GELQF.[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m.Leading dimension of C.
rocsolver_<type>orm2l()
rocblas_status rocsolver_dorm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORM2L multiplies a matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the unblocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_kH_{k-1}\cdots H_1 \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.
 Parameters
[in] handle
: rocblas_handle.[in] side
: rocblas_side.Specifies from which side to apply Q.
[in] trans
: rocblas_operation.Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0.Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0.Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k.The Householder vectors as returned by
GEQLF in the last k columns of its argument A.[in] lda
: rocblas_int. lda >= m if side is left, lda >= n if side is right.Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQLF.[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m.Leading dimension of C.
rocsolver_<type>ormql()
rocblas_status rocsolver_dormql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORMQL multiplies a matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the blocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_kH_{k-1}\cdots H_1 \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.
 Parameters
[in] handle
: rocblas_handle.[in] side
: rocblas_side.Specifies from which side to apply Q.
[in] trans
: rocblas_operation.Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0.Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0.Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k.The Householder vectors as returned by
GEQLF in the last k columns of its argument A.[in] lda
: rocblas_int. lda >= m if side is left, lda >= n if side is right.Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k.The Householder scalars as returned by
GEQLF.[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m.Leading dimension of C.
rocsolver_<type>ormbr()
rocblas_status rocsolver_dormbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)
ORMBR multiplies a matrix Q with orthonormal rows or columns by a general m-by-n matrix C.
If storev is columnwise, then the matrix Q has orthonormal columns. If storev is rowwise, then the matrix Q has orthonormal rows. The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
The order q of the orthogonal matrix Q is q = m if applying from the left, or q = n if applying from the right.
When storev is columnwise, if q >= k, then Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k, \]
and if q < k, then Q is defined as the product
\[ Q = H_1H_2\cdots H_{q-1}. \]
When storev is rowwise, if q > k, then Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k, \]
and if q <= k, Q is defined as the product
\[ Q = H_1H_2\cdots H_{q-1}. \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors and scalars as returned by GEBRD in its arguments A and tauq or taup.
Parameters
[in] handle
: rocblas_handle.
[in] storev
: rocblas_storev. Specifies whether to work columnwise or rowwise.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0. The number of columns (if storev is columnwise) or rows (if rowwise) of the original matrix reduced by GEBRD.
[in] A
: pointer to type. Array on the GPU of size lda*min(q,k) if columnwise, or lda*q if rowwise. The Householder vectors as returned by GEBRD.
[in] lda
: rocblas_int. lda >= q if columnwise, or lda >= min(q,k) if rowwise. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least min(q,k). The Householder scalars as returned by GEBRD.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
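The ORMBR dimension rules can be collected in a small host-side helper for sizing buffers before a call. The following is a minimal sketch and not part of rocSOLVER: the names Side, StoreV, OrmbrLayout, and ormbr_layout are hypothetical, and the helper only restates the documented rules for q, the reflector count, lda, the size of A, and the length of ipiv.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for rocblas_side and rocblas_storev (illustration only).
enum class Side { Left, Right };
enum class StoreV { ColumnWise, RowWise };

// Quantities derived from the documented ORMBR parameter rules.
struct OrmbrLayout {
    int q;          // order of Q: m when applied from the left, n from the right
    int reflectors; // number of Householder reflectors whose product is Q
    int min_lda;    // smallest leading dimension of A allowed by the rules
    int a_cols;     // number of columns of A (A holds lda * a_cols entries)
    int min_ipiv;   // smallest allowed length of ipiv
};

OrmbrLayout ormbr_layout(StoreV storev, Side side, int m, int n, int k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    const int q = (side == Side::Left) ? m : n;
    const int qk = std::min(q, k);
    OrmbrLayout out{};
    out.q = q;
    out.min_ipiv = qk;
    if (storev == StoreV::ColumnWise) {
        // q >= k: k reflectors; q < k: q-1 reflectors (clamped at zero).
        out.reflectors = (q >= k) ? k : std::max(q - 1, 0);
        out.min_lda = q;
        out.a_cols = qk;
    } else {
        // q > k: k reflectors; q <= k: q-1 reflectors (clamped at zero).
        out.reflectors = (q > k) ? k : std::max(q - 1, 0);
        out.min_lda = qk;
        out.a_cols = q;
    }
    return out;
}
```

For instance, with storev columnwise, side left, m = 6, n = 4, and k = 3, the helper reports q = 6, three reflectors, lda >= 6, three columns in A, and an ipiv of length at least 3.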
rocsolver_<type>ormtr()

rocblas_status rocsolver_dormtr(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)

rocblas_status rocsolver_sormtr(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMTR multiplies an orthogonal matrix Q by a general m-by-n matrix C.
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^TC & \: \text{Transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^T & \: \text{Transpose from the right.} \end{array} \end{split}\]
The order q of the orthogonal matrix Q is q = m if applying from the left, or q = n if applying from the right.
Q is defined as a product of q-1 Householder reflectors. If uplo indicates upper, then Q has the form
\[ Q = H_{q-1}H_{q-2}\cdots H_1. \]
On the other hand, if uplo indicates lower, then Q has the form
\[ Q = H_1H_2\cdots H_{q-1}. \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors and scalars as returned by SYTRD in its arguments A and tau.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] uplo
: rocblas_fill. Specifies whether the SYTRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] A
: pointer to type. Array on the GPU of size lda*q. On entry, the Householder vectors as returned by SYTRD.
[in] lda
: rocblas_int. lda >= q. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least q-1. The Householder scalars as returned by SYTRD.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
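The transposed forms of ORMTR follow from the products that define Q. As a short worked step, assume the usual LAPACK form of a real reflector, \(H_i = I - \tau_i v_i v_i^T\) (this form is an assumption here, since the output of SYTRD is described elsewhere). Each \(H_i\) is then symmetric, so for uplo lower
\[ Q^T = (H_1H_2\cdots H_{q-1})^T = H_{q-1}^T\cdots H_2^TH_1^T = H_{q-1}\cdots H_2H_1, \]
that is, applying \(Q^T\) uses the same reflectors in the reverse order.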
Unitary matrices
List of functions for unitary matrices
rocsolver_<type>ung2r()

rocblas_status rocsolver_zung2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cung2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNG2R generates an m-by-n complex matrix Q with orthonormal columns.
(This is the unblocked version of the algorithm).
The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m
\[ Q = H_1H_2\cdots H_k \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQRF.
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m. The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n. The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A as returned by GEQRF, with the Householder vectors in the first k columns. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQRF.
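To make the definition of Q concrete, the following is a minimal CPU reference sketch of what UNG2R computes; it is not rocSOLVER code and does not run on the GPU. It assumes the LAPACK storage convention for GEQRF output, which is not stated in this section: \(H_i = I - \text{ipiv}[i]\, v_i v_i^H\), where \(v_i\) is zero above row i, one in row i, and stored below the diagonal in column i of A. The name ung2r_reference is hypothetical, and for clarity the sketch returns Q in a separate array instead of overwriting A.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// CPU reference sketch (assumed LAPACK storage convention, see the text above).
// A is column-major with leading dimension lda. The result is the m-by-n
// matrix Q, column-major with leading dimension m.
std::vector<cplx> ung2r_reference(int m, int n, int k,
                                  const std::vector<cplx>& A, int lda,
                                  const std::vector<cplx>& ipiv) {
    assert(m >= 0 && n >= 0 && n <= m && k >= 0 && k <= n && lda >= m);
    auto at = [m](int r, int c) { return static_cast<std::size_t>(c) * m + r; };

    // Start from the first n columns of the m-by-m identity.
    std::vector<cplx> Q(static_cast<std::size_t>(m) * n, cplx(0.0, 0.0));
    for (int j = 0; j < n; ++j) Q[at(j, j)] = cplx(1.0, 0.0);

    std::vector<cplx> v(m);
    // Q = H_1 (H_2 (... (H_k E))): apply H_k first and H_1 last.
    for (int i = k - 1; i >= 0; --i) {
        // Rebuild the Householder vector v_i from column i of A.
        for (int r = 0; r < m; ++r) {
            if (r < i)       v[r] = cplx(0.0, 0.0);
            else if (r == i) v[r] = cplx(1.0, 0.0);
            else             v[r] = A[static_cast<std::size_t>(i) * lda + r];
        }
        for (int j = 0; j < n; ++j) {
            cplx w(0.0, 0.0); // w = v_i^H * Q(:, j)
            for (int r = i; r < m; ++r) w += std::conj(v[r]) * Q[at(r, j)];
            // Q(:, j) -= ipiv[i] * v_i * w
            for (int r = i; r < m; ++r) Q[at(r, j)] -= ipiv[i] * v[r] * w;
        }
    }
    return Q;
}
```

A sketch like this can serve as a small-matrix cross-check against results copied back from the GPU, provided the storage convention assumed here matches the one used by GEQRF.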
rocsolver_<type>ungqr()

rocblas_status rocsolver_zungqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cungqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGQR generates an m-by-n complex matrix Q with orthonormal columns.
(This is the blocked version of the algorithm).
The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m
\[ Q = H_1H_2\cdots H_k \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQRF.
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m. The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n. The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A as returned by GEQRF, with the Householder vectors in the first k columns. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQRF.
rocsolver_<type>ungl2()

rocblas_status rocsolver_zungl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cungl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGL2 generates an m-by-n complex matrix Q with orthonormal rows.
(This is the unblocked version of the algorithm).
The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n
\[ Q = H_k^HH_{k-1}^H\cdots H_1^H \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GELQF.
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. 0 <= m <= n. The number of rows of the matrix Q.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= m. The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A as returned by GELQF, with the Householder vectors in the first k rows. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GELQF.
rocsolver_<type>unglq()

rocblas_status rocsolver_zunglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cunglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGLQ generates an m-by-n complex matrix Q with orthonormal rows.
(This is the blocked version of the algorithm).
The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n
\[ Q = H_k^HH_{k-1}^H\cdots H_1^H \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GELQF.
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. 0 <= m <= n. The number of rows of the matrix Q.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= m. The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A as returned by GELQF, with the Householder vectors in the first k rows. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GELQF.
rocsolver_<type>ung2l()

rocblas_status rocsolver_zung2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cung2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNG2L generates an m-by-n complex matrix Q with orthonormal columns.
(This is the unblocked version of the algorithm).
The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m
\[ Q = H_kH_{k-1}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQLF.
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m. The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n. The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A as returned by GEQLF, with the Householder vectors in the last k columns. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQLF.
rocsolver_<type>ungql()

rocblas_status rocsolver_zungql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cungql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGQL generates an m-by-n complex matrix Q with orthonormal columns.
(This is the blocked version of the algorithm).
The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m
\[ Q = H_kH_{k-1}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEQLF.
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix Q.
[in] n
: rocblas_int. 0 <= n <= m. The number of columns of the matrix Q.
[in] k
: rocblas_int. 0 <= k <= n. The number of Householder reflectors.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A as returned by GEQLF, with the Householder vectors in the last k columns. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQLF.
rocsolver_<type>ungbr()

rocblas_status rocsolver_zungbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cungbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGBR generates an m-by-n complex matrix Q with orthonormal rows or columns.
If storev is columnwise, then the matrix Q has orthonormal columns. If m >= k, Q is defined as the first n columns of the product of k Householder reflectors of order m
\[ Q = H_1H_2\cdots H_k \]
If m < k, Q is defined as the product of Householder reflectors of order m
\[ Q = H_1H_2\cdots H_{m-1} \]
On the other hand, if storev is rowwise, then the matrix Q has orthonormal rows. If n > k, Q is defined as the first m rows of the product of k Householder reflectors of order n
\[ Q = H_kH_{k-1}\cdots H_1 \]
If n <= k, Q is defined as the product of Householder reflectors of order n
\[ Q = H_{n-1}H_{n-2}\cdots H_1 \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by GEBRD in its arguments A and tauq or taup.
Parameters
[in] handle
: rocblas_handle.
[in] storev
: rocblas_storev. Specifies whether to work columnwise or rowwise.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix Q. If rowwise, then min(n,k) <= m <= n.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix Q. If columnwise, then min(m,k) <= n <= m.
[in] k
: rocblas_int. k >= 0. The number of columns (if storev is columnwise) or rows (if rowwise) of the original matrix reduced by GEBRD.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the Householder vectors as returned by GEBRD. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension min(m,k) if columnwise, or min(n,k) if rowwise. The Householder scalars as returned by GEBRD.
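The UNGBR size constraints depend on storev, which makes them easy to get wrong. The following is a minimal host-side sketch, not part of rocSOLVER, that restates the documented rules as a pre-call check. The names Storage, ungbr_dims_ok, and ungbr_ipiv_length are hypothetical, and how the library itself reports invalid sizes is not covered here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for rocblas_storev (illustration only).
enum class Storage { ColumnWise, RowWise };

// Returns true when (m, n, k, lda) satisfy the documented UNGBR constraints.
bool ungbr_dims_ok(Storage storev, int m, int n, int k, int lda) {
    if (m < 0 || n < 0 || k < 0 || lda < m) return false;
    if (storev == Storage::ColumnWise) {
        return std::min(m, k) <= n && n <= m; // min(m,k) <= n <= m
    }
    return std::min(n, k) <= m && m <= n;     // min(n,k) <= m <= n
}

// Documented length of ipiv: min(m,k) if columnwise, min(n,k) if rowwise.
int ungbr_ipiv_length(Storage storev, int m, int n, int k) {
    return (storev == Storage::ColumnWise) ? std::min(m, k) : std::min(n, k);
}
```

For example, a columnwise call with m = 5, k = 3 requires 3 <= n <= 5, so n = 2 is rejected by the check while n = 3 is accepted.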
rocsolver_<type>ungtr()

rocblas_status rocsolver_zungtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cungtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGTR generates an n-by-n unitary matrix Q.
Q is defined as the product of n-1 Householder reflectors of order n. If uplo indicates upper, then Q has the form
\[ Q = H_{n-1}H_{n-2}\cdots H_1 \]
On the other hand, if uplo indicates lower, then Q has the form
\[ Q = H_1H_2\cdots H_{n-1} \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors \(v_i\) and scalars \(\text{ipiv}[i]\), as returned by HETRD in its arguments A and tau.
Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the HETRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of the matrix Q.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the Householder vectors as returned by HETRD. On exit, the computed matrix Q.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension n-1. The Householder scalars as returned by HETRD.
rocsolver_<type>unm2r()

rocblas_status rocsolver_zunm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNM2R multiplies a complex matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the unblocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right. The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k. The Householder vectors as returned by GEQRF in the first k columns of its argument A.
[in] lda
: rocblas_int. lda >= m if side is left, or lda >= n if side is right. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQRF.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
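The order in which the reflectors are applied depends on trans. The following is a minimal CPU reference sketch for side = left only; it is not rocSOLVER code. It assumes the LAPACK storage convention for GEQRF output, which this section does not state: \(H_i = I - \text{ipiv}[i]\, v_i v_i^H\), with \(v_i\) zero above row i, one in row i, and stored below the diagonal in column i of A. The name unm2r_left_reference and the boolean conj_trans (standing in for the trans argument) are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// CPU reference sketch for side = left (assumed LAPACK storage convention).
// C is m-by-n, column-major with leading dimension ldc, and is overwritten
// with Q*C, or with Q^H*C when conj_trans is true.
void unm2r_left_reference(bool conj_trans, int m, int n, int k,
                          const std::vector<cplx>& A, int lda,
                          const std::vector<cplx>& ipiv,
                          std::vector<cplx>& C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0 && k <= m && lda >= m && ldc >= m);
    // Q*C   = H_1 (... (H_k C)):     apply H_k first.
    // Q^H*C = H_k^H (... (H_1^H C)): apply H_1^H first.
    for (int step = 0; step < k; ++step) {
        const int i = conj_trans ? step : k - 1 - step;
        // H_i^H = I - conj(ipiv[i]) * v_i * v_i^H
        const cplx tau = conj_trans ? std::conj(ipiv[i]) : ipiv[i];
        const cplx* a = &A[static_cast<std::size_t>(i) * lda];
        for (int j = 0; j < n; ++j) {
            cplx* c = &C[static_cast<std::size_t>(j) * ldc];
            cplx w = c[i]; // v_i[i] = 1, so its term is c[i] itself
            for (int r = i + 1; r < m; ++r) w += std::conj(a[r]) * c[r];
            c[i] -= tau * w;
            for (int r = i + 1; r < m; ++r) c[r] -= tau * a[r] * w;
        }
    }
}
```

With a single reflector built from \(v_1 = (1, 0)^T\) and scalar \(1 + i\), \(H_1 = \mathrm{diag}(-i, 1)\), so applying Q to \((1, 1)^T\) gives \((-i, 1)^T\) and applying \(Q^H\) gives \((i, 1)^T\).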
rocsolver_<type>unmqr()

rocblas_status rocsolver_zunmqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunmqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMQR multiplies a complex matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the blocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right. The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k. The Householder vectors as returned by GEQRF in the first k columns of its argument A.
[in] lda
: rocblas_int. lda >= m if side is left, or lda >= n if side is right. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQRF.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
rocsolver_<type>unml2()

rocblas_status rocsolver_zunml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNML2 multiplies a complex matrix Q with orthonormal rows by a general m-by-n matrix C.
(This is the unblocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_k^HH_{k-1}^H\cdots H_1^H \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right. The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right. The Householder vectors as returned by GELQF in the first k rows of its argument A.
[in] lda
: rocblas_int. lda >= k. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GELQF.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
rocsolver_<type>unmlq()

rocblas_status rocsolver_zunmlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunmlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMLQ multiplies a complex matrix Q with orthonormal rows by a general m-by-n matrix C.
(This is the blocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_k^HH_{k-1}^H\cdots H_1^H \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right. The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right. The Householder vectors as returned by GELQF in the first k rows of its argument A.
[in] lda
: rocblas_int. lda >= k. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GELQF.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
rocsolver_<type>unm2l()

rocblas_status rocsolver_zunm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNM2L multiplies a complex matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the unblocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_kH_{k-1}\cdots H_1 \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right. The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k. The Householder vectors as returned by GEQLF in the last k columns of its argument A.
[in] lda
: rocblas_int. lda >= m if side is left, lda >= n if side is right. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQLF.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
rocsolver_<type>unmql()

rocblas_status rocsolver_zunmql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunmql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMQL multiplies a complex matrix Q with orthonormal columns by a general m-by-n matrix C.
(This is the blocked version of the algorithm).
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
Q is defined as the product of k Householder reflectors
\[ Q = H_kH_{k-1}\cdots H_1 \]
of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.
Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right. The number of Householder reflectors that form Q.
[in] A
: pointer to type. Array on the GPU of size lda*k. The Householder vectors as returned by GEQLF in the last k columns of its argument A.
[in] lda
: rocblas_int. lda >= m if side is left, lda >= n if side is right. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least k. The Householder scalars as returned by GEQLF.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
rocsolver_<type>unmbr()

rocblas_status rocsolver_zunmbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status rocsolver_cunmbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMBR multiplies a complex matrix Q with orthonormal rows or columns by a general m-by-n matrix C.
If storev is columnwise, then the matrix Q has orthonormal columns. If storev is rowwise, then the matrix Q has orthonormal rows. The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]The order q of the unitary matrix Q is q = m if applying from the left, or q = n if applying from the right.
When storev is columnwise, if q >= k, then Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k, \]and if q < k, then Q is defined as the product
\[ Q = H_1H_2\cdots H_{q1}. \]When storev is rowwise, if q > k, then Q is defined as the product of k Householder reflectors
\[ Q = H_1H_2\cdots H_k, \]and if q <= k, Q is defined as the product
\[ Q = H_1H_2\cdots H_{q1}. \]The Householder matrices \(H_i\) are never stored, they are computed from its corresponding Householder vectors and scalars as returned by GEBRD in its arguments A and tauq or taup.
 Parameters
[in] handle
: rocblas_handle.
[in] storev
: rocblas_storev. Specifies whether to work column-wise or row-wise.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] k
: rocblas_int. k >= 0. The number of columns (if storev is column-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.
[in] A
: pointer to type. Array on the GPU of size lda*min(q,k) if column-wise, or lda*q if row-wise. The Householder vectors as returned by GEBRD.
[in] lda
: rocblas_int. lda >= q if column-wise, or lda >= min(q,k) if row-wise. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least min(q,k). The Householder scalars as returned by GEBRD.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
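The rules for the order q and for the number of Householder reflectors in the UNMBR description can be sketched on the host as follows. This is only an illustration of the stated case analysis: the enums Side and Storev and both function names are hypothetical stand-ins, not rocBLAS or rocSOLVER identifiers, and clamping the count at zero for q = 0 is an assumption added here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for rocblas_side and rocblas_storev (not the real enums).
enum class Side { Left, Right };
enum class Storev { ColumnWise, RowWise };

// Order q of Q: m when Q is applied from the left, n when applied from the right.
inline int order_of_q(Side side, int m, int n) {
    return side == Side::Left ? m : n;
}

// Number of Householder reflectors that define Q.
// Column-wise: k if q >= k, otherwise q-1. Row-wise: k if q > k, otherwise q-1.
// The result is clamped at 0 (assumption) so that q = 0 does not yield -1.
inline int reflector_count(Storev storev, int q, int k) {
    assert(q >= 0 && k >= 0);
    const bool use_k = (storev == Storev::ColumnWise) ? (q >= k) : (q > k);
    return use_k ? k : std::max(q - 1, 0);
}
```

For example, with q = k = 3 the column-wise case uses 3 reflectors while the row-wise case uses 2.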
rocsolver_<type>unmtr()

rocblas_status
rocsolver_zunmtr
(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)

rocblas_status
rocsolver_cunmtr
(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)
UNMTR multiplies a unitary matrix Q by a general m-by-n matrix C.
The matrix Q is applied in one of the following forms, depending on the values of side and trans:
\[\begin{split} \begin{array}{cl} QC & \: \text{No transpose from the left,}\\ Q^HC & \: \text{Conjugate transpose from the left,}\\ CQ & \: \text{No transpose from the right, and}\\ CQ^H & \: \text{Conjugate transpose from the right.} \end{array} \end{split}\]
The order q of the unitary matrix Q is q = m if applying from the left, or q = n if applying from the right.
Q is defined as a product of q-1 Householder reflectors. If uplo indicates upper, then Q has the form
\[ Q = H_{q-1}H_{q-2}\cdots H_1. \]
On the other hand, if uplo indicates lower, then Q has the form
\[ Q = H_1H_2\cdots H_{q-1}. \]
The Householder matrices \(H_i\) are never stored; they are computed from their corresponding Householder vectors and scalars as returned by HETRD in its arguments A and tau.
 Parameters
[in] handle
: rocblas_handle.
[in] side
: rocblas_side. Specifies from which side to apply Q.
[in] uplo
: rocblas_fill. Specifies whether the HETRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] trans
: rocblas_operation. Specifies whether the matrix Q or its conjugate transpose is to be applied.
[in] m
: rocblas_int. m >= 0. Number of rows of matrix C.
[in] n
: rocblas_int. n >= 0. Number of columns of matrix C.
[in] A
: pointer to type. Array on the GPU of size lda*q. On entry, the Householder vectors as returned by HETRD.
[in] lda
: rocblas_int. lda >= q. Leading dimension of A.
[in] ipiv
: pointer to type. Array on the GPU of dimension at least q-1. The Householder scalars as returned by HETRD.
[inout] C
: pointer to type. Array on the GPU of size ldc*n. On entry, the matrix C. On exit, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.
[in] ldc
: rocblas_int. ldc >= m. Leading dimension of C.
LAPACK Functions
LAPACK routines solve complex Numerical Linear Algebra problems. These functions are organized in the following categories:
Triangular factorizations. Based on Gaussian elimination.
Orthogonal factorizations. Based on Householder reflections.
Problem and matrix reductions. Transformation of matrices and problems into equivalent forms.
Linear-systems solvers. Based on triangular factorizations.
Least-squares solvers. Based on orthogonal factorizations.
Symmetric eigensolvers. Eigenproblems for symmetric matrices.
Singular value decomposition. Singular values and related problems for general matrices.
Note
Throughout the APIs' descriptions, we use the following notations:
x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.
If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X' to indicate X transposed or X conjugate transposed, accordingly.
x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.
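As a minimal sketch of how this notation relates to array storage, the helper below maps the element A[i,j] of the notation above to a 0-based array offset. It assumes column-major storage with leading dimension lda, which is consistent with the array sizes lda*n quoted in the parameter lists but is stated here as an assumption. The name offset_1based is hypothetical and not part of rocSOLVER.

```cpp
#include <cassert>
#include <cstddef>

// Offset of the element A[i,j] (row i, column j, both counted from 1)
// in a column-major array with leading dimension lda (assumed layout).
inline std::size_t offset_1based(int i, int j, int lda) {
    assert(i >= 1 && j >= 1 && lda >= i);
    return static_cast<std::size_t>(i - 1)
         + static_cast<std::size_t>(j - 1) * static_cast<std::size_t>(lda);
}
```

With lda = 4, the element A[3,2] is found at offset 6 of the underlying array.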
Triangular factorizations
List of triangular factorizations
rocsolver_<type>potf2()

rocblas_status
rocsolver_zpotf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)

rocblas_status
rocsolver_cpotf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)

rocblas_status
rocsolver_dpotf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)

rocblas_status
rocsolver_spotf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)
POTF2 computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.
(This is the unblocked version of the algorithm).
The factorization has the form:
\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]
U is an upper triangular matrix and L is lower triangular.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = j > 0, the leading minor of order j of A is not positive definite. The factorization stopped at this point.
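To illustrate what an unblocked Cholesky factorization computes, here is a host-side C++ sketch for the lower case (A = LL'), in real double precision with column-major storage. It is a reference illustration only and not rocSOLVER's GPU implementation. The function name cholesky_lower_unblocked is hypothetical, and its return value imitates the meaning of info described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Overwrites the lower triangle of the n-by-n matrix A (column-major, leading
// dimension lda) with its Cholesky factor L. The strict upper triangle is not used.
// Returns 0 on success, or j > 0 (counted from 1) if the leading minor of order j
// is not positive definite; the factorization stops at that point.
int cholesky_lower_unblocked(int n, std::vector<double>& A, int lda) {
    assert(n >= 0 && lda >= n);
    assert(A.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    for (int j = 0; j < n; ++j) {
        // Diagonal entry: subtract the squares of the already computed row entries.
        double d = A[j + j * lda];
        for (int p = 0; p < j; ++p)
            d -= A[j + p * lda] * A[j + p * lda];
        if (d <= 0.0)
            return j + 1;
        const double ljj = std::sqrt(d);
        A[j + j * lda] = ljj;
        // Entries below the diagonal in column j.
        for (int i = j + 1; i < n; ++i) {
            double s = A[i + j * lda];
            for (int p = 0; p < j; ++p)
                s -= A[i + p * lda] * A[j + p * lda];
            A[i + j * lda] = s / ljj;
        }
    }
    return 0;
}
```

For the 2-by-2 matrix with rows (4, 2) and (2, 3), the sketch produces L with rows (2, 0) and (1, sqrt(2)); for the indefinite matrix with rows (1, 2) and (2, 1) it returns 2.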
rocsolver_<type>potf2_batched()

rocblas_status
rocsolver_zpotf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cpotf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dpotf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_spotf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_i\) in the batch has the form:
\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]
\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix A_i.
[inout] A
: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A_i.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
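Because the batched routines report one info value per matrix, a caller typically scans the whole array rather than a single integer. The sketch below assumes the info array has already been copied from the GPU into a host std::vector; the function name failed_instances is hypothetical, and the returned positions are 0-based C++ indices rather than the 1-based indices of the notation above.

```cpp
#include <cstddef>
#include <vector>

// Returns the 0-based batch positions whose info value is nonzero,
// i.e. the matrices whose factorization did not complete successfully.
std::vector<int> failed_instances(const std::vector<int>& info) {
    std::vector<int> failed;
    for (std::size_t i = 0; i < info.size(); ++i) {
        if (info[i] != 0)
            failed.push_back(static_cast<int>(i));
    }
    return failed;
}
```

An empty result means every matrix in the batch was factored successfully.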
rocsolver_<type>potf2_strided_batched()

rocblas_status
rocsolver_zpotf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cpotf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dpotf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_spotf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_i\) in the batch has the form:
\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]
\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix A_i.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A_i.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
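The strided layout can be made concrete with a few index computations. The sketch below is an illustration under stated assumptions: column-major storage, the normal use case strideA >= lda*n, and std::int64_t standing in for rocblas_stride. All three function names are hypothetical, and the buffer-length formula is derived here rather than quoted from rocSOLVER.

```cpp
#include <cassert>
#include <cstdint>

// Offset (in elements) of the first entry of the matrix at 0-based batch position `instance`.
inline std::int64_t instance_offset(std::int64_t instance, std::int64_t strideA) {
    return instance * strideA;
}

// Offset of the element in row r, column c (both counted from 1) of that matrix,
// assuming column-major storage with leading dimension lda.
inline std::int64_t element_offset(std::int64_t instance, std::int64_t strideA,
                                   int r, int c, int lda) {
    assert(r >= 1 && c >= 1 && lda >= r);
    return instance_offset(instance, strideA) + (r - 1)
         + static_cast<std::int64_t>(c - 1) * lda;
}

// Smallest buffer length (in elements) that holds batch_count matrices with n columns
// when strideA >= lda*n: the last matrix only needs lda*n elements.
inline std::int64_t strided_buffer_length(int n, int lda, std::int64_t strideA,
                                          int batch_count) {
    assert(strideA >= static_cast<std::int64_t>(lda) * n);
    if (batch_count <= 0)
        return 0;
    return (batch_count - 1) * strideA + static_cast<std::int64_t>(lda) * n;
}
```

For five 3-column matrices with lda = 4 and strideA = 16, the third matrix starts at offset 32 and the buffer needs at least 76 elements.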
rocsolver_<type>potrf()

rocblas_status
rocsolver_zpotrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)

rocblas_status
rocsolver_cpotrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)

rocblas_status
rocsolver_dpotrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)

rocblas_status
rocsolver_spotrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)
POTRF computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.
(This is the blocked version of the algorithm).
The factorization has the form:
\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]
U is an upper triangular matrix and L is lower triangular.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = j > 0, the leading minor of order j of A is not positive definite. The factorization stopped at this point.
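As background on what "blocked" means, one common way to organize a blocked Cholesky factorization in the lower case partitions A into a leading diagonal block and the remainder. The derivation below is a generic textbook sketch in the X' notation used in this section; it is not a statement about how rocSOLVER organizes its kernels.

```latex
\[
\begin{bmatrix} A_{11} & A_{21}' \\ A_{21} & A_{22} \end{bmatrix}
=
\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} L_{11}' & L_{21}' \\ 0 & L_{22}' \end{bmatrix}
\quad\Longrightarrow\quad
\begin{array}{l}
A_{11} = L_{11}L_{11}', \\
L_{21} = A_{21}\,(L_{11}')^{-1}, \\
A_{22} - L_{21}L_{21}' = L_{22}L_{22}'.
\end{array}
\]
```

The first relation is a small Cholesky factorization of the diagonal block, the second is a triangular solve, and the third is a symmetric (Hermitian) update after which the same steps are repeated on the trailing block.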
rocsolver_<type>potrf_batched()

rocblas_status
rocsolver_zpotrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cpotrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dpotrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_spotrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_i\) in the batch has the form:
\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]
\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix A_i.
[inout] A
: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A_i.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>potrf_strided_batched()

rocblas_status
rocsolver_zpotrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cpotrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dpotrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_spotrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_i\) in the batch has the form:
\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]
\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of matrix A_i.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A_i.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>getf2()

rocblas_status
rocsolver_zgetf2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_cgetf2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_dgetf2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_sgetf2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.
(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the "Tuning rocSOLVER performance" section of the Library Design Guide).
The factorization has the form
\[ A = PLU \]
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = j > 0, U is singular. U[j,j] is the first zero pivot.
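The following host-side C++ sketch shows an unblocked LU factorization with partial pivoting that produces outputs in the shape described above: L and U stored in place, 1-based pivot indices, and an info value marking the first zero pivot. It is a reference illustration for real double precision and column-major storage, not rocSOLVER's GPU implementation, and the function name lu_partial_pivoting is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Factors the m-by-n matrix A (column-major, leading dimension lda) in place.
// On return A holds L (strictly below the diagonal, unit diagonal not stored) and U.
// ipiv receives min(m,n) pivot rows counted from 1. The return value is 0 on
// success, or j > 0 if the j-th pivot is the first one that is exactly zero.
int lu_partial_pivoting(int m, int n, std::vector<double>& A, int lda,
                        std::vector<int>& ipiv) {
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(A.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    const int steps = m < n ? m : n;
    ipiv.assign(static_cast<std::size_t>(steps), 0);
    int info = 0;
    for (int j = 0; j < steps; ++j) {
        // Pick the entry of largest magnitude in column j, on or below the diagonal.
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (std::fabs(A[i + j * lda]) > std::fabs(A[p + j * lda]))
                p = i;
        ipiv[j] = p + 1;
        if (A[p + j * lda] == 0.0) {
            if (info == 0)
                info = j + 1;  // remember the first zero pivot and move on
            continue;
        }
        // Interchange rows j and p across all columns.
        if (p != j)
            for (int c = 0; c < n; ++c)
                std::swap(A[j + c * lda], A[p + c * lda]);
        // Scale the multipliers and update the trailing submatrix.
        for (int i = j + 1; i < m; ++i)
            A[i + j * lda] /= A[j + j * lda];
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return info;
}
```

For the matrix with rows (1, 2) and (3, 4), the sketch swaps the two rows first, so ipiv is (2, 2), U has rows (3, 4) and (0, 2/3), and the stored multiplier is 1/3.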
rocsolver_<type>getf2_batched()

rocblas_status
rocsolver_zgetf2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cgetf2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dgetf2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_sgetf2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.
(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the "Tuning rocSOLVER performance" section of the Library Design Guide).
The factorization of matrix \(A_i\) in the batch has the form
\[ A_i = P_iL_iU_i \]
where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all matrices A_i in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all matrices A_i in the batch.
[inout] A
: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_i.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
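Since all pivot vectors share one array separated by strideP, reading the pivots of a single matrix amounts to slicing that array. The sketch below assumes ipiv has already been copied to the host and that strideP >= min(m,n); the function name pivots_of_instance is hypothetical, and `instance` is a 0-based batch position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the min(m,n) pivot indices that belong to the matrix at 0-based
// batch position `instance`, read from a host copy of the shared ipiv array.
std::vector<int> pivots_of_instance(const std::vector<int>& ipiv, std::size_t strideP,
                                    int m, int n, std::size_t instance) {
    const std::size_t len = static_cast<std::size_t>(m < n ? m : n);
    const std::size_t first = instance * strideP;
    assert(first + len <= ipiv.size());
    const auto begin = ipiv.begin() + static_cast<std::ptrdiff_t>(first);
    return std::vector<int>(begin, begin + static_cast<std::ptrdiff_t>(len));
}
```

The returned values are still 1-based row numbers, exactly as stored by the factorization.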
rocsolver_<type>getf2_strided_batched()

rocblas_status
rocsolver_zgetf2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cgetf2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dgetf2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_sgetf2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.
(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the "Tuning rocSOLVER performance" section of the Library Design Guide).
The factorization of matrix \(A_i\) in the batch has the form
\[ A_i = P_iL_iU_i \]
where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all matrices A_i in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all matrices A_i in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_i.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>getrf()

rocblas_status
rocsolver_zgetrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_cgetrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_dgetrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_sgetrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.
(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the "Tuning rocSOLVER performance" section of the Library Design Guide).
The factorization has the form
\[ A = PLU \]
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = j > 0, U is singular. U[j,j] is the first zero pivot.
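The statement that P can be derived from ipiv can be sketched as follows: replaying the recorded interchanges in order on the identity ordering gives the row order of the permuted matrix. The function name row_order_from_pivots is hypothetical, ipiv is assumed to be a host copy holding 1-based values, and the result uses 0-based row numbers.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Replays the interchanges "row i <-> row ipiv[i]" in order, starting from the
// identity ordering of m rows. In the result, perm[r] is the 0-based row of the
// original matrix A that ends up as row r of the product L*U.
std::vector<int> row_order_from_pivots(int m, const std::vector<int>& ipiv) {
    std::vector<int> perm(static_cast<std::size_t>(m));
    std::iota(perm.begin(), perm.end(), 0);
    for (std::size_t i = 0; i < ipiv.size(); ++i)
        std::swap(perm[i], perm[static_cast<std::size_t>(ipiv[i] - 1)]);
    return perm;
}
```

For example, ipiv = (3, 3, 3) on a 3-row matrix yields the order (2, 0, 1): the first row of L*U is the original third row, followed by the original first and second rows.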
rocsolver_<type>getrf_batched()

rocblas_status
rocsolver_zgetrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cgetrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dgetrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_sgetrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.
(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the "Tuning rocSOLVER performance" section of the Library Design Guide).
The factorization of matrix \(A_i\) in the batch has the form
\[ A_i = P_iL_iU_i \]
where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all matrices A_i in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all matrices A_i in the batch.
[inout] A
: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_i.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
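The roles of strideP and of the info array can be sketched with two small host-side helpers. This is only an illustration under stated assumptions: ipiv and info have been copied back from the GPU into std::vector<int>, strideP is non-negative, and std::int64_t stands in for rocblas_stride. The names pivots_of and singular_instances are hypothetical and not part of rocSOLVER.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: returns the start of ipiv_i inside the host copy of
// ipiv, i.e. ipiv + i * strideP (0-based batch index i).
const int* pivots_of(const std::vector<int>& ipiv, std::int64_t strideP, int i)
{
    assert(i >= 0 && strideP >= 0);
    assert(static_cast<std::size_t>(i * strideP) < ipiv.size());
    return ipiv.data() + i * strideP;
}

// Hypothetical helper: lists (i, j) for every batch instance with
// info[i] = j > 0, meaning U_i[j,j] is the first zero pivot of A_i.
std::vector<std::pair<int, int>> singular_instances(const std::vector<int>& info)
{
    std::vector<std::pair<int, int>> out;
    for (std::size_t i = 0; i < info.size(); ++i)
        if (info[i] > 0)
            out.emplace_back(static_cast<int>(i), info[i]);
    return out;
}
```

With strideP = 3 and min(m,n) = 2, each ipiv_i occupies two entries followed by one unused padding entry, which is the "normal use case" strideP >= min(m,n) described above.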
rocsolver_<type>getrf_strided_batched()

rocblas_status
rocsolver_zgetrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_cgetrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dgetrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_sgetrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.
(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for mid-size matrices if optimizations are enabled (the default option). For more details, see the "Tuning rocSOLVER performance" section of the Library Design Guide).
The factorization of matrix \(A_i\) in the batch has the form
\[ A_i = P_iL_iU_i \]
where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all matrices A_i in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all matrices A_i in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_i.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
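The phrase "the size depends on the value of strideA" can be made concrete with a little index arithmetic. The sketch below assumes column-major storage (the LAPACK convention implied by the leading dimension lda), 0-based indices, a non-negative strideA, and std::int64_t as a stand-in for rocblas_stride. Both function names are hypothetical and not part of rocSOLVER.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: linear offset of element (r, c) of matrix A_i inside
// the single strided buffer A (all indices 0-based, column-major storage).
std::int64_t element_offset(int i, int r, int c, int lda, std::int64_t strideA)
{
    assert(i >= 0 && r >= 0 && c >= 0 && r < lda);
    return i * strideA + static_cast<std::int64_t>(c) * lda + r;
}

// Hypothetical helper: smallest number of elements the buffer must hold so
// that the last matrix in the batch (lda*n elements) still fits.
std::int64_t min_buffer_length(int n, int lda, std::int64_t strideA, int batch_count)
{
    assert(n >= 0 && lda >= 1 && strideA >= 0 && batch_count >= 0);
    if (batch_count == 0)
        return 0;
    return (batch_count - 1) * strideA + static_cast<std::int64_t>(lda) * n;
}
```

For example, with lda = 4, n = 3, strideA = 20 and batch_count = 5, the buffer needs at least 4*20 + 12 = 92 elements, and element (1, 3) of A_2 sits at offset 2*20 + 3*4 + 1 = 53.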
rocsolver_<type>sytf2()

rocblas_status
rocsolver_zsytf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_csytf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_dsytf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_ssytf2
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
SYTF2 computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.
(This is the unblocked version of the algorithm).
The factorization has the form
\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]
where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D(k)\).
Specifically, \(U\) and \(L\) are computed as
\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]
where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D(k)\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D(k)\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as
\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]
and
\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]
If \(s = 1\), then \(D(k)\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D(k)\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D(k)\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A.
[out] ipiv
: pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = j > 0, D is singular. D[j,j] is the first diagonal zero.
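The sign convention of ipiv determines the block structure of D, and the rule can be sketched on the host. The code below assumes ipiv has been copied from the GPU into a std::vector<int> and that it follows the convention described above; DiagBlock and diag_blocks are hypothetical names, not part of rocSOLVER. The scan runs from n down to 1 for uplo upper and from 1 up to n for uplo lower, mirroring the order in which k moves in the factorization.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record: a diagonal block of D, identified by the 1-based index
// of its first row/column and by its order (1 or 2).
struct DiagBlock
{
    int first;
    int order;
};

// Hypothetical helper: recovers the 1-by-1 / 2-by-2 block layout of D from
// the pivot vector, returned in increasing order of `first`.
std::vector<DiagBlock> diag_blocks(const std::vector<int>& ipiv, bool upper)
{
    const int n = static_cast<int>(ipiv.size());
    std::vector<DiagBlock> blocks;
    if (upper)
    {
        for (int k = n; k >= 1;)
        {
            if (ipiv[k - 1] > 0) { blocks.push_back({k, 1}); k -= 1; }
            else
            {
                // ipiv[k] = ipiv[k-1] < 0 marks a 2-by-2 block at (k-1, k)
                assert(k >= 2 && ipiv[k - 2] == ipiv[k - 1]);
                blocks.push_back({k - 1, 2});
                k -= 2;
            }
        }
        std::reverse(blocks.begin(), blocks.end());
    }
    else
    {
        for (int k = 1; k <= n;)
        {
            if (ipiv[k - 1] > 0) { blocks.push_back({k, 1}); k += 1; }
            else
            {
                // ipiv[k] = ipiv[k+1] < 0 marks a 2-by-2 block at (k, k+1)
                assert(k < n && ipiv[k] == ipiv[k - 1]);
                blocks.push_back({k, 2});
                k += 2;
            }
        }
    }
    return blocks;
}
```

For example, with uplo lower and ipiv = {1, -3, -3, 4}, D consists of a 1-by-1 block at index 1, a 2-by-2 block covering indices 2 and 3, and a 1-by-1 block at index 4.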
rocsolver_<type>sytf2_batched()

rocblas_status
rocsolver_zsytf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_csytf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dsytf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_ssytf2_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
SYTF2_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.
(This is the unblocked version of the algorithm).
The factorization has the form
\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]
where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).
Specifically, \(U_i\) and \(L_i\) are computed as
\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]
where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as
\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]
and
\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]
If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrices A_i is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of all matrices A_i in the batch.
[inout] A
: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of matrices A_i.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is n. Elements of ipiv_i are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>sytf2_strided_batched()

rocblas_status
rocsolver_zsytf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_csytf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dsytf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_ssytf2_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
SYTF2_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.
(This is the unblocked version of the algorithm).
The factorization has the form
\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]
where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).
Specifically, \(U_i\) and \(L_i\) are computed as
\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]
where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as
\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]
and
\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]
If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrices A_i is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of all matrices A_i in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of matrices A_i.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is n. Elements of ipiv_i are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>sytrf()

rocblas_status
rocsolver_zsytrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_csytrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_dsytrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

rocblas_status
rocsolver_ssytrf
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
SYTRF computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.
(This is the blocked version of the algorithm).
The factorization has the form
\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]
where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D(k)\).
Specifically, \(U\) and \(L\) are computed as
\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]
where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D(k)\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D(k)\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as
\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]
and
\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]
If \(s = 1\), then \(D(k)\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D(k)\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D(k)\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of A.
[out] ipiv
: pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.
[out] info
: pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = j > 0, D is singular. D[j,j] is the first diagonal zero.
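Since D is returned inside A rather than as a separate matrix, it can help to see how a dense copy of D might be assembled from the output. The sketch below is a host-side illustration only: it assumes the real double-precision case, column-major storage, and that A and ipiv have already been copied back from the GPU. The function name extract_D is hypothetical and not part of rocSOLVER. It relies on the fact that negative pivot entries always come in equal adjacent pairs, each pair marking one 2-by-2 block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a dense n-by-n, column-major copy of the block
// diagonal matrix D from the factored matrix A and the pivot vector ipiv.
std::vector<double> extract_D(const std::vector<double>& A, int lda,
                              const std::vector<int>& ipiv, bool upper)
{
    const int n = static_cast<int>(ipiv.size());
    assert(lda >= n && A.size() >= static_cast<std::size_t>(lda) * n);
    std::vector<double> D(static_cast<std::size_t>(n) * n, 0.0);
    // 0-based accessors into the column-major arrays
    auto a = [&](int r, int c) { return A[static_cast<std::size_t>(c) * lda + r]; };
    auto d = [&](int r, int c) -> double& { return D[static_cast<std::size_t>(c) * n + r]; };

    for (int k = 0; k < n; ++k)
    {
        d(k, k) = a(k, k);
        if (ipiv[k] < 0)
        {
            // a negative pair marks a 2-by-2 block covering rows/columns k and k+1
            assert(k + 1 < n && ipiv[k + 1] == ipiv[k]);
            // the off-diagonal entry lives in the stored triangle of A
            const double off = upper ? a(k, k + 1) : a(k + 1, k);
            d(k + 1, k) = off;
            d(k, k + 1) = off;
            d(k + 1, k + 1) = a(k + 1, k + 1);
            ++k; // skip the second row/column of the block
        }
    }
    return D;
}
```

The multipliers v stored in the remaining entries of A are ignored here; only the entries that belong to the diagonal blocks are copied.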
rocsolver_<type>sytrf_batched()

rocblas_status
rocsolver_zsytrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_csytrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dsytrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_ssytrf_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
SYTRF_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.
(This is the blocked version of the algorithm).
The factorization has the form
\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]
where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).
Specifically, \(U_i\) and \(L_i\) are computed as
\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]
where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as
\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]
and
\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]
If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrices A_i is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of all matrices A_i in the batch.
[inout] A
: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of matrices A_i.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is n. Elements of ipiv_i are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>sytrf_strided_batched()

rocblas_status
rocsolver_zsytrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_csytrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_dsytrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

rocblas_status
rocsolver_ssytrf_strided_batched
(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
SYTRF_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.
(This is the blocked version of the algorithm).
The factorization has the form
\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]
where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).
Specifically, \(U_i\) and \(L_i\) are computed as
\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]
where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as
\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]
and
\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]
If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).
 Parameters
[in] handle
: rocblas_handle.
[in] uplo
: rocblas_fill. Specifies whether the upper or lower part of the matrices A_i is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.
[in] n
: rocblas_int. n >= 0. The number of rows and columns of all matrices A_i in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.
[in] lda
: rocblas_int. lda >= n. Specifies the leading dimension of matrices A_i.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is n. Elements of ipiv_i are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.
[out] info
: pointer to rocblas_int. Array of batch_count integers on the GPU. If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
Orthogonal factorizations
rocsolver_<type>geqr2()

rocblas_status
rocsolver_zgeqr2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status
rocsolver_cgeqr2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status
rocsolver_dgeqr2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status
rocsolver_sgeqr2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)
GEQR2 computes a QR factorization of a general m-by-n matrix A.
(This is the unblocked version of the algorithm).
The factorization has the form
\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]
where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_1H_2\cdots H_k, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]
where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
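The output layout described above can be checked against a small host-side reference. The sketch below is not rocSOLVER's GPU implementation: it is a plain CPU version of unblocked Householder QR for real double data, following the usual LAPACK sign convention, and `householder_qr_unblocked` and `tau` are names assumed here for illustration (`tau` plays the role of ipiv). Storage is assumed to be column-major, as implied by lda >= m.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference sketch (real double only). On exit, R is stored on and above
// the diagonal of A, the last m - i entries of each v_i are stored below the
// diagonal, and tau holds the min(m,n) Householder scalars.
void householder_qr_unblocked(int m, int n, std::vector<double>& A, int lda,
                              std::vector<double>& tau)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(A.size() >= static_cast<std::size_t>(lda) * n);
    const int k = std::min(m, n);
    tau.assign(k, 0.0);
    for (int i = 0; i < k; ++i) {
        double* v = &A[i + static_cast<std::size_t>(i) * lda]; // v[0] is A[i,i]
        const int len = m - i;
        double xnorm = 0.0; // norm of the entries below the diagonal
        for (int r = 1; r < len; ++r) xnorm = std::hypot(xnorm, v[r]);
        if (xnorm == 0.0) continue; // H_i = I, so tau[i] stays 0
        const double alpha = v[0];
        const double beta = -std::copysign(std::hypot(alpha, xnorm), alpha);
        tau[i] = (beta - alpha) / beta;
        for (int r = 1; r < len; ++r) v[r] /= (alpha - beta); // scale so v_i[i] = 1
        v[0] = beta; // diagonal entry of R
        // Apply H_i = I - tau[i] * v_i * v_i' to the remaining columns.
        for (int j = i + 1; j < n; ++j) {
            double* c = &A[i + static_cast<std::size_t>(j) * lda];
            double w = c[0];
            for (int r = 1; r < len; ++r) w += v[r] * c[r];
            w *= tau[i];
            c[0] -= w;
            for (int r = 1; r < len; ++r) c[r] -= w * v[r];
        }
    }
}
```

For a real double matrix already resident on the GPU, the corresponding library call has the form rocsolver_dgeqr2(handle, m, n, A, lda, ipiv), as given in the signature above.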
rocsolver_<type>geqr2_batched()

rocblas_status
rocsolver_zgeqr2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgeqr2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgeqr2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgeqr2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]
where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
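The batched variant takes A as an array of pointers, one per matrix. One way to build such an array is to slice a single contiguous allocation, as sketched below on the host. `make_batch_pointers` is a hypothetical helper, not a rocSOLVER function, and the choice of one lda*n block per matrix is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns batch_count pointers into one contiguous
// buffer, where matrix A_j starts at element j * lda * n.
std::vector<double*> make_batch_pointers(double* base, int lda, int n,
                                         int batch_count)
{
    assert(lda >= 0 && n >= 0 && batch_count >= 0);
    const std::size_t matrix_elems = static_cast<std::size_t>(lda) * n;
    std::vector<double*> A(batch_count);
    for (int j = 0; j < batch_count; ++j)
        A[j] = base + j * matrix_elems;
    return A;
}
```

In actual use, base would be a device pointer, since each A_j must reside on the GPU. This section does not state where the pointer array itself must reside, so that should be checked against the library's batched conventions before passing it to a _batched routine.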
rocsolver_<type>geqr2_strided_batched()

rocblas_status
rocsolver_zgeqr2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgeqr2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgeqr2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgeqr2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]
where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
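The roles of strideA and strideP can be made concrete with a few index computations. This is a minimal sketch assuming column-major storage and 0-based indices; `stride_t`, `a_offset`, `ipiv_offset` and `a_min_elems` are hypothetical names, and `stride_t` is only a 64-bit stand-in for rocblas_stride.

```cpp
#include <cassert>
#include <cstdint>

using stride_t = std::int64_t; // assumed stand-in for rocblas_stride

// Element offset of entry (row, col) of matrix A_j inside the array A.
std::int64_t a_offset(std::int64_t j, int row, int col, int lda,
                      stride_t strideA)
{
    assert(row >= 0 && row < lda && col >= 0);
    return j * strideA + row + static_cast<std::int64_t>(col) * lda;
}

// Element offset of the i-th Householder scalar of matrix A_j inside ipiv.
std::int64_t ipiv_offset(std::int64_t j, int i, stride_t strideP)
{
    return j * strideP + i;
}

// Smallest number of elements the array A must hold in the normal use case
// strideA >= lda*n: the last matrix starts at (batch_count - 1) * strideA.
std::int64_t a_min_elems(int n, int lda, stride_t strideA, int batch_count)
{
    assert(strideA >= static_cast<std::int64_t>(lda) * n);
    if (batch_count == 0) return 0;
    return (batch_count - 1) * strideA + static_cast<std::int64_t>(lda) * n;
}
```

With lda = 5, n = 4 and strideA = 40, for example, each matrix occupies 20 elements and consecutive matrices are separated by 20 unused elements of padding.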
rocsolver_<type>geqrf()

rocblas_status
rocsolver_zgeqrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status
rocsolver_cgeqrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status
rocsolver_dgeqrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status
rocsolver_sgeqrf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)
GEQRF computes a QR factorization of a general m-by-n matrix A.
(This is the blocked version of the algorithm).
The factorization has the form
\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]
where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_1H_2\cdots H_k, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]
where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
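Because Q is returned only implicitly, through the Householder vectors stored below the diagonal and the scalars in ipiv, it is usually applied rather than formed. The host-side sketch below applies Q = H_1 H_2 ... H_k to a vector for real double data. `apply_q` and `tau` are names assumed for illustration (`tau` is a host copy of ipiv), and the factored matrix is assumed to have been copied back from the GPU in column-major order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Overwrites x (length m) with Q * x, where Q = H_1 H_2 ... H_k and
// H_i = I - tau[i] * v_i * v_i'. H_k acts on x first, H_1 last.
void apply_q(int m, int n, const std::vector<double>& A, int lda,
             const std::vector<double>& tau, std::vector<double>& x)
{
    const int k = std::min(m, n);
    assert(lda >= m && static_cast<int>(x.size()) == m);
    assert(static_cast<int>(tau.size()) >= k);
    for (int i = k - 1; i >= 0; --i) {
        // v[0] holds R[i,i]; the leading entry of v_i is an implicit 1.
        const double* v = &A[i + static_cast<std::size_t>(i) * lda];
        double w = x[i];
        for (int r = 1; r < m - i; ++r) w += v[r] * x[i + r];
        w *= tau[i];
        x[i] -= w;
        for (int r = 1; r < m - i; ++r) x[i + r] -= w * v[r];
    }
}
```

Applying this to each column of R (padded with zeros to length m) reproduces the corresponding column of the original matrix, which gives a simple consistency check of a factorization.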
rocsolver_<type>geqrf_batched()

rocblas_status
rocsolver_zgeqrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgeqrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgeqrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgeqrf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]
where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>geqrf_strided_batched()

rocblas_status
rocsolver_zgeqrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgeqrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgeqrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgeqrf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]
where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gerq2()

rocblas_status
rocsolver_zgerq2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status
rocsolver_cgerq2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status
rocsolver_dgerq2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status
rocsolver_sgerq2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)
GERQ2 computes an RQ factorization of a general m-by-n matrix A.
(This is the unblocked version of the algorithm).
The factorization has the form
\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]
where R is upper triangular (upper trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_1'H_2' \cdots H_k', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]
where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
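The sub/superdiagonal rule for locating R in the output reduces to a single comparison when written with 0-based indices. The sketch below uses a hypothetical predicate, `rq_entry_is_r`, that is not part of rocSOLVER. Entry (i, j) lies on the (m-n)-th subdiagonal when i - j = m - n and on the (n-m)-th superdiagonal when j - i = n - m, so both cases collapse to the same test.

```cpp
#include <cassert>

// True if the 0-based entry (i, j) of the m-by-n output array belongs to R,
// i.e. lies on or above the (m-n)-th subdiagonal (m >= n) or the
// (n-m)-th superdiagonal (n > m). Other entries hold Householder vector data.
bool rq_entry_is_r(int i, int j, int m, int n)
{
    assert(0 <= i && i < m && 0 <= j && j < n);
    return (j - i) >= (n - m);
}
```

For m = 2 and n = 4, for instance, R occupies the upper triangle of the last two columns, and entry (1, 2) below it holds part of a Householder vector.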
rocsolver_<type>gerq2_batched()

rocblas_status
rocsolver_zgerq2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgerq2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgerq2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgerq2_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GERQ2_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]
where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gerq2_strided_batched()

rocblas_status
rocsolver_zgerq2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgerq2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgerq2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgerq2_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GERQ2_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]
where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gerqf()

rocblas_status
rocsolver_zgerqf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status
rocsolver_cgerqf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status
rocsolver_dgerqf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status
rocsolver_sgerqf
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)
GERQF computes an RQ factorization of a general m-by-n matrix A.
(This is the blocked version of the algorithm).
The factorization has the form
\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]
where R is upper triangular (upper trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_1'H_2' \cdots H_k', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]
where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
rocsolver_<type>gerqf_batched()

rocblas_status
rocsolver_zgerqf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgerqf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgerqf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgerqf_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GERQF_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]
where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gerqf_strided_batched()

rocblas_status
rocsolver_zgerqf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_cgerqf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_dgerqf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status
rocsolver_sgerqf_strided_batched
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
GERQF_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]
where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>geql2()

rocblas_status
rocsolver_zgeql2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status
rocsolver_cgeql2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status
rocsolver_dgeql2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status
rocsolver_sgeql2
(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)
GEQL2 computes a QL factorization of a general m-by-n matrix A.
(This is the unblocked version of the algorithm).
The factorization has the form
\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]
where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_kH_{k-1}\cdots H_1, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]
where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
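The description of A on exit says which entries hold the factor L and which hold Householder vector data. Both cases (m >= n and n > m) reduce to a single test on zero-based indices: entry (row, col) belongs to L when row - col >= m - n. The host-only sketch below illustrates this for a column-major copy of the output that has been transferred back to the host. `holds_L` and `zero_out_reflectors` are hypothetical helper names, not rocSOLVER functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if the zero-based entry (row, col) of the factored m-by-n array lies
// on or below the (m-n)th subdiagonal (m >= n) or the (n-m)th superdiagonal
// (n > m), i.e. if it stores an element of L.
bool holds_L(int row, int col, int m, int n) {
    return row - col >= m - n;
}

// Returns a copy of the factored array in which every entry that does not
// belong to L is set to zero. For m >= n the result is the block [0; L];
// for m < n it is the lower trapezoidal L itself.
std::vector<double> zero_out_reflectors(const std::vector<double>& A, int m, int n, int lda) {
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(A.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    std::vector<double> out(A);
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            if (!holds_L(row, col, m, n)) {
                out[static_cast<std::size_t>(col) * lda + row] = 0.0;
            }
        }
    }
    return out;
}
```

For a 3-by-2 output, only the entries (1,0), (2,0) and (2,1) are kept; the remaining entries hold Householder vector data and are zeroed in the copy.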
rocsolver_<type>geql2_batched()

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]
where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
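In the batched interface, A is an array of pointers with one entry per matrix A_j. One possible way to prepare such a table, sketched here under the assumption that all matrices live in a single allocation of lda*n elements each, is to derive every pointer from a common base address. `make_batch_pointers` is a hypothetical helper, not a rocSOLVER function. The sketch only builds the table on the host and leaves out allocating device memory and transferring the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the base address of one allocation that holds
// batch_count matrices back to back, return the start address of each A_j.
template <typename T>
std::vector<T*> make_batch_pointers(T* base, std::size_t elements_per_matrix,
                                    std::size_t batch_count) {
    assert(base != nullptr || batch_count == 0);
    std::vector<T*> ptrs(batch_count);
    for (std::size_t j = 0; j < batch_count; ++j) {
        ptrs[j] = base + j * elements_per_matrix;  // start of matrix A_j
    }
    return ptrs;
}
```

With lda = 4 and n = 2, passing `elements_per_matrix = 8` places consecutive matrices 8 elements apart, which matches the array dimension lda*n stated for each pointer.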
rocsolver_<type>geql2_strided_batched()

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]
where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>geqlf()

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQLF computes a QL factorization of a general m-by-n matrix A.
(This is the blocked version of the algorithm).
The factorization has the form
\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]
where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_kH_{k-1}\cdots H_1, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]
where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
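Each factor of Q has the form H_i = I - ipiv[i] * v_i v_i', so applying it to a vector never requires forming the matrix: one dot product and one scaled subtraction are enough. The host-only sketch below shows this for real data. `apply_reflector` is a hypothetical helper name, and the code is an illustration of the formula rather than rocSOLVER's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Overwrites x with (I - tau * v * v^T) * x for real data.
// tau plays the role of ipiv[i] and v the role of the Householder vector v_i,
// stored here as a full-length vector including its unit and zero entries.
void apply_reflector(double tau, const std::vector<double>& v, std::vector<double>& x) {
    assert(v.size() == x.size());
    double dot = 0.0;  // v^T * x
    for (std::size_t r = 0; r < v.size(); ++r) {
        dot += v[r] * x[r];
    }
    const double w = tau * dot;
    for (std::size_t r = 0; r < v.size(); ++r) {
        x[r] -= w * v[r];
    }
}
```

For example, with v = (1, 0.5) and tau = 2 / (v'v) = 1.6, the vector (3, 4) is mapped to approximately (-5, 0). Applying the same reflector a second time recovers (3, 4) up to rounding, as expected of an orthogonal reflection.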
rocsolver_<type>geqlf_batched()

rocblas_status rocsolver_zgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_BATCHED computes the QL factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]
where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>geqlf_strided_batched()

rocblas_status rocsolver_zgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]
where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]
where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gelq2()

rocblas_status rocsolver_zgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status rocsolver_dgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status rocsolver_sgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQ2 computes an LQ factorization of a general m-by-n matrix A.
(This is the unblocked version of the algorithm).
The factorization has the form
\[ A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q \]
where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_k'H_{k-1}' \cdots H_1', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i' v_i \]
where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
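To make the output layout concrete, the following is a host-only reference sketch of an unblocked LQ factorization for real double data in column-major storage. It follows the textbook Householder scheme (in the style of the LAPACK reference routine DGELQ2) and is an assumption-laden illustration, not rocSOLVER's GPU implementation. `host_gelq2` is a hypothetical name, and the sketch omits the rescaling that production code uses to guard against underflow and overflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// On exit, L is stored on and below the diagonal of A, the entries to the
// right of the diagonal in row i hold the last n - i elements of v_i
// (the leading 1 is implicit), and tau[i] holds the Householder scalar.
void host_gelq2(int m, int n, std::vector<double>& A, int lda, std::vector<double>& tau) {
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(A.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    const int k = (m < n) ? m : n;
    tau.assign(static_cast<std::size_t>(k), 0.0);

    for (int i = 0; i < k; ++i) {
        // Build H_i so that it annihilates the entries of row i right of the diagonal.
        const double alpha = A[i + i * lda];
        double xnorm = 0.0;
        for (int c = i + 1; c < n; ++c) xnorm = std::hypot(xnorm, A[i + c * lda]);
        if (xnorm == 0.0) continue;  // nothing to annihilate: H_i = I, tau[i] = 0

        const double beta = -std::copysign(std::hypot(alpha, xnorm), alpha);
        tau[i] = (beta - alpha) / beta;
        const double scale = 1.0 / (alpha - beta);
        for (int c = i + 1; c < n; ++c) A[i + c * lda] *= scale;  // tail of v_i
        A[i + i * lda] = beta;                                     // diagonal entry of L

        // Apply H_i from the right to the remaining rows:
        // row_r := row_r - tau[i] * (row_r . v_i) * v_i
        for (int r = i + 1; r < m; ++r) {
            double w = A[r + i * lda];  // contribution of the implicit v_i[i] = 1
            for (int c = i + 1; c < n; ++c) w += A[r + c * lda] * A[i + c * lda];
            w *= tau[i];
            A[r + i * lda] -= w;
            for (int c = i + 1; c < n; ++c) A[r + c * lda] -= w * A[i + c * lda];
        }
    }
}
```

For the 2-by-2 matrix with rows (3, 4) and (1, 2), this sketch produces L with rows (-5, 0) and approximately (-2.2, 0.4), stores 0.5 above the diagonal as the tail of v_1, and sets tau to approximately (1.6, 0). The signs of the diagonal of L depend on the reflector convention, so another implementation may return a factor that differs by sign.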
rocsolver_<type>gelq2_batched()

rocblas_status rocsolver_zgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]
where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]
where the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gelq2_strided_batched()

rocblas_status rocsolver_zgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.
(This is the unblocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]
where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]
where the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gelqf()

rocblas_status rocsolver_zgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)

rocblas_status rocsolver_cgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

rocblas_status rocsolver_dgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)

rocblas_status rocsolver_sgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQF computes an LQ factorization of a general m-by-n matrix A.
(This is the blocked version of the algorithm).
The factorization has the form
\[ A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q \]
where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q = H_k'H_{k-1}' \cdots H_1', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_i\) is given by
\[ H_i = I - \text{ipiv}[i] \cdot v_i' v_i \]
where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of the matrix A.
[in] n
: rocblas_int. n >= 0. The number of columns of the matrix A.
[inout] A
: pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of A.
[out] ipiv
: pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
rocsolver_<type>gelqf_batched()

rocblas_status rocsolver_zgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]
where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]
where the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
rocsolver_<type>gelqf_strided_batched()

rocblas_status rocsolver_zgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_cgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_dgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

rocblas_status rocsolver_sgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.
(This is the blocked version of the algorithm).
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]
where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices
\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]
Each Householder matrix \(H_{j_i}\) is given by
\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]
where the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).
Parameters
[in] handle
: rocblas_handle.
[in] m
: rocblas_int. m >= 0. The number of rows of all the matrices A_j in the batch.
[in] n
: rocblas_int. n >= 0. The number of columns of all the matrices A_j in the batch.
[inout] A
: pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).
[in] lda
: rocblas_int. lda >= m. Specifies the leading dimension of matrices A_j.
[in] strideA
: rocblas_stride. Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
[out] ipiv
: pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_j of corresponding Householder scalars.
[in] strideP
: rocblas_stride. Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).
[in] batch_count
: rocblas_int. batch_count >= 0. Number of matrices in the batch.
Problem and matrix reductions
List of reductions
rocsolver_<type>gebd2()

rocblas_status rocsolver_zgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)

rocblas_status rocsolver_cgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)

rocblas_status rocsolver_dgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)

rocblas_status rocsolver_sgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBD2 computes the bidiagonal form of a general m-by-n matrix A.
(This is the unblocked version of the algorithm).
The bidiagonal form is given by:
\[ B = Q' A P \]
where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices
\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_n\: \text{and} \: P = G_1G_2\cdots G_{n-1}, & \: \text{if}\: m \geq n, \:\text{or}\\ Q = H_1H_2\cdots H_{m-1}\: \text{and} \: P = G_1G_2\cdots G_{m}, & \: \text{if}\: m < n. \end{array} \end{split}\]
The Householder matrices \(H_i\) and \(G_i\) are given by
\[\begin{split} \begin{array}{cl} H_i = I - \text{tauq}[i] \cdot v_i v_i', & \: \text{and}\\ G_i = I - \text{taup}[i] \cdot u_i' u_i. \end{array} \end{split}\]
If m >= n, the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\); while the first i elements of the Householder vector \(u_i\) are zero, and \(u_i[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_i\) are zero, and \(u_i[i] = 1\).
 Parameters
[in] handle
: rocblas_handle.[in]