rocSOLVER API

This section provides details of the rocSOLVER library API.

Types

rocSOLVER uses types and enumerations defined by the rocBLAS API. For more information, see the rocBLAS types.

Additional Types

These are types that extend the rocBLAS API.

rocblas_direct

enum rocblas_direct

Used to specify the order in which multiple elementary matrices are applied together.

Values:

enumerator rocblas_forward_direction

Elementary matrices applied from the right.

enumerator rocblas_backward_direction

Elementary matrices applied from the left.

rocblas_storev

enum rocblas_storev

Used to specify how Householder vectors are stored in a matrix of vectors.

Values:

enumerator rocblas_column_wise

Householder vectors are stored in the columns of a matrix.

enumerator rocblas_row_wise

Householder vectors are stored in the rows of a matrix.

rocblas_svect

enum rocblas_svect

Used to specify how the singular vectors are to be computed and stored.

Values:

enumerator rocblas_svect_all

The entire associated orthogonal/unitary matrix is computed.

enumerator rocblas_svect_singular

Only the singular vectors are computed and stored in the output array.

enumerator rocblas_svect_overwrite

Only the singular vectors are computed and overwrite the input matrix.

enumerator rocblas_svect_none

No singular vectors are computed.

rocblas_evect

enum rocblas_evect

Used to specify how the eigenvectors are to be computed.

Values:

enumerator rocblas_evect_original

Compute eigenvectors for the original symmetric/Hermitian matrix.

enumerator rocblas_evect_tridiagonal

Compute eigenvectors for the symmetric tridiagonal matrix.

enumerator rocblas_evect_none

No eigenvectors are computed.

rocblas_workmode

enum rocblas_workmode

Used to enable the use of fast algorithms (with out-of-place computations) in some of the routines.

Values:

enumerator rocblas_outofplace

Out-of-place computations are allowed; this requires enough free memory.

enumerator rocblas_inplace

When there is not enough free memory, this forces in-place computations.

rocblas_eform

enum rocblas_eform

Used to specify the form of the generalized eigenproblem.

Values:

enumerator rocblas_eform_ax

The problem is A*x = lambda*B*x.

enumerator rocblas_eform_abx

The problem is A*B*x = lambda*x.

enumerator rocblas_eform_bax

The problem is B*A*x = lambda*x.

Logging Functions

These are functions that enable and control rocSOLVER’s Multi-level Logging capabilities.

Logging set-up and tear-down

rocsolver_log_<function>()

rocblas_status rocsolver_log_begin(void)
rocblas_status rocsolver_log_end(void)
rocblas_status rocsolver_log_set_layer_mode(const rocblas_layer_mode_flags layer_mode)
rocblas_status rocsolver_log_set_max_levels(const rocblas_int max_levels)
rocblas_status rocsolver_log_restore_defaults(void)

LOG_RESTORE_DEFAULTS restores the default values of the rocSOLVER multi-level logging environment.

This function sets the logging mode and maximum trace log depth to their default values (no logging and one level depth).

Profile log

rocsolver_log_<write/flush>_profile()

rocblas_status rocsolver_log_write_profile(void)
rocblas_status rocsolver_log_flush_profile(void)

LOG_FLUSH_PROFILE prints the profile logging results and clears the profile record.

LAPACK Auxiliary Functions

These are functions that support more advanced LAPACK routines.

Vector and Matrix manipulations

rocsolver_<type>lacgv()

rocblas_status rocsolver_zlacgv(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *x, const rocblas_int incx)
rocblas_status rocsolver_clacgv(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *x, const rocblas_int incx)

LACGV conjugates the complex vector x.

It conjugates the n entries of a complex vector x with increment incx.

Parameters
  • handle[in] rocblas_handle

  • n[in]

    rocblas_int. n >= 0.

    The number of entries of the vector x.

  • x[inout]

    pointer to type. Array on the GPU of size at least n.

    On input it is the vector x, on output it is overwritten with vector conjg(x).

  • incx[in]

    rocblas_int. incx != 0.

    The increment between consecutive elements of x. If incx is negative, the elements of x are indexed in reverse order.
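
As a rough illustration of the semantics described above, the following host-side sketch conjugates a strided complex vector on the CPU. The name `lacgv_ref` is hypothetical and this is not the rocSOLVER implementation. Note that a strided vector spans 1 + (n-1)*abs(incx) array elements, and that a negative incx only reverses the traversal order, so the result matches the positive-stride case.

```c
#include <complex.h>
#include <stdlib.h>

/* CPU reference sketch (hypothetical helper, not part of rocSOLVER):
   conjugate the n entries of x that lie abs(incx) elements apart.
   A negative incx visits the same entries in reverse order, which
   gives the same result for an element-wise conjugation. */
static void lacgv_ref(int n, double complex *x, int incx)
{
    size_t stride = (size_t)abs(incx);
    for (int i = 0; i < n; ++i) {
        size_t idx = (size_t)i * stride;
        x[idx] = conj(x[idx]);
    }
}
```

Entries between the strided positions are left untouched, which is one way to sanity-check a call with incx != 1.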

rocsolver_<type>laswp()

rocblas_status rocsolver_zlaswp(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_claswp(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_dlaswp(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_slaswp(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)

LASWP performs a series of row interchanges on the matrix A.

It interchanges row I with row IPIV[k1 + (I - k1) * abs(incx)], for each of the rows K1 through K2 of A. k1 and k2 are 1-based indices.

Parameters
  • handle[in] rocblas_handle

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix of column dimension n to which the row interchanges will be applied. On exit, the permuted matrix.

  • lda[in]

    rocblas_int. lda > 0.

    The leading dimension of the array A.

  • k1[in]

    rocblas_int. k1 > 0.

    The first element of IPIV for which a row interchange will be done. This is a 1-based index.

  • k2[in]

    rocblas_int. k2 > k1 > 0.

    (K2-K1+1) is the number of elements of IPIV for which a row interchange will be done. This is a 1-based index.

  • ipiv[in]

    pointer to rocblas_int. Array on the GPU of dimension at least k1 + (k2 - k1) * abs(incx).

    The vector of pivot indices. Only the elements in positions k1 through (k1 + (k2 - k1) * abs(incx)) of IPIV are accessed. Elements of ipiv are considered 1-based.

  • incx[in]

    rocblas_int. incx != 0.

    The increment between successive values of IPIV. If incx is negative, the pivots are applied in reverse order.
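
The indexing rules above can be sketched on the CPU as follows. This is a hedged reference for the semantics only: `laswp_ref` is a hypothetical name, the matrix is assumed to be column-major, and positions in ipiv are read as 1-based, following the parameter descriptions.

```c
#include <stdlib.h>

/* CPU reference sketch of the LASWP semantics (hypothetical helper).
   A is column-major with leading dimension lda; k1, k2 and the pivot
   values are 1-based. For incx < 0 the rows are visited from k2 down
   to k1, so the interchanges are applied in reverse order. */
static void laswp_ref(int n, double *A, int lda, int k1, int k2,
                      const int *ipiv, int incx)
{
    int stride = abs(incx);
    int step = incx > 0 ? 1 : -1;
    int row = incx > 0 ? k1 : k2;

    for (int count = 0; count < k2 - k1 + 1; ++count, row += step) {
        /* 1-based position in ipiv that holds the pivot for this row */
        int pos = k1 + (row - k1) * stride;
        int piv = ipiv[pos - 1];
        if (piv == row)
            continue;
        for (int j = 0; j < n; ++j) {
            double tmp = A[(row - 1) + j * lda];
            A[(row - 1) + j * lda] = A[(piv - 1) + j * lda];
            A[(piv - 1) + j * lda] = tmp;
        }
    }
}
```

Applying the same pivots a second time with the sign of incx flipped undoes the permutation, which is a convenient consistency check.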

Householder reflexions

rocsolver_<type>larfg()

rocblas_status rocsolver_zlarfg(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *alpha, rocblas_double_complex *x, const rocblas_int incx, rocblas_double_complex *tau)
rocblas_status rocsolver_clarfg(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *alpha, rocblas_float_complex *x, const rocblas_int incx, rocblas_float_complex *tau)
rocblas_status rocsolver_dlarfg(rocblas_handle handle, const rocblas_int n, double *alpha, double *x, const rocblas_int incx, double *tau)
rocblas_status rocsolver_slarfg(rocblas_handle handle, const rocblas_int n, float *alpha, float *x, const rocblas_int incx, float *tau)

LARFG generates an orthogonal Householder reflector H of order n.

Householder reflector H is such that

H * [alpha] = [beta]
    [  x  ]   [  0 ]

where x is an n-1 vector and alpha and beta are scalars. Matrix H can be generated as

H = I - tau * [1] * [1 v']
              [v]

with v an n-1 vector and tau a scalar.

Parameters
  • handle[in] rocblas_handle

  • n[in]

    rocblas_int. n >= 0.

    The order (size) of reflector H.

  • alpha[inout]

    pointer to type. A scalar on the GPU.

    On input the scalar alpha, on output it is overwritten with beta.

  • x[inout]

    pointer to type. Array on the GPU of size at least n-1.

    On input it is the vector x, on output it is overwritten with vector v.

  • incx[in]

    rocblas_int. incx > 0.

    The increment between consecutive elements of x.

  • tau[out]

    pointer to type. A scalar on the GPU.

    The scalar tau.

rocsolver_<type>larft()

rocblas_status rocsolver_zlarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *tau, rocblas_double_complex *T, const rocblas_int ldt)
rocblas_status rocsolver_clarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *tau, rocblas_float_complex *T, const rocblas_int ldt)
rocblas_status rocsolver_dlarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, double *V, const rocblas_int ldv, double *tau, double *T, const rocblas_int ldt)
rocblas_status rocsolver_slarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, float *V, const rocblas_int ldv, float *tau, float *T, const rocblas_int ldt)

LARFT generates the triangular factor T of a block reflector H of order n.

The block reflector H is defined as the product of k Householder matrices as

H = H(1) * H(2) * ... * H(k)  (forward direction), or
H = H(k) * ... * H(2) * H(1)  (backward direction)

depending on the value of direct.

The triangular matrix T is upper triangular in forward direction and lower triangular in backward direction. If storev is column-wise, then

H = I - V * T * V'

where the i-th column of matrix V contains the Householder vector associated to H(i). If storev is row-wise, then

H = I - V' * T * V

where the i-th row of matrix V contains the Householder vector associated to H(i).

Parameters
  • handle[in] rocblas_handle.

  • direct[in]

    rocblas_direct.

    Specifies the direction in which the Householder matrices are applied.

  • storev[in]

    rocblas_storev.

    Specifies how the Householder vectors are stored in matrix V.

  • n[in]

    rocblas_int. n >= 0.

    The order (size) of the block reflector.

  • k[in]

    rocblas_int. k >= 1.

    The number of Householder matrices.

  • V[in]

    pointer to type. Array on the GPU of size ldv*k if column-wise, or ldv*n if row-wise.

    The matrix of Householder vectors.

  • ldv[in]

    rocblas_int. ldv >= n if column-wise, or ldv >= k if row-wise.

    Leading dimension of V.

  • tau[in]

    pointer to type. Array of k scalars on the GPU.

    The vector of all the scalars associated to the Householder matrices.

  • T[out]

    pointer to type. Array on the GPU of dimension ldt*k.

    The triangular factor. T is upper triangular if direct is forward; otherwise, it is lower triangular. The rest of the array is not used.

  • ldt[in]

    rocblas_int. ldt >= k.

    The leading dimension of T.
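
The forward, column-wise case can be sketched on the CPU with the standard recurrence T(1:i-1,i) = -tau(i) * T(1:i-1,1:i-1) * V(:,1:i-1)' * v(i), T(i,i) = tau(i). The helper name `larft_fc_ref` is hypothetical. As a simplifying assumption, V is taken to be stored explicitly (unit diagonal and zeros above it included), which is not necessarily how the library reads V.

```c
/* CPU reference sketch for real data, forward direction, column-wise
   storage (hypothetical helper). V is n-by-k and assumed to hold the
   full Householder vectors explicitly. Only the upper triangle of T
   is written. */
static void larft_fc_ref(int n, int k, const double *V, int ldv,
                         const double *tau, double *T, int ldt)
{
    for (int i = 0; i < k; ++i) {
        /* w = -tau[i] * V(:,0:i-1)' * v_i, kept temporarily in column i of T */
        for (int j = 0; j < i; ++j) {
            double dot = 0.0;
            for (int r = 0; r < n; ++r)
                dot += V[r + j * ldv] * V[r + i * ldv];
            T[j + i * ldt] = -tau[i] * dot;
        }
        /* T(0:i-1,i) = T(0:i-1,0:i-1) * w. Row j only reads w[j..i-1],
           so overwriting entries from the top down is safe. */
        for (int j = 0; j < i; ++j) {
            double sum = 0.0;
            for (int c = j; c < i; ++c)
                sum += T[j + c * ldt] * T[c + i * ldt];
            T[j + i * ldt] = sum;
        }
        T[i + i * ldt] = tau[i];
    }
}
```

With v(1) = [1, 1, 0], v(2) = [0, 1, 1] and tau = [1, 1], the sketch gives T = [1 -1; 0 1], and I - V * T * V' equals the product H(1) * H(2).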

rocsolver_<type>larf()

rocblas_status rocsolver_zlarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, rocblas_double_complex *x, const rocblas_int incx, const rocblas_double_complex *alpha, rocblas_double_complex *A, const rocblas_int lda)
rocblas_status rocsolver_clarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, rocblas_float_complex *x, const rocblas_int incx, const rocblas_float_complex *alpha, rocblas_float_complex *A, const rocblas_int lda)
rocblas_status rocsolver_dlarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, double *x, const rocblas_int incx, const double *alpha, double *A, const rocblas_int lda)
rocblas_status rocsolver_slarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, float *x, const rocblas_int incx, const float *alpha, float *A, const rocblas_int lda)

LARF applies a Householder reflector H to a general matrix A.

The Householder reflector H, of order m (or n), is to be applied to a m-by-n matrix A from the left (or the right). H is given by

H = I - alpha * x * x'

where alpha is a scalar and x a Householder vector. H is never actually computed.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    If side = rocblas_side_left, then compute H*A. If side = rocblas_side_right, then compute A*H.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of A.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of A.

  • x[in]

    pointer to type. Array on the GPU of size at least (1 + (m-1)*abs(incx)) if left side, or at least (1 + (n-1)*abs(incx)) if right side.

    The Householder vector x.

  • incx[in]

    rocblas_int. incx != 0.

    Increment between two consecutive elements of x. If incx < 0, the elements of x are used in reverse order.

  • alpha[in]

    pointer to type. A scalar on the GPU.

    If alpha = 0, then H = I (A will remain the same, x is never used).

  • A[inout]

    pointer to type. Array on the GPU of size lda*n.

    On input, the matrix A. On output it is overwritten with H*A (or A*H).

  • lda[in]

    rocblas_int. lda >= m.

    Leading dimension of A.
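
Because H is never formed, applying it from the left reduces to one dot product and one scaled update per column: H*A = A - alpha * x * (x' * A). The sketch below shows this for real data on the CPU. `larf_left_ref` is a hypothetical name, alpha is passed by value for brevity (the library takes a device pointer), and incx > 0 is assumed.

```c
/* CPU reference sketch (hypothetical helper): A := H*A with
   H = I - alpha * x * x', for a column-major m-by-n matrix A.
   Assumes real data and incx > 0. */
static void larf_left_ref(int m, int n, const double *x, int incx,
                          double alpha, double *A, int lda)
{
    if (alpha == 0.0)
        return; /* H = I, so A is unchanged and x is not read */

    for (int j = 0; j < n; ++j) {
        /* dot = x' * A(:,j) */
        double dot = 0.0;
        for (int i = 0; i < m; ++i)
            dot += x[i * incx] * A[i + j * lda];
        /* A(:,j) -= alpha * dot * x */
        for (int i = 0; i < m; ++i)
            A[i + j * lda] -= alpha * dot * x[i * incx];
    }
}
```

Using x = [1, 0.5] and alpha = 1.6 on the single column [3; 4] gives approximately [-5; 0].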

rocsolver_<type>larfb()

rocblas_status rocsolver_zlarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *T, const rocblas_int ldt, rocblas_double_complex *A, const rocblas_int lda)
rocblas_status rocsolver_clarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *T, const rocblas_int ldt, rocblas_float_complex *A, const rocblas_int lda)
rocblas_status rocsolver_dlarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *V, const rocblas_int ldv, double *T, const rocblas_int ldt, double *A, const rocblas_int lda)
rocblas_status rocsolver_slarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *V, const rocblas_int ldv, float *T, const rocblas_int ldt, float *A, const rocblas_int lda)

LARFB applies a block reflector H to a general m-by-n matrix A.

The block reflector H is applied in one of the following forms, depending on the values of side and trans:

H  * A  (No transpose from the left)
H' * A  (Transpose or conjugate transpose from the left)
A * H   (No transpose from the right), and
A * H'  (Transpose or conjugate transpose from the right)

The block reflector H is defined as the product of k Householder matrices as

H = H(1) * H(2) * ... * H(k)  (forward direction), or
H = H(k) * ... * H(2) * H(1)  (backward direction)

depending on the value of direct. H is never stored. It is calculated as

H = I - V * T * V'

where the i-th column of matrix V contains the Householder vector associated with H(i), if storev is column-wise; or

H = I - V' * T * V

where the i-th row of matrix V contains the Householder vector associated with H(i), if storev is row-wise. T is the associated triangular factor as computed by LARFT.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply H.

  • trans[in]

    rocblas_operation.

    Specifies whether the block reflector or its transpose/conjugate transpose is to be applied.

  • direct[in]

    rocblas_direct.

    Specifies the direction in which the Householder matrices were to be applied to generate H.

  • storev[in]

    rocblas_storev.

    Specifies how the Householder vectors are stored in matrix V.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix A.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix A.

  • k[in]

    rocblas_int. k >= 1.

    The number of Householder matrices.

  • V[in]

    pointer to type. Array on the GPU of size ldv*k if column-wise, ldv*n if row-wise and applying from the right, or ldv*m if row-wise and applying from the left.

    The matrix of Householder vectors.

  • ldv[in]

    rocblas_int. ldv >= k if row-wise, ldv >= m if column-wise and applying from the left, or ldv >= n if column-wise and applying from the right.

    Leading dimension of V.

  • T[in]

    pointer to type. Array on the GPU of dimension ldt*k.

    The triangular factor of the block reflector.

  • ldt[in]

    rocblas_int. ldt >= k.

    The leading dimension of T.

  • A[inout]

    pointer to type. Array on the GPU of size lda*n.

    On input, the matrix A. On output it is overwritten with H*A, A*H, H'*A, or A*H'.

  • lda[in]

    rocblas_int. lda >= m.

    Leading dimension of A.

Bidiagonal forms

rocsolver_<type>labrd()

rocblas_status rocsolver_zlabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup, rocblas_double_complex *X, const rocblas_int ldx, rocblas_double_complex *Y, const rocblas_int ldy)
rocblas_status rocsolver_clabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup, rocblas_float_complex *X, const rocblas_int ldx, rocblas_float_complex *Y, const rocblas_int ldy)
rocblas_status rocsolver_dlabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup, double *X, const rocblas_int ldx, double *Y, const rocblas_int ldy)
rocblas_status rocsolver_slabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup, float *X, const rocblas_int ldx, float *Y, const rocblas_int ldy)

LABRD computes the bidiagonal form of the first k rows and columns of a general m-by-n matrix A, as well as the matrices X and Y needed to reduce the remaining part of A.

The bidiagonal form is given by:

B = Q' * A * P

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(k)  and P = G(1) * G(2) * ... * G(k-1), if m >= n, or
Q = H(1) * H(2) * ... * H(k-1) and P = G(1) * G(2) * ... *  G(k),  if m < n

Each Householder matrix H(i) and G(i) is given by

H(i) = I - tauq[i-1] * v(i) * v(i)', and
G(i) = I - taup[i-1] * u(i) * u(i)'

If m >= n, the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1; while the first i elements of the Householder vector u(i) are zero, and u(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1; while the first i-1 elements of the Householder vector u(i) are zero, and u(i)[i] = 1.

The unreduced part of the matrix A can be updated using a block update:

A = A - V * Y' - X * U'

where V is an m-by-k matrix and U is an n-by-k matrix, formed using the vectors v and u, respectively.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • k[in]

    rocblas_int. min(m,n) >= k >= 0.

    The number of leading rows and columns of the matrix A to be reduced.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • D[out]

    pointer to real type. Array on the GPU of dimension k.

    The diagonal elements of B.

  • E[out]

    pointer to real type. Array on the GPU of dimension k.

    The off-diagonal elements of B.

  • tauq[out]

    pointer to type. Array on the GPU of dimension k.

    The scalar factors of the Householder matrices H(i).

  • taup[out]

    pointer to type. Array on the GPU of dimension k.

    The scalar factors of the Householder matrices G(i).

  • X[out]

    pointer to type. Array on the GPU of dimension ldx*k.

    The m-by-k matrix needed to reduce the unreduced part of A.

  • ldx[in]

    rocblas_int. ldx >= m.

    Specifies the leading dimension of X.

  • Y[out]

    pointer to type. Array on the GPU of dimension ldy*k.

    The n-by-k matrix needed to reduce the unreduced part of A.

  • ldy[in]

    rocblas_int. ldy >= n.

    Specifies the leading dimension of Y.

rocsolver_<type>bdsqr()

rocblas_status rocsolver_zbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, double *D, double *E, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *U, const rocblas_int ldu, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_cbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, float *D, float *E, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *U, const rocblas_int ldu, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_dbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, double *D, double *E, double *V, const rocblas_int ldv, double *U, const rocblas_int ldu, double *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_sbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, float *D, float *E, float *V, const rocblas_int ldv, float *U, const rocblas_int ldu, float *C, const rocblas_int ldc, rocblas_int *info)

BDSQR computes the singular value decomposition (SVD) of a n-by-n bidiagonal matrix B.

The SVD of B has the form:

B = Ub * S * Vb'

where S is the n-by-n diagonal matrix of singular values of B, the columns of Ub are the left singular vectors of B, and the columns of Vb are its right singular vectors.

The computation of the singular vectors is optional; this function accepts input matrices U (of size nu-by-n) and V (of size n-by-nv) that are overwritten with U*Ub and Vb'*V. If nu = 0 no left vectors are computed; if nv = 0 no right vectors are computed.

Optionally, this function can also compute Ub'*C for a given n-by-nc input matrix C.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether B is upper or lower bidiagonal.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of matrix B.

  • nv[in]

    rocblas_int. nv >= 0.

    The number of columns of matrix V.

  • nu[in]

    rocblas_int. nu >= 0.

    The number of rows of matrix U.

  • nc[in]

    rocblas_int. nc >= 0.

    The number of columns of matrix C.

  • D[inout]

    pointer to real type. Array on the GPU of dimension n.

    On entry, the diagonal elements of B. On exit, if info = 0, the singular values of B in decreasing order; if info > 0, the diagonal elements of a bidiagonal matrix orthogonally equivalent to B.

  • E[inout]

    pointer to real type. Array on the GPU of dimension n-1.

    On entry, the off-diagonal elements of B. On exit, if info > 0, the off-diagonal elements of a bidiagonal matrix orthogonally equivalent to B (if info = 0 this matrix converges to zero).

  • V[inout]

    pointer to type. Array on the GPU of dimension ldv*nv.

    On entry, the matrix V. On exit, it is overwritten with Vb'*V. (Not referenced if nv = 0).

  • ldv[in]

    rocblas_int. ldv >= n if nv > 0, or ldv >= 1 if nv = 0.

    Specifies the leading dimension of V.

  • U[inout]

    pointer to type. Array on the GPU of dimension ldu*n.

    On entry, the matrix U. On exit, it is overwritten with U*Ub. (Not referenced if nu = 0).

  • ldu[in]

    rocblas_int. ldu >= nu.

    Specifies the leading dimension of U.

  • C[inout]

    pointer to type. Array on the GPU of dimension ldc*nc.

    On entry, the matrix C. On exit, it is overwritten with Ub'*C. (Not referenced if nc = 0).

  • ldc[in]

    rocblas_int. ldc >= n if nc > 0, or ldc >= 1 if nc = 0.

    Specifies the leading dimension of C.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, i elements of E have not converged to zero.
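
To make the input layout concrete, the following CPU sketch expands the arrays D and E into a dense column-major matrix B, placing E on the superdiagonal for an upper bidiagonal matrix and on the subdiagonal for a lower one. `bidiag_to_dense_ref` and its boolean `upper` flag are hypothetical stand-ins (the library uses rocblas_fill). Such a helper is only useful for checking results on small problems.

```c
#include <stdbool.h>

/* CPU reference sketch (hypothetical helper): build the dense n-by-n
   bidiagonal matrix B (column-major, leading dimension ldb) from its
   diagonal D (n entries) and off-diagonal E (n-1 entries). */
static void bidiag_to_dense_ref(bool upper, int n, const double *D,
                                const double *E, double *B, int ldb)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            B[i + j * ldb] = 0.0;

    for (int i = 0; i < n; ++i)
        B[i + i * ldb] = D[i];

    for (int i = 0; i + 1 < n; ++i) {
        if (upper)
            B[i + (i + 1) * ldb] = E[i]; /* superdiagonal entry (i, i+1) */
        else
            B[(i + 1) + i * ldb] = E[i]; /* subdiagonal entry (i+1, i) */
    }
}
```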

Tridiagonal forms

rocsolver_<type>latrd()

rocblas_status rocsolver_zlatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, double *E, rocblas_double_complex *tau, rocblas_double_complex *W, const rocblas_int ldw)
rocblas_status rocsolver_clatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, float *E, rocblas_float_complex *tau, rocblas_float_complex *W, const rocblas_int ldw)
rocblas_status rocsolver_dlatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *E, double *tau, double *W, const rocblas_int ldw)
rocblas_status rocsolver_slatrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *E, float *tau, float *W, const rocblas_int ldw)

LATRD computes the tridiagonal form of k rows and columns of a symmetric/Hermitian matrix A, as well as the matrix W needed to update the remaining part of A.

The reduced form is given by:

T = Q' * A * Q

If uplo is lower, the first k rows and columns of T form a tridiagonal block; if uplo is upper, then the last k rows and columns of T form the tridiagonal block. Q is an orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(k)  if uplo indicates lower, or
Q = H(n-1) * H(n-2) * ... * H(n-k) if uplo is upper.

Each Householder matrix H(i) is given by

H(i) = I - tau[i] * v(i) * v(i)'

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1. If uplo is upper, the last n-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

The unreduced part of the matrix A can be updated using a rank update of the form:

A = A - V * W' - W * V'

where V is an n-by-k matrix formed by the vectors v(i).

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of rows and columns of the matrix A to be reduced.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the n-by-n matrix to be reduced. On exit, if uplo is lower, the first k columns have been reduced to tridiagonal form (given in the diagonal elements of A and the array E), and the elements below the diagonal contain the vectors v(i) stored as columns. If uplo is upper, the last k columns have been reduced to tridiagonal form (given in the diagonal elements of A and the array E), and the elements above the diagonal contain the vectors v(i) stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • E[out]

    pointer to real type. Array on the GPU of dimension n-1.

    If upper (lower), the last (first) k elements of E are the off-diagonal elements of the computed tridiagonal block.

  • tau[out]

    pointer to type. Array on the GPU of dimension n-1.

    If upper (lower), the last (first) k elements of tau are the scalar factors of the Householder matrices H(i).

  • W[out]

    pointer to type. Array on the GPU of dimension ldw*k.

    The n-by-k matrix needed to update the unreduced part of A.

  • ldw[in]

    rocblas_int. ldw >= n.

    Specifies the leading dimension of W.
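
The rank update A = A - V * W' - W * V' can be written out directly for small dense matrices, as in the CPU sketch below. `rank2k_update_ref` is a hypothetical name. The sketch covers the real symmetric case only (for Hermitian matrices the transposes become conjugate transposes) and updates a full square matrix. Selecting the unreduced block of A and the matching rows of V and W is left to the caller.

```c
/* CPU reference sketch (hypothetical helper): A := A - V*W' - W*V'
   for a column-major n-by-n matrix A and n-by-k matrices V and W,
   real data only. The update keeps a symmetric A symmetric. */
static void rank2k_update_ref(int n, int k, const double *V, int ldv,
                              const double *W, int ldw, double *A, int lda)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += V[i + p * ldv] * W[j + p * ldw]
                     + W[i + p * ldw] * V[j + p * ldv];
            A[i + j * lda] -= sum;
        }
    }
}
```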

rocsolver_<type>sterf()

rocblas_status rocsolver_dsterf(rocblas_handle handle, const rocblas_int n, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_ssterf(rocblas_handle handle, const rocblas_int n, float *D, float *E, rocblas_int *info)

STERF computes the eigenvalues of a symmetric tridiagonal matrix.

The eigenvalues of the symmetric tridiagonal matrix are computed by the Pal-Walker-Kahan variant of the QL/QR algorithm, and returned in increasing order.

The matrix is not represented explicitly, but rather as the array of diagonal elements D and the array of symmetric off-diagonal elements E as returned by, e.g., SYTRD or HETRD.

Parameters
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the tridiagonal matrix.

  • D[inout]

    pointer to real type. Array on the GPU of dimension n.

    On entry, the diagonal elements of the matrix. On exit, if info = 0, the eigenvalues in increasing order. If info > 0, the diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).

  • E[inout]

    pointer to real type. Array on the GPU of dimension n-1.

    On entry, the off-diagonal elements of the matrix. On exit, if info = 0, this array converges to zero. If info > 0, the off-diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, STERF did not converge. i elements of E did not converge to zero.
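
Since the tridiagonal matrix is only given through D and E, a small CPU helper that multiplies it by a vector can be handy for checking computed eigenvalues on test problems. The sketch below is one way to do that. `tridiag_matvec_ref` is a hypothetical name and is not part of rocSOLVER.

```c
/* CPU reference sketch (hypothetical helper): y = T*x, where T is the
   symmetric tridiagonal matrix with diagonal D (n entries) and
   off-diagonal E (n-1 entries). T is never formed explicitly. */
static void tridiag_matvec_ref(int n, const double *D, const double *E,
                               const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = D[i] * x[i];
        if (i > 0)
            sum += E[i - 1] * x[i - 1]; /* entry (i, i-1) */
        if (i + 1 < n)
            sum += E[i] * x[i + 1];     /* entry (i, i+1) */
        y[i] = sum;
    }
}
```

For D = [2, 2] and E = [1], the matrix has eigenvalues 1 and 3; multiplying by [1, 1] returns [3, 3], and multiplying by [1, -1] returns [1, -1].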

rocsolver_<type>steqr()

rocblas_status rocsolver_zsteqr(rocblas_handle handle, const rocblas_evect compC, const rocblas_int n, double *D, double *E, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_csteqr(rocblas_handle handle, const rocblas_evect compC, const rocblas_int n, float *D, float *E, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_dsteqr(rocblas_handle handle, const rocblas_evect compC, const rocblas_int n, double *D, double *E, double *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_ssteqr(rocblas_handle handle, const rocblas_evect compC, const rocblas_int n, float *D, float *E, float *C, const rocblas_int ldc, rocblas_int *info)

STEQR computes the eigenvalues and (optionally) eigenvectors of a symmetric tridiagonal matrix.

The eigenvalues of the symmetric tridiagonal matrix are computed by the implicit QL/QR algorithm, and returned in increasing order.

The matrix is not represented explicitly, but rather as the array of diagonal elements D and the array of symmetric off-diagonal elements E as returned by, e.g., SYTRD or HETRD. If the tridiagonal matrix is the reduced form of a full symmetric/Hermitian matrix as returned by, e.g., SYTRD or HETRD, then the eigenvectors of the original matrix can also be computed, depending on the value of compC.

Parameters
  • handle[in] rocblas_handle.

  • compC[in]

    rocblas_evect.

    Specifies how the eigenvectors are computed.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the tridiagonal matrix.

  • D[inout]

    pointer to real type. Array on the GPU of dimension n.

    On entry, the diagonal elements of the matrix. On exit, if info = 0, the eigenvalues in increasing order. If info > 0, the diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).

  • E[inout]

    pointer to real type. Array on the GPU of dimension n-1.

    On entry, the off-diagonal elements of the matrix. On exit, if info = 0, the elements of this array have converged to zero. If info > 0, the off-diagonal elements of a tridiagonal matrix that is similar to the original matrix (i.e. has the same eigenvalues).

  • C[inout]

    pointer to type. Array on the GPU of dimension ldc*n.

    On entry, if compC is original, the orthogonal/unitary matrix used for the reduction to tridiagonal form as returned by, e.g., ORGTR or UNGTR. On exit, it is overwritten with the eigenvectors of the original symmetric/Hermitian matrix (if compC is original), or the eigenvectors of the tridiagonal matrix (if compC is tridiagonal). (Not referenced if compC is none).

  • ldc[in]

    rocblas_int. ldc >= n if compC is original or tridiagonal.

    Specifies the leading dimension of C. (Not referenced if compC is none).

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, STEQR did not converge. i elements of E did not converge to zero.
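
Because STEQR takes the matrix only as the arrays D and E, it can help to see how those two arrays define a matrix-vector product. The following is a minimal CPU-side sketch, not part of rocSOLVER; the function names tridiag_matvec and tridiag_residual are hypothetical. It assumes host copies of D and E taken before the call, since both arrays are overwritten on exit.

```c
/* y = T * x for the n-by-n symmetric tridiagonal matrix T whose diagonal is
   D[0..n-1] and whose off-diagonal is E[0..n-2]. Hypothetical host helper. */
void tridiag_matvec(int n, const double *D, const double *E,
                    const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = D[i] * x[i];
        if (i > 0)
            s += E[i - 1] * x[i - 1]; /* sub-diagonal entry of row i */
        if (i + 1 < n)
            s += E[i] * x[i + 1];     /* super-diagonal entry of row i */
        y[i] = s;
    }
}

/* Largest absolute entry of T*x - lambda*x. A value near zero suggests that
   (lambda, x) is an eigenpair of T. work must hold n doubles. */
double tridiag_residual(int n, const double *D, const double *E,
                        double lambda, const double *x, double *work)
{
    double worst = 0.0;
    tridiag_matvec(n, D, E, x, work);
    for (int i = 0; i < n; ++i) {
        double d = work[i] - lambda * x[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A check of this kind could be run on the eigenvalues returned in D together with the columns of C when compC is tridiagonal.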

Orthonormal matrices

rocsolver_<type>org2r()

rocblas_status rocsolver_dorg2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorg2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORG2R generates an m-by-n matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.
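
The definition Q = H(1) * H(2) * ... * H(k) can be sketched as a CPU reference. This is not the library's implementation; form_q_from_qr is a hypothetical name, and the sketch assumes the usual LAPACK GEQRF convention: H(i) = I - ipiv_i * v(i) * v(i)', where v(i) is zero above row i, has an implicit 1 in row i, and stores its remaining entries below the diagonal in column i of A. Unlike ORG2R, it writes Q to a separate array rather than overwriting A.

```c
/* CPU reference: Q (m-by-n, column-major, leading dimension ldq) receives the
   first n columns of H(1)*H(2)*...*H(k). A (column-major, leading dimension
   lda) holds the Householder vectors below its diagonal, as assumed above. */
void form_q_from_qr(int m, int n, int k, const double *A, int lda,
                    const double *ipiv, double *Q, int ldq)
{
    /* Start from the first n columns of the m-by-m identity. */
    for (int j = 0; j < n; ++j)
        for (int r = 0; r < m; ++r)
            Q[r + j * ldq] = (r == j) ? 1.0 : 0.0;

    /* Apply H(k) first and H(1) last, so that Q = H(1)*...*H(k)*I. */
    for (int i = k - 1; i >= 0; --i) {
        for (int j = 0; j < n; ++j) {
            /* w = v(i)' * Q(:,j), using the implicit unit entry in row i. */
            double w = Q[i + j * ldq];
            for (int r = i + 1; r < m; ++r)
                w += A[r + i * lda] * Q[r + j * ldq];
            w *= ipiv[i];
            /* Q(:,j) -= w * v(i). */
            Q[i + j * ldq] -= w;
            for (int r = i + 1; r < m; ++r)
                Q[r + j * ldq] -= w * A[r + i * lda];
        }
    }
}
```

Applying the reflectors in reverse order to the identity avoids ever forming an m-by-m product explicitly.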

rocsolver_<type>orgqr()

rocblas_status rocsolver_dorgqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGQR generates an m-by-n matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

rocsolver_<type>orgl2()

rocblas_status rocsolver_dorgl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGL2 generates an m-by-n matrix Q with orthonormal rows.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GELQF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.
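
The row-wise definition Q = H(k) * H(k-1) * ... * H(1) can be sketched in the same spirit. This is a CPU reference under assumptions, not the library's implementation; form_q_from_lq is a hypothetical name. It assumes the usual LAPACK GELQF convention: H(i) = I - ipiv_i * v(i) * v(i)', where v(i) is zero before column i, has an implicit 1 in column i, and stores its remaining entries to the right of the diagonal in row i of A.

```c
/* CPU reference: Q (m-by-n, column-major, leading dimension ldq) receives the
   first m rows of H(k)*H(k-1)*...*H(1). A (column-major, leading dimension
   lda) holds the Householder vectors in its rows, as assumed above. */
void form_q_from_lq(int m, int n, int k, const double *A, int lda,
                    const double *ipiv, double *Q, int ldq)
{
    /* Start from the first m rows of the n-by-n identity. */
    for (int j = 0; j < n; ++j)
        for (int r = 0; r < m; ++r)
            Q[r + j * ldq] = (r == j) ? 1.0 : 0.0;

    /* Q = I * H(k) * ... * H(1): multiply from the right, H(k) first. */
    for (int i = k - 1; i >= 0; --i) {
        for (int r = 0; r < m; ++r) {
            /* w = Q(r,:) * v(i), using the implicit unit entry in column i. */
            double w = Q[r + i * ldq];
            for (int c = i + 1; c < n; ++c)
                w += Q[r + c * ldq] * A[i + c * lda];
            w *= ipiv[i];
            /* Q(r,:) -= w * v(i)'. */
            Q[r + i * ldq] -= w;
            for (int c = i + 1; c < n; ++c)
                Q[r + c * ldq] -= w * A[i + c * lda];
        }
    }
}
```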

rocsolver_<type>orglq()

rocblas_status rocsolver_dorglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGLQ generates an m-by-n matrix Q with orthonormal rows.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GELQF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

rocsolver_<type>org2l()

rocblas_status rocsolver_dorg2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorg2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORG2L generates an m-by-n matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>orgql()

rocblas_status rocsolver_dorgql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGQL generates an m-by-n matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>orgbr()

rocblas_status rocsolver_dorgbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGBR generates an m-by-n matrix Q with orthonormal rows or columns.

If storev is column-wise, then the matrix Q has orthonormal columns. If m >= k, Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

If m < k, Q is defined as the product of Householder reflectors of order m

Q = H(1) * H(2) * ... * H(m-1)

On the other hand, if storev is row-wise, then the matrix Q has orthonormal rows. If n > k, Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

If n <= k, Q is defined as the product of Householder reflectors of order n

Q = H(n-1) * H(n-2) * ... * H(1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • handle[in] rocblas_handle.

  • storev[in]

    rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q. If row-wise, then min(n,k) <= m <= n.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix Q. If column-wise, then min(m,k) <= n <= m.

  • k[in]

    rocblas_int. k >= 0.

    The number of columns (if storev is column-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column (or row) has the Householder vector v(i) as returned by GEBRD. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension min(m,k) if column-wise, or min(n,k) if row-wise.

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.
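
The case split in the ORGBR description and the size constraints in its parameter list can be collected in two small host-side helpers. This is only a sketch: orgbr_num_reflectors and orgbr_args_ok are hypothetical names, the boolean column_wise stands in for the storev argument, and the clamp to zero for empty matrices is an assumption rather than documented behavior.

```c
#include <stdbool.h>

/* Number of Householder reflectors that define Q, following the documented
   cases: column-wise uses k if m >= k and m-1 otherwise; row-wise uses k if
   n > k and n-1 otherwise. Clamped at zero (assumption for empty inputs). */
int orgbr_num_reflectors(bool column_wise, int m, int n, int k)
{
    int count;
    if (column_wise)
        count = (m >= k) ? k : m - 1;
    else
        count = (n > k) ? k : n - 1;
    return count < 0 ? 0 : count;
}

/* True when m, n, k, and lda satisfy the constraints listed for ORGBR. */
bool orgbr_args_ok(bool column_wise, int m, int n, int k, int lda)
{
    if (m < 0 || n < 0 || k < 0 || lda < m)
        return false;
    if (column_wise) {
        int lower = (m < k) ? m : k; /* min(m,k) <= n <= m */
        return lower <= n && n <= m;
    }
    int lower = (n < k) ? n : k;     /* min(n,k) <= m <= n */
    return lower <= m && m <= n;
}
```

Running such a check on the host before the call can catch a dimension mismatch without a round trip to the device.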

rocsolver_<type>orgtr()

rocblas_status rocsolver_dorgtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

ORGTR generates an n-by-n orthogonal matrix Q.

Q is defined as the product of n-1 Householder reflectors of order n. If uplo indicates upper, then Q has the form

Q = H(n-1) * H(n-2) * ... * H(1)

On the other hand, if uplo indicates lower, then Q has the form

Q = H(1) * H(2) * ... * H(n-1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i as returned by SYTRD in its arguments A and tau.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the SYTRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix Q.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (i+1)-th column (if uplo indicates upper) or i-th column (if uplo indicates lower) has the Householder vector v(i) as returned by SYTRD. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension n-1.

    The scalar factors of the Householder matrices H(i) as returned by SYTRD.

rocsolver_<type>orm2r()

rocblas_status rocsolver_dorm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORM2R applies a matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.
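
How Q can be applied without ever being stored is sketched below for the left-side case. This is a CPU reference under assumptions, not the library's implementation; apply_reflector_left and apply_q_left are hypothetical names, and the boolean transpose stands in for the trans argument. It assumes the usual LAPACK GEQRF convention H(i) = I - ipiv_i * v(i) * v(i)', with an implicit 1 in row i of v(i) and the remaining entries stored below the diagonal in column i of A.

```c
#include <stdbool.h>

/* C := H(i) * C for one reflector; C is m-by-n, column-major. */
static void apply_reflector_left(int m, int n, int i, const double *A,
                                 int lda, double scalar, double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        /* w = v(i)' * C(:,j), using the implicit unit entry in row i. */
        double w = C[i + j * ldc];
        for (int r = i + 1; r < m; ++r)
            w += A[r + i * lda] * C[r + j * ldc];
        w *= scalar;
        C[i + j * ldc] -= w;
        for (int r = i + 1; r < m; ++r)
            C[r + j * ldc] -= w * A[r + i * lda];
    }
}

/* C := Q*C or Q'*C with Q = H(1)*H(2)*...*H(k), applied from the left. */
void apply_q_left(bool transpose, int m, int n, int k, const double *A,
                  int lda, const double *ipiv, double *C, int ldc)
{
    if (transpose) {
        /* Q' = H(k)*...*H(1) for real symmetric H(i): apply H(1) first. */
        for (int i = 0; i < k; ++i)
            apply_reflector_left(m, n, i, A, lda, ipiv[i], C, ldc);
    } else {
        /* Q = H(1)*...*H(k): apply H(k) first. */
        for (int i = k - 1; i >= 0; --i)
            apply_reflector_left(m, n, i, A, lda, ipiv[i], C, ldc);
    }
}
```

The only difference between the two trans cases is the order in which the reflectors are visited.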

rocsolver_<type>ormqr()

rocblas_status rocsolver_dormqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMQR applies a matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>orml2()

rocblas_status rocsolver_dorml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORML2 applies a matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • lda[in]

    rocblas_int. lda >= k.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>ormlq()

rocblas_status rocsolver_dormlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMLQ applies a matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • lda[in]

    rocblas_int. lda >= k.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>orm2l()

rocblas_status rocsolver_dorm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORM2L applies a matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQLF in the last k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>ormql()

rocblas_status rocsolver_dormql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMQL applies a matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQLF in the last k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>ormbr()

rocblas_status rocsolver_dormbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMBR applies a matrix Q with orthonormal rows or columns to a general m-by-n matrix C.

If storev is column-wise, then the matrix Q has orthonormal columns. If storev is row-wise, then the matrix Q has orthonormal rows. The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

The order nq of orthogonal matrix Q is nq = m if applying from the left, or nq = n if applying from the right.

When storev is column-wise, if nq >= k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if nq < k, then Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1).

When storev is row-wise, if nq > k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if nq <= k, then Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • handle[in] rocblas_handle.

  • storev[in]

    rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0.

    The number of columns (if storev is column-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • A[in]

    pointer to type. Array on the GPU of size lda*min(nq,k) if column-wise, or lda*nq if row-wise.

    The i-th column (or row) has the Householder vector v(i) associated with H(i) as returned by GEBRD.

  • lda[in]

    rocblas_int. lda >= nq if column-wise, or lda >= min(nq,k) if row-wise.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least min(nq,k).

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.
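
When allocating device buffers for ORMBR, the size rules in the parameter list can be gathered into one host-side helper. The following is a sketch under stated assumptions: OrmbrSizes and ormbr_sizes are hypothetical names, and the enums are stand-ins for rocblas_storev and rocblas_side rather than the real types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for rocblas_storev and rocblas_side.
enum class StoreV { ColumnWise, RowWise };
enum class Side { Left, Right };

struct OrmbrSizes {
    int lda_min;            // smallest lda the parameter list allows
    std::size_t a_elems;    // elements of A for the chosen lda
    std::size_t ipiv_elems; // minimum elements of ipiv
    std::size_t c_elems;    // elements of C for the chosen ldc
};

OrmbrSizes ormbr_sizes(StoreV storev, Side side, int m, int n, int k, int lda, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    const int nq = (side == Side::Left) ? m : n;
    const int r = std::min(nq, k);
    const bool col = (storev == StoreV::ColumnWise);

    OrmbrSizes s{};
    s.lda_min = col ? nq : r;              // lda >= nq, or lda >= min(nq,k)
    assert(lda >= s.lda_min && ldc >= m);  // documented leading-dimension bounds

    s.a_elems = static_cast<std::size_t>(lda) * (col ? r : nq);
    s.ipiv_elems = static_cast<std::size_t>(r);
    s.c_elems = static_cast<std::size_t>(ldc) * n;
    return s;
}
```

The element counts would still need to be multiplied by the size of the scalar type before being passed to an allocator.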

rocsolver_<type>ormtr()

rocblas_status rocsolver_dormtr(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormtr(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMTR applies an orthogonal matrix Q to a general m-by-n matrix C.

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

The order nq of orthogonal matrix Q is nq = m if applying from the left, or nq = n if applying from the right.

Q is defined as the product of nq-1 Householder reflectors of order nq. If uplo indicates upper, then Q has the form

Q = H(nq-1) * H(nq-2) * ... * H(1).

On the other hand, if uplo indicates lower, then Q has the form

Q = H(1) * H(2) * ... * H(nq-1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i, as returned by SYTRD in its arguments A and tau.
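
To illustrate how a reflector can act on C without H(i) ever being formed, here is a minimal CPU-side sketch for the real case. It assumes the usual LAPACK convention H = I - tau * v * v', with tau playing the role of ipiv_i and v given as an explicit vector of length m. The function apply_reflector_left and this explicit vector layout are illustrative only and say nothing about how rocSOLVER implements the routine on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Overwrites the m-by-n column-major matrix C (leading dimension ldc) with
// H * C, where H = I - tau * v * v^T. The matrix H is never formed.
void apply_reflector_left(int m, int n, const std::vector<double>& v, double tau,
                          std::vector<double>& C, int ldc)
{
    assert(m >= 0 && n >= 0 && ldc >= m);
    assert(v.size() >= static_cast<std::size_t>(m));
    assert(C.size() >= static_cast<std::size_t>(ldc) * n);

    for (int j = 0; j < n; ++j) {
        double* col = C.data() + static_cast<std::size_t>(j) * ldc;
        double w = 0.0;  // w = v^T * C(:, j)
        for (int i = 0; i < m; ++i)
            w += v[i] * col[i];
        for (int i = 0; i < m; ++i)
            col[i] -= tau * w * v[i];  // C(:, j) -= tau * w * v
    }
}
```

Applying a product such as Q = H(1) * H(2) * ... * H(nq-1) from the left amounts to repeating this update once per reflector, starting with the rightmost factor.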

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • uplo[in]

    rocblas_fill.

    Specifies whether the SYTRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • A[in]

    pointer to type. Array on the GPU of size lda*nq.

    On entry, the (i+1)-th column (if uplo indicates upper) or i-th column (if uplo indicates lower) has the Householder vector v(i) as returned by SYTRD.

  • lda[in]

    rocblas_int. lda >= nq.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least nq-1.

    The scalar factors of the Householder matrices H(i) as returned by SYTRD.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

Unitary matrices

rocsolver_<type>ung2r()

rocblas_status rocsolver_zung2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cung2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNG2R generates an m-by-n complex matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i, as returned by GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.
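
As a reference for what the generated matrix contains, the following CPU-side sketch builds the first n columns of H(1) * H(2) * ... * H(k). It assumes the usual LAPACK convention H(i) = I - tau_i * v(i) * v(i)^H and, for simplicity, takes each v(i) as an explicit full-length column of a separate array V instead of the packed layout returned by GEQRF. The name generate_q and that layout are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Returns the first n columns of H(1) * H(2) * ... * H(k) as an m-by-n
// column-major matrix with leading dimension m.
// V stores the k reflector vectors as full columns of length m.
std::vector<cplx> generate_q(int m, int n, int k,
                             const std::vector<cplx>& V,
                             const std::vector<cplx>& tau)
{
    assert(m >= 0 && 0 <= n && n <= m && 0 <= k && k <= n);
    assert(V.size() >= static_cast<std::size_t>(m) * k);
    assert(tau.size() >= static_cast<std::size_t>(k));

    // Start from the first n columns of the identity matrix.
    std::vector<cplx> Q(static_cast<std::size_t>(m) * n, cplx(0.0, 0.0));
    for (int j = 0; j < n; ++j)
        Q[static_cast<std::size_t>(j) * m + j] = cplx(1.0, 0.0);

    // Q = H(1) * (H(2) * (... (H(k) * E))): H(k) acts first, H(1) last.
    for (int r = k - 1; r >= 0; --r) {
        const cplx* v = V.data() + static_cast<std::size_t>(r) * m;
        for (int j = 0; j < n; ++j) {
            cplx* col = Q.data() + static_cast<std::size_t>(j) * m;
            cplx w(0.0, 0.0);  // w = v^H * Q(:, j)
            for (int i = 0; i < m; ++i)
                w += std::conj(v[i]) * col[i];
            for (int i = 0; i < m; ++i)
                col[i] -= tau[r] * w * v[i];
        }
    }
    return Q;
}
```

A sketch like this is mainly useful for checking small cases on the host; it makes no attempt to match the blocked or unblocked GPU algorithms.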

rocsolver_<type>ungqr()

rocblas_status rocsolver_zungqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGQR generates an m-by-n complex matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i, as returned by GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

rocsolver_<type>ungl2()

rocblas_status rocsolver_zungl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGL2 generates an m-by-n complex matrix Q with orthonormal rows.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i, as returned by GELQF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.
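
The constraints listed above for the LQ-based generators can be checked on the host before the call. This is a small sketch, not a replacement for the library's own argument checking, and lq_generator_args_ok and lq_generator_a_elems are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// True when the arguments satisfy the constraints listed above:
// 0 <= m <= n, 0 <= k <= m and lda >= m.
bool lq_generator_args_ok(int m, int n, int k, int lda)
{
    return 0 <= m && m <= n && 0 <= k && k <= m && lda >= m;
}

// Number of elements the array A must provide (lda * n).
std::size_t lq_generator_a_elems(int m, int n, int k, int lda)
{
    assert(lq_generator_args_ok(m, n, k, lda));
    return static_cast<std::size_t>(lda) * n;
}
```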

rocsolver_<type>unglq()

rocblas_status rocsolver_zunglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cunglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGLQ generates an m-by-n complex matrix Q with orthonormal rows.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i, as returned by GELQF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

rocsolver_<type>ung2l()

rocblas_status rocsolver_zung2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cung2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNG2L generates an m-by-n complex matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i, as returned by GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.
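
Because the QL-based routines keep the reflectors in the last k columns of A, it is easy to be off by one when locating v(i). The sketch below spells out the mapping described for A, assuming column-major storage. The helper names ql_reflector_column and ql_reflector_offset are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// 1-based column of A that holds v(i): the (n-k+i)-th column, for i = 1..k.
int ql_reflector_column(int n, int k, int i)
{
    assert(0 <= k && k <= n && 1 <= i && i <= k);
    return n - k + i;
}

// 0-based offset of the first element of that column, assuming column-major
// storage with leading dimension lda.
std::size_t ql_reflector_offset(int n, int k, int i, int lda)
{
    assert(lda >= 0);
    return static_cast<std::size_t>(ql_reflector_column(n, k, i) - 1) * lda;
}
```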

rocsolver_<type>ungql()

rocblas_status rocsolver_zungql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGQL generates an m-by-n complex matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

The Householder matrices H(i) are never stored; each is computed from its corresponding Householder vector v(i) and scalar ipiv_i, as returned by GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • n[in]

    rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • k[in]

    rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>ungbr()

rocblas_status rocsolver_zungbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGBR generates an m-by-n complex matrix Q with orthonormal rows or columns.

If storev is column-wise, then the matrix Q has orthonormal columns. If m >= k, Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

If m < k, Q is defined as the product of Householder reflectors of order m

Q = H(1) * H(2) * ... * H(m-1)

On the other hand, if storev is row-wise, then the matrix Q has orthonormal rows. If n > k, Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

If n <= k, Q is defined as the product of Householder reflectors of order n

Q = H(n-1) * H(n-2) * ... * H(1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i, as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • handle[in] rocblas_handle.

  • storev[in]

    rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix Q. If row-wise, then min(n,k) <= m <= n.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix Q. If column-wise, then min(m,k) <= n <= m.

  • k[in]

    rocblas_int. k >= 0.

    The number of columns (if storev is column-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column (or row) has the Householder vector v(i) as returned by GEBRD. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension min(m,k) if column-wise, or min(n,k) if row-wise.

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.
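
The shape constraints and the reflector count for UNGBR depend on storev in a way that is easy to mix up, so a host-side pre-check can help. The sketch below encodes only the rules stated in this section. UngbrPlan and plan_ungbr are hypothetical names, StoreV stands in for rocblas_storev, and clamping the count at zero for an empty Q is a choice made here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for rocblas_storev.
enum class StoreV { ColumnWise, RowWise };

struct UngbrPlan {
    bool valid;      // true when the documented constraints hold
    int reflectors;  // number of reflectors in the product defining Q
    int ipiv_len;    // documented length of ipiv
};

UngbrPlan plan_ungbr(StoreV storev, int m, int n, int k, int lda)
{
    UngbrPlan p{false, 0, 0};
    if (m < 0 || n < 0 || k < 0 || lda < m)
        return p;

    if (storev == StoreV::ColumnWise) {
        if (n > m || n < std::min(m, k))  // requires min(m,k) <= n <= m
            return p;
        p.reflectors = (m >= k) ? k : m - 1;
        p.ipiv_len = std::min(m, k);
    } else {
        if (m > n || m < std::min(n, k))  // requires min(n,k) <= m <= n
            return p;
        p.reflectors = (n > k) ? k : n - 1;
        p.ipiv_len = std::min(n, k);
    }
    // Clamping at 0 for an empty Q is a choice of this sketch.
    p.reflectors = std::max(p.reflectors, 0);
    assert(p.reflectors <= p.ipiv_len);
    p.valid = true;
    return p;
}
```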

rocsolver_<type>ungtr()

rocblas_status rocsolver_zungtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungtr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGTR generates an n-by-n unitary matrix Q.

Q is defined as the product of n-1 Householder reflectors of order n. If uplo indicates upper, then Q has the form

Q = H(n-1) * H(n-2) * ... * H(1)

On the other hand, if uplo indicates lower, then Q has the form

Q = H(1) * H(2) * ... * H(n-1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i, as returned by HETRD in its arguments A and tau.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the HETRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix Q.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (i+1)-th column (if uplo indicates upper) or i-th column (if uplo indicates lower) has the Householder vector v(i) as returned by HETRD. On exit, the computed matrix Q.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension n-1.

    The scalar factors of the Householder matrices H(i) as returned by HETRD.
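
The column that holds v(i) depends on uplo, which the following sketch makes explicit. Fill is a hypothetical stand-in for rocblas_fill, the helper names are illustrative, and returning zero reflectors for n == 0 is a choice made here.

```cpp
#include <cassert>

// Hypothetical stand-in for rocblas_fill.
enum class Fill { Upper, Lower };

// Number of reflectors in Q, which is also the documented length of ipiv.
int tr_num_reflectors(int n)
{
    assert(n >= 0);
    return n > 0 ? n - 1 : 0;  // n-1, clamped at 0 for an empty matrix
}

// 1-based column of A that holds v(i), for i = 1..n-1.
int tr_reflector_column(Fill uplo, int n, int i)
{
    assert(1 <= i && i <= tr_num_reflectors(n));
    return uplo == Fill::Upper ? i + 1 : i;
}
```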

rocsolver_<type>unm2r()

rocblas_status rocsolver_zunm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNM2R applies a complex matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.
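
Since Q is only available as a product of reflectors, the four forms listed above differ in which reflector acts on C first. The sketch below derives that order for Q = H(1) * H(2) * ... * H(k) purely from the algebra. It is not a description of the library's internal loop, and Side, Op and reflector_order are hypothetical names standing in for rocblas_side, rocblas_operation and an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for rocblas_side and rocblas_operation.
enum class Side { Left, Right };
enum class Op { None, ConjTrans };

// For Q = H(1) * H(2) * ... * H(k), returns the 1-based reflector indices in
// the order in which they act on C. With Op::ConjTrans each listed reflector
// is applied as its conjugate transpose.
std::vector<int> reflector_order(Side side, Op trans, int k)
{
    assert(k >= 0);
    // Q' * C and C * Q begin with H(1); Q * C and C * Q' begin with H(k).
    const bool forward = (side == Side::Left) == (trans == Op::ConjTrans);
    std::vector<int> order;
    order.reserve(static_cast<std::size_t>(k));
    for (int s = 0; s < k; ++s)
        order.push_back(forward ? s + 1 : k - s);
    return order;
}
```

For example, Q * C equals H(1) * (H(2) * (... * (H(k) * C))), so H(k) is the first reflector to touch C.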

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unmqr()

rocblas_status rocsolver_zunmqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMQR applies a complex matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unml2()

rocblas_status rocsolver_zunml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNML2 applies a complex matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • lda[in]

    rocblas_int. lda >= k.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.
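
The shape of the reflector array A differs between the QR-based routines (UNM2R, UNMQR), where the vectors are stored in columns, and the LQ-based ones (UNML2, UNMLQ), where they are stored in rows. The sketch below puts the two sets of rules from the parameter lists side by side. Side, Factor, ASizes and reflector_array_sizes are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for rocblas_side.
enum class Side { Left, Right };
// Which factorization produced the reflectors stored in A.
enum class Factor { QR, LQ };

struct ASizes {
    int lda_min;        // smallest lda the parameter list allows
    std::size_t elems;  // elements of A for the chosen lda
};

ASizes reflector_array_sizes(Factor f, Side side, int m, int n, int k, int lda)
{
    const int nq = (side == Side::Left) ? m : n;  // order of Q
    assert(m >= 0 && n >= 0 && 0 <= k && k <= nq);

    ASizes s{};
    int cols = 0;
    if (f == Factor::QR) {
        s.lda_min = nq;  // vectors in columns: lda >= nq, size lda*k
        cols = k;
    } else {
        s.lda_min = k;   // vectors in rows: lda >= k, size lda*nq
        cols = nq;
    }
    assert(lda >= s.lda_min);
    s.elems = static_cast<std::size_t>(lda) * cols;
    return s;
}
```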

rocsolver_<type>unmlq()

rocblas_status rocsolver_zunmlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMLQ applies a complex matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • lda[in]

    rocblas_int. lda >= k.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unm2l()

rocblas_status rocsolver_zunm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunm2l(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNM2L applies a complex matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQLF in the last k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unmql()

rocblas_status rocsolver_zunmql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmql(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMQL applies a complex matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

of order m if applying from the left, or n if applying from the right. Q is never stored; it is calculated from the Householder vectors and scalars returned by the QL factorization GEQLF.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • A[in]

    pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQLF in the last k columns of its argument A.

  • lda[in]

    rocblas_int. lda >= m if side is left, lda >= n if side is right.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.
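
The size constraints in the parameter list above determine how many elements each device array must hold. The following host-side sketch collects them in one place. The helper name unmql_required_sizes, the struct, and the boolean side_left (standing in for rocblas_side) are assumptions made for illustration and are not part of rocSOLVER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a rocSOLVER function. */
typedef struct {
    size_t a_elems;    /* elements in A:    lda * k */
    size_t ipiv_elems; /* elements in ipiv: at least k */
    size_t c_elems;    /* elements in C:    ldc * n */
} unmql_sizes;

/* side_left == true means Q is applied from the left (order of Q is m);
   otherwise Q is applied from the right (order of Q is n). */
unmql_sizes unmql_required_sizes(bool side_left, int m, int n, int k,
                                 int lda, int ldc)
{
    int nq = side_left ? m : n;   /* order of Q */

    /* Documented constraints on the arguments. */
    assert(m >= 0 && n >= 0);
    assert(k >= 0 && k <= nq);
    assert(lda >= nq);
    assert(ldc >= m);

    unmql_sizes s;
    s.a_elems    = (size_t)lda * (size_t)k;
    s.ipiv_elems = (size_t)k;
    s.c_elems    = (size_t)ldc * (size_t)n;
    return s;
}
```

The returned counts are in elements; multiply by the size of the element type to obtain byte sizes for device allocations.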

rocsolver_<type>unmbr()

rocblas_status rocsolver_zunmbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMBR applies a complex matrix Q with orthonormal rows or columns to a general m-by-n matrix C.

If storev is column-wise, then the matrix Q has orthonormal columns. If storev is row-wise, then the matrix Q has orthonormal rows. The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

The order nq of unitary matrix Q is nq = m if applying from the left, or nq = n if applying from the right.

When storev is column-wise, if nq >= k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if nq < k, then Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1).

When storev is row-wise, if nq > k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if nq <= k, then Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • handle[in] rocblas_handle.

  • storev[in]

    rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • k[in]

    rocblas_int. k >= 0.

    The number of columns (if storev is column-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • A[in]

    pointer to type. Array on the GPU of size lda*min(nq,k) if column-wise, or lda*nq if row-wise.

    The i-th column (or row) has the Householder vector v(i) associated with H(i) as returned by GEBRD.

  • lda[in]

    rocblas_int. lda >= nq if column-wise, or lda >= min(nq,k) if row-wise.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least min(nq,k).

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.
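
The case analysis for UNMBR (order nq, number of reflectors, and the shape of A) is easy to get wrong when sizing buffers. The sketch below restates those rules as host-side code. The names unmbr_layout and unmbr_describe, and the two boolean flags standing in for rocblas_storev and rocblas_side, are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the UNMBR rules; not a rocSOLVER type. */
typedef struct {
    int nq;         /* order of Q: m (left) or n (right) */
    int reflectors; /* number of Householder reflectors forming Q */
    int a_cols;     /* columns of A, so A holds lda * a_cols elements */
    int min_lda;    /* smallest lda allowed */
    int min_ipiv;   /* smallest ipiv length allowed */
} unmbr_layout;

unmbr_layout unmbr_describe(bool column_wise, bool side_left,
                            int m, int n, int k)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    unmbr_layout d;
    d.nq = side_left ? m : n;
    int mn = d.nq < k ? d.nq : k;   /* min(nq, k) */

    if (column_wise) {
        d.reflectors = (d.nq >= k) ? k : d.nq - 1;
        d.a_cols  = mn;
        d.min_lda = d.nq;
    } else {
        d.reflectors = (d.nq > k) ? k : d.nq - 1;
        d.a_cols  = d.nq;
        d.min_lda = mn;
    }
    if (d.reflectors < 0)
        d.reflectors = 0;           /* nq = 0: Q is empty */

    d.min_ipiv = mn;
    return d;
}
```

For example, with row-wise storage, application from the right, and m = 5, n = 3, k = 3, the order is nq = 3 and Q is the product of nq-1 = 2 reflectors.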

rocsolver_<type>unmtr()

rocblas_status rocsolver_zunmtr(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmtr(rocblas_handle handle, const rocblas_side side, const rocblas_fill uplo, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMTR applies a unitary matrix Q to a general m-by-n matrix C.

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

The order nq of unitary matrix Q is nq = m if applying from the left, or nq = n if applying from the right.

Q is defined as the product of nq-1 Householder reflectors of order nq. If uplo indicates upper, then Q has the form

Q = H(nq-1) * H(nq-2) * ... * H(1).

On the other hand, if uplo indicates lower, then Q has the form

Q = H(1) * H(2) * ... * H(nq-1)

The Householder matrices H(i) are never stored; they are computed from their corresponding Householder vectors v(i) and scalars ipiv_i as returned by HETRD in its arguments A and tau.

Parameters
  • handle[in] rocblas_handle.

  • side[in]

    rocblas_side.

    Specifies from which side to apply Q.

  • uplo[in]

    rocblas_fill.

    Specifies whether the HETRD factorization was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • trans[in]

    rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • m[in]

    rocblas_int. m >= 0.

    Number of rows of matrix C.

  • n[in]

    rocblas_int. n >= 0.

    Number of columns of matrix C.

  • A[in]

    pointer to type. Array on the GPU of size lda*nq.

    On entry, the (i+1)-th column (if uplo indicates upper) or i-th column (if uplo indicates lower) has the Householder vector v(i) as returned by HETRD.

  • lda[in]

    rocblas_int. lda >= nq.

    Leading dimension of A.

  • ipiv[in]

    pointer to type. Array on the GPU of dimension at least nq-1.

    The scalar factors of the Householder matrices H(i) as returned by HETRD.

  • C[inout]

    pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output, it is overwritten with Q*C, C*Q, Q'*C, or C*Q'.

  • ldc[in]

    rocblas_int. ldc >= m.

    Leading dimension of C.

LAPACK Functions

LAPACK routines solve complex Numerical Linear Algebra problems.

Triangular Factorizations

rocsolver_<type>potf2()

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTF2 computes the Cholesky factorization of a real symmetric/complex Hermitian positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

A = U' * U, or
A = L  * L'

depending on the value of uplo. U is an upper triangular matrix and L is lower triangular.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.
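
To make the roles of lda and info concrete, the following is a host-side reference sketch of an unblocked lower Cholesky factorization (A = L * L') for a real matrix stored column-major. It is an illustration only and not the rocSOLVER implementation. The names ref_potf2_lower and newton_sqrt are hypothetical, and a small Newton iteration stands in for the library square root so that the sketch has no dependency beyond the C standard headers shown.

```c
#include <assert.h>

/* Square root by Newton's method, for x > 0 (illustrative stand-in). */
static double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;            /* start at or above the root */
    for (int it = 0; it < 1100; ++it) {
        double next = 0.5 * (r + x / r);
        if (next >= r)                       /* no further decrease: done */
            break;
        r = next;
    }
    return r;
}

/* Hypothetical host reference. A is n-by-n, column-major, leading
   dimension lda. Only the lower triangle is read and overwritten.
   Returns 0 on success, or i > 0 if the leading minor of order i is
   not positive definite (the factorization stops at that point). */
int ref_potf2_lower(int n, double *A, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        /* Diagonal entry: A(j,j) minus the squares of row j of L so far. */
        double d = A[j + j * lda];
        for (int p = 0; p < j; ++p)
            d -= A[j + p * lda] * A[j + p * lda];
        if (!(d > 0.0))
            return j + 1;
        d = newton_sqrt(d);
        A[j + j * lda] = d;

        /* Entries below the diagonal in column j. */
        for (int i = j + 1; i < n; ++i) {
            double s = A[i + j * lda];
            for (int p = 0; p < j; ++p)
                s -= A[i + p * lda] * A[j + p * lda];
            A[i + j * lda] = s / d;
        }
    }
    return 0;
}
```

For the 2-by-2 matrix with rows (4, 2) and (2, 3), the sketch returns 0 and leaves L with rows (2, 0) and (1, sqrt(2)) in the lower triangle. For rows (1, 2) and (2, 1) it returns 2, because the leading minor of order 2 is not positive definite.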

rocsolver_<type>potf2_batched()

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
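
The batched variants take an array of pointers, one per matrix A_i. A minimal sketch of how such an array could be filled when the matrices are packed back to back in one buffer is shown below. The helper fill_batch_pointers is hypothetical and runs on plain host memory. In a real program the entries would be GPU addresses, and how the pointer array itself is made visible to the GPU is outside this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: matrix i starts lda * n elements after matrix i-1.
   A must have room for batch_count pointers. */
void fill_batch_pointers(double *base, int n, int lda, int batch_count,
                         double **A)
{
    assert(n >= 0 && lda >= n && batch_count >= 0);
    size_t elems = (size_t)lda * (size_t)n;   /* elements per matrix */
    for (int i = 0; i < batch_count; ++i)
        A[i] = base + (size_t)i * elems;
}
```

Nothing requires the matrices to be contiguous. Each pointer may refer to a separately allocated array of dimension lda*n.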

rocsolver_<type>potf2_strided_batched()

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
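
In the strided-batched layout, all matrices live in a single array and A_i begins i*strideA elements after the start. The sketch below shows the resulting address arithmetic for the normal use case strideA >= lda*n. The helper names are hypothetical, indices are 0-based, and long long stands in for rocblas_stride as an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element (r, c) of matrix A_i, all indices 0-based.
   long long stands in for rocblas_stride (assumption of this sketch). */
double *strided_element(double *A, long long strideA, int lda,
                        int i, int r, int c)
{
    assert(i >= 0 && r >= 0 && c >= 0 && r < lda);
    return A + (long long)i * strideA + r + (long long)c * lda;
}

/* Smallest number of elements the array must hold when the matrices
   do not overlap (strideA >= lda * n). */
size_t strided_batch_elems(int n, int lda, long long strideA, int batch_count)
{
    assert(n >= 0 && lda >= n && batch_count >= 0);
    assert(strideA >= (long long)lda * n);
    if (batch_count == 0)
        return 0;
    return (size_t)(batch_count - 1) * (size_t)strideA
         + (size_t)lda * (size_t)n;
}
```

With n = 3, lda = 3, strideA = 10, and batch_count = 4, the array needs 3*10 + 9 = 39 elements, and element (1, 2) of A_2 sits at offset 2*10 + 1 + 2*3 = 27.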

rocsolver_<type>potrf()

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTRF computes the Cholesky factorization of a real symmetric/complex Hermitian positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

A = U' * U, or
A = L  * L'

depending on the value of uplo. U is an upper triangular matrix and L is lower triangular.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potrf_batched()

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potrf_strided_batched()

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getf2()

rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for small and mid-size matrices if optimizations are enabled (the default option). For more details, see the section “Tuning rocSOLVER performance” in the User’s Guide.)

The factorization has the form

A = P * L * U

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to rocblas_int. Array on the GPU of dimension min(m,n).

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero pivot.
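
The outputs described above (the packed L and U factors, 1-based pivot indices, and info) can be illustrated with a host-side reference of the unblocked algorithm. This is a sketch for real matrices under the stated conventions, not the rocSOLVER implementation. The names ref_getf2 and at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major offset of element (r, c), 0-based. */
static size_t at(int r, int c, int lda)
{
    return (size_t)r + (size_t)c * (size_t)lda;
}

/* Hypothetical host reference of LU with partial pivoting.
   On exit A holds L (unit diagonal not stored) and U; ipiv holds
   min(m,n) 1-based pivot rows. Returns 0, or i > 0 if U(i,i) is the
   first zero pivot. */
int ref_getf2(int m, int n, double *A, int lda, int *ipiv)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    int info = 0;
    int steps = m < n ? m : n;
    for (int j = 0; j < steps; ++j) {
        /* Pivot search: largest magnitude in column j, rows j..m-1. */
        int p = j;
        double best = A[at(j, j, lda)];
        if (best < 0.0) best = -best;
        for (int i = j + 1; i < m; ++i) {
            double v = A[at(i, j, lda)];
            if (v < 0.0) v = -v;
            if (v > best) { best = v; p = i; }
        }
        ipiv[j] = p + 1;                      /* 1-based */
        if (best == 0.0) {                    /* whole column is zero */
            if (info == 0) info = j + 1;
            continue;
        }
        if (p != j) {                         /* interchange rows j and p */
            for (int c = 0; c < n; ++c) {
                double t = A[at(j, c, lda)];
                A[at(j, c, lda)] = A[at(p, c, lda)];
                A[at(p, c, lda)] = t;
            }
        }
        for (int i = j + 1; i < m; ++i)       /* multipliers: column of L */
            A[at(i, j, lda)] /= A[at(j, j, lda)];
        for (int c = j + 1; c < n; ++c)       /* update trailing block */
            for (int i = j + 1; i < m; ++i)
                A[at(i, c, lda)] -= A[at(i, j, lda)] * A[at(j, c, lda)];
    }
    return info;
}
```

For the matrix with rows (1, 2) and (3, 4), the two rows are interchanged first, so ipiv is {2, 2}, U has rows (3, 4) and (0, 2/3), and the stored multiplier of L is 1/3.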

rocsolver_<type>getf2_batched()

rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for small and mid-size matrices if optimizations are enabled (the default option). For more details, see the section “Tuning rocSOLVER performance” in the User’s Guide.)

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • ipiv[out]

    pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
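
After a batched factorization, each instance has its own pivot vector ipiv_i and status info_i. The sketch below shows one way to locate them, assuming both arrays have already been copied from the GPU to host memory (for example with hipMemcpy). The helper names are hypothetical, batch indices are 0-based, and long long stands in for rocblas_stride.

```c
#include <assert.h>

/* Start of the pivot vector ipiv_i inside the strided ipiv array.
   long long stands in for rocblas_stride (assumption of this sketch). */
const int *pivots_of(const int *ipiv_host, long long strideP, int i)
{
    assert(i >= 0);
    return ipiv_host + (long long)i * strideP;
}

/* Index of the first instance whose U_i is singular (info_i > 0),
   or -1 if every factorization exited successfully. */
int first_singular_instance(const int *info_host, int batch_count)
{
    assert(batch_count >= 0);
    for (int i = 0; i < batch_count; ++i)
        if (info_host[i] > 0)
            return i;
    return -1;
}
```

When an instance is reported, info_host[i] gives the 1-based position j of the first zero pivot U_i(j,j).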

rocsolver_<type>getf2_strided_batched()

rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for small and mid-size matrices if optimizations are enabled (the default option). For more details, see the section “Tuning rocSOLVER performance” in the User’s Guide.)

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf()

rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for mid-size matrices if optimizations are enabled (the default option). For more details, see the section “Tuning rocSOLVER performance” in the User’s Guide.)

The factorization has the form

A = P * L * U

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to rocblas_int. Array on the GPU of dimension min(m,n).

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero pivot.
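
The statement that P can be derived from ipiv can be made concrete by replaying the recorded row interchanges in order. The following host-side sketch expands the 1-based pivot indices into a 0-based row permutation. The helper name ipiv_to_permutation is hypothetical, and ipiv is assumed to have been copied to host memory.

```c
#include <assert.h>

/* Hypothetical helper. perm must hold m entries. On return, row i of
   P' * A (the row-permuted matrix that equals L * U) is row perm[i]
   of the original matrix A. */
void ipiv_to_permutation(int m, int n, const int *ipiv, int *perm)
{
    assert(m >= 0 && n >= 0);
    int steps = m < n ? m : n;        /* ipiv has min(m,n) entries */

    for (int i = 0; i < m; ++i)
        perm[i] = i;                  /* start from the identity */

    for (int i = 0; i < steps; ++i) {
        int p = ipiv[i] - 1;          /* convert to 0-based */
        assert(p >= i && p < m);      /* pivot lies on or below row i */
        int tmp = perm[i];            /* replay interchange of rows i, p */
        perm[i] = perm[p];
        perm[p] = tmp;
    }
}
```

For m = n = 3 and ipiv = {3, 3, 3}, the interchanges are applied one after another and the resulting permutation is {2, 0, 1}.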

rocsolver_<type>getrf_batched()

rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for mid-size matrices if optimizations are enabled (the default option). For more details, see the section “Tuning rocSOLVER performance” in the User’s Guide.)

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • ipiv[out]

    pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i(j). Matrix P_i of the factorization can be derived from ipiv_i.

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
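
The per-instance outputs described above can be read back as in the following minimal sketch. It assumes ipiv and info have already been copied to host vectors, and that rocblas_stride behaves like a 64-bit signed integer. The helper names pivots_of and singular_instances are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not part of rocSOLVER) operating on host copies of
// the ipiv and info arrays.

// Start of the pivot vector ipiv_i: consecutive vectors are strideP apart.
const int* pivots_of(const std::vector<int>& ipiv, std::int64_t strideP, int i)
{
    assert(i >= 0);
    return ipiv.data() + static_cast<std::int64_t>(i) * strideP;
}

// 0-based indices of the batch instances whose factor U_i has a zero pivot
// (info_i > 0).
std::vector<int> singular_instances(const std::vector<int>& info)
{
    std::vector<int> out;
    for(int i = 0; i < static_cast<int>(info.size()); ++i)
        if(info[i] > 0)
            out.push_back(i);
    return out;
}
```

Checking info for every instance before using the factors avoids solving with a singular U_i.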

rocsolver_<type>getrf_strided_batched()

rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls may be executed for mid-size matrices if optimizations are enabled (the default option). For more details, see the section “tuning rocSOLVER performance” in the User’s guide.)

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i(j). Matrix P_i of the factorization can be derived from ipiv_i.

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
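
To make the strided layout concrete, the sketch below computes where an element of A_i lives in the single buffer. It assumes column-major storage (implied by the leading dimension lda >= m) and the normal use case strideA >= lda*n. Both helper names are hypothetical, and the buffer-size formula is an assumption derived from that use case rather than a documented requirement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: offset of element (r, c) of matrix A_i in a
// strided-batched buffer, with 0-based r, c and i and column-major storage.
std::int64_t strided_index(int r, int c, int i, int lda, std::int64_t strideA)
{
    assert(r >= 0 && r < lda && c >= 0 && i >= 0);
    return i * strideA + static_cast<std::int64_t>(c) * lda + r;
}

// Hypothetical helper: number of elements spanned by the batch when
// strideA >= lda*n (last matrix start plus one full matrix).
std::int64_t strided_buffer_size(int n, int lda, std::int64_t strideA, int batch_count)
{
    assert(n >= 0 && lda >= 0 && batch_count >= 0);
    assert(strideA >= static_cast<std::int64_t>(lda) * n);
    if(batch_count == 0)
        return 0;
    return (batch_count - 1) * strideA + static_cast<std::int64_t>(lda) * n;
}
```

The same indexing applies to ipiv with strideP in place of strideA and min(m,n) in place of lda*n.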

Orthogonal Factorizations

rocsolver_<type>geqr2()

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

A =  Q * [ R ]
         [ 0 ]

where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... * H(k), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).
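
The storage convention described above can be illustrated with a small host-side reference of an unblocked Householder QR. This is only a sketch for real double precision, not the rocSOLVER implementation. The sign and scaling choices follow the usual LAPACK-style construction, which is an assumption here, and the function name householder_qr_reference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference sketch (real double precision); not a rocSOLVER call.
// A is column-major with leading dimension lda. On exit, R is on and above
// the diagonal, the Householder vectors are below the diagonal (the unit
// entry v(i)[i] = 1 is implicit), and tau[i-1] is the scalar factor of H(i).
void householder_qr_reference(int m, int n, std::vector<double>& A, int lda,
                              std::vector<double>& tau)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    const int k = std::min(m, n);
    tau.assign(k, 0.0);
    for(int i = 0; i < k; ++i)
    {
        double* col = &A[static_cast<std::size_t>(i) * lda];
        // Norm of the part of column i below the diagonal.
        double xnorm = 0.0;
        for(int r = i + 1; r < m; ++r)
            xnorm = std::hypot(xnorm, col[r]);
        if(xnorm == 0.0)
            continue; // H = I for this step, the scalar factor stays 0
        const double alpha = col[i];
        const double beta = -std::copysign(std::hypot(alpha, xnorm), alpha);
        tau[i] = (beta - alpha) / beta;
        for(int r = i + 1; r < m; ++r)
            col[r] /= (alpha - beta); // store the vector below the diagonal
        col[i] = beta;                // diagonal entry of R
        // Apply H = I - tau * v * v' to the trailing columns.
        for(int c = i + 1; c < n; ++c)
        {
            double* t = &A[static_cast<std::size_t>(c) * lda];
            double w = t[i];
            for(int r = i + 1; r < m; ++r)
                w += col[r] * t[r];
            w *= tau[i];
            t[i] -= w;
            for(int r = i + 1; r < m; ++r)
                t[r] -= w * col[r];
        }
    }
}
```

For the 2-by-1 input (3, 4), this sketch stores -5 on the diagonal, 0.5 below it, and 1.6 as the scalar factor.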

rocsolver_<type>geqr2_batched()

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
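
The batched interface takes an array of pointers rather than a single buffer. The sketch below builds such an array from one contiguous allocation that holds batch_count matrices of lda*n elements each. Host memory stands in for a device allocation so that the example stays self-contained, and the helper name make_pointer_array is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one pointer per matrix A_j, each lda*n elements past
// the previous one. 'base' stands in for the start of a device allocation.
template <typename T>
std::vector<T*> make_pointer_array(T* base, int n, int lda, int batch_count)
{
    assert(n >= 0 && lda >= 0 && batch_count >= 0);
    const std::size_t matrix_size = static_cast<std::size_t>(lda) * n;
    std::vector<T*> ptrs(batch_count);
    for(int j = 0; j < batch_count; ++j)
        ptrs[j] = base + j * matrix_size;
    return ptrs;
}
```

In a real program the base pointer would come from a device allocation such as hipMalloc. It is assumed here (not stated in the text above) that the pointer array itself would also have to be copied to device memory before the call.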

rocsolver_<type>geqr2_strided_batched()

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
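
As a worked illustration of the Householder representation, consider a single 2-by-1 matrix from the batch, A = (3, 4)'. The values below assume the LAPACK-style sign convention, which the text above does not state, so a different but equally valid choice of signs is possible.

```latex
v(1) = \begin{pmatrix} 1 \\ \tfrac{1}{2} \end{pmatrix}, \qquad
\mathrm{ipiv}[0] = \tfrac{8}{5}, \qquad
H(1) = I - \tfrac{8}{5}\, v(1)\, v(1)^{T}
     = \begin{pmatrix} -\tfrac{3}{5} & -\tfrac{4}{5} \\ -\tfrac{4}{5} & \tfrac{3}{5} \end{pmatrix},
\qquad
H(1) \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} -5 \\ 0 \end{pmatrix}.
```

Because H(1) is symmetric and orthogonal, A = H(1) * (-5, 0)', so Q = H(1) and R = -5. On exit, the array would hold -5 on the diagonal and 1/2 below it.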

rocsolver_<type>geqrf()

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

A =  Q * [ R ]
         [ 0 ]

where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... * H(k), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).
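
To show how the stored vectors and scalar factors define Q, the following host-side sketch forms Q = H(1) * H(2) * ... * H(k) explicitly from host copies of A and ipiv. It covers real double precision only and is not a rocSOLVER call. The name form_q_reference is hypothetical. In practice, the library's dedicated routines for generating or applying Q (such as ORGQR/UNGQR) would normally be preferred.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (real double precision); not a rocSOLVER call.
// A (column-major, leading dimension lda) and tau are host copies of the
// factored matrix and of ipiv. Returns Q as a column-major m-by-m matrix.
std::vector<double> form_q_reference(int m, int n, const std::vector<double>& A, int lda,
                                     const std::vector<double>& tau)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    const int k = std::min(m, n);
    assert(static_cast<int>(tau.size()) >= k);
    std::vector<double> Q(static_cast<std::size_t>(m) * m, 0.0);
    for(int r = 0; r < m; ++r)
        Q[r + static_cast<std::size_t>(r) * m] = 1.0;
    // Q = H(1) * ... * H(k): apply H(k) first and H(1) last, from the left.
    for(int i = k - 1; i >= 0; --i)
    {
        const double* v = &A[static_cast<std::size_t>(i) * lda]; // used for rows r > i
        for(int c = 0; c < m; ++c)
        {
            double* q = &Q[static_cast<std::size_t>(c) * m];
            double w = q[i]; // contribution of the implicit unit entry
            for(int r = i + 1; r < m; ++r)
                w += v[r] * q[r];
            w *= tau[i];
            q[i] -= w;
            for(int r = i + 1; r < m; ++r)
                q[r] -= w * v[r];
        }
    }
    return Q;
}
```

Multiplying the returned Q by the upper triangular part of the factored array (padded with zeros) should reproduce the original matrix up to rounding.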

rocsolver_<type>geqrf_batched()

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqrf_strided_batched()

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
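
Since each factored A_j holds both R_j and the Householder vectors, a caller that only needs R_j has to mask out the entries below the diagonal. The sketch below does this on a host copy of the strided buffer, assuming column-major storage and a rocblas_stride that behaves like a 64-bit signed integer. The helper name extract_r is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper: returns the m-by-n matrix [R_j; 0]
// (column-major, leading dimension m) taken from instance j of a
// strided-batched QR output. Entries below the diagonal, which store the
// Householder vectors, are replaced by zeros.
std::vector<double> extract_r(int m, int n, const std::vector<double>& A, int lda,
                              std::int64_t strideA, int j)
{
    assert(m >= 0 && n >= 0 && lda >= m && j >= 0);
    const double* Aj = A.data() + j * strideA;
    std::vector<double> R(static_cast<std::size_t>(m) * n, 0.0);
    for(int c = 0; c < n; ++c)
        for(int r = 0; r < m && r <= c; ++r)
            R[r + static_cast<std::size_t>(c) * m] = Aj[r + static_cast<std::size_t>(c) * lda];
    return R;
}
```

The output uses a tight leading dimension m regardless of lda, which keeps the result independent of any padding in the input buffer.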

rocsolver_<type>geql2()

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

A =  Q * [ 0 ]
         [ L ]

where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * ... * H(2) * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the last m-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the i - 1 elements of vector v(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).
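
The location of L inside the output array is easier to see in code. In the sketch below, an entry at 0-based position (r, c) is treated as part of L when r - c >= m - n, which covers both the (m-n)th subdiagonal case (m >= n) and the (n-m)th superdiagonal case (n > m) described above. The helper works on a host copy of A in column-major storage, and its name extract_l is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper: returns the m-by-n matrix [0; L]
// (column-major, leading dimension m) taken from a QL output array.
// Entries above the sub/superdiagonal, which store the Householder vectors,
// are replaced by zeros.
std::vector<double> extract_l(int m, int n, const std::vector<double>& A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    std::vector<double> L(static_cast<std::size_t>(m) * n, 0.0);
    for(int c = 0; c < n; ++c)
        for(int r = 0; r < m; ++r)
            if(r - c >= m - n)
                L[r + static_cast<std::size_t>(c) * m] = A[r + static_cast<std::size_t>(c) * lda];
    return L;
}
```

For a 3-by-2 input, for example, only positions (1,0), (2,0) and (2,1) are kept.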

rocsolver_<type>geql2_batched()

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geql2_strided_batched()

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqlf()

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQLF computes a QL factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

A =  Q * [ 0 ]
         [ L ]

where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * ... * H(2) * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the last m-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the i - 1 elements of vector v(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>geqlf_batched()

rocblas_status rocsolver_zgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
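
Because the size of ipiv depends on strideP, it can help to compute it explicitly before allocating. The sketch below is one way to do that on the host under the "normal use" assumption strideP >= min(m,n). The helper names are hypothetical, `std::int64_t` stands in for `rocblas_stride`, and the batch index j is 0-based here, whereas the text above counts from 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: smallest number of elements ipiv must hold so that
// every vector ipiv_j (min(m,n) entries, starting at j * strideP) fits.
// Assumes strideP >= min(m,n).
std::size_t ipiv_elements(int m, int n, std::int64_t strideP, int batch_count)
{
    assert(m >= 0 && n >= 0 && batch_count >= 0);
    const std::int64_t k = std::min(m, n);
    assert(strideP >= k);
    if(batch_count == 0 || k == 0)
        return 0; // nothing is written in these cases
    return static_cast<std::size_t>(strideP * (batch_count - 1) + k);
}

// Offset, in elements, of the first scalar factor of ipiv_j (0-based j).
std::size_t ipiv_offset(std::int64_t strideP, int j)
{
    assert(strideP >= 0 && j >= 0);
    return static_cast<std::size_t>(strideP * j);
}
```

With strideP equal to min(m,n) the vectors are packed back to back; a larger stride leaves unused padding between consecutive vectors.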

rocsolver_<type>geqlf_strided_batched()

rocblas_status rocsolver_zgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
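
The roles of lda and strideA can be illustrated by packing a batch of host matrices into a single buffer before copying it to the GPU. This is only a sketch under stated assumptions: real data, column-major storage, and the "normal use" case strideA >= lda*n. The helper `pack_strided` is hypothetical and `std::int64_t` stands in for `rocblas_stride`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs host matrices (each m-by-n, column-major, stored
// tightly with leading dimension m) into one buffer in which matrix A_j starts
// at offset j * strideA and uses leading dimension lda. Padding is zero-filled.
std::vector<double> pack_strided(const std::vector<std::vector<double>>& mats,
                                 int m, int n, int lda, std::int64_t strideA)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(strideA >= static_cast<std::int64_t>(lda) * n); // "normal use" case
    if(mats.empty())
        return {};
    const std::size_t stride = static_cast<std::size_t>(strideA);
    const std::size_t ld = static_cast<std::size_t>(lda);
    std::vector<double> buf(stride * (mats.size() - 1) + ld * n, 0.0);
    for(std::size_t j = 0; j < mats.size(); ++j)
    {
        assert(mats[j].size() == static_cast<std::size_t>(m) * n);
        for(int c = 0; c < n; ++c)
            for(int r = 0; r < m; ++r)
                buf[j * stride + r + c * ld] = mats[j][r + static_cast<std::size_t>(c) * m];
    }
    return buf;
}
```

The last matrix only needs lda*n elements, which is why the buffer length is strideA*(batch_count-1) + lda*n rather than strideA*batch_count.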

rocsolver_<type>gelq2()

rocblas_status rocsolver_zgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQ2 computes an LQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

A = [ L 0 ] * Q

where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * H(k-1) * ... * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i)' * v(i)

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the n - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).
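
To make the implicit representation of Q more concrete, the following host-side sketch rebuilds a Householder vector v(i) from the rows of a factored A and forms the dense matrix H(i) = I - ipiv[i-1] * v(i)' * v(i). It assumes real data and column-major storage, and it is meant for inspection or testing rather than as the way the library applies Q. Both helper names are hypothetical, and the complex case (which involves conjugation) is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rebuilds v(i) (1-based i, as in the text) from a host
// copy of the array A written by GELQ2. The first i-1 entries are zero, entry
// i is one, and the remaining n-i entries come from row i of A, to the right
// of the diagonal. Assumes real data and column-major storage.
std::vector<double> lq_householder_vector(const std::vector<double>& A,
                                          int i, int n, int lda)
{
    assert(i >= 1 && i <= n && lda >= i);
    assert(A.size() >= static_cast<std::size_t>(lda) * n);
    std::vector<double> v(static_cast<std::size_t>(n), 0.0);
    v[i - 1] = 1.0;
    for(int c = i; c < n; ++c) // 0-based columns i..n-1
        v[c] = A[(i - 1) + static_cast<std::size_t>(c) * lda];
    return v;
}

// Dense n-by-n matrix H = I - tau * v' * v (column-major) for a real row vector v.
std::vector<double> householder_matrix(double tau, const std::vector<double>& v)
{
    const std::size_t n = v.size();
    std::vector<double> H(n * n, 0.0);
    for(std::size_t c = 0; c < n; ++c)
        for(std::size_t r = 0; r < n; ++r)
            H[r + c * n] = (r == c ? 1.0 : 0.0) - tau * v[r] * v[c];
    return H;
}
```

Here tau plays the role of ipiv[i-1]. Forming each H(i) densely costs O(n^2) storage, so this approach is only reasonable for small matrices.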

rocsolver_<type>gelq2_batched()

rocblas_status rocsolver_zgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
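
The batched interface takes A as an array of pointers, one per matrix. The sketch below shows one way such an array could be derived from a single contiguous allocation. It uses host memory purely for illustration and does not address where the pointer array itself must reside for a real call; the helper `make_batch_pointers` and its parameter `matrix_elems` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds an "array of pointers" view over one contiguous
// allocation, with matrix A_j starting matrix_elems elements after A_(j-1).
// Shown with host memory only; in a real call each pointer would have to
// refer to an array on the GPU.
std::vector<double*> make_batch_pointers(double* base, std::size_t matrix_elems,
                                         int batch_count)
{
    assert(batch_count >= 0);
    std::vector<double*> ptrs(static_cast<std::size_t>(batch_count));
    for(int j = 0; j < batch_count; ++j)
        ptrs[j] = base + static_cast<std::size_t>(j) * matrix_elems;
    return ptrs;
}
```

Choosing matrix_elems = lda*n gives the tightest packing for matrices with n columns and leading dimension lda.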

rocsolver_<type>gelq2_strided_batched()

rocblas_status rocsolver_zgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelqf()

rocblas_status rocsolver_zgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQF computes an LQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

A = [ L 0 ] * Q

where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * H(k-1) * ... * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i)' * v(i)

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the n - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • ipiv[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).
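
For the LQ routines the factor L sits on and below the diagonal of the output array. A minimal host-side sketch of extracting it, assuming real data and column-major storage, might look as follows; `extract_lq_factor` is a hypothetical name, not a rocSOLVER function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the m-by-min(m,n) lower triangular (or lower
// trapezoidal, if m > n) factor L out of a host copy of the array A written
// by GELQF or GELQ2. Assumes column-major storage; the result is column-major
// with leading dimension m.
std::vector<double> extract_lq_factor(const std::vector<double>& A,
                                      int m, int n, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(A.size() >= static_cast<std::size_t>(lda) * n);
    const int k = std::min(m, n);
    const std::size_t ldl = static_cast<std::size_t>(m);
    std::vector<double> L(ldl * k, 0.0);
    for(int c = 0; c < k; ++c)
        for(int r = c; r < m; ++r) // on and below the diagonal
            L[r + c * ldl] = A[r + static_cast<std::size_t>(c) * lda];
    return L;
}
```

The entries of A above the diagonal, which hold the Householder vectors, are deliberately skipped and appear as zeros in the result.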

rocsolver_<type>gelqf_batched()

rocblas_status rocsolver_zgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelqf_strided_batched()

rocblas_status rocsolver_zgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrix H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

Problem and matrix reductions

rocsolver_<type>gebd2()

rocblas_status rocsolver_zgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBD2 computes the bidiagonal form of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

B = Q' * A * P

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(n)  and P = G(1) * G(2) * ... * G(n-1), if m >= n, or
Q = H(1) * H(2) * ... * H(m-1) and P = G(1) * G(2) * ... *  G(m),  if m < n

Each Householder matrix H(i) and G(i) is given by

H(i) = I - tauq[i-1] * v(i) * v(i)', and
G(i) = I - taup[i-1] * u(i) * u(i)'

If m >= n, the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1; while the first i elements of the Householder vector u(i) are zero, and u(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1; while the first i-1 elements of the Householder vector u(i) are zero, and u(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • D[out]

    pointer to real type. Array on the GPU of dimension min(m,n).

    The diagonal elements of B.

  • E[out]

    pointer to real type. Array on the GPU of dimension min(m,n)-1.

    The off-diagonal elements of B.

  • tauq[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

  • taup[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices G(i).
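
Since D and E together describe the bidiagonal matrix B, a small host-side sketch can show how they fit together. It assumes real data already copied back from the GPU and builds only the leading min(m,n)-by-min(m,n) block of B as a dense column-major array; the helper name `assemble_bidiagonal` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: assembles the leading k-by-k block of B (k = min(m,n))
// from the outputs D and E of GEBD2. B is upper bidiagonal if m >= n (E on the
// superdiagonal) and lower bidiagonal if m < n (E on the subdiagonal).
// The result is dense and column-major.
std::vector<double> assemble_bidiagonal(const std::vector<double>& D,
                                        const std::vector<double>& E,
                                        int m, int n)
{
    assert(m >= 0 && n >= 0);
    const std::size_t k = static_cast<std::size_t>(std::min(m, n));
    assert(D.size() == k);
    assert(k == 0 || E.size() == k - 1);
    std::vector<double> B(k * k, 0.0);
    const bool upper = (m >= n);
    for(std::size_t i = 0; i < k; ++i)
        B[i + i * k] = D[i];
    for(std::size_t i = 0; i + 1 < k; ++i)
    {
        if(upper)
            B[i + (i + 1) * k] = E[i]; // B(i, i+1)
        else
            B[(i + 1) + i * k] = E[i]; // B(i+1, i)
    }
    return B;
}
```

A dense copy like this is mainly useful for checking results on small problems; routines that consume a bidiagonal matrix typically work from D and E directly.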

rocsolver_<type>gebd2_batched()

rocblas_status rocsolver_zgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • D[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out]

    pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • strideQ[in]

    rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
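
The strided outputs D, E, tauq and taup all follow the same pattern: the vector for batch instance j begins j strides into the array. As an illustration, the sketch below pulls one such vector out of a host copy of a strided array. It assumes real data and a 0-based batch index j; the helper `batch_vector` is hypothetical and `std::int64_t` stands in for `rocblas_stride`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of the len-element vector belonging to
// batch instance j (0-based) from a host copy of a strided array such as D
// (len = min(m,n), stride = strideD) or E (len = min(m,n)-1, stride = strideE).
std::vector<double> batch_vector(const std::vector<double>& buf,
                                 std::int64_t stride, int j, std::size_t len)
{
    assert(stride >= 0 && j >= 0);
    const std::size_t start = static_cast<std::size_t>(stride) * static_cast<std::size_t>(j);
    assert(start + len <= buf.size());
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(len));
}
```

For example, with strideD = 4 and min(m,n) = 3, each D_j occupies three entries followed by one entry of padding.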

rocsolver_<type>gebd2_strided_batched()

rocblas_status rocsolver_zgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out]

    pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • strideQ[in]

    rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
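
The "normal use case" strides listed above determine how large each device buffer has to be. The following host-side sketch computes the smallest such strides and the resulting element counts for a tightly packed batch. The names gebd2_layout and gebd2_packed_layout are hypothetical, introduced here only for illustration; they are not part of rocSOLVER.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of rocSOLVER: strides and total element
   counts for a tightly packed strided-batched GEBD2 call. */
typedef struct {
    size_t strideA, strideD, strideE, strideQ, strideP; /* in elements */
    size_t elemsA, elemsD, elemsE, elemsQ, elemsP;      /* in elements */
} gebd2_layout;

/* Uses the smallest "normal use case" stride for every array:
   strideA = lda*n, strideD = strideQ = strideP = min(m,n),
   strideE = min(m,n)-1 (clamped at 0 when min(m,n) is 0). */
gebd2_layout gebd2_packed_layout(size_t m, size_t n, size_t lda,
                                 size_t batch_count)
{
    size_t k = m < n ? m : n;
    gebd2_layout L;

    L.strideA = lda * n;
    L.strideD = k;
    L.strideE = k > 0 ? k - 1 : 0;
    L.strideQ = k;
    L.strideP = k;

    /* With tightly packed strides, each buffer holds stride * batch_count
       elements. */
    L.elemsA = L.strideA * batch_count;
    L.elemsD = L.strideD * batch_count;
    L.elemsE = L.strideE * batch_count;
    L.elemsQ = L.strideQ * batch_count;
    L.elemsP = L.strideP * batch_count;
    return L;
}
```

To obtain byte sizes for device allocation, multiply each element count by the size of the corresponding element type. Note that D and E use the real type even for the complex variants.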

rocsolver_<type>gebrd()

rocblas_status rocsolver_zgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBRD computes the bidiagonal form of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

B = Q' * A * P

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(n)  and P = G(1) * G(2) * ... * G(n-1), if m >= n, or
Q = H(1) * H(2) * ... * H(m-1) and P = G(1) * G(2) * ... *  G(m),  if m < n

Each Householder matrix H(i) and G(i) is given by

H(i) = I - tauq[i-1] * v(i) * v(i)', and
G(i) = I - taup[i-1] * u(i) * u(i)'

If m >= n, the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1; while the first i elements of the Householder vector u(i) are zero, and u(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1; while the first i-1 elements of the Householder vector u(i) are zero, and u(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • D[out]

    pointer to real type. Array on the GPU of dimension min(m,n).

    The diagonal elements of B.

  • E[out]

    pointer to real type. Array on the GPU of dimension min(m,n)-1.

    The off-diagonal elements of B.

  • tauq[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

  • taup[out]

    pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices G(i).
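
To make the Householder representation concrete, the sketch below applies a single real reflector H = I - tau * v * v' to a vector on the host without forming H explicitly. The function name apply_reflector is hypothetical; it only illustrates the formula above and is not a description of how rocSOLVER applies Q or P on the GPU.

```c
#include <stddef.h>

/* x <- H * x with H = I - tau * v * v' (real case).
   v and x both have len elements. */
void apply_reflector(size_t len, double tau, const double *v, double *x)
{
    double w = 0.0;

    for (size_t r = 0; r < len; ++r)
        w += v[r] * x[r];          /* w = v' * x */

    for (size_t r = 0; r < len; ++r)
        x[r] -= tau * w * v[r];    /* x = x - tau * w * v */
}
```

For example, with v = (1, 1) and tau = 2 / (v' * v) = 1, the vector (3, 4) is mapped to (-4, -3). The Euclidean norm is unchanged, as expected for an orthogonal matrix.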

rocsolver_<type>gebrd_batched()

rocblas_status rocsolver_zgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBRD_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • D[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out]

    pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • strideQ[in]

    rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
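
The batched variant takes an array of pointers rather than a single strided buffer. A minimal sketch of building that pointer table from one contiguous allocation follows. The helper fill_batch_pointers is hypothetical, and the sketch assumes (this section does not state it) that in a real program base is a device allocation and the finished table is itself copied to GPU memory before the call.

```c
#include <stddef.h>

/* A[j] is set to the start of the j-th lda-by-n column-major matrix inside
   one contiguous block beginning at base (j is 0-based here). */
void fill_batch_pointers(double *base, size_t lda, size_t n,
                         size_t batch_count, double **A)
{
    for (size_t j = 0; j < batch_count; ++j)
        A[j] = base + j * lda * n;
}
```

Only pointer arithmetic is performed, so the same loop works on the host for a base pointer that refers to device memory, as long as the pointers are never dereferenced on the host.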

rocsolver_<type>gebrd_strided_batched()

rocblas_status rocsolver_zgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBRD_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out]

    pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • strideQ[in]

    rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
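
The packed storage of A described above can be unpacked on the host once the result has been copied back. The sketch below rebuilds the full Householder vector v_j(i) for the m >= n case from a strided batch, assuming column-major storage. The helper extract_v is hypothetical; i is 1-based as in the formulas above, while j is 0-based for convenience in C.

```c
#include <stddef.h>

/* Rebuilds v_j(i) (length m) for the m >= n case: the first i-1 entries are
   zero, entry i is 1, and the remaining m-i entries are read from below the
   diagonal of column i of A_j. i is 1-based, j is 0-based. */
void extract_v(const double *A, size_t lda, size_t strideA,
               size_t j, size_t m, size_t i, double *v)
{
    const double *Aj = A + j * strideA;  /* start of matrix A_j */
    size_t col = i - 1;                  /* 0-based column index */

    for (size_t r = 0; r < m; ++r) {
        if (r < col)
            v[r] = 0.0;
        else if (r == col)
            v[r] = 1.0;                  /* implicit unit entry */
        else
            v[r] = Aj[r + col * lda];    /* stored below the diagonal */
    }
}
```

The unit entry is never stored in A, because that position holds a diagonal element of B_j; it has to be restored when the vector is rebuilt.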

rocsolver_<type>sytd2()

rocblas_status rocsolver_dsytd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tau)
rocblas_status rocsolver_ssytd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tau)

SYTD2 computes the tridiagonal form of a real symmetric matrix A.

(This is the unblocked version of the algorithm).

The tridiagonal form is given by:

T = Q' * A * Q

where T is symmetric tridiagonal and Q is an orthogonal matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(n-1) if uplo indicates lower, or
Q = H(n-1) * H(n-2) * ... * H(1) if uplo indicates upper.

Each Householder matrix H(i) is given by

H(i) = I - tau[i] * v(i) * v(i)'

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the i-1 non-zero elements of vectors v(i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the n-i-1 non-zero elements of vectors v(i) stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • D[out]

    pointer to type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • E[out]

    pointer to type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • tau[out]

    pointer to type. Array on the GPU of dimension n-1.

    The scalar factors of the Householder matrices H(i).
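
Because T is returned only through D and E, a dense copy has to be assembled by the caller if one is needed, for example to check T = Q' * A * Q numerically. The host-side sketch below does this for a column-major n-by-n array; tridiag_to_dense is a hypothetical helper, not a rocSOLVER function.

```c
#include <stddef.h>

/* Fills the n-by-n column-major matrix T (leading dimension n) with the
   symmetric tridiagonal matrix whose diagonal is D (n entries) and whose
   off-diagonal is E (n-1 entries). */
void tridiag_to_dense(size_t n, const double *D, const double *E, double *T)
{
    for (size_t idx = 0; idx < n * n; ++idx)
        T[idx] = 0.0;

    for (size_t i = 0; i < n; ++i)
        T[i + i * n] = D[i];

    for (size_t i = 0; i + 1 < n; ++i) {
        T[(i + 1) + i * n] = E[i];   /* subdiagonal entry   */
        T[i + (i + 1) * n] = E[i];   /* superdiagonal entry */
    }
}
```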

rocsolver_<type>sytd2_batched()

rocblas_status rocsolver_dsytd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTD2_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of A_j is given by:

T_j = Q_j' * A_j * Q_j, for j = 1,2,...,batch_count

where T_j is symmetric tridiagonal and Q_j is an orthogonal matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n-1) if uplo indicates lower, or
Q_j = H_j(n-1) * H_j(n-2) * ... * H_j(1) if uplo indicates upper.

Each Householder matrix H_j(i) is given by

H_j(i) = I - tau_j[i] * v_j(i) * v_j(i)'

where tau_j[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal of A_j contain the tridiagonal form T_j; the elements above the superdiagonal contain the i-1 non-zero elements of vectors v_j(i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the n-i-1 non-zero elements of vectors v_j(i) stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • D[out]

    pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out]

    pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
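
The Householder vectors left in A_j can be rebuilt on the host after the output has been copied back. The sketch below handles both values of uplo for the pointer-array layout. It assumes column-major storage and that v_j(i) occupies column i+1 (upper) or column i (lower); this placement is inferred from the element counts given in the description of A and matches the usual LAPACK convention, but it is not spelled out in this section. The helper sytd2_extract_v is hypothetical.

```c
#include <stddef.h>

/* Rebuilds v_j(i) (length n) from the column-major output A[j].
   upper != 0: entries 1..i-1 come from column i+1, entry i is 1, the rest
               are zero.
   upper == 0: entries 1..i are zero, entry i+1 is 1, entries i+2..n come
               from column i.
   Indices in this comment are 1-based; i runs from 1 to n-1, j is 0-based. */
void sytd2_extract_v(double *const A[], size_t lda, size_t j,
                     size_t n, size_t i, int upper, double *v)
{
    const double *Aj = A[j];

    for (size_t r = 0; r < n; ++r)
        v[r] = 0.0;

    if (upper) {
        for (size_t r = 0; r + 1 < i; ++r)
            v[r] = Aj[r + i * lda];          /* column i+1 (1-based) */
        v[i - 1] = 1.0;                      /* implicit unit entry  */
    } else {
        v[i] = 1.0;                          /* implicit unit entry  */
        for (size_t r = i + 1; r < n; ++r)
            v[r] = Aj[r + (i - 1) * lda];    /* column i (1-based)   */
    }
}
```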

rocsolver_<type>sytd2_strided_batched()

rocblas_status rocsolver_dsytd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTD2_STRIDED_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of A_j is given by:

T_j = Q_j' * A_j * Q_j, for j = 1,2,...,batch_count

where T_j is symmetric tridiagonal and Q_j is an orthogonal matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n-1) if uplo indicates lower, or
Q_j = H_j(n-1) * H_j(n-2) * ... * H_j(1) if uplo indicates upper.

Each Householder matrix H_j(i) is given by

H_j(i) = I - tau_j[i] * v_j(i) * v_j(i)'

where tau_j[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal of A_j contain the tridiagonal form T_j; the elements above the superdiagonal contain the i-1 non-zero elements of vectors v_j(i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the n-i-1 non-zero elements of vectors v_j(i) stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out]

    pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out]

    pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
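
The size constraints in the parameter list can be checked on the host before launching the call. The sketch below mirrors only the constraints documented above (n >= 0, lda >= n, batch_count >= 0); the strides are left unchecked because no restriction is documented for them. The function sytd2_strided_args_ok is hypothetical, and plain int stands in for rocblas_int as an assumption.

```c
#include <stdbool.h>

/* Hypothetical pre-flight check mirroring only the documented constraints:
   n >= 0, lda >= n, batch_count >= 0. */
bool sytd2_strided_args_ok(int n, int lda, int batch_count)
{
    return n >= 0 && lda >= n && batch_count >= 0;
}
```

A caller might use this to fail early with a descriptive message instead of relying on the status code returned by the library.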

rocsolver_<type>hetd2()

rocblas_status rocsolver_zhetd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tau)
rocblas_status rocsolver_chetd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tau)

HETD2 computes the tridiagonal form of a complex Hermitian matrix A.

(This is the unblocked version of the algorithm).

The tridiagonal form is given by:

T = Q' * A * Q

where T is Hermitian tridiagonal and Q is a unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(n-1) if uplo indicates lower, or
Q = H(n-1) * H(n-2) * ... * H(1) if uplo indicates upper.

Each Householder matrix H(i) is given by

H(i) = I - tau[i] * v(i) * v(i)'

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the i-1 non-zero elements of vectors v(i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the n-i-1 non-zero elements of vectors v(i) stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • D[out]

    pointer to real type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • E[out]

    pointer to real type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • tau[out]

    pointer to type. Array on the GPU of dimension n-1.

    The scalar factors of the Householder matrices H(i).
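
In the complex case the prime in H(i) = I - tau[i] * v(i) * v(i)' denotes the conjugate transpose, and tau is complex in general. The host-side sketch below applies one such reflector to a vector using C99 complex arithmetic. The function apply_reflector_c is hypothetical and serves only to illustrate the formula.

```c
#include <complex.h>
#include <stddef.h>

/* x <- H * x with H = I - tau * v * v', where ' is the conjugate transpose.
   v and x both have len elements. */
void apply_reflector_c(size_t len, double complex tau,
                       const double complex *v, double complex *x)
{
    double complex w = 0.0;

    for (size_t r = 0; r < len; ++r)
        w += conj(v[r]) * x[r];    /* w = v' * x (conjugated inner product) */

    for (size_t r = 0; r < len; ++r)
        x[r] -= tau * w * v[r];    /* x = x - tau * w * v */
}
```

For example, with v = (1, i) and tau = 1, the vector (1, 0) is mapped to (0, -i), which has the same Euclidean norm as the input.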

rocsolver_<type>hetd2_batched()

rocblas_status rocsolver_zhetd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_chetd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)

HETD2_BATCHED computes the tridiagonal form of a batch of complex Hermitian matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of A_j is given by:

T_j = Q_j' * A_j * Q_j, for j = 1,2,...,batch_count

where T_j is Hermitian tridiagonal and Q_j is a unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n-1) if uplo indicates lower, or
Q_j = H_j(n-1) * H_j(n-2) * ... * H_j(1) if uplo indicates upper.

Each Householder matrix H_j(i) is given by

H_j(i) = I - tau_j[i] * v_j(i) * v_j(i)'

where tau_j[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal of A_j contain the tridiagonal form T_j; the elements above the superdiagonal contain the i-1 non-zero elements of vectors v_j(i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the n-i-1 non-zero elements of vectors v_j(i) stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • D[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hetd2_strided_batched()

rocblas_status rocsolver_zhetd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_chetd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)

HETD2_STRIDED_BATCHED computes the tridiagonal form of a batch of complex Hermitian matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of A_j is given by:

T_j = Q_j' * A_j * Q_j, for j = 1,2,...,batch_count

where T_j is Hermitian tridiagonal and Q_j is a unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(n-1) if uplo indicates lower, or
Q_j = H_j(n-1) * H_j(n-2) * ... * H_j(1) if uplo indicates upper.

Each Householder matrix H_j(i) is given by

H_j(i) = I - tau_j[i] * v_j(i) * v_j(i)'

where tau_j[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if uplo indicates upper, the diagonal and superdiagonal of A_j contain the tridiagonal form T_j, and the elements above the superdiagonal contain the first i-1 elements of the vectors v_j(i), stored as columns. If uplo indicates lower, the diagonal and subdiagonal contain the tridiagonal form T_j, and the elements below the subdiagonal contain the last n-i-1 elements of the vectors v_j(i), stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
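
The strided layout described by strideA, strideD, strideE and strideP comes down to simple index arithmetic. The following is a minimal host-side sketch, not part of the rocSOLVER API: the helper names are hypothetical, indices are 0-based, column-major storage is assumed (as the leading dimension lda suggests), and size_t stands in for rocblas_stride and rocblas_int. The last helper additionally assumes the normal use case in which the stride is at least the length of one batch item.

```c
#include <stddef.h>

/* Offset, in elements, of entry (row, col) of matrix A_j inside the
   strided buffer A. Assumes column-major storage and 0-based indices. */
size_t strided_matrix_offset(size_t j, size_t stride_a, size_t lda,
                             size_t row, size_t col)
{
    return j * stride_a + col * lda + row;
}

/* Offset, in elements, of entry k of the j-th vector (D_j, E_j or tau_j). */
size_t strided_vector_offset(size_t j, size_t stride, size_t k)
{
    return j * stride + k;
}

/* Smallest element count that holds every batch item, assuming
   stride >= item_len so that consecutive items do not overlap. */
size_t strided_min_elements(size_t batch_count, size_t stride, size_t item_len)
{
    if (batch_count == 0)
        return 0;
    return (batch_count - 1) * stride + item_len;
}
```

For example, with n = 4, lda = 4, strideA = 20 and batch_count = 3, the buffer for A needs at least 2*20 + 16 = 56 elements under these assumptions.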

rocsolver_<type>sytrd()

rocblas_status rocsolver_dsytrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tau)
rocblas_status rocsolver_ssytrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tau)

SYTRD computes the tridiagonal form of a real symmetric matrix A.

(This is the blocked version of the algorithm).

The tridiagonal form is given by:

T = Q' * A * Q

where T is symmetric tridiagonal and Q is an orthogonal matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... * H(n-1) if uplo indicates lower, or
Q = H(n-1) * H(n-2) * ... * H(1) if uplo indicates upper.

Each Householder matrix H(i) is given by

H(i) = I - tau[i] * v(i) * v(i)'

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if uplo indicates upper, the diagonal and superdiagonal contain the tridiagonal form T, and the elements above the superdiagonal contain the first i-1 elements of the vectors v(i), stored as columns. If uplo indicates lower, the diagonal and subdiagonal contain the tridiagonal form T, and the elements below the subdiagonal contain the last n-i-1 elements of the vectors v(i), stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • D[out]

    pointer to type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • E[out]

    pointer to type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • tau[out]

    pointer to type. Array on the GPU of dimension n-1.

    The scalar factors of the Householder matrices H(i).
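
To make the storage convention concrete, the sketch below rebuilds one Householder vector v(i) from a factored matrix when uplo indicates lower, and applies H(i) = I - tau[i] * v(i) * v(i)' to a vector. It is a host-side illustration operating on a copy of the output data, not library code: both function names are hypothetical, i is 1-based as in the description above, and A is assumed to be column-major with leading dimension lda.

```c
#include <stddef.h>

/* Rebuild v(i), for 1 <= i <= n-1, from the lower-stored output of SYTRD.
   The first i entries are zero, the next entry is the implicit 1, and the
   remaining n-i-1 entries are read from column i below the subdiagonal. */
void householder_vector_lower(size_t n, const double *A, size_t lda,
                              size_t i, double *v)
{
    size_t col = i - 1; /* 0-based column holding v(i) */
    for (size_t k = 0; k < i; ++k)
        v[k] = 0.0;
    v[i] = 1.0;
    for (size_t k = i + 1; k < n; ++k)
        v[k] = A[k + col * lda];
}

/* Overwrite x with H * x, where H = I - tau * v * v'. */
void apply_householder(size_t n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    for (size_t k = 0; k < n; ++k)
        dot += v[k] * x[k];
    for (size_t k = 0; k < n; ++k)
        x[k] -= tau * dot * v[k];
}
```

With v = (0, 1, 1) and tau = 1, applying H to v itself returns -v, which is the reflection property of a Householder matrix whose scalar satisfies tau = 2 / (v' * v).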

rocsolver_<type>sytrd_batched()

rocblas_status rocsolver_dsytrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTRD_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the blocked version of the algorithm).

The tridiagonal form of A_j is given by:

T_j = Q_j' * A_j * Q_j, for j = 1,2,...,batch_count

where T_j is symmetric tridiagonal and Q_j is an orthogonal matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(n-1) if uplo indicates lower, or
Q_j = H_j(n-1) * H_j(n-2) * ... * H_j(1) if uplo indicates upper.

Each Householder matrix H_j(i) is given by

H_j(i) = I - tau_j[i] * v_j(i) * v_j(i)'

where tau_j[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if uplo indicates upper, the diagonal and superdiagonal of A_j contain the tridiagonal form T_j, and the elements above the superdiagonal contain the first i-1 elements of the vectors v_j(i), stored as columns. If uplo indicates lower, the diagonal and subdiagonal contain the tridiagonal form T_j, and the elements below the subdiagonal contain the last n-i-1 elements of the vectors v_j(i), stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • D[out]

    pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out]

    pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
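
In the batched variant, A is an array of pointers with one entry per matrix A_j. One common way to produce such an array is to carve a single contiguous allocation into batch_count blocks of lda*n elements each. The sketch below shows only that pointer arithmetic: the function name is hypothetical, the packing of matrices back to back is an assumption rather than a requirement of the API, and allocating GPU memory or transferring the pointer array is out of scope here.

```c
#include <stddef.h>

/* Fill ptrs[0..batch_count-1] so that ptrs[j] addresses the j-th block of
   lda*n elements inside one contiguous buffer starting at base. */
void fill_batch_pointers(double **ptrs, double *base,
                         size_t lda, size_t n, size_t batch_count)
{
    for (size_t j = 0; j < batch_count; ++j)
        ptrs[j] = base + j * lda * n;
}
```

The D, E and tau outputs are not affected by this choice, since they remain strided arrays addressed through strideD, strideE and strideP.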

rocsolver_<type>sytrd_strided_batched()

rocblas_status rocsolver_dsytrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTRD_STRIDED_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the blocked version of the algorithm).

The tridiagonal form of A_j is given by:

T_j = Q_j' * A_j * Q_j, for j = 1,2,...,batch_count

where T_j is symmetric tridiagonal and Q_j is an orthogonal matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(n-1) if uplo indicates lower, or
Q_j = H_j(n-1) * H_j(n-2) * ... * H_j(1) if uplo indicates upper.

Each Householder matrix H_j(i) is given by

H_j(i) = I - tau_j[i] * v_j(i) * v_j(i)'

where tau_j[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if uplo indicates upper, the diagonal and superdiagonal of A_j contain the tridiagonal form T_j, and the elements above the superdiagonal contain the first i-1 elements of the vectors v_j(i), stored as columns. If uplo indicates lower, the diagonal and subdiagonal contain the tridiagonal form T_j, and the elements below the subdiagonal contain the last n-i-1 elements of the vectors v_j(i), stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out]

    pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • strideD[in]

    rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out]

    pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • strideE[in]

    rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out]

    pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of scalar factors of the Householder matrices H_j(i).

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.
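
Since T_j is returned only through its diagonal D_j and off-diagonal E_j, it can be useful when checking results to expand one of them into a dense matrix. The following is a minimal host-side sketch under stated assumptions: the function name and the ldt parameter are hypothetical, D and E are host copies of the output arrays, j is 0-based (the description above counts from 1), and T is written in column-major order.

```c
#include <stddef.h>

/* Expand the j-th tridiagonal matrix of the batch into the dense n-by-n
   matrix T (column-major, leading dimension ldt >= n). */
void assemble_tridiagonal(size_t n, const double *D, size_t strideD,
                          const double *E, size_t strideE,
                          size_t j, double *T, size_t ldt)
{
    const double *d = D + j * strideD; /* diagonal of T_j, n entries */
    const double *e = E + j * strideE; /* off-diagonal of T_j, n-1 entries */

    for (size_t c = 0; c < n; ++c)
        for (size_t r = 0; r < n; ++r)
            T[r + c * ldt] = 0.0;

    for (size_t k = 0; k < n; ++k)
        T[k + k * ldt] = d[k];

    for (size_t k = 0; k + 1 < n; ++k) {
        T[(k + 1) + k * ldt] = e[k]; /* subdiagonal */
        T[k + (k + 1) * ldt] = e[k]; /* superdiagonal, by symmetry */
    }
}
```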

rocsolver_<type>hetrd()

rocblas_status rocsolver_zhetrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tau)
rocblas_status rocsolver_chetrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tau)

HETRD computes the tridiagonal form of a complex Hermitian matrix A.

(This is the blocked version of the algorithm).

The tridiagonal form is given by:

T = Q' * A * Q

where T is Hermitian tridiagonal and Q is a unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... * H(n-1) if uplo indicates lower, or
Q = H(n-1) * H(n-2) * ... * H(1) if uplo indicates upper.

Each Householder matrix H(i) is given by

H(i) = I - tau[i] * v(i) * v(i)'

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1. If uplo indicates upper, the last n-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • handle[in] rocblas_handle.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if uplo indicates upper, the diagonal and superdiagonal contain the tridiagonal form T, and the elements above the superdiagonal contain the first i-1 elements of the vectors v(i), stored as columns. If uplo indicates lower, the diagonal and subdiagonal contain the tridiagonal form T, and the elements below the subdiagonal contain the last n-i-1 elements of the vectors v(i), stored as columns.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • D[out]

    pointer to real type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • E[out]

    pointer to real type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • tau[out]

    pointer to type. Array on the GPU of dimension n-1.

    The scalar factors of the Householder matrices H(i).
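
In the complex case tau[i] is itself complex, and v(i)' is read here as the conjugate transpose of v(i), consistent with Q being unitary. The sketch below applies H = I - tau * v * v' to a complex vector on the host. It is an illustration only: the cplx struct is a hypothetical stand-in for the library's complex type, and all function names are made up for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a double-precision complex number. */
typedef struct { double re, im; } cplx;

static cplx cplx_mul(cplx a, cplx b)
{
    cplx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

static cplx cplx_conj(cplx a)
{
    cplx r;
    r.re = a.re;
    r.im = -a.im;
    return r;
}

/* Overwrite x with (I - tau * v * v') * x, where v' is the conjugate
   transpose of v. */
void apply_householder_complex(size_t n, cplx tau, const cplx *v, cplx *x)
{
    cplx dot = {0.0, 0.0}; /* accumulates v' * x */
    for (size_t k = 0; k < n; ++k) {
        cplx t = cplx_mul(cplx_conj(v[k]), x[k]);
        dot.re += t.re;
        dot.im += t.im;
    }
    cplx s = cplx_mul(tau, dot); /* tau * (v' * x) */
    for (size_t k = 0; k < n; ++k) {
        cplx t = cplx_mul(v[k], s);
        x[k].re -= t.re;
        x[k].im -= t.im;
    }
}
```

For v = (1, i) and tau = 1, which equals 2 / (v' * v), applying H to v gives -v = (-1, -i).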

rocsolver_<type>hetrd_batched()

rocblas_status rocsolver_zhetrd_batched(rocblas_handle handle, const