rocSOLVER API

This section provides details of the rocSOLVER library API as in last ROCm release.

Types

rocSOLVER Types

Most rocSOLVER types are aliases of rocBLAS types. See the rocBLAS types.

rocsolver_int

typedef rocblas_int rocsolver_int

Deprecated since version 3.5: Use rocblas_int.

rocsolver_handle

typedef rocblas_handle rocsolver_handle

Deprecated since version 3.5: Use rocblas_handle.

rocsolver_direction

typedef rocblas_direct rocsolver_direction

Deprecated since version 3.5: Use rocblas_direct.

rocsolver_storev

typedef rocblas_storev rocsolver_storev

Deprecated since version 3.5: Use rocblas_storev.

rocsolver_operation

typedef rocblas_operation rocsolver_operation

Deprecated since version 3.5: Use rocblas_operation.

rocsolver_fill

typedef rocblas_fill rocsolver_fill

Deprecated since version 3.5: Use rocblas_fill.

rocsolver_diagonal

typedef rocblas_diagonal rocsolver_diagonal

Deprecated since version 3.5: Use rocblas_diagonal.

rocsolver_side

typedef rocblas_side rocsolver_side

Deprecated since version 3.5: Use rocblas_side.

rocsolver_status

typedef rocblas_status rocsolver_status

Deprecated since version 3.5: Use rocblas_status.

Additional Types

rocblas_direct

enum rocblas_direct

Used to specify the order in which multiple elementary matrices are applied together.

Values:

enumerator rocblas_forward_direction

Elementary matrices applied from the right.

enumerator rocblas_backward_direction

Elementary matrices applied from the left.

rocblas_storev

enum rocblas_storev

Used to specify how householder vectors are stored in a matrix of vectors.

Values:

enumerator rocblas_column_wise

Householder vectors are stored in the columns of a matrix.

enumerator rocblas_row_wise

Householder vectors are stored in the rows of a matrix.

rocblas_svect

enum rocblas_svect

Used to specify how the singular vectors are to be computed and stored.

Values:

enumerator rocblas_svect_all

The entire associated orthogonal/unitary matrix is computed.

enumerator rocblas_svect_singular

Only the singular vectors are computed and stored in output array.

enumerator rocblas_svect_overwrite

Only the singular vectors are computed and overwrite the input matrix.

enumerator rocblas_svect_none

No singular vectors are computed.

rocblas_workmode

enum rocblas_workmode

Used to enable the use of fast algorithms (with out-of-place computations) in some of the routines.

Values:

enumerator rocblas_outofplace

Out-of-place computations are allowed; this requires enough free memory.

enumerator rocblas_inplace

When not enough memory, this forces in-place computations

LAPACK Auxiliary Functions

These are functions that support more advanced LAPACK routines.

Complex vector manipulations

rocsolver_<type>lacgv()

rocblas_status rocsolver_zlacgv(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *x, const rocblas_int incx)
rocblas_status rocsolver_clacgv(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *x, const rocblas_int incx)

LACGV conjugates the complex vector x.

It conjugates the n entries of a complex vector x with increment incx.

Parameters
  • [in] handle: rocblas_handle

  • [in] n: rocblas_int. n >= 0.

    The number of entries of the vector x.

  • [inout] x: pointer to type. Array on the GPU of size at least n.

    On input it is the vector x, on output it is overwritten with vector conjg(x).

  • [in] incx: rocblas_int. incx != 0.

    The increment between consecutive elements of x. If incx is negative, the elements of x are indexed in reverse order.

Matrix permutations and manipulations

rocsolver_<type>laswp()

rocblas_status rocsolver_zlaswp(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_claswp(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_dlaswp(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)
rocblas_status rocsolver_slaswp(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_int k1, const rocblas_int k2, const rocblas_int *ipiv, const rocblas_int incx)

LASWP performs a series of row interchanges on the matrix A.

It interchanges row I with row IPIV[k1 + (I - k1) * abs(inx)], for each of rows K1 through K2 of A. k1 and k2 are 1-based indices.

Parameters
  • [in] handle: rocblas_handle

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix of column dimension n to which the row interchanges will be applied. On exit, the permuted matrix.

  • [in] lda: rocblas_int. lda > 0.

    The leading dimension of the array A.

  • [in] k1: rocblas_int. k1 > 0.

    The first element of IPIV for which a row interchange will be done. This is a 1-based index.

  • [in] k2: rocblas_int. k2 > k1 > 0.

    (K2-K1+1) is the number of elements of IPIV for which a row interchange will be done. This is a 1-based index.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU of dimension at least k1 + (k2 - k1) * abs(incx).

    The vector of pivot indices. Only the elements in positions k1 through (k1 + (k2 - k1) * abs(incx)) of IPIV are accessed. Elements of ipiv are considered 1-based.

  • [in] incx: rocblas_int. incx != 0.

    The increment between successive values of IPIV. If IPIV is negative, the pivots are applied in reverse order.

Householder reflexions

rocsolver_<type>larfg()

rocblas_status rocsolver_zlarfg(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *alpha, rocblas_double_complex *x, const rocblas_int incx, rocblas_double_complex *tau)
rocblas_status rocsolver_clarfg(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *alpha, rocblas_float_complex *x, const rocblas_int incx, rocblas_float_complex *tau)
rocblas_status rocsolver_dlarfg(rocblas_handle handle, const rocblas_int n, double *alpha, double *x, const rocblas_int incx, double *tau)
rocblas_status rocsolver_slarfg(rocblas_handle handle, const rocblas_int n, float *alpha, float *x, const rocblas_int incx, float *tau)

LARFG generates an orthogonal Householder reflector H of order n.

Householder reflector H is such that

H * [alpha] = [beta]
    [  x  ]   [  0 ]

where x is an n-1 vector and alpha and beta are scalars. Matrix H can be generated as

H = I - tau * [1] * [1 v']
              [v]

with v an n-1 vector and tau a scalar.

Parameters
  • [in] handle: rocblas_handle

  • [in] n: rocblas_int. n >= 0.

    The order (size) of reflector H.

  • [inout] alpha: pointer to type. A scalar on the GPU.

    On input the scalar alpha, on output it is overwritten with beta.

  • [inout] x: pointer to type. Array on the GPU of size at least n-1.

    On input it is the vector x, on output it is overwritten with vector v.

  • [in] incx: rocblas_int. incx > 0.

    The increment between consecutive elements of x.

  • [out] tau: pointer to type. A scalar on the GPU.

    The scalar tau.

rocsolver_<type>larft()

rocblas_status rocsolver_zlarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *tau, rocblas_double_complex *T, const rocblas_int ldt)
rocblas_status rocsolver_clarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *tau, rocblas_float_complex *T, const rocblas_int ldt)
rocblas_status rocsolver_dlarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, double *V, const rocblas_int ldv, double *tau, double *T, const rocblas_int ldt)
rocblas_status rocsolver_slarft(rocblas_handle handle, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int n, const rocblas_int k, float *V, const rocblas_int ldv, float *tau, float *T, const rocblas_int ldt)

LARFT Generates the triangular factor T of a block reflector H of order n.

The block reflector H is defined as the product of k Householder matrices as

H = H(1) * H(2) * ... * H(k)  (forward direction), or
H = H(k) * ... * H(2) * H(1)  (backward direction)

depending on the value of direct.

The triangular matrix T is upper triangular in forward direction and lower triangular in backward direction. If storev is column-wise, then

H = I - V * T * V'

where the i-th column of matrix V contains the Householder vector associated to H(i). If storev is row-wise, then

H = I - V' * T * V

where the i-th row of matrix V contains the Householder vector associated to H(i).

Parameters
  • [in] handle: rocblas_handle.

  • [in] direct: rocblas_direct.

    Specifies the direction in which the Householder matrices are applied.

  • [in] storev: rocblas_storev.

    Specifies how the Householder vectors are stored in matrix V.

  • [in] n: rocblas_int. n >= 0.

    The order (size) of the block reflector.

  • [in] k: rocsovler_int. k >= 1.

    The number of Householder matrices.

  • [in] V: pointer to type. Array on the GPU of size ldv*k if column-wise, or ldv*n if row-wise.

    The matrix of Householder vectors.

  • [in] ldv: rocblas_int. ldv >= n if column-wise, or ldv >= k if row-wise.

    Leading dimension of V.

  • [in] tau: pointer to type. Array of k scalars on the GPU.

    The vector of all the scalars associated to the Householder matrices.

  • [out] T: pointer to type. Array on the GPU of dimension ldt*k.

    The triangular factor. T is upper triangular is forward operation, otherwise it is lower triangular. The rest of the array is not used.

  • [in] ldt: rocblas_int. ldt >= k.

    The leading dimension of T.

rocsolver_<type>larf()

rocblas_status rocsolver_zlarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, rocblas_double_complex *x, const rocblas_int incx, const rocblas_double_complex *alpha, rocblas_double_complex *A, const rocblas_int lda)
rocblas_status rocsolver_clarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, rocblas_float_complex *x, const rocblas_int incx, const rocblas_float_complex *alpha, rocblas_float_complex *A, const rocblas_int lda)
rocblas_status rocsolver_dlarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, double *x, const rocblas_int incx, const double *alpha, double *A, const rocblas_int lda)
rocblas_status rocsolver_slarf(rocblas_handle handle, const rocblas_side side, const rocblas_int m, const rocblas_int n, float *x, const rocblas_int incx, const float *alpha, float *A, const rocblas_int lda)

LARF applies a Householder reflector H to a general matrix A.

The Householder reflector H, of order m (or n), is to be applied to a m-by-n matrix A from the left (or the right). H is given by

H = I - alpha * x * x'

where alpha is a scalar and x a Householder vector. H is never actually computed.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    If side = rocblas_side_left, then compute H*A If side = rocblas_side_right, then compute A*H

  • [in] m: rocblas_int. m >= 0.

    Number of rows of A.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of A.

  • [in] x: pointer to type. Array on the GPU of size at least (1 + (m-1)*abs(incx)) if left side, or at least (1 + (n-1)*abs(incx)) if right side.

    The Householder vector x.

  • [in] incx: rocblas_int. incx != 0.

    Increment between to consecutive elements of x. If incx < 0, the elements of x are used in reverse order.

  • [in] alpha: pointer to type. A scalar on the GPU.

    If alpha = 0, then H = I (A will remain the same, x is never used)

  • [inout] A: pointer to type. Array on the GPU of size lda*n.

    On input, the matrix A. On output it is overwritten with H*A (or A*H).

  • [in] lda: rocblas_int. lda >= m.

    Leading dimension of A.

rocsolver_<type>larfb()

rocblas_status rocsolver_zlarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *T, const rocblas_int ldt, rocblas_double_complex *A, const rocblas_int lda)
rocblas_status rocsolver_clarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *T, const rocblas_int ldt, rocblas_float_complex *A, const rocblas_int lda)
rocblas_status rocsolver_dlarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *V, const rocblas_int ldv, double *T, const rocblas_int ldt, double *A, const rocblas_int lda)
rocblas_status rocsolver_slarfb(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_direct direct, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *V, const rocblas_int ldv, float *T, const rocblas_int ldt, float *A, const rocblas_int lda)

LARFB applies a block reflector H to a general m-by-n matrix A.

The block reflector H is applied in one of the following forms, depending on the values of side and trans:

H  * A  (No transpose from the left)
H' * A  (Transpose or conjugate transpose from the left)
A * H   (No transpose from the right), and
A * H'  (Transpose or conjugate transpose from the right)

The block reflector H is defined as the product of k Householder matrices as

H = H(1) * H(2) * ... * H(k)  (forward direction), or
H = H(k) * ... * H(2) * H(1)  (backward direction)

depending on the value of direct. H is never stored. It is calculated as

H = I - V * T * V'

where the i-th column of matrix V contains the Householder vector associated with H(i), if storev is column-wise; or

H = I - V' * T * V

where the i-th row of matrix V contains the Householder vector associated with H(i), if storev is row-wise. T is the associated triangular factor as computed by LARFT.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply H.

  • [in] trans: rocblas_operation.

    Specifies whether the block reflector or its transpose/conjugate transpose is to be applied.

  • [in] direct: rocblas_direct.

    Specifies the direction in which the Householder matrices were to be applied to generate H.

  • [in] storev: rocblas_storev.

    Specifies how the Householder vectors are stored in matrix V.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix A.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix A.

  • [in] k: rocsovler_int. k >= 1.

    The number of Householder matrices.

  • [in] V: pointer to type. Array on the GPU of size ldv*k if column-wise, ldv*n if row-wise and applying from the right, or ldv*m if row-wise and applying from the left.

    The matrix of Householder vectors.

  • [in] ldv: rocblas_int. ldv >= k if row-wise, ldv >= m if column-wise and applying from the left, or ldv >= n if column-wise and applying from the right.

    Leading dimension of V.

  • [in] T: pointer to type. Array on the GPU of dimension ldt*k.

    The triangular factor of the block reflector.

  • [in] ldt: rocblas_int. ldt >= k.

    The leading dimension of T.

  • [inout] A: pointer to type. Array on the GPU of size lda*n.

    On input, the matrix A. On output it is overwritten with H*A, A*H, H’*A, or A*H’.

  • [in] lda: rocblas_int. lda >= m.

    Leading dimension of A.

Bidiagonal forms

rocsolver_<type>labrd()

rocblas_status rocsolver_zlabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup, rocblas_double_complex *X, const rocblas_int ldx, rocblas_double_complex *Y, const rocblas_int ldy)
rocblas_status rocsolver_clabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup, rocblas_float_complex *X, const rocblas_int ldx, rocblas_float_complex *Y, const rocblas_int ldy)
rocblas_status rocsolver_dlabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup, double *X, const rocblas_int ldx, double *Y, const rocblas_int ldy)
rocblas_status rocsolver_slabrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup, float *X, const rocblas_int ldx, float *Y, const rocblas_int ldy)

LABRD computes the bidiagonal form of the first k rows and columns of a general m-by-n matrix A, as well as the matrices X and Y needed to reduce the remaining part of A.

The bidiagonal form is given by:

B = Q' * A * P

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(k)  and P = G(1) * G(2) * ... * G(k-1), if m >= n, or
Q = H(1) * H(2) * ... * H(k-1) and P = G(1) * G(2) * ... *  G(k),  if m < n

Each Householder matrix H(i) and G(i) is given by

H(i) = I - tauq[i-1] * v(i) * v(i)', and
G(i) = I - taup[i-1] * u(i) * u(i)'

If m >= n, the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1; while the first i elements of the Householder vector u(i) are zero, and u(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1; while the first i-1 elements of the Householder vector u(i) are zero, and u(i)[i] = 1.

The unreduced part of the matrix A can be updated using a block update:

A = A - V * Y' - X * U'

where V is an m-by-k matrix and U is an n-by-k formed using the vectors v and u.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [in] k: rocblas_int. min(m,n) >= k >= 0.

    The number of leading rows and columns of the matrix A to be reduced.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    specifies the leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension k.

    The diagonal elements of B.

  • [out] E: pointer to real type. Array on the GPU of dimension k.

    The off-diagonal elements of B.

  • [out] tauq: pointer to type. Array on the GPU of dimension k.

    The scalar factors of the Householder matrices H(i).

  • [out] taup: pointer to type. Array on the GPU of dimension k.

    The scalar factors of the Householder matrices G(i).

  • [out] X: pointer to type. Array on the GPU of dimension ldx*k.

    The m-by-k matrix needed to reduce the unreduced part of A.

  • [in] ldx: rocblas_int. ldx >= m.

    specifies the leading dimension of X.

  • [out] Y: pointer to type. Array on the GPU of dimension ldy*k.

    The n-by-k matrix needed to reduce the unreduced part of A.

  • [in] ldy: rocblas_int. ldy >= n.

    specifies the leading dimension of Y.

rocsolver_<type>bdsqr()

rocblas_status rocsolver_zbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, double *D, double *E, rocblas_double_complex *V, const rocblas_int ldv, rocblas_double_complex *U, const rocblas_int ldu, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_cbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, float *D, float *E, rocblas_float_complex *V, const rocblas_int ldv, rocblas_float_complex *U, const rocblas_int ldu, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_dbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, double *D, double *E, double *V, const rocblas_int ldv, double *U, const rocblas_int ldu, double *C, const rocblas_int ldc, rocblas_int *info)
rocblas_status rocsolver_sbdsqr(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nv, const rocblas_int nu, const rocblas_int nc, float *D, float *E, float *V, const rocblas_int ldv, float *U, const rocblas_int ldu, float *C, const rocblas_int ldc, rocblas_int *info)

BDSQR computes the singular value decomposition (SVD) of a n-by-n bidiagonal matrix B.

The SVD of B has the form:

B = Ub * S * Vb'

where S is the n-by-n diagonal matrix of singular values of B, the columns of Ub are the left singular vectors of B, and the columns of Vb are its right singular vectors.

The computation of the singular vectors is optional; this function accepts input matrices U (of size nu-by-n) and V (of size n-by-nv) that are overwritten with U*Ub and Vb’*V. If nu = 0 no left vectors are computed; if nv = 0 no right vectors are computed.

Optionally, this function can also compute Ub’*C for a given n-by-nc input matrix C.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether B is upper or lower bidiagonal.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix B.

  • [in] nv: rocblas_int. nv >= 0.

    The number of columns of matrix V.

  • [in] nu: rocblas_int. nu >= 0.

    The number of rows of matrix U.

  • [in] nc: rocblas_int. nu >= 0.

    The number of columns of matrix C.

  • [inout] D: pointer to real type. Array on the GPU of dimension n.

    On entry, the diagonal elements of B. On exit, if info = 0, the singular values of B in decreasing order; if info > 0, the diagonal elements of a bidiagonal matrix orthogonally equivalent to B.

  • [inout] E: pointer to real type. Array on the GPU of dimension n-1.

    On entry, the off-diagonal elements of B. On exit, if info > 0, the off-diagonal elements of a bidiagonal matrix orthogonally equivalent to B (if info = 0 this matrix converges to zero).

  • [inout] V: pointer to type. Array on the GPU of dimension ldv*nv.

    On entry, the matrix V. On exit, it is overwritten with Vb’*V. (Not referenced if nv = 0).

  • [in] ldv: rocblas_int. ldv >= n if nv > 0, or ldv >=1 if nv = 0.

    Specifies the leading dimension of V.

  • [inout] U: pointer to type. Array on the GPU of dimension ldu*n.

    On entry, the matrix U. On exit, it is overwritten with U*Ub. (Not referenced if nu = 0).

  • [in] ldu: rocblas_int. ldu >= nu.

    Specifies the leading dimension of U.

  • [inout] C: pointer to type. Array on the GPU of dimension ldc*nc.

    On entry, the matrix C. On exit, it is overwritten with Ub’*C. (Not referenced if nc = 0).

  • [in] ldc: rocblas_int. ldc >= n if nc > 0, or ldc >=1 if nc = 0.

    Specifies the leading dimension of C.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, i elements of E have not converged to zero.

Orthonormal matrices

rocsolver_<type>org2r()

rocblas_status rocsolver_dorg2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorg2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORG2R generates a m-by-n Matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

rocsolver_<type>orgqr()

rocblas_status rocsolver_dorgqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGQR generates a m-by-n Matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

rocsolver_<type>orgl2()

rocblas_status rocsolver_dorgl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGL2 generates a m-by-n Matrix Q with orthonormal rows.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

rocsolver_<type>orglq()

rocblas_status rocsolver_dorglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGLQ generates a m-by-n Matrix Q with orthonormal rows.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

rocsolver_<type>org2l()

rocblas_status rocsolver_dorg2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorg2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORG2L generates a m-by-n Matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQLF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>orgql()

rocblas_status rocsolver_dorgql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGQL generates a m-by-n Matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the last n column of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQLF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>orgbr()

rocblas_status rocsolver_dorgbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sorgbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv)

ORGBR generates a m-by-n Matrix Q with orthonormal rows or columns.

If storev is column-wise, then the matrix Q has orthonormal columns. If m >= k, Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

If m < k, Q is defined as the product of Householder reflectors of order m

Q = H(1) * H(2) * ... * H(m-1)

On the other hand, if storev is row-wise, then the matrix Q has orthonormal rows. If n > k, Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

If n <= k, Q is defined as the product of Householder reflectors of order n

Q = H(n-1) * H(n-2) * ... * H(1)

The Householder matrices H(i) are never stored, they are computed from its corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • [in] handle: rocblas_handle.

  • [in] storev: rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q. If row-wise, then min(n,k) <= m <= n.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix Q. If column-wise, then min(m,k) <= n <= m.

  • [in] k: rocblas_int. k >= 0.

    The number of columns (if storev is colum-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column (or row) has the Householder vector v(i) as returned by GEBRD. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension min(m,k) if column-wise, or min(n,k) if row-wise.

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.

rocsolver_<type>orm2r()

rocblas_status rocsolver_dorm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORM2R applies a matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • [in] lda: rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>ormqr()

rocblas_status rocsolver_dormqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMQR applies a matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • [in] lda: rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>orml2()

rocblas_status rocsolver_dorml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sorml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORML2 applies a matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • [in] lda: rocblas_int. lda >= k.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>ormlq()

rocblas_status rocsolver_dormlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMLQ applies a matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

Q is an orthogonal matrix defined as the product of k Householder reflectors as

Q = H(k) * H(k-1) * ... * H(1)

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • [in] lda: rocblas_int. lda >= k.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>ormbr()

rocblas_status rocsolver_dormbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, double *A, const rocblas_int lda, double *ipiv, double *C, const rocblas_int ldc)
rocblas_status rocsolver_sormbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, float *A, const rocblas_int lda, float *ipiv, float *C, const rocblas_int ldc)

ORMBR applies a matrix Q with orthonormal rows or columns to a general m-by-n matrix C.

If storev is column-wise, then the matrix Q has orthonormal columns. If storev is row-wise, then the matrix Q has orthonormal rows. The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Transpose from the right)

The order nq of orthogonal matrix Q is nq = m if applying from the left, or nq = n if applying from the right.

When storev is column-wise, if nq >= k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if nq < k, then Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1).

When storev is row-wise, if nq > k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if n <= k, Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1)

The Householder matrices H(i) are never stored, they are computed from its corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • [in] handle: rocblas_handle.

  • [in] storev: rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0.

    The number of columns (if storev is colum-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • [in] A: pointer to type. Array on the GPU of size lda*min(nq,k) if column-wise, or lda*nq if row-wise.

    The i-th column (or row) has the Householder vector v(i) associated with H(i) as returned by GEBRD.

  • [in] lda: rocblas_int. lda >= nq if column-wise, or lda >= min(nq,k) if row-wise.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least min(nq,k).

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

Unitary matrices

rocsolver_<type>ung2r()

rocblas_status rocsolver_zung2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cung2r(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNG2R generates a m-by-n complex Matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

rocsolver_<type>ungqr()

rocblas_status rocsolver_zungqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungqr(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGQR generates a m-by-n complex Matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column has Householder vector v(i), for i = 1,2,…,k as returned in the first k columns of matrix A of GEQRF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

rocsolver_<type>ungl2()

rocblas_status rocsolver_zungl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungl2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGL2 generates a m-by-n complex Matrix Q with orthonormal rows.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

rocsolver_<type>unglq()

rocblas_status rocsolver_zunglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cunglq(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGLQ generates a m-by-n complex Matrix Q with orthonormal rows.

(This is the blocked version of the algorithm).

The matrix Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. 0 <= m <= n.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= m.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th row has Householder vector v(i), for i = 1,2,…,k as returned in the first k rows of matrix A of GELQF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

rocsolver_<type>ung2l()

rocblas_status rocsolver_zung2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cung2l(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNG2L generates a m-by-n complex Matrix Q with orthonormal columns.

(This is the unblocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQLF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>ungql()

rocblas_status rocsolver_zungql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungql(rocblas_handle handle, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGQL generates a m-by-n complex Matrix Q with orthonormal columns.

(This is the blocked version of the algorithm).

The matrix Q is defined as the last n columns of the product of k Householder reflectors of order m

Q = H(k) * H(k-1) * ... * H(1)

Householder matrices H(i) are never stored, they are computed from its corresponding Householder vector v(i) and scalar ipiv_i as returned by GEQLF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q.

  • [in] n: rocblas_int. 0 <= n <= m.

    The number of columns of the matrix Q.

  • [in] k: rocblas_int. 0 <= k <= n.

    The number of Householder reflectors.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the (n-k+i)-th column has Householder vector v(i), for i = 1,2,…,k as returned in the last k columns of matrix A of GEQLF. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQLF.

rocsolver_<type>ungbr()

rocblas_status rocsolver_zungbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cungbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)

UNGBR generates a m-by-n complex Matrix Q with orthonormal rows or columns.

If storev is column-wise, then the matrix Q has orthonormal columns. If m >= k, Q is defined as the first n columns of the product of k Householder reflectors of order m

Q = H(1) * H(2) * ... * H(k)

If m < k, Q is defined as the product of Householder reflectors of order m

Q = H(1) * H(2) * ... * H(m-1)

On the other hand, if storev is row-wise, then the matrix Q has orthonormal rows. If n > k, Q is defined as the first m rows of the product of k Householder reflectors of order n

Q = H(k) * H(k-1) * ... * H(1)

If n <= k, Q is defined as the product of Householder reflectors of order n

Q = H(n-1) * H(n-2) * ... * H(1)

The Householder matrices H(i) are never stored, they are computed from its corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • [in] handle: rocblas_handle.

  • [in] storev: rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix Q. If row-wise, then min(n,k) <= m <= n.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix Q. If column-wise, then min(m,k) <= n <= m.

  • [in] k: rocblas_int. k >= 0.

    The number of columns (if storev is colum-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the i-th column (or row) has the Householder vector v(i) as returned by GEBRD. On exit, the computed matrix Q.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension min(m,k) if column-wise, or min(n,k) if row-wise.

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.

rocsolver_<type>unm2r()

rocblas_status rocsolver_zunm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunm2r(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNM2R applies a complex matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • [in] lda: rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unmqr()

rocblas_status rocsolver_zunmqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmqr(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMQR applies a complex matrix Q with orthonormal columns to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(1) * H(2) * ... * H(k)

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the QR factorization GEQRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*k.

    The i-th column has the Householder vector v(i) associated with H(i) as returned by GEQRF in the first k columns of its argument A.

  • [in] lda: rocblas_int. lda >= m if side is left, or lda >= n if side is right.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GEQRF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unml2()

rocblas_status rocsolver_zunml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunml2(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNML2 applies a complex matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the unblocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • [in] lda: rocblas_int. lda >= k.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unmlq()

rocblas_status rocsolver_zunmlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmlq(rocblas_handle handle, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMLQ applies a complex matrix Q with orthonormal rows to a general m-by-n matrix C.

(This is the blocked version of the algorithm).

The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

Q is a unitary matrix defined as the product of k Householder reflectors as

Q = H(k)**H * H(k-1)**H * ... * H(1)**H

or order m if applying from the left, or n if applying from the right. Q is never stored, it is calculated from the Householder vectors and scalars returned by the LQ factorization GELQF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0; k <= m if side is left, k <= n if side is right.

    The number of Householder reflectors that form Q.

  • [in] A: pointer to type. Array on the GPU of size lda*m if side is left, or lda*n if side is right.

    The i-th row has the Householder vector v(i) associated with H(i) as returned by GELQF in the first k rows of its argument A.

  • [in] lda: rocblas_int. lda >= k.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least k.

    The scalar factors of the Householder matrices H(i) as returned by GELQF.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

rocsolver_<type>unmbr()

rocblas_status rocsolver_zunmbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv, rocblas_double_complex *C, const rocblas_int ldc)
rocblas_status rocsolver_cunmbr(rocblas_handle handle, const rocblas_storev storev, const rocblas_side side, const rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int k, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv, rocblas_float_complex *C, const rocblas_int ldc)

UNMBR applies a complex matrix Q with orthonormal rows or columns to a general m-by-n matrix C.

If storev is column-wise, then the matrix Q has orthonormal columns. If storev is row-wise, then the matrix Q has orthonormal rows. The matrix Q is applied in one of the following forms, depending on the values of side and trans:

Q  * C  (No transpose from the left)
Q' * C  (Conjugate transpose from the left)
C * Q   (No transpose from the right), and
C * Q'  (Conjugate transpose from the right)

The order nq of unitary matrix Q is nq = m if applying from the left, or nq = n if applying from the right.

When storev is column-wise, if nq >= k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if nq < k, then Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1).

When storev is row-wise, if nq > k, then Q is defined as the product of k Householder reflectors of order nq

Q = H(1) * H(2) * ... * H(k),

and if n <= k, Q is defined as the product

Q = H(1) * H(2) * ... * H(nq-1)

The Householder matrices H(i) are never stored, they are computed from its corresponding Householder vectors v(i) and scalars ipiv_i as returned by GEBRD in its arguments A and tauq or taup.

Parameters
  • [in] handle: rocblas_handle.

  • [in] storev: rocblas_storev.

    Specifies whether to work column-wise or row-wise.

  • [in] side: rocblas_side.

    Specifies from which side to apply Q.

  • [in] trans: rocblas_operation.

    Specifies whether the matrix Q or its conjugate transpose is to be applied.

  • [in] m: rocblas_int. m >= 0.

    Number of rows of matrix C.

  • [in] n: rocblas_int. n >= 0.

    Number of columns of matrix C.

  • [in] k: rocsovler_int. k >= 0.

    The number of columns (if storev is colum-wise) or rows (if row-wise) of the original matrix reduced by GEBRD.

  • [in] A: pointer to type. Array on the GPU of size lda*min(nq,k) if column-wise, or lda*nq if row-wise.

    The i-th column (or row) has the Householder vector v(i) associated with H(i) as returned by GEBRD.

  • [in] lda: rocblas_int. lda >= nq if column-wise, or lda >= min(nq,k) if row-wise.

    Leading dimension of A.

  • [in] ipiv: pointer to type. Array on the GPU of dimension at least min(nq,k).

    The scalar factors of the Householder matrices H(i) as returned by GEBRD.

  • [inout] C: pointer to type. Array on the GPU of size ldc*n.

    On input, the matrix C. On output it is overwritten with Q*C, C*Q, Q’*C, or C*Q’.

  • [in] ldc: rocblas_int. ldc >= m.

    Leading dimension of C.

LAPACK Functions

LAPACK routines solve complex Numerical Linear Algebra problems.

Special Matrix Factorizations

rocsolver_<type>potf2()

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTF2 computes the Cholesky factorization of a real symmetric/complex Hermitian positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

A = U' * U, or
A = L  * L'

depending on the value of uplo. U is an upper triangular matrix and L is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potf2_batched()

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potf2_strided_batched()

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i and the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potrf()

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTRF computes the Cholesky factorization of a real symmetric/complex Hermitian positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

A = U' * U, or
A = L  * L'

depending on the value of uplo. U is an upper triangular matrix and L is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potrf_batched()

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potrf_strided_batched()

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric/complex Hermitian positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_i in the batch has the form:

A_i = U_i' * U_i, or
A_i = L_i  * L_i'

depending on the value of uplo. U_i is an upper triangular matrix and L_i is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The dimension of matrix A_i.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i and the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful factorization of matrix A_i. If info_i = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

General Matrix Factorizations

rocsolver_<type>getf2()

rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization has the form

A = P * L * U

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension min(m,n).

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero pivot.

rocsolver_<type>getf2_batched()

rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getf2_strided_batched()

rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i and the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivots indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf()

rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization has the form

A = P * L * U

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension min(m,n).

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero pivot.

rocsolver_<type>getrf_batched()

rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i(j). Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf_strided_batched()

rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = P_i * L_i * U_i

where P_i is a permutation matrix, L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i and the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivots indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i(j). Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqr2()

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

A =  Q * [ R ]
         [ 0 ]

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... * H(k), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>geqr2_batched()

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqr2_strided_batched()

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqrf()

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

A =  Q * [ R ]
         [ 0 ]

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(1) * H(2) * ... * H(k), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>geqrf_batched()

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of vector Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqrf_strided_batched()

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [ R_j ]
             [  0  ]

where R_j is upper triangular (upper trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... * H_j(k), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the first i-1 elements of vector Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geql2()

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

A =  Q * [ 0 ]
         [ L ]

where L is lower triangular (lower trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * ... * H(2) * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the last m-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the i - 1 elements of vector v(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>geql2_batched()

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geql2_strided_batched()

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqlf()

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQLF computes a QL factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

A =  Q * [ 0 ]
         [ L ]

where L is lower triangular (lower trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * ... * H(2) * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i) * v(i)'

where the last m-i elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the i - 1 elements of vector v(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>geqlf_batched()

rocblas_status rocsolver_zgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of vector Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqlf_strided_batched()

rocblas_status rocsolver_zgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j =  Q_j * [  0  ]
             [ L_j ]

where L_j is lower triangular (lower trapezoidal if m < n), and Q_j is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * ... * H_j(2) * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i) * v_j(i)'

where the last m-i elements of vector Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)th subdiagonal (when m >= n) or the (n-m)th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the i - 1 elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelq2()

rocblas_status rocsolver_zgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQ2 computes a LQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

A = [ L 0 ] * Q

where L is lower triangular (lower trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * H(k-1) * ... * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i)' * v(i)

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and delow the diagonal contain the factor L; the elements above the diagonal are the n - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>gelq2_batched()

rocblas_status rocsolver_zgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelq2_strided_batched()

rocblas_status rocsolver_zgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of vector Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelqf()

rocblas_status rocsolver_zgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQF computes a LQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

A = [ L 0 ] * Q

where L is lower triangular (lower trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q = H(k) * H(k-1) * ... * H(1), with k = min(m,n)

Each Householder matrix H(i), for i = 1,2,…,k, is given by

H(i) = I - ipiv[i-1] * v(i)' * v(i)

where the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the n - i elements of vector v(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

rocsolver_<type>gelqf_batched()

rocblas_status rocsolver_zgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i=1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelqf_strided_batched()

rocblas_status rocsolver_zgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix A_j in the batch has the form

A_j = [ L_j 0 ] * Q_j

where L_j is lower triangular (lower trapezoidal if m > n), and Q_j is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

Q_j = H_j(k) * H_j(k-1) * ... * H_j(1), with k = min(m,n)

Each Householder matrices H_j(i), for j = 1,2,…,batch_count, and i = 1,2,…,k, is given by

H_j(i) = I - ipiv_j[i-1] * v_j(i)' * v_j(i)

where the first i-1 elements of vector Householder vector v_j(i) are zero, and v_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the n - i elements of vector v_j(i) for i = 1,2,…,min(m,n).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

General Matrix Diagonalizations

rocsolver_<type>gebd2()

rocblas_status rocsolver_zgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBD2 computes the bidiagonal form of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

B = Q' * A * P

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(n)  and P = G(1) * G(2) * ... * G(n-1), if m >= n, or
Q = H(1) * H(2) * ... * H(m-1) and P = G(1) * G(2) * ... *  G(m),  if m < n

Each Householder matrix H(i) and G(i) is given by

H(i) = I - tauq[i-1] * v(i) * v(i)', and
G(i) = I - taup[i-1] * u(i) * u(i)'

If m >= n, the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1; while the first i elements of the Householder vector u(i) are zero, and u(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1; while the first i-1 elements of the Householder vector u(i) are zero, and u(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    specifies the leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension min(m,n).

    The diagonal elements of B.

  • [out] E: pointer to real type. Array on the GPU of dimension min(m,n)-1.

    The off-diagonal elements of B.

  • [out] tauq: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

  • [out] taup: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices G(i).

rocsolver_<type>gebd2_batched()

rocblas_status rocsolver_zgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j and the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j and the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gebd2_strided_batched()

rocblas_status rocsolver_zgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j and the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j and the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gebrd()

rocblas_status rocsolver_zgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBRD computes the bidiagonal form of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

B = Q' * A * P

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

Q = H(1) * H(2) * ... *  H(n)  and P = G(1) * G(2) * ... * G(n-1), if m >= n, or
Q = H(1) * H(2) * ... * H(m-1) and P = G(1) * G(2) * ... *  G(m),  if m < n

Each Householder matrix H(i) and G(i) is given by

H(i) = I - tauq[i-1] * v(i) * v(i)', and
G(i) = I - taup[i-1] * u(i) * u(i)'

If m >= n, the first i-1 elements of the Householder vector v(i) are zero, and v(i)[i] = 1; while the first i elements of the Householder vector u(i) are zero, and u(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v(i) are zero, and v(i)[i+1] = 1; while the first i-1 elements of the Householder vector u(i) are zero, and u(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the m - i elements of vector v(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    specifies the leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension min(m,n).

    The diagonal elements of B.

  • [out] E: pointer to real type. Array on the GPU of dimension min(m,n)-1.

    The off-diagonal elements of B.

  • [out] tauq: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices H(i).

  • [out] taup: pointer to type. Array on the GPU of dimension min(m,n).

    The scalar factors of the Householder matrices G(i).

rocsolver_<type>gebrd_batched()

rocblas_status rocsolver_zgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBRD_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j and the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j and the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gebrd_strided_batched()

rocblas_status rocsolver_zgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBRD_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

B_j = Q_j' * A_j * P_j

where B_j is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q_j and P_j are orthogonal/unitary matrices represented as the product of Householder matrices

Q_j = H_j(1) * H_j(2) * ... *  H_j(n)  and P_j = G_j(1) * G_j(2) * ... * G_j(n-1), if m >= n, or
Q_j = H_j(1) * H_j(2) * ... * H_j(m-1) and P_j = G_j(1) * G_j(2) * ... *  G_j(m),  if m < n

Each Householder matrix H_j(i) and G_j(i), for j = 1,2,…,batch_count, is given by

H_j(i) = I - tauq_j[i-1] * v_j(i) * v_j(i)', and
G_j(i) = I - taup_j[i-1] * u_j(i) * u_j(i)'

If m >= n, the first i-1 elements of the Householder vector v_j(i) are zero, and v_j(i)[i] = 1; while the first i elements of the Householder vector u_j(i) are zero, and u_j(i)[i+1] = 1. If m < n, the first i elements of the Householder vector v_j(i) are zero, and v_j(i)[i+1] = 1; while the first i-1 elements of the Householder vector u_j(i) are zero, and u_j(i)[i] = 1.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the m - i elements of vector v_j(i) for i = 1,2,…,n, and the elements above the superdiagonal are the n - i - 1 elements of vector u_j(i) for i = 1,2,…,n-1. If m < n, the elements below the subdiagonal are the m - i - 1 elements of vector v_j(i) for i = 1,2,…,m-1, and the elements above the diagonal are the n - i elements of vector u_j(i) for i = 1,2,…,m.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j and the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j and the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of scalar factors of the Householder matrices H_j(i).

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of scalar factors of the Householder matrices G_j(i).

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

General Matrix Inversion

rocsolver_<type>getri()

rocblas_status rocsolver_zgetri(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetri(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetri(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetri(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETRI inverts a general n-by-n matrix A using the LU factorization computed by GETRF.

The inverse is computed by solving the linear system

inv(A) * L = inv(U)

where L is the lower triangular factor of A with unit diagonal elements, and U is the upper triangular factor.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the factors L and U of the factorization A = P*L*U returned by GETRF. On exit, the inverse of A if info = 0; otherwise undefined.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The pivot indices returned by GETRF.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero pivot.

rocsolver_<type>getri_batched()

rocblas_status rocsolver_zgetri_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetri_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetri_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetri_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRI_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_BATCHED.

The inverse is computed by solving the linear system

inv(A_j) * L_j = inv(U_j)

where L_j is the lower triangular factor of A_j with unit diagonal elements, and U_j is the upper triangular factor.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the factors L_j and U_j of the factorization A = P_j*L_j*U_j returned by GETRF_BATCHED. On exit, the inverses of A_j if info_j = 0; otherwise undefined.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The pivot indices returned by GETRF_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(i+j). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_j = 0, successful exit for inversion of A_j. If info_j = i > 0, U_j is singular. U_j(i,i) is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_strided_batched()

rocblas_status rocsolver_zgetri_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetri_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetri_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetri_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRI_STRIDED_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_STRIDED_BATCHED.

The inverse is computed by solving the linear system

inv(A_j) * L_j = inv(U_j)

where L_j is the lower triangular factor of A_j with unit diagonal elements, and U_j is the upper triangular factor.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by GETRF_STRIDED_BATCHED. On exit, the inverses of A_j if info_j = 0; otherwise undefined.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The pivot indices returned by GETRF_STRIDED_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_j = 0, successful exit for inversion of A_j. If info_j = i > 0, U_j is singular. U_j(i,i) is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

General Systems Solvers

rocsolver_<type>getrs()

rocblas_status rocsolver_zgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_int *ipiv, rocblas_double_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_cgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_int *ipiv, rocblas_float_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_dgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_int *ipiv, double *B, const rocblas_int ldb)
rocblas_status rocsolver_sgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_int *ipiv, float *B, const rocblas_int ldb)

GETRS solves a system of n linear equations on n variables using the LU factorization computed by GETRF.

It solves one of the following systems:

A  * X = B (no transpose),
A' * X = B (transpose),  or
A* * X = B (conjugate transpose)

depending on the value of trans.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of A.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of the matrix B.

  • [in] A: pointer to type. Array on the GPU of dimension lda*n.

    The factors L and U of the factorization A = P*L*U returned by GETRF.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The pivot indices returned by GETRF.

  • [inout] B: pointer to type. Array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrix B. On exit, the solution matrix X.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of B.

rocsolver_<type>getrs_batched()

rocblas_status rocsolver_zgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, double *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, double *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, float *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, float *const B[], const rocblas_int ldb, const rocblas_int batch_count)

GETRS_BATCHED solves a batch of systems of n linear equations on n variables using the LU factorization computed by GETRF_BATCHED.

For each instance j in the batch, it solves one of the following systems:

A_j  * X_j = B_j (no transpose),
A_j' * X_j = B_j (transpose),  or
A_j* * X_j = B_j (conjugate transpose)

depending on the value of trans.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations of each instance in the batch.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by GETRF_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of pivot indices returned by GETRF_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [inout] B: Array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>getrs_strided_batched()

rocblas_status rocsolver_zgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, double *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, float *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

GETRS_STRIDED_BATCHED solves a batch of systems of n linear equations on n variables using the LU factorization computed by GETRF_STRIDED_BATCHED.

For each instance j in the batch, it solves one of the following systems:

A_j  * X_j = B_j (no transpose),
A_j' * X_j = B_j (transpose),  or
A_j* * X_j = B_j (conjugate transpose)

depending on the value of trans.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations of each instance in the batch.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by GETRF_STRIDED_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j and the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of pivot indices returned by GETRF_STRIDED_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [inout] B: pointer to type. Array on the GPU (size depends on the value of strideB).

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_j and the next one B_(j+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nrhs.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

General Matrix Singular Value Decomposition

rocsolver_<type>gesvd()

rocblas_status rocsolver_zgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *S, rocblas_double_complex *U, const rocblas_int ldu, rocblas_double_complex *V, const rocblas_int ldv, double *E, const rocblas_workmode fast_alg, rocblas_int *info)
rocblas_status rocsolver_cgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *S, rocblas_float_complex *U, const rocblas_int ldu, rocblas_float_complex *V, const rocblas_int ldv, float *E, const rocblas_workmode fast_alg, rocblas_int *info)
rocblas_status rocsolver_dgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *S, double *U, const rocblas_int ldu, double *V, const rocblas_int ldv, double *E, const rocblas_workmode fast_alg, rocblas_int *info)
rocblas_status rocsolver_sgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *S, float *U, const rocblas_int ldu, float *V, const rocblas_int ldv, float *E, const rocblas_workmode fast_alg, rocblas_int *info)

GESVD computes the Singular Values and optionally the Singular Vectors of a general m-by-n matrix A (Singular Value Decomposition).

The SVD of matrix A is given by:

A = U * S * V'

where the m-by-n matrix S is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of A. U and V are orthogonal (unitary) matrices. The first min(m,n) columns of U and V are the left and right singular vectors of A, respectively.

The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the tranpose (or transpose conjugate) of the right singular vectors, i.e. the rows of V’.

left_svect and right_svect are rocblas_svect enums that can take the following values:

  • rocblas_svect_all: the entire matrix U (or V’) is computed,

  • rocblas_svect_singular: only the singular vectors (first min(m,n) columns of U or rows of V’) are computed,

  • rocblas_svect_overwrite: the first columns (or rows) of A are overwritten with the singular vectors, or

  • rocblas_svect_none: no columns (or rows) of U (or V’) are computed, i.e. no singular vectors.

left_svect and right_svect cannot both be set to overwrite. When neither is set to overwrite, the contents of A are destroyed by the time the function returns.

Note

When m >> n (or n >> m) the algorithm could be sped up by compressing the matrix A via a QR (or LQ) factorization, and working with the triangular factor afterwards (thin-SVD). If the singular vectors are also requested, its computation could be sped up as well via executing some intermediate operations out-of-place, and relying more on matrix multiplications (GEMMs); this will require, however, a larger memory workspace. The parameter fast_alg controls whether the fast algorithm is executed or not. For more details see the sections “Tuning rocSOLVER performance” and “Memory model” on the User’s guide.

Parameters
  • [in] handle: rocblas_handle.

  • [in] left_svect: rocblas_svect.

    Specifies how the left singular vectors are computed.

  • [in] right_svect: rocblas_svect.

    Specifies how the right singular vectors are computed.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry the matrix A. On exit, if left_svect (or right_svect) is equal to overwrite, the first columns (or rows) contain the left (or right) singular vectors; otherwise, contents of A are destroyed.

  • [in] lda: rocblas_int. lda >= m.

    The leading dimension of A.

  • [out] S: pointer to real type. Array on the GPU of dimension min(m,n).

    The singular values of A in decreasing order.

  • [out] U: pointer to type. Array on the GPU of dimension ldu*min(m,n) if left_svect is set to singular, or ldu*m when left_svect is equal to all.

    The matrix of left singular vectors stored as columns. Not referenced if left_svect is set to overwrite or none.

  • [in] ldu: rocblas_int. ldu >= m if left_svect is all or singular; ldu >= 1 otherwise.

    The leading dimension of U.

  • [out] V: pointer to type. Array on the GPU of dimension ldv*n.

    The matrix of right singular vectors stored as rows (transposed / conjugate-tranposed). Not referenced if right_svect is set to overwrite or none.

  • [in] ldv: rocblas_int. ldv >= n if right_svect is all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.

    The leading dimension of V.

  • [out] E: pointer to real type. Array on the GPU of dimension min(m,n)-1.

    This array is used to work internaly with the bidiagonal matrix B associated to A (using BDSQR). On exit, if info > 0, it contains the unconverged off-diagonal elements of B (or properly speaking, a bidiagonal matrix orthogonally equivalent to B). The diagonal elements of this matrix are in S; those that converged correspond to a subset of the singular values of A (not necessarily ordered).

  • [in] fast_alg: rocblas_workmode.

    If set to rocblas_outofplace, the function will execute the fast thin-SVD version of the algorithm when possible.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, BDSQR did not converge. i elements of E did not converge to zero.

rocsolver_<type>gesvd_batched()

rocblas_status rocsolver_zgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *S, const rocblas_stride strideS, rocblas_double_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_double_complex *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *S, const rocblas_stride strideS, rocblas_float_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_float_complex *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *S, const rocblas_stride strideS, double *U, const rocblas_int ldu, const rocblas_stride strideU, double *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *S, const rocblas_stride strideS, float *U, const rocblas_int ldu, const rocblas_stride strideU, float *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)

GESVD_BATCHED computes the Singular Values and optionally the Singular Vectors of a batch of general m-by-n matrix A (Singular Value Decomposition).

The SVD of matrix A_j is given by:

A_j = U_j * S_j * V_j'

where the m-by-n matrix S_j is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of A_j. U_j and V_j are orthogonal (unitary) matrices. The first min(m,n) columns of U_j and V_j are the left and right singular vectors of A_j, respectively.

The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the tranpose (or transpose conjugate) of the right singular vectors, i.e. the rows of V_j’.

left_svect and right_svect are rocblas_svect enums that can take the following values:

  • rocblas_svect_all: the entire matrix U_j (or V_j’) is computed,

  • rocblas_svect_singular: only the singular vectors (first min(m,n) columns of U_j or rows of V_j’) are computed,

  • rocblas_svect_overwrite: the first columns (or rows) of A_j are overwritten with the singular vectors, or

  • rocblas_svect_none: no columns (or rows) of U_j (or V_j’) are computed, i.e. no singular vectors.

left_svect and right_svect cannot both be set to overwrite. When neither is set to overwrite, the contents of A_j are destroyed by the time the function returns.

Note

When m >> n (or n >> m) the algorithm could be sped up by compressing the matrix A_j via a QR (or LQ) factorization, and working with the triangular factor afterwards (thin-SVD). If the singular vectors are also requested, its computation could be sped up as well via executing some intermediate operations out-of-place, and relying more on matrix multiplications (GEMMs); this will require, however, a larger memory workspace. The parameter fast_alg controls whether the fast algorithm is executed or not. For more details see the sections “Tuning rocSOLVER performance” and “Memory model” on the User’s guide.

Parameters
  • [in] handle: rocblas_handle.

  • [in] left_svect: rocblas_svect.

    Specifies how the left singular vectors are computed.

  • [in] right_svect: rocblas_svect.

    Specifies how the right singular vectors are computed.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry the matrices A_j. On exit, if left_svect (or right_svect) is equal to overwrite, the first columns (or rows) of A_j contain the left (or right) corresponding singular vectors; otherwise, contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= m.

    The leading dimension of A_j.

  • [out] S: pointer to real type. Array on the GPU (the size depends on the value of strideS).

    The singular values of A_j in decreasing order.

  • [in] strideS: rocblas_stride.

    Stride from the start of one vector S_j to the next one S_(j+1). There is no restriction for the value of strideS. Normal use case is strideS >= min(m,n).

  • [out] U: pointer to type. Array on the GPU (the side depends on the value of strideU).

    The matrices U_j of left singular vectors stored as columns. Not referenced if left_svect is set to overwrite or none.

  • [in] ldu: rocblas_int. ldu >= m if left_svect is all or singular; ldu >= 1 otherwise.

    The leading dimension of U_j.

  • [in] strideU: rocblas_stride.

    Stride from the start of one matrix U_j to the next one U_(j+1). There is no restriction for the value of strideU. Normal use case is strideU >= ldu*min(m,n) if left_svect is set to singular, or strideU >= ldu*m when left_svect is equal to all.

  • [out] V: pointer to type. Array on the GPU (the size depends on the value of strideV).

    The matrices V_j of right singular vectors stored as rows (transposed / conjugate-tranposed). Not referenced if right_svect is set to overwrite or none.

  • [in] ldv: rocblas_int. ldv >= n if right_svect is all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.

    The leading dimension of V.

  • [in] strideV: rocblas_stride.

    Stride from the start of one matrix V_j to the next one V_(j+1). There is no restriction for the value of strideV. Normal use case is strideV >= ldv*n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internaly with the bidiagonal matrix B_j associated to A_j (using BDSQR). On exit, if info > 0, it contains the unconverged off-diagonal elements of B_j (or properly speaking, a bidiagonal matrix orthogonally equivalent to B_j). The diagonal elements of this matrix are in S_j; those that converged correspond to a subset of the singular values of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [in] fast_alg: rocblas_workmode.

    If set to rocblas_outofplace, the function will execute the fast thin-SVD version of the algorithm when possible.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, BDSQR did not converge. i elements of E did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gesvd_strided_batched()

rocblas_status rocsolver_zgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *S, const rocblas_stride strideS, rocblas_double_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_double_complex *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *S, const rocblas_stride strideS, rocblas_float_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_float_complex *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *S, const rocblas_stride strideS, double *U, const rocblas_int ldu, const rocblas_stride strideU, double *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *S, const rocblas_stride strideS, float *U, const rocblas_int ldu, const rocblas_stride strideU, float *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)

GESVD_STRIDED_BATCHED computes the Singular Values and optionally the Singular Vectors of a batch of general m-by-n matrix A (Singular Value Decomposition).

The SVD of matrix A_j is given by:

A_j = U_j * S_j * V_j'

where the m-by-n matrix S_j is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of A_j. U_j and V_j are orthogonal (unitary) matrices. The first min(m,n) columns of U_j and V_j are the left and right singular vectors of A_j, respectively.

The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the tranpose (or transpose conjugate) of the right singular vectors, i.e. the rows of V_j’.

left_svect and right_svect are rocblas_svect enums that can take the following values:

  • rocblas_svect_all: the entire matrix U_j (or V_j’) is computed,

  • rocblas_svect_singular: only the singular vectors (first min(m,n) columns of U_j or rows of V_j’) are computed,

  • rocblas_svect_overwrite: the first columns (or rows) of A_j are overwritten with the singular vectors, or

  • rocblas_svect_none: no columns (or rows) of U_j (or V_j’) are computed, i.e. no singular vectors.

left_svect and right_svect cannot both be set to overwrite. When neither is set to overwrite, the contents of A_j are destroyed by the time the function returns.

Note

When m >> n (or n >> m) the algorithm could be sped up by compressing the matrix A_j via a QR (or LQ) factorization, and working with the triangular factor afterwards (thin-SVD). If the singular vectors are also requested, its computation could be sped up as well via executing some intermediate operations out-of-place, and relying more on matrix multiplications (GEMMs); this will require, however, a larger memory workspace. The parameter fast_alg controls whether the fast algorithm is executed or not. For more details see the sections “Tuning rocSOLVER performance” and “Memory model” on the User’s guide.

Parameters
  • [in] handle: rocblas_handle.

  • [in] left_svect: rocblas_svect.

    Specifies how the left singular vectors are computed.

  • [in] right_svect: rocblas_svect.

    Specifies how the right singular vectors are computed.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry the matrices A_j. On exit, if left_svect (or right_svect) is equal to overwrite, the first columns (or rows) of A_j contain the left (or right) corresponding singular vectors; otherwise, contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= m.

    The leading dimension of A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] S: pointer to real type. Array on the GPU (the size depends on the value of strideS).

    The singular values of A_j in decreasing order.

  • [in] strideS: rocblas_stride.

    Stride from the start of one vector S_j to the next one S_(j+1). There is no restriction for the value of strideS. Normal use case is strideS >= min(m,n).

  • [out] U: pointer to type. Array on the GPU (the side depends on the value of strideU).

    The matrices U_j of left singular vectors stored as columns. Not referenced if left_svect is set to overwrite or none.

  • [in] ldu: rocblas_int. ldu >= m if left_svect is all or singular; ldu >= 1 otherwise.

    The leading dimension of U_j.

  • [in] strideU: rocblas_stride.

    Stride from the start of one matrix U_j to the next one U_(j+1). There is no restriction for the value of strideU. Normal use case is strideU >= ldu*min(m,n) if left_svect is set to singular, or strideU >= ldu*m when left_svect is equal to all.

  • [out] V: pointer to type. Array on the GPU (the size depends on the value of strideV).

    The matrices V_j of right singular vectors stored as rows (transposed / conjugate-tranposed). Not referenced if right_svect is set to overwrite or none.

  • [in] ldv: rocblas_int. ldv >= n if right_svect is all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.

    The leading dimension of V.

  • [in] strideV: rocblas_stride.

    Stride from the start of one matrix V_j to the next one V_(j+1). There is no restriction for the value of strideV. Normal use case is strideV >= ldv*n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internaly with the bidiagonal matrix B_j associated to A_j (using BDSQR). On exit, if info > 0, it contains the unconverged off-diagonal elements of B_j (or properly speaking, a bidiagonal matrix orthogonally equivalent to B_j). The diagonal elements of this matrix are in S_j; those that converged correspond to a subset of the singular values of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [in] fast_alg: rocblas_workmode.

    If set to rocblas_outofplace, the function will execute the fast thin-SVD version of the algorithm when possible.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, BDSQR did not converge. i elements of E did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

Lapack-like Functions

Other Lapack-like routines provided by rocSOLVER.

General Matrix Factorizations

rocsolver_<type>getf2_npvt()

rocblas_status rocsolver_zgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_sgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

GETF2_NPVT computes the LU factorization of a general m-by-n matrix A without partial pivoting.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization has the form

A = L * U

where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2 routines instead.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero element in the diagonal. The factorization from this point might be incomplete.

rocsolver_<type>getf2_npvt_batched()

rocblas_status rocsolver_zgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

GETF2_NPVT_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = L_i * U_i

where L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2 routines instead.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getf2_npvt_strided_batched()

rocblas_status rocsolver_zgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

GETF2_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = L_i * U_i

where L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2 routines instead.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i and the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf_npvt()

rocblas_status rocsolver_zgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_sgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

GETRF_NPVT computes the LU factorization of a general m-by-n matrix A without partial pivoting.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization has the form

A = L * U

where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF routines instead.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U(i,i) is the first zero element in the diagonal. The factorization from this point might be incomplete.

rocsolver_<type>getrf_npvt_batched()

rocblas_status rocsolver_zgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

GETRF_NPVT_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = L_i * U_i

where L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF routines instead.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf_npvt_strided_batched()

rocblas_status rocsolver_zgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

GETRF_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details see the section “tuning rocSOLVER performance” on the User’s guide).

The factorization of matrix A_i in the batch has the form

A_i = L_i * U_i

where L_i is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U_i is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF routines instead.

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i and the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info_i = 0, successful exit for factorization of A_i. If info_i = j > 0, U_i is singular. U_i(j,j) is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

Auxiliaries

Auxiliary Functions

rocSOLVER auxiliary functions are aliases of rocBLAS auxiliary functions. See the rocBLAS auxiliary functions.

rocsolver_create_handle()

rocsolver_status rocsolver_create_handle(rocsolver_handle *handle)

Creates a handle and sets the pointer mode to rocblas_pointer_mode_device.

Deprecated since version 3.5: Use rocblas_create_handle().

rocsolver_destroy_handle()

rocsolver_status rocsolver_destroy_handle(rocsolver_handle handle)

Deprecated since version 3.5: Use rocblas_destroy_handle().

rocsolver_set_stream()

rocsolver_status rocsolver_set_stream(rocsolver_handle handle, hipStream_t stream)

Deprecated since version 3.5: Use rocblas_set_stream().

rocsolver_get_stream()

rocsolver_status rocsolver_get_stream(rocsolver_handle handle, hipStream_t *stream)

Deprecated since version 3.5: Use rocblas_get_stream().

rocsolver_set_vector()

rocsolver_status rocsolver_set_vector(rocsolver_int n, rocsolver_int elem_size, const void *x, rocsolver_int incx, void *y, rocsolver_int incy)

Deprecated since version 3.5: Use rocblas_set_vector().

rocsolver_get_vector()

rocsolver_status rocsolver_get_vector(rocsolver_int n, rocsolver_int elem_size, const void *x, rocsolver_int incx, void *y, rocsolver_int incy)

Deprecated since version 3.5: Use rocblas_get_vector().

rocsolver_set_matrix()

rocsolver_status rocsolver_set_matrix(rocsolver_int rows, rocsolver_int cols, rocsolver_int elem_size, const void *a, rocsolver_int lda, void *b, rocsolver_int ldb)

Deprecated since version 3.5: Use rocblas_set_matrix().

rocsolver_get_matrix()

rocsolver_status rocsolver_get_matrix(rocsolver_int rows, rocsolver_int cols, rocsolver_int elem_size, const void *a, rocsolver_int lda, void *b, rocsolver_int ldb)

Deprecated since version 3.5: Use rocblas_get_matrix().