6533b851fe1ef96bd12a99ef
RESEARCH PRODUCT
An Scalable matrix computing unit architecture for FPGA and SCUMO user design interface
Manuel Bataller-mompeánTaras IakymchukAlfredo Rosado-muñozAsgar AbbaszadehJose V. Frances-villorasubject
Computer Networks and CommunicationsComputer scienceMathematicsofComputing_NUMERICALANALYSISSistemes informàticslcsh:TK7800-836002 engineering and technologyScalar multiplicationComputational scienceMatrix (mathematics)matrix-computing unitTranspose0202 electrical engineering electronic engineering information engineeringmatrix processorElectrical and Electronic EngineeringCirculant matrixcirculant matricesFPGA020208 electrical & electronic engineeringlcsh:ElectronicsDot productMatrix multiplicationArquitectura d'ordinadorsHardware and ArchitectureControl and Systems Engineeringmatrix arithmeticSignal Processing020201 artificial intelligence & image processingMultiplicationhardware implementationdescription
High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations’ performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N × N matrices, the architecture requires N ALU-RAM blocks and performs O ( N 2 ) , requiring N 2 + 7 and N + 7 clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for 500 × 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units.
| year | journal | country | edition | language |
|---|---|---|---|---|
| 2019-01-15 |