GPU Benchmarks
Matrix multiply provided with the CUDA SDK
Using square matrix (NxN)
| N | K20 time in ms (GFlop/s) | K40 time in ms (GFlop/s) |
|---|---|---|
| 1000 | 8.349 (239.55) | 5.915 (338.15) |
| 2000 | 64.629 (247.57) | 45.403 (352.40) |
| 4000 | 491.310 (260.53) | 348.723 (367.05) |
| 6000 | 1746.502 (247.35) | 1229.593 (351.34) |
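
The GFlop/s values in the table appear consistent with the usual count of 2·N³ floating-point operations for a dense N×N matrix multiply (one multiply and one add per inner-product term). The following is a minimal sketch under that assumption; the helper name `gflops` is hypothetical and is not part of the CUDA SDK sample.

```python
def gflops(n: int, time_ms: float) -> float:
    """Throughput in GFlop/s for an N x N matrix multiply.

    Assumes the conventional operation count of 2 * N^3
    (N^3 multiplies plus N^3 adds); this is an assumption,
    not something stated by the benchmark itself.
    """
    flop = 2.0 * n ** 3            # total floating-point operations
    seconds = time_ms * 1e-3       # convert milliseconds to seconds
    return flop / seconds / 1e9    # Flop/s -> GFlop/s


# K20 timings (N, time in ms) copied from the table above
k20_timings = [(1000, 8.349), (2000, 64.629), (4000, 491.310), (6000, 1746.502)]

for n, t in k20_timings:
    print(f"N={n}: {gflops(n, t):.2f} GFlop/s")
```

Recomputing from the rounded times can differ from the tabulated rates in the last digits, since the published times are themselves rounded to three decimals.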
CUBLAS (DGEMM)
Peak performance
| K20 | K40 |
|---|---|
| 1040 GFlop/s | 1210 GFlop/s |