Benchmarks GPU
Matrix multiply provided with CUDA SDK
Using square matrices (NxN)
N | K20 time_ms (GFlop/s) | K40 time_ms (GFlop/s) |
---|---|---|
1000 | 8.349 (239.55) | 5.915 (338.15) |
2000 | 64.629 (247.57) | 45.403 (352.40) |
4000 | 491.310 (260.53) | 348.723 (367.05) |
6000 | 1746.502 (247.35) | 1229.593 (351.34) |
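
The GFlop/s values in parentheses can be derived from the timings. The sketch below assumes the usual convention of 2·N³ floating-point operations for an NxN by NxN product (N³ multiplies plus N³ additions); the source does not state this, but the assumption reproduces the K20 column to two decimals. The function name `gflops` and the dictionary are illustrative only and are not part of the CUDA SDK sample.

```python
def gflops(n: int, time_ms: float) -> float:
    """GFlop/s for an n x n by n x n matrix product that took time_ms milliseconds.

    Assumes 2 * n**3 floating-point operations (n**3 multiplies + n**3 adds).
    """
    flops = 2.0 * n ** 3
    seconds = time_ms * 1e-3
    return flops / seconds / 1e9


# K20 timings in milliseconds, copied from the table above
k20_times_ms = {1000: 8.349, 2000: 64.629, 4000: 491.310, 6000: 1746.502}

for n, t in k20_times_ms.items():
    print(f"N={n}: {gflops(n, t):.2f} GFlop/s")
```

The same function can be applied to the K40 timings; small differences in the last digit are possible because the published times are rounded to three decimals.
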
CUBLAS (DGEMM)
Peak performance
K20 | K40 |
---|---|
1040 GFlop/s | 1210 GFlop/s |