STREAM
Summary of results
Detailed results
Test1-[1-3] : Xeon Phi 7250 - January 7, 2017
COMPILE
icc -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream-100m -mcmodel medium -qopenmp -O
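The runs below were taken with different OpenMP thread counts (272, 68, and 28). The exact runtime settings were not recorded; as an illustrative sketch, the thread count and placement could be controlled through the standard OpenMP / Intel runtime environment variables before each run. The values and affinity choices below are assumptions, not recorded settings.
# 272 threads: 68 cores x 4 hardware threads per core on the Xeon Phi 7250 (assumed)
export OMP_NUM_THREADS=272
export KMP_AFFINITY=compact      # assumed affinity setting
./stream-100m
# 68 threads: one thread per physical core (assumed)
export OMP_NUM_THREADS=68
export KMP_AFFINITY=scatter      # assumed affinity setting
./stream-100m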
ENVIRONMENT
HPC-ProServer SM-5038K-i
CPU : Xeon Phi 7250
Mem : 48GB (6 x DDR4-2400 ECC REG 8GB)
OS : CentOS 7.2.1511
kernel : 3.10.0-327.36.3.el7.xppsl_1.4.3.3482.x86_64
icc version 16.0.0
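The values above can be collected with standard tools; a minimal sketch (exact output formats differ between systems):
# CPU model and core/thread counts
lscpu
# installed DIMM configuration (requires root)
dmidecode -t memory
# OS release and kernel version
cat /etc/centos-release
uname -r
# compiler version
icc --version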
RESULT
# ./stream-100m
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 272
Number of Threads counted = 272
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5046 microseconds.
(= 5046 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 153669.2 0.010444 0.010412 0.010470
Scale: 159172.8 0.010187 0.010052 0.010908
Add: 216918.7 0.011089 0.011064 0.011118
Triad: 203425.9 0.011831 0.011798 0.011863
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
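For reference, STREAM reports each rate as the bytes moved per iteration divided by the best (minimum) time. Copy and Scale touch 2 arrays per iteration, Add and Triad touch 3, so for 100,000,000 8-byte elements the 272-thread figures above work out as follows (small deviations from the printed rates come from the displayed times being rounded):
Copy  : 2 arrays x 8 bytes x 100,000,000 elements = 1,600 MB ; 1,600 MB / 0.010412 s ≈ 153,669 MB/s
Triad : 3 arrays x 8 bytes x 100,000,000 elements = 2,400 MB ; 2,400 MB / 0.011798 s ≈ 203,424 MB/s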
[root@localhost work]# ./stream-100m
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 68
Number of Threads counted = 68
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 4652 microseconds.
(= 4652 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 207036.7 0.007761 0.007728 0.007789
Scale: 37597.5 0.043476 0.042556 0.044371
Add: 263620.0 0.009129 0.009104 0.009162
Triad: 56177.2 0.043878 0.042722 0.044800
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* For reference
CPU : Intel Xeon E5-2690 v4 2.6GHz x 2
Mem : 128GB (8 x DDR4-2400 ECC REG 16GB)
OS : CentOS 7.2.1511
kernel : 3.10.0-327.36.3.el7.xppsl_1.4.3.3482.x86_64
icc version 16.0.0
# ./stream-100m
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 28
Number of Threads counted = 28
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14270 microseconds.
(= 14270 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 111660.1 0.014356 0.014329 0.014463
Scale: 110360.1 0.014553 0.014498 0.014644
Add: 118559.9 0.020286 0.020243 0.020330
Triad: 118624.2 0.020291 0.020232 0.020342
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------