stream

Evaluates the effective memory bandwidth with the influence of the CPU caches removed.

STREAM official site : https://www.cs.virginia.edu/stream/
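STREAM times four simple vector kernels over arrays of 8-byte doubles. The sketch below shows what each kernel computes. It is a single-threaded simplification, not code taken from stream.c, and the function name stream_kernels and its signature are hypothetical. In stream.c each loop is parallelized with OpenMP (enabled by -qopenmp in the compile command), and the arrays are sized far larger than the caches (the STREAM rules ask for each array to be at least four times the size of the available cache) so that the loops stream from main memory rather than from cache.

```c
#include <stddef.h>

/* Hypothetical single-threaded sketch of the four STREAM kernels.
   Bytes counted per element: Copy 16, Scale 16, Add 24, Triad 24. */
void stream_kernels(double *a, double *b, double *c, size_t n, double scalar)
{
    for (size_t j = 0; j < n; j++)      /* Copy:  c = a            (2 arrays) */
        c[j] = a[j];
    for (size_t j = 0; j < n; j++)      /* Scale: b = scalar * c   (2 arrays) */
        b[j] = scalar * c[j];
    for (size_t j = 0; j < n; j++)      /* Add:   c = a + b        (3 arrays) */
        c[j] = a[j] + b[j];
    for (size_t j = 0; j < n; j++)      /* Triad: a = b + scalar*c (3 arrays) */
        a[j] = b[j] + scalar * c[j];
}
```

Calling it once on arrays initialized to a = 1, b = 2, c = 0 with scalar = 3 leaves c = 4, b = 3 and a = 15 in every element.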


Test1-[1-3] : Xeon Phi 7250 - January 7, 2017

COMPILE

icc -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream-100m -mcmodel medium -qopenmp -O
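With -DSTREAM_ARRAY_SIZE=100000000, stream.c declares three static arrays of 100,000,000 doubles each. The small sketch below (helper names are hypothetical) reproduces the 762.9 MiB per array and 2288.8 MiB total that the benchmark reports, and checks that 2.4 × 10^9 bytes of static data exceeds the 2 GiB that the default x86-64 small code model can place. That limit is our assumption about why the larger code model (-mcmodel medium) appears on the command line.

```c
#include <stdint.h>

/* Size of one STREAM array of n_elems doubles, in bytes. */
uint64_t stream_array_bytes(uint64_t n_elems)
{
    return n_elems * (uint64_t)sizeof(double);
}

/* Convert bytes to MiB (2^20 bytes), the unit of "Memory per array". */
double bytes_to_mib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Rough check (ignores code size): does this much static data reach
   the 2 GiB limit of the x86-64 small code model? */
int exceeds_small_code_model(uint64_t static_bytes)
{
    return static_bytes >= (UINT64_C(1) << 31);
}
```

For example, stream_array_bytes(100000000) is 800,000,000 bytes, about 762.94 MiB; three such arrays are about 2288.82 MiB.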

ENVIRONMENT

HPC-ProServer SM-5038K-i

CPU : Xeon Phi 7250

Mem : 48GB ( 6 x DDR4-2400 ECC REG 8GB )

OS : CentOS 7.2.1511

kernel : 3.10.0-327.36.3.el7.xppsl_1.4.3.3482.x86_64

icc version 16.0.0

RESULT

# ./stream-100m
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 272
Number of Threads counted = 272

-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5046 microseconds.
   (= 5046 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          153669.2     0.010444     0.010412     0.010470
Scale:         159172.8     0.010187     0.010052     0.010908
Add:           216918.7     0.011089     0.011064     0.011118
Triad:         203425.9     0.011831     0.011798     0.011863

-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
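The Best Rate column is derived from the Min time column: STREAM credits each kernel with a fixed number of bytes per element (16 for Copy and Scale, 24 for Add and Triad), takes the minimum time over the runs after the first, and reports 10^-6 × bytes / time, so MB here means 10^6 bytes. A sketch of that calculation, with hypothetical helper names, follows. Plugging in the Copy line above (10^8 elements, 0.010412 s) gives about 153668.8 MB/s; the small gap to the printed 153669.2 MB/s comes from the time column being rounded to six decimals.

```c
#include <stddef.h>
#include <float.h>

/* Bytes STREAM counts per array element for Copy, Scale, Add, Triad
   (8-byte doubles; Copy/Scale touch 2 arrays, Add/Triad touch 3). */
static const double BYTES_PER_ELEM[4] = {16.0, 16.0, 24.0, 24.0};

/* Best (minimum) time over ntimes runs, skipping the first run. */
double best_time(const double *times, size_t ntimes)
{
    double best = DBL_MAX;
    for (size_t k = 1; k < ntimes; k++)
        if (times[k] < best)
            best = times[k];
    return best;
}

/* Best Rate in MB/s (1 MB = 10^6 bytes).
   kernel: 0 = Copy, 1 = Scale, 2 = Add, 3 = Triad. */
double best_rate_mbs(int kernel, double n_elems, double min_time)
{
    return 1.0e-6 * BYTES_PER_ELEM[kernel] * n_elems / min_time;
}
```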


[root@localhost work]# ./stream-100m
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 68
Number of Threads counted = 68

-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 4652 microseconds.
   (= 4652 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          207036.7     0.007761     0.007728     0.007789
Scale:          37597.5     0.043476     0.042556     0.044371
Add:           263620.0     0.009129     0.009104     0.009162
Triad:          56177.2     0.043878     0.042722     0.044800

-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

For reference (dual-socket Xeon system):

CPU : Intel Xeon E5-2690 v4 2.6GHz x 2

Mem : 128GB (8 x DDR4-2400 ECC REG 16GB)

OS : CentOS 7.2.1511

kernel : 3.10.0-327.36.3.el7.xppsl_1.4.3.3482.x86_64

icc version 16.0.0


# ./stream-100m
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 28
Number of Threads counted = 28

-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14270 microseconds.
   (= 14270 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          111660.1     0.014356     0.014329     0.014463
Scale:         110360.1     0.014553     0.014498     0.014644
Add:           118559.9     0.020286     0.020243     0.020330
Triad:         118624.2     0.020291     0.020232     0.020342

-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

© 2006-2019 HPC Technologies Co., Ltd. All rights reserved.