...
| Code Block |
|---|
[root@cld-dfa-gpu-04]# yum -y group install "Development Tools"
[root@cld-dfa-gpu-04]# git clone https://github.com/NVIDIA/cuda-samples.git
[root@cld-dfa-gpu-04]# cd cuda-samples/Samples/0_Introduction/simpleP2P; make; cd
[root@cld-dfa-gpu-04]# cd cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest; make; cd |
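Before running `make`, it can help to confirm that the build tools are actually on the `PATH`. The following is a minimal sketch and not part of the original procedure: the function name `missing_tools` is hypothetical, and the list of prerequisites (`git`, `make`, `nvcc`) is an assumption based on the commands above, with `nvcc` expected to come from an already installed CUDA toolkit.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each given tool that cannot be
# resolved on the current PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed prerequisites for cloning and building the samples with make
missing=$(missing_tools git make nvcc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Missing build prerequisites:" $missing >&2
fi
```

If `nvcc` is reported as missing, the CUDA toolkit's `bin` directory is probably not on the `PATH` of the current shell.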
...
| Code Block |
|---|
[root@cld-dfa-gpu-04]# cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest/p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA A2, pciBusID: 17, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA A2, pciBusID: ca, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 149.61 11.70
1 11.66 164.30
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 149.58 11.36
1 11.36 164.34
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 156.74 16.32
1 16.31 164.48
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 156.74 20.86
1 20.86 164.48
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 1.44 10.70
1 12.97 1.34
CPU 0 1
0 2.72 6.56
1 6.55 2.60
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 1.43 1.20
1 1.22 1.33
CPU 0 1
0 2.76 2.01
1 1.96 2.64
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled. |
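The `Device=... Access Peer Device=...` lines near the top of the output are the quickest indicator of whether P2P is available between every GPU pair. As a rough sketch, assuming the test output has been redirected to a file, the check could be automated as follows. The function name `p2p_all_pairs_ok` and the file name `p2p.log` are hypothetical, and the pattern relies only on the line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the saved test output contains at
# least one "Access Peer" line and every such line reports CAN.
p2p_all_pairs_ok() {
    log=$1
    # Count all peer-access lines, whatever status word they carry
    total=$(grep -c -E '^Device=[0-9]+ [A-Z]+ Access Peer Device=[0-9]+' "$log")
    # Count only the lines that report CAN
    ok=$(grep -c -E '^Device=[0-9]+ CAN Access Peer Device=[0-9]+' "$log")
    [ "$total" -gt 0 ] && [ "$ok" -eq "$total" ]
}

# Example usage (p2p.log is an assumed file name):
#   cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest/p2pBandwidthLatencyTest > p2p.log
#   p2p_all_pairs_ok p2p.log && echo "all GPU pairs report P2P access"
```

Comparing the two counts, rather than searching for one specific failure word, means any status other than `CAN` is treated as a failure.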
...