senkumartup · April 30, 2018 08:15 · senkumartup · Apr 30, 2018 · senkumartup · May 10, 2018
diff --git a/TF_GPU_GTX_1060_issue b/TF_GPU_GTX_1060_issue
 Tensorflow GPU issue on GeForce GTX 1060 (6GB)

 Note: CUDA and cuDNN were installed automatically when keras-gpu was installed with:
 conda install -c anaconda keras-gpu

 Summary
 keras-gpu - 2.1.5
 tensorflow-gpu - 1.7.0
 dcml@dcml-MS-7B61:~$ nvcc -V
 Cuda compilation tools, release 7.5, V7.5.17
 libcudnn.so.7 -> libcudnn.so.7.1.3
 libcuda.so.1 -> libcuda.so.384.111
 libcudart.so.7.5 -> libcudart.so.7.5.18



 1 of 6) System and GPU Info

 dcml@dcml-MS-7B61:~$ uname -a
 Linux dcml-MS-7B61 4.13.0-39-generic #44~16.04.1-Ubuntu SMP Thu Apr 5 16:43:10 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

 dcml@dcml-MS-7B61:~$ nvidia-smi
 Mon Apr 30 13:11:55 2018
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  GeForce GTX 1060..  Off  | 00000000:01:00.0  On |                  N/A |
 | 42%   46C    P8     7W / 120W |    219MiB /  6071MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                       GPU Memory |
 |  GPU       PID   Type   Process name                             Usage      |
 |=============================================================================|
 |    0      1067      G   /usr/lib/xorg/Xorg                           177MiB |
 |    0      1516      G   compiz                                        39MiB |
 +-----------------------------------------------------------------------------+


 2 of 6) Package list
 Using anaconda2 for Python 2.7
 dcml@dcml-MS-7B61:~$ /opt/anaconda2/bin/conda list
 # packages in environment at /opt/anaconda2:
 #
 # Name                    Version                   Build  Channel
 ...
 cudatoolkit               9.0                  h13b8566_0
 cudnn                     7.1.2                 cuda9.0_0
 keras-gpu                 2.1.5                    py27_0
 libprotobuf               3.5.2                h6f1eeef_0
 numpy                     1.14.2           py27hdbf6ddf_1
 tensorboard               1.7.0            py27hf484d3e_0
 tensorflow-gpu            1.7.0                         0
 tensorflow-gpu-base       1.7.0            py27h8a131e3_0
 ...


 3 of 6) TensorFlow GPU was installed using
 conda install -c anaconda keras-gpu
 https://anaconda.org/anaconda/keras-gpu


 4 of 6) CUDA and CuDNN
 dcml@dcml-MS-7B61:~$ nvcc -V
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2015 NVIDIA Corporation
 Built on Tue_Aug_11_14:27:32_CDT_2015
 Cuda compilation tools, release 7.5, V7.5.17

 dcml@dcml-MS-7B61:~$ function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
 dcml@dcml-MS-7B61:~$ function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
 dcml@dcml-MS-7B61:~$ check libcudnn
 	libcudnn.so.7 -> libcudnn.so.7.1.3
 libcudnn is installed

 dcml@dcml-MS-7B61:~$ function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
 dcml@dcml-MS-7B61:~$ function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
 dcml@dcml-MS-7B61:~$ check libcuda
 	libcuda.so.1 -> libcuda.so.384.111
 	libcudart.so.7.5 -> libcudart.so.7.5.18
 libcuda is installed

 dcml@dcml-MS-7B61:~$ check libcudart
 	libcudart.so.7.5 -> libcudart.so.7.5.18
 libcudart is installed
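
The versions reported above can be lined up against the conda package list from section 2. The sketch below is only an illustration: the input strings are copied from the output shown in this report, while the helper names (`nvcc_release`, `ldconfig_versions`, `major_minor`, `report`) are hypothetical and not part of any tool. It compares only the first two version components, which is an assumption about what counts as a match.

```python
import re

# Lines copied from the command output shown in this report.
NVCC_LINE = "Cuda compilation tools, release 7.5, V7.5.17"
LDCONFIG_LINES = [
    "libcudnn.so.7 -> libcudnn.so.7.1.3",
    "libcuda.so.1 -> libcuda.so.384.111",
    "libcudart.so.7.5 -> libcudart.so.7.5.18",
]
# Versions copied from the `conda list` output in section 2.
CONDA_PACKAGES = {"cudatoolkit": "9.0", "cudnn": "7.1.2"}


def nvcc_release(line):
    """Return the 'release X.Y' value from an `nvcc -V` line, or None."""
    match = re.search(r"release (\d+\.\d+)", line)
    return match.group(1) if match else None


def ldconfig_versions(lines):
    """Map each library base name to the version suffix of its target file."""
    versions = {}
    for line in lines:
        match = re.search(r"->\s*(lib\w+)\.so\.([\d.]+)", line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def major_minor(version):
    """Keep only the first two dot-separated components of a version."""
    return ".".join(version.split(".")[:2])


def report():
    """Return (label, system_version, conda_version, same_major_minor) rows."""
    libs = ldconfig_versions(LDCONFIG_LINES)
    rows = [
        ("nvcc vs cudatoolkit", nvcc_release(NVCC_LINE), CONDA_PACKAGES["cudatoolkit"]),
        ("libcudart vs cudatoolkit", libs["libcudart"], CONDA_PACKAGES["cudatoolkit"]),
        ("libcudnn vs cudnn", libs["libcudnn"], CONDA_PACKAGES["cudnn"]),
    ]
    return [
        (label, system_v, conda_v, major_minor(system_v) == major_minor(conda_v))
        for label, system_v, conda_v in rows
    ]


if __name__ == "__main__":
    for label, system_v, conda_v, same in report():
        status = "match" if same else "DIFFER"
        print("%-26s system=%-8s conda=%-6s %s" % (label, system_v, conda_v, status))
```

With the values from this report, the system-wide nvcc and libcudart are at 7.5 while the conda cudatoolkit is 9.0, and the two cuDNN versions agree up to 7.1. Whether that difference is related to the errors in sections 5 and 6 is not established here.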


 5 of 6) Error when running MNIST 1st_DNN
 bit.ly/2IBqQJD

 dcml@dcml-MS-7B61:~/workspace/test/mnist$ /opt/anaconda2/bin/python 1st_DNN.py
 /opt/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
 Using TensorFlow backend.
 (60000, 28, 28)
 1st_DNN.py:87: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1...)`
  model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(28,28,1)))
 _________________________________________________________________
 Layer (type)                 Output Shape              Param #
 =================================================================
 conv2d_1 (Conv2D)            (None, 26, 26, 32)        320
 _________________________________________________________________
 conv2d_2 (Conv2D)            (None, 26, 26, 10)        330
 _________________________________________________________________
 conv2d_3 (Conv2D)            (None, 1, 1, 10)          67610
 _________________________________________________________________
 flatten_1 (Flatten)          (None, 10)                0
 _________________________________________________________________
 activation_1 (Activation)    (None, 10)                0
 =================================================================
 Total params: 68,260
 Trainable params: 68,260
 Non-trainable params: 0
 _________________________________________________________________
 /opt/anaconda2/lib/python2.7/site-packages/keras/models.py:942: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  warnings.warn('The `nb_epoch` argument in `fit` '
 Epoch 1/10
 2018-04-30 13:29:03.725230: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
 2018-04-30 13:29:03.845809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
 2018-04-30 13:29:03.846098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
 name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.759
 pciBusID: 0000:01:00.0
 totalMemory: 5.93GiB freeMemory: 5.55GiB
 2018-04-30 13:29:03.846111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
 2018-04-30 13:29:03.997708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
 2018-04-30 13:29:03.997733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
 2018-04-30 13:29:03.997738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
 2018-04-30 13:29:03.997867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5330 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
 2018-04-30 13:29:04.470804: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
 2018-04-30 13:29:04.470887: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
 Aborted (core dumped)



 6 of 6) Error when running inception v3
 https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0

 2018-04-30 13:10:41.089411: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
 .2018-04-30 13:10:41.361936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
 2018-04-30 13:10:41.363263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
 name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.759
 pciBusID: 0000:01:00.0
 totalMemory: 5.93GiB freeMemory: 5.65GiB
 2018-04-30 13:10:41.363542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
 ..2018-04-30 13:10:41.816074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
 2018-04-30 13:10:41.816592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
 2018-04-30 13:10:41.816719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
 2018-04-30 13:10:41.842623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5433 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
 2018-04-30 13:10:41.982520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
 2018-04-30 13:10:41.983360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
 2018-04-30 13:10:41.983507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
 2018-04-30 13:10:41.983562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N

 2018-04-30 13:10:41.985218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5433 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
 .......2018-04-30 13:10:45.930924: W tensorflow/core/framework/op_def_util.cc:343] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().

 2018-04-30 13:10:45.992450: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

 2018-04-30 13:10:46.062670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

 2018-04-30 13:10:46.062701: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
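
Most of the two logs above is informational output, so it can help to pull out only the error and fatal lines. The sketch below is a minimal illustration, assuming the line layout seen in these logs (timestamp, a severity letter I/W/E/F, source file and line number, then the message). The names `LOG_LINE`, `parse` and `errors` are hypothetical helpers, and the sample lines are copied verbatim from section 6.

```python
import re

# Layout assumed from the log lines in this report:
#   <date> <time>: <severity> <source file>:<line>] <message>
# Leading dots (test progress output fused onto the line) are skipped.
LOG_LINE = re.compile(
    r"^\s*\.*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def errors(lines):
    """Keep only the parsed lines whose severity is E (error) or F (fatal)."""
    parsed = (parse(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in ("E", "F")]


# Sample lines copied from the inception v3 log in section 6.
SAMPLE = [
    "2018-04-30 13:10:41.363542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0",
    "pciBusID: 0000:01:00.0",
    ".......2018-04-30 13:10:45.930924: W tensorflow/core/framework/op_def_util.cc:343] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().",
    "2018-04-30 13:10:45.992450: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED",
    "2018-04-30 13:10:46.062670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR",
    "2018-04-30 13:10:46.062701: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)",
]

if __name__ == "__main__":
    for entry in errors(SAMPLE):
        print("%s %s:%s %s" % (entry["level"], entry["source"], entry["line"], entry["message"]))
```

Applied to the sample, this keeps the cuBLAS handle error, the cuDNN handle error and the fatal check failure, and drops the informational and warning lines.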