| Tensorflow GPU issue on GeForce GTX 1060 (6GB) | |
| Note: CUDA and CuDNN were installed automatically along with keras-gpu: | |
| conda install -c anaconda keras-gpu | |
| Summary | |
| keras-gpu - 2.1.5 | |
| tensorflow-gpu - 1.7.0 | |
| dcml@dcml-MS-7B61:~$ nvcc -V | |
| Cuda compilation tools, release 7.5, V7.5.17 | |
| libcudnn.so.7 -> libcudnn.so.7.1.3 | |
| libcuda.so.1 -> libcuda.so.384.111 | |
| libcudart.so.7.5 -> libcudart.so.7.5.18 | |
| 1 of 6) System and GPU Info | |
| dcml@dcml-MS-7B61:~$ uname -a | |
| Linux dcml-MS-7B61 4.13.0-39-generic #44~16.04.1-Ubuntu SMP Thu Apr 5 16:43:10 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | |
| dcml@dcml-MS-7B61:~$ nvidia-smi | |
| Mon Apr 30 13:11:55 2018 | |
| +-----------------------------------------------------------------------------+ | |
| | NVIDIA-SMI 384.111 Driver Version: 384.111 | | |
| |-------------------------------+----------------------+----------------------+ | |
| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | |
| | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | |
| |===============================+======================+======================| | |
| | 0 GeForce GTX 1060.. Off | 00000000:01:00.0 On | N/A | | |
| | 42% 46C P8 7W / 120W | 219MiB / 6071MiB | 0% Default | | |
| +-------------------------------+----------------------+----------------------+ | |
| +-----------------------------------------------------------------------------+ | |
| | Processes: GPU Memory | | |
| | GPU PID Type Process name Usage | | |
| |=============================================================================| | |
| | 0 1067 G /usr/lib/xorg/Xorg 177MiB | | |
| | 0 1516 G compiz 39MiB | | |
| +-----------------------------------------------------------------------------+ | |
| 2 of 6) Package list | |
| Using anaconda2 for Python 2.7 | |
| dcml@dcml-MS-7B61:~$ /opt/anaconda2/bin/conda list | |
| # packages in environment at /opt/anaconda2: | |
| # | |
| # Name Version Build Channel | |
| ... | |
| cudatoolkit 9.0 h13b8566_0 | |
| cudnn 7.1.2 cuda9.0_0 | |
| keras-gpu 2.1.5 py27_0 | |
| libprotobuf 3.5.2 h6f1eeef_0 | |
| numpy 1.14.2 py27hdbf6ddf_1 | |
| tensorboard 1.7.0 py27hf484d3e_0 | |
| tensorflow-gpu 1.7.0 0 | |
| tensorflow-gpu-base 1.7.0 py27h8a131e3_0 | |
| ... | |
| 3 of 6) Tensorflow GPU was installed using | |
| conda install -c anaconda keras-gpu | |
| https://anaconda.org/anaconda/keras-gpu | |
| 4 of 6) CUDA and CuDNN | |
| dcml@dcml-MS-7B61:~$ nvcc -V | |
| nvcc: NVIDIA (R) Cuda compiler driver | |
| Copyright (c) 2005-2015 NVIDIA Corporation | |
| Built on Tue_Aug_11_14:27:32_CDT_2015 | |
| Cuda compilation tools, release 7.5, V7.5.17 | |
| dcml@dcml-MS-7B61:~$ function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; } | |
| dcml@dcml-MS-7B61:~$ function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; } | |
| dcml@dcml-MS-7B61:~$ check libcudnn | |
| libcudnn.so.7 -> libcudnn.so.7.1.3 | |
| libcudnn is installed | |
| dcml@dcml-MS-7B61:~$ function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; } | |
| dcml@dcml-MS-7B61:~$ function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; } | |
| dcml@dcml-MS-7B61:~$ check libcuda | |
| libcuda.so.1 -> libcuda.so.384.111 | |
| libcudart.so.7.5 -> libcudart.so.7.5.18 | |
| libcuda is installed | |
| dcml@dcml-MS-7B61:~$ check libcudart | |
| libcudart.so.7.5 -> libcudart.so.7.5.18 | |
| libcudart is installed | |
| 5 of 6) Error when running MNIST 1st_DNN | |
| bit.ly/2IBqQJD | |
| dcml@dcml-MS-7B61:~/workspace/test/mnist$ /opt/anaconda2/bin/python 1st_DNN.py | |
| /opt/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. | |
| from ._conv import register_converters as _register_converters | |
| Using TensorFlow backend. | |
| (60000, 28, 28) | |
| 1st_DNN.py:87: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1...)` | |
| model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(28,28,1))) | |
| _________________________________________________________________ | |
| Layer (type) Output Shape Param # | |
| ================================================================= | |
| conv2d_1 (Conv2D) (None, 26, 26, 32) 320 | |
| _________________________________________________________________ | |
| conv2d_2 (Conv2D) (None, 26, 26, 10) 330 | |
| _________________________________________________________________ | |
| conv2d_3 (Conv2D) (None, 1, 1, 10) 67610 | |
| _________________________________________________________________ | |
| flatten_1 (Flatten) (None, 10) 0 | |
| _________________________________________________________________ | |
| activation_1 (Activation) (None, 10) 0 | |
| ================================================================= | |
| Total params: 68,260 | |
| Trainable params: 68,260 | |
| Non-trainable params: 0 | |
| _________________________________________________________________ | |
| /opt/anaconda2/lib/python2.7/site-packages/keras/models.py:942: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`. | |
| warnings.warn('The `nb_epoch` argument in `fit` ' | |
| Epoch 1/10 | |
| 2018-04-30 13:29:03.725230: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA | |
| 2018-04-30 13:29:03.845809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero | |
| 2018-04-30 13:29:03.846098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: | |
| name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.759 | |
| pciBusID: 0000:01:00.0 | |
| totalMemory: 5.93GiB freeMemory: 5.55GiB | |
| 2018-04-30 13:29:03.846111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0 | |
| 2018-04-30 13:29:03.997708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix: | |
| 2018-04-30 13:29:03.997733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 | |
| 2018-04-30 13:29:03.997738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N | |
| 2018-04-30 13:29:03.997867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5330 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1) | |
| 2018-04-30 13:29:04.470804: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR | |
| 2018-04-30 13:29:04.470887: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) | |
| Aborted (core dumped) | |
| 6 of 6) Error when running inception v3 | |
| https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0 | |
| 2018-04-30 13:10:41.089411: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA | |
| .2018-04-30 13:10:41.361936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero | |
| 2018-04-30 13:10:41.363263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: | |
| name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.759 | |
| pciBusID: 0000:01:00.0 | |
| totalMemory: 5.93GiB freeMemory: 5.65GiB | |
| 2018-04-30 13:10:41.363542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0 | |
| ..2018-04-30 13:10:41.816074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix: | |
| 2018-04-30 13:10:41.816592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 | |
| 2018-04-30 13:10:41.816719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N | |
| 2018-04-30 13:10:41.842623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5433 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1) | |
| 2018-04-30 13:10:41.982520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0 | |
| 2018-04-30 13:10:41.983360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix: | |
| 2018-04-30 13:10:41.983507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 | |
| 2018-04-30 13:10:41.983562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N | |
| 2018-04-30 13:10:41.985218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5433 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1) | |
| .......2018-04-30 13:10:45.930924: W tensorflow/core/framework/op_def_util.cc:343] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization(). | |
| 2018-04-30 13:10:45.992450: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED | |
| 2018-04-30 13:10:46.062670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR | |
| 2018-04-30 13:10:46.062701: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) |
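The logs above show two different CUDA versions on the same machine: the system `nvcc -V` reports release 7.5, while `conda list` shows cudatoolkit 9.0. Whether that difference explains the cuDNN handle failure is not established here, but it is worth flagging. A minimal sketch of such a check follows; the function names `parse_nvcc_release` and `same_major_minor` are hypothetical, and the two input strings are copied from the logs.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (e.g. "7.5") found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_major_minor(version_a, version_b):
    """Compare two dotted version strings on their first two components."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


# Values copied from the logs above.
nvcc_output = "Cuda compilation tools, release 7.5, V7.5.17"
conda_cudatoolkit = "9.0"

release = parse_nvcc_release(nvcc_output)
if release is not None and not same_major_minor(release, conda_cudatoolkit):
    print("nvcc reports CUDA %s but conda installed cudatoolkit %s"
          % (release, conda_cudatoolkit))
```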
senkumartup commented Apr 30, 2018

CUDA and CuDNN
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
Tensorflow install
- https://www.tensorflow.org/install/install_linux#InstallingAnaconda
conda create -n tensorflow pip python=2.7
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp27-none-linux_x86_64.whl
(tensorflow) dcml@dcml-MS-7B61:~/workspace/virtualenv$ python
Python 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 23:32:55)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dcml/.conda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/home/dcml/.conda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/dcml/.conda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/dcml/.conda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/dcml/.conda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/dcml/.conda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
>>> quit()
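The ImportError above means the dynamic loader could not open `libcublas.so.9.0`. A minimal sketch for probing this outside of TensorFlow is to ask `ctypes` to open the library directly. The helper name `can_load` is hypothetical; `libcublas.so.9.0` comes from the traceback and `libcudnn.so.7` from the earlier check output, while `libcudart.so.9.0` is an assumption about what a CUDA 9.0 build would also need.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the file is missing or cannot be opened by the loader.
        return False


# libcudart.so.9.0 is an assumed name; the other two appear in the notes above.
for name in ("libcublas.so.9.0", "libcudart.so.9.0", "libcudnn.so.7"):
    status = "found" if can_load(name) else "NOT found"
    print("%s: %s" % (name, status))
```

Because the loader consults `LD_LIBRARY_PATH`, running this in the same shell as the failing `python` session should reproduce what TensorFlow sees.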
ldconfig -v
# added by Anaconda2 installer
export PATH="/opt/anaconda2/bin:$PATH"
export CUDA_HOME=/usr/local/cuda
#export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
sudo /opt/anaconda2/bin/pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp27-none-linux_x86_64.whl
sudo /opt/anaconda2/bin/pip install --ignore-installed --upgrade --force keras
libtbb-dev
libv4l/libv4l2-dev
libgstreamer-plugins-base1.0-dev
tesseract-ocr
libtesseract-dev
libleptonica-dev
nvcc -V
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /g' <<< "$LD_LIBRARY_PATH") 2>/dev/null | grep "$1"; }
function check() { lib_installed "$1" && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn
check libcuda
check libcudart
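The `check` helper above greps `ldconfig` output, so `check libcuda` also matches `libcudart`. A rough Python sketch of a stricter lookup is shown here, under the assumption that only the directories listed in `LD_LIBRARY_PATH` matter; unlike `ldconfig`, it does not scan the system default library directories. The function name `find_libs` is hypothetical.

```python
import glob
import os


def find_libs(prefix, search_path):
    """List files named `<prefix>.so*` in each directory of a ':'-separated path."""
    hits = []
    for directory in search_path.split(":"):
        if directory:  # skip empty entries such as a trailing ':'
            pattern = os.path.join(directory, prefix + ".so*")
            hits.extend(sorted(glob.glob(pattern)))
    return hits


search_path = os.environ.get("LD_LIBRARY_PATH", "")
for prefix in ("libcudnn", "libcuda", "libcudart"):
    matches = find_libs(prefix, search_path)
    print("%s: %s" % (prefix, ", ".join(matches) if matches else "NOT found"))
```

Matching on `<prefix>.so*` keeps `libcuda` and `libcudart` apart, which makes it easier to see which versions of each library are actually on the path.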
Notes
Patch
dcml@sun:~/workspace/training/india/session1/tf4p2$ git diff
diff --git a/scripts/label_image.py b/scripts/label_image.py
index 613a095..18bd5e7 100644
--- a/scripts/label_image.py
+++ b/scripts/label_image.py
@@ -39,6 +39,11 @@ def read_tensor_from_image_file(file_name, input_height=299, input_width=299,
                                 input_mean=0, input_std=255):
   input_name = "file_reader"
   output_name = "normalized"
+
+  #force values
+  input_height=224
+  input_width=224
+
   file_reader = tf.read_file(file_name, input_name)
   if file_name.endswith(".png"):
     image_reader = tf.image.decode_png(file_reader, channels = 3,
@@ -73,9 +78,14 @@ if __name__ == "__main__":
   label_file = "tf_files/retrained_labels.txt"
   input_height = 224
   input_width = 224
+  input_height = 299
+  input_width = 299
+
   input_mean = 128
   input_std = 128
-  input_layer = "input"
+
+  # input_layer = "input"
+  input_layer = "Mul"
   output_layer = "final_result"

   parser = argparse.ArgumentParser()
@@ -118,6 +128,8 @@ if __name__ == "__main__":

   input_name = "import/" + input_layer
   output_name = "import/" + output_layer
+  print(input_name)
+  print(output_name)
   input_operation = graph.get_operation_by_name(input_name);
   output_operation = graph.get_operation_by_name(output_name);
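The patch above swaps the script's hard-coded input settings (224x224, layer "input") for 299x299 and layer "Mul", which are the values used with `inception_v3`. Instead of editing the script each time, the two sets of values could be kept in a small lookup table. This is only a sketch: the table name `INPUT_SETTINGS`, the key "default" and the helper `graph_tensor_names` are hypothetical, and the values are the ones read from the patch.

```python
# Input settings taken from the patch above.
# "default" is a hypothetical label for the script's original values.
INPUT_SETTINGS = {
    "default": {"height": 224, "width": 224, "input_layer": "input"},
    "inception_v3": {"height": 299, "width": 299, "input_layer": "Mul"},
}


def graph_tensor_names(architecture, output_layer="final_result"):
    """Build the "import/..." operation names the way label_image.py does."""
    settings = INPUT_SETTINGS[architecture]
    return "import/" + settings["input_layer"], "import/" + output_layer


input_name, output_name = graph_tensor_names("inception_v3")
print(input_name)
print(output_name)
```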
Png to Jpg
mogrify -format jpg *.png
#!/bin/bash
cd tf_files/objects
cd Bus
mogrify -format jpg *.png
cd ..
cd Car
mogrify -format jpg *.png
cd ..
cd LCV
mogrify -format jpg *.png
cd ..
cd MAV
mogrify -format jpg *.png
cd ..
cd MCL
mogrify -format jpg *.png
cd ..
cd MiniBus
mogrify -format jpg *.png
cd ..
cd NoVehicle
mogrify -format jpg *.png
cd ..
cd Truck
mogrify -format jpg *.png
cd ..
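The script above repeats the same `mogrify` call for each class directory. A loop over the subdirectories does the same job without listing them by hand. The sketch below assumes ImageMagick's `mogrify` is on the PATH; the function names `mogrify_commands` and `convert_all` are hypothetical.

```python
import glob
import os
import subprocess


def mogrify_commands(root):
    """Build one `mogrify -format jpg` command per subdirectory that holds PNG files."""
    commands = []
    for name in sorted(os.listdir(root)):
        directory = os.path.join(root, name)
        if not os.path.isdir(directory):
            continue
        pngs = sorted(glob.glob(os.path.join(directory, "*.png")))
        if pngs:
            commands.append(["mogrify", "-format", "jpg"] + pngs)
    return commands


def convert_all(root, run=subprocess.check_call):
    """Run every command; `run` can be swapped out for a dry run."""
    for command in mogrify_commands(root):
        run(command)
```

Calling `convert_all("tf_files/objects")` would convert every class directory in one pass, and passing `run=print` gives a dry run that only shows the commands.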
Train
export ARCHITECTURE="inception_v3"
python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=tf_files/objects
Test
#!/bin/bash
python -m scripts.label_image \
  --graph=tf_files/retrained_graph.pb \
  --image="$1"

