[XPS 15 Early 2017 9560 kabylake] Making Nvidia Drivers + (CUDA 8 / CUDA 9 / CUDA 9.1) + Bumblebee work together on linux ( Ubuntu / KDE Neon / Linux Mint / debian )
# Update to the 4.9 kernel. Do not delete the old kernel; it is your failsafe if something goes wrong with the new one.
# Install KabyLake graphics patches
cd /tmp;
wget https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2;
tar xjvf kbldmcver101.tar.bz2; cd kbl_dmc_ver1_01/; sudo ./install.sh
cd /tmp;
wget https://01.org/sites/default/files/downloads/intelr-graphics-linux/kblgucver914.tar.gz;
tar xvzf kblgucver914.tar.gz; cd firmware/kbl/guc/kbl_guc_ver/; sudo ./install.sh
# Add Nvidia repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# Install via ubuntu drivers
sudo ubuntu-drivers autoinstall
# Install CUDA 8 (if you're interested in using gpu for ML)
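# This acts as a repo, so keep it somewhere safe and do not delete it.
cd ~/Downloads
# The URL ends in "-deb", so save the file under a .deb name; otherwise the cuda*.deb glob below will not match it.
wget -O cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
sudo dpkg -i cuda*.deb; sudo apt update; sudo apt install cuda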
# The stock version of prime-select is BS. Replace it with a patched one like this:
cd /usr/bin
sudo mv prime-select prime-select.bkup
sudo wget https://raw.githubusercontent.com/C11235/nvidia-prime-bugfix/master/prime-select
sudo chmod 755 prime-select
sudo reboot now
# Testing cuda.
$ sudo prime-select intel
$ nvidia-smi
# This should give an error, no drivers found
# Try this
$ sudo prime-select nvidia
# displays CUDA info.
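# A minimal sketch of the check above as a reusable function. gpu_visible is a
# hypothetical helper name (not part of prime-select or the Nvidia tools). It runs
# whatever probe command it is given and reports whether the probe succeeded,
# assuming nvidia-smi exits non-zero when it cannot reach the driver.

```shell
gpu_visible() {
    # Run the probe (e.g. nvidia-smi) silently; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
        echo "nvidia driver reachable"
    else
        echo "nvidia driver not reachable"
    fi
}

# Usage: gpu_visible nvidia-smi
```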
# You can stop here if you're not worried about battery life; if you are, continue with the steps below.
# Install powertop and tlp
sudo apt install tlp powertop
# Run powertop:
sudo powertop
# You should see a battery discharge rate of around 20 W +/- 5 W; this drains my battery 4 times faster.
# Add command line params:
sudo nano /etc/default/grub
# Make the line look like the following; do not ask why.
GRUB_CMDLINE_LINUX_DEFAULT='i915.edp_vswing=2 i915.preliminary_hw_support=1 intel_idle.max_cstate=1 pcie_port_pm=off acpi_backlight=vendor acpi_osi=Linux acpi_osi=! acpi_osi="Windows 2009"'
# The same parameters without pcie_port_pm=off:
# GRUB_CMDLINE_LINUX_DEFAULT='i915.edp_vswing=2 i915.preliminary_hw_support=1 intel_idle.max_cstate=1 acpi_backlight=vendor acpi_osi=Linux acpi_osi=! acpi_osi="Windows 2009"'
sudo update-grub2
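# After update-grub2 and a reboot, the parameters the kernel actually booted with
# are listed in /proc/cmdline. A minimal sketch for checking a single parameter;
# cmdline_has is a hypothetical helper name, and the optional second argument
# exists only so the function can be tried on a sample string.

```shell
cmdline_has() {
    # $1 = parameter to look for, e.g. pcie_port_pm=off
    # $2 = command line to search (defaults to the running kernel's)
    param=$1
    line=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole, space-separated parameters match.
    case " $line " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: cmdline_has pcie_port_pm=off && echo "parameter is active"
```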
# Install bumblebee. This is the danger zone: this software has not been updated in a while and I am not sure when an update will be available.
# Avoid updating your system once this setup works for you.
sudo add-apt-repository ppa:bumblebee/testing
sudo apt update
sudo apt install bumblebee bumblebee-nvidia
# Add yourself to the bumblebee usergroup so you can make changes to the videocard.
sudo adduser $USER bumblebee
# Enable bumblebee service
sudo systemctl enable bumblebeed
# Now you need to load the right driver in bumblebee, because it's an old piece of software which has not been updated in a while.
# To find out which nvidia driver is currently loaded, run:
lsmod | grep nvidia
# this should give you something like, nvidia-xxx
# at the time of writing this, the latest is nvidia-378 but cuda 8 requires the stable, which is nvidia-375
# add them to bumblebee config file.
sudo nano /etc/bumblebee/bumblebee.conf
# change 'Driver=' to 'Driver=nvidia'
# change all occurrences of 'nvidia-current' to 'nvidia-xxx'
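# The two manual edits above can also be sketched with sed. patch_bumblebee_conf
# is a hypothetical helper name; it reads the config on stdin and writes the
# patched text to stdout, so nothing is changed in place until you review it.

```shell
patch_bumblebee_conf() {
    # $1 = driver series number, e.g. 375
    # First expression fills in an empty "Driver=" line; the second
    # rewrites every nvidia-current to nvidia-<series>.
    sed -e 's/^Driver=$/Driver=nvidia/' \
        -e "s/nvidia-current/nvidia-$1/g"
}

# Usage (writes a patched copy for review):
# patch_bumblebee_conf 375 < /etc/bumblebee/bumblebee.conf > /tmp/bumblebee.conf.new
```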
# Since the driver load will now be handled by bumblebee, we need to stop the OS from loading it.
sudo nano /etc/modprobe.d/bumblebee.conf
# add a new entry toward the end, with nvidia-xxx, in my case:
# nvidia-375
#375
blacklist nvidia-375
blacklist nvidia-375-drm
blacklist nvidia-375-updates
blacklist nvidia-experimental-375
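# The four entries above follow one pattern per driver series, so they can be
# generated for whichever nvidia-xxx you have. A minimal sketch;
# bumblebee_blacklist is a hypothetical helper name.

```shell
bumblebee_blacklist() {
    # $1 = driver series number, e.g. 375
    v=$1
    # printf repeats its format once per argument, giving three lines here.
    printf 'blacklist nvidia-%s\n' "$v" "$v-drm" "$v-updates"
    printf 'blacklist nvidia-experimental-%s\n' "$v"
}

# Usage: bumblebee_blacklist 375 | sudo tee -a /etc/modprobe.d/bumblebee.conf
```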
# once that is done, you'll need bbswitch dkms module
sudo apt-get install bbswitch-dkms
# Load this with the kernel.
sudo nano /etc/modules-load.d/modules.conf
# add following
i915
bbswitch
# TLP is known to interfere with bumblebee; configure it to leave the Nvidia card alone, see https://wiki.archlinux.org/index.php/Talk:Bumblebee#Bumblebee_and_TLP_interferening
# Run powertop to see if battery consumption is in check: 10w +/- 5W
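# Besides powertop, bbswitch reports the card's power state in /proc/acpi/bbswitch
# as a line such as "0000:01:00.0 ON". A minimal sketch that extracts only the
# state; bbswitch_state is a hypothetical helper name.

```shell
bbswitch_state() {
    # Reads a bbswitch status line on stdin and prints its second
    # field (ON or OFF), stopping after the first line.
    awk '{ print $2; exit }'
}

# Usage: bbswitch_state < /proc/acpi/bbswitch
```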
# Helpful links:
https://karlgrz.com/dell-xps-15-ubuntu-tweaks/
https://hemenkapadia.github.io/blog/2016/05/07/Ubuntu-with-Nvidia-Bumblebee.html
https://askubuntu.com/questions/879856/nvidia-prime-cant-switch-to-intel/885487
http://www.webupd8.org/2016/08/how-to-install-and-configure-bumblebee.html
@hlavki

hlavki commented Sep 15, 2017

Hi, thanks for this. I am running openSUSE Tumbleweed and it looks like it works, but if I run optirun glxspheres on HiDPI and resize to fullscreen, then the FPS is 44. It's slower than Intel.

Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: GeForce GTX 1050/PCIe/SSE2
208.091512 frames/sec - 298.622997 Mpixels/sec
46.062645 frames/sec - 357.829368 Mpixels/sec
44.692185 frames/sec - 347.183198 Mpixels/sec
44.929472 frames/sec - 349.026516 Mpixels/sec
44.540262 frames/sec - 346.003007 Mpixels/sec

optirun nvidia-smi

Fri Sep 15 11:49:00 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   34C    P0    N/A /  N/A |      6MiB /  4041MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     18941    G   Xorg                                             6MiB |
+-----------------------------------------------------------------------------+

Do you also have this problem? Thank you

@hlavki

hlavki commented Sep 15, 2017

And also you can use boot options:

acpi_rev_override=5 acpi_backlight=none

instead of

pcie_port_pm=off acpi_backlight=none acpi_osi=Linux acpi_osi=! acpi_osi="Windows 2009"

@whizzzkid
Author

whizzzkid commented Sep 16, 2017

@hlavki Here is the output for:

$ optirun glxgears
14096 frames in 5.0 seconds = 2819.109 FPS
11460 frames in 5.0 seconds = 2290.427 FPS
$ optirun ./glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: GeForce GTX 1050/PCIe/SSE2
347.675228 frames/sec - 388.005555 Mpixels/sec
358.998031 frames/sec - 400.641803 Mpixels/sec
351.300209 frames/sec - 392.051033 Mpixels/sec
356.592040 frames/sec - 397.956717 Mpixels/sec
359.251480 frames/sec - 400.924652 Mpixels/sec
350.943396 frames/sec - 391.652830 Mpixels/sec
356.542326 frames/sec - 397.901236 Mpixels/sec
357.434807 frames/sec - 398.897245 Mpixels/sec
347.815407 frames/sec - 388.161995 Mpixels/sec
$ optirun nvidia-smi
Fri Sep 15 18:38:17 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P0    N/A /  N/A |      6MiB /  4041MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     30920    G   /usr/lib/xorg/Xorg                               6MiB |
+-----------------------------------------------------------------------------+

@hlavki

hlavki commented Sep 19, 2017

@whizzzkid thanks for the reply. Can you please try to run:

optirun glxgears -fullscreen

I am getting low FPS:

227 frames in 5.0 seconds = 45.246 FPS
238 frames in 5.0 seconds = 47.571 FPS
235 frames in 5.0 seconds = 46.935 FPS
237 frames in 5.0 seconds = 47.285 FPS

@rjcrystal

When I run optirun it says
[ 200.660549] [ERROR]No bridge found. Try installing primus or virtualgl.
I am using Linux Mint 18 with an Nvidia 940MX and kernel 4.13.

@Moulick

Moulick commented Sep 26, 2017

@whizzzkid Thank you. After trying dozens of tutorials all over the internet, yours is the only one that worked reliably. I am so thankful to you for finally getting my GPU properly running selectively. I had very nearly given up on Linux and moved back to Windows before your Gist solved everything.

@codebymark

codebymark commented Sep 29, 2017

I found that after I did this with 4.11 it worked fine, but once I rebooted I got AHCI errors and can't use anything but my onboard graphics. Any suggestions?
I am tempted to give 4.13 a go.

Further Details:
Ubuntu 16.04
Kernel: 4.11.2-041102-generic

Error on boot (doesn't stop booting)
ACPI Error: Namespace lookup failure, AE_NOT_FOUND.

Then when I attempt to shut down I get:
NMI watchdog: BUG: soft lockup - CPU#1 stuck....

I am keen to know what BIOS version everyone has. I am running v1.2.4 and I'm not sure if I need to upgrade or if that will cause me more headaches.

@whizzzkid

@bajubullet

@mark-bucknell I am getting the same errors, please let me know if you find a solution. I have also tried Manjaro, Solus, Ubuntu 17.10, and Elementary, and I get the same errors on every one.

@codebymark

@bajubullet no such luck but I am keeping an eye on this thread

@whizzzkid
Author

Sorry guys, I was away. Also I noticed, there was an update for nvidia-384 yesterday, I am not sure what it's supposed to do but it did not mess up anything.

Let me respond to you one by one:

@hlavki
Here is the output:

$ optirun glxgears -fullscreen
1035 frames in 5.0 seconds = 206.857 FPS
953 frames in 5.0 seconds = 190.546 FPS
1017 frames in 5.0 seconds = 203.210 FPS
1010 frames in 5.0 seconds = 201.939 FPS
1012 frames in 5.0 seconds = 202.285 FPS

@rjcrystal
During my initial setup I started by installing Mint, and it had too much stuff missing; Ubuntu sucked too. KDE Neon looked like the best contender as it had most of the stuff working out of the box. I am not sure what the error here is, but do post if you find a solution. By the way, primus should have been installed when installing bumblebee.

@Moulick
thanks :)

@mark-bucknell
As far as I remember I was having problems with 4.11, so I have already moved to 4.13. I am on 4.13.0-041300-generic, and on BIOS 1.4, planning to go to 1.5 soon.

@codebymark

So I just did the Dell BIOS upgrade from v1.2.4 to 1.4.0 and I followed everything down to line 23. All looks good. I am now able to support 3 monitors (just) and the error message on shutdown has gone. I still have an error message showing on startup, but I know that is something to do with the Linux kernel and ACPI, so I'll wait for updates that will fix that.
@whizzzkid thank you for maintaining this gist

@romainreignier

Hi,
Just to let you know, I followed this tutorial a month ago when I received my laptop.

I am using it daily with Kubuntu 16.04 + backports to get Plasma 5.08, Bios 1.5, Linux 4.13.8, Nvidia 387, Bumblebee...

I have about 7W in powertop on IDLE.

Options in GRUB:

acpi_rev_override=1 pcie_port_pm=off

Everything is fine, so thank you so much!

@whizzzkid
Author

whizzzkid commented Nov 12, 2017

@mark-bucknell: good for you!

@romainreignier: cheers :) I am still on Driver Version: 384.98 because CUDA9.

@dhfromkorea

dhfromkorea commented Nov 13, 2017

I seek advice from you wise folks. I am having a hard time making this installation work with nvidia 384 (with cuda 8.0 ga_2 and libcudnn 6.0 on KDE Neon LTS 16.04) It seems kde neon does not like it atm because of this bug[1]

I tried rolling back to nvidia 375, but apt-get won't let me because 375 is a transitional version and automatically force-enables nvidia 384.

I tried installing nvidia 381, instead of 384, but cuda 8.0 deb installation forces -> nvidia 384.

Do you have any advice for a workaround I may try?

Also a dumb question--Is updating the kernel to 4.13 required for getting nvidia.384 to work?
[1] https://blog.martin-graesslin.com/blog/2017/08/warning-nvidia-driver-384-69-seems-to-be-broken-with-qtquick/

@whizzzkid
Author

@dhfromkorea with this setup you will not be loading nvidia drivers when normally using the system. That bug is a problem when you're explicitly loading nvidia drivers at boot. Since this setup does not do that and instead loads them only when running a program with optirun it shouldn't be a problem.

@jasonbeach

This is probably a dumb question, but what does bumblebee do? It sounds like it lets you optionally use the Nvidia card, i.e. take the output of the Nvidia card and route it to the Intel graphics unit. Why would you want to do this? Just for power savings, or is there something else?

Thanks

@littlewine

great gist @whizzzkid ! Many thanks.
After a lot of effort, black screens with no proper display adapter, and reinstalls, I finally managed to get it working properly. For me, what was missing from the gist and caused trouble was sudo apt-get install primus. Without it I got an error, something like "proper bridge missing".

Also, very useful for ML folks: Since you can run only what you need in your GPU, tensorflow and other ML and DL libraries are now faster than ever 👍
Just type optirun jupyter notebook and you are ready to go!

@saroele

saroele commented Jan 8, 2018

Thanks for this excellent tutorial and for keeping it up to date! I had to do a complete reinstall of my laptop today but I had problems with the first step:

# Install Intel Graphics Patch Firmwares (This should reboot your system):
bash -c "$(curl -fsSL http://bit.ly/IGFWL-install)"

I solved it by making a local copy of that script and adding cd /tmp at the beginning of the script.

@geertw

geertw commented Jan 11, 2018

For nvidia-387 and kernel 3.14 I had to install libelf-dev before dkms would build a kernel module.

@scrat98

scrat98 commented Jan 13, 2018

@whizzzkid
Author

@jasonbeach bumblebee keeps the Nvidia card off most of the time. If you need to run an application that requires access to CUDA, you can simply run it using $ optirun [any app]

@littlewine You're welcome :)

@saroele did you have curl installed?

@geertw updated the gist with kernel 4.14.13 and Cuda 9.1

@theo-m

theo-m commented Jan 29, 2018

Hi @whizzzkid,

I managed to do a working install a few months ago following your gist, but I've reinstalled my system and can't get anything working now. It seems I've repeatedly hit kernel and driver versions that don't work together. Do you have a definite list of kernel and driver pairs that actually work? I've observed bad performance with X on kernels >= 4.13 and I'm therefore running 4.10 (which doesn't work with nvidia-384).

Thanks for your work, it's been a real help.

@wolny

wolny commented Jan 31, 2018

great stuff! Especially the boot config (GRUB_CMDLINE_LINUX_DEFAULT)

@MatthiasSiewert

MatthiasSiewert commented Feb 11, 2018

@paines For those stuck with a failed Xorg server: I managed to recover by uninstalling all packages mentioned above and running

ubuntu-drivers autoinstall

from the recovery mode.
It's a nice guide. I got as far as having CUDA running, then gave up after three attempts to fix the rest. Could it be that it doesn't work with GNOME?

@fayazkhan

Thanks for this!
I did this because my laptop had heating issues because of nvidia always being on. Now it's staying really cool!
Also, in my case, using primusrun instead of optirun gave me better performance for steam.

@fayazkhan

@whizzzkid, my setup has stopped working suddenly. I had an abrupt shutdown while steam with primusrun was live.

Then, when I try to start steam again, it just runs the updater and quits.

$ primusrun steam
/usr/bin/primusrun: line 41: warning: command substitution: ignored null byte in input                                                                        
Running Steam on ubuntu 17.10 64-bit                                                                                                                          
STEAM_RUNTIME is enabled automatically
Pins up-to-date!
[2018-04-07 21:44:12] Startup - updater built Apr  2 2018 15:23:43
Looks like steam didn't shutdown cleanly, scheduling immediate update check
[2018-04-07 21:44:13] Checking for update on startup
[2018-04-07 21:44:13] Checking for available updates...
[2018-04-07 21:44:14] Download skipped: /client/steam_client_ubuntu12 version 1522709999, installed version 1522709999
[2018-04-07 21:44:14] Nothing to do
[2018-04-07 21:44:14] Verifying installation...
[2018-04-07 21:44:14] Performing checksum verification of executable files
[2018-04-07 21:44:15] Verification complete
$

This doesn't happen when running steam in intel or with optirun, but the performance is just not there.

Also I am seeing some weird dmesg output too.

[ 4265.046654] ldconfig.real[17704]: segfault at 338 ip 000000000049c5a7 sp 00007ffd43442710 error 4 in ldconfig.real[400000+e2000]
[ 4265.193360] bbswitch: enabling discrete graphics
[ 4265.407318] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 4265.407797] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.48  Thu Mar 22 00:42:57 PDT 2018 (using threaded interrupts)
[ 4266.463078] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.48  Wed Mar 21 23:48:34 PDT 2018
[ 4266.479341] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
[ 4266.509646] nvidia-modeset: Allocated GPU:0 (GPU-e67ebdca-1dc6-cc63-e15c-c63297995ddf) @ PCI:0000:01:00.0
[ 4266.509799] nvidia-modeset: Freed GPU:0 (GPU-e67ebdca-1dc6-cc63-e15c-c63297995ddf) @ PCI:0000:01:00.0
[ 4266.568571] nvidia-modeset: Allocated GPU:0 (GPU-e67ebdca-1dc6-cc63-e15c-c63297995ddf) @ PCI:0000:01:00.0
[ 4266.568720] nvidia-modeset: Freed GPU:0 (GPU-e67ebdca-1dc6-cc63-e15c-c63297995ddf) @ PCI:0000:01:00.0
[ 4268.457375] steam[17728]: segfault at a8 ip 00000000f75c8f83 sp 00000000ffd95e10 error 4 in libGL.so.390.48[f7547000+c5000]
[ 4268.582475] nvidia-modeset: Unloading
[ 4268.611549] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 4268.639389] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239
[ 4268.697497] bbswitch: disabling discrete graphics
[ 4268.714983] pci 0000:01:00.0: Refused to change power state, currently in D0
[ 4273.194150] bbswitch: enabling discrete graphics
[ 4273.417666] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 4273.418106] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.48  Thu Mar 22 00:42:57 PDT 2018 (using threaded interrupts)
[ 4274.474290] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.48  Wed Mar 21 23:48:34 PDT 2018
[ 4274.491723] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
[ 4274.523783] nvidia-modeset: Allocated GPU:0 (GPU-e67ebdca-1dc6-cc63-e15c-c63297995ddf) @ PCI:0000:01:00.0
[ 4274.523930] nvidia-modeset: Freed GPU:0 (GPU-e67ebdca-1dc6-cc63-e15c-c63297995ddf) @ PCI:0000:01:00.0
[ 4274.575560] nvidia-modeset: Unloading
[ 4274.640395] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 4274.659580] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239
[ 4274.693555] bbswitch: disabling discrete graphics
[ 4274.711057] pci 0000:01:00.0: Refused to change power state, currently in D0

@whizzzkid
Author

Welcome everyone

I updated to 390.48 a couple of days ago and 4.15.18 kernel, the watts are oddly satisfying on idle.
4 43w

@fayazkhan try reinstalling, I am clueless.

Cheers :)

@fayazkhan

fayazkhan commented Apr 27, 2018

@whizzzkid it actually wasn't a setup issue, but an issue in Steam which has a fix here: ValveSoftware/steam-for-linux#5428 (comment)

This command solves it

primusrun steam -steamos

@ToothyTahr

Hello! Thank you for taking the time to provide clear instructions.

I am running Ubuntu 16.04 on my Dell XPS 15 9560 (UHD screen, 16Gb RAM, 512 SSD etc.) After following the guide, I ended up with a black screen (no cursor) upon reboot. I have a feeling the error has something to do with one of these factors:

  1. When running sudo bumblebeed , the output was something to the effect of "nvidia" not being found.
  2. When I ran cat /proc/acpi/bbswitch, the output was 0000:01:00.0 ON
  3. I installed the V4.14.13 kernel, but sudo ubuntu-drivers autoinstall installed the version 396 driver.

Do you perhaps have any advice how I could get this working properly? At the moment, having removed all nvidia related components, powerstat indicates an average power draw of 25W. I would very much like to get this down to more reasonable figures.

My CPU temperature sits around 50 degrees - which is hotter than the 40 degrees I have read online others are able to achieve. Furthermore, the GPU temperature seems to be at around 60 degrees.

@SvenMeyer

I finally installed Manjaro Arch Linux and suddenly everything works perfectly out of the box, zero configuration required !
