wilhelmjung (YangWeiliang_DeepNova@Deepexi)
@adrienbrault
adrienbrault / llama2-mac-gpu.sh
Last active April 8, 2025 13:49
Run Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference. Uses 10GB RAM. UPDATE: see https://twitter.com/simonw/status/1691495807319674880?s=20
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build it
make clean
LLAMA_METAL=1 make
# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
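# The preview cuts off here; the lines below are a sketch of how the script
# plausibly continues (the Hugging Face URL and the ./main flags are assumptions,
# not the gist's exact text).
wget "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/${MODEL}"
# Run an interactive session with Metal GPU offload enabled (-ngl 1)
./main -m "${MODEL}" -ngl 1 -c 2048 --color -i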
@cedrickchee
cedrickchee / llama-7b-m1.md
Last active February 15, 2026 21:44
4 Steps in Running LLaMA-7B on an M1 MacBook with `llama.cpp`

4 Steps in Running LLaMA-7B on an M1 MacBook

The usability of large language models

The problem with large language models is that you can’t run them locally on your laptop. Thanks to Georgi Gerganov and his llama.cpp project, it is now possible to run Meta’s LLaMA on a single computer without a dedicated GPU.

Running LLaMA

There are multiple steps involved in running LLaMA locally on an M1 Mac after downloading the model weights; a sketch of the steps follows.
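A minimal sketch of the four steps, assuming the early-2023 llama.cpp tooling (the conversion script name, quantize arguments, and paths changed in later releases and are assumptions here):

# 1. Install the build and conversion dependencies
xcode-select --install
pip install torch numpy sentencepiece

# 2. Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 3. Convert the downloaded 7B weights to ggml (f16), then quantize to 4-bit
#    (assumes the LLaMA weights sit under ./models/7B)
python convert-pth-to-ggml.py models/7B/ 1
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

# 4. Run inference on the quantized model
./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -p "The first man on the moon was"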

@bskaggs
bskaggs / Dockerfile
Last active October 24, 2024 21:43
Install pyarrow on alpine in docker
FROM python:3.7-alpine3.8
RUN apk add --no-cache \
        build-base \
        cmake \
        bash \
        jemalloc-dev \
        boost-dev \
        autoconf \
        zlib-dev \
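        git \
        flex \
        bison \
    # The preview truncates the package list above; everything from here on is a
    # sketch of how the build plausibly finishes, not the gist's exact text.
    && pip install --no-cache-dir six cython numpy \
    # Alpine's musl libc has no prebuilt pyarrow wheels, so Arrow C++ is built from
    # source first (the Arrow version tag and build flags are assumptions).
    && git clone --branch apache-arrow-0.11.0 https://github.com/apache/arrow.git /arrow \
    && mkdir /arrow/cpp/build \
    && cd /arrow/cpp/build \
    && cmake -DCMAKE_BUILD_TYPE=Release -DARROW_PYTHON=ON .. \
    && make -j4 install \
    # Point the pyarrow build at the freshly installed Arrow C++.
    && export ARROW_HOME=/usr/local \
    && cd /arrow/python \
    && python setup.py install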
@takumikinjo
takumikinjo / .gitignore
Created August 5, 2010 13:56
HTML5 Presentation export for Org-mode
README.html