Jiacheng Miao jmiao24

How large are large language models? (2025)

This aims to be factual information about the size of large language models. None of this document was written by AI. I do not include any information from leaks or rumors. The focus of this document is on base models (the raw text continuation engines, not 'helpful chatbot/assistants'). This is a view from a few years ago to today of one very tiny fraction of the larger LLM story that's happening.

History

GPT-2,-medium,-large,-xl (2019): 137M, 380M, 812M, 1.61B. Source: openai-community/gpt2. Trained on the unreleased WebText dataset said to 40GB of Internet text - I estimate that to be roughly 10B tokens. You can see a list of the websites that went into that data set here domains.txt.
GPT-3 aka davinci, davinci-002 (2020): 175B parameters. There is a good breakdown of how those parameters are 'spent' here [How d

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},

	""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
	import numpy as np
	import cPickle as pickle
	import gym

	# hyperparameters
	H = 200 # number of hidden layer neurons
	batch_size = 10 # every how many episodes to do a param update?
	learning_rate = 1e-4
	gamma = 0.99 # discount factor for reward