Skip to content

Instantly share code, notes, and snippets.

View jmiao24's full-sized avatar

Jiacheng Miao jmiao24

View GitHub Profile
@rain-1
rain-1 / base model trends.md
Last active December 25, 2025 23:27
base model trends.md
@willccbb
willccbb / grpo_demo.py
Last active April 17, 2026 04:37
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},
@karpathy
karpathy / pg-pong.py
Created May 30, 2016 22:50
Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
import numpy as np
import cPickle as pickle
import gym
# hyperparameters
H = 200 # number of hidden layer neurons
batch_size = 10 # every how many episodes to do a param update?
learning_rate = 1e-4
gamma = 0.99 # discount factor for reward