NumPy Random — default_rng and Random Number Generation
SETUP
import numpy as np

# Modern way: always use this (NumPy 1.17+)
rng = np.random.default_rng(seed=42)

# seed= makes results reproducible
# same seed → same sequence of random numbers every run
# omit seed to pull fresh, unpredictable entropy from the OS
Why default_rng over np.random.rand?
Old API (avoid):          New API (use this):
np.random.seed(42)        rng = np.random.default_rng(42)
np.random.rand()          rng.random()
np.random.randint()       rng.integers()
Old API problems:
- Global state — seed affects entire program
- Awkward with threads: every thread shares one hidden generator, so results depend on scheduling
- Harder to reproduce specific results
default_rng advantages:
- Self-contained — seed lives in the rng object
- Thread-friendly: each thread or worker can own its own generator
- Better statistical properties (PCG64 algorithm)
- Reproducible without polluting global state
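The "no global state" point can be checked directly. The sketch below assumes only NumPy itself; the helper name noisy_measurement is hypothetical and exists only for illustration. It shows that drawing from a local generator does not advance the legacy global stream.

```python
import numpy as np


def noisy_measurement(value, rng):
    """Hypothetical helper: add Gaussian noise drawn from the rng passed in."""
    return value + rng.normal(loc=0.0, scale=0.1)


# Legacy global stream: record the first draw after seeding.
np.random.seed(0)
first_global = np.random.rand()

# Re-seed the global stream, then use a local generator in between.
np.random.seed(0)
rng = np.random.default_rng(42)
reading = noisy_measurement(10.0, rng)   # consumes numbers from rng only
second_global = np.random.rand()

# The local generator did not touch the global stream.
print(first_global == second_global)     # → True
```

Passing rng as an argument, rather than reaching for a module-level generator, is what keeps the function's randomness under the caller's control.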
PART 1 — CREATING A GENERATOR
1. Create RNG with and without seed
# Reproducible: same output every run
rng = np.random.default_rng(seed=42)

# Also reproducible: the seed= keyword is optional
rng = np.random.default_rng(42)

# Unseeded: fresh OS entropy, different output every run
rng = np.random.default_rng()

# Multiple independent generators, each with its own seed
rng1 = np.random.default_rng(1)
rng2 = np.random.default_rng(2)
# rng1 and rng2 produce independent streams
2. Check the generator
rng = np.random.default_rng(42)

# See the underlying bit generator
print(rng.bit_generator)   # → <numpy.random._pcg64.PCG64 object at 0x...>

# Spawn child generators: independent but reproducible (requires NumPy 1.25+)
children = rng.spawn(3)    # → list of 3 independent RNGs
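To make the relationship between the generator and its bit generator concrete, here is a small sketch. It assumes a NumPy release where default_rng wraps PCG64, which is the documented default at the time of writing but could change in a future version.

```python
import numpy as np

rng = np.random.default_rng(42)

# The Generator provides the distributions; the bit generator provides raw bits.
print(type(rng).__name__)                 # → Generator
print(type(rng.bit_generator).__name__)   # → PCG64

# Building the same pairing by hand gives the same stream
# (assumption: default_rng still defaults to PCG64 in your NumPy version).
explicit = np.random.Generator(np.random.PCG64(42))
same_first_draw = rng.random() == explicit.random()
print(same_first_draw)                    # → True
```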
PART 2 — UNIFORM RANDOM NUMBERS
3. rng.random() — floats between 0.0 and 1.0
rng = np.random.default_rng(42)

# Single float
rng.random()          # → 0.7739...

# 1D array
rng.random(5)         # → e.g. [0.773, 0.438, 0.858, 0.697, 0.094] on a fresh seed-42 rng

# 2D array
rng.random((3, 4))    # → shape (3, 4) array of floats in [0, 1)

# Scale to custom range [low, high)
low, high = 5.0, 10.0
rng.random(5) * (high - low) + low   # → values between 5.0 and 10.0
4. rng.uniform() — floats in a custom range
# Uniform floats between low and high
rng.uniform(low=0, high=10)                 # → single float in [0, 10)
rng.uniform(low=0, high=10, size=5)         # → 5 floats in [0, 10)
rng.uniform(low=-1, high=1, size=(3, 3))    # → 3×3 array in [-1, 1)

# Equivalent to:
rng.random(5) * 10   # same distribution, less explicit
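The two ways of getting floats in a custom range can be compared side by side. This is a sketch rather than a benchmark; the sample size and the variable names are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high = 5.0, 10.0
n = 100_000

# Manual scaling of [0, 1) draws versus the explicit uniform() call.
scaled = low + (high - low) * rng.random(n)
direct = rng.uniform(low=low, high=high, size=n)

for name, sample in [("scaled", scaled), ("direct", direct)]:
    in_range = (sample.min() >= low) and (sample.max() <= high)
    # Both means should sit near the midpoint (low + high) / 2 = 7.5.
    print(name, in_range, round(float(sample.mean()), 2))
```

Both samples follow the same distribution; uniform() simply states the range in the call itself.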
5. rng.integers() — random integers
# Single integer in [0, 10): high is EXCLUSIVE by default
rng.integers(10)                     # → int in {0,1,2,...,9}

# Custom range [low, high)
rng.integers(low=1, high=7)          # → die roll {1,2,3,4,5,6}

# Include high endpoint: endpoint=True makes high INCLUSIVE
rng.integers(1, 6, endpoint=True)    # → {1,2,3,4,5,6}

# Array of integers
rng.integers(low=1, high=100, size=10)      # → 1D array
rng.integers(low=0, high=2, size=(4, 4))    # → 4×4 binary matrix {0,1}
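The exclusive/inclusive distinction is a common source of off-by-one bugs, so a quick check may help. The sketch below rolls a die both ways and collects the faces that appear; 1000 rolls is an arbitrary size chosen so that every face shows up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two spellings of the same six-sided die.
rolls_exclusive = rng.integers(low=1, high=7, size=1000)           # high excluded
rolls_inclusive = rng.integers(1, 6, size=1000, endpoint=True)     # high included

faces_exclusive = set(rolls_exclusive.tolist())
faces_inclusive = set(rolls_inclusive.tolist())

print(sorted(faces_exclusive))   # → [1, 2, 3, 4, 5, 6]
print(sorted(faces_inclusive))   # → [1, 2, 3, 4, 5, 6]
```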
PART 3 — STATISTICAL DISTRIBUTIONS
6. Normal (Gaussian) Distribution
# Standard normal: mean=0, std=1
rng.standard_normal()          # → single float
rng.standard_normal(5)         # → 5 floats
rng.standard_normal((3, 3))    # → 3×3 array

# Custom mean and std
rng.normal(loc=170, scale=10)              # → height-like value (mean=170, std=10)
rng.normal(loc=170, scale=10, size=1000)   # → 1000 heights

# GPA example: clip afterwards to keep values in the valid range
gpas = rng.normal(loc=3.0, scale=0.5, size=500)
gpas = np.clip(gpas, 0.0, 4.0)   # → all values in [0.0, 4.0]
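As a sanity check on loc and scale, the sample mean and standard deviation of a large draw should land close to the requested values. The sketch below also shows one side effect of clipping that is easy to overlook: draws outside the range are moved onto the boundary, so the boundary value ends up over-represented. Sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Large sample: empirical mean/std should be close to loc/scale.
heights = rng.normal(loc=170, scale=10, size=10_000)
print(round(float(heights.mean()), 1), round(float(heights.std()), 1))

# Same distribution built from standard normal draws: loc + scale * z.
z = rng.standard_normal(10_000)
heights_from_z = 170 + 10 * z

# Clipping keeps values in range but piles clipped draws onto the bounds.
gpas = np.clip(rng.normal(loc=3.0, scale=0.5, size=500), 0.0, 4.0)
at_upper_bound = int((gpas == 4.0).sum())
print(at_upper_bound)   # number of draws that were pushed down to exactly 4.0
```

If that pile-up at the boundary matters for your use case, redrawing out-of-range values is an alternative to clipping.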
7. Binomial Distribution — count of successes in N trials
# Flip a fair coin 10 times, count heads
rng.binomial(n=10, p=0.5)               # → int in {0..10}

# 1000 simulations of 10 flips
rng.binomial(n=10, p=0.5, size=1000)

# Number of Netflix originals among 50 random titles (30% are originals)
rng.binomial(n=50, p=0.3, size=100)     # → 100 simulated counts
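A binomial count has mean n*p and variance n*p*(1-p), which gives a simple way to check a simulation. The sketch below reuses the 50-titles setup with a larger, arbitrarily chosen number of simulations, and converts counts to shares for readers who want a percentage.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 0.3

counts = rng.binomial(n=n, p=p, size=10_000)

# Compare the simulation with the theoretical moments.
print(float(counts.mean()), n * p)              # sample mean vs 15.0
print(float(counts.var()), n * p * (1 - p))     # sample variance vs 10.5

# Counts become shares by dividing by the number of trials.
shares = counts / n
```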
8. Poisson Distribution — count of rare events
# Average 3 events per interval
rng.poisson(lam=3)               # → random count
rng.poisson(lam=3, size=1000)    # → 1000 simulated counts

# Reviews per day, avg 5 reviews
rng.poisson(lam=5, size=30)      # → 30 days of review counts
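A distinctive property of the Poisson distribution is that its mean and variance are both equal to lam. The following sketch checks this on a simulated sample; the sample size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 3

events = rng.poisson(lam=lam, size=10_000)

# For Poisson data, mean and variance should both be close to lam.
print(float(events.mean()), float(events.var()))

# Counts are non-negative integers with no upper bound.
print(events.dtype.kind == "i", int(events.min()) >= 0)   # → True True
```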
9. Exponential Distribution — time between events
# Scale = mean time between events
rng.exponential(scale=2.0)              # → single wait time
rng.exponential(scale=2.0, size=100)    # → 100 wait times
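Because exponential() takes the mean wait (scale) rather than the event rate, a rate has to be inverted before it is passed in. The sketch below assumes a hypothetical rate of 0.5 events per minute, chosen to match the scale=2.0 example.

```python
import numpy as np

rng = np.random.default_rng(42)

rate_per_minute = 0.5            # hypothetical: one event every 2 minutes on average
scale = 1.0 / rate_per_minute    # exponential() expects the MEAN wait, not the rate

waits = rng.exponential(scale=scale, size=10_000)

# The sample mean should be close to scale (2.0 minutes).
print(round(float(waits.mean()), 2))

# Cumulative sums turn waiting times into event timestamps.
arrival_times = np.cumsum(waits)
```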
10. Other Distributions
# Beta: values between 0 and 1 (useful for probabilities)
rng.beta(a=2, b=5, size=100)

# Gamma
rng.gamma(shape=2.0, scale=1.0, size=100)

# Log-normal: for income, prices (always positive, right-skewed)
rng.lognormal(mean=0, sigma=1, size=100)

# Chi-squared
rng.chisquare(df=5, size=100)

# Student's t
rng.standard_t(df=10, size=100)

# Uniform integers (Bernoulli-like: 0 or 1)
rng.integers(0, 2, size=100)   # → binary array
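One way to confirm that the parameters mean what you expect is to compare sample means with the textbook formulas: a/(a+b) for beta, shape*scale for gamma, and df for chi-squared. A sketch with an arbitrary sample size:

```python
import numpy as np

rng = np.random.default_rng(42)
size = 20_000

beta_sample = rng.beta(a=2, b=5, size=size)
gamma_sample = rng.gamma(shape=2.0, scale=1.0, size=size)
chi2_sample = rng.chisquare(df=5, size=size)

# Sample mean next to the theoretical mean for each distribution.
print(round(float(beta_sample.mean()), 3), round(2 / (2 + 5), 3))
print(round(float(gamma_sample.mean()), 3), 2.0 * 1.0)
print(round(float(chi2_sample.mean()), 3), 5)
```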
PART 4 — SAMPLING AND SHUFFLING
11. rng.choice() — sample from an array
arr = np.array([10, 20, 30, 40, 50])

# Single random element
rng.choice(arr)                           # → one element

# Sample 3 WITHOUT replacement (must be requested: the default is replace=True)
rng.choice(arr, size=3, replace=False)

# Sample 5 WITH replacement (the default)
rng.choice(arr, size=5, replace=True)

# Weighted sampling: unequal probabilities (must sum to 1)
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
rng.choice(arr, size=3, p=probs)          # → favors middle values

# Sample from range 0..n-1 (pass integer instead of array)
rng.choice(100, size=10, replace=False)   # → 10 unique ints from 0–99

# Practical: sample rows from a dataset
# (df is assumed to be an existing pandas DataFrame with at least 50 rows)
indices = rng.choice(len(df), size=50, replace=False)
sample = df.iloc[indices]
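Two behaviours of choice() are worth verifying rather than assuming: replace=False never repeats an element, and p= controls long-run frequencies. The sketch below checks both; the sample size for the weighted draw is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)
arr = np.array([10, 20, 30, 40, 50])
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Without replacement, drawing len(arr) items yields each element exactly once.
unique_draw = rng.choice(arr, size=5, replace=False)
print(sorted(unique_draw.tolist()))   # → [10, 20, 30, 40, 50]

# With weights, observed frequencies approach probs as the sample grows.
weighted = rng.choice(arr, size=10_000, p=probs)
values, counts = np.unique(weighted, return_counts=True)
freqs = counts / counts.sum()
print(dict(zip(values.tolist(), np.round(freqs, 2).tolist())))
```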
12. rng.shuffle() — shuffle in place
arr = np.array([1, 2, 3, 4, 5])
rng.shuffle(arr)   # modifies arr IN PLACE
print(arr)         # → e.g. [3 1 5 2 4] (order depends on the seed)

# 2D: shuffles along axis=0 (rows) by default
matrix = np.arange(12).reshape(3, 4)
rng.shuffle(matrix)   # → rows reordered, columns within rows unchanged
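The in-place behaviour is the usual stumbling block: writing arr = rng.shuffle(arr) replaces the array with None. The sketch below demonstrates the return value and shows that row shuffling keeps each row intact; it also uses the axis argument of Generator.shuffle to reorder columns instead.

```python
import numpy as np

rng = np.random.default_rng(42)
matrix = np.arange(12).reshape(3, 4)
original_rows = {tuple(row) for row in matrix.tolist()}

# shuffle works in place and returns None.
result = rng.shuffle(matrix)
print(result)                             # → None

# The same rows are still there, only their order may have changed.
shuffled_rows = {tuple(row) for row in matrix.tolist()}
print(shuffled_rows == original_rows)     # → True

# axis=1 reorders columns instead; every row gets the same column order.
rng.shuffle(matrix, axis=1)
row_contents = {tuple(sorted(row)) for row in matrix.tolist()}
```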
13. rng.permutation() returns a shuffled copy
arr = np.array([1, 2, 3, 4, 5])
shuffled = rng.permutation(arr)   # → new array, arr unchanged
print(arr)                        # → [1 2 3 4 5] still original
print(shuffled)                   # → e.g. [3 1 5 2 4] new shuffled copy

# Permutation of integers 0..n-1
rng.permutation(10)               # → e.g. [7 2 0 5 9 1 4 3 8 6]

# Shuffle index and apply to multiple arrays together
# (X and y are assumed to be existing arrays of equal length)
idx = rng.permutation(len(X))
X_shuffled = X[idx]
y_shuffled = y[idx]               # same shuffle applied to labels
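The index-permutation pattern is only useful if features and labels stay paired, so here is a sketch that checks exactly that. X and y are toy arrays invented for the illustration: row i of X starts with 2*i, and its label is i.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data (hypothetical): row i of X is [2i, 2i+1], and its label is i.
X = np.arange(10).reshape(5, 2)
y = X[:, 0] // 2

# One permutation of row indices, applied to both arrays.
idx = rng.permutation(len(X))
X_shuffled, y_shuffled = X[idx], y[idx]

# Each shuffled row still carries its own label.
still_paired = np.array_equal(X_shuffled[:, 0] // 2, y_shuffled)
print(still_paired)   # → True
```

Shuffling X and y with two separate shuffle calls would break this pairing, because each call draws a different order.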
PART 5 — REPRODUCIBILITY PATTERNS
14. Reproducibility — always use a seed
# Same seed → identical output every time
rng = np.random.default_rng(42)
print(rng.random(3))   # → [0.773, 0.438, 0.858]

rng = np.random.default_rng(42)   # reset with same seed
print(rng.random(3))   # → [0.773, 0.438, 0.858] identical

# Different seed → different output
rng = np.random.default_rng(99)
print(rng.random(3))   # → three different values
15. Save and restore generator state
# Save state
rng = np.random.default_rng(42)
state = rng.bit_generator.state
rng.random(10)   # consume 10 numbers

# Restore to saved state
rng.bit_generator.state = state
rng.random(10)   # → exact same 10 numbers again
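The checkpoint pattern can be verified end to end. In the sketch below the generator is advanced first, to show that a checkpoint can be taken mid-stream and not only right after seeding; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
rng.random(5)                            # advance the stream a little first

# The state is a plain dict snapshot of the bit generator.
checkpoint = rng.bit_generator.state
print(checkpoint["bit_generator"])       # → PCG64

first = rng.random(10)                   # draw after the checkpoint

# Rewind and draw again: the numbers repeat exactly.
rng.bit_generator.state = checkpoint
replay = rng.random(10)
print(np.array_equal(first, replay))     # → True
```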
16. Multiple independent generators
# Spawn child generators: statistically independent (requires NumPy 1.25+)
rng = np.random.default_rng(42)
rng1, rng2, rng3 = rng.spawn(3)

# Each child produces a different stream
rng1.random(3)   # → 3 floats
rng2.random(3)   # → 3 different floats
rng3.random(3)   # → 3 different floats again

# Use case: parallel simulations, give each worker its own rng
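On NumPy versions older than 1.25, where Generator.spawn is not available, the same idea can be sketched with SeedSequence.spawn, which has existed since the new API was introduced. The names root, child_seeds, and workers are illustrative.

```python
import numpy as np

# One root seed, three child seed sequences derived from it.
root = np.random.SeedSequence(42)
child_seeds = root.spawn(3)

# One generator per worker, each built from its own child seed.
workers = [np.random.default_rng(seed) for seed in child_seeds]
draws = [worker.random(3) for worker in workers]

# Re-deriving the children from the same root seed reproduces every stream.
again = [
    np.random.default_rng(seed).random(3)
    for seed in np.random.SeedSequence(42).spawn(3)
]
reproducible = all(np.array_equal(a, b) for a, b in zip(draws, again))
print(reproducible)   # → True
```

Only the root seed needs to be recorded to reproduce the whole set of worker streams.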
default_rng → a self-contained random number machine
seed → the starting position of that machine
same seed = same sequence (on the same NumPy version)
rng.random() → float [0.0, 1.0)
rng.uniform() → float [low, high) explicit range
rng.integers() → int [low, high) high EXCLUSIVE by default
rng.normal() → bell curve centered at loc with spread scale
shuffle vs permutation:
rng.shuffle(a) → modifies a IN PLACE, returns None
rng.permutation(a) → returns NEW shuffled copy, a unchanged
choice vs permutation:
rng.choice(a, size=k) → sample k elements (can repeat: replace=True is the default)
rng.permutation(a) → reorder ALL elements, no repeats
Reproducibility recipe:
1. Always set a seed for experiments and tests
2. Pass rng objects around instead of using global state
3. Use rng.spawn() for parallel independent streams
4. Save rng.bit_generator.state to replay from a checkpoint