sreekarun/Numpy_random.md

NumPy Random — `default_rng` and Random Number Generation

SETUP

import numpy as np

# Modern way — always use this (NumPy 1.17+)
rng = np.random.default_rng(seed=42)

# seed= makes results reproducible
# same seed → same sequence of random numbers every run
# omit seed for unpredictable results: NumPy pulls fresh entropy from the OS each run

Why `default_rng` over `np.random.rand`?

Old API (avoid):          New API (use this):
np.random.seed(42)        rng = np.random.default_rng(42)
np.random.rand()          rng.random()
np.random.randint()       rng.integers()

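Old API problems:
  - Global state: one seed, and every draw anywhere in the program (including inside other libraries), shares a single stream
  - Threads draw from that same stream, so results depend on call order
  - Hard to reproduce a specific result in isolation

default_rng advantages:
  - Self-contained: the seed and state live in the rng object
  - Easy to give each thread or worker its own generator
  - Better statistical properties and faster (PCG64 instead of the legacy Mersenne Twister, MT19937)
  - Reproducible without polluting global state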
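A minimal sketch of the global-state problem listed above. `noisy_library_call` is a hypothetical stand-in for any third-party code that draws from the legacy global `np.random` state; it shifts a sequence drawn from the global state but cannot touch a private `Generator`.

```python
import numpy as np

def noisy_library_call():
    # Hypothetical third-party code that silently uses the legacy global state
    np.random.random()

# Legacy API: the sequence depends on what else touched the global state
np.random.seed(42)
legacy_a = np.random.random(3)

np.random.seed(42)
noisy_library_call()               # consumes one global draw
legacy_b = np.random.random(3)     # now shifted by one position

# New API: the generator owns its state, so the library call cannot interfere
rng = np.random.default_rng(42)
modern_a = rng.random(3)

rng = np.random.default_rng(42)
noisy_library_call()
modern_b = rng.random(3)

print(np.array_equal(legacy_a, legacy_b))   # False
print(np.array_equal(modern_a, modern_b))   # True
```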

PART 1 — CREATING A GENERATOR

1. Create RNG with and without seed

# Reproducible — same output every run
rng = np.random.default_rng(seed=42)

# Also reproducible — seed= keyword optional
rng = np.random.default_rng(42)

# Unseeded: fresh OS entropy, different output every run
rng = np.random.default_rng()

# Multiple independent generators — each with own seed
rng1 = np.random.default_rng(1)
rng2 = np.random.default_rng(2)
# rng1 and rng2 produce independent streams
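One common pattern, sketched here under the assumption that your functions take an optional `rng` argument: `np.random.default_rng` returns an existing `Generator` unchanged, so a single line accepts a seed, a generator, or `None`. The helper `simulate_heights` is hypothetical.

```python
import numpy as np

def simulate_heights(n, rng=None):
    # Hypothetical helper: accepts a seed, a Generator, or None.
    # default_rng returns an existing Generator unchanged, so callers can
    # share one stream or pass a seed for a fresh reproducible one.
    rng = np.random.default_rng(rng)
    return rng.normal(loc=170, scale=10, size=n)

a = simulate_heights(5, rng=42)
b = simulate_heights(5, rng=42)
print(np.array_equal(a, b))                       # True: same seed, same values

shared = np.random.default_rng(7)
print(np.random.default_rng(shared) is shared)    # True: passed through as-is
```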

2. Check the generator

rng = np.random.default_rng(42)

# See the underlying bit generator
print(rng.bit_generator)       # → PCG64 object (the default bit generator)

# Spawn child generators: independent but reproducible (NumPy 1.25+)
children = rng.spawn(3)        # → list of 3 independent Generators
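A short sketch of inspecting the generator programmatically rather than by printing it. It also shows that a `Generator` can wrap a bit generator other than PCG64; Philox is used here only as an example of an alternative that ships with NumPy.

```python
import numpy as np

rng = np.random.default_rng(42)

# Name of the underlying bit generator (PCG64 by default)
print(type(rng.bit_generator).__name__)           # PCG64
print(rng.bit_generator.state["bit_generator"])   # PCG64

# A Generator can also wrap a different bit generator explicitly
philox_rng = np.random.Generator(np.random.Philox(42))
print(type(philox_rng.bit_generator).__name__)    # Philox
```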

PART 2 — UNIFORM RANDOM NUMBERS

3. `rng.random()` — floats between 0.0 and 1.0

rng = np.random.default_rng(42)

# Single float
rng.random()                   # → 0.7739...

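# 1D array (values shown are the first five from a fresh default_rng(42);
# here the stream continues after the single draw above)
rng.random(5)                  # → [0.773, 0.438, 0.858, 0.697, 0.094]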

# 2D array
rng.random((3, 4))             # → shape (3,4) array of floats in [0, 1)

# Scale to custom range [low, high)
low, high = 5.0, 10.0
rng.random(5) * (high - low) + low
# → values between 5.0 and 10.0
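A quick check of the scaling formula above: the affine map sends [0, 1) to [low, high), so every sample stays in range (up to floating-point rounding at the upper edge) and the mean sits near the midpoint.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high = 5.0, 10.0

# Affine map from [0, 1) to [low, high)
samples = rng.random(10_000) * (high - low) + low

print(samples.min() >= low, samples.max() < high)   # True True
print(samples.mean())                                # close to 7.5, the midpoint
```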

4. `rng.uniform()` — floats in a custom range

# Uniform floats between low and high
rng.uniform(low=0, high=10)           # → single float in [0, 10)
rng.uniform(low=0, high=10, size=5)   # → 5 floats in [0, 10)
rng.uniform(low=-1, high=1, size=(3, 3))  # → 3×3 array in [-1, 1)

# Same distribution as rng.uniform(low=0, high=10, size=5), less explicit:
rng.random(5) * 10
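`low` and `high` can also be arrays that broadcast against `size`, which lets one call draw columns with different ranges. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# low/high broadcast against size: column 0 in [0, 1), column 1 in [10, 20)
draws = rng.uniform(low=[0, 10], high=[1, 20], size=(1000, 2))

print(draws.shape)                                      # (1000, 2)
print(draws[:, 0].max() < 1, draws[:, 1].min() >= 10)   # True True
```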

5. `rng.integers()` — random integers

# Single integer in [0, 10)  — high is EXCLUSIVE by default
rng.integers(10)                      # → int in {0,1,2,...,9}

# Custom range [low, high)
rng.integers(low=1, high=7)           # → die roll {1,2,3,4,5,6}

# Include high endpoint — endpoint=True makes high INCLUSIVE
rng.integers(1, 6, endpoint=True)     # → {1,2,3,4,5,6}

# Array of integers
rng.integers(low=1, high=100, size=10)        # → 1D array
rng.integers(low=0, high=2, size=(4, 4))      # → 4×4 binary matrix {0,1}
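The two die-roll forms above produce the same set of faces; this sketch confirms that and checks that each face appears about 1/6 of the time.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two equivalent ways to roll a six-sided die 10,000 times
rolls_exclusive = rng.integers(1, 7, size=10_000)
rolls_inclusive = rng.integers(1, 6, size=10_000, endpoint=True)

print(sorted(set(rolls_exclusive.tolist())))   # [1, 2, 3, 4, 5, 6]
print(sorted(set(rolls_inclusive.tolist())))   # [1, 2, 3, 4, 5, 6]

# Each face should appear roughly 1/6 of the time
counts = np.bincount(rolls_exclusive, minlength=7)[1:]
freqs = counts / counts.sum()
print(freqs)                                   # each value near 0.167
```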

PART 3 — STATISTICAL DISTRIBUTIONS

6. Normal (Gaussian) Distribution

# Standard normal — mean=0, std=1
rng.standard_normal()                 # → single float
rng.standard_normal(5)                # → 5 floats
rng.standard_normal((3, 3))           # → 3×3 array

# Custom mean and std
rng.normal(loc=170, scale=10)         # → height-like value (mean=170, std=10)
rng.normal(loc=170, scale=10, size=1000)  # → 1000 heights

# GPA example — clamp with clip after
gpas = rng.normal(loc=3.0, scale=0.5, size=500)
gpas = np.clip(gpas, 0.0, 4.0)       # → keep in valid range
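A sketch that checks the sample mean and standard deviation against `loc` and `scale`, and shows a side effect of the clipping approach: draws above 4.0 are not removed but stacked exactly on the bound (about 2.3% of them here, since 4.0 is two standard deviations above the mean).

```python
import numpy as np

rng = np.random.default_rng(42)

heights = rng.normal(loc=170, scale=10, size=100_000)
print(heights.mean(), heights.std())    # close to 170 and 10

# Clipping piles the out-of-range draws onto the bounds
gpas = np.clip(rng.normal(loc=3.0, scale=0.5, size=100_000), 0.0, 4.0)
share_at_max = (gpas == 4.0).mean()
print(share_at_max)                     # about 0.023
```

If that pile-up at the bounds matters for your analysis, redrawing the out-of-range values instead of clipping them avoids it.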

7. Binomial Distribution — count of successes in N trials

# Flip a fair coin 10 times — count heads
rng.binomial(n=10, p=0.5)            # → int in {0..10}

# 1000 simulations of 10 flips
rng.binomial(n=10, p=0.5, size=1000)

# Number of Netflix originals among 50 random titles (assuming 30% are originals)
rng.binomial(n=50, p=0.3, size=100)  # → 100 simulated counts in {0..50}
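`rng.binomial` returns counts; dividing by `n` turns them into proportions. This sketch uses the same hypothetical 50-title, 30%-originals setup and checks the mean against n·p.

```python
import numpy as np

rng = np.random.default_rng(42)

n_titles, p_original = 50, 0.3
counts = rng.binomial(n=n_titles, p=p_original, size=100_000)

# Convert counts to percentages of the 50 sampled titles
percent_original = 100 * counts / n_titles

print(counts.mean())             # close to n * p = 15
print(percent_original.mean())   # close to 30.0
print(counts.min() >= 0, counts.max() <= n_titles)   # True True
```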

8. Poisson Distribution — count of rare events

# Average 3 events per interval
rng.poisson(lam=3)                   # → random count
rng.poisson(lam=3, size=1000)        # → 1000 simulated counts

# Reviews per day — avg 5 reviews
rng.poisson(lam=5, size=30)          # → 30 days of review counts
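A defining property of the Poisson distribution is that its mean and variance both equal `lam`; a large sample makes this visible.

```python
import numpy as np

rng = np.random.default_rng(42)

reviews = rng.poisson(lam=5, size=100_000)
print(reviews.mean(), reviews.var())   # both close to 5
```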

9. Exponential Distribution — time between events

# Scale = mean time between events
rng.exponential(scale=2.0)           # → single wait time
rng.exponential(scale=2.0, size=100) # → 100 wait times
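`scale` is the mean wait, i.e. 1 / rate. This sketch (with a hypothetical rate of 0.5 events per unit time) also connects back to the Poisson section: summing exponential waits gives arrival times, and the number of arrivals in a window of length T averages rate × T.

```python
import numpy as np

rng = np.random.default_rng(42)

rate = 0.5                                   # hypothetical events per unit time
waits = rng.exponential(scale=1 / rate, size=100_000)
print(waits.mean())                          # close to 1 / rate = 2.0

# Arrival times are cumulative sums of the waits
arrivals = np.cumsum(waits)
T = 10_000.0
n_events = np.sum(arrivals < T)
print(n_events)                              # close to rate * T = 5000
```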

10. Other Distributions

# Beta — values between 0 and 1 (useful for probabilities)
rng.beta(a=2, b=5, size=100)

# Gamma
rng.gamma(shape=2.0, scale=1.0, size=100)

# Log-normal: for income, prices (always positive, right-skewed)
# mean and sigma are the parameters of log(X), not of X itself
rng.lognormal(mean=0, sigma=1, size=100)

# Chi-squared
rng.chisquare(df=5, size=100)

# Student's t
rng.standard_t(df=10, size=100)

# Uniform integers (Bernoulli-like: 0 or 1)
rng.integers(0, 2, size=100)         # → binary array
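Two details from the list above, sketched with hypothetical variable names: the log-normal parameters describe `log(X)` (so the mean of `X` itself is exp(mean + sigma²/2)), and a 0/1 variable with a success probability other than 0.5 can be drawn with `rng.binomial(n=1, p=...)`.

```python
import numpy as np

rng = np.random.default_rng(42)

# mean/sigma describe log(X), not X itself
incomes = rng.lognormal(mean=0, sigma=1, size=100_000)
logs = np.log(incomes)
print(logs.mean(), logs.std())   # close to 0 and 1
print(incomes.mean())            # close to exp(0 + 1**2 / 2), about 1.65

# Bernoulli with a success probability other than 0.5
clicks = rng.binomial(n=1, p=0.2, size=100_000)   # 1 with probability 0.2
print(clicks.mean())             # close to 0.2
```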

PART 4 — SAMPLING AND SHUFFLING

11. `rng.choice()` — sample from an array

arr = np.array([10, 20, 30, 40, 50])

# Single random element
rng.choice(arr)                       # → one element

# Sample 3 WITHOUT replacement (must pass replace=False; the default is replace=True)
rng.choice(arr, size=3, replace=False)

# Sample 5 WITH replacement (the default)
rng.choice(arr, size=5, replace=True)

# Weighted sampling — unequal probabilities
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
rng.choice(arr, size=3, p=probs)      # → favors middle values

# Sample from range 0..n-1 (pass integer instead of array)
rng.choice(100, size=10, replace=False)   # → 10 unique ints from 0–99

# Practical: sample rows from a dataset (df is an existing pandas DataFrame)
indices = rng.choice(len(df), size=50, replace=False)
sample  = df.iloc[indices]
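A sketch that exercises the `choice` rules above: sampling without replacement never repeats, asking for more items than exist without replacement raises `ValueError`, and weighted draws follow `p`.

```python
import numpy as np

rng = np.random.default_rng(42)
arr = np.array([10, 20, 30, 40, 50])

# Without replacement: every sampled element is distinct
picked = rng.choice(arr, size=5, replace=False)
print(sorted(picked.tolist()))       # [10, 20, 30, 40, 50]

# Asking for more items than exist without replacement is an error
raised = False
try:
    rng.choice(arr, size=6, replace=False)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Weighted sampling: observed frequencies track the probabilities
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
draws = rng.choice(arr, size=100_000, p=probs)
print(np.mean(draws == 30))          # close to 0.4
```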

12. `rng.shuffle()` — shuffle in place

arr = np.array([1, 2, 3, 4, 5])

rng.shuffle(arr)          # modifies arr IN PLACE
print(arr)                # → e.g. [3, 1, 5, 2, 4]  (order depends on the rng state)

# 2D — shuffles along axis=0 (rows) by default
matrix = np.arange(12).reshape(3, 4)
rng.shuffle(matrix)       # → rows reordered, columns within rows unchanged
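A sketch confirming the 2D behavior described above: `shuffle` returns `None`, rows move as whole units, and passing `axis=1` reorders columns instead.

```python
import numpy as np

rng = np.random.default_rng(42)
matrix = np.arange(12).reshape(3, 4)
rows_before = {tuple(row) for row in matrix.tolist()}

result = rng.shuffle(matrix)          # in place; returns None
print(result)                         # None

# Rows moved as whole units: the set of rows is unchanged
rows_after = {tuple(row) for row in matrix.tolist()}
print(rows_after == rows_before)      # True

# axis=1 reorders the columns instead; each column moves as a unit
cols_before = {tuple(col) for col in matrix.T.tolist()}
rng.shuffle(matrix, axis=1)
cols_after = {tuple(col) for col in matrix.T.tolist()}
print(cols_after == cols_before)      # True
```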

13. `rng.permutation()` — shuffled copy (non-destructive)

arr = np.array([1, 2, 3, 4, 5])

shuffled = rng.permutation(arr)   # → new array, arr unchanged
print(arr)                        # → [1, 2, 3, 4, 5]  still original
print(shuffled)                   # → e.g. [3, 1, 5, 2, 4]  new shuffled copy

# Permutation of integers 0..n-1
rng.permutation(10)               # → e.g. [7, 2, 0, 5, 9, 1, 4, 3, 8, 6]

# Shuffle index and apply to multiple arrays together (X, y: arrays with the same length)
idx = rng.permutation(len(X))
X_shuffled = X[idx]
y_shuffled = y[idx]               # same shuffle applied to labels
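A runnable version of the paired shuffle, using hypothetical toy arrays where each label is derived from its feature row, so it is easy to check that the pairing survives.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: each label is derived from its feature row
X = np.arange(20).reshape(10, 2)
y = X[:, 0] * 10

idx = rng.permutation(len(X))
X_shuffled, y_shuffled = X[idx], y[idx]

# The same index array keeps each row paired with its label
print(np.array_equal(y_shuffled, X_shuffled[:, 0] * 10))   # True
print(np.array_equal(np.sort(idx), np.arange(len(X))))     # True: every index once
```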

PART 5 — REPRODUCIBILITY PATTERNS

14. Reproducibility — always use a seed

# Same seed → identical output every time
rng = np.random.default_rng(42)
print(rng.random(3))              # → [0.773, 0.438, 0.858]

rng = np.random.default_rng(42)  # reset with same seed
print(rng.random(3))              # → [0.773, 0.438, 0.858]  identical

# Different seed → different output
rng = np.random.default_rng(99)
print(rng.random(3))              # → [0.673, 0.114, 0.420]  different
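In practice the seed usually becomes a parameter of the experiment. A minimal sketch, with `run_experiment` as a hypothetical example where all randomness flows from one seeded generator:

```python
import numpy as np

def run_experiment(seed):
    # Hypothetical experiment: all randomness comes from one seeded generator
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=100)
    return noise.mean()

print(run_experiment(42) == run_experiment(42))   # True: same seed
print(run_experiment(42) == run_experiment(99))   # False: different seed
```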

15. Save and restore generator state

# Save state
rng  = np.random.default_rng(42)
state = rng.bit_generator.state

rng.random(10)                    # consume 10 numbers

# Restore to saved state
rng.bit_generator.state = state
rng.random(10)                    # → exact same 10 numbers again
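The state is a plain dict, so it can also be written to disk as a checkpoint. A sketch assuming the default PCG64 bit generator, whose state dict contains only strings and Python ints and therefore round-trips through JSON:

```python
import json
import numpy as np

rng = np.random.default_rng(42)
rng.random(5)                                  # advance the stream a bit

# Checkpoint: PCG64's state dict holds plain Python ints and strings
checkpoint = json.dumps(rng.bit_generator.state)
expected = rng.random(3)                       # what comes next

# Later (e.g. in a new process): rebuild a generator and restore the state
restored = np.random.default_rng()
restored.bit_generator.state = json.loads(checkpoint)
restored_next = restored.random(3)
print(np.array_equal(restored_next, expected))   # True
```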

16. Multiple independent generators

# Spawn child generators — statistically independent
rng  = np.random.default_rng(42)
rng1, rng2, rng3 = rng.spawn(3)

# Each child produces a different stream
rng1.random(3)                    # → [0.462, 0.284, 0.919]
rng2.random(3)                    # → [0.114, 0.782, 0.331]
rng3.random(3)                    # → [0.605, 0.017, 0.748]

# Use case: parallel simulations — give each worker its own rng
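A sketch of the parallel use case with a thread pool. It uses `np.random.SeedSequence(seed).spawn(n)`, which also works on NumPy versions older than 1.25 where `Generator.spawn` is unavailable; `simulate` and `run_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def simulate(child_seed):
    # Each worker builds its own Generator from its own SeedSequence child
    rng = np.random.default_rng(child_seed)
    return rng.normal(size=1_000).mean()

def run_parallel(seed, n_workers=4):
    children = np.random.SeedSequence(seed).spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in input order, regardless of finish order
        return list(pool.map(simulate, children))

first = run_parallel(42)
second = run_parallel(42)
print(first == second)     # True: results do not depend on thread timing
print(len(set(first)))     # 4: each worker got a different stream
```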

PART 6 — PRACTICAL EXAMPLES

17. Simulate a dataset

rng = np.random.default_rng(42)
n   = 500

imdb_scores  = np.clip(rng.normal(7.0, 1.2, n), 1.0, 10.0)
duration     = rng.integers(60, 180, size=n)
release_year = rng.integers(2010, 2024, size=n, endpoint=True)
is_original  = rng.integers(0, 2, size=n)             # binary 0/1
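Wrapping the simulation in a function (here a hypothetical `simulate_titles`) makes the whole dataset reproducible from one seed and easy to sanity-check.

```python
import numpy as np

def simulate_titles(n, seed):
    # Hypothetical wrapper around the simulation above; returns a dict of columns
    rng = np.random.default_rng(seed)
    return {
        "imdb_score": np.clip(rng.normal(7.0, 1.2, n), 1.0, 10.0),
        "duration": rng.integers(60, 180, size=n),                  # 60..179
        "release_year": rng.integers(2010, 2024, size=n, endpoint=True),
        "is_original": rng.integers(0, 2, size=n),
    }

data_a = simulate_titles(500, seed=42)
data_b = simulate_titles(500, seed=42)

# Same seed and same call order: identical dataset
print(all(np.array_equal(data_a[k], data_b[k]) for k in data_a))   # True

# Range checks implied by the arguments
print(data_a["duration"].max() <= 179)        # True
print(data_a["release_year"].max() <= 2024)   # True
```

The columns depend on call order: drawing them in a different order, or inserting a new column in the middle, changes every column generated after that point.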

18. Train / test split

rng = np.random.default_rng(42)
n   = 1000

indices   = rng.permutation(n)
split     = int(0.8 * n)

train_idx = indices[:split]       # 80%
test_idx  = indices[split:]       # 20%
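Applying the split indices to data, with hypothetical `X` and `y` arrays, and checking that the two index sets are disjoint and together cover every row.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 3))          # hypothetical feature matrix
y = rng.integers(0, 2, size=n)       # hypothetical binary labels

indices = rng.permutation(n)
split = int(0.8 * n)
train_idx, test_idx = indices[:split], indices[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(train_idx), len(test_idx))                 # 800 200
print(np.intersect1d(train_idx, test_idx).size)      # 0: no overlap
print(np.union1d(train_idx, test_idx).size == n)     # True: every row used
```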

19. Monte Carlo estimation of π

rng = np.random.default_rng(42)
n   = 1_000_000

x = rng.random(n)
y = rng.random(n)

inside = (x**2 + y**2) <= 1.0    # points inside the quarter unit circle
pi_est = 4 * np.sum(inside) / n   # fraction inside ≈ π/4, so multiply by 4

print(f"π ≈ {pi_est:.5f}")        # → close to 3.14; standard error ≈ 0.0016 at n = 1e6
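The estimate's precision can be computed from the same sample: each point is a Bernoulli trial with success probability π/4, so the standard error of 4·p̂ is 4·sqrt(p̂(1 − p̂)/n). A sketch with the coordinates drawn as one (n, 2) array:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

points = rng.random((n, 2))                    # x and y in one array
inside = (points ** 2).sum(axis=1) <= 1.0      # inside the quarter circle
p_hat = inside.mean()
pi_est = 4 * p_hat

# Standard error of 4 * p_hat, from the binomial variance p(1 - p) / n
std_err = 4 * np.sqrt(p_hat * (1 - p_hat) / n)

print(f"π ≈ {pi_est:.4f} ± {std_err:.4f}")     # standard error is about 0.0016
```

Because the error shrinks like 1/sqrt(n), each extra correct decimal digit costs roughly 100 times more points.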

20. Bootstrap sampling

rng  = np.random.default_rng(42)
data = np.array([3.2, 3.8, 2.9, 4.0, 3.5, 3.1, 3.7, 2.8])

# 1000 bootstrap samples — resample with replacement
n_boot   = 1000
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_boot)
])

print(f"Mean:  {boot_means.mean():.3f}")
print(f"95% CI: [{np.percentile(boot_means, 2.5):.3f}, "
      f"{np.percentile(boot_means, 97.5):.3f}]")
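The Python loop above can be replaced by a single call that draws all resamples as one matrix. This is a vectorized variant of the same percentile bootstrap; the numbers are not guaranteed to match the loop version exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([3.2, 3.8, 2.9, 4.0, 3.5, 3.1, 3.7, 2.8])
n_boot = 1000

# One (n_boot, len(data)) matrix of resampled values replaces the Python loop
resamples = rng.choice(data, size=(n_boot, len(data)), replace=True)
boot_means = resamples.mean(axis=1)

low, high = np.percentile(boot_means, [2.5, 97.5])
print(boot_means.shape)                  # (1000,)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```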

🧠 Quick Reference

| Goal | Method |
|---|---|
| Create generator | `rng = np.random.default_rng(seed)` |
| Float in [0, 1) | `rng.random(size)` |
| Float in [low, high) | `rng.uniform(low, high, size)` |
| Integer in [low, high) | `rng.integers(low, high, size)` |
| Integer in [low, high] | `rng.integers(low, high, size, endpoint=True)` |
| Normal distribution | `rng.normal(loc, scale, size)` |
| Standard normal | `rng.standard_normal(size)` |
| Binomial | `rng.binomial(n, p, size)` |
| Poisson | `rng.poisson(lam, size)` |
| Sample from array | `rng.choice(arr, size, replace)` |
| Weighted sample | `rng.choice(arr, size, p=probs)` |
| Shuffle in place | `rng.shuffle(arr)` |
| Shuffled copy | `rng.permutation(arr)` |
| Save state | `rng.bit_generator.state` |
| Independent streams | `rng.spawn(n)` |

🧠 Mental Model

default_rng  →  a self-contained random number machine
seed         →  the starting position of that machine
               same seed = same sequence, always

rng.random()    →  float  [0.0, 1.0)
rng.uniform()   →  float  [low, high)   explicit range
rng.integers()  →  int    [low, high)   high EXCLUSIVE by default
rng.normal()    →  bell curve centered at loc with spread scale

shuffle vs permutation:
  rng.shuffle(a)      →  modifies a IN PLACE, returns None
  rng.permutation(a)  →  returns NEW shuffled copy, a unchanged

choice vs permutation:
  rng.choice(a, size=k)     →  sample k elements (can repeat if replace=True)
  rng.permutation(a)        →  reorder ALL elements, no repeats

Reproducibility recipe:
  1. Always set a seed for experiments and tests
  2. Pass rng objects around instead of using global state
  3. Use rng.spawn() for parallel independent streams
  4. Save rng.bit_generator.state to replay from a checkpoint
