Gist by Dr. Amit Puri (amitpuri)
"""
Transformer Architecture v7
==================================================================
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import os
"""
Transformer Architecture v8
==================================================================
1. **Architecture**: GPT-2 style with Learnable Positional Embeddings & aligned naming.
2. **Pre-training**: Train from scratch on TinyStories (restored from v7).
3. **Weight Loading**: Load OpenAI GPT-2 pretrained weights.
4. **Fine-tuning**: Instruction fine-tuning (Alpaca style).
5. **Persistence**: Save/Load checkpoints (restored from v7).
Dependencies: torch, tiktoken, requests, tqdm, numpy, datasets (for training), tensorflow (for weight loading)
"""
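Point 1 above describes a GPT-2-style architecture with learnable positional embeddings. A minimal sketch of how token and position embeddings combine (the `wte`/`wpe` names follow GPT-2 convention; the default dimensions are GPT-2-small values and purely illustrative):

```python
import torch
import torch.nn as nn

class GPT2Embeddings(nn.Module):
    """Token embeddings plus *learned* positional embeddings, GPT-2 style.

    Unlike fixed sinusoidal encodings, wpe is a trainable lookup table,
    one row per position up to block_size.
    """
    def __init__(self, vocab_size=50257, block_size=1024, n_embd=768):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)  # token embeddings
        self.wpe = nn.Embedding(block_size, n_embd)  # learned positions

    def forward(self, idx):
        # idx: (B, T) integer token ids
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)          # (T,)
        return self.wte(idx) + self.wpe(pos)              # (B, T, n_embd)
```

Keeping the `wte`/`wpe` naming aligned with OpenAI's released checkpoints is what makes point 3 (loading GPT-2 pretrained weights) a straightforward key-for-key copy.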
"""
Transformer Architecture v6 (NLTK Tokenization)
==================================================================
Improved version of v5 using NLTK for word-level tokenization.
Includes interactive menu for Training, Saving, Loading, and Generation.
"""
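The v6 notes above describe NLTK word-level tokenization. A minimal vocab-building sketch in that spirit (v6 most likely uses `nltk.word_tokenize`; `TreebankWordTokenizer` is used here only because it needs no `punkt` download, and the helper name is illustrative):

```python
from nltk.tokenize import TreebankWordTokenizer

def build_word_vocab(texts):
    """Build a word-level token -> id map from a corpus using NLTK's
    Treebank rules. Id 0 is reserved for an <unk> token (an assumption)."""
    tok = TreebankWordTokenizer()
    words = sorted({w for t in texts for w in tok.tokenize(t)})
    return {w: i for i, w in enumerate(words, start=1)}
```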
import torch
import torch.nn as nn
import torch.nn.functional as F
"""
Transformer Architecture v4
==================================================================
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import os
"""
Transformer Architecture v3
=========================================================
Key additions in v3:
✓ Added causal masking to prevent attending to future tokens
✓ Added Pre-LN option for deeper networks
✓ Proper dropout placement in FFN
✓ Better temperature scaling
"""
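The first v3 addition above, causal masking, can be sketched in isolation. This is a minimal single-head version (the function name and shapes are illustrative, not lifted from the original script):

```python
import math
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask:
    position t may only attend to positions <= t."""
    T = q.size(-2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Upper-triangular True entries mark "future" positions to hide
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

Because row 0 of the mask hides every position except 0, the first output token is exactly `v[..., 0, :]` — a quick sanity check that no future information leaks.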
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import os
# ============================================
# 1. TOKENIZATION (using simple mapping)
# ============================================
class SimpleTokenizer:
def __init__(self, vocab):
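The `SimpleTokenizer` above is cut off in this capture. A minimal self-contained version consistent with the `__init__(self, vocab)` signature might look like this (the `encode`/`decode` method names, whitespace splitting, and the id-0 unknown-token handling are assumptions, not recovered from the original):

```python
class SimpleTokenizer:
    def __init__(self, vocab):
        # Bidirectional token <-> id maps
        self.stoi = {tok: i for i, tok in enumerate(vocab)}
        self.itos = {i: tok for tok, i in self.stoi.items()}

    def encode(self, text):
        # Unknown words map to id 0 (an assumption)
        return [self.stoi.get(tok, 0) for tok in text.split()]

    def decode(self, ids):
        return " ".join(self.itos.get(i, "?") for i in ids)
```

Usage: with `vocab = ["<unk>", "hello", "world"]`, `encode("hello world")` yields `[1, 2]` and `decode([1, 2])` round-trips back to `"hello world"`.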