CSA501 / AI & ML
Syllabus
Resources
M1: Introduction to Reinforcement Learning
Overview of Reinforcement Learning (RL)
Definition and RL algorithms
Differences between RL and other ML paradigms
Elements of RL
Agent, Policy function, Value function, Model
Agent–environment interface
Types of RL environments
Deterministic vs. Stochastic
Fully observable vs. Partially observable
Discrete vs. Continuous
Episodic vs. Non-episodic
Single-agent vs. Multi-agent
RL platforms and frameworks
OpenAI Gym and Universe
DeepMind Lab
RL-Glue
Project Malmo
ViZDoom
Applications of RL
Education, Medicine, Healthcare
Manufacturing and Inventory Management
Finance
Natural Language Processing and Computer Vision
M2: Bandits and Markov Decision Processes
Multi-armed Bandits
The k-armed Bandit Problem
Action-value Methods
Incremental Implementation
Tracking Nonstationary Problems
Optimistic Initial Values
Upper-Confidence-Bound (UCB) Action Selection
Gradient Bandit Algorithms
Associative Search (Contextual Bandits)
Finite Markov Decision Processes (MDPs)
Agent–environment interface
Goals, Rewards, and Returns
Policies and Value Functions
Episodic vs. Continuing Tasks
Optimal Policies and Value Functions
Bellman Equation and Optimality
Deriving and Solving the Bellman Equation for Value and Q-functions
M3: Dynamic Programming and Monte Carlo Methods
Dynamic Programming (DP)
Policy Evaluation (Prediction)
Policy Improvement, Policy Iteration, and Value Iteration
Asynchronous Dynamic Programming
Generalized Policy Iteration
Efficiency of Dynamic Programming
Solving the Frozen Lake problem
Monte Carlo Methods
Monte Carlo Prediction (First Visit and Every Visit)
Blackjack with Monte Carlo
Monte Carlo Estimation of Action Values
Monte Carlo Control (with and without Exploring Starts)
Off-policy Prediction via Importance Sampling
Incremental Implementation
Off-policy Monte Carlo Control
M4: Temporal-Difference (TD) Learning
TD Learning Methods
TD Prediction and its advantages
Optimality of TD(0)
Sarsa: On-policy TD Control
Q-learning: Off-policy TD Control
Expected Sarsa
Applications and Problems Solved Using TD Methods
Q-learning solutions for Taxi and Frozen Lake problems
Sarsa implementation for Taxi problem
Advanced Topics
Maximization Bias and Double Learning
Games and Afterstates
Deep Q-learning
M5: Function Approximation in Reinforcement Learning
On-policy Prediction with Approximation
Value-function Approximation
Prediction Objective (Mean Squared Value Error, VE)
Stochastic-gradient and Semi-gradient Methods
Linear Methods
Feature Construction for Linear Methods
Polynomials
Fourier Basis
Coarse Coding and Tile Coding
Radial Basis Functions (RBFs)
Step-size Parameter Selection
Nonlinear Function Approximation
Artificial Neural Networks
Least-Squares TD
Memory-based and Kernel-based Function Approximation
Advanced On-policy Learning Topics
Interest and Emphasis
Content for this page has not yet been updated. To collaborate, submit resources using the form attached below on this page.
Notes
MidTerm
EndSem
Question Directory
Assignment Questions
Previous Year Questions
Mid-Sem-PYQ
End-Sem-PYQ
External Sources
