# CSA501 / AI & ML

## Syllabus

{% file src="" %}

## Resources

M1: Introduction to Reinforcement Learning

* Overview of Reinforcement Learning (RL)
  * Definition and RL algorithms
  * Differences between RL and other ML paradigms
* Elements of RL
  * Agent, Policy function, Value function, Model
  * Agent–environment interface
* Types of RL environments
  * Deterministic vs. Stochastic
  * Fully observable vs. Partially observable
  * Discrete vs. Continuous
  * Episodic vs. Non-episodic
  * Single-agent vs. Multi-agent
* RL platforms and frameworks
  * OpenAI Gym and Universe
  * DeepMind Lab
  * RL-Glue
  * Project Malmo
  * ViZDoom
* Applications of RL
  * Education, Medicine, Healthcare
  * Manufacturing and Inventory Management
  * Finance
  * Natural Language Processing and Computer Vision

M2: Bandits and Markov Decision Processes

* Multi-armed Bandits
  * The k-armed Bandit Problem
  * Action-value Methods
  * Incremental Implementation
  * Tracking Nonstationary Problems
  * Optimistic Initial Values
  * Upper-Confidence-Bound (UCB) Action Selection
  * Gradient Bandit Algorithms
  * Associative Search (Contextual Bandits)
* Finite Markov Decision Processes (MDPs)
  * Agent–environment interface
  * Goals, Rewards, and Returns
  * Policies and Value Functions
  * Episodic vs. Continuing Tasks
  * Optimal Policies and Value Functions
* Bellman Equation and Optimality
  * Deriving and Solving the Bellman Equation for Value and Q-functions
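As a study aid for the "Action-value Methods" and "Incremental Implementation" topics above, here is a minimal sketch of an epsilon-greedy agent on a k-armed bandit. It uses only the Python standard library. The function name `run_bandit`, the Gaussian reward model, and all parameter defaults are assumptions made for illustration; they are not taken from the course material.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, noise=1.0, seed=0):
    """Epsilon-greedy agent with incremental sample-average estimates.

    true_means: hypothetical mean reward of each arm (illustrative only).
    Returns (q, n): the value estimates and the pull counts per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated action values Q(a)
    n = [0] * k     # number of times each arm was pulled, N(a)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                     # explore: uniform random arm
        else:
            a = max(range(k), key=lambda i: q[i])    # exploit: ties go to the lowest index
        reward = rng.gauss(true_means[a], noise)     # assumed Gaussian reward
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]               # Q <- Q + (R - Q) / N
    return q, n
```

The update on the last line of the loop keeps a running mean without storing past rewards, which is the point of the incremental form. Replacing `1 / n[a]` with a constant step size would give the variant usually discussed under "Tracking Nonstationary Problems".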

M3: Dynamic Programming and Monte Carlo Methods

* Dynamic Programming (DP)
  * Policy Evaluation (Prediction)
  * Policy Improvement, Policy Iteration, and Value Iteration
  * Asynchronous Dynamic Programming
  * Generalized Policy Iteration
  * Efficiency of Dynamic Programming
  * Solving the Frozen Lake problem
* Monte Carlo Methods
  * Monte Carlo Prediction (First Visit and Every Visit)
  * Blackjack with Monte Carlo
  * Monte Carlo Estimation of Action Values
  * Monte Carlo Control (with and without Exploring Starts)
  * Off-policy Prediction via Importance Sampling
  * Incremental Implementation
  * Off-policy Monte Carlo Control
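For the "Value Iteration" topic above, the following is a small sketch on a hypothetical three-state chain rather than Frozen Lake itself, so that it runs without any RL library. The transition format `transitions[state][action] = [(prob, next_state, reward, done), ...]`, the `TOY_MDP` table, and the function names are all assumptions chosen for illustration.

```python
def q_value(outcomes, v, gamma):
    """Expected one-step return of an action, given its outcome list."""
    return sum(p * (r + (0.0 if done else gamma * v[s2]))
               for p, s2, r, done in outcomes)

def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """In-place value iteration; returns (state values, greedy policy)."""
    v = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:          # terminal state: value stays 0
                continue
            best = max(q_value(o, v, gamma) for o in actions.values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best              # Bellman optimality backup
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(actions[a], v, gamma))
              for s, actions in transitions.items() if actions}
    return v, policy

# Hypothetical chain: 0 <-> 1 -> 2 (goal). Reward 1 only on reaching state 2.
TOY_MDP = {
    0: {"left": [(1.0, 0, 0.0, False)], "right": [(1.0, 1, 0.0, False)]},
    1: {"left": [(1.0, 0, 0.0, False)], "right": [(1.0, 2, 1.0, True)]},
    2: {},  # terminal
}
```

Calling `value_iteration(TOY_MDP)` gives values of 0.9 for state 0 and 1.0 for state 1, with "right" as the greedy action in both. A stochastic environment would only need more than one `(prob, next_state, reward, done)` tuple per action.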

M4: Temporal-Difference (TD) Learning

* TD Learning Methods
  * TD Prediction and its advantages
  * Optimality of TD(0)
  * Sarsa: On-policy TD Control
  * Q-learning: Off-policy TD Control
  * Expected Sarsa
* Applications and Problems Solved Using TD Methods
  * Q-learning solutions for Taxi and Frozen Lake problems
  * Sarsa implementation for Taxi problem
* Advanced Topics
  * Maximization Bias and Double Learning
  * Games and Afterstates
  * Deep Q-learning
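To illustrate "Q-learning: Off-policy TD Control" from the list above, here is a minimal tabular sketch. It uses a hypothetical four-state chain in place of the Taxi or Frozen Lake environments, so it needs only the standard library. The environment, the `step` helper, and the hyperparameter defaults are assumptions for illustration.

```python
import random

N_STATES = 4            # hypothetical chain: states 0..3, state 3 is the goal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic chain dynamics; reward 1 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):                         # step cap per episode
            if rng.random() < epsilon:
                a = rng.randrange(2)                 # explore
            else:
                best = max(q[s])                     # exploit, random tie-break
                a = rng.choice([i for i in (LEFT, RIGHT) if q[s][i] == best])
            s2, r, done = step(s, a)
            # Off-policy target: bootstrap from the greedy value of the next state
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Sarsa differs only in the target: it bootstraps from the action the behaviour policy actually takes in `s2` instead of `max(q[s2])`, which is why it is classed as on-policy.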

M5: Function Approximation in Reinforcement Learning

* On-policy Prediction with Approximation
  * Value-function Approximation
  * Prediction Objective (VE)
  * Stochastic-gradient and Semi-gradient Methods
  * Linear Methods
  * Feature Construction for Linear Methods
    * Polynomials
    * Fourier Basis
    * Coarse Coding and Tile Coding
    * Radial Basis Functions (RBFs)
  * Step-size Parameter Selection
  * Nonlinear Function Approximation
    * Artificial Neural Networks
  * Least-Squares TD
  * Memory-based and Kernel-based Function Approximation
* Advanced On-policy Learning Topics
  * Interest and Emphasis

{% hint style="danger" %}
Content is yet to be updated for this page. To collaborate, submit resources using the form attached at the bottom of this page.
{% endhint %}

## Notes

### MidTerm

### EndSem

\[⤓]

## Question Directory

\[⤓]

### Assignment Questions

\[⤓]

### Previous Year Questions

#### Mid-Sem-PYQ

| \[⤓] | Content Preview |
| --- | --- |
|  | Y3S5-CSA501-RL-MidTerm-PYQ-OCT25 |

#### End-Sem-PYQ

| \[⤓] | Content Preview |
| --- | --- |
|  | Y3S5-CSA501-AIR-EndSem-PYQ-DEC23 |
|  | Y3S5-CSA501-AIR-EndSem-PYQ-DEC24 |

## External Sources

\[⤓]

***

{% embed url="" %}

{% embed url="" %}

{% embed url="" %}