CSA501 / AI & ML

Syllabus

Resources

M1: Introduction to Reinforcement Learning
  • Overview of Reinforcement Learning (RL)

    • Definition and RL algorithms

    • Differences between RL and other ML paradigms

  • Elements of RL

    • Agent, Policy function, Value function, Model

    • Agent–environment interface

  • Types of RL environments

    • Deterministic vs. Stochastic

    • Fully observable vs. Partially observable

    • Discrete vs. Continuous

    • Episodic vs. Non-episodic

    • Single-agent vs. Multi-agent

  • RL platforms and frameworks

    • OpenAI Gym and Universe

    • DeepMind Lab

    • RL-Glue

    • Project Malmo

    • ViZDoom

  • Applications of RL

    • Education, Medicine, Healthcare

    • Manufacturing and Inventory Management

    • Finance

    • Natural Language Processing and Computer Vision

M2: Bandits and Markov Decision Processes
  • Multi-armed Bandits

    • The k-armed Bandit Problem

    • Action-value Methods

    • Incremental Implementation

    • Tracking Nonstationary Problems

    • Optimistic Initial Values

    • Upper-Confidence-Bound (UCB) Action Selection

    • Gradient Bandit Algorithms

    • Associative Search (Contextual Bandits)

  • Finite Markov Decision Processes (MDPs)

    • Agent–environment interface

    • Goals, Rewards, and Returns

    • Policies and Value Functions

    • Episodic vs. Continuing Tasks

    • Optimal Policies and Value Functions

    • Bellman Equation and Optimality

    • Deriving and Solving the Bellman Equation for Value and Q-functions
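A minimal sketch of the ε-greedy action-value method with the incremental sample-average update Q_{n+1} = Q_n + (1/n)(R_n - Q_n), offered as a study aid for the Multi-armed Bandits topics above. The Gaussian reward testbed, the function name `run_bandit`, and its parameters are hypothetical choices for illustration, not part of the course material.

```python
import random

def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """epsilon-greedy k-armed bandit with incremental sample-average estimates.

    true_means: hypothetical mean reward of each arm (rewards ~ N(mean, 1)).
    Returns (Q, N): estimated action values and per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # action-value estimates
    N = [0] * k    # number of times each action has been taken
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: pick a uniformly random arm
        else:
            best = max(Q)
            # exploit: greedy arm, breaking ties at random
            a = rng.choice([i for i in range(k) if Q[i] == best])
        reward = rng.gauss(true_means[a], 1.0)
        N[a] += 1
        # Incremental update: Q_{n+1} = Q_n + (1/n) * (R_n - Q_n)
        Q[a] += (reward - Q[a]) / N[a]
    return Q, N

if __name__ == "__main__":
    Q, N = run_bandit([0.2, -0.5, 1.5, 0.8])
    print("estimates:", [round(q, 2) for q in Q])
    print("pulls:", N)
```

Replacing the division by `N[a]` with multiplication by a constant step size α turns the sample average into the exponential recency-weighted average used for tracking nonstationary problems.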

M3: Dynamic Programming and Monte Carlo Methods
  • Dynamic Programming (DP)

    • Policy Evaluation (Prediction)

    • Policy Improvement, Policy Iteration, and Value Iteration

    • Asynchronous Dynamic Programming

    • Generalized Policy Iteration

    • Efficiency of Dynamic Programming

    • Solving the Frozen Lake problem

  • Monte Carlo Methods

    • Monte Carlo Prediction (First Visit and Every Visit)

    • Blackjack with Monte Carlo

    • Monte Carlo Estimation of Action Values

    • Monte Carlo Control (with and without Exploring Starts)

    • Off-policy Prediction via Importance Sampling

    • Incremental Implementation

    • Off-policy Monte Carlo Control
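For the Dynamic Programming topics, here is a minimal value-iteration sketch on a hypothetical deterministic corridor MDP, chosen instead of Frozen Lake so the example runs without a Gym installation. The corridor layout, the rewards, γ = 0.9, and the name `value_iteration` are all assumptions for illustration. Each sweep updates V in place and stops once the largest change falls below `theta`; a greedy policy is then read off the converged values.

```python
def value_iteration(n_states=5, gamma=0.9, theta=1e-8):
    """Value iteration on a hypothetical deterministic corridor MDP.

    States 0..n_states-1; the last state is terminal. Actions: 0 = left, 1 = right.
    Moving into the terminal state yields reward +1, every other transition 0.
    Returns (V, policy), where policy[s] is the greedy action (None if terminal).
    """
    terminal = n_states - 1

    def step(s, a):
        s_next = max(0, s - 1) if a == 0 else min(terminal, s + 1)
        reward = 1.0 if s_next == terminal else 0.0
        return s_next, reward

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            if s == terminal:
                continue  # the terminal state's value stays 0
            # Bellman optimality backup: V(s) <- max_a [r + gamma * V(s')]
            best = max(r + gamma * V[s2] for s2, r in (step(s, a) for a in (0, 1)))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break

    # Extract a greedy policy from the converged values
    policy = []
    for s in range(n_states):
        if s == terminal:
            policy.append(None)
            continue
        q = [r + gamma * V[s2] for s2, r in (step(s, a) for a in (0, 1))]
        policy.append(q.index(max(q)))
    return V, policy

if __name__ == "__main__":
    V, policy = value_iteration()
    print([round(v, 3) for v in V], policy)
```

With the defaults, the non-terminal values converge to about 0.729, 0.81, 0.9, and 1.0, and the greedy policy moves right (action 1) in every non-terminal state.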

M4: Temporal-Difference (TD) Learning
  • TD Learning Methods

    • TD Prediction and its advantages

    • Optimality of TD(0)

    • Sarsa: On-policy TD Control

    • Q-learning: Off-policy TD Control

    • Expected Sarsa

  • Applications and Problems Solved Using TD Methods

    • Q-learning solutions for the Taxi and Frozen Lake problems

    • Sarsa implementation for the Taxi problem

  • Advanced Topics

    • Maximization Bias and Double Learning

    • Games and Afterstates

    • Deep Q-learning
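A minimal tabular Q-learning sketch for the TD control topics above. It uses a hypothetical deterministic corridor environment in place of Taxi or Frozen Lake, so it runs without Gym. The environment, the function name `q_learning`, and the hyperparameters are illustrative assumptions. If the update used Q(s′, a′) for the next action a′ actually chosen by the ε-greedy policy instead of the max over next actions, it would become Sarsa. Using the ε-greedy expected value of the next state would make it Expected Sarsa.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic corridor MDP.

    States 0..n_states-1, start in state 0, last state terminal.
    Actions: 0 = left, 1 = right. Reaching the terminal state gives +1, else 0.
    Returns the action-value table Q[s][a].
    """
    rng = random.Random(seed)
    terminal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def step(s, a):
        s_next = max(0, s - 1) if a == 0 else min(terminal, s + 1)
        return s_next, (1.0 if s_next == terminal else 0.0)

    def epsilon_greedy(s):
        if rng.random() < epsilon:
            return rng.randrange(2)  # explore
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])  # exploit, random tie-break

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            a = epsilon_greedy(s)  # behaviour policy: epsilon-greedy
            s_next, r = step(s, a)
            # Off-policy target: greedy (max) value of the next state.
            # The terminal state's action values are never updated and stay 0.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if s == terminal:
                break
    return Q

if __name__ == "__main__":
    Q = q_learning()
    print([[round(v, 3) for v in row] for row in Q])
```

Reading off the argmax of each row of `Q` for the non-terminal states should give "right" (action 1) everywhere under the default settings.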

M5: Function Approximation in Reinforcement Learning
  • On-policy Prediction with Approximation

    • Value-function Approximation

    • Prediction Objective (VE)

    • Stochastic-gradient and Semi-gradient Methods

    • Linear Methods

  • Feature Construction for Linear Methods

    • Polynomials

    • Fourier Basis

    • Coarse Coding and Tile Coding

    • Radial Basis Functions (RBFs)

    • Step-size Parameter Selection

  • Nonlinear Function Approximation

    • Artificial Neural Networks

    • Least-Squares TD

    • Memory-based and Kernel-based Function Approximation

  • Advanced On-policy Learning Topics

    • Interest and Emphasis
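A minimal semi-gradient TD(0) sketch with a linear value function, for the On-policy Prediction with Approximation topics above. It evaluates the uniform random policy on a hypothetical 5-state random walk, the classic textbook example whose true values are 1/6 through 5/6. The features are one-hot, so this linear method reduces to the tabular case. The Feature Construction topics come in when `features` is swapped for polynomial, Fourier, tile-coded, or RBF features. All names and parameters here are assumptions for illustration.

```python
import random

N_STATES = 5  # non-terminal states of the hypothetical random walk

def features(s):
    """One-hot feature vector x(s) for state index s (a hypothetical choice)."""
    x = [0.0] * N_STATES
    x[s] = 1.0
    return x

def v_hat(w, x):
    """Linear value estimate v(s, w) = w . x(s)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def td0_update(w, x, r, x_next, alpha, gamma):
    """One semi-gradient TD(0) step; x_next is None when s' is terminal.

    For a linear v_hat, the gradient with respect to w is just x, so
    w <- w + alpha * [r + gamma * v(s', w) - v(s, w)] * x.
    The target is treated as a constant, hence "semi"-gradient.
    """
    v_next = 0.0 if x_next is None else v_hat(w, x_next)
    delta = r + gamma * v_next - v_hat(w, x)
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]

def run_random_walk(episodes=1000, alpha=0.05, gamma=1.0, seed=0):
    """Evaluate the uniform random policy on the 5-state random walk.

    Episodes start in the middle state. Stepping off the left end gives 0,
    stepping off the right end gives +1, and every other transition gives 0.
    """
    rng = random.Random(seed)
    w = [0.0] * N_STATES
    for _ in range(episodes):
        s = N_STATES // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:  # left terminal
                w = td0_update(w, features(s), 0.0, None, alpha, gamma)
                break
            if s_next >= N_STATES:  # right terminal
                w = td0_update(w, features(s), 1.0, None, alpha, gamma)
                break
            w = td0_update(w, features(s), 0.0, features(s_next), alpha, gamma)
            s = s_next
    return w

if __name__ == "__main__":
    print([round(v, 2) for v in run_random_walk()])
```

With a constant step size, the estimates fluctuate around the true values rather than settling exactly on them. A smaller `alpha` reduces that noise but makes learning slower.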

Notes

MidTerm

EndSem

Question Directory

Assignment Questions

Previous Year Questions

Mid-Sem-PYQ

End-Sem-PYQ

External Sources
