# CSA501 / AI & ML

## Syllabus

{% file src="https://3148391480-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FoWho7cxjZIbvsuDwIAzB%2Fuploads%2Fc3ROvYo50Ihw6zGvVsyU%2FY3S5-CSA501%2B521-SYLLABUS-BTECH-CSE-IT.docx.pdf?alt=media&token=5dcea8b1-1bc4-498e-a060-2e699891da20" %}

## Resources

<details>

<summary>M1: Introduction to Reinforcement Learning</summary>

* Overview of Reinforcement Learning (RL)
  * Definition and RL algorithms
  * Differences between RL and other ML paradigms
* Elements of RL
  * Agent, Policy function, Value function, Model
  * Agent–environment interface
* Types of RL environments
  * Deterministic vs. Stochastic
  * Fully observable vs. Partially observable
  * Discrete vs. Continuous
  * Episodic vs. Non-episodic
  * Single-agent vs. Multi-agent
* RL platforms and frameworks
  * OpenAI Gym and Universe
  * DeepMind Lab
  * RL-Glue
  * Project Malmo
  * ViZDoom
* Applications of RL
  * Education, Medicine, Healthcare
  * Manufacturing and Inventory Management
  * Finance
  * Natural Language Processing and Computer Vision

</details>
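The agent–environment interface listed above can be sketched as a minimal interaction loop. The `Corridor` environment below is hypothetical (not an OpenAI Gym environment), but it follows the same reset/step shape Gym uses: the agent observes a state, picks an action, and receives a reward and next state until the episode ends.

```python
# A minimal, hypothetical environment with a Gym-style interface.
# The 5-cell corridor and its rewards are illustrative only.
class Corridor:
    """Agent starts at cell 0; reaching cell 4 ends the episode."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = left, 1 = right
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done

# The agent-environment loop: observe state, choose action,
# receive reward and next state, repeat until termination.
def run_episode(env, policy):
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total

always_right = lambda s: 1
print(run_episode(Corridor(), always_right))  # 1.0
```

The same loop works unchanged against any environment exposing `reset()` and `step()`, which is why the interface is worth learning before any specific algorithm.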

<details>

<summary>M2: Bandits and Markov Decision Processes</summary>

* Multi-armed Bandits
  * The k-armed Bandit Problem
  * Action-value Methods
  * Incremental Implementation
  * Tracking Nonstationary Problems
  * Optimistic Initial Values
  * Upper-Confidence-Bound (UCB) Action Selection
  * Gradient Bandit Algorithms
  * Associative Search (Contextual Bandits)
* Finite Markov Decision Processes (MDPs)
  * Agent–environment interface
  * Goals, Rewards, and Returns
  * Policies and Value Functions
  * Episodic vs. Continuing Tasks
  * Optimal Policies and Value Functions
  * Bellman Equation and Optimality
  * Deriving and Solving the Bellman Equation for Value and Q-functions

</details>
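The action-value method and incremental implementation above can be combined into a short epsilon-greedy sketch for the k-armed bandit. It uses the incremental sample-average update Q(a) ← Q(a) + (1/N(a))·(R − Q(a)); the arm count, epsilon, and reward sequence are illustrative.

```python
import random

# Epsilon-greedy action-value method for the k-armed bandit,
# with the incremental sample-average update (no reward list kept).
class EpsilonGreedyBandit:
    def __init__(self, k, epsilon=0.1, seed=0):
        self.k = k
        self.epsilon = epsilon
        self.Q = [0.0] * k        # action-value estimates
        self.N = [0] * k          # pull counts per arm
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.k)               # explore
        return max(range(self.k), key=lambda a: self.Q[a])  # exploit

    def update(self, action, reward):
        self.N[action] += 1
        # incremental form: Q_{n+1} = Q_n + (1/n) * (R_n - Q_n)
        self.Q[action] += (reward - self.Q[action]) / self.N[action]

bandit = EpsilonGreedyBandit(k=3)
for r in [1.0, 0.0, 1.0, 1.0]:    # four sample rewards for arm 0
    bandit.update(0, r)
print(bandit.Q[0])  # 0.75 — exactly the sample mean
```

Replacing `1 / self.N[action]` with a constant step size gives the tracking variant for nonstationary problems from the outline.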

<details>

<summary>M3: Dynamic Programming and Monte Carlo Methods</summary>

* Dynamic Programming (DP)
  * Policy Evaluation (Prediction)
  * Policy Improvement, Policy Iteration, and Value Iteration
  * Asynchronous Dynamic Programming
  * Generalized Policy Iteration
  * Efficiency of Dynamic Programming
  * Solving the Frozen Lake problem
* Monte Carlo Methods
  * Monte Carlo Prediction (First Visit and Every Visit)
  * Blackjack with Monte Carlo
  * Monte Carlo Estimation of Action Values
  * Monte Carlo Control (with and without Exploring Starts)
  * Off-policy Prediction via Importance Sampling
  * Incremental Implementation
  * Off-policy Monte Carlo Control

</details>
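Value iteration from the outline can be shown on a hypothetical 3-state chain MDP (much smaller than Frozen Lake): states 0 → 1 → 2 → goal, reward 1 on entering the goal, γ = 0.9. Each sweep applies the Bellman optimality backup until the value changes fall below a threshold.

```python
# Value iteration on an illustrative deterministic 3-state chain MDP.
GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]

def step(s, a):
    """Deterministic model: (next_state, reward); None marks the goal."""
    if a == "right":
        return (None, 1.0) if s == 2 else (s + 1, 0.0)
    return (max(0, s - 1), 0.0)

def value_iteration(theta=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            backups = []
            for a in ACTIONS:
                s2, r = step(s, a)
                backups.append(r + GAMMA * (0.0 if s2 is None else V[s2]))
            best = max(backups)            # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

V = value_iteration()
print(round(V[0], 4), round(V[1], 4), round(V[2], 4))  # 0.81 0.9 1.0
```

The result matches the hand derivation: V*(2) = 1, V*(1) = γ·1 = 0.9, V*(0) = γ² = 0.81, which is a useful sanity check before moving to stochastic environments.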

<details>

<summary>M4: Temporal-Difference (TD) Learning</summary>

* TD Learning Methods
  * TD Prediction and its advantages
  * Optimality of TD(0)
  * Sarsa: On-policy TD Control
  * Q-learning: Off-policy TD Control
  * Expected Sarsa
* Applications and Problems Solved Using TD Methods
  * Q-learning solutions for Taxi and Frozen Lake problems
  * Sarsa implementation for Taxi problem
* Advanced Topics
  * Maximization Bias and Double Learning
  * Games and Afterstates
  * Deep Q-learning

</details>
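The Q-learning update from the outline can be sketched on a tiny corridor rather than the Gym Taxi or Frozen Lake setups; the environment, step size, and episode count are illustrative. The key line is the off-policy target `r + γ·max_a Q(s', a)`, which bootstraps from the greedy action regardless of the action actually taken.

```python
import random

# Tabular Q-learning (off-policy TD control) on an illustrative
# 5-cell corridor; hyperparameters chosen for quick convergence.
N_STATES, GOAL = 5, 4            # cells 0..4, goal at the right end
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
rng = random.Random(0)

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[s][a], a: 0=left, 1=right

def env_step(s, a):
    s2 = min(GOAL, max(0, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(500):             # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        a = rng.randrange(2) if rng.random() < EPS else (0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = env_step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])   # off-policy max
        Q[s][a] += ALPHA * (target - Q[s][a])            # TD update
        s = s2

greedy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)]
print(greedy)  # [1, 1, 1, 1] — learned to always move right
```

Swapping the `max(Q[s2])` target for the value of the action the policy actually selects next turns this into Sarsa, the on-policy control method from the same module.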

<details>

<summary>M5: Function Approximation in Reinforcement Learning</summary>

* On-policy Prediction with Approximation
  * Value-function Approximation
  * Prediction Objective (VE)
  * Stochastic-gradient and Semi-gradient Methods
  * Linear Methods
* Feature Construction for Linear Methods
  * Polynomials
  * Fourier Basis
  * Coarse Coding and Tile Coding
  * Radial Basis Functions (RBFs)
  * Step-size Parameter Selection
* Nonlinear Function Approximation
  * Artificial Neural Networks
  * Least-Squares TD
  * Memory-based and Kernel-based Function Approximation
* Advanced On-policy Learning Topics
  * Interest and Emphasis

</details>
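Semi-gradient TD(0) with a linear value function v(s, w) = w·x(s) can be sketched on an illustrative chain MDP under a fixed policy. The one-hot features below are the simplest possible "coding" (with one-hot features the method reduces to tabular TD(0)); the step size and episode count are assumptions for the demo.

```python
# Semi-gradient TD(0) for on-policy prediction with a linear
# approximator on an illustrative 3-state chain (policy: always right).
GAMMA, ALPHA = 0.9, 0.1
N = 3                             # states 0, 1, 2; state 2 -> terminal

def features(s):
    """One-hot feature vector x(s) — the simplest linear coding."""
    x = [0.0] * N
    x[s] = 1.0
    return x

def v(s, w):
    return sum(wi * xi for wi, xi in zip(w, features(s)))

w = [0.0] * N
for _ in range(2000):             # episodes under the fixed policy
    s = 0
    while True:
        s2 = s + 1
        done = s2 == N
        r = 1.0 if done else 0.0
        target = r + (0.0 if done else GAMMA * v(s2, w))
        delta = target - v(s, w)                 # TD error
        x = features(s)
        # semi-gradient step: w += alpha * delta * grad_w v(s, w) = x(s)
        w = [wi + ALPHA * delta * xi for wi, xi in zip(w, x)]
        if done:
            break
        s = s2

print([round(wi, 2) for wi in w])  # approx [0.81, 0.9, 1.0]
```

Replacing `features` with tile coding or a Fourier basis changes only the feature construction; the update rule itself is unchanged, which is the point of the linear-methods framing in this module.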

{% hint style="danger" %}
Content for this page is yet to be updated. If you want to collaborate, submit resources via the form attached below.
{% endhint %}

## Notes

### MidTerm

### EndSem


## Question Directory


### Assignment Questions


### Previous Year Questions

#### Mid-Sem-PYQ

<table><thead><tr><th width="81.90771484375">[⤓]</th><th width="554.568115234375">Content Preview</th></tr></thead><tbody><tr><td><a href="https://drive.google.com/uc?export=download&#x26;id=1E880LOY9gMLpp4BeIdLDZ8qxHyZJUlc5" class="button primary" data-icon="arrow-down-to-square"></a></td><td><a href="https://drive.google.com/file/d/1E880LOY9gMLpp4BeIdLDZ8qxHyZJUlc5/view?usp=drive_link">Y3S5-CSA501-RL-MidTerm-PYQ-OCT25</a></td></tr></tbody></table>

#### End-Sem-PYQ

<table><thead><tr><th width="81.9005126953125">[⤓]</th><th width="547.80322265625">Content Preview</th></tr></thead><tbody><tr><td><a href="https://drive.google.com/uc?export=download&#x26;id=1OgG7Kk5RNYVQB6D8P-ffIpW-V19d7lv6" class="button primary" data-icon="arrow-down-to-square"></a></td><td><a href="https://drive.google.com/file/d/1OgG7Kk5RNYVQB6D8P-ffIpW-V19d7lv6/view?usp=drive_link">Y3S5-CSA501-AIR-EndSem-PYQ-DEC23</a></td></tr><tr><td><a href="https://drive.google.com/uc?export=download&#x26;id=1DIS_CCSQC3XKOeeYDaINx1k45orOyV7k" class="button primary" data-icon="arrow-down-to-square"></a></td><td><a href="https://drive.google.com/file/d/1DIS_CCSQC3XKOeeYDaINx1k45orOyV7k/view?usp=drive_link">Y3S5-CSA501-AIR-EndSem-PYQ-DEC24</a></td></tr></tbody></table>

## External Sources


***

{% embed url="https://discord.gg/6ywR3zbNfg" %}

{% embed url="https://mantavyam.notion.site/18152f7cde8880d699a5f2e65f87374e" %}

{% embed url="https://mantavyam.notion.site/17e52f7cde8880e0987fd06d33ef6019" %}
