Types of Reinforcement Learning Algorithms

February 15, 2023 - Lou Farrell

Revolutionized is reader-supported. When you buy through links on our site, we may earn an affiliate commission. Learn more here.

Machine learning (ML), a subfield of artificial intelligence (AI), is evolving at a rapid pace. Major companies like Netflix, Google, Microsoft and IBM are leveraging ML for various applications. ML algorithms are core components of ML models. 

Numerous algorithms fall into the three basic paradigms of ML: reinforcement, supervised and unsupervised learning. What are the specific categories of reinforcement learning (RL) algorithms, and how are they used for certain applications?

An Overview of Reinforcement Learning in ML

Reinforcement learning (RL) is an ML training method in which a software agent learns by trial and error, receiving a reward whenever it performs the desired action. A simple way to think of RL is to compare it to training a dog. Most dog training involves asking the pup to perform an action, and the owner rewards it with a treat if it does so successfully.

If the dog does not respond to the task correctly, it does not receive a treat. Once this happens, the dog can understand that it must perform the action in order to get a treat. This process is similar to how reinforcement learning works in the field of ML.

Users train ML models to perform tasks, rewarding the machine if it performs the task correctly. RL builds upon the idea of positive reinforcement, a well-known concept in psychology.

What are Reinforcement Learning Algorithms?

RL algorithms drive the “agents” in ML models, which are comparable to the dog in the analogy outlined above. They are the key components of the ML model that decide which actions to pursue.

Advanced RL algorithms are widely used in applications like:

  • Chess
  • Checkers
  • Go (AlphaGo)
  • Robotic control
  • Backgammon

The algorithms interact with the environment, which contains all of the data points needed to make decisions, forming what is called an agent-environment interaction loop. This loop supports the “reinforcement” idea in the field of RL. 

The goal of an RL algorithm is to make the right decisions to minimize punishment and maximize reward. Over time, RL algorithms tend to become better at making the right decisions, creating an effective ML model.
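
The agent-environment interaction loop described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Corridor` environment, its reward values and the `run_episode` helper are hypothetical and do not come from any particular library.

```python
import random

class Corridor:
    """Hypothetical toy environment: positions 0..4, reward for reaching 4."""
    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); walls clamp the position
        self.pos = min(4, max(0, self.pos + action))
        done = self.pos == 4
        reward = 1.0 if done else -0.1   # small punishment for every extra step
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """One pass through the loop: the agent acts, the environment responds."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent decides
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward

rng = random.Random(0)
random_return = run_episode(Corridor(), lambda s: rng.choice([-1, 1]))
always_right_return = run_episode(Corridor(), lambda s: 1)
```

An agent that wanders at random can never collect more reward here than one that always walks toward the goal, which is the gap a learning algorithm tries to close.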

Characteristics of RL Algorithms

There are a few key characteristics all RL algorithms share. For example:

  • There is no supervisor monitoring the learning process, only a reward signal expressed as a real number.
  • Decisions are made in sequential order.
  • Time matters in RL problems, because the data arrives as a sequence in which each step depends on the ones before it.
  • There’s a delay in feedback, as it is not instantaneous.
  • The agent’s actions determine the subsequent data it receives from its environment.
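
One common way to handle delayed feedback is to score a whole sequence of rewards at once, giving later rewards slightly less weight. The sketch below assumes a discount factor, called `gamma` here, which the list above does not mention. The function name is hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma for every step of delay."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# The same reward is worth less when it arrives two steps later.
delayed = discounted_return([0.0, 0.0, 1.0], gamma=0.9)
immediate = discounted_return([1.0, 0.0, 0.0], gamma=0.9)
```

With these assumed numbers, the delayed reward is worth roughly 0.81 while the immediate one is worth 1.0, so the agent has a reason to reach rewards sooner.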

Much of the RL field is still evolving, as it’s only in its infancy. With more time and research, RL algorithms could become even more advanced.

Two Categories of RL Algorithms

RL algorithms have several subcategories, as these algorithms are highly advanced and serve different purposes. However, RL algorithms typically fall into two broad categories: model-free or model-based. 

Model-Free Learning

In model-free learning, the algorithm relies solely on its own trial-and-error experience. For example, suppose an ML expert is training a robot to wave hello. The robot would receive a reward for a successful wave, or a punishment for making a thumbs-down gesture instead.

Over time and with repetition, the robot would eventually learn to wave hello consistently to maximize its reward.
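
The waving robot can be approximated with a tiny trial-and-error learner. This is a minimal sketch, assuming a reward of +1 for a wave and -1 for a thumbs down, plus an epsilon-greedy rule for occasionally trying the other gesture. None of these details come from a real robot API.

```python
import random

ACTIONS = ["wave", "thumbs_down"]

def reward_for(action):
    # Assumed reward scheme: +1 for a wave, -1 for a thumbs down.
    return 1.0 if action == "wave" else -1.0

def train(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}   # running average reward per action
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)            # explore
        else:
            action = max(ACTIONS, key=value.get)    # exploit the best estimate
        r = reward_for(action)
        count[action] += 1
        value[action] += (r - value[action]) / count[action]  # incremental mean
    return value, count

value, count = train()
```

The learner never builds a picture of how the world works. It only keeps a running score for each gesture and ends up waving most of the time.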

Model-Based Learning

In this type of learning, the algorithm models the environment and makes the optimal choice based on that learned model. One upside to this algorithm classification is that the algorithm can plan to make decisions in the future based on the learned model. 

In model-free learning, it is unable to do so. The one downside of model-based learning is that it can only learn the model of the environment from experience, which takes time and is not always a linear journey.
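
To make the contrast concrete, here is a rough sketch of the model-based idea: first record what the environment does, then plan using only that record. The five-position corridor, the function names and the use of value iteration as the planner are all assumptions chosen for illustration.

```python
import random

N_STATES, GOAL, ACTIONS = 5, 4, (-1, 1)

def true_step(state, action):
    """The real environment, which the planner never calls directly."""
    nxt = min(GOAL, max(0, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def learn_model(samples=500, seed=0):
    """Record (state, action) -> (next_state, reward) from random experience."""
    rng = random.Random(seed)
    model = {}
    for _ in range(samples):
        s, a = rng.randrange(GOAL), rng.choice(ACTIONS)
        model[(s, a)] = true_step(s, a)   # deterministic world: one sample is enough
    return model

def plan(model, gamma=0.9, sweeps=50):
    """Value iteration over the learned model; assumes every pair was sampled."""
    def q(s, a, V):
        nxt, r = model[(s, a)]
        return r + gamma * V[nxt]

    V = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(GOAL):
            V[s] = max(q(s, a, V) for a in ACTIONS)
    policy = {s: max(ACTIONS, key=lambda a: q(s, a, V)) for s in range(GOAL)}
    return V, policy

V, policy = plan(learn_model())
```

Once the model is learned, the planning step needs no further interaction with the environment, which is the upside described above. The cost is the experience needed to fill in the model first.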

Types of Model-Free and Model-Based Learning Algorithms

OpenAI maintains a comprehensive online resource, Spinning Up in Deep RL, that outlines the different model-free and model-based learning algorithms. These families of algorithms form the backbone of modern reinforcement learning.

Below are the different algorithms used in RL and how they work. Let’s start with the model-free algorithms.

Policy Optimization

There are four main types of model-free policy optimization algorithms:

  • Policy Gradient
  • A2C/A3C
  • PPO
  • TRPO

In policy optimization, the agent learns a policy directly, adjusting it to increase the reward it expects to collect. A policy maps states to actions and can be deterministic or stochastic. The former specifies a single action for each state with certainty, and the latter outputs a probability distribution over several actions.
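
The difference between the two kinds of policy is easy to see in code. The sketch below uses made-up states, action names and probabilities purely for illustration.

```python
import random

def deterministic_policy(state):
    """Maps each state to exactly one action."""
    return "right" if state < 4 else "stay"

def stochastic_policy(state, rng):
    """Samples an action from a state-dependent probability distribution."""
    probs = {"left": 0.2, "right": 0.8} if state < 4 else {"stay": 1.0}
    actions, weights = zip(*probs.items())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
det_actions = {deterministic_policy(2) for _ in range(100)}
sto_actions = [stochastic_policy(2, rng) for _ in range(1000)]
right_share = sto_actions.count("right") / len(sto_actions)
```

The deterministic policy returns the same action every time it sees state 2, while the stochastic one chooses "right" in roughly 80% of its samples. Policy optimization methods typically adjust probabilities like these rather than hard-coding them.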


Q-Learning

There are also four types of model-free Q-learning algorithms:

  • DQN
  • C51
  • QR-DQN
  • HER
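
The algorithms above all build on the basic Q-learning update, which scores each state-action pair and nudges that score toward the reward plus the best score available in the next state. Here is a minimal tabular sketch on a hypothetical five-position corridor. The learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are assumed values, and the deep variants listed above replace the table with a neural network.

```python
import random

N_STATES, GOAL, ACTIONS = 5, 4, (-1, 1)

def step(state, action):
    nxt = min(GOAL, max(0, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)             # explore
            else:
                # break ties randomly so early episodes still move around
                best = max(Q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if Q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(Q[(nxt, a)] for a in ACTIONS)
            # move the estimate part of the way toward the observed target
            Q[(state, action)] += alpha * (reward + future - Q[(state, action)])
            state = nxt
    return Q

Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

After training, reading off the highest-scoring action in each state gives a policy that always heads toward the goal.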

Some model-free RL algorithms fall into both the policy optimization and Q-learning categories, including:

  • DDPG
  • TD3
  • SAC

Now, onto the model-based learning algorithms.

When the Agent Learns the Model

Unlike model-free methods, model-based learning algorithms do not fall into a small number of easy-to-define clusters. Here are a few examples, but this is by no means an exhaustive list:

  • I2A
  • World Models
  • MBMF
  • MBVE

Each algorithm, whether it’s model-free or model-based, helps the agent (RL model) learn during the training process. 

When the Agent is Given the Model

OpenAI’s taxonomy lists one well-known algorithm in this category: AlphaZero. Its predecessor, AlphaGo, drew worldwide attention when it defeated top professional Go players, and AlphaZero later mastered Go, chess and shogi through self-play, starting from nothing but the rules. This highlights just how powerful these algorithms are, and we’re only just beginning to see their impact.
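
When the rules are handed to the agent, it can search through possible futures without touching a real opponent. The sketch below does this for a tiny stone-taking game with assumed rules (take one to three stones, whoever takes the last stone wins). It is an exhaustive search, far simpler than AlphaZero, which pairs tree search with a neural network, and the function names are hypothetical.

```python
from functools import lru_cache

MOVES = (1, 2, 3)   # assumed rules: take 1-3 stones, taking the last stone wins

@lru_cache(maxsize=None)
def current_player_wins(stones):
    """True if the player to move can force a win under the known rules."""
    return any(take <= stones and not current_player_wins(stones - take)
               for take in MOVES)

def best_move(stones):
    """Return a winning move if one exists, else take a single stone."""
    for take in MOVES:
        if take <= stones and not current_player_wins(stones - take):
            return take
    return 1

move_from_seven = best_move(7)
```

Because the model of the game is given rather than learned, every position can be evaluated by lookahead alone. Games like Go are far too large for this brute-force approach, which is why learned guidance is needed there.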

In addition to these Go victories, a deep RL agent once beat expert human players at Stratego, a highly complex game.

The Future of RL and RL Algorithms

While current RL algorithms are best known for AI-powered games, they will likely become more widely used by industries around the world. Reinforcement learning is only a small piece of the advanced AI and ML puzzle, but it already helps solve many complex problems.

Ultimately, RL algorithms used for trivial applications are laying the groundwork for more useful, meaningful real-world applications in the future. It’ll be interesting to see how reinforcement learning changes as time goes on and what role it plays in the AI landscape.
