Reward Shaping
This document points out variables and functions that are helpful for reward shaping.
A peculiarity of multi-agent environments is that agents are rewarded
later than they act: an agent receives the reward for its action only
together with the observation for its next action. This means that we
cannot use present information but need to work with past state. This
is why several of the variables below are prefixed with prev, for
‘previous’, referring to the state when the agent was last active.
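The delay can be illustrated with a small, self-contained Python sketch. It assumes a simple reward of minus the penalty points gained since the agent last acted; the class DelayedRewardTracker and its methods are hypothetical and not part of hearts_gym. The prev_penalties list plays the role of the prev variables: it snapshots each agent's state at its last turn so that the reward can be computed once the agent is asked to act again.

```python
class DelayedRewardTracker:
    """Hypothetical illustration of rewards computed from 'previous' state."""

    def __init__(self, num_players: int) -> None:
        # Current penalty scores, updated as tricks are resolved.
        self.penalties = [0] * num_players
        # Snapshot of each player's penalty at the time it last acted.
        self.prev_penalties = [0] * num_players

    def record_penalty(self, player_index: int, points: int) -> None:
        # Called when a trick resolves, possibly while the player is inactive.
        self.penalties[player_index] += points

    def reward_for(self, player_index: int) -> int:
        # Called when player_index is about to act again: the reward covers
        # everything that happened since its previous action.
        delta = self.penalties[player_index] - self.prev_penalties[player_index]
        self.prev_penalties[player_index] = self.penalties[player_index]
        return -delta
```

For example, after record_penalty(0, 3), the first reward_for(0) call returns -3 and a second call returns 0, because the snapshot has been updated in between.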
Useful Variables and Functions available to RewardFunction
The following is a list of attributes accessible to the
RewardFunction that help with reward shaping. In the list,
self refers to the RewardFunction instance.
Please see the respective docstrings for a detailed description of each of these. Docstrings for variables are the triple-quoted strings directly below them.
Find attributes beginning with self.env in the file hearts_gym/envs/hearts_env.py under HeartsEnv.
Find attributes beginning with self.game in the file hearts_gym/envs/hearts_game.py under HeartsGame.
| Attribute | Summary |
|---|---|
| | Number of players. |
| | Number of cards in the deck. |
| | Cards in hands; use player index for retrieval. |
| | Cards actively played; use player index for retrieval. |
| | Cards on the table. |
| | All cards collected; use player index for retrieval. |
| | All cards collected after the action; use player index for retrieval. |
| | Penalty scores; use player index for retrieval. |
| | Whether actions were illegal; use player index for retrieval. |
| | Card state vector; use player index for retrieval. |
| | Whether it is the first trick of the game. |
| | Whether leading with hearts is allowed; use player index for retrieval. |
| | Leading suit. |
| | Index of the player that led the trick. |
| | Index of the player that won the trick. |
| | Trick penalty. |
| | Return the penalty score of a given card. |
| | Return whether the given card has a penalty score greater than zero. |
| | Return whether the given player shot the moon. |
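As an illustration of how such quantities might combine, here is a minimal Python sketch of a shaped reward. It assumes a hypothetical (suit, rank) card encoding and standard Hearts scoring (1 point per heart, 13 for the queen of spades). The names card_penalty, has_penalty, trick_penalty, PrevTrickInfo and shaped_reward are hypothetical stand-ins for the corresponding table entries, not the hearts_gym API, which may encode cards and scores differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical card encoding: suit in "C", "D", "H", "S"; rank 2..14, queen = 12.
Card = Tuple[str, int]


def card_penalty(card: Card) -> int:
    """Standard Hearts scoring: 1 per heart, 13 for the queen of spades."""
    suit, rank = card
    if suit == "H":
        return 1
    if suit == "S" and rank == 12:
        return 13
    return 0


def has_penalty(card: Card) -> bool:
    """Whether the card has a penalty score greater than zero."""
    return card_penalty(card) > 0


def trick_penalty(cards: List[Card]) -> int:
    """Total penalty of the cards in a trick."""
    return sum(card_penalty(card) for card in cards)


@dataclass
class PrevTrickInfo:
    """Hypothetical snapshot of 'prev' quantities for the last resolved trick."""
    trick_winner_index: int
    trick_penalty: int
    was_illegal: List[bool]


def shaped_reward(player_index: int, info: PrevTrickInfo,
                  illegal_penalty: float = -100.0) -> float:
    # Punish an illegal action with a large constant.
    if info.was_illegal[player_index]:
        return illegal_penalty
    # Only the trick winner collects the trick's penalty points.
    if info.trick_winner_index == player_index:
        return -float(info.trick_penalty)
    return 0.0
```

Charging only the trick winner keeps the signal tied to the player's own outcome; a bonus for shooting the moon could be added in the same way.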