Reward Shaping

This document points out variables and functions that are helpful for reward shaping.

A peculiarity of multi-agent environments is that an agent is rewarded well after it takes its action: it receives the reward together with the observation for its next action. The reward function therefore cannot rely on present information and has to work with past state. This is why several variables below carry the prefix `prev_` (for ‘previous’), which refers to the last time the agent was active.
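
The timing can be illustrated with a toy turn loop. This is a minimal sketch, not Hearts Gym code: `ToyGame`, `act`, `resolve`, `delayed_reward`, and `prev_penalties` are hypothetical names chosen for illustration. It shows a snapshot being taken when a player acts, and the reward being computed only once that player is about to act again.

```python
class ToyGame:
    """Toy game for illustration only; none of these names are Hearts Gym's API."""

    def __init__(self, num_players=2):
        self.num_players = num_players
        self.penalties = [0] * num_players
        # Snapshot of each player's penalty, taken when that player last acted.
        self.prev_penalties = [0] * num_players

    def act(self, player_index):
        # Remember the state at the moment this player acts.
        self.prev_penalties[player_index] = self.penalties[player_index]

    def resolve(self, loser_index, points):
        # Consequences only become known after the other players have acted.
        self.penalties[loser_index] += points

    def delayed_reward(self, player_index):
        # Computed when the player is about to act again: the reward reflects
        # what changed since the player's previous action.
        return -(self.penalties[player_index] - self.prev_penalties[player_index])


game = ToyGame()
game.act(0)                            # player 0 acts; no reward is known yet
game.act(1)                            # player 1 acts in between
game.resolve(loser_index=0, points=4)  # the outcome lands on player 0
reward = game.delayed_reward(0)        # delivered with player 0's next observation
```

Here `reward` depends on the snapshot taken when player 0 last acted, which is the role the `prev_` attributes play below.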

Useful Variables and Functions available to `RewardFunction`

What follows is a list of useful attributes accessible to the `RewardFunction` that help with reward shaping. In the list, `self` refers to the `RewardFunction` instance.

Please see the respective docstrings for a detailed description of each of these. Docstrings for variables are triple-quoted strings below them.

Find attributes beginning with self.env in the file hearts_gym/envs/hearts_env.py under HeartsEnv.
Find attributes beginning with self.game in the file hearts_gym/envs/hearts_game.py under HeartsGame.

Attribute	Summary
`self.game.num_players`	Number of players.
`self.game.deck_size`	Number of cards in the deck.
`self.game.prev_hands`	Cards in hands; use player index for retrieval.
`self.game.prev_played_cards`	Cards actively played; use player index for retrieval.
`self.game.prev_table_cards`	Cards on the table.
`self.game.prev_collected`	All cards collected; use player index for retrieval.
`self.game.collected`	All cards collected after the action; use player index for retrieval.
`self.game.penalties`	Penalty scores; use player index for retrieval.
`self.game.prev_was_illegals`	Whether actions were illegal; use player index for retrieval.
`self.game.prev_states`	Card state vector; use player index for retrieval.
`self.game.prev_was_first_trick`	Whether it is the first trick of the game.
`self.game.prev_leading_hearts_allowed`	Whether leading with hearts is allowed; use player index for retrieval.
`self.game.prev_leading_suit`	Leading suit.
`self.game.prev_leading_player_index`	Index of the player that led the trick.
`self.game.prev_trick_winner_index`	Index of the player that won the trick.
`self.game.prev_trick_penalty`	Trick penalty.
`self.game.get_penalty`	Return the penalty score of a given card.
`self.game.has_penalty`	Return whether the given card has a penalty score greater than zero.
`self.game.has_shot_the_moon`	Return whether the given player shot the moon.
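
As a rough sketch of how a few of these attributes might be combined, the function below computes a shaped reward from `prev_was_illegals`, `prev_trick_winner_index`, and `prev_trick_penalty`. Several things here are assumptions rather than the library's API: the free function `shaped_reward` and its signature are hypothetical, the reward weights are arbitrary, the attribute types (a list of booleans, an integer index, a number) are guessed from the summaries above, and a `SimpleNamespace` stands in for the real `HeartsGame`. Check the docstrings for the actual types before adapting it.

```python
from types import SimpleNamespace


def shaped_reward(game, player_index):
    """Hypothetical shaping rule; the weights are illustrative only."""
    # Strongly discourage illegal moves (assumed: list of bools per player).
    if game.prev_was_illegals[player_index]:
        return -10.0
    # Charge the trick's penalty to the player who won it.
    if game.prev_trick_winner_index == player_index:
        return -float(game.prev_trick_penalty)
    # Small bonus for not taking the trick.
    return 1.0


# Stand-in for HeartsGame carrying only the attributes used above.
fake_game = SimpleNamespace(
    prev_was_illegals=[False, False, True, False],
    prev_trick_winner_index=1,
    prev_trick_penalty=13,
)

rewards = [shaped_reward(fake_game, i) for i in range(4)]
```

In this made-up state, player 1 won a trick worth 13 penalty points and player 2 attempted an illegal move, so both receive negative rewards while the other two receive the small bonus.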
