Rule-based Policies

This document goes more in-depth into implementing rule-based agents and what to look out for when doing so.

Implementing the Default Rule-based Policy

As no pre-existing rule-based agent is provided, the default one has to be implemented in the file hearts_gym/policies/rule_based_policy_impl.py before it is able to be used. Simply implement the compute_action method to return a single action and the agent will work. It is available under the policy ID RULEBASED_POLICY_ID in configuration.py. This is also the rule-based policy referred to by any policy_mapping_fn with rulebased in the name.

Observed Games

Any rule-based policy has access to an ObservedGame under the game member variable. The observed game provides several utility functions that may be useful to implement the policy.

For each observation, the observed game recreates the game state as viewed by the observing player (the agent being implemented). Due to only having limited knowledge of the game and working with the provided observations, some variables have to be treated with care. For example, the specially labeled variables offset_collected and offset_penalties are not ordered by player indices but instead by index offsets (see index offsets in docs/environment.md for an explanation of these). The cards on the table, available under table_cards, are simply ordered by time of placement.

Other differences between a standard HeartsGame and an ObservedGame include different and fewer variables and functionalities. For example, as an observed game only has access to the information one player has, there is only one hand but a list of unknown_cards whose location is not known to the observing player.

Implementing Other Rule-based Policies

Any number of rule-based agents may exist in parallel; simply create a new deterministic policy implementation by following these steps:

  1. Subclass hearts_gym.policies.deterministic_policy_impl.DeterministicPolicyImpl.

  2. Implement a compute_action method with the same signature as given by the superclass.

  3. Add the newly implemented class to the custom_rulebased_policies dictionary in configuration.py, for example like this:

    # Note we are adding the class, not an instance of it.
    custom_rulebased_policies: Dict[str, type] = {
        'my_new_policy': MyNewPolicyImpl,
    }
    
  4. Create a new policy_mapping_fn that includes the new policy ID by mapping a player index (the agent ID) to it.