How humans and other animals decide to allocate time and effort across competing priorities has fascinated researchers for decades. Psychologists have found that most animals allocate their time among options in proportion to rewards received from the options, adjusting their behavior in response to reward feedback—a behavioral law known as matching.
Now a Dartmouth-led research team has developed metrics to predict matching behavior when the chance of receiving a reward is unpredictable to the individual making the choice. The results are published in Nature Communications.
“Matching is fundamental to how we choose between options available to us,” says Ethan Trepka ’22, who is one of two first authors on the paper. Trepka, a computer science major and neuroscience minor who is a senior undergraduate researcher at the Computational and Cognitive Neuroscience Lab, says matching “governs things like which checkout line we choose at the grocery store or how much time we spend on different projects for school or work. How much time one chooses to spend on a given option depends on how frequently a reward is received from that option relative to other options.”
In collaboration with other researchers, the Dartmouth team re-analyzed behavioral data from past experiments with mice and monkeys. In the experiments, conducted at Johns Hopkins University and the National Institutes of Health, mice and monkeys chose between two options and received rewards (water for mice and drops of apple juice for monkeys) based on their choices. The option that offered the better probability of reward could change unpredictably, so the animals had to keep track of previous rewards as they made their choices. The results showed that both mice and monkeys exhibited undermatching, a general tendency to select the better option less often than the matching law would predict.
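To make the idea concrete, the matching law can be read as a comparison of two fractions: how often an option is chosen, and what share of all earned rewards came from it. The short Python sketch below computes both for a made-up session and reports their difference. The function names, the toy data, and the choice of a simple difference as the deviation measure are illustrative assumptions, not the analysis used in the study.

```python
def choice_fraction(choices, option):
    """Fraction of trials on which `option` was chosen."""
    return sum(c == option for c in choices) / len(choices)


def reward_fraction(choices, rewards, option):
    """Fraction of all earned rewards that came from `option`."""
    total = sum(rewards)
    from_option = sum(r for c, r in zip(choices, rewards) if c == option)
    return from_option / total


def matching_deviation(choices, rewards, better_option):
    """Choice fraction minus reward fraction for the better option.

    Zero corresponds to perfect matching; a negative value means the
    better option was chosen less often than matching would predict
    (undermatching). This simple difference is an assumed measure.
    """
    return (choice_fraction(choices, better_option)
            - reward_fraction(choices, rewards, better_option))


# Hypothetical toy session: option "A" is the better option.
choices = ["A", "A", "B", "A", "B", "A", "B", "A", "A", "B"]
rewards = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

deviation = matching_deviation(choices, rewards, "A")
print(f"choice fraction for A: {choice_fraction(choices, 'A'):.2f}")
print(f"reward fraction for A: {reward_fraction(choices, rewards, 'A'):.2f}")
print(f"deviation from matching: {deviation:.2f}")
```

In this toy session, option A is chosen on 6 of 10 trials but yields 5 of the 6 rewards, so the deviation is negative, which is the pattern the article calls undermatching.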
To predict this deviation from the matching law, the Dartmouth researchers developed a new set of metrics that measure inconsistency in an animal’s tendency to stay with, or switch from, its current option according to the reward outcome. The metrics are based on the concept of entropy in information theory, a mathematical framework that can be used to quantify the amount of uncertainty in a system. The metrics also provide a new way to quantify adaptive behavior and can be used to improve previous computational models of learning and decision-making.
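One way to picture an entropy-based measure of this kind is as a conditional entropy: given whether the previous choice was rewarded, how uncertain is it whether the animal will stay or switch? The Python sketch below is a minimal illustration under that assumption. The function name and the toy sequences are hypothetical, and the metrics defined in the published paper may differ in their details.

```python
import math
from collections import Counter


def stay_switch_entropy(choices, rewards):
    """Conditional entropy, in bits, of stay/switch given the previous reward.

    0 bits means the response to reward is fully consistent (for example,
    always stay after a reward and always switch after no reward); higher
    values mean a less consistent response. Illustrative sketch only.
    """
    # Pair each trial's strategy with the outcome of the preceding trial.
    pairs = [
        ("stay" if choices[t] == choices[t - 1] else "switch", rewards[t - 1])
        for t in range(1, len(choices))
    ]
    n = len(pairs)
    joint_counts = Counter(pairs)
    reward_counts = Counter(r for _, r in pairs)

    entropy = 0.0
    for (_, prev_reward), count in joint_counts.items():
        p_joint = count / n                        # P(strategy, reward)
        p_cond = count / reward_counts[prev_reward]  # P(strategy | reward)
        entropy -= p_joint * math.log2(p_cond)
    return entropy


# Hypothetical sequences: a strict win-stay/lose-switch pattern ...
consistent = stay_switch_entropy(["A", "A", "B", "B", "A"], [1, 0, 1, 0, 1])
# ... and one where staying and switching are equally likely after reward.
inconsistent = stay_switch_entropy(["A", "A", "B", "B", "A"], [1, 1, 1, 1, 1])
print(consistent, inconsistent)
```

In the first sequence every reward is followed by staying and every non-reward by switching, so the entropy is zero. In the second, rewarded trials are followed by staying and switching equally often, giving the maximum of one bit for a two-way choice.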
“When we are faced with different options, we use the outcomes of our previous choices to make future decisions, and this should make us choose the better or more rewarding option most of the time,” says senior author Alireza Soltani, an associate professor of psychological and brain sciences and principal investigator of the Computational and Cognitive Neuroscience Lab. “However, we do not always choose the better option as often as we should and end up undermatching, which can be undesirable as this behavior can reduce the total reward that can be received. On the other hand, exploring inferior options could be crucial for survival, although this may result in missing some rewards.”