Criterion Filtering: A Dual Approach to
Bayesian Inference and Adaptive Control

Last Updated: 3 June 2024

Site Maintained By:
Leigh Tesfatsion
Professor Emerita of Economics
Courtesy Research Professor of
   Electrical & Computer Engineering
Heady Hall 260
Iowa State University
Ames, Iowa 50011-1054
tesfatsi AT

Table of Contents:

Research Summary

In sequential decision-making under uncertainty, can the associative mapping between decisions and expected payoffs be directly updated without prior recourse to the updating of probabilities?

As detailed in a 1982 Theory and Decision article titled "A Dual Approach to Bayesian Inference and Adaptive Control" (pdf,774KB), the short answer is "yes."

I first posed this question in my 1975 Ph.D. thesis in economics, written at the University of Minnesota under the direction of Clifford Hildreth and Leonid Hurwicz. This thesis, titled "Two Essays on Individual Choice" (pdf), was subsequently published in a shortened form as Tesfatsion (1980a). A major component of this thesis was the development of a conditional expected utility model for boundedly rational decision makers in which utility and probability were symmetrically treated. That is, utility and probability were both conditioned on the decision-maker's selected goals and actions (g,a), and both were defined over the same set of possible (g,a)-conditioned events. Given this symmetry, I wondered whether it might be both feasible and useful to derive a "Bayes' rule for utility" in parallel to the standard Bayes' rule for probability.

More precisely, many sequential decision techniques require the successive updating of a probability distribution defined over the state-of-the-world for some decision environment. For certain types of problems, however, an understanding of this probability distribution has no intrinsic value. Ultimate interest focuses on the criterion function -- e.g. expected utility -- which is indirectly updated by means of the updated probability distribution. Suppose attention is shifted from the state-of-the-world as a random event to the criterion function, itself, as a random function of the state. Could the criterion function then be estimated and updated directly, by-passing explicit probability distribution updating altogether?

An affirmative answer was provided in a series of studies Tesfatsion (1976), Tesfatsion (1977), Tesfatsion (1978a), and Tesfatsion (1978b). In these studies I showed that consistent directly updated expected utility estimates can be obtained for an interesting class of sequential decision problems by filtering over a sequence of past utility assessments. Under weak plausible restrictions, the sequence of actions maximizing the updated expected utility estimates converges to a local maximum of the true expected utility function. Moreover, suitable selection of a "prior" expected utility function at the beginning of the initial period can guarantee, in some cases, the convergence of these actions to a global maximum of the true expected utility function. I called this method criterion filtering.

Criterion filtering represents one possible approach to sequential decision-making that decreases computational complexity while retaining the essence of the Bayesian message: prior and sample information are to be combined to form updated expected utility evaluations over the set of feasible actions.

Without the inclusion of a prior expected utility function in the criterion filter, the selection of an action in accordance with criterion filtering reduces to the following simple opportunity cost maxim analogous to the maximum likelihood principle for determining parameter estimates: Select an action today that would have yielded maximum average utility had it been selected over "similar" past periods. With the inclusion of a prior expected utility function in the criterion filter, the selection of an action in accordance with criterion filtering is analogous to the Bayesian determination of a parameter estimate by maximization of a posterior probability density function.

At this point I was using dynamic programming extensively in other ongoing research. Consequently, I wondered if criterion filtering could be extended to the direct updating of dynamic programming value (cost-to-go) functions. An affirmative answer was provided in Tesfatsion (1979). Specifically, this article demonstrates the feasibility of directly updating dynamic programming value functions on the basis of sequentially obtained utility assessments, bypassing the need for explicit probability updating via Bayes' rule. The performance of the method is systematically explored for a class of adaptive control problems by means of computational experiments.

In Tesfatsion (1980b) and Tesfatsion (1984) I extended criterion filtering to sequential game frameworks with boundedly rational players.

The basic idea of criterion filtering motivated in part my development (with Robert Kalaba) of the "Flexible Least Squares (FLS)" approach to the sequential estimation of dynamic nonlinear systems with poorly understood structure. I have also used criterion filtering for the modeling of learning agents in some of my agent-based computational economics research, particularly in my research on the Trade Network Game (TNG).

It appears that criterion filtering has some interesting connections to the elegant Q-learning theory independently developed by Charles Watkins (1989, "Learning From Delayed Rewards," Ph.D. Thesis, Cambridge University) and to temporal-difference learning as elaborated by Richard S. Sutton and Andrew G. Barto (Reinforcement Learning: An Introduction, The MIT Press, Third Printing, 2000).

See Andrew G. Barto (2007), "Temporal Difference Learning", Scholarpedia (Art. #1604) for more details.