Criterion Filtering: A Dual Approach to Bayesian Inference and Adaptive Control

Last Updated: 3 June 2024

Site Maintained By:
Leigh Tesfatsion
Professor Emerita of Economics
Courtesy Research Professor of Electrical & Computer Engineering
Heady Hall 260, Iowa State University, Ames, Iowa 50011-1054
https://www2.econ.iastate.edu/tesfatsi/
tesfatsi AT iastate.edu

Research Summary

In sequential decision-making under uncertainty, can the associative mapping between decisions and expected payoffs be directly updated without prior recourse to the updating of probabilities?

As detailed in a 1982 Theory and Decision article titled "A Dual Approach to Bayesian Inference and Adaptive Control" (pdf,774KB), the short answer is "yes."

I first posed this question in my 1975 Ph.D. thesis in economics, written at the University of Minnesota under the direction of Clifford Hildreth and Leonid Hurwicz. This thesis, titled "Two Essays on Individual Choice" (pdf), was subsequently published in a shortened form as Tesfatsion (1980a). A major component of this thesis was the development of a conditional expected utility model for boundedly rational decision makers in which utility and probability were symmetrically treated. That is, utility and probability were both conditioned on the decision-maker's selected goals and actions (g,a), and both were defined over the same set of possible (g,a)-conditioned events. Given this symmetry, I wondered whether it might be both feasible and useful to derive a "Bayes' rule for utility" in parallel to the standard Bayes' rule for probability.

More precisely, many sequential decision techniques require the successive updating of a probability distribution defined over the state of the world for some decision environment. For certain types of problems, however, knowledge of this probability distribution has no intrinsic value. Ultimate interest focuses on the criterion function -- e.g., expected utility -- which is only indirectly updated by means of the updated probability distribution. Suppose attention is shifted from the state of the world as a random event to the criterion function itself as a random function of the state. Could the criterion function then be estimated and updated directly, bypassing explicit probability distribution updating altogether?

An affirmative answer was provided in a series of studies: Tesfatsion (1976), Tesfatsion (1977), Tesfatsion (1978a), and Tesfatsion (1978b). In these studies I showed that consistent directly updated expected utility estimates can be obtained for an interesting class of sequential decision problems by filtering over a sequence of past utility assessments. Under weak, plausible restrictions, the sequence of actions maximizing the updated expected utility estimates converges to a local maximum of the true expected utility function. Moreover, suitable selection of a "prior" expected utility function at the beginning of the initial period can, in some cases, guarantee convergence of these actions to a global maximum of the true expected utility function. I called this method criterion filtering.

Criterion filtering represents one possible approach to sequential decision-making that decreases computational complexity while retaining the essence of the Bayesian message: prior and sample information are to be combined to form updated expected utility evaluations over the set of feasible actions.

Without the inclusion of a prior expected utility function in the criterion filter, the selection of an action in accordance with criterion filtering reduces to the following simple opportunity-cost maxim, analogous to the maximum likelihood principle for determining parameter estimates: Select an action today that would have yielded maximum average utility had it been selected over "similar" past periods. With the inclusion of a prior expected utility function, the selection of an action in accordance with criterion filtering is instead analogous to the Bayesian determination of a parameter estimate by maximization of a posterior probability density function.
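The maxim can be illustrated with a minimal sketch (all names and the blending rule below are hypothetical, not code from the cited articles): average the realized utilities each action produced in similar past periods, optionally blending in a prior expected utility function, and select the action with the highest updated estimate.

```python
def criterion_filter(history, actions, prior=None, prior_weight=1.0):
    """Directly update expected-utility estimates from past utility
    assessments, with no explicit probability distribution.

    history: list of (action, realized_utility) pairs from "similar" past periods
    prior:   optional map from action to prior expected utility
    """
    estimates = {}
    for a in actions:
        utilities = [u for (act, u) in history if act == a]
        if prior is not None:
            # Blend prior and sample information, analogous to forming
            # a Bayesian posterior from a prior and sample evidence.
            total = prior_weight * prior[a] + sum(utilities)
            estimates[a] = total / (prior_weight + len(utilities))
        elif utilities:
            # Opportunity-cost maxim: the average utility the action
            # would have yielded over similar past periods.
            estimates[a] = sum(utilities) / len(utilities)
        else:
            estimates[a] = float("-inf")  # untried action, no prior
    return estimates

# Select the action maximizing the updated expected-utility estimate:
history = [("a1", 0.9), ("a2", 0.4), ("a1", 0.7)]
estimates = criterion_filter(history, ["a1", "a2"])
best_action = max(estimates, key=estimates.get)
```

With no prior supplied, this is exactly the "maximum average utility over similar past periods" rule; supplying a prior with a large `prior_weight` pulls the estimates toward the prior, in the Bayesian spirit.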

At this point I was using dynamic programming extensively in other ongoing research. Consequently, I wondered if criterion filtering could be extended to the direct updating of dynamic programming value (cost-to-go) functions. An affirmative answer was provided in Tesfatsion (1979). Specifically, this article demonstrates the feasibility of directly updating dynamic programming value functions on the basis of sequentially obtained utility assessments, bypassing the need for explicit probability updating via Bayes' rule. The performance of the method is systematically explored for a class of adaptive control problems by means of computational experiments.
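A toy version of the idea conveys the flavor (illustrative only, not the algorithm of the 1979 article): maintain value ("cost-to-go") estimates for state-action pairs as running averages of sequentially observed returns-to-go, so each new return assessment updates the value function directly.

```python
from collections import defaultdict

def update_value_estimates(return_sum, visit_count, episode):
    """Directly update value (cost-to-go) estimates from sequentially
    obtained return assessments -- no probability updating involved.

    episode: list of ((state, action), realized_return_to_go) pairs.
    """
    for (s, a), g in episode:
        return_sum[(s, a)] += g
        visit_count[(s, a)] += 1
    # Current value estimate: sample average of observed returns-to-go.
    return {sa: return_sum[sa] / visit_count[sa] for sa in return_sum}

return_sum = defaultdict(float)
visit_count = defaultdict(int)
# One trajectory, with a return-to-go assessed at each state-action pair:
episode = [(("s0", "invest"), 1.5), (("s1", "hold"), 0.5)]
v_hat = update_value_estimates(return_sum, visit_count, episode)
```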

In Tesfatsion (1980b) and Tesfatsion (1984) I extended criterion filtering to sequential game frameworks with boundedly rational players.

The basic idea of criterion filtering motivated in part my development (with Robert Kalaba) of the "Flexible Least Squares (FLS)" approach to the sequential estimation of dynamic nonlinear systems with poorly understood structure. I have also used criterion filtering for the modeling of learning agents in some of my agent-based computational economics research, particularly in my research on the Trade Network Game (TNG).

It appears that criterion filtering has some interesting connections to the elegant Q-learning theory independently developed by Christopher Watkins (1989, "Learning from Delayed Rewards," Ph.D. Thesis, Cambridge University) and to temporal-difference learning as elaborated by Richard S. Sutton and Andrew G. Barto (Reinforcement Learning: An Introduction, The MIT Press, Third Printing, 2000).

See Andrew G. Barto (2007), "Temporal Difference Learning", Scholarpedia (Art. #1604) for more details.
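To make the connection concrete, the standard one-step Q-learning update (Watkins) likewise revises an action-value estimate directly from an observed reward, with no intermediate probability model. The sketch below uses illustrative states, actions, and parameter values.

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One Watkins Q-learning update: move the estimate for (s, a)
    toward the observed reward plus the discounted best next value."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q

Q = {}
q_learning_step(Q, "s0", "left", 1.0, "s1", ["left", "right"])
```

As with criterion filtering, the quantity being learned is the payoff-relevant evaluation of actions itself, not a probability distribution over states.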

Publications

• Leigh Tesfatsion (1984), "Games, Goals, and Bounded Rationality" (pdf,734KB), Theory and Decision, Vol. 17, pp. 149-175. The published article is available from SpringerLink.
Abstract: This article presents a generalization of the standard N-person game with flexible information requirements suitable for players constrained by certain types of bounded rationality. In particular, strategies (complete contingency plans) are replaced with partial contingency plans augmented by goals. Both utility and probability are conditioned on selected goals and actions (g,a), and both are defined over the same set of possible (g,a)-conditioned events. Well-known existence theorems for Nash equilibria and Nash bargaining solutions are extended to this context. For adaptive sequential games, the symmetrical treatment of payoffs and probability assessments permits players to learn their successive moves via "criterion filtering." That is, the expected utility criterion function of each player can be directly updated in each decision period using transitional utility assessments in a manner analogous to Bayes' rule for updating probability distributions using transitional probability assessments.

• Leigh Tesfatsion (1982), "A Dual Approach to Bayesian Inference and Adaptive Control" (pdf,774KB), Theory and Decision, Vol. 14, pp. 177-194. The published article is available from SpringerLink.
Abstract: This article surveys results established to date for the "criterion filtering" approach to adaptive control. Criterion filtering bypasses the usual preliminary updating of probability distributions via transitional probability assessments (Bayes' rule) and focuses instead on the direct updating of the criterion function, itself, via transitional return assessments.

• Leigh Tesfatsion (1980a), "A Conditional Expected Utility Model for Myopic Decision Makers" [ (Model,pdf,1.2MB) and (Axiomatization,pdf,788KB)], Theory and Decision, Vol. 12, pp. 185-206. The published article is available from SpringerLink.
Abstract: This article formulates and axiomatizes a conditional expected utility model that allows a decision maker to specify actions in the form of partial rather than complete contingency plans and to simultaneously choose goals and actions in end-mean pairs. The resulting model is shown to be a generalization of the "Small World" model developed by Leonard Savage (The Foundations of Statistics, 1972). In this conditional expected utility model, both utility and probability are conditioned on selected goals and actions (g,a), and both are defined over the same set of possible (g,a)-conditioned events. For adaptive sequential decision problems, this symmetrical treatment of utility and probability permits agents to learn via "criterion filtering." That is, the expected utility criterion function can be directly updated in each decision period using transitional utility assessments in a manner analogous to Bayes' rule for updating probability distributions using transitional probability assessments.

• Leigh Tesfatsion (1980b), "C3 Modeling with Symmetrical Rationality" (pdf,647KB), Applied Mathematics and Computation, Vol. 6, pp. 51-61. The published article is available from Science Direct.
Abstract: In the absence of contrary information, it would seem prudent for competitors to attribute to their opposition the same level of rationality they attribute to themselves. Using a simple but interesting C3 (command, control, and communication) problem for illustration, a method is proposed for incorporating symmetrical rationality without resorting to the general multistage game framework which has proved difficult to apply in practice. A "criterion filtering" technique is then proposed for the approximate solution of the resulting model that does not require integration operations and that appears to be especially well suited for C3 problems with finite admissible control sets.

• Leigh Tesfatsion (1979), "Direct Updating of Intertemporal Criterion Functions for a Class of Adaptive Control Problems" (pdf,2MB), IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-9, pp. 143-151. The published article is available from IEEE Xplore.
Abstract: This article extends the criterion filtering approach [ Tesfatsion (1978a)] to intertemporal stochastic optimization problems. It demonstrates the feasibility of directly updating dynamic programming value functions on the basis of sequentially obtained return assessments, bypassing the need for explicit probability updating via Bayes' rule. The performance of the method for a class of adaptive control problems is systematically explored using computational experiments.

• Robert Kalaba and Leigh Tesfatsion (1978a), "Two Solution Techniques for Adaptive Reinvestment: A Small-Sample Comparison", Journal of Cybernetics, Vol. 8, pp. 101-111.
Abstract: Two adaptive control techniques are compared in the context of an adaptive reinvestment two-armed bandit problem. The first technique is Bayesian dynamic programming. The distinctive feature of the second technique -- "criterion filtering" -- is the direct estimation and updating of the criterion function without recourse to explicit probability updating via Bayes' rule. The control selections generated by the two techniques are shown to closely approximate each other. An explanation for this close approximation is provided by means of an equivalence theorem for the specification of the criterion function.

• Leigh Tesfatsion (1978a), "A New Approach to Filtering and Adaptive Control", Journal of Optimization Theory and Applications, Vol. 25, pp. 247-261. The published article is available from SpringerLink.
Abstract: A new approach to adaptive control is proposed, referred to as criterion filtering. The principal distinguishing feature of criterion filtering is the direct updating of the current expected return function by means of a filtering operation on a vector of past return functions. The data storage and computational problems often associated with explicit probability updating via Bayes' rule are thus avoided.

• Leigh Tesfatsion (1978b), "A New Approach to Filtering and Adaptive Control: Stability Results" (pdf,911KB), Applied Mathematics and Computation, Vol. 4, No. 1, pp. 27-44. The published article is available from Science Direct.
Abstract: In a companion study Tesfatsion (1978a), a new approach to adaptive control is proposed referred to as "criterion filtering." The principal distinguishing feature of criterion filtering is the direct updating of the current expected return function by means of a filtering operation on a vector of past return functions. In this paper convergence properties are established for a simple linear criterion function filter designed for a class of adaptive control problems typified by a well-known two-armed bandit problem.

• Leigh Tesfatsion (1977), "A New Approach to Filtering and Adaptive Control: Optimality Results", Journal of Cybernetics, Vol. 7, pp. 133-146.
Abstract: In a companion study Tesfatsion (1978a), a new approach to adaptive control is proposed referred to as "criterion filtering." The principal distinguishing feature of criterion filtering is the direct updating of the current expected return function by means of a filtering operation on a vector of past return functions. In this paper sufficient conditions are established for control variables selected in accordance with a simple linear criterion filter to converge to a global maximum of the true criterion function. When states depend nontrivially on control variable selection, the decision maker determines the trade-off between rate of convergence and global optimality by the choice of the greatest lower bound for the prior (initial period) criterion function. When states are independent of control variable selection, the asymptotic global optimality of control variable selections holds under weak restrictions.

• Leigh Tesfatsion (1976), "Bayes' Theorem for Utility" (pdf,867KB), Discussion Paper 76-65, Center for Economic Research, University of Minnesota.
Abstract: This paper develops in depth the analogy between the direct updating of expected return functions on the basis of transitional return assessments and the use of Bayes' Rule to update probability distributions on the basis of transitional probability assessments.

• Leigh Tesfatsion (1975), "Two Essays on Individual Choice" (pdf,2.5MB), Ph.D. Thesis (No. 76-15,007), Department of Economics, University of Minnesota, December 1975. NOTE: A shortened version of this thesis was published in a 1980 Theory and Decision article, and a game theory extension of the theory developed in this thesis was published in a 1984 Theory and Decision article; see above for these two articles.
Abstract: This thesis formulates and axiomatizes an expected utility model of individual choice that allows a decision maker to specify his available actions in the form of "controls" (partial contingency plans) and to simultaneously choose goals and controls in end-mean pairs. It is shown that the Savage expected utility model, the Marschak-Radner team model, the Bayesian statistical decision model, and the standard optimal control model can all be viewed as special cases of this "goal-control expected utility model."