Search results for "Bellman equation"
showing 10 items of 26 documents
Testing the Form of a Decision-maker's Multiattribute Value Function Based on Pairwise Preference Information
1989
In a recent paper we presented a test, based on pairwise preference information, to identify to which class of functions (linear, quasi-concave, or neither) a decision-maker's (implicit) value function belongs. In this note we investigate the power of the test. Some improvements to the test are also suggested.
A Decision Model for the Multiple Criteria Group Secretary Problem: Theoretical Considerations
1996
A decision model is developed for solving the discrete multiple criteria group secretary problem. The model extends the single decision-maker progressive algorithm by Korhonen, Moskowitz and Wallenius to group contexts. Like the original progressive algorithm, it relaxes the usual assumptions of a fixed set of available decision alternatives and complete knowledge of a decision-maker's preference structure (value function). The decision-makers are requested to settle on a compromise, if possible. The model then proceeds with determining the likelihood of finding possibly/surely better settlements (compromises). Linear value functions, linear prospect theory-type value functions, and quasiconca…
Least-squares temporal difference learning based on an extreme learning machine
2014
Reinforcement learning (RL) is a general class of algorithms for solving decision-making problems, which are usually modeled using the Markov decision process (MDP) framework. RL can find exact solutions only when the MDP state space is discrete and small enough. Because many real-world problems are described by continuous variables, approximation is essential in practical applications of RL. This paper is focused on learning the value function of a fixed policy in continuous MDPs. This is an important subproblem of several RL algorithms. We propose a least-squares temporal difference (LSTD) algorithm based on the extreme learning machine. LSTD is typically combined wi…
Optimal Impulse Control When Control Actions Have Random Consequences
1997
We consider a generalised impulse control model for controlling a process governed by a stochastic differential equation. The controller can only choose a parameter of the probability distribution of the consequence of his control action which is therefore random. We state optimality results relating the value function to quasi-variational inequalities and a formal optimal stopping problem. We also remark that the value function is a viscosity solution of the quasi-variational inequalities which could lead to developments and convergence proofs of numerical schemes. Further, we give some explicit examples and an application in financial mathematics, the optimal control of the exchange rate…
Stackelberg Equilibrium with Many Leaders and Followers. The Case of Setup Costs
2016
I provide conditions that guarantee that a Stackelberg game with a setup cost and an integer number of leaders and followers has an equilibrium in pure strategies. The main feature of the game is that when the marginal follower leaves the market the price jumps up, so that a leader’s payoff is neither continuous nor quasiconcave. To show existence I check that a leader’s value function satisfies the following single crossing condition: when the other leaders produce more, the leader never accommodates entry of more followers. If demand is strictly logconcave, and if marginal costs are both nondecreasing and not flatter than average costs, then a Stackelberg equilibrium exists. Besides showi…
Existence and uniqueness of solutions to a quasilinear parabolic equation with quadratic gradients in financial markets
2005
A quasilinear parabolic equation with quadratic gradient terms is analyzed. The equation models an optimal portfolio in so-called incomplete financial markets consisting of risky assets and non-tradable state variables. Its solution allows one to compute an optimal portfolio strategy. The quadratic gradient terms are essentially connected to the assumption that the so-called relative risk aversion function is not logarithmic. The existence of weak global-in-time solutions in any dimension is shown under natural hypotheses. The proof is based on the monotonicity method of Frehse. Furthermore, the uniqueness of solutions is shown under a smallness condition on the derivatives of the covariance (?…
Treating Ordinal Criteria in Stochastic Weight Space Analysis
2001
We consider discrete co-operative group decision-making problems and suggest a method that is aimed at providing descriptive information about the acceptability of different decision alternatives. The method is a new variant of the Stochastic Multicriteria Acceptability Analysis (SMAA) method for discrete multicriteria decision-making problems with multiple decision makers. The new method is designed for problems where criterion information is completely or partially ordinal, that is, experts (or decision makers) have ranked the alternatives criterion-wise. The approach is particularly suitable for group decision making where either no or only partial preference information is available ass…
An infinite-horizon model of dynamic membership of international environmental agreements
2007
Much of the literature on international environmental agreements (IEAs) uses static models, although most important transboundary pollution problems involve stock pollutants. The few papers that study IEAs using models of stock pollutants do not allow for the possibility that membership of the IEA may change endogenously over time. In this paper we analyse a simple infinite-horizon version of the static model of self-enforcing IEAs, in which damage costs increase with the stock of pollution, and countries decide each period whether to join an IEA. Using a quadratic approximation of the value function of the representative country we show that there exists a steady-state stock of po…
A Quasilinear Parabolic Equation with Quadratic Growth of the Gradient modeling Incomplete Financial Markets
2004
We consider a quasilinear parabolic equation with quadratic gradient terms. It arises in the modeling of an optimal portfolio which maximizes the expected utility from terminal wealth in incomplete markets consisting of risky assets and non-tradable state variables. The existence of solutions is shown by extending the monotonicity method of Frehse. Furthermore, we prove the uniqueness of weak solutions under a smallness condition on the derivatives of the covariance matrices with respect to the solution. The influence of the non-tradable state variables on the optimal value function is illustrated by a numerical example.
Sequence Q-learning: A memory-based method towards solving POMDP
2015
A partially observable Markov decision process (POMDP) models a control problem in which states are only partially observable by an agent. The two main approaches to solving such tasks are value-function methods and direct search in policy space. This paper introduces the Sequence Q-learning method, which extends the well-known Q-learning algorithm to solve POMDPs by adding a special sequence management framework, advancing from action values to “sequence” values and including the “sequence continuity principle”.
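For orientation, the ordinary tabular Q-learning update that the entry above builds on can be sketched as follows. This is a minimal illustration of standard Q-learning only, not the Sequence Q-learning method described in the paper. The function name `q_learning_update`, the dictionary-based Q-table, and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q       : dict mapping (state, action) -> value; missing entries count as 0.0
    actions : iterable of actions available in next_state (hypothetical setup)
    alpha   : learning rate (assumed value)
    gamma   : discount factor (assumed value)
    """
    # Bellman optimality target: reward plus discounted best next-state value.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next

    # Move the current estimate a fraction alpha toward the target.
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


if __name__ == "__main__":
    q_table = {}
    new_value = q_learning_update(q_table, state=0, action="right",
                                  reward=1.0, next_state=1,
                                  actions=["left", "right"], alpha=0.5)
    print(new_value)
```

The update uses only the current state, so it presumes full observability; the partially observable setting in the entry above is exactly where that assumption fails.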