Decision theory & automated planning and scheduling (bilingual)

Source: Internet
Author: User

The translation is not good, please forgive me ...

Most of the content comes from wikis


Decision theory part:


Normative and descriptive decision theory

The theory of normative and descriptive decision-making

Normative, or prescriptive, decision theory is concerned with identifying the best decision to make (in practice, "best" does not always mean the maximum; the optimum may also be a value within a specific or approximate range), assuming an ideal decision maker who is fully informed, able to compute with perfect accuracy, and fully rational. The practical application of this prescriptive approach (how people ought to make decisions) is called decision analysis, and it aims at finding tools, methodologies and software that help people make better decisions. The most systematic and comprehensive software tools developed in this way are called decision support systems.

What kinds of decisions need a theory?

What are the conditions under which decisions need to be made?

1. Choice under uncertainty

1. Under the conditions of uncertainty

This area represents the heart of decision theory. The procedure now referred to as expected value has been known since the 17th century. Blaise Pascal invoked it in his famous wager (see below), which was contained in his Pensées, published in 1670. The idea of expected value is that, when faced with a number of actions, each of which could give rise to more than one possible outcome with different probabilities, the rational procedure is to identify all possible outcomes, determine their values (positive or negative) and the probabilities that will result from each course of action, and multiply the two to give an expected value. The action to be chosen should be the one that gives rise to the highest total expected value. In 1738, Daniel Bernoulli published an influential paper entitled Exposition of a New Theory on the Measurement of Risk, in which he uses the St. Petersburg paradox to show that expected value theory must be normatively wrong. He also gives an example in which a Dutch merchant is trying to decide whether to insure a cargo being sent from Amsterdam to St Petersburg in winter, when it is known that there is a 5% chance that the ship and cargo will be lost. In his solution, he defines a utility function and computes expected utility rather than expected financial value (see [2] for a review).

The idea of expected value is that, when faced with a number of actions, each of which may lead to several possible outcomes with different probabilities, the rational procedure is to identify all possible outcomes, determine their values (positive or negative) and the probability of each outcome under each action, and multiply the two to produce an expected value.

The action that should be chosen is the one with the highest total expected value. In 1738, Daniel Bernoulli published an influential paper entitled "Exposition of a New Theory on the Measurement of Risk", in which he used the St. Petersburg paradox to show that expected value theory must be normatively wrong. He also gave an example of a Dutch merchant deciding whether to insure goods shipped in winter from Amsterdam to St. Petersburg, knowing that there is a 5% chance that the ship and cargo will be lost. In his solution, he defines a utility function and calculates the expected utility rather than the expected financial value.
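To make the difference between expected value and expected utility concrete, here is a minimal sketch of the merchant's decision. Only the 5% loss probability comes from the text; the wealth, cargo value, premium, and the choice of a logarithmic utility function are illustrative assumptions, not figures from Bernoulli's paper.

```python
import math

def expected_value(outcomes):
    """Sum of probability * value over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

def expected_utility(outcomes, utility):
    """Sum of probability * utility(value) over (probability, value) pairs."""
    return sum(p * utility(v) for p, v in outcomes)

# Hypothetical figures: other wealth, cargo value, and insurance premium.
WEALTH, CARGO, PREMIUM = 4_000.0, 10_000.0, 800.0
P_LOSS = 0.05  # the 5% chance of losing ship and cargo mentioned in the text

# Final wealth for each action, as (probability, wealth) pairs.
uninsured = [(1 - P_LOSS, WEALTH + CARGO), (P_LOSS, WEALTH)]
insured = [(1.0, WEALTH + CARGO - PREMIUM)]

ev_uninsured = expected_value(uninsured)
ev_insured = expected_value(insured)

# Assumed utility of wealth: natural logarithm (diminishing marginal utility).
eu_uninsured = expected_utility(uninsured, math.log)
eu_insured = expected_utility(insured, math.log)

print("expected value prefers insuring:  ", ev_insured > ev_uninsured)
print("expected utility prefers insuring:", eu_insured > eu_uninsured)
```

With these assumed numbers the two criteria disagree: the premium exceeds the expected loss, so expected value favors going uninsured, while the concave utility function favors insuring.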

Utility function: a utility function is commonly used to express the quantitative relationship between the utility a consumer obtains from consumption and the bundle of goods consumed, in order to measure how satisfied the consumer is with that bundle.

The explanation of "utility function" in the reference book

A. A function that represents the quantitative relationship between the utility a consumer obtains from consumption and the combination of goods consumed. It is used to measure how satisfied consumers are with the bundle of goods they consume. An indifference curve can only analyze a combination of two commodities, whereas a utility function can analyze combinations of more kinds of goods. The expression is U = U(x, y, z, ...), where x, y, z represent the quantities of each good owned or consumed by the consumer, and U on the left-hand side is ...

The explanation of "utility function" in academic literature

B. The utility function is defined as follows: let ≽ be a preference relation defined on the consumption set X. If, for any x, y in X, x ≽ y if and only if u(x) ≥ u(y), then the function u: X → R is called a utility function representing the preference relation ≽.

C. F(X) is called the utility function. The key to the weighted p-norm method is determining the weight coefficients. There are 2 basic methods; one is the old learning method, which chooses the weight coefficients according to the relative importance of the objective functions.

D. A person's utility should be a function of wealth x, which is called the utility function. In theory, each person's utility function can be approximated through a series of psychological tests. Different decision-makers should have different utility functions. First we look for the properties satisfied by utility functions, or by some special classes of utility functions.

E. This is a theoretical assumption; the mathematical functions used in these models are called "utility functions". According to these models, people are assumed to be able to assign a level of benefit to each possible allocation of time and to choose so as to maximize that benefit.

F. The cost of transport mode i, sometimes referred to as the utility function: U = a0 + a1 × D + a2 × C, where D is the travel time of transport mode i and C is the transport cost of transport mode i.

G. In order to evaluate the control, a function is required as an evaluation index: $J(t) = \sum_{k=0}^{\infty} \gamma^{k} U(t+k) = U(t) + \gamma J(t+1)$ (2), where $U(t) = U[r(t), A(t), t]$ evaluates each control step and is called the utility function. The function J(t) is the accumulated, discounted value of the utility function over every step starting from time t, and is called the cost function.
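The recursion in item G, J(t) = U(t) + γ J(t+1), can be checked numerically. The sketch below assumes a finite sequence of utilities, treating everything beyond the horizon as zero; that truncation is an assumption made for illustration, since the definition above is an infinite sum. The function names are hypothetical.

```python
def cost_to_go(utilities, gamma):
    """Return [J(0), ..., J(n-1)] using the backward recursion J(t) = U(t) + gamma * J(t+1)."""
    j_next = 0.0  # assumption: utilities beyond the finite horizon are zero
    result = [0.0] * len(utilities)
    for t in range(len(utilities) - 1, -1, -1):
        j_next = utilities[t] + gamma * j_next
        result[t] = j_next
    return result

def cost_to_go_direct(utilities, gamma, t):
    """Return J(t) from the definition: sum over k of gamma**k * U(t + k)."""
    return sum(gamma ** k * u for k, u in enumerate(utilities[t:]))

# Hypothetical per-step utilities and discount factor.
U = [1.0, 2.0, 3.0]
J = cost_to_go(U, gamma=0.5)
print(J)  # both forms agree: J[0] equals cost_to_go_direct(U, 0.5, 0)
```

The backward loop visits each step once, whereas evaluating the sum directly for every t repeats work; both give the same values.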

2. Intertemporal choice

2. Choice across time periods

Intertemporal choice is concerned with the kind of choice where different actions lead to outcomes that are realised at different points in time. If someone received a windfall of several thousand dollars, they could spend it on an expensive holiday, giving them immediate pleasure, or they could invest it in a pension scheme, giving them an income at some time in the future. What is the optimal thing to do? The answer depends partly on factors such as the expected rates of interest and inflation, the person's life expectancy, and their confidence in the pensions industry. However, even with all those factors taken into account, human behavior again deviates greatly from the predictions of prescriptive decision theory, leading to alternative models in which, for example, objective interest rates are replaced by subjective discount rates.

Intertemporal choice concerns choices whose outcomes are realized at different points in time. If someone receives a windfall of several thousand dollars, they can spend it on an expensive vacation, which makes them happy immediately, or they can invest it in a pension plan, which gives them an income at some time in the future. What is the best thing to do? The answer depends in part on factors such as expected interest rates and inflation, the person's life expectancy, and their trust in the pension industry. However, even when all these factors are taken into account, human behavior deviates significantly from the predictions of prescriptive decision theory, which has led to alternative models in which, for example, the objective interest rate is replaced by a subjective discount rate.
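The role of the discount rate can be illustrated with a present-value calculation. This is a minimal sketch; the windfall amount, the 4% market rate, the 20-year horizon, and the 10% subjective rate are all hypothetical values chosen for illustration.

```python
def present_value(amount, rate, years):
    """Discount a future amount back to today at a constant annual rate."""
    return amount / (1.0 + rate) ** years

# Hypothetical scenario: invest a windfall for 20 years at a 4% market rate.
WINDFALL = 5_000.0
MARKET_RATE = 0.04
YEARS = 20
future_payout = WINDFALL * (1.0 + MARKET_RATE) ** YEARS

# Discounting at the objective market rate recovers the windfall exactly,
# so the decision maker is indifferent between spending now and investing.
pv_objective = present_value(future_payout, MARKET_RATE, YEARS)

# A person with a higher subjective discount rate (assumed 10%) values the
# same payout at far less than the windfall and prefers to spend today.
pv_subjective = present_value(future_payout, 0.10, YEARS)

print(round(pv_objective, 2), round(pv_subjective, 2))
```

Replacing the objective rate with a subjective one changes which option looks better without changing any of the cash flows, which is the substitution the alternative models make.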

Interaction of decision makers

Some decisions are difficult because of the need to take into account how other people in the situation will respond to the decision that is taken. The analysis of such social decisions is more often treated under the label of game theory, rather than decision theory, though it involves the same mathematical methods. From the standpoint of game theory, most of the problems treated in decision theory are one-player games (or the one player is viewed as playing against an impersonal background situation). In the emerging socio-cognitive engineering, the research is especially focused on the different types of distributed decision-making in human organizations, in normal and abnormal/emergency/crisis situations.

Some decisions are difficult because of the need to consider how others will respond to them. The analysis of such decisions is usually treated under game theory rather than decision theory, although it involves the same mathematical methods. From the perspective of game theory, most of the problems in decision theory are one-player games.

Other-regarding Preferences

Preferences that take others into account

Also called social preferences. In decisions which affect others, people will sometimes give up some direct personal benefit or take on a cost in order to achieve a fair or equal outcome. Bolton and Ockenfels (2000) and Fehr and Schmidt (1999) explore decision-makers who are concerned with fairness of distributions and have disutility from others' being much better off or much worse off. A closely related area of research is concerned with reciprocal fairness; the decision-makers desire to reward kind actions or intentions and punish unkind ones.

Also known as social preferences. In decisions that affect others, people sometimes give up some personal benefit or incur a cost to achieve a fair or equal result. A closely related area of research concerns reciprocal fairness: decision-makers want to reward kind actions or intentions and punish unkind ones.
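One common way to formalize disutility from unequal outcomes is the two-player inequity-aversion utility associated with Fehr and Schmidt (1999): own payoff, minus a penalty when the other person is ahead, minus a penalty when one is ahead oneself. The sketch below assumes that form; the parameter values and payoffs are hypothetical.

```python
def inequity_averse_utility(own, other, alpha, beta):
    """Own payoff minus penalties for disadvantageous and advantageous inequality."""
    envy = alpha * max(other - own, 0.0)   # disutility when the other is better off
    guilt = beta * max(own - other, 0.0)   # disutility when the other is worse off
    return own - envy - guilt

# Hypothetical choice: split 10 units as (8, 2) or as (5, 5).
ALPHA, BETA = 0.8, 0.6  # assumed sensitivities, not estimates from any study
u_unequal = inequity_averse_utility(8.0, 2.0, ALPHA, BETA)
u_equal = inequity_averse_utility(5.0, 5.0, ALPHA, BETA)

print(u_unequal, u_equal)
```

With these assumed parameters the equal split yields the higher utility, so the decision-maker gives up 3 units of direct personal benefit to reach a fair outcome.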

Complex Decisions

Complex decisions

Other areas of decision theory are concerned with decisions that are difficult simply because of their complexity, or the complexity of the organization that has to make them. Individuals making decisions are limited in resources or are boundedly rational. In such cases the issue is not the deviation between real and optimal behaviour, but the difficulty of determining the optimal behaviour in the first place. The Club of Rome, for example, developed a model of economic growth and resource usage that helps politicians make real-life decisions in complex situations. Decisions are also affected by whether options are framed together or separately. This is known as the distinction bias.

Heuristics

Heuristic methods

Main article: Heuristic

One method of decision-making is heuristic. The heuristic approach makes decisions based on routine thinking. While this is quicker than step-by-step processing, heuristic decision-making opens the risk of inaccuracy. Mistakes that otherwise would have been avoided in step-by-step processing can be made. One common and incorrect thought process that results from heuristic thinking is the gambler's fallacy. The gambler's fallacy makes the mistake of believing that a random event is affected by previous random events. For example, there is a fifty percent chance of a coin landing on heads. The gambler's fallacy suggests that if the coin lands on tails, the next time it flips, it will land on heads, as if it's "the coin's turn" to land on heads. This is simply not true. Such a fallacy is easily disproved in a step-by-step process of thinking. [5]

In another example, when choosing between options involving extremes, decision-makers may have a heuristic that moderate alternatives are preferable to extreme ones. The compromise effect operates under a mindset driven by the belief that the most moderate option, amid extremes, carries the most benefits from each extreme. [6]

One method of decision-making is the heuristic method, which makes decisions based on routine thinking. Although this is faster than step-by-step processing, heuristic decision-making carries the risk of error, and it can produce mistakes that step-by-step processing would have avoided. A common incorrect thought process that results from heuristic thinking is the gambler's fallacy.

The gambler's fallacy holds that future probabilities are changed by past events, but that is not the case. Take a fixed probability, such as a coin toss landing on heads: it does not change. The probability of heads is always 50%, regardless of what came up in the previous 10 tosses. Believing that probabilities change is a common bias, especially in gambling. For example, in roulette, if the ball has stopped on black for the past four spins, is it more likely to stop on red on the next spin? No. The probability of stopping on red is still 47.37%. This may sound obvious, but it is exactly this bias, the naive belief that the probability will change, that costs many gamblers a great deal of money.

This fallacy is easily disproved by thinking it through step by step.
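The step-by-step disproof can be carried out by exhaustive enumeration rather than simulation. The sketch below lists every equally likely sequence of fair coin flips, keeps those that start with a run of tails, and counts how often the next flip is heads. The roulette figure assumes an American wheel with 18 red pockets out of 38, which is the layout that matches the 47.37% quoted above.

```python
from fractions import Fraction
from itertools import product

def prob_heads_after_tails_run(run_length):
    """P(next flip is heads | the previous run_length flips were all tails), by enumeration."""
    sequences = list(product("HT", repeat=run_length + 1))  # all equally likely
    conditioned = [s for s in sequences if all(f == "T" for f in s[:run_length])]
    heads_next = [s for s in conditioned if s[-1] == "H"]
    return Fraction(len(heads_next), len(conditioned))

# The conditional probability is 1/2 no matter how long the run of tails was.
for run in (1, 4, 10):
    print(run, prob_heads_after_tails_run(run))

# Assumed American roulette wheel: 18 red pockets out of 38 on every spin.
P_RED = Fraction(18, 38)
print(round(float(P_RED) * 100, 2))
```

Because each flip or spin is independent, conditioning on the past leaves the probability unchanged; the enumeration makes that visible without relying on random sampling.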

General criticism

The general criticism

Main article: Ludic fallacy

A general criticism of decision theory based on a fixed universe of possibilities is that it considers the "known unknowns", not the "unknown unknowns": it focuses on expected variations, not on unforeseen events, which some argue (as in black swan theory) have outsized impact and must be considered; significant events may be "outside the model". This line of argument, called the ludic fallacy, is that there are inevitable imperfections in modeling the real world by particular models, and that unquestioning reliance on models blinds one to their limits.

The general criticism of decision theory based on a fixed set of possibilities is that it considers the "known unknowns" rather than the "unknown unknowns": it deals with expected variations, not unforeseen events, which some believe (as in black swan theory) have a huge impact and must be considered, since significant events may lie "outside the model". This argument, known as the ludic fallacy, holds that modeling the real world with any specific model is unavoidably imperfect, and that blind reliance on models makes people unable to see their limits. (The ludic fallacy is the overuse of statistics and probability to predict the future.)

Black swan theory: before Australia was discovered, Europeans believed that all swans were white and used "black swan" to refer to something that could not exist. That belief collapsed when the first black swan was observed. The black swan therefore stands for an unpredictable, rare and significant event that is unexpected yet changes everything. People tend to turn a blind eye to such events and are accustomed to explaining these unexpected shocks with limited life experience and fragile beliefs. This is the black swan theory.

Automated planning and scheduling part:


Automated planning and scheduling, in the relevant literature often denoted as simply planning, is a branch of artificial intelligence that concerns the realization of strategies or action sequences.

Automated planning and scheduling, usually referred to in the relevant literature simply as planning, is a branch of AI that focuses on the implementation of strategies or action sequences.

Planning is also related to decision theory.

Planning is also related to decision theory (above).

In known environments with available models, planning can be done offline: solutions can be found and evaluated prior to execution. In dynamic, unknown environments, strategies often need to be revised online, in real time, so models and policies must be adaptable.

Given a description of the possible initial states of the world, a description of the desired goals, and a description of a set of possible actions, the planning problem is to find a plan that is guaranteed (from any of the initial states) to generate a sequence of actions that leads to one of the goal states.

The difficulty of planning is dependent on the simplifying assumptions employed. Several classes of planning problems can be identified depending on the properties the problems have in several dimensions.

Given a description of the initial states, a description of the desired goals, and a description of a set of possible actions, the planning problem is to find a plan that is guaranteed (from any initial state) to generate an action sequence that reaches a goal state.
The difficulty of planning depends on the simplifying assumptions used. Several classes of planning problems can be distinguished according to the properties the problems have along several dimensions.

For nondeterministic actions, are the associated probabilities available?

For actions with uncertain outcomes, are the associated probabilities available?

Are the state variables discrete or continuous?

Is each state variable discrete or continuous?

If they are discrete, do they have only a finite number of possible values?

If they are discrete, do they have a limited number of possible values?

Can the current state be observed unambiguously?

Is it possible to observe the current state without ambiguity?

Is there only one agent or are there several agents?

Is there a single agent, or can there be several agents?

Are the agents cooperative or selfish?

Do the agents cooperate or act selfishly?

Can several actions be taken concurrently, or is only one action possible at a time?

Can several actions be taken at the same time, or only one action at a time?

Is the objective of a plan to reach a designated goal state, or to maximize a reward function?

Is the goal of the plan to reach a specified goal state or to maximize a reward function?

Do all of the agents construct their own plans separately, or are the plans constructed centrally for all agents?

Do all the agents build their own plans, or are the plans built centrally for all agents?

The simplest possible planning problem, known as the classical planning problem, is determined by:

The simplest planning problem, called the classical planning problem, is defined by:

a unique known initial state,

a single known initial state,

durationless actions,

instantaneous actions (actions that take no time),

deterministic actions,

actions with deterministic outcomes,

which can be taken only one at a time,

of which only one can be taken at a time,

and a single agent.

and one agent.

Since the initial state is known unambiguously, and all actions are deterministic, the state of the world after any sequence of actions can be accurately predicted, and the question of observability is irrelevant for classical planning.

Further, plans can be defined as sequences of actions, because it is always known in advance which actions will be needed.

With nondeterministic actions or other events outside the control of the agent, the possible executions form a tree, and plans have to determine the appropriate actions for every node of the tree.

Since the initial state is known unambiguously and all actions are deterministic, the state of the world after any sequence of actions can be predicted accurately, and observability is irrelevant to classical planning. In addition, a plan can be defined as a sequence of actions, because it is always known in advance which actions will be needed. With nondeterministic actions or other events outside the agent's control, the possible executions form a tree, and a plan must specify the appropriate action for every node of that tree.
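The contrast between a plan as a sequence and a plan as a tree can be sketched in a few lines. Everything below (the robot, the door, the action and outcome names) is a hypothetical toy domain invented for illustration.

```python
def predict(state, plan, actions):
    """Deterministic case: apply each named action in order and return the final state."""
    for name in plan:
        state = actions[name](state)
    return state

# Hypothetical deterministic actions over a dictionary of state variables.
ACTIONS = {
    "go_to_kitchen": lambda s: {**s, "robot_at": "kitchen"},
    "pick_up_cup": lambda s: {**s, "holding_cup": s["robot_at"] == "kitchen" or s["holding_cup"]},
    "go_to_office": lambda s: {**s, "robot_at": "office"},
}
initial = {"robot_at": "office", "holding_cup": False}
final = predict(initial, ["go_to_kitchen", "pick_up_cup", "go_to_office"], ACTIONS)

def execute_tree(node, observed):
    """Nondeterministic case: follow the branch matching each observed outcome."""
    taken = []
    outcomes = iter(observed)
    while True:
        action, branches = node
        taken.append(action)
        if not branches:  # leaf: nothing left to decide
            return taken
        node = branches[next(outcomes)]

# A contingent plan is a tree: (action, {outcome: subtree}).
CONTINGENT_PLAN = (
    "open_door",
    {
        "opened": ("walk_through", {}),
        "stuck": ("push_door", {"opened": ("walk_through", {})}),
    },
)

print(final)
print(execute_tree(CONTINGENT_PLAN, ["stuck", "opened"]))
```

In the deterministic case the final state is computed without observing anything during execution. In the contingent case the actions taken depend on the outcomes observed at each node.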

Discrete-time Markov decision processes (MDP) are planning problems with:

Discrete-time Markov decision process (MDP) planning problems have:

durationless actions,

nondeterministic actions with probabilities,

full observability,

maximization of a reward function,

and a single agent.

instantaneous actions,

actions with uncertain outcomes and known probabilities,
complete observability,
maximization of a reward function,
and one agent.

When full observability is replaced by partial observability, planning corresponds to a partially observable Markov decision process (POMDP).

If there is more than one agent, we have multi-agent planning, which is closely related to game theory.

When full observability is replaced by partial observability, the planning problem corresponds to a partially observable Markov decision process (POMDP).
If there is more than one agent, we have multi-agent planning, which is closely related to game theory.

Planning languages

Languages for planning

The most commonly used languages for representing planning problems, such as STRIPS and PDDL for classical planning, are based on state variables. Each possible state of the world is an assignment of values to the state variables, and actions determine how the values of the state variables change when that action is taken. Since a set of state variables induces a state space whose size is exponential in the number of variables, planning, similarly to many other computational problems, suffers from the curse of dimensionality and the combinatorial explosion.

An alternative language for describing planning problems is that of hierarchical task networks, in which a set of tasks is given, and each task can be either realized by a primitive action or decomposed into a set of other tasks. This does not necessarily involve state variables, although in more realistic applications state variables also simplify the description of task networks.

The most common languages for describing planning problems, such as STRIPS and PDDL for classical planning, are based on state variables. Every possible state of the world is an assignment of values to the state variables, and an action determines how the values of the state variables change when it is taken. Because the size of the state space induced by a set of state variables is exponential in the number of variables, planning, like many other computational problems, suffers from the curse of dimensionality and combinatorial explosion.

An alternative language for describing planning problems is the hierarchical task network, in which a set of tasks is given and each task can either be carried out by a primitive action or decomposed into a set of other tasks. This does not necessarily involve state variables, although in more realistic applications state variables still simplify the description of task networks.
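A state-variable representation in the spirit of STRIPS can be sketched as follows. This is a simplified illustration, not actual STRIPS or PDDL syntax: a state is the set of propositions that are currently true, and each action lists preconditions, add effects, and delete effects. The door domain and all names are hypothetical.

```python
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]

def applicable(state: FrozenSet[str], action: Action) -> bool:
    """An action can be taken when all of its preconditions hold in the state."""
    return action.preconditions <= state

def apply_action(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    """Remove the delete effects, then add the add effects."""
    return (state - action.delete_effects) | action.add_effects

def state_space_size(num_boolean_variables: int) -> int:
    """Number of distinct assignments to n Boolean state variables."""
    return 2 ** num_boolean_variables

# Hypothetical action and state.
open_door = Action(
    name="open_door",
    preconditions=frozenset({"door_closed"}),
    add_effects=frozenset({"door_open"}),
    delete_effects=frozenset({"door_closed"}),
)
state = frozenset({"door_closed", "robot_outside"})
next_state = apply_action(state, open_door) if applicable(state, open_door) else state

print(sorted(next_state))
print(state_space_size(10), state_space_size(20))
```

Going from 10 to 20 Boolean variables multiplies the number of states by 1024, which is the exponential growth behind the combinatorial explosion.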
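Task decomposition in a hierarchical task network can be sketched with a recursive expansion. This is a deliberately reduced illustration: it assumes exactly one decomposition method per compound task and ignores preconditions, whereas real HTN planners choose among alternative methods. The tea-making tasks are hypothetical.

```python
def decompose(task, methods, primitives):
    """Expand a task depth-first into a flat list of primitive actions."""
    if task in primitives:
        return [task]
    plan = []
    for subtask in methods[task]:  # assumption: a single method per compound task
        plan.extend(decompose(subtask, methods, primitives))
    return plan

# Hypothetical task network.
METHODS = {
    "make_tea": ["boil_water", "steep_tea"],
    "boil_water": ["fill_kettle", "switch_on_kettle"],
}
PRIMITIVES = {"fill_kettle", "switch_on_kettle", "steep_tea"}

plan = decompose("make_tea", METHODS, PRIMITIVES)
print(plan)
```

No state variables appear here: the structure of the plan comes entirely from how tasks break down into subtasks.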

Preference-based planning

Preference-based planning

Main article: Preference-based planning

In preference-based planning, the objective is not only to produce a plan but also to satisfy user-specified preferences. A difference to the more common reward-based planning, for example corresponding to MDPs, is that preferences don't necessarily have a precise numerical value.

In preference-based planning, the goal is not only to generate a plan but also to meet user-specified preferences. Unlike the more common reward-based planning, such as that corresponding to MDPs, preferences do not necessarily have a precise numerical value.

MDP: Markov decision process

Algorithms for planning

Classical planning

Forward chaining state space search, possibly enhanced with heuristics,

Backward chaining search, possibly enhanced by the use of state constraints (see STRIPS, Graphplan),

Partial-order planning (in contrast to noninterleaved planning).

See also: Sussman anomaly

Classical planning
Forward-chaining state space search, possibly enhanced with heuristics,
Backward-chaining search, possibly enhanced by the use of state constraints (see STRIPS, Graphplan),

Partial-order planning (as opposed to non-interleaved planning).
See also: Sussman anomaly
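Forward-chaining state space search can be sketched as a breadth-first search from the initial state, without heuristics. The representation (sets of true propositions, actions as precondition/add/delete triples) and the door domain are assumptions made for illustration.

```python
from collections import deque

def forward_search(initial, goal, actions):
    """Breadth-first forward search; returns a shortest list of action names, or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:  # every goal proposition holds
            return plan
        for name, (pre, add, delete) in actions.items():
            if pre <= state:  # action is applicable
                successor = (state - delete) | add
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [name]))
    return None  # goal unreachable

# Hypothetical domain: name -> (preconditions, add effects, delete effects).
ACTIONS = {
    "open_door": ({"door_closed"}, {"door_open"}, {"door_closed"}),
    "enter_room": ({"door_open", "robot_outside"}, {"robot_inside"}, {"robot_outside"}),
    "close_door": ({"door_open"}, {"door_closed"}, {"door_open"}),
}

plan = forward_search(
    initial={"door_closed", "robot_outside"},
    goal={"robot_inside", "door_closed"},
    actions=ACTIONS,
)
print(plan)
```

A heuristic variant would replace the queue with a priority queue ordered by an estimate of the remaining distance to the goal; the visited set is what keeps the search from looping through repeated states.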

Reduction to other problems

Reduction to the propositional satisfiability problem (Satplan).

Reduction to model checking: both are essentially problems of traversing state spaces, and the classical planning problem corresponds to a subclass of model checking problems.

Reduction to other problems
Reduction to the propositional satisfiability problem (Satplan).
Reduction to model checking: both are essentially a matter of traversing state spaces, and the classical planning problem corresponds to a subclass of model checking problems.

Temporal planning

Temporal planning can be solved with methods similar to classical planning. The main difference is, because of the possibility of several, temporally overlapping actions with a duration being taken concurrently, that the definition of a state has to include information about the current absolute time and how far the execution of each active action has proceeded. Further, in planning with rational or real time, the state space may be infinite, unlike in classical planning or planning with integer time. Temporal planning can be understood in terms of timed automata.

Temporal planning
Temporal planning can be solved with methods similar to those for classical planning. The main difference is that, because several temporally overlapping actions with durations may be executed concurrently, the definition of a state must include the current absolute time and how far the execution of each active action has progressed. Furthermore, when time is rational or real-valued, the state space may be infinite, unlike in classical planning or planning with integer time. Temporal planning can be understood in terms of timed automata.

Probabilistic planning

Main articles: Markov decision process and partially observable Markov decision process

Probabilistic planning can be solved with iterative methods such as value iteration and policy iteration, when the state space is sufficiently small. With partial observability, probabilistic planning is similarly solved with iterative methods, but using a representation of the value functions defined for the space of beliefs instead of states.

Probabilistic planning
Main articles: Markov decision process and partially observable Markov decision process
Probabilistic planning can be solved with iterative methods such as value iteration and policy iteration when the state space is small enough. With partial observability, probabilistic planning is likewise solved with iterative methods, but the value function is defined over the space of beliefs rather than over states.
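Value iteration for a small, fully observable MDP can be sketched as follows. The two-state problem, its transition probabilities, its rewards, and the discount factor are hypothetical; the update is the standard Bellman backup, applied in place until the values stop changing.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Repeat Bellman backups until the largest change in any state value is below tol."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (reward(s, a, s2) + gamma * values[s2]) for s2, p in transition(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

# Hypothetical MDP: "move" switches state with probability 0.8, "stay" always stays.
def transition(state, action):
    other = "B" if state == "A" else "A"
    if action == "stay":
        return [(state, 1.0)]
    return [(other, 0.8), (state, 0.2)]

def reward(state, action, next_state):
    """Assumed reward: 1 for every step that ends in state B."""
    return 1.0 if next_state == "B" else 0.0

values = value_iteration(["A", "B"], ["stay", "move"], transition, reward)
print({s: round(v, 4) for s, v in values.items()})
```

Each sweep touches every state and action, which is why the approach is practical only when the state space is small. In the partially observable case the same idea is applied to beliefs (probability distributions over states) instead of individual states.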

Deployment of planning systems

The Hubble Space Telescope uses a short-term system called SPSS and a long-term planning system called Spike.

Deployment of planning systems
The Hubble Space Telescope uses a short-term system called SPSS and a long-term planning system called Spike.

------Translated from: wolf96 http://blog.csdn.net/wolf96

