Using Python to Demonstrate Dynamic Programming on Overlapping Subproblems
Dynamic programming is an algorithmic strategy for solving problems that are defined over a state space. These problems can be broken down into new subproblems with their own parameters. To solve them, we must search the state space and evaluate the decision made at each step. Because such problems contain a large number of identical states, this technique avoids wasting time re-solving overlapping subproblems.
As we will see, it also leads to a lot of recursion, which is usually fun.
To illustrate this strategy, I will use a very interesting problem as an example: Challenge 14 of Tuenti Challenge #4, a recent programming contest.
Train Empire
We are faced with a board game called Train Empire. In it, you must plan the most efficient route for each train to deliver the wagons waiting at the railway stations. The rules are simple:
- Each station has a wagon waiting to be delivered to another station.
- Each wagon delivered to its destination rewards the player with points. Wagons can be dropped at any station.
- Each train runs on a single route, can carry only one wagon at a time, and has limited fuel, so it can travel only a certain distance.
We can tidy up the original picture of our problem. To earn the highest score under the fuel limit, we need to know where to load each wagon and where to unload it.
In the picture we have two train routes: red and blue. The stations are located at coordinate points, so we can easily calculate the distance between them. Each station has a wagon named after its destination, along with the score we earn when we deliver it successfully.
Now, assume that our trains can run 3 km. The train on the red route can take the wagon at station A to its destination E (5 points), while the train on the blue route can deliver wagon C (10 points) and then wagon B (5 points). That gives the highest possible score of 20.
State
We call the location of the train, the distance it can still travel, and the list of wagons at each station the state of the problem. Changing these values still gives us the same problem, only with different parameters. Every time we move a train, our problem evolves into a different subproblem. To work out the best sequence of moves, we must traverse these states and make decisions based on them. Let's get started.
We will start by defining the train routes. Because these routes are not straight lines, a graph is the best representation.
    import math
    from decimal import Decimal
    from collections import namedtuple, defaultdict


    class TrainRoute:
        def __init__(self, start, connections):
            self.start = start
            self.E = defaultdict(set)   # adjacency: station -> neighbouring stations
            self.stations = set()
            for u, v in connections:
                # store both directions so the train can go back and forth
                self.E[u].add(v)
                self.E[v].add(u)
                self.stations.add(u)
                self.stations.add(v)

        def next_stations(self, u):
            if u not in self.E:
                return
            yield from self.E[u]

        def fuel(self, u, v):
            # fuel needed is the straight-line distance between two stations
            x = abs(u.pos[0] - v.pos[0])
            y = abs(u.pos[1] - v.pos[1])
            return Decimal(math.sqrt(x * x + y * y))
The TrainRoute class implements a very basic directed graph. It keeps the vertices as a set of stations and stores the connections between stations in a dictionary. Note that we add both the (u, v) and the (v, u) edges, because the train can move forward and backward.
There is one interesting detail in the next_stations method. Here I use a cool Python 3 feature, yield from, which lets a generator delegate to another generator or iterator. Because every station is mapped to a set of stations, we only need to iterate over that set.
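As a minimal, self-contained sketch of these two ideas, the snippet below builds a tiny adjacency map with edges in both directions and walks it with yield from. The station names and the neighbours helper are made up for illustration and are not part of the contest code.

```python
from collections import defaultdict

# Hypothetical three-station line: A - B - C
connections = [("A", "B"), ("B", "C")]

edges = defaultdict(set)
for u, v in connections:
    edges[u].add(v)  # forward edge
    edges[v].add(u)  # backward edge, so the train can return


def neighbours(u):
    """Yield every station directly connected to u."""
    if u not in edges:  # avoid creating an empty entry in the defaultdict
        return
    yield from edges[u]  # delegate iteration to the underlying set


reachable_from_b = sorted(neighbours("B"))
unknown = list(neighbours("Z"))
print(reachable_from_b)
print(unknown)
```

The membership check before yield from matters with a defaultdict: indexing a missing key would silently insert an empty set for it.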
Let's take a look at the main class:
    TrainWagon = namedtuple('TrainWagon', ('dest', 'value'))
    TrainStation = namedtuple('TrainStation', ('name', 'pos', 'wagons'))


    class TrainEmpire:
        def __init__(self, fuel, stations, routes):
            self.fuel = fuel
            self.stations = self._build_stations(stations)
            self.routes = self._build_routes(routes)

        def _build_stations(self, station_lines):
            # ...

        def _build_routes(self, route_lines):
            # ...

        def maximum_route_score(self, route):
            def score(state):
                return sum(w.value for (w, s) in state.wgs
                           if w.dest == s.name)

            def wagon_choices(state, t):
                # ...

            def delivered(state):
                # ...

            def next_states(state):
                # ...

            def backtrack(state):
                # ...

            # ...

        def maximum_score(self):
            return sum(self.maximum_route_score(r) for r in self.routes)
I omitted some code, but we can already see something interesting. The two named tuples help keep our data neat and simple. The main class holds the fuel (the maximum distance a train can run), the routes, and the stations. The maximum_score method sums the score of each route and serves as the interface for solving the problem. So we have:
- A main class that ties the routes and the stations together.
- A station tuple containing a name, a position, and the list of wagons currently there.
- A wagon tuple with a value and a destination station.
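To show how these tuples behave, here is a short sketch that builds one wagon and one station. The concrete values (station A at the origin, a 5-point wagon bound for E) are sample data I made up, and storing the wagons as a tuple is my assumption so that the station stays hashable.

```python
from collections import namedtuple

TrainWagon = namedtuple('TrainWagon', ('dest', 'value'))
TrainStation = namedtuple('TrainStation', ('name', 'pos', 'wagons'))

# Made-up sample data: one wagon worth 5 points that must reach station E
wagon = TrainWagon(dest='E', value=5)
station = TrainStation(name='A', pos=(0, 0), wagons=(wagon,))

# Fields are readable by name, which keeps the later code self-explanatory
print(station.name, station.pos)
print(station.wagons[0].dest, station.wagons[0].value)

# A named tuple with hashable fields is itself hashable,
# so it can be stored in sets and used as a dictionary key
seen = {station}
print(station in seen)
```

Hashability is what later allows whole states to be used as keys in a cache.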
Dynamic Programming
I have tried to explain that the key to dynamic programming is searching the state space efficiently and making optimal decisions based on the states we reach. Our state space is defined by the location of the train, its remaining fuel, and the location of each wagon, so we can already express the initial state.
We must now consider every decision at each station. Should we load a wagon and carry it to its destination? What if we find a more valuable wagon at the next station? Should we bring it back or carry it forward? Or should we move without a wagon at all?
Obviously, the answer to these questions is whichever one gives us the higher score. To find it, we must obtain the values of the previous and next states in all possible scenarios. Of course, we use the score function to compute the value of each state.
    def maximum_score(self):
        return sum(self.maximum_route_score(r) for r in self.routes)

    # s: current station, f: remaining fuel, wgs: (wagon, station) pairs
    State = namedtuple('State', ('s', 'f', 'wgs'))

    wgs = set()
    for s in route.stations:
        for w in s.wagons:
            wgs.add((w, s))
    initial = State(route.start, self.fuel, tuple(wgs))
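To make the score function concrete, the sketch below evaluates two hand-built states. The stations, wagons, and fuel values are hypothetical sample data, and the stations carry empty wagon lists only to keep the example short.

```python
from collections import namedtuple

TrainWagon = namedtuple('TrainWagon', ('dest', 'value'))
TrainStation = namedtuple('TrainStation', ('name', 'pos', 'wagons'))
State = namedtuple('State', ('s', 'f', 'wgs'))


def score(state):
    # A wagon only counts once it sits at the station it is destined for
    return sum(w.value for (w, s) in state.wgs if w.dest == s.name)


# Hypothetical stations and wagons
a = TrainStation('A', (0, 0), ())
b = TrainStation('B', (1, 0), ())
to_b = TrainWagon('B', 5)
to_a = TrainWagon('A', 10)

# Train at A with 3 units of fuel, nothing delivered yet
before = State(s=a, f=3, wgs=((to_b, a), (to_a, b)))
# Train carried to_b from A to B and spent 1 unit of fuel
after = State(s=b, f=2, wgs=((to_b, b), (to_a, b)))

print(score(before), score(after))  # 0 5
```

Note that the score depends only on where the wagons are, not on where the train is or how much fuel is left.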
From each state there are several choices: carry a wagon to the next station, or move there without a wagon. Staying where we are does not produce a new state, because nothing changes. If the current station has several wagons, moving each one of them leads to a different state.
    def wagon_choices(state, t):
        yield state.wgs  # not moving wagons is an option too
        wgs = set(state.wgs)
        other_wagons = {(w, s) for (w, s) in wgs if s != state.s}
        state_wagons = wgs - other_wagons  # wagons at the current station
        for (w, s) in state_wagons:
            parked = state_wagons - {(w, s)}  # wagons left behind
            twgs = other_wagons | parked | {(w, t)}
            yield tuple(twgs)

    def delivered(state):
        return all(w.dest == s.name for (w, s) in state.wgs)

    def next_states(state):
        if delivered(state):
            return
        for s in route.next_stations(state.s):
            f = state.f - route.fuel(state.s, s)
            if f < 0:
                continue  # not enough fuel to reach this station
            for wgs in wagon_choices(state, s):
                yield State(s, f, wgs)
next_states is a generator that takes a state as a parameter and yields all the states that can be reached from it. Note how it stops once every wagon has been delivered to its destination, and how it only yields states for which there is still enough fuel. The wagon_choices function may look a little complicated, but it simply yields every possible arrangement of wagons after moving from the current station to the next one.
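To see what wagon_choices produces, here is a standalone sketch that runs it on a small hand-built state. The wagons are hypothetical, and stations are plain strings instead of TrainStation tuples purely to keep the example short.

```python
from collections import namedtuple

TrainWagon = namedtuple('TrainWagon', ('dest', 'value'))
State = namedtuple('State', ('s', 'f', 'wgs'))


def wagon_choices(state, t):
    yield state.wgs  # move to t without carrying a wagon
    wgs = set(state.wgs)
    other_wagons = {(w, s) for (w, s) in wgs if s != state.s}
    state_wagons = wgs - other_wagons  # wagons parked at the current station
    for (w, s) in state_wagons:
        parked = state_wagons - {(w, s)}  # wagons left behind
        yield tuple(other_wagons | parked | {(w, t)})


# Hypothetical data: two wagons wait at A, one waits at B, the train is at A
to_b = TrainWagon('B', 5)
to_c = TrainWagon('C', 10)
to_a = TrainWagon('A', 3)
state = State(s='A', f=3, wgs=((to_b, 'A'), (to_c, 'A'), (to_a, 'B')))

# Compare choices as sets, because the order inside each tuple is arbitrary
choices = [frozenset(c) for c in wagon_choices(state, 'B')]
for c in choices:
    print(sorted(c))
```

With two wagons at the current station we get three choices: carry nothing, carry the first wagon, or carry the second. The wagon already at B never moves, because the train is not there to pick it up.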
With this, we have everything we need to implement the dynamic programming algorithm. We start searching for our decisions from the initial state, and then select the best strategy. Notice that every decision turns the current state into a different state, so we are designing a recursive algorithm:
- Take a state
- Compute our possible decisions
- Make the optimal decision
Obviously, every next state does the same thing. Our recursive function stops when the fuel is exhausted or when all wagons have been delivered to their destinations.
    max_score = {}  # cache: state -> best final state reachable from it

    def backtrack(state):
        if state.f <= 0:
            return state
        choices = []
        for s in next_states(state):
            if s not in max_score:
                max_score[s] = backtrack(s)
            choices.append(max_score[s])
        if not choices:
            return state
        return max(choices, key=score)

    max_score[initial] = backtrack(initial)
    return score(max_score[initial])
Here is the last trick that completes the dynamic programming strategy: in the code, you can see that I use a max_score dictionary, which caches the result for every state the algorithm visits. This way, we do not traverse the decisions of an already visited state over and over again.
While searching the state space, the train may arrive at a station several times, and some of those arrivals may have the same remaining fuel and the same wagon positions. It does not matter how the train got there; only the decisions made from that point on have an effect. If we compute a state once and save the result, we never need to search that subspace again.
If we did not use this memoization technique, we would perform a large number of identical searches, which usually makes the algorithm too slow to solve the problem efficiently.
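To get a feel for how much work the cache saves, here is a small sketch that counts recursive calls with and without a dictionary cache. It uses the Fibonacci recurrence as a stand-in for a state search, since it has the same overlapping structure; it is not the Train Empire solver, and count_calls is a helper I made up for the comparison.

```python
def count_calls(n, use_cache):
    """Return (result, number of recursive calls) for a toy recurrence."""
    calls = 0
    cache = {}  # plays the role of the max_score dictionary

    def best(k):
        nonlocal calls
        calls += 1
        if k <= 1:
            return k
        if use_cache and k in cache:
            return cache[k]  # this subproblem was already solved
        result = best(k - 1) + best(k - 2)  # two overlapping subproblems
        if use_cache:
            cache[k] = result
        return result

    return best(n), calls


value_plain, calls_plain = count_calls(20, use_cache=False)
value_cached, calls_cached = count_calls(20, use_cache=True)
print(value_plain, calls_plain)
print(value_cached, calls_cached)
```

Both runs return the same value, but the cached run needs only a few dozen calls while the uncached one needs tens of thousands, and the gap grows exponentially with n.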
Summary
Train Empire is an excellent example of how dynamic programming makes optimal decisions on problems with overlapping subproblems. Python's expressiveness lets us implement these ideas easily and write clear, efficient algorithms.
The complete code is in the contest repository.