Quantitative Algorithm for solving the TSP problem
Matheus Santana Lima authored and Matheus Santana Lima committed May 13, 2021
1 parent cc27e3d commit 106ccbc
Showing 13 changed files with 1,911 additions and 0 deletions.
6 changes: 6 additions & 0 deletions python/qa/Makefile
@@ -0,0 +1,6 @@
TSP_FILE_PATH=a280.tsp
TSP_FILE_PATH=small_test5.tsp


all:
	python TSP-simulated-annealing.py $(TSP_FILE_PATH)
109 changes: 109 additions & 0 deletions python/qa/README.md
@@ -0,0 +1,109 @@
Matheus Lima (Computer Science, Federal University of Sao Carlos, Brazil)

(1) To run:

$ ./run.sh

(2) To change the path of the TSP data file: change the path in the Makefile.


# Quantitative Algorithm to solve the TSP problem

The proposed Quantitative Algorithm is based on elements of Information Theory and Kolmogorov Complexity. The algorithm is described in more detail in the paper "Information theory inspired optimization algorithm for efficient service orchestration in distributed systems" [https://doi.org/10.1371/journal.pone.0242285].

Its method is based on the assumption that the output of a given utility function, such as the Euclidean distance, can be modeled by a probability density function defined by a log-normal distribution. The entropy can therefore be calculated, and the level of state-output uncertainty about the best candidate string can be measured by the Shannon entropy function. The amount of information encoded by each sequence of a random candidate solution is defined by the Kolmogorov complexity, with a mean and standard deviation. There is shared information between the total cost distance (utility function) and the solution encoded in the string that forms any random valid candidate.

This metaheuristic method proposes an alternative view of the TSP. The salesman needs to evaluate, from a random set of candidate solutions, where the information "at hand" could lead to the optimal solution with a given probability of success and failure. The salesman needs to "place a bet" on whether a given solution will lead to the optimal solution. Unless there is inside information, the salesman cannot know with certainty which candidate route will reach the optimal value. If the right candidate is chosen, it will generate the path to the global optimum; if the wrong path is chosen, the salesman will be destined to find only a local optimum, as in hill climbing.

This problem is similar to the Gambler's Ruin problem in statistics, where a player of a game with negative expected value will eventually go broke. In probability theory, the optimal bet size can be calculated by the Kelly criterion, which is found by maximizing the expected value of the logarithm of wealth. This method provides the fraction of wealth (fractional betting) a given player must place at each bet to optimize the rate of wealth return and avoid the risk of ruin in the long run.
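As a minimal sketch of the Kelly fraction in the form this README uses later (`prob_sucess - ((1-prob_sucess)/improvement_ratio)`), the function below takes a win probability and a payoff ratio. The names `kelly_fraction`, `p_win` and `odds` are illustrative and do not come from the repository.

```python
def kelly_fraction(p_win: float, odds: float) -> float:
    """Return f = p - (1 - p) / b for win probability p and payoff ratio b."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must lie in [0, 1]")
    if odds <= 0.0:
        raise ValueError("odds must be positive")
    return p_win - (1.0 - p_win) / odds


# A 60% chance of winning an even-money bet suggests staking about 20% of wealth.
print(kelly_fraction(0.6, 1.0))
# A fair coin at even money gives a fraction of zero: no bet is worth placing.
print(kelly_fraction(0.5, 1.0))
```

A non-positive fraction means the bet has no edge, which is the situation the acceptance rule below reaches once the success probability has decayed far enough.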

Algorithms such as Genetic Algorithms, Ant Colony Optimization and Neural Networks are based on nature-inspired models that replicate the mechanics of living organisms or, as in Simulated Annealing, the decay of a temperature-dependent function (gradient) of metals, in order to cope with the computational requirements imposed by NP complexity. The Quantitative Algorithm defined here is a stochastic process that simulates the amount of entropy in a Bernoulli process and is the smallest encoded program that solves the TSP and halts, for a given degree of freedom and a Turing-Shannon machine. The proposed method does not depend on heuristics that are biased towards a given encoding schema and is therefore more efficient, as it requires fewer code operations and string permutations to produce near-optimal results with the same or better quality per unit of computing time than other traditional algorithms.

Similarly to the Metropolis acceptance criterion in Simulated Annealing, the Quantitative Algorithm occasionally accepts solutions of worse quality, with a decaying probability schema. At each iteration the probability of success is decreased by a given rate parameter. The Kelly fraction is then used to measure the amount of useful encoded information for the simulated function state. The ratio of quality improvement is defined as the expected cost optimization value returned for a sample distribution. In the TSP this is the average reduction in the cost distance variable (from an array of candidate solutions). If solution A has cost c(A)=10 and alternative solution B has cost c(B)=100, then B is ten times as expensive as A (equivalently, A is 90% cheaper than B). For example, the sequence of costs C=[10,9,8,2,3,2,3,4] has mean 5.125 and population variance 9.609375 (standard deviation ≈ 3.10).
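The figures in this paragraph can be checked with Python's standard `statistics` module. This is only a verification sketch; the helper name `relative_increase` is illustrative. Note that 9.609375 is what the population variance of the sequence evaluates to, and its square root is the standard deviation.

```python
import statistics

costs = [10, 9, 8, 2, 3, 2, 3, 4]

mean = statistics.fmean(costs)          # arithmetic mean of the sequence
variance = statistics.pvariance(costs)  # population variance (divides by n)
std_dev = statistics.pstdev(costs)      # population standard deviation


def relative_increase(cost_a: float, cost_b: float) -> float:
    """How much more expensive B is than A, as a fraction of A's cost."""
    return (cost_b - cost_a) / cost_a


print(mean, variance, round(std_dev, 4))
# B at cost 100 is 9 times (900%) more expensive than A at cost 10 ...
print(relative_increase(10, 100))
# ... while A is 90% cheaper than B.
print(1 - 10 / 100)
```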

As the QA iterates, the entropy is proportionally reduced as the machine evaluates the search space and the probability function decays over time. Candidate solutions are generated by swapping two randomly chosen symbols in a model sequence. Each symbol encodes the Euclidean coordinates of a node in a 2D graph. The utility function is defined as the Euclidean distance between m nodes. The salesman wants to reduce the total distance of a given path as much as possible and find the shortest route. The method iterates sequentially until the maximum number of iterations is reached. If the cost of the alternative random solution is smaller (better) than that of the current best known solution, the best known state is updated with the values of the alternative state. If, however, the alternative solution produces a larger total distance, it is accepted only according to the binary output of the simulated Kelly criterion method. If that output is True, the alternative with negative gain is temporarily accepted as the best known solution. This approach allows variability at the beginning of the execution and gradually reduces the probability of acceptance, while still occasionally accepting worse alternatives to avoid getting locked in a local minimum.
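The two building blocks described here, the Euclidean utility function and the two-symbol swap, might look like the sketch below. It assumes a closed tour (the salesman returns to the starting node) and represents a candidate as a list of node indices; the names `tour_length` and `swap_two_nodes` are hypothetical and not taken from the repository.

```python
import math
import random


def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over 2D coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def swap_two_nodes(tour, rng):
    """Return a copy of the tour with two randomly chosen positions swapped."""
    i, j = rng.sample(range(len(tour)), 2)
    candidate = list(tour)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate


# Four corners of a 4 x 3 rectangle, visited in order around the perimeter.
coords = [(0, 0), (0, 3), (4, 3), (4, 0)]
tour = [0, 1, 2, 3]
print(tour_length(tour, coords))  # perimeter: 3 + 4 + 3 + 4 = 14.0

rng = random.Random(42)
print(swap_two_nodes(tour, rng))
```

Returning a copy rather than swapping in place keeps the current best tour intact when a candidate is rejected.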

The pseudocode is shown below:

```
# set parameters
prob_sucess = 1.0
prob_loss = 1 - prob_sucess
decay_rate = 0.001
improvement_ratio = average_solution_cost_quality_improvement / average_solution_cost_quality_loss
f_kelly = prob_sucess - ((1-prob_sucess)/improvement_ratio)
# generate first random solution
best_solution = generate_random_solution()
best_solution_cost = getCost(best_solution)
# runs until max iteration parameter is reached
for i in (0..N):
    # generate new random solution
    new_solution = generate_random_solution()
    new_solution_cost = getCost(new_solution)
    # if a better solution is found, accept it and keep iterating
    if new_solution_cost < best_solution_cost:
        best_solution = new_solution
        best_solution_cost = new_solution_cost
    else:
        # Simulate Entropy Decay
        # get a random index number _val set by a normal distribution between 0 and 1
        _val = generate_a_value_random(0,1)
        # simulate the uncertainty level with the kelly criterion
        f_kelly = prob_sucess - ((1-prob_sucess)/improvement_ratio)
        # control function: bernoulli process to avoid local minima solutions in the long run
        bernoulli_sequence = generate_bernoulli_sequence(p_true=1/100)
        # if index value is less than the kelly percentage or the control function fires
        if (_val < f_kelly) or bernoulli_sequence == 1:
            # temporarily accept candidate solution with worse quality
            best_solution = new_solution
            best_solution_cost = new_solution_cost
    # decrease the probability of success at every iteration
    prob_sucess = prob_sucess * (1 - decay_rate)
```
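A runnable sketch of this loop is given below. It is an interpretation and not the repository's implementation, and it makes several assumptions: the success probability decays multiplicatively, the random draw is uniform on [0, 1), `improvement_ratio` is passed in as a constant, candidates come from a caller-supplied `neighbor` function, and the best tour seen so far is tracked separately from the currently accepted one. All function and parameter names are illustrative.

```python
import math
import random


def quantitative_search(initial, cost, neighbor, n_iter=3000, decay_rate=0.001,
                        improvement_ratio=1.0, p_control=0.01, seed=0):
    """Kelly-style acceptance loop; returns (best_state, best_cost)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    prob_success = 1.0
    for _ in range(n_iter):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:
            accept = True
        else:
            # Kelly fraction, in the same form as the pseudocode above.
            f_kelly = prob_success - (1.0 - prob_success) / improvement_ratio
            # Control function: a Bernoulli draw that fires about 1 time in 100.
            control = rng.random() < p_control
            accept = rng.random() < f_kelly or control
        if accept:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        prob_success *= 1.0 - decay_rate  # assumption: multiplicative decay
    return best, best_cost


def swap_two(tour, rng):
    """Neighbor move: swap two random positions and return a new list."""
    i, j = rng.sample(range(len(tour)), 2)
    new = list(tour)
    new[i], new[j] = new[j], new[i]
    return new


# Toy instance with six nodes (made-up coordinates, for illustration only).
coords = [(0, 0), (4, 3), (0, 3), (4, 0), (2, 5), (2, -2)]


def length(tour):
    """Closed-tour Euclidean length over the toy coordinates."""
    return sum(math.dist(coords[a], coords[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


start = list(range(len(coords)))
best, best_cost = quantitative_search(start, length, swap_two, n_iter=500)
print(best, round(best_cost, 3))
```

Tracking `best` separately means a temporarily accepted worse candidate never overwrites the best tour found, which is one way to read "temporarily accepted" in the description above.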

The Quantitative Algorithm (QA) is implemented in Python and its output is recorded. A simulation was run for a TSP instance with 280 nodes and a maximum of 3000 iterations. The performance was compared to a benchmark heuristic, the Simulated Annealing algorithm (SA). The trial size is n=100, split between QA=50 and SA=50, for a two-tailed t-test for two independent means. The average initial solution cost is 34012.854. The average best cost is 30036.7 for QA and 31344.41 for SA, with standard deviations of 631.6520 for QA and 410.536 for SA. QA therefore improves on the SA benchmark by 4.353% (the difference in means relative to the QA mean).

The 50 best-candidate solutions produced by QA had significantly lower cost (smaller total route distance in a 2D graph) than the 50 near-optimal solutions in the control (benchmark) group SA. The magnitude of the t-value is 12.27454 and the p-value is < .00001, so the result is significant at p < .05.

# Score Calculations

## Quantitative Algorithm
```
N1: 50
df1 = N1 - 1 = 50 - 1 = 49
M1: 30036.7
SS1: 19550233.73
s21 = SS1/(N1 - 1) = 19550233.73/(50-1) = 398984.36
```

## Simulated Annealing
```
N2: 50
df2 = N2 - 1 = 50 - 1 = 49
M2: 31344.41
SS2: 8258472.24
s22 = SS2/(N2 - 1) = 8258472.24/(50-1) = 168540.25
```

## T-value Calculation
```
s2p = ((df1/(df1 + df2)) * s21) + ((df2/(df1 + df2)) * s22) = ((49/98) * 398984.36) + ((49/98) * 168540.25) = 283762.31
s2M1 = s2p/N1 = 283762.31/50 = 5675.25
s2M2 = s2p/N2 = 283762.31/50 = 5675.25
t = (M1 - M2)/√(s2M1 + s2M2) = -1307.71/√11350.49 = -12.27
```
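The calculation above can be reproduced from the reported summary statistics alone. The sketch below uses only the standard library and the pooled-variance formula shown in this section; the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(n1, m1, ss1, n2, m2, ss2):
    """Two-sample t statistic with pooled variance, from sums of squares."""
    df1, df2 = n1 - 1, n2 - 1
    var1, var2 = ss1 / df1, ss2 / df2                # s21 and s22
    pooled = (df1 * var1 + df2 * var2) / (df1 + df2)  # s2p
    std_err = math.sqrt(pooled / n1 + pooled / n2)    # sqrt(s2M1 + s2M2)
    return (m1 - m2) / std_err, df1 + df2


# QA is group 1 and SA is group 2, using the values reported above.
t_value, dof = pooled_t(50, 30036.7, 19550233.73, 50, 31344.41, 8258472.24)
print(round(t_value, 2), dof)

# Difference in means relative to the QA mean, as a percentage.
print((31344.41 - 30036.7) / 30036.7 * 100)
```

The negative sign of t only reflects the order of subtraction (QA minus SA); a lower cost is better, so it favors QA.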