Commit c4ff52a

Cosmetics in chapter 8
1 parent 7353680 commit c4ff52a

1 file changed

Lines changed: 141 additions & 52 deletions

Chapters/Chapter8/chapter8.md

@@ -1,61 +1,150 @@
1-
## Link analysisLink analysis is a technique used to evaluate relationships between nodes. Link analysis is used on several fields,such as search engines, fraud detection, among others. There is several algorithms of different kinds to perform link analysis.Here we are only going to focus on the Hyperlink-Induced Topic Search \(HITS\) algorithm.This algorithm was originally developed to rate web pages. But, nowadays modern search engines do not usethis algorithm since there is more advanced techniques. HITS has been also used to identify the important classes that should becommented in a large software system or the classes that a developer should read to get an insight of the key classes.### Hyperlink-Induced Topic Search \(HITS\) algorithmHyperlink-Induced Topic Search \(HITS\) algorithm, also knows as Hubs and Authorities,is an algorithm that rates every the nodes of a graph. Every node has a hub and a authority score. A hub is a node that may notbe relevant but references relevant nodes. An authority is a node that contains relevant information.The algorithm does the following:1. Assign to each node a hub and an authority score equal to 1.1. Run the authority update rule for each node.1. Run the hub update rule for each.1. Normalize the values by dividing each Hub score by the square root of the sum of the squares of all Hub scores, and dividing each Authority score by the square root of the sum of the squares of all Authority scores.1. Repeat from the second step as necessary.The update rules are simple:**Authority update rule**Update each node's authority score to be equal to the sum of the hub scores of each node that points to it.**Hub update rule**Update each node's hub score to be equal to the sum of the authority scores of each node that it points to.### HITS implementationThe Pharo implementation is as follows. The `k` number is the number of times that the scores are going to beupdated. 
The default value is `20` but it can also be set manually.```AIHits >> run
2-
3-
self initializeNodes.
4-
k timesRepeat: [
5-
nodes do: [ :node | self computeAuthoritiesFor: node ].
6-
nodes do: [ :node | self computeHubsFor: node ].
7-
self normalizeScores ].
8-
^ nodes``````AIHits >> initializeNodes
9-
10-
"Here we are using float instead of int because of the normalization."
11-
nodes do: [ :n |
12-
n auth: 1.0.
13-
n hub: 1.0 ]``````AIHits >> computeAuthoritiesFor: aNode
14-
15-
aNode auth:
16-
(aNode incomingNodes
17-
inject: 0
18-
into: [ :sum :node | sum + node hub ])``````AIHits >> computeHubsFor: aNode
19-
20-
aNode hub:
21-
(aNode adjacentNodes
22-
inject: 0
23-
into: [ :sum :node | sum + node auth ])``````AIHits >> normalizeScores
24-
25-
| authNorm hubNorm |
26-
authNorm := 0.
27-
hubNorm := 0.
28-
29-
nodes do: [ :node |
30-
authNorm := authNorm + node auth squared.
31-
hubNorm := hubNorm + node hub squared ].
32-
33-
authNorm := authNorm sqrt.
34-
hubNorm := hubNorm sqrt.
35-
36-
"To avoid dividing by 0"
37-
authNorm = 0 ifTrue: [ authNorm := 1.0 ].
38-
hubNorm = 0 ifTrue: [ hubNorm := 1.0 ].
39-
40-
nodes do: [ :n |
41-
n auth: n auth / authNorm.
42-
n hub: n hub / hubNorm ]```### Case studyHere we calculate the hubs and authorities scores for all the nodes of the graph shown in Figure *@hits@* with 3 iterations.![A graph to play with the HITS algorithm.](figures/hits.pdf width=30&label=hits)```nodes := #( 'A' 'B' 'C' 'D' ).
1+
## Link Analysis
2+
3+
Link analysis is a technique used to evaluate relationships between nodes. It is used in several fields, such as search engines and fraud detection, among others. There are several algorithms of different kinds for performing link analysis.
4+
Here we focus only on the Hyperlink-Induced Topic Search (HITS) algorithm.
5+
6+
This algorithm was originally developed to rate web pages. Modern search engines no longer use it, since more advanced techniques exist. HITS has also been used to identify the important classes that should be commented in a large software system, or the classes that a developer should read to gain insight into the key parts of a system.
7+
8+
### Hyperlink-Induced Topic Search (HITS) algorithm
9+
10+
The Hyperlink-Induced Topic Search (HITS) algorithm, also known as *Hubs and Authorities*,
11+
is an algorithm that rates every node of a graph. Every node has a hub score and an authority score. A **hub** is a node that may not be relevant itself but references relevant nodes. An **authority** is a node that contains relevant information.
12+
13+
The algorithm does the following:
14+
15+
1. Assign to each node a hub and an authority score equal to $1$.
16+
2. Run the authority update rule for each node.
17+
3. Run the hub update rule for each node.
18+
4. Normalize the values by dividing each hub score by the square root of the sum of the squares of all hub scores, and dividing each authority score by the square root of the sum of the squares of all authority scores.
19+
5. Repeat from the second step as necessary.
20+
21+
The update rules are simple:
22+
23+
- **Authority update rule**: Update each node's authority score to be equal to the sum of the hub scores of each node that points to it.
24+
25+
- **Hub update rule**: Update each node's hub score to be equal to the sum of the authority scores of each node that it points to.
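
In symbols, the two rules and the normalization step can be written as follows. The notation is an assumption introduced here for illustration: $a(p)$ and $h(p)$ denote the authority and hub scores of node $p$, $E$ is the set of edges, and $(q, p) \in E$ means that $q$ points to $p$.

$$
a(p) \leftarrow \sum_{(q,p) \in E} h(q), \qquad
h(p) \leftarrow \sum_{(p,q) \in E} a(q)
$$

$$
a(p) \leftarrow \frac{a(p)}{\sqrt{\sum_{r} a(r)^2}}, \qquad
h(p) \leftarrow \frac{h(p)}{\sqrt{\sum_{r} h(r)^2}}
$$

Because the authority rule runs first, the hub rule in the same iteration uses the freshly updated authority scores.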
26+
27+
### HITS implementation
28+
29+
The Pharo implementation is as follows. The parameter `k` is the number of times the scores are updated. Its default value is $20$, but it can also be set manually.
30+
31+
```
32+
AIHits >> run
33+
34+
self initializeNodes.
35+
k timesRepeat: [
36+
nodes do: [ :node | self computeAuthoritiesFor: node ].
37+
nodes do: [ :node | self computeHubsFor: node ].
38+
self normalizeScores ].
39+
^ nodes
40+
```
41+
42+
```
43+
AIHits >> initializeNodes
44+
45+
"Here we are using float instead of int because of the normalization."
46+
nodes do: [ :n |
47+
n auth: 1.0.
48+
n hub: 1.0 ]
49+
```
50+
51+
```
52+
AIHits >> computeAuthoritiesFor: aNode
53+
54+
aNode auth:
55+
(aNode incomingNodes
56+
inject: 0
57+
into: [ :sum :node | sum + node hub ])
58+
```
59+
60+
```
61+
AIHits >> computeHubsFor: aNode
62+
63+
aNode hub:
64+
(aNode adjacentNodes
65+
inject: 0
66+
into: [ :sum :node | sum + node auth ])
67+
```
68+
69+
```
70+
AIHits >> normalizeScores
71+
72+
| authNorm hubNorm |
73+
authNorm := 0.
74+
hubNorm := 0.
75+
76+
nodes do: [ :node |
77+
authNorm := authNorm + node auth squared.
78+
hubNorm := hubNorm + node hub squared ].
79+
80+
authNorm := authNorm sqrt.
81+
hubNorm := hubNorm sqrt.
82+
83+
"To avoid dividing by 0"
84+
authNorm = 0 ifTrue: [ authNorm := 1.0 ].
85+
hubNorm = 0 ifTrue: [ hubNorm := 1.0 ].
86+
87+
nodes do: [ :n |
88+
n auth: n auth / authNorm.
89+
n hub: n hub / hubNorm ]
90+
```
91+
92+
### Case study
93+
94+
Here we calculate the hub and authority scores for all the nodes of the graph shown in Figure *@hits@* with three iterations.
95+
96+
![A graph to play with the HITS algorithm.](figures/hits.pdf width=30&label=hits)
97+
98+
```
99+
nodes := #( 'A' 'B' 'C' 'D' ).
43100
edges := #( #( 'A' 'B' ) #( 'A' 'C' ) #( 'A' 'D' ) #( 'B' 'C' )
44101
#( 'B' 'D' ) #( 'C' 'A' ) #( 'C' 'D' ) #( 'D' 'D' ) ).
45102
hits := AIHits new.
46103
hits
47-
nodes: nodes;
48-
edges: edges from: #first to: #second;
104+
nodes: nodes;
105+
edges: edges from: #first to: #second;
49106
k: 3.
50-
nodes := hits run```If we inspect the nodes, these are the scores calculated after 3 iterations.```('A' auth: 0.17 hub: 0.65)
107+
nodes := hits run
108+
```
109+
110+
If we inspect the nodes, these are the scores calculated after three iterations.
111+
112+
```
113+
('A' auth: 0.17 hub: 0.65)
51114
('B' auth: 0.27 hub: 0.54)
52115
('C' auth: 0.49 hub: 0.41)
53-
('D' auth: 0.81 hub: 0.34)```### Weighted HITSThere are cases where the Hits algorithm does not behave as expected and sometimes the Hits algorithm puts 0 as valuesfor the hubs and authorities. Using weights in a graph helps in obtaining better results. Establishing the weights is aresponsibility of the user.For more information, you can read these papers:- _Modifications of Kleinberg's HITS Algorithm Using Matrix Exponentiation and Web Log Records_ by Miller et al. % ${cite:Mill01a}$- _An Improved Weighted HITS Algorithm Based on Similarity andPopularity_ by Zhang et al. % ${cite:Zhan07a}$In terms of implementation, it is only necessary to multiply the weights with the scores in each iteration.That means changing `computeAuthoritiesFor:` and `computeHubsFor:` methods.This is done in `AIWeightedHits` class.```AIWeightedHits >> computeAuthoritiesFor: aNode
116+
('D' auth: 0.81 hub: 0.34)
117+
```
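
To check these numbers independently of `AIHits`, the same three iterations can be replayed in a Playground using only plain collections. This is a minimal sketch, not the library's implementation: the temporaries `names`, `auth`, `hub` and `norm` are names chosen here for illustration, and the zero-norm guard is omitted because every node of this graph has at least one incoming and one outgoing edge.

```
| names edges auth hub norm |
names := #( 'A' 'B' 'C' 'D' ).
edges := #( #( 'A' 'B' ) #( 'A' 'C' ) #( 'A' 'D' ) #( 'B' 'C' )
	#( 'B' 'D' ) #( 'C' 'A' ) #( 'C' 'D' ) #( 'D' 'D' ) ).

"Every node starts with hub and authority scores of 1.0."
auth := Dictionary new.
hub := Dictionary new.
names do: [ :n | auth at: n put: 1.0. hub at: n put: 1.0 ].

3 timesRepeat: [
	"Authority rule: sum the hub scores of the nodes pointing to n."
	names do: [ :n |
		auth at: n put: ((edges select: [ :e | e second = n ])
			inject: 0
			into: [ :sum :e | sum + (hub at: e first) ]) ].
	"Hub rule: sum the (already updated) authority scores of the nodes n points to."
	names do: [ :n |
		hub at: n put: ((edges select: [ :e | e first = n ])
			inject: 0
			into: [ :sum :e | sum + (auth at: e second) ]) ].
	"Normalize each score vector by its Euclidean norm."
	norm := (auth values inject: 0 into: [ :sum :v | sum + v squared ]) sqrt.
	names do: [ :n | auth at: n put: (auth at: n) / norm ].
	norm := (hub values inject: 0 into: [ :sum :v | sum + v squared ]) sqrt.
	names do: [ :n | hub at: n put: (hub at: n) / norm ] ].

"Answer each node name with its rounded authority and hub scores."
names collect: [ :n |
	n -> { (auth at: n) round: 2. (hub at: n) round: 2 } ]
```

Rounded to two decimals, the result should agree with the scores listed above. After the first iteration, for instance, the unnormalized authority scores are 1, 1, 2 and 4, and the unnormalized hub scores are 7, 6, 5 and 4.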
118+
119+
### Weighted HITS
120+
121+
There are cases where the HITS algorithm does not behave as expected, and it sometimes assigns 0 to the hub and authority scores. Using weights in the graph helps to obtain better results. Establishing the weights is the responsibility of the user.
122+
123+
For more information, you can read these papers:
124+
125+
- _Modifications of Kleinberg's HITS Algorithm Using Matrix Exponentiation and Web Log Records_, by Miller et al. (2001)
126+
- *An Improved Weighted HITS Algorithm Based on Similarity and Popularity*, by Zhang et al. (2007)
127+
128+
In terms of implementation, it is only necessary to multiply the scores by the edge weights in each iteration.
129+
That means changing the `computeAuthoritiesFor:` and `computeHubsFor:` methods.
130+
This is done in the `AIWeightedHits` class.
131+
132+
```
133+
AIWeightedHits >> computeAuthoritiesFor: aNode
134+
135+
aNode auth: (aNode incomingEdges
136+
inject: 0
137+
into: [ :sum :edge | sum + (edge weight * edge from hub) ])
138+
```
139+
140+
```
141+
AIWeightedHits >> computeHubsFor: aNode
142+
143+
aNode hub: (aNode outgoingEdges
144+
inject: 0
145+
into: [ :sum :edge | sum + (edge weight * edge to auth) ])
146+
```
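
The two weighted methods can be summarized in symbols. The notation is an assumption introduced here for illustration: $a(p)$ and $h(p)$ are the authority and hub scores of node $p$, $E$ is the set of edges, and $w_{qp}$ is the weight of the edge from $q$ to $p$.

$$
a(p) \leftarrow \sum_{(q,p) \in E} w_{qp} \, h(q), \qquad
h(p) \leftarrow \sum_{(p,q) \in E} w_{pq} \, a(q)
$$

When every weight equals $1$, these rules reduce to the unweighted authority and hub update rules.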
54147

55-
aNode auth: (aNode incomingEdges
56-
inject: 0
57-
into: [ :sum :edge | sum + (edge weight * edge from hub) ])``````AIWeightedHits >> computeHubsFor: aNode
148+
### Conclusion
58149

59-
aNode hub: (aNode outgoingEdges
60-
inject: 0
61-
into: [ :sum :edge | sum + (edge weight * edge to auth) ])```### ConclusionEven if the HITS algorithm is not used anymore in the modern search engines, it is a very good algorithm forhaving a first look on how to classify links according to their relevance in the network.
150+
Even though the HITS algorithm is no longer used in modern search engines, it is a very good algorithm for getting a first look at how to classify links according to their relevance in the network.
