Repository for Running Label Propagation Algorithms on Author Graph Data

This repository contains code for running label propagation algorithms on author graph data.

Requirements

See requirements/requirements.txt for the required packages.

Usage

SciServer

First, convert the anonymized graph data from Elsevier into DataFrames that map auid -> eids and eid -> auids, and place them in the data directory.
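A minimal sketch of building these two mappings with pandas follows. It assumes the raw data can be read as a table of (auid, eid) authorship pairs. The input column names, the `build_mappings` helper, and the toy values are hypothetical, and this README does not document the exact DataFrame layout or file names the loaders in src expect.

```python
from typing import Tuple

import pandas as pd


def build_mappings(pairs: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Build auid -> eids and eid -> auids mappings from (auid, eid) pairs.

    `pairs` is assumed (hypothetically) to have one row per authorship,
    with columns "auid" and "eid".
    """
    pairs = pairs.drop_duplicates()
    # One row per author, holding the list of documents they appear on.
    auid_to_eids = (
        pairs.groupby("auid")["eid"].apply(list).rename("eids").reset_index()
    )
    # One row per document, holding the list of its authors.
    eid_to_auids = (
        pairs.groupby("eid")["auid"].apply(list).rename("auids").reset_index()
    )
    return auid_to_eids, eid_to_auids


# Toy input: author 1 wrote documents 10 and 11; authors 2 and 3 wrote one each.
pairs = pd.DataFrame({"auid": [1, 1, 2, 3], "eid": [10, 11, 10, 11]})
auid_to_eids, eid_to_auids = build_mappings(pairs)
```

The resulting DataFrames can then be written into the data directory in whatever format the runtime's loader reads, for example with DataFrame.to_pickle.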

Then, to run the algorithm:

python -m src.run_algo --runtime sciserver

For other options, see the help message.

python -m src.run_algo --help

Elsevier

To run the algorithm:

python -m src.run_algo --runtime elsevier

For other options, see the help message.

python -m src.run_algo --help

Implementing a runtime

The algorithm works in three phases for each year:

1. Get the prior data for that year.
2. Run the label propagation algorithm.
3. Update the posterior data for that year.
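To illustrate what step 2 can look like, here is one common formulation, iterative label spreading with a prior: the adjacency matrix is row-normalized, and each node's score repeatedly mixes its neighbors' average score with its own prior. This is a hedged sketch only, not necessarily the update used by src.run_algo; the `propagate` name and the `alpha`, `n_iter`, and `tol` parameters are assumptions.

```python
import numpy as np
import scipy.sparse as sp


def propagate(adj, prior: np.ndarray, alpha: float = 0.5,
              n_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Iterate y <- alpha * P @ y + (1 - alpha) * prior until convergence.

    P is the row-normalized adjacency matrix, so each node averages its
    neighbors' current scores before mixing in its own prior.
    """
    adj = sp.csr_matrix(adj, dtype=float)
    degree = np.asarray(adj.sum(axis=1)).ravel()
    # Avoid division by zero for isolated nodes; their rows of P stay all-zero,
    # so they converge to (1 - alpha) * prior.
    inv_degree = np.divide(1.0, degree, out=np.zeros_like(degree), where=degree > 0)
    P = sp.diags(inv_degree) @ adj
    y = prior.astype(float).copy()
    for _ in range(n_iter):
        y_next = alpha * (P @ y) + (1 - alpha) * prior
        # Stop once no score changes by more than `tol`.
        if np.abs(y_next - y).max() < tol:
            return y_next
        y = y_next
    return y
```

With 0 < alpha < 1 the iteration is a contraction, so it converges to the solution of (I - alpha P) y = (1 - alpha) prior regardless of the starting point.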

A runtime backend therefore needs to implement steps 1 and 3; step 2 is shared.
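The division of labor between the shared algorithm and a runtime can be sketched as a per-year loop. This is a hedged illustration of the control flow described above, not the actual code in src.run_algo; the `run_year` name and the injected `propagate` callable are hypothetical.

```python
import logging
from typing import Callable, Iterable, Tuple

import numpy as np


def run_year(
    year: int,
    get_data: Callable[[int, logging.Logger], Iterable[Tuple[object, np.ndarray, np.ndarray]]],
    propagate: Callable[[object, np.ndarray], np.ndarray],
    update_posterior: Callable[[np.ndarray, np.ndarray, int, logging.Logger], None],
    logger: logging.Logger,
) -> int:
    """Run the three phases for one year; return the number of pieces processed."""
    n_pieces = 0
    # Phase 1 (runtime-specific): yields one (adjacency, auids, prior) per piece.
    for adjacency, auids, prior in get_data(year, logger):
        # Phase 2 (shared): propagate the prior over this piece of the graph.
        posterior = propagate(adjacency, prior)
        # Phase 3 (runtime-specific): persist the posterior for these auids.
        update_posterior(auids, posterior, year, logger)
        n_pieces += 1
    logger.info("Year %d: processed %d piece(s)", year, n_pieces)
    return n_pieces
```

Because the loop only depends on the two function signatures, swapping runtimes (SciServer, Elsevier, or a new one) does not touch the propagation step.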

Specifically, you need to implement the following two functions:

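import logging
from typing import Iterable, Tuple, Union
import numpy as np
import scipy.sparse as sp

MaybeSparseMatrix = Union[np.ndarray, sp.spmatrix]

def get_data(
    year: int,
    logger: logging.Logger,
) -> Iterable[Tuple[MaybeSparseMatrix, np.ndarray, np.ndarray]]:
    ...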

This function accepts a year and a logger and returns an iterable of tuples, each containing:

The adjacency matrix for a piece of the graph
The auids for that piece
The prior for those auids

The tuples are wrapped in an iterable because the graph may be disconnected, and you may want to process it in pieces. You can also process the entire graph at once, in which case the iterable contains only one element.
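A minimal sketch of a get_data that handles a disconnected graph by yielding one tuple per connected component, using scipy.sparse.csgraph.connected_components. The `make_get_data` factory and its in-memory inputs are hypothetical stand-ins: a real runtime would load the graph and priors for the requested year from its own storage.

```python
import logging
from typing import Callable, Iterable, Iterator, Tuple, Union

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components

MaybeSparseMatrix = Union[np.ndarray, sp.spmatrix]
Piece = Tuple[MaybeSparseMatrix, np.ndarray, np.ndarray]


def make_get_data(
    adjacency: MaybeSparseMatrix, auids: np.ndarray, prior: np.ndarray
) -> Callable[[int, logging.Logger], Iterable[Piece]]:
    """Wrap one in-memory graph (hypothetical stand-in for a real loader)."""
    adjacency = sp.csr_matrix(adjacency)

    def get_data(year: int, logger: logging.Logger) -> Iterator[Piece]:
        # A real runtime would load the graph for `year`; this sketch ignores it.
        n_comp, labels = connected_components(adjacency, directed=False)
        logger.info("Year %d: %d connected component(s)", year, n_comp)
        for comp in range(n_comp):
            idx = np.flatnonzero(labels == comp)
            # Slice rows and columns so each piece is a self-contained subgraph
            # whose rows line up with its auids and prior.
            yield adjacency[idx][:, idx], auids[idx], prior[idx]

    return get_data
```

Since no edges cross component boundaries, propagating over each piece separately gives the same result as propagating over the whole graph, while keeping each matrix small.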

The second function you need to implement is:

def update_posterior(
    auids: np.ndarray,
    posterior_y_value: np.ndarray,
    year: int,
    logger: logging.Logger,
) -> None:
    ...

This function accepts the auids, their posterior values (posterior_y_value), the year, and a logger, and updates the stored posterior values for that year. Note that if you process the graph in disconnected pieces, this function is called once per piece, so it will update the same file multiple times.
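Because of this, an implementation should add to the year's output rather than overwrite it. A minimal sketch, assuming a hypothetical layout of one CSV file per year in an output directory and a 1-D posterior array; the `make_update_posterior` factory and the `posterior_{year}.csv` file name are illustrative, not the layout used by the existing runtimes.

```python
import csv
import logging
from pathlib import Path
from typing import Callable

import numpy as np


def make_update_posterior(
    out_dir: Path,
) -> Callable[[np.ndarray, np.ndarray, int, logging.Logger], None]:
    """Return an update_posterior that appends to one CSV file per year."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def update_posterior(
        auids: np.ndarray,
        posterior_y_value: np.ndarray,
        year: int,
        logger: logging.Logger,
    ) -> None:
        path = out_dir / f"posterior_{year}.csv"  # hypothetical file layout
        write_header = not path.exists()
        # Append rather than overwrite: with a disconnected graph this is called
        # once per piece, and every call targets the same year's file.
        with path.open("a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(["auid", "posterior"])
            writer.writerows(zip(auids.tolist(), posterior_y_value.tolist()))
        logger.info("Wrote %d posterior(s) to %s", len(auids), path)

    return update_posterior
```

One trade-off of appending: rerunning the same year would add duplicate rows, so a real backend might clear the year's file before the first piece or key its writes by auid.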

TODO

Finish tests for sciserver.py
Add SocNL
