Scoring
Once an entry graph has been created and expanded, the entries in the graph can be assigned scores in order to evaluate the ‘connectedness’ of each entry in the graph.
For example, to find a consensus molecular structure for a given compound, an entry graph of Compound and MolStructure entries can be created and expanded, after which the MolStructure vertices can be scored to find the most likely structure for that compound.
- class chemrecon.Scorer(score_entry_type: type[Entry], alpha: float = 0.85, decay_factor: float = 0.2, entry_weight: Callable[[Entry], float] | None = None, relation_weight: Callable[[Relation], float] | None = None)
Bases: Generic
A scorer is a callable which takes an entry graph and produces a ranking of its vertices according to the parameters of the scorer.
The score of an entry is (informally) the probability that a random walk starting at one of the initial entries of the entry graph will terminate at that entry. The parameters of the random walk can be customized by specifying weights (probabilities) using a weight function on entries and relations, which alters the probability of choosing a given path. The default weight of all entries and relations is 1. For example, if you do not trust a particular source, edges and vertices from that source can have their weight reduced, making them count less in the scoring algorithm.
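As an illustration of the weighting described above, the following is a minimal sketch of an entry weight function that down-weights entries from a distrusted source. The `FakeEntry` class, its `source` attribute, and the source name `example_db` are hypothetical stand-ins introduced for this example; the real Entry class may expose provenance differently. A function of this shape would be passed as the `entry_weight` argument of the scorer.

```python
from dataclasses import dataclass


@dataclass
class FakeEntry:
    """Hypothetical stand-in for an Entry; the `source` attribute is an assumption."""
    source: str


# Hypothetical set of sources whose entries should count less in scoring.
UNTRUSTED_SOURCES = {"example_db"}


def entry_weight(entry) -> float:
    """Return a reduced weight for untrusted sources, and the default weight 1 otherwise."""
    return 0.25 if entry.source in UNTRUSTED_SOURCES else 1.0


trusted_weight = entry_weight(FakeEntry(source="curated_db"))
untrusted_weight = entry_weight(FakeEntry(source="example_db"))
```

The value 0.25 is arbitrary; any weight below 1 makes paths through those entries less likely to be chosen.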
A damping factor, alpha, can be specified. With probability 1 - alpha, the random walk jumps to a random entry instead of continuing the walk. Furthermore, a decay_factor can be specified such that entries further from the initial entries are given lower scores. A decay factor of 0 disables this adjustment. The defaults are alpha = 0.85 and decay_factor = 0.2.
Scores are normalized such that the sum of scores is 1, which allows comparing scores across entry graphs.
Formally, the scores are computed using the PageRank algorithm (https://en.wikipedia.org/wiki/PageRank), starting from the initial vertices, and with dangling vertices pointing back to all initial vertices with equal probability.
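The computation described above can be sketched with a plain power iteration. This is not the library's implementation: it assumes an unweighted graph given as a dictionary of successor lists, assumes that the restart step lands uniformly on one of the initial vertices, and leaves out the decay adjustment and the entry and relation weights. The function name `pagerank_sketch` and the vertex labels are made up for the example.

```python
def pagerank_sketch(edges, initial, alpha=0.85, iterations=100):
    """Power-iteration PageRank restarting at the initial vertices.

    edges maps a vertex to the list of vertices it points to.
    """
    initial = list(dict.fromkeys(initial))  # drop duplicates, keep order
    vertices = set(edges) | {t for ts in edges.values() for t in ts} | set(initial)
    # Restart distribution: uniform over the initial vertices (an assumption).
    restart = {v: (1.0 / len(initial) if v in initial else 0.0) for v in vertices}
    scores = dict(restart)
    for _ in range(iterations):
        new = {v: (1.0 - alpha) * restart[v] for v in vertices}
        for v in vertices:
            targets = edges.get(v, [])
            if targets:
                # Continue the walk along an outgoing edge chosen uniformly.
                share = alpha * scores[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling vertex: point back to all initial vertices equally.
                for u in initial:
                    new[u] += alpha * scores[v] / len(initial)
        scores = new
    return scores


# One compound linked to two structures; s1 additionally points to s2.
example_edges = {"compound": ["s1", "s2"], "s1": ["s2"]}
example_scores = pagerank_sketch(example_edges, initial=["compound"])
```

Each iteration preserves the total score of 1, so no separate normalization step is needed in this sketch, and `s2` ends up ranked above `s1` because it receives score from both the compound and `s1`.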
- __init__(score_entry_type: type[Entry], alpha: float = 0.85, decay_factor: float = 0.2, entry_weight: Callable[[Entry], float] | None = None, relation_weight: Callable[[Relation], float] | None = None)
Specify a scorer.
- __call__(entrygraph: EntryGraph) -> OrderedDict[T_rank, float]
Produces a ranking of the entries of the type score_entry_type. The result is an OrderedDict, with entries given in descending order of score.
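Because the result is ordered by descending score, the best-scoring entry is simply the first item. The sketch below uses a hand-built OrderedDict with made-up string keys and scores in place of a real scorer result, so it runs without an entry graph; in practice the keys would be entries of the type score_entry_type.

```python
from collections import OrderedDict

# Stand-in for the result of calling a scorer; keys and scores are invented.
ranking = OrderedDict(
    [("structure_a", 0.6), ("structure_b", 0.3), ("structure_c", 0.1)]
)

# The first item is the highest-scoring entry.
best, best_score = next(iter(ranking.items()))

# The two highest-scoring entries, still in descending order.
top_two = list(ranking.items())[:2]
```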