Expanding Entry Graphs

Exploring Entry Graphs

A newly created entry graph contains only the initial entries it was given. To expand the graph, the database needs to be explored. This is done with the explore function.

chemrecon.explore(entrygraph: EntryGraph, protocol: ExplorationProtocol, steps: int = 4)

Expand the given entry graph by traversing the database network using the specified protocol for a given number of steps.

For many use cases, calling explore once is enough. Specialized workflows may require calling it multiple times, for example performing one step of exploration at a time until some condition is met, or applying different exploration protocols in sequence.
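The step-at-a-time pattern described above can be sketched as a small helper. This is an illustration only: explore_until, done, max_steps and fake_explore are hypothetical names that are not part of ChemRecon, and the sketch assumes that the explore function expands the graph in place and accepts a steps keyword as in the signature shown above.

```python
from typing import Any, Callable


def explore_until(
    graph: Any,
    protocol: Any,
    explore_fn: Callable[..., Any],
    done: Callable[[Any], bool],
    max_steps: int = 10,
) -> int:
    """Explore one step at a time until done(graph) is true.

    Returns the number of single-step calls that were made.
    """
    steps_taken = 0
    while steps_taken < max_steps and not done(graph):
        # Assumption: explore_fn expands `graph` in place, following the
        # documented explore(entrygraph, protocol, steps=...) signature.
        explore_fn(graph, protocol, steps=1)
        steps_taken += 1
    return steps_taken


# Stand-in for a real explore function, used only to make the sketch
# runnable: each "step" appends one placeholder entry to a plain list.
def fake_explore(graph: list, protocol: Any, steps: int = 4) -> None:
    for _ in range(steps):
        graph.append(f"entry-{len(graph)}")


demo_graph = ["entry-0"]
calls = explore_until(demo_graph, None, fake_explore, done=lambda g: len(g) >= 4)
print(calls, demo_graph)
```

With a real entry graph, one would presumably pass chemrecon.explore as explore_fn and a condition that inspects the graph, but how to inspect an EntryGraph is not covered in this section.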

Exploration Protocols

An ExplorationProtocol defines a method for expanding the entry graph by traversing the database. It defines the relations and entries which can be traversed and possibly applies filters to limit the scope of exploration.

Pre-defined Exploration Protocols

ChemRecon comes with a set of pre-defined exploration protocols for various purposes. These are located in the chemrecon.query.default_protocols module. We recommend looking in this file for inspiration on how to define custom protocols.

chemrecon.query.default_protocols.protocol_compound_structure = <chemrecon.entrygraph.explorationprotocol.ExplorationProtocol object>

The Compound-Structure protocol can be used to quickly gain an overview of the structural information relating to a given compound. The database compounds are traversed via the CompoundReference relation in order to expand the graph to include other databases which contain the compound. The CompoundHasMolStructure relation is then used to find the associated structure for each compound. The MolStructureStandardization relation is used to standardize various properties of the structures, which can be helpful in case the databases simply disagree on easy-to-standardize properties, such as charge or tautomerism.

Defining Custom Exploration Protocols

Arbitrary protocols are defined by specifying the relation types which can be traversed, and optionally applying filters to entries and relations:

class chemrecon.ExplorationProtocol(relation_types: set[type[Relation] | tuple[type[Relation], Direction]], relation_types_terminal: set[type[Relation]] | None = None, entry_filters: dict[type[Entry], Callable[[Entry], bool]] | None = None, relation_filters: dict[type[Relation], Callable[[Relation], bool]] | None = None)

Bases: object

Represents a protocol for exploring relationships and entries within a graph structure.

This class is designed to define and manage the exploration of entries and relationships in a graph-like data model. Users can specify sets of entry and relationship types, as well as define filters and procedures for exploration.

__init__(relation_types: set[type[Relation] | tuple[type[Relation], Direction]], relation_types_terminal: set[type[Relation]] | None = None, entry_filters: dict[type[Entry], Callable[[Entry], bool]] | None = None, relation_filters: dict[type[Relation], Callable[[Relation], bool]] | None = None)

Specify an exploration protocol.

Parameters:
  • relation_types (set[type[Relation] | tuple[type[Relation], Direction]]) – A set of relation types to explore. Optionally, (Relation, Direction) tuples can be passed to specify in which direction each relation can be traversed. By default, only the forwards direction is traversed. For symmetric relations, only the Direction.SYMMETRIC value is allowed.

  • relation_types_terminal (Optional[set[type[Relation]]]) – A set of relation types which are not used for expanding the graph, but are added if both endpoints have already been found. No direction is specified for these relations.

  • entry_filters (Optional[dict[type[Entry], Callable[[Entry], bool]]]) – An optional dictionary of filters, keyed by entry type. Each filter should be a function which accepts an entry and returns False if the entry should not be included by the protocol.

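  • relation_filters (Optional[dict[type[Relation], Callable[[Relation], bool]]]) – An optional dictionary of filters, keyed by relation type. Each filter should be a function which accepts a relation and returns False if the relation should not be included by the protocol.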
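To make the roles of these parameters concrete, the following toy model runs a breadth-first expansion over a small in-memory edge list. It is not ChemRecon's implementation and uses none of its classes: Edge, EDGES, toy_explore and the relation names are hypothetical, and a single predicate stands in for the per-type filter dictionaries. It only shows how expanding relations, terminal relations and an entry filter might interact under the description above.

```python
from collections import namedtuple

# Toy edge: (relation type name, source entry, target entry).
Edge = namedtuple("Edge", "kind source target")

EDGES = [
    Edge("Reference", "A", "B"),
    Edge("Reference", "B", "C"),
    Edge("Reference", "A", "D"),      # D is rejected by the entry filter
    Edge("HasStructure", "A", "sA"),
    Edge("HasStructure", "C", "sC"),
    Edge("Similar", "sA", "sC"),      # terminal: both endpoints get found
    Edge("Similar", "sA", "sX"),      # terminal: sX is never found
]


def toy_explore(start, expand_kinds, terminal_kinds, entry_filter, steps):
    """Expand forwards from `start`, then attach terminal relations."""
    found = set(start)
    relations = set()
    frontier = set(start)
    for _ in range(steps):
        next_frontier = set()
        for edge in EDGES:
            # Only expanding relation types leaving the frontier are followed.
            if edge.kind not in expand_kinds or edge.source not in frontier:
                continue
            # A filter returning False excludes the entry from the graph.
            if not entry_filter(edge.target):
                continue
            relations.add(edge)
            if edge.target not in found:
                found.add(edge.target)
                next_frontier.add(edge.target)
        frontier = next_frontier
    # Terminal relations never discover entries; they are only added
    # when both endpoints are already part of the graph.
    for edge in EDGES:
        if edge.kind in terminal_kinds and {edge.source, edge.target} <= found:
            relations.add(edge)
    return found, relations


found, relations = toy_explore(
    start={"A"},
    expand_kinds={"Reference", "HasStructure"},
    terminal_kinds={"Similar"},
    entry_filter=lambda entry: entry != "D",
    steps=3,
)
print(sorted(found))
```

In this run the Similar relation between sA and sC is added because both structures were reached through expanding relations, while the one pointing to sX is dropped because sX was never discovered.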

relation_types: set[tuple[type[Relation], Direction]]

Set of relation types to traverse, each paired with its direction.

entry_filters: dict[type[Entry], EntryFilter]

Optional filters for each type of entry.

relation_filters: dict[type[Relation], RelationFilter]

Optional filters for each type of relation.

entry_types: set[type[Entry]]

Set of entry types involved in the specified relations.

The Direction enum specifies the allowed directions:

class chemrecon.Direction(*values)

Bases: Enum

FORWARDS = 1

From source to target.

BACKWARDS = 2

From target to source.

BOTH = 3

Both of the above.

SYMMETRIC = 4

To be used for symmetric relations.
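The sketch below illustrates how the direction values might be interpreted during traversal, and how bare relation types could be normalized into (type, direction) pairs, as suggested by the relation_types attribute type above. It is a self-contained toy: the Direction class is a stand-in that mirrors the documented values, and normalize, neighbours, ToyRelationA and ToyRelationB are hypothetical names. Treating SYMMETRIC as traversable from either endpoint is an assumption.

```python
from enum import Enum


class Direction(Enum):
    """Stand-in mirroring the documented enum values."""
    FORWARDS = 1
    BACKWARDS = 2
    BOTH = 3
    SYMMETRIC = 4


class ToyRelationA:  # hypothetical relation type
    pass


class ToyRelationB:  # hypothetical relation type
    pass


def normalize(relation_types):
    """Turn bare relation types into (type, Direction.FORWARDS) pairs."""
    normalized = set()
    for item in relation_types:
        if isinstance(item, tuple):
            normalized.add(item)  # direction given explicitly
        else:
            normalized.add((item, Direction.FORWARDS))  # documented default
    return normalized


def neighbours(entry, source, target, direction):
    """Entries reachable from `entry` across one relation source -> target."""
    forwards_ok = direction in (
        Direction.FORWARDS, Direction.BOTH, Direction.SYMMETRIC)
    backwards_ok = direction in (
        Direction.BACKWARDS, Direction.BOTH, Direction.SYMMETRIC)
    reachable = set()
    if forwards_ok and entry == source:
        reachable.add(target)
    if backwards_ok and entry == target:
        reachable.add(source)
    return reachable


pairs = normalize({ToyRelationA, (ToyRelationB, Direction.BOTH)})
print(neighbours("t", "s", "t", Direction.FORWARDS))   # nothing reachable
print(neighbours("t", "s", "t", Direction.BACKWARDS))  # reaches the source
```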