Identifier Types

Entries of the types Compound, Reaction, Enzyme, MolStructureRepr, and AAM are characterized by their identifiers (e.g. MNXM123456 for a MetaNetX compound). However, we cannot expect identifiers to be unique across data sources. Therefore, each entry is recognized by both its identifier and its IdentifierType. This page lists the available IdentifierTypes.
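As a rough illustration of why the identifier alone is not enough, the sketch below keys entries by the pair of identifier and type. The EntryKey class, the placeholder identifiers, and the dictionary are hypothetical and are not part of the chemrecon API; only the type names mnx and kegg are taken from the lists further down this page.

```python
from typing import NamedTuple


class EntryKey(NamedTuple):
    """Hypothetical composite key: an identifier plus the name of its type."""
    identifier: str
    id_type: str


# Placeholder identifiers: the same string appears under two different types.
entries = {
    EntryKey("ID_0001", "mnx"): "compound entry from one source",
    EntryKey("ID_0001", "kegg"): "compound entry from another source",
}

# The identifier alone is ambiguous; the (identifier, type) pair is not.
matches = [key for key in entries if key.identifier == "ID_0001"]
```

Here matches contains both keys, while a lookup with a full EntryKey returns exactly one entry.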

class chemrecon.IdentifierType(name: str, shortname: str, alt_names: set[str] = None, prefixes: set[str] = None, suffixes: set[str] = None, stdfunc: Callable[[str], str] = None, recogniser: Pattern | None = None, objectname: str = None)

Bases: object

Represents a general type of identifier with attributes and methods for standardization, recognition, and manipulation.

This class is designed to encapsulate information about a specific type of identifier, including its primary name, alternative names, recognizable patterns, and standardization logic.

The class also registers identifier types in global lookup dictionaries to facilitate type recognition and access.

enum_type: IdType

The corresponding enum value, as stored in the id_type field of entries.

id_org_prefix: str

Prefix in identifiers.org

name: str

Primary name of this identifier type.

shortname: str

The name used for the type in the database.

alt_names: set[str]

Alternative names to search for.

stdfunc: Callable[[str], str]

Function used to standardize identifiers.

recogniser: re.Pattern | None

Pattern used to recognize identifiers of this type.

std_identifier(s: str) → str

Standardize a given identifier of this type.

trim(s: str) → str

Remove prefixes and suffixes from the string, including identifiers.org URLs.
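The kind of trimming described for trim can be sketched as follows. This is a minimal standalone approximation, not the library's implementation: the function trim_sketch, the URL pattern, and the example prefix and suffix sets are assumptions made for illustration.

```python
import re

# Assumed form of an identifiers.org URL prefix (illustrative only).
ID_ORG_URL = re.compile(r"^https?://identifiers\.org/")


def trim_sketch(s: str, prefixes: set[str], suffixes: set[str]) -> str:
    """Strip an identifiers.org URL, then one known prefix and one known suffix."""
    s = ID_ORG_URL.sub("", s.strip())
    # Try longer prefixes first so that the most specific one wins.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if prefix and s.startswith(prefix):
            s = s[len(prefix):]
            break
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and s.endswith(suffix):
            s = s[:-len(suffix)]
            break
    return s


trimmed = trim_sketch("https://identifiers.org/chebi:12345", {"chebi:"}, set())
```

With these hypothetical inputs, trimmed is the bare string "12345". A real IdentifierType would supply its own prefixes and suffixes, as given in the constructor signature above.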

Note that the id_type field of entries returned by database queries holds an enum value, not an IdentifierType object. The enum_type field of an IdentifierType object gives the corresponding enum value. Conversely, the .value field of an enum value gives the corresponding IdentifierType object.
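The round trip between enum values and type objects can be pictured with a small self-contained model. The classes IdTypeSketch and CompoundIdSketch below are hypothetical stand-ins that only mimic the relationship described here; they are not the chemrecon classes, and the display names are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class IdTypeSketch:
    """Hypothetical stand-in for an IdentifierType object."""
    name: str
    shortname: str

    @property
    def enum_type(self) -> "CompoundIdSketch":
        # Look up the enum member whose value is this object.
        return CompoundIdSketch(self)


class CompoundIdSketch(Enum):
    """Hypothetical stand-in for an identifier type enum."""
    mnx = IdTypeSketch(name="MetaNetX", shortname="mnx")
    chebi = IdTypeSketch(name="ChEBI", shortname="chebi")


# An entry carries the enum value in its id_type field ...
entry_id_type = CompoundIdSketch.mnx
# ... and .value leads to the type object, whose enum_type leads back.
type_obj = entry_id_type.value
round_trip = type_obj.enum_type
```

In this model round_trip is the same enum member as entry_id_type, which mirrors the two directions of lookup described in the note.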

For compounds and reactions, the identifier type corresponds to the source database from which the entry was obtained. Enzymes are a special case: most sources classify enzymes by their EC number, so ec is the only source-based identifier type for enzymes. However, other sources can be added to the database.

For MolStructureRepr and AAM, the identifier represents a structure, and the identifier type then specifies which type of representation is used, e.g. SMILES, InChI.

The name identifier types below (cname, rname, ename) do not represent concrete database entries; they represent common or systematic names given to entries. Whenever an entry has a name attribute, a new name entry is created with that name as its identifier.

The sections below list the enums with their values.

Compound Identifier Types

enum chemrecon.schema.IdTypeCompoundEnum(value)

Bases: IdTypeEnum

Valid values are as follows:

unknown = unknown
cname = cname
mnx = mnx
bigg = bigg
chebi = chebi
pubchem_cid = pubchem_cid
kegg = kegg
ecmdb = ecmdb
inchikey = inchikey
slm = slm
envipath = envipath
lipidmaps = lipidmaps
hmdb = hmdb
metacyc = metacyc
seed = seed
sabiork = sabiork
reactome = reactome
pdbe = pdbe
biocyc = biocyc
metamdb = metamdb
brenda = brenda

Reaction Identifier Types

enum chemrecon.schema.IdTypeReactionEnum(value)

Bases: IdTypeEnum

Valid values are as follows:

unknown = unknown
rname = rname
mnx = mnx
metacyc = metacyc
bigg = bigg
seed = seed
kegg = kegg
rhea = rhea
sabiork = sabiork
metamdb = metamdb
mcsa = mcsa
brenda = brenda

Enzyme Identifier Types

enum chemrecon.schema.IdTypeEnzymeEnum(value)

Bases: IdTypeEnum

Valid values are as follows:

unknown = unknown
ename = ename
ec = ec

MolStructureRepr Identifier Types

enum chemrecon.schema.IdTypeStructureRepresentationEnum(value)

Bases: IdTypeEnum

Valid values are as follows:

unknown = unknown
smiles = smiles
inchi = inchi
molfile = molfile
gml = gml

AAM Identifier Types

enum chemrecon.schema.IdTypeAAMEnum(value)

Bases: IdTypeEnum

Valid values are as follows:

unknown = unknown
reactionsmiles = reactionsmiles
rxn = rxn
gml_rule = gml_rule