Identifier Types
Entries of the types Compound, Reaction, Enzyme, MolStructureRepr, and AAM are characterized by their identifiers (e.g. MNXM123456 for a MetaNetX compound).
However, identifiers are not guaranteed to be unique across data sources.
Therefore, each entry is identified by the combination of its identifier and its IdentifierType.
This page lists the available IdentifierTypes.
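A minimal sketch of why the pair matters, assuming a plain dictionary as the store: keying on (identifier type, identifier) keeps entries from different sources apart even when their identifier strings coincide. The `IdType` enum and `register` function below are hypothetical stand-ins, not chemrecon's classes.

```python
from enum import Enum

# Hypothetical stand-in for an identifier-type enum (not chemrecon's own class).
class IdType(Enum):
    chebi = "chebi"
    pubchem_cid = "pubchem_cid"

# Entries are keyed by (identifier type, identifier), not by identifier alone.
entries: dict[tuple[IdType, str], str] = {}

def register(id_type: IdType, identifier: str, label: str) -> None:
    """Store an entry under its composite key."""
    entries[(id_type, identifier)] = label

# Assumption: ChEBI identifiers are stored as bare numbers (without "CHEBI:"),
# so the same numeric string can occur as a ChEBI ID and as a PubChem CID.
# The labels are made-up placeholders.
register(IdType.chebi, "962", "entry from ChEBI")
register(IdType.pubchem_cid, "962", "entry from PubChem")

print(len(entries))  # 2
```

Keyed on the identifier string alone, the second `register` call would have silently overwritten the first.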
- class chemrecon.IdentifierType(name: str, shortname: str, alt_names: set[str] = None, prefixes: set[str] = None, suffixes: set[str] = None, stdfunc: Callable[[str], str] = None, recogniser: Pattern | None = None, objectname: str = None)
Bases: object
Represents a general type of identifier with attributes and methods for standardization, recognition, and manipulation.
This class is designed to encapsulate information about a specific type of identifier, including its primary name, alternative names, recognizable patterns, and standardization logic.
The class also registers identifier types in global lookup dictionaries to facilitate type recognition and access.
- enum_type: IdType
The corresponding enum value, as stored in entries.
- id_org_prefix: str
The prefix used for this identifier type on identifiers.org.
- name: str
Primary name of this identifier type.
- shortname: str
The name used for the type in the database.
- alt_names: set[str]
Alternative names to search for.
- stdfunc: Callable[[str], str]
Function used to standardize identifiers.
- recogniser: re.Pattern | None
Pattern used to recognize identifiers of this type.
- std_identifier(s: str) → str
Standardize a given identifier of this type.
- trim(s: str) → str
Remove prefixes and suffixes from the string, including identifiers.org URLs.
Note that the id_type field of entries returned by database queries does not contain an IdentifierType object, but an enum value.
The enum_type field of an IdentifierType object gives the corresponding enum value.
Conversely, the .value field of an enum value gives the corresponding IdentifierType object.
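A minimal sketch of the two-way link between enum values and IdentifierType objects, using Python's standard `enum` module. `SketchIdentifierType` and `SketchCompoundEnum` are hypothetical stand-ins; the sketch only assumes what the note states, namely that an enum member's `.value` is the IdentifierType object and that the object's `enum_type` points back to the member.

```python
from enum import Enum


class SketchIdentifierType:
    """Hypothetical minimal stand-in for chemrecon.IdentifierType."""

    def __init__(self, name: str, shortname: str) -> None:
        self.name = name
        self.shortname = shortname
        self.enum_type = None  # filled in once the enum exists

    def __repr__(self) -> str:
        return f"SketchIdentifierType({self.shortname!r})"


CHEBI = SketchIdentifierType("ChEBI", "chebi")
KEGG = SketchIdentifierType("KEGG", "kegg")


class SketchCompoundEnum(Enum):
    # Each member's value is the IdentifierType object itself.
    chebi = CHEBI
    kegg = KEGG


# Link each object back to its enum member (the enum_type direction).
for member in SketchCompoundEnum:
    member.value.enum_type = member

# An entry's id_type, as a query might return it, holds an enum member.
id_type = SketchCompoundEnum.chebi
obj = id_type.value                # enum member -> IdentifierType object
print(obj)                         # SketchIdentifierType('chebi')
print(obj.enum_type is id_type)    # True
```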
For compounds and reactions, the identifier type corresponds to the source database from which the entry was obtained. Enzymes are different: most sources agree on classifying enzymes by their EC number, so the EC number is the only identifier type for enzymes. However, other sources can be added to the database.
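To illustrate what a `recogniser` and `stdfunc` for EC numbers could look like, here is a hypothetical pattern and standardization function; chemrecon's actual EC pattern is not documented on this page. The sketch assumes the usual EC format of four dot-separated fields with a first field from 1 to 7, where later fields may be `-` when unspecified and the last field may carry an `n` prefix for preliminary numbers.

```python
import re

# Hypothetical recogniser for EC numbers; not taken from chemrecon.
# Four dot-separated fields; later fields may be "-" when unspecified,
# and the last field may carry an "n" for preliminary numbers.
EC_PATTERN = re.compile(r"[1-7]\.(?:\d+|-)\.(?:\d+|-)\.(?:n?\d+|-)")


def std_ec(s: str) -> str:
    """Hypothetical standardization: drop an 'EC' prefix, colon, and whitespace."""
    s = s.strip()
    if s.upper().startswith("EC"):
        s = s[2:].lstrip(" :")
    return s


def is_ec_number(s: str) -> bool:
    """True if the standardized string is a complete or partial EC number."""
    return EC_PATTERN.fullmatch(std_ec(s)) is not None


print(std_ec("EC:1.1.1.1"))        # 1.1.1.1
print(is_ec_number("2.7.11.-"))    # True
print(is_ec_number("MNXM123456"))  # False
```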
For MolStructureRepr and AAM, the identifier represents a structure, and the identifier type specifies which kind of representation is used, e.g. SMILES or InChI.
The name identifier types below do not represent concrete database entries; instead, they represent common or systematic names given to various entries.
Whenever an entry has a name attribute, a new name entry is created with that name as the identifier.
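A minimal sketch of deriving name entries, assuming simple frozen dataclasses for entries and using the `cname` type listed below for compound names. `SketchEntry` and `derive_name_entries` are hypothetical, and both the deduplication of equal names and the linking of each source entry to its name entry are assumed design choices, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchEntry:
    """Hypothetical entry: identifier type (a plain string here) and identifier."""
    id_type: str
    identifier: str
    name: Optional[str] = None


def derive_name_entries(
    entries: list[SketchEntry],
) -> tuple[list[SketchEntry], list[tuple[SketchEntry, SketchEntry]]]:
    """Create one 'cname' entry per distinct name and link each source entry to it."""
    by_name: dict[str, SketchEntry] = {}
    links: list[tuple[SketchEntry, SketchEntry]] = []
    for entry in entries:
        if entry.name:
            # Assumption: equal names share one name entry, since an entry is
            # identified by (identifier type, identifier).
            name_entry = by_name.setdefault(entry.name, SketchEntry("cname", entry.name))
            links.append((entry, name_entry))
    return list(by_name.values()), links


compounds = [
    SketchEntry("mnx", "MNXM123456", name="example compound"),
    SketchEntry("chebi", "962", name="example compound"),
    SketchEntry("kegg", "C00031"),  # no name attribute, so no name entry
]
name_entries, links = derive_name_entries(compounds)
print(len(name_entries), len(links))  # 1 2
```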
The tables below list the Enums with their values.
Compound Identifier Types
- enum chemrecon.schema.IdTypeCompoundEnum(value)
Bases: IdTypeEnum
Valid values are as follows:
- unknown = unknown
- cname = cname
- mnx = mnx
- bigg = bigg
- chebi = chebi
- pubchem_cid = pubchem_cid
- kegg = kegg
- ecmdb = ecmdb
- inchikey = inchikey
- slm = slm
- envipath = envipath
- lipidmaps = lipidmaps
- hmdb = hmdb
- metacyc = metacyc
- seed = seed
- sabiork = sabiork
- reactome = reactome
- pdbe = pdbe
- biocyc = biocyc
- metamdb = metamdb
- brenda = brenda