phages2050.classifiers.proteins package

Submodules

phages2050.classifiers.proteins.structural_protein module

class phages2050.classifiers.proteins.structural_protein.BacteriophageStructuralProteinClassifier(model_path: str, label_encoder_path: str)[source]

Bases: object

The classifier loads and runs a pre-trained model and label encoder for phage structural protein prediction. The model supports 11 protein classes:

- HTJ
- basplate
- collar
- major_capsid
- major_tail
- minor_capsid
- minor_tail
- other
- portal
- tail_fiber
- tail_shaft

The model reaches 96.92% accuracy on the training set and 95.64% on the validation set after 10-fold cross-validation. It was trained on 11 000 samples.

FEATURE_SPACE = 1024
SUPPORTED_COLUMNS = ['predicted_index', 'predicted_class', 'accuracy']
predict(protein_vector: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Run the classification model and return the best prediction as a DataFrame with three columns:

- “predicted_index” - predicted protein class index
- “predicted_class” - predicted protein class name
- “accuracy” - accuracy of prediction (0-100%)

This method can be called repeatedly with different protein vectors.

protein_vector is a DataFrame holding 1024 numeric values (matching FEATURE_SPACE) produced by a BERT embedding.
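The input and output shapes described above can be illustrated with a small standard-library sketch. It is not the library's implementation: the helper names check_vector and best_prediction are hypothetical, the class index order is assumed to follow the order of the list in this documentation, and "accuracy" is assumed to be the top class probability expressed as a percentage.

```python
from typing import Dict, List, Sequence

FEATURE_SPACE = 1024  # embedding length stated in the documentation

# Class names from the documentation; the index order is an assumption.
CLASS_NAMES: List[str] = [
    "HTJ", "basplate", "collar", "major_capsid", "major_tail", "minor_capsid",
    "minor_tail", "other", "portal", "tail_fiber", "tail_shaft",
]


def check_vector(vector: Sequence[float]) -> None:
    """Raise ValueError unless the vector holds FEATURE_SPACE numeric values."""
    if len(vector) != FEATURE_SPACE:
        raise ValueError(f"expected {FEATURE_SPACE} values, got {len(vector)}")
    for value in vector:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("all values must be numeric")


def best_prediction(probabilities: Sequence[float]) -> Dict[str, object]:
    """Return the top class in the three-column layout described above."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("one probability per class is required")
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return {
        "predicted_index": index,
        "predicted_class": CLASS_NAMES[index],
        # assumption: "accuracy" is the class probability as a percentage
        "accuracy": round(probabilities[index] * 100, 2),
    }


# Hypothetical model output with most of the probability mass on index 8.
example = [0.01] * len(CLASS_NAMES)
example[8] = 0.90
print(best_prediction(example))
```

With the real classifier, the equivalent check would be that the DataFrame passed to predict carries 1024 numeric values before the model is run.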

class phages2050.classifiers.proteins.structural_protein.BacteriophageStructuralProteinManager(root_dir: str = 'bsp_model')[source]

Bases: object

The manager downloads and unzips the pre-trained model and label encoder for Bacteriophage Structural Protein (BSP) classification.

BSP_LABELS_URL = b'https://deeppetri.ai/static/phages2050/bsp_label_encoder_21.08.2020.zip'
BSP_MODEL_URL = b'https://deeppetri.ai/static/phages2050/bsp_model_21.08.2020.zip'
STATUS_CODE_200 = 200
download_model()[source]

Download the pre-trained model and label encoder and unzip them into directories.

Run this procedure once; the extracted files are then loaded by a BacteriophageStructuralProteinClassifier instance.
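The "download once, then unzip" behaviour described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the package's code: fetch_and_unzip is a hypothetical helper, it uses urllib from the standard library (the STATUS_CODE_200 constant suggests the real class checks an HTTP status instead), and it treats a non-empty target directory as "already downloaded".

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unzip(url: str, target_dir: Path) -> bool:
    """Download a zip archive and extract it into target_dir.

    Returns True if a download happened, or False if target_dir already
    contained files and the download was skipped.
    """
    # Assumption: a non-empty directory means the archive was fetched earlier.
    if target_dir.is_dir() and any(target_dir.iterdir()):
        return False
    target_dir.mkdir(parents=True, exist_ok=True)
    # Read the whole archive into memory, then extract it.
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target_dir)
    return True
```

Such a helper could be called once per archive, for instance with the decoded value of BSP_MODEL_URL and a subdirectory of root_dir; the subdirectory layout is not specified here, so any path chosen is an assumption.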

Module contents