phages2050.features.extractors package

Submodules

phages2050.features.extractors.proteins module

class phages2050.features.extractors.proteins.MultifastaProteinFeatureExtractor(fasta_path: str)

Bases: object

Feature extraction from the protein sequences in a multi-FASTA file.

This class builds a DataFrame of the extracted features and can save it as a CSV file.

Example usage:

from features.extractors.proteins import MultifastaProteinFeatureExtractor

mpfe = MultifastaProteinFeatureExtractor(fasta_path='multifasta-example.fasta')
mpfe.to_df()
mpfe.to_csv(csv_fname='multifasta-example')  # writes multifasta-example.csv

to_csv(csv_fname: str) → None

Write the DataFrame to a CSV file (filename format: <csv_fname>.csv)

to_df() → pandas.core.frame.DataFrame

Return the features extracted from each protein as a DataFrame
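The multi-FASTA to CSV flow described above can be illustrated with the standard library alone. This is a minimal sketch, not the package's implementation: the helper names `read_multifasta` and `rows_to_csv` are hypothetical, the only feature computed is the sequence length, and the CSV text is built with the `csv` module rather than pandas so the example has no third-party dependencies.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Tuple


def read_multifasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file.

    Hypothetical helper for illustration; not part of phages2050.
    """
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; collect them all.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rows_to_csv(rows: List[Dict[str, object]], fieldnames: List[str]) -> str:
    """Serialize per-protein rows to CSV text (one row per protein)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up two-record input, used only for this sketch.
example = """>protein_1
MAKINELLRES
TTTNS
>protein_2
MKV
"""

rows = [
    {"id": header.split()[0], "protein_length": len(seq)}
    for header, seq in read_multifasta(example.splitlines())
]
csv_text = rows_to_csv(rows, ["id", "protein_length"])
```

Each FASTA record becomes one row, which mirrors the one-row-per-protein shape that `to_df()` is documented to return.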

class phages2050.features.extractors.proteins.ProteinFeatureExtractor(protein_sequence: str)

Bases: object

Feature extraction from a single protein sequence, for machine-learning classification or further analysis.

Example usage:

from features.extractors.proteins import ProteinFeatureExtractor

pfe = ProteinFeatureExtractor(protein_sequence='MAKINELLRESTTTNSNSIGRPNLVALTRATTKLIYSDIVATQRTNQPVAA')
pfe.get_features()

FEATURE_NAMES = ['protein_length', 'gravy', 'molecular_weight', 'aromaticity', 'instability_index', 'isoelectric_point', 'flexibility', 'mec_cysteines', 'mec_cystines', 'ssf_helix', 'ssf_turn', 'ssf_sheet']

get_features() → Mapping[str, Union[int, float, None]]

Return the full feature set for a single protein as a Python dict

Module contents