phages2050.crawlers.millardlab package

Submodules

phages2050.crawlers.millardlab.crawler module

class phages2050.crawlers.millardlab.crawler.MillardLabPhagesCrawler(url: str)

Bases: object

Crawler for MillardLab bacteriophage tabular data.

This class builds a DataFrame, or saves it as a CSV file, with the following columns:

- Accession
- Description
- Classification
- Genome Length(bp)
- molGC

Each cell is normalised beforehand with the string methods strip() and upper().
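As a rough illustration of the per-cell normalisation described above, the sketch below maps one raw table row onto the documented columns and applies strip() and upper() to every cell. The helper names normalise_cell and normalise_row and the sample row are hypothetical placeholders; the real crawler may normalise cells inside a pandas DataFrame rather than plain dicts.

```python
from __future__ import annotations

# Column names as documented; the helpers below are hypothetical, not phages2050 code.
COLUMNS = ["Accession", "Description", "Classification", "Genome Length(bp)", "molGC"]


def normalise_cell(value: str) -> str:
    """Strip surrounding whitespace and upper-case a single cell."""
    return value.strip().upper()


def normalise_row(row: list[str]) -> dict[str, str]:
    """Map one raw row onto the documented columns, normalising every cell."""
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(row)}")
    return {column: normalise_cell(cell) for column, cell in zip(COLUMNS, row)}


# Made-up placeholder row, only to show the transformation.
print(normalise_row(["  xx000001 ", " example phage ", "unclassified", " 1000", "50.0 "]))
```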

Example usage:

from phages2050.crawlers.millardlab.crawler import MillardLabPhagesCrawler

ml_pc = MillardLabPhagesCrawler(
    url="http://millardlab.org/bioinformatics/bacteriophage-genomes/phage-genomes-july2020/"
)
df = ml_pc.to_df()  # DataFrame sorted by Accession
ml_pc.to_csv()      # writes millardlab_<YYYY:MM:DD>.csv
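The documentation does not say how the page at url is parsed. Assuming the genome list is published as a plain HTML table, a minimal stdlib-only sketch of the row-extraction step could look like this; TableRowParser is a hypothetical name and is not part of phages2050.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableRowParser()
parser.feed("<table><tr><th>Accession</th></tr><tr><td> xx000001 </td></tr></table>")
parser.close()
print(parser.rows)
```

In practice the HTML would first be downloaded, for example with urllib.request.urlopen(url).read().decode("utf-8"), and each extracted row could then be normalised as described above. Nested tables and cells whose closing tags are omitted are not handled by this sketch.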

to_csv() → None

Save the DataFrame as a CSV file (filename format: millardlab_<YYYY:MM:DD>.csv). The method itself returns None.
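The documented filename pattern can be reproduced with the standard datetime module. The helper below is hypothetical and assumes the date is the day the file is written. Colons are not permitted in Windows file names, so the separator used by the actual implementation may differ from the pattern shown here.

```python
from datetime import date


def csv_filename(day: date) -> str:
    """Hypothetical helper: build millardlab_<YYYY:MM:DD>.csv for the given day."""
    return f"millardlab_{day.strftime('%Y:%m:%d')}.csv"


print(csv_filename(date.today()))
```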

to_df() → pandas.core.frame.DataFrame

Return the data as a DataFrame sorted by the first column (Accession).
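Assuming Accession values are compared as plain strings, sorting by the first column is a lexicographic sort. The sketch below shows it on a list of row dicts; sort_by_accession is a hypothetical helper, and on a pandas DataFrame the equivalent call would be df.sort_values(by=df.columns[0]).

```python
from __future__ import annotations


def sort_by_accession(rows: list[dict[str, str]], key: str = "Accession") -> list[dict[str, str]]:
    """Hypothetical helper: return a new list ordered by `key` (string order)."""
    return sorted(rows, key=lambda row: row[key])


rows = [{"Accession": "XX000002"}, {"Accession": "XX000001"}]
print([row["Accession"] for row in sort_by_accession(rows)])
```

sorted() returns a new list and leaves the input untouched, which mirrors how sort_values returns a new DataFrame by default.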

Module contents