Modules

dimepy.Spectrum

Initialise Spectrum object for a given mzML file.

class dimepy.Spectrum(filepath: str, identifier: str, injection_order: int = None, stratification: str = None, snr_estimator: str = False, peak_type: str = 'raw', MS1_precision: float = 5e-06, MSn_precision: float = 2e-05)

__init__(filepath: str, identifier: str, injection_order: int = None, stratification: str = None, snr_estimator: str = False, peak_type: str = 'raw', MS1_precision: float = 5e-06, MSn_precision: float = 2e-05)

Initialise a Spectrum object for a given mzML file.

Arguments:

filepath (str): Path to the mzML file to parse.

identifier (str): Unique identifier for the Spectrum object.

injection_order (int): The injection number of the Spectrum object.

stratification (str): Class label of the Spectrum object.

snr_estimator (str): Signal-to-noise estimation method used to filter peaks. The default in the signature is False.

Currently supported signal-to-noise estimation methods are:
  • ‘median’
  • ‘mean’
  • ‘mad’

peak_type (str): The peak type to load.

Currently supported peak types are:
  • ‘raw’ (default)
  • ‘centroided’
  • ‘reprofiled’

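MS1_precision (float): Measured precision for MS level 1 scans, given as a relative value (the default of 5e-06 corresponds to 5 ppm).

MSn_precision (float): Measured precision for MS level n scans, given as a relative value (the default of 2e-05 corresponds to 20 ppm).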
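
Because both precisions are relative, the absolute m/z tolerance they imply grows with m/z. The sketch below works through that arithmetic, assuming the values are applied as plain relative tolerances; `absolute_tolerance` and `to_ppm` are hypothetical helpers for illustration, not part of dimepy:

```python
def absolute_tolerance(mz: float, precision: float) -> float:
    """Half-width of the m/z window implied by a relative precision at a given m/z."""
    return mz * precision


def to_ppm(precision: float) -> float:
    """Express a relative precision in parts per million (ppm)."""
    return precision * 1e6


# Defaults taken from the Spectrum signature.
MS1_PRECISION = 5e-06
MSN_PRECISION = 2e-05

for mz in (100.0, 500.0, 1000.0):
    print(f"m/z {mz}: MS1 ±{absolute_tolerance(mz, MS1_PRECISION):.4f}, "
          f"MSn ±{absolute_tolerance(mz, MSN_PRECISION):.4f}")
```

At m/z 200, for example, the default MS1_precision of 5e-06 corresponds to a tolerance of about ±0.001.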

bin(bin_width: float = 0.01, statistic: str = 'mean')

Method to conduct mass binning to nominal mass and mass spectrum generation across a Spectrum.

Arguments:

bin_width (float): The m/z (mass-to-charge) bin width to use for binning.

statistic (str): The statistic to use to calculate bin values.

Supported statistic types are:
  • ‘mean’ (default): compute the mean of intensities for points within each bin.
    Empty bins will be represented by NaN.
  • ‘std’: compute the standard deviation within each bin. This is
    implicitly calculated with ddof=0.
  • ‘median’: compute the median of values for points within each bin.
    Empty bins will be represented by NaN.
  • ‘count’: compute the count of points within each bin.
    This is identical to an unweighted histogram. Intensity values are not used.
  • ‘sum’: compute the sum of values for points within each bin.
    This is identical to a weighted histogram.
  • ‘min’: compute the minimum of values for points within each bin.
    Empty bins will be represented by NaN.
  • ‘max’: compute the maximum of values for points within each bin.
    Empty bins will be represented by NaN.
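
The statistic options above match those of scipy.stats.binned_statistic. As a rough, standard-library illustration of what binning does, the sketch below groups (m/z, intensity) pairs into fixed-width bins and summarises each bin. It assumes bin edges at multiples of bin_width and, unlike the description above, simply omits empty bins instead of reporting NaN; `bin_spectrum` is a hypothetical helper, not dimepy's implementation:

```python
import math
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# The supported statistics, mapped to standard-library equivalents.
STATISTICS: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "std": statistics.pstdev,  # population standard deviation, i.e. ddof=0
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
}


def bin_spectrum(masses: Sequence[float], intensities: Sequence[float],
                 bin_width: float = 0.01, statistic: str = "mean") -> Dict[float, float]:
    """Group (m/z, intensity) pairs into fixed-width m/z bins and summarise each bin."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for mz, intensity in zip(masses, intensities):
        # Bin index: the largest multiple of bin_width that is <= mz.
        groups[math.floor(mz / bin_width)].append(intensity)
    summarise = STATISTICS[statistic]
    # Key each non-empty bin by its left edge, rounded to hide floating-point noise.
    return {round(idx * bin_width, 10): summarise(vals)
            for idx, vals in sorted(groups.items())}
```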

limit_infusion(threshold: int = 3) → None

This method is a slight extension of Manfred Beckmann’s (meb@aber.ac.uk) LCT/Q-ToF scan retrieval method in FIEMSpro in which we use the median absolute deviation of all TICs within a Spectrum to determine when the infusion has taken place.

Consider the following Infusion Profile:

       _
      / \ 
     /   \_
____/       \_________________
0     0.5     1     1.5     2 [min]
    |--------| Apex

We are only interested in the scans in which the infusion takes place (roughly 20 to 50 seconds in the profile above). Applying this method sets the to_use values to True only for scans whose TIC passes a cut-off derived from the median absolute deviation of all TICs, scaled by threshold.

Arguments:

threshold (int): The multiplier applied to the median absolute
deviation of the TICs when determining which scans belong to the infusion.
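
The exact cut-off used by limit_infusion is not spelled out here. As an assumption, the sketch below uses a common robust rule: a scan is treated as part of the infusion when its TIC exceeds the median TIC by more than threshold × MAD. The helper names are hypothetical and the rule may differ from dimepy's own:

```python
import statistics
from typing import List, Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Unscaled MAD: the median of the absolute deviations from the median."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])


def infusion_mask(tics: Sequence[float], threshold: float = 3) -> List[bool]:
    """Flag scans whose TIC exceeds the median TIC by more than threshold x MAD."""
    centre = statistics.median(tics)
    cutoff = centre + threshold * median_absolute_deviation(tics)
    return [tic > cutoff for tic in tics]


# Hypothetical TIC trace: low baseline with a short infusion plug in the middle.
tics = [5.0, 6.0, 5.0, 7.0, 6.0, 250.0, 300.0, 280.0, 6.0, 5.0, 7.0, 6.0]
print(infusion_mask(tics))  # only the three infusion scans are True
```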

limit_polarity(polarity: str) → None

Limit the scans found within the mzML file to the given polarity. This should only be called when fast polarity switching has been used.

Arguments:

polarity (str): Polarity of the scans to keep.

Supported polarity types are:
  • ‘positive’
  • ‘negative’

load_scans() → None

This method loads the scans whose entries in the to_use list are set to True.

Note

If you want to make use of the masses and intensities (you probably do), make sure that you call this method.

remove_spurious_peaks(bin_width: float = 0.01, threshold: float = 0.25, scan_grouping: float = 50.0)

Method that is highly influenced by Jasen Finch’s (jsf9@aber.ac.uk) binneR, in which spurious peaks can be removed. At the time of writing, this method has serious performance issues that still need to be rectified, but it should work as intended (provided that you don’t mind how long it takes to complete).

Arguments:

bin_width (float): The m/z (mass-to-charge) bin width to use for binning.

threshold (float): Fraction of scans (0 to 1) in which a peak must be
present for it to be retained.
scan_grouping (float): m/z grouping used to split the
scans into groups to ease processing. It is strongly recommended that you keep this at its default value of 50.0.

Note

load_scans() must first be run in order for this to work.
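
The core idea of occurrence-based spurious peak removal can be sketched as follows, assuming a peak is kept only when its m/z bin is occupied in at least `threshold` of the scans. This is a simplified, hypothetical illustration inspired by the description above, not dimepy's code: it works on m/z values only and ignores scan_grouping.

```python
import math
from typing import Dict, List, Sequence, Set


def persistent_bins(scans: Sequence[Sequence[float]], bin_width: float = 0.01,
                    threshold: float = 0.25) -> Set[int]:
    """Return the indices of m/z bins occupied in at least `threshold` of the scans."""
    occurrences: Dict[int, int] = {}
    for masses in scans:
        # A set, so each bin is counted at most once per scan.
        for idx in {math.floor(mz / bin_width) for mz in masses}:
            occurrences[idx] = occurrences.get(idx, 0) + 1
    return {idx for idx, n in occurrences.items() if n / len(scans) >= threshold}


def remove_spurious(scans: Sequence[Sequence[float]], bin_width: float = 0.01,
                    threshold: float = 0.25) -> List[List[float]]:
    """Drop m/z values whose bin does not recur in enough scans."""
    keep = persistent_bins(scans, bin_width, threshold)
    return [[mz for mz in masses if math.floor(mz / bin_width) in keep]
            for masses in scans]
```

A higher threshold keeps only peaks that recur across most scans; a lower one tolerates more intermittent signals.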

reset() → None

A method to reset the Spectrum object in its entirety.

dimepy.SpectrumList

class dimepy.SpectrumList

__init__()

Initialize self. See help(type(self)) for accurate signature.

append(spectrum: dimepy.spectrum.Spectrum)

Method to append a Spectrum to the SpectrumList.

Arguments:
spectrum (Spectrum): A Spectrum object.

bin(bin_width: float = 0.5, statistic: str = 'mean')

Method to conduct mass binning to nominal mass and mass spectrum generation across a SpectrumList.

Arguments:

bin_width (float): The m/z (mass-to-charge) bin width to use for binning.

statistic (str): The statistic to use to calculate bin values.

Supported statistic types are:
  • ‘mean’ (default): compute the mean of intensities for points within each bin.
    Empty bins will be represented by NaN.
  • ‘std’: compute the standard deviation within each bin. This is
    implicitly calculated with ddof=0.
  • ‘median’: compute the median of values for points within each bin.
    Empty bins will be represented by NaN.
  • ‘count’: compute the count of points within each bin.
    This is identical to an unweighted histogram. Intensity values are not used.
  • ‘sum’: compute the sum of values for points within each bin.
    This is identical to a weighted histogram.
  • ‘min’: compute the minimum of values for points within each bin.
    Empty bins will be represented by NaN.
  • ‘max’: compute the maximum of values for points within each bin.
    Empty bins will be represented by NaN.
detect_outliers(threshold: float = 1, verbose: bool = False)

Method to locate and remove outlier spectra using the median absolute deviation of the TICs within the SpectrumList.

Note

This method is still being actively developed, so is likely to change.

Arguments:

threshold (float): Threshold for MAD outlier detection.

verbose (bool): Whether to print out the identifiers of
the removed spectra.
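
The precise outlier rule is not documented here. A minimal sketch, assuming a spectrum is flagged when its TIC differs from the median TIC by more than threshold × MAD; `tic_outliers` and the sample identifiers are hypothetical:

```python
import statistics
from typing import Dict, List


def tic_outliers(tics: Dict[str, float], threshold: float = 1) -> List[str]:
    """Return identifiers whose TIC lies more than threshold x MAD from the median TIC."""
    values = list(tics.values())
    centre = statistics.median(values)
    mad = statistics.median([abs(v - centre) for v in values])
    return [ident for ident, tic in tics.items() if abs(tic - centre) > threshold * mad]


# Hypothetical TICs keyed by Spectrum identifier.
tics = {"s1": 100.0, "s2": 105.0, "s3": 98.0, "s4": 102.0, "s5": 300.0}
print(tic_outliers(tics, threshold=3))  # ['s5']
```

Under this rule, smaller thresholds flag more spectra, so the choice of threshold matters as much as the statistic itself.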

normalise(method: str = 'tic') → None

Method to conduct sample-independent intensity normalisation.

Arguments:

method (str): The normalisation method to use.

Currently supported normalisation methods are:
  • ‘tic’ (default): Normalise to the total ion current
    of the Spectrum.
  • ‘median’: Normalise to the median of the Spectrum.
  • ‘mean’: Normalise to the mean of the Spectrum.
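
A minimal sketch of the three normalisation methods, assuming each Spectrum's intensities are simply divided by its total, median or mean intensity. Whether dimepy applies any further rescaling is not stated, and `normalise` here is a hypothetical stand-alone function:

```python
import statistics
from typing import List, Sequence


def normalise(intensities: Sequence[float], method: str = "tic") -> List[float]:
    """Divide every intensity in one spectrum by a spectrum-wide scaling factor."""
    if method == "tic":
        factor = sum(intensities)  # total ion current
    elif method == "median":
        factor = statistics.median(intensities)
    elif method == "mean":
        factor = statistics.mean(intensities)
    else:
        raise ValueError(f"Unsupported normalisation method: {method!r}")
    return [i / factor for i in intensities]


print(normalise([1.0, 3.0, 4.0]))  # [0.125, 0.375, 0.5]
```

Because each spectrum is scaled by its own factor, the result does not depend on the other samples, which is what makes the method sample-independent.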

to_csv(fp: str, sep: str = ', ', output_type: str = 'base')

Method to export the spectrum list.

Arguments:

fp (str): Filepath to export the file to.

sep (str): Separator to use for file export.

output_type (str): What form of output to export:

Supported output types are:
  • ‘base’: The masses and intensities of each spectrum, in a column each,
    in a single CSV file.
  • ‘matrix’: The way in which I personally analyse the data.
    This will not work unless the data has been binned.
  • ‘metaboanalyst’: A zip file ready for uploading to MetaboAnalyst.

transform(method: str = 'log10') → None

Method to conduct sample-independent intensity transformation.

Arguments:

method (str): The transformation method to use.

Currently supported transformation methods are:
  • ‘log10’ (default)
  • ‘cube’
  • ‘nlog’
  • ‘log2’
  • ‘glog’
  • ‘sqrt’
  • ‘ihs’
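
A sketch of some of the listed transformations using Python's math module. Mapping ‘nlog’ to the natural logarithm and ‘ihs’ to the inverse hyperbolic sine is an assumption based on the names; ‘cube’ and ‘glog’ are omitted because their exact definitions are not given here, and `transform` is a hypothetical stand-alone function:

```python
import math
from typing import Callable, Dict, List, Sequence

TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "log10": math.log10,
    "log2": math.log2,
    "nlog": math.log,   # assumed: natural logarithm
    "sqrt": math.sqrt,
    "ihs": math.asinh,  # assumed: inverse hyperbolic sine
}


def transform(intensities: Sequence[float], method: str = "log10") -> List[float]:
    """Apply one element-wise transformation to every intensity."""
    try:
        func = TRANSFORMS[method]
    except KeyError:
        raise ValueError(f"Unsupported transformation method: {method!r}") from None
    return [func(i) for i in intensities]
```

The logarithms in math raise ValueError for zero or negative input, so zero intensities need handling (for example through value imputation) before a log transform. The inverse hyperbolic sine is defined at zero.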

value_imputate(method: str = 'min', threshold: float = 0.5) → None

A method to apply value imputation to the SpectrumList.

Note

As most metabolite selection methods cannot handle missing values, it is strongly recommended to run this method once binning has been performed over the SpectrumList.

Arguments:

method (str): Method to use for value imputation.

Currently supported value imputation methods are:
  • ‘basic’: Replace thresholded null values
    with half the minimum intensity value per Spectrum.
  • ‘mean’: Replace thresholded null values with the
    mean intensity value per Spectrum.
  • ‘min’ (default): Replace thresholded null values with the
    minimum intensity value per Spectrum.
  • ‘median’: Replace thresholded null values with the
    median intensity value per Spectrum.

threshold (float): Fraction of samples in which an intensity must be
present for it to be taken forward for imputation.
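
A minimal sketch of thresholded imputation on a samples × features table, with None marking missing values. It assumes a feature is eligible when it is present in at least `threshold` of the samples, and that each sample's own observed intensities supply its fill value; `impute` and the table layout are hypothetical, not dimepy's internals:

```python
import statistics
from typing import Callable, Dict, List, Optional, Sequence

Row = List[Optional[float]]

# Per-sample fill value for each method, computed from that sample's observed intensities.
FILLERS: Dict[str, Callable[[List[float]], float]] = {
    "basic": lambda observed: min(observed) / 2,
    "mean": statistics.mean,
    "min": min,
    "median": statistics.median,
}


def impute(matrix: Sequence[Row], method: str = "min", threshold: float = 0.5) -> List[Row]:
    """Fill missing values (None) in features present in at least `threshold` of samples."""
    n_samples = len(matrix)
    n_features = len(matrix[0])
    eligible = [
        sum(row[j] is not None for row in matrix) / n_samples >= threshold
        for j in range(n_features)
    ]
    fill = FILLERS[method]
    imputed: List[Row] = []
    for row in matrix:
        value = fill([v for v in row if v is not None])
        # Only missing entries in eligible features are replaced; others stay None.
        imputed.append([value if v is None and eligible[j] else v
                        for j, v in enumerate(row)])
    return imputed
```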

dimepy.Scan

class dimepy.Scan(pymzml_spectrum, snr_estimator: str = False, peak_type: str = 'raw')

__init__(pymzml_spectrum, snr_estimator: str = False, peak_type: str = 'raw')

Initialise a Scan object for a given pymzML Spectrum.

Arguments:

pymzml_spectrum (pymzml.Spectrum): The pymzML Spectrum object to build the Scan from.

snr_estimator (str): Signal-to-noise estimation method used to filter peaks.

peak_type (str): The peak type to take forward.
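
How the estimator is turned into a filter is not described here. A minimal sketch, assuming the noise level is estimated from the scan's own intensities with the named statistic and that peaks are kept when intensity / noise reaches a hypothetical `min_snr` cut-off:

```python
import statistics
from typing import List, Sequence, Tuple


def estimate_noise(intensities: Sequence[float], mode: str = "median") -> float:
    """Estimate a scan's noise level from its own intensities."""
    if mode == "median":
        return statistics.median(intensities)
    if mode == "mean":
        return statistics.mean(intensities)
    if mode == "mad":
        centre = statistics.median(intensities)
        return statistics.median([abs(i - centre) for i in intensities])
    raise ValueError(f"Unsupported signal-to-noise estimator: {mode!r}")


def snr_filter(masses: Sequence[float], intensities: Sequence[float],
               mode: str = "median", min_snr: float = 3.0) -> List[Tuple[float, float]]:
    """Keep (m/z, intensity) pairs whose signal-to-noise ratio is at least min_snr."""
    noise = estimate_noise(intensities, mode)
    if noise <= 0:
        # No usable noise estimate; return the peaks unfiltered.
        return list(zip(masses, intensities))
    return [(mz, i) for mz, i in zip(masses, intensities) if i / noise >= min_snr]


print(snr_filter([100.0, 101.0, 102.0, 103.0, 104.0], [1.0, 2.0, 2.0, 30.0, 3.0]))
```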

bin(bin_width: float = 0.01, statistic: str = 'mean') → None

Method to conduct mass binning to nominal mass and mass spectrum generation.

Arguments:

bin_width (float): The m/z (mass-to-charge) bin width to use for binning.

statistic (str): The statistic to use to calculate bin values.

Supported statistic types are:
  • ‘mean’ (default): compute the mean of intensities for points within each bin.
    Empty bins will be represented by NaN.
  • ‘std’: compute the standard deviation within each bin. This is
    implicitly calculated with ddof=0.
  • ‘median’: compute the median of values for points within each bin.
    Empty bins will be represented by NaN.
  • ‘count’: compute the count of points within each bin.
    This is identical to an unweighted histogram. Intensity values are not used.
  • ‘sum’: compute the sum of values for points within each bin.
    This is identical to a weighted histogram.
  • ‘min’: compute the minimum of values for points within each bin.
    Empty bins will be represented by NaN.
  • ‘max’: compute the maximum of values for points within each bin.
    Empty bins will be represented by NaN.