Modules
dimepy.Spectrum
class dimepy.Spectrum(filepath: str, identifier: str, injection_order: int = None, stratification: str = None, snr_estimator: str = False, peak_type: str = 'raw', MS1_precision: float = 5e-06, MSn_precision: float = 2e-05)
Initialise a Spectrum object for a given mzML file.

__init__(filepath: str, identifier: str, injection_order: int = None, stratification: str = None, snr_estimator: str = False, peak_type: str = 'raw', MS1_precision: float = 5e-06, MSn_precision: float = 2e-05)
Initialise a Spectrum object for a given mzML file.
Arguments:
    filepath (str): Path to the mzML file to parse.
    identifier (str): Unique identifier for the Spectrum object.
    injection_order (int): The injection number of the Spectrum object.
    stratification (str): Class label of the Spectrum object.
    snr_estimator (str): Signal-to-noise method used to filter. Currently supported signal-to-noise estimation methods are:
        - ‘median’ (default)
        - ‘mean’
        - ‘mad’
    peak_type (str): What peak type to load in. Currently supported peak types are:
        - raw (default)
        - centroided
        - reprofiled
    MS1_precision (float): Measured precision for MS level 1.
    MSn_precision (float): Measured precision for MS level n.

bin(bin_width: float = 0.01, statistic: str = 'mean')
Method to conduct mass binning to nominal mass and mass spectrum generation across a Spectrum.
Arguments:
    bin_width (float): The mass-to-charge (m/z) bin width to use for binning.
    statistic (str): The statistic to use to calculate bin values. Supported statistic types are:
        - ‘mean’ (default): compute the mean of intensities for points within each bin. Empty bins will be represented by NaN.
        - ‘std’: compute the standard deviation within each bin. This is implicitly calculated with ddof=0.
        - ‘median’: compute the median of values for points within each bin. Empty bins will be represented by NaN.
        - ‘count’: compute the count of points within each bin. This is identical to an unweighted histogram; the values array is not referenced.
        - ‘sum’: compute the sum of values for points within each bin. This is identical to a weighted histogram.
        - ‘min’: compute the minimum of values for points within each bin. Empty bins will be represented by NaN.
        - ‘max’: compute the maximum of values for points within each bin. Empty bins will be represented by NaN.
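The statistic options listed above can be illustrated with a small standard-library sketch. This is not dimepy's implementation: the function name `bin_peaks`, the choice of the lowest m/z as the first bin edge, and the use of 0 for empty ‘count’ and ‘sum’ bins are all assumptions made for illustration.

```python
import math
import statistics
from collections import defaultdict


def bin_peaks(masses, intensities, bin_width=0.01, statistic="mean"):
    """Group intensities into fixed-width m/z bins and summarise each bin.

    Hypothetical helper: returns (left bin edges, one value per bin).
    """
    reducers = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "std": statistics.pstdev,  # population standard deviation, i.e. ddof=0
        "sum": math.fsum,
        "min": min,
        "max": max,
        "count": len,
    }
    reduce_bin = reducers[statistic]

    lowest = min(masses)
    n_bins = int((max(masses) - lowest) / bin_width) + 1

    # Collect the intensities that fall into each bin index.
    grouped = defaultdict(list)
    for mz, intensity in zip(masses, intensities):
        grouped[int((mz - lowest) / bin_width)].append(intensity)

    # Assumption: empty bins are 0 for 'count' and 'sum', NaN otherwise.
    empty = 0 if statistic in ("count", "sum") else math.nan
    edges = [lowest + i * bin_width for i in range(n_bins)]
    values = [reduce_bin(grouped[i]) if i in grouped else empty
              for i in range(n_bins)]
    return edges, values


edges, values = bin_peaks([100.0, 100.1, 101.6], [10.0, 30.0, 5.0], bin_width=0.5)
print(values)
```

With a bin width of 0.5, the first two peaks share a bin and the two bins between them and the last peak are empty, which is where the NaN placeholders appear.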

limit_infusion(threshold: int = 3) → None
This method is a slight extension of Manfred Beckmann’s (meb@aber.ac.uk) LCT/Q-ToF scan retrieval method in FIEMSpro, in which we use the median absolute deviation of all TICs within a Spectrum to determine when the infusion has taken place.
Consider the following Infusion Profile:
              _
             / \
            /   \_
       ____/      \_________________
       0     0.5     1     1.5     2 [min]
           |--------| Apex
We are only interested in the scans in which the infusion takes place (20 - 50 seconds). Applying this method changes the to_use values to be True only where a scan's TIC is >= the median absolute deviation of the TICs multiplied by threshold.
Arguments:
    threshold (int): The multiplier for the median absolute deviation method to take the infusion profile from.
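The masking idea can be sketched as follows. This is one reading of the description above, not dimepy's code: the function name `infusion_mask` is hypothetical, and taking the cut-off as the median absolute deviation multiplied by the threshold is an assumption about how the multiplier is applied.

```python
import statistics


def infusion_mask(tics, threshold=3):
    """Return one boolean per scan, in the spirit of the to_use list.

    Hypothetical helper. Assumption: a scan is kept when its TIC is at
    least the median absolute deviation (MAD) of all TICs times threshold.
    """
    centre = statistics.median(tics)
    mad = statistics.median(abs(tic - centre) for tic in tics)
    return [tic >= mad * threshold for tic in tics]


# Two baseline scans, four infusion scans, one baseline scan.
tics = [10, 12, 500, 520, 510, 490, 11]
print(infusion_mask(tics))
```

In this toy profile the MAD is 30, so the cut-off is 90 and only the four high-TIC scans in the middle are kept.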

limit_polarity(polarity: str) → None
Limit the Scans found within the mzML file to whatever polarity is given. This should only be called where fast-polarity switching is used.
Arguments:
    polarity (str): Polarity type of the scans required. Supported polarity types are:
        - ‘positive’
        - ‘negative’

load_scans() → None
This method loads the scans in accordance with whichever Scans are set to True in the to_use list.
Note
If you want to actually make use of masses and intensities (you probably do), then ensure that you call this method.

remove_spurious_peaks(bin_width: float = 0.01, threshold: float = 0.25, scan_grouping: float = 50.0)
Method heavily influenced by Jasen Finch’s (jsf9@aber.ac.uk) binneR, in which spurious peaks can be removed. At the time of writing, this method has serious performance issues that need to be rectified, but it should still work as intended (provided that you don’t mind how long it takes to complete).
Arguments:
    bin_width (float): The mass-to-charge (m/z) bin width to use for binning.
    threshold (float): Percentage of scans in which a peak must be present in order for it to be considered.
    scan_grouping (float): Mass-to-charge scan groups; this splits the scans into groups to ease the processing somewhat. It is strongly recommended that you keep this at its default value of 50.0.
Note
load_scans() must first be run in order for this to work.
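The Spectrum methods documented above might be chained as in the sketch below. The method names and keyword arguments come from the signatures in this section; the helper name `load_spectrum`, the call order (limiting scans before load_scans, binning afterwards) and the optional flag are assumptions. Running it for real requires dimepy to be installed and a valid mzML file.

```python
def load_spectrum(filepath, identifier, polarity=None, remove_spurious=False):
    """Build a Spectrum, restrict its scans, then load and bin them.

    Hypothetical helper; the ordering of calls is an assumption based on
    the notes above (to_use is set first, load_scans reads it).
    """
    from dimepy import Spectrum  # imported lazily so the sketch parses without dimepy

    spectrum = Spectrum(filepath, identifier)

    # Only relevant where fast-polarity switching was used.
    if polarity is not None:
        spectrum.limit_polarity(polarity)

    # Keep only the scans acquired during the infusion.
    spectrum.limit_infusion(threshold=3)

    # Masses and intensities are only available after this call.
    spectrum.load_scans()

    # Documented as slow, so it is opt-in here; it needs load_scans() first.
    if remove_spurious:
        spectrum.remove_spurious_peaks(bin_width=0.01, threshold=0.25, scan_grouping=50.0)

    spectrum.bin(bin_width=0.01, statistic="mean")
    return spectrum
```

A call such as `load_spectrum("sample.mzML", "sample_1", polarity="positive")` would then return a binned Spectrum; the file name and identifier here are placeholders.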

dimepy.SpectrumList
class dimepy.SpectrumList

append(spectrum: dimepy.spectrum.Spectrum)
Method to append a Spectrum to the SpectrumList.
Arguments:
    spectrum (Spectrum): A Spectrum object.

bin(bin_width: float = 0.5, statistic: str = 'mean')
Method to conduct mass binning to nominal mass and mass spectrum generation across a SpectrumList.
Arguments:
    bin_width (float): The mass-to-charge (m/z) bin width to use for binning.
    statistic (str): The statistic to use to calculate bin values. Supported statistic types are:
        - ‘mean’ (default): compute the mean of intensities for points within each bin. Empty bins will be represented by NaN.
        - ‘std’: compute the standard deviation within each bin. This is implicitly calculated with ddof=0.
        - ‘median’: compute the median of values for points within each bin. Empty bins will be represented by NaN.
        - ‘count’: compute the count of points within each bin. This is identical to an unweighted histogram; the values array is not referenced.
        - ‘sum’: compute the sum of values for points within each bin. This is identical to a weighted histogram.
        - ‘min’: compute the minimum of values for points within each bin. Empty bins will be represented by NaN.
        - ‘max’: compute the maximum of values for points within each bin. Empty bins will be represented by NaN.

detect_outliers(threshold: float = 1, verbose: bool = False)
Method to locate and remove outlier spectra using the median absolute deviation of the TICs within the SpectrumList.
Note
This method is still being actively developed, so is likely to change.
Arguments:
    threshold (float): Threshold for MAD outlier detection.
    verbose (bool): Whether to print out the identifiers of the removed Spectrum.
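A generic MAD-based outlier check on per-sample TICs could look like the sketch below. The exact rule dimepy applies is not stated here, so the criterion (absolute distance from the median greater than threshold times the MAD) and the name `find_outliers` are assumptions.

```python
import statistics


def find_outliers(tics_by_identifier, threshold=1.0):
    """Return the identifiers whose TIC lies far from the median TIC.

    Hypothetical helper. Assumption: a sample is an outlier when
    |TIC - median| > threshold * MAD.
    """
    values = list(tics_by_identifier.values())
    centre = statistics.median(values)
    mad = statistics.median(abs(value - centre) for value in values)
    return [identifier
            for identifier, tic in tics_by_identifier.items()
            if abs(tic - centre) > threshold * mad]


tics = {"a": 100, "b": 102, "c": 98, "d": 101, "e": 400}
print(find_outliers(tics, threshold=3))
```

Under this particular criterion a threshold of 1 would flag roughly half of the samples by construction, so the usage line passes a larger value.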

normalise(method: str = 'tic') → None
Method to conduct sample independent intensity normalisation.
Arguments:
    method (str): The normalisation method to use. Currently supported normalisation methods are:
        - ‘tic’ (default): Normalise to the total ion current of the Spectrum.
        - ‘median’: Normalise to the median of the Spectrum.
        - ‘mean’: Normalise to the mean of the Spectrum.
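The three normalisation options amount to dividing every intensity in a Spectrum by a single scale factor. A minimal sketch, assuming plain lists of intensities without missing values and a hypothetical function name:

```python
import statistics


def normalise_intensities(intensities, method="tic"):
    """Divide each intensity by the TIC, median or mean of the same spectrum.

    Hypothetical helper; assumes the list contains no NaN values.
    """
    scalers = {
        "tic": sum,                    # total ion current = sum of intensities
        "median": statistics.median,
        "mean": statistics.fmean,
    }
    scale = scalers[method](intensities)
    return [intensity / scale for intensity in intensities]


print(normalise_intensities([1.0, 2.0, 7.0], method="tic"))
```

After ‘tic’ normalisation the intensities of a spectrum sum to 1, which makes samples with different overall signal comparable.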

to_csv(fp: str, sep: str = ', ', output_type: str = 'base')
Method to export the spectrum list.
Arguments:
    fp (str): Filepath to export the file to.
    sep (str): Separator to use for file export.
    output_type (str): What form of output to export:
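Taken together, the SpectrumList methods in this section might be combined as sketched below. The method names and keyword arguments are those shown in the signatures; the helper name `export_spectra` and the order of the processing steps (in particular normalising before transforming) are assumptions. dimepy must be installed for the sketch to run.

```python
def export_spectra(spectra, fp):
    """Collect Spectrum objects, process them together and write a CSV.

    Hypothetical helper; the step order is an assumption, apart from
    imputation following binning as the value_imputate note recommends.
    """
    from dimepy import SpectrumList  # imported lazily so the sketch parses without dimepy

    spectrum_list = SpectrumList()
    for spectrum in spectra:
        spectrum_list.append(spectrum)

    # Bin all spectra onto a shared m/z grid.
    spectrum_list.bin(bin_width=0.5, statistic="mean")

    # Fill missing values once binning has been performed.
    spectrum_list.value_imputate(method="min", threshold=0.5)

    spectrum_list.normalise(method="tic")
    spectrum_list.transform(method="log10")

    # sep and output_type are left at their documented defaults.
    spectrum_list.to_csv(fp)
    return spectrum_list
```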

transform(method: str = 'log10') → None
Method to conduct sample independent intensity transformation.
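Only the default method, 'log10', is visible in the signature. A minimal sketch of such a transformation on a list of intensities follows; the function name and the handling of non-positive or missing values (mapped to NaN) are assumptions, not documented behaviour.

```python
import math


def log10_transform(intensities):
    """Apply a base-10 logarithm to each intensity.

    Hypothetical helper. Assumption: zero, negative and NaN inputs
    become NaN, since the logarithm is undefined for them.
    """
    return [math.log10(value) if value > 0 else math.nan
            for value in intensities]


print(log10_transform([100.0, 1.0, 0.0]))
```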

value_imputate(method: str = 'min', threshold: float = 0.5) → None
A method to deploy value imputation to the SpectrumList.
Note
As most metabolite selection methods fail to deal with missing values, it is strongly recommended to run this method once binning has been performed over the SpectrumList.
Arguments:
    method (str): Method to use for value imputation. Currently supported value imputation methods are:
        - ‘basic’: Replace thresholded null values with half the minimum intensity value per Spectrum.
        - ‘mean’: Replace thresholded null values with the mean intensity value per Spectrum.
        - ‘min’ (default): Replace thresholded null values with the minimum intensity value per Spectrum.
        - ‘median’: Replace thresholded null values with the median intensity value per Spectrum.
    threshold (float): Number of samples an intensity needs to be present in to be taken forward for imputation.
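The fill values described above can be sketched per spectrum as follows. This is illustrative only: the function names are hypothetical, missing values are assumed to be NaN, and treating the threshold as a fraction of samples is an assumption suggested by its default of 0.5.

```python
import math
import statistics


def impute(intensities, method="min"):
    """Replace NaN entries with a value derived from the observed intensities."""
    observed = [value for value in intensities if not math.isnan(value)]
    fills = {
        "basic": lambda xs: min(xs) / 2,  # half the minimum observed intensity
        "mean": statistics.fmean,
        "min": min,
        "median": statistics.median,
    }
    fill = fills[method](observed)
    return [fill if math.isnan(value) else value for value in intensities]


def bins_to_keep(matrix, threshold=0.5):
    """Return column indices present (non-NaN) in at least `threshold` of rows.

    Assumption: threshold is a fraction of samples, one row per sample.
    """
    n_rows = len(matrix)
    return [col for col in range(len(matrix[0]))
            if sum(not math.isnan(row[col]) for row in matrix) / n_rows >= threshold]


print(impute([4.0, math.nan, 2.0, 6.0], method="basic"))
```

Filtering bins by presence before filling them avoids imputing values for m/z bins that are missing from most samples.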

dimepy.Scan
class dimepy.Scan(pymzml_spectrum, snr_estimator: str = False, peak_type: str = 'raw')

__init__(pymzml_spectrum, snr_estimator: str = False, peak_type: str = 'raw')
Initialise a Scan object for a given pymzML Spectrum.
Arguments:
    pymzml_spectrum (pymzml.Spectrum): Spectrum object.
    snr_estimator (str): Signal-to-noise method used to filter.
    peak_type (str): Peaks to take forward.

bin(bin_width: float = 0.01, statistic: str = 'mean') → None
Method to conduct mass binning to nominal mass and mass spectrum generation.
Arguments:
    bin_width (float): The mass-to-charge (m/z) bin width to use for binning.
    statistic (str): The statistic to use to calculate bin values. Supported statistic types are:
        - ‘mean’ (default): compute the mean of intensities for points within each bin. Empty bins will be represented by NaN.
        - ‘std’: compute the standard deviation within each bin. This is implicitly calculated with ddof=0.
        - ‘median’: compute the median of values for points within each bin. Empty bins will be represented by NaN.
        - ‘count’: compute the count of points within each bin. This is identical to an unweighted histogram; the values array is not referenced.
        - ‘sum’: compute the sum of values for points within each bin. This is identical to a weighted histogram.
        - ‘min’: compute the minimum of values for points within each bin. Empty bins will be represented by NaN.
        - ‘max’: compute the maximum of values for points within each bin. Empty bins will be represented by NaN.