OrbiTrack Dev Log 5: Chemical Formula Assignment

In this post, I document the full formula assignment pipeline using MFAssignR, tailored to Orbitrap-MS data after peak-list processing. MFAssignR already covers this workflow, so there is no need to reinvent the wheel here.

The steps include quality control filtering, noise estimation via Kendrick Mass Defect (KMD) plots, isotopic filtering, recalibrant inspection, and final formula assignment.

1. Load Input and Install Required Packages

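# One-time install from GitHub (the repository is listed in the references);
# devtools::install() would instead expect a local source directory.
devtools::install_github("skschum/MFAssignR")
library(MFAssignR)
# input_name is already relative to the project root, so no setwd() into the
# output folder is needed (it would make the path below resolve twice).
input_name <- './output/ToF_peak_list/20250731_Punjab_Delhi_Orbitrap+TOF_peak_list'
# Keep only the m/z and intensity columns, in that order
Data <- read.csv(paste0(input_name, '-raw_mz.csv'))[ , c('m.z', 'intensity')]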

2. Reading Note: How MFAssignR Filters Raw Peaks and Assigns Formulas

Noise filtering
MFAssignR applies a Kendrick Mass Defect (KMD)-based noise filter (KMDNoise) before formula assignment. The function looks for a background region of the raw spectrum that is free of analyte signals: it takes the slice of the KMD plot between two parallel lines (default intercepts: 0.05 and 0.2) and estimates the baseline noise level from the peaks inside that slice. Peaks below a user-defined signal-to-noise (S/N) threshold (e.g., 3–10× the noise level) are then removed. This improves data quality by filtering out low-intensity noise and multiply charged ions.
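To make the slicing idea concrete, here is a minimal sketch in base R. It is not MFAssignR's implementation: the function names, the `mz` column name, the `slope` argument, and the choice of the mean intensity as the noise level are all assumptions made for illustration. Only the 0.05 and 0.2 intercepts come from the description above.

```r
# Kendrick mass defect on the CH2 scale (one common sign convention).
kendrick_mass_defect <- function(mz) {
  km <- mz * 14 / 14.01565  # rescale so that CH2 weighs exactly 14
  ceiling(km) - km          # defect in [0, 1)
}

# Hypothetical helper: mean intensity of the peaks lying between two parallel
# lines in the (m/z, KMD) plane. `slope` is an illustrative parameter.
estimate_noise <- function(df, slope, lower = 0.05, upper = 0.2) {
  kmd <- kendrick_mass_defect(df$mz)
  in_slice <- kmd > slope * df$mz + lower & kmd < slope * df$mz + upper
  mean(df$intensity[in_slice])
}

# Toy data: only the m/z 100 peak falls inside the slice when slope = 0
toy <- data.frame(mz = c(100, 150.5), intensity = c(10, 1000))
noise <- estimate_noise(toy, slope = 0)
threshold <- 3 * noise  # e.g. a 3x S/N cutoff
```

In the real pipeline, `KMDNoise()` does this work and returns the estimate directly, so the sketch is only meant to show what "noise from a KMD slice" means.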

Formula assignment
MFAssignR applies several non-optional quality assurance (QA) rules to screen out chemically invalid formulas during assignment. Below is a summary of key rules:

🔍 Fundamental Rules

| Rule | Description |
|---|---|
| Senior Rule (Kind & Fiehn, 2007) | Ensures molecular formulas follow known valency and bonding constraints. Useful for identifying feasible adduct or fragment ions. |
| Nitrogen Rule | For odd vs. even nominal masses: an odd nominal mass requires an odd number of N atoms. |
| Large Atom Rule | Large atoms tend to fragment at weak bonds; used to predict fragmentation patterns. |
| Max Hydrogen Rule | Limits H count based on allowed bonding from other atoms. Prevents over-saturation. |
| Max DBE Rule (Lobodin et al., 2012) | Ensures formulas have chemically valid unsaturation: DBE = (2C + 2 + N − X − H)/2 |
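As a quick worked example of two of these rules, the sketch below evaluates the DBE formula and the nitrogen-rule parity check in base R. The function names are hypothetical, and the nitrogen rule is applied here to the nominal mass of the neutral molecule, which is an assumption on my part rather than a description of how MFAssignR implements it.

```r
# DBE = (2C + 2 + N - X - H) / 2, with X the number of halogen atoms
dbe <- function(C, H, N = 0, X = 0) (2 * C + 2 + N - X - H) / 2

# Nitrogen rule: nominal mass and N count share the same parity
passes_nitrogen_rule <- function(nominal_mass, N) (nominal_mass %% 2) == (N %% 2)

dbe_benzene  <- dbe(C = 6, H = 6)          # C6H6
dbe_pyridine <- dbe(C = 5, H = 5, N = 1)   # C5H5N
dbe_glucose  <- dbe(C = 6, H = 12)         # C6H12O6 (O does not enter the formula)

pyridine_ok <- passes_nitrogen_rule(79, N = 1)    # odd mass, odd N
bad_formula <- passes_nitrogen_rule(180, N = 1)   # even mass, odd N
```

Benzene and pyridine both come out at four double-bond equivalents (one ring plus three double bonds), and glucose at one.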

3. Noise Estimation Using KMD

KMDNoise isolates low-intensity regions via Kendrick Mass Defect linear slice filtering:

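# Estimate the noise level from the analyte-free KMD slice
Noise <- KMDNoise(Data)
kmd_plot <- Noise[["KMD"]]   # KMD plot showing the slice used for the estimate
kmd_plot
KMDN <- Noise[["Noise"]]     # estimated noise level
KMDN
# Inspect the spectrum around m/z 301.1 against a 3x-noise cutoff
SNplot(Data, cut = KMDN * 3, mass = 301.1, window.x = 900, window.y = 25)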
[Figure: Signal-to-noise (S/N) plot]

[Figure: Spectrum after noise removal]
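Before settling on an S/N multiplier, it can help to count how many peaks would survive each candidate cutoff. The helper below is a hypothetical base-R sketch, not an MFAssignR function; it only assumes a data frame with an `intensity` column, like `Data` above, and a noise estimate such as `KMDN`.

```r
# Count the peaks whose intensity exceeds k times the noise level
peaks_kept <- function(df, noise, multipliers = c(1, 3, 6, 10)) {
  n_kept <- sapply(multipliers, function(k) sum(df$intensity > k * noise))
  data.frame(multiplier = multipliers, n_kept = n_kept)
}

# Toy example with a made-up noise level of 10
toy <- data.frame(intensity = c(5, 20, 50, 200))
kept <- peaks_kept(toy, noise = 10)
```

With real data the call would look like `peaks_kept(Data, KMDN)`.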

4. Isotope Prescreening

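# Separate likely isotope peaks from monoisotopic peaks
Isotope <- IsoFiltR(Data)
Mono <- Isotope[["Mono"]]   # monoisotopic peaks
Iso <- Isotope[["Iso"]]     # candidate isotope peaks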
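The idea behind isotope prescreening can be illustrated with a simple pairing rule: a 13C isotopologue sits about 1.0033548 Da above its monoisotopic peak and is less intense. The sketch below is a deliberately naive base-R version of that rule. It is not the algorithm used by `IsoFiltR()`, and the function name, the 5 ppm tolerance, and the intensity condition are assumptions.

```r
# Find (monoisotopic, 13C) peak pairs by mass difference and relative intensity
find_13c_pairs <- function(mz, intensity, tol_ppm = 5) {
  delta <- 1.0033548                             # 13C - 12C mass difference (Da)
  diff <- outer(mz, mz, function(a, b) b - a)    # diff[i, j] = mz[j] - mz[i]
  tol <- mz * tol_ppm * 1e-6                     # tol[i] applies to row i
  lighter_is_stronger <- outer(intensity, intensity, function(a, b) b < a)
  hits <- which(abs(diff - delta) <= tol & lighter_is_stronger, arr.ind = TRUE)
  data.frame(mono_mz = mz[hits[, "row"]], iso_mz = mz[hits[, "col"]])
}

# Toy peak list: the second peak is the 13C partner of the first
pairs <- find_13c_pairs(mz = c(200, 201.0033548, 250),
                        intensity = c(1000, 100, 500))
```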

5. Initial CHO Formula Assignment

Assign <- MFAssignCHO(Mono, Iso, ionMode = "pos", lowMW = 50, highMW = 1000,
                      POEx = 0, Zx = 1, Mx = 2, Ex = 1,
                      ppm_err = 3, H_Cmin = 0.3,
                      HetCut = "off", NMScut = "on", SN = 1 * KMDN)
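For readers unfamiliar with the `ppm_err` tolerance, here is a small worked example of how a ppm error is computed for a sodium adduct. The measured value is made up, the helper name is hypothetical, and the atomic masses are standard monoisotopic values; none of this is taken from MFAssignR's code.

```r
# Relative mass error in parts per million
ppm_error <- function(measured, theoretical) {
  (measured - theoretical) / theoretical * 1e6
}

# Monoisotopic masses (Da)
mass <- c(C = 12, H = 1.00782503, O = 15.99491462, Na = 22.98976928)
electron <- 0.00054858

# Theoretical m/z of [C6H12O6 + Na]+
theoretical <- 6 * mass[["C"]] + 12 * mass[["H"]] + 6 * mass[["O"]] +
  mass[["Na"]] - electron

measured <- 203.0530                      # made-up measurement
err <- ppm_error(measured, theoretical)   # a little under 2 ppm
within_tolerance <- abs(err) <= 3         # same tolerance as ppm_err = 3
```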

6. Review Assignment Quality

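Unambig1 <- Assign[["Unambig"]]     # peaks with a single candidate formula
Ambig1 <- Assign[["Ambig"]]         # peaks with several candidate formulas
Unassigned1 <- Assign[["None"]]     # peaks with no assigned formula
MSAssign <- Assign[["MSAssign"]]    # mass spectrum plot, assigned vs. unassigned
Error <- Assign[["Error"]]          # assignment error plot
MSgroups <- Assign[["MSgroups"]]    # mass spectrum plot by molecular group
VK <- Assign[["VK"]]                # van Krevelen plot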
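A quick numeric check can complement the plots. The helper below is a hypothetical sketch that only relies on `nrow()`, so it makes no assumption about column names in the MFAssignR output. It counts rows rather than peaks, on the assumption that the ambiguous table may hold more than one row per peak.

```r
# Row counts and shares for the three assignment tables
assignment_summary <- function(unambig, ambig, unassigned) {
  counts <- c(unambiguous = nrow(unambig),
              ambiguous   = nrow(ambig),
              unassigned  = nrow(unassigned))
  data.frame(class = names(counts),
             rows  = as.integer(counts),
             share = round(as.numeric(counts) / sum(counts), 3))
}

# Toy stand-ins for Unambig1, Ambig1 and Unassigned1
smry <- assignment_summary(data.frame(x = 1:6), data.frame(x = 1), data.frame(x = 1:3))
```

On real results the call would be `assignment_summary(Unambig1, Ambig1, Unassigned1)`.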

7. Identify Recalibrant Series and Recalibrate

This step uses high-confidence assigned ions (from Unambig1) to refine m/z accuracy via internal recalibration.

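# List candidate recalibrant series found among the unambiguous assignments
check <- RecalList(Unambig1)
# Recalibrate using series chosen from `check`
Test <- Recal(df = Unambig1, peaks = Mono, isopeaks = Iso, mode = "pos",
              SN = 2 * KMDN, mzRange = 50,
              series1 = "O5_Na_2", series2 = "O5_Na_3",
              series3 = "O6_Na_3", series4 = "O2_Na_3", series5 = "O3_Na_3")
Plot <- Test[["Plot"]]        # diagnostic plot of the recalibrants
Plot
Mono2 <- Test[["Mono"]]       # recalibrated monoisotopic peaks
Iso2 <- Test[["Iso"]]         # recalibrated isotope peaks
List <- Test[["RecalList"]]   # recalibrant ions used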
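To illustrate what internal recalibration does in principle, the sketch below fits the ppm error of known calibrant ions against m/z with a straight line and then removes the predicted error from every peak. This is a simplified stand-in, not the method used by `Recal()`; the function name and the linear model are assumptions.

```r
# Correct m/z values using a linear fit of calibrant ppm error vs. m/z
recalibrate_linear <- function(mz, cal_measured, cal_theoretical) {
  ppm <- (cal_measured - cal_theoretical) / cal_theoretical * 1e6
  fit <- lm(ppm ~ cal_measured)
  predicted_ppm <- unname(predict(fit, newdata = data.frame(cal_measured = mz)))
  mz / (1 + predicted_ppm * 1e-6)
}

# Toy calibrants that all read 5 ppm too high
cal_theoretical <- c(100, 200, 300, 400)
cal_measured <- cal_theoretical * (1 + 5e-6)
corrected <- recalibrate_linear(cal_measured, cal_measured, cal_theoretical)
```

After correction, the toy calibrants land back on their theoretical values, which is the behaviour one would hope to see in the recalibration plot as well.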

8. Final Formula Assignment with Extended Elements

Assign <- MFAssign(Mono2, Iso2, ionMode = "pos", lowMW = 50, highMW = 1000,
                   POEx = 0, Zx = 1, Mx = 2, Ex = 0,
                   Nx = 3, Sx = 3, ppm_err = 20, H_Cmin = 0.3,
                   HetCut = "off", DeNovo = 300, NMScut = "on", SN = 0.25 * KMDN)

A summary of the parameters used in the MFAssign function:

| Parameter | Value | Meaning |
|---|---|---|
| Mono2 | [input] | Data frame of monoisotopic masses (from the Recal step) |
| Iso2 | [input] | Data frame of isotopic masses (from the Recal step) |
| ionMode | "pos" | Specifies positive ionization mode |
| lowMW | 50 | Lower limit of molecular mass to be assigned |
| highMW | 1000 | Upper limit of molecular mass to be assigned |
| POEx | 0 | Whether to allow odd-electron positive-mode ions (0 = no) |
| Zx | 1 | Charge state allowed in formula assignment |
| Mx | 2 | Maximum number of sodium adducts (Na) allowed |
| Ex | 0 | Number of 13C isotopes allowed |
| Nx | 3 | Maximum number of nitrogen atoms (14N) allowed |
| Sx | 3 | Maximum number of sulfur atoms (32S) allowed |
| ppm_err | 20 | Error tolerance for formula assignment, in ppm |
| H_Cmin | 0.3 | Minimum hydrogen-to-carbon (H/C) ratio |
| HetCut | "off" | Disables the high-heteroatom QA filter |
| DeNovo | 300 | Cutoff for de novo formula generation (masses above this value are not assigned de novo) |
| NMScut | "on" | Enables the nominal mass series QA check (Koch et al., 2007) |
| SN | 0.25*KMDN | Signal-to-noise threshold for formula assignment, scaled by the KMD-based noise estimate |
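To show how `ppm_err` and `H_Cmin` narrow the candidate list, here is a toy brute-force search in base R. It is only a sketch under strong assumptions: CHO formulas only, a single [M+Na]+ adduct, a simple H ≤ 2C + 2 cap, and hypothetical function and argument names. MFAssign's actual search is more elaborate and is not reproduced here.

```r
# Enumerate CHO formulas whose [M+Na]+ m/z matches within a ppm tolerance
candidate_cho <- function(mz, ppm_err = 3, h_c_min = 0.3, max_c = 20, max_o = 10) {
  mass_c <- 12; mass_h <- 1.00782503; mass_o <- 15.99491462
  adduct <- 22.98976928 - 0.00054858          # Na minus one electron
  neutral <- mz - adduct
  grid <- expand.grid(C = 1:max_c, O = 0:max_o)
  # For each C/O pair, pick the H count that best fills the remaining mass
  grid$H <- round((neutral - grid$C * mass_c - grid$O * mass_o) / mass_h)
  grid <- grid[grid$H >= 0 & grid$H <= 2 * grid$C + 2, ]
  grid$theor_mz <- grid$C * mass_c + grid$H * mass_h + grid$O * mass_o + adduct
  grid$ppm <- (mz - grid$theor_mz) / grid$theor_mz * 1e6
  hits <- grid[abs(grid$ppm) <= ppm_err & grid$H / grid$C >= h_c_min, ]
  rownames(hits) <- NULL
  hits[, c("C", "H", "O", "theor_mz", "ppm")]
}

# Example: an ion near m/z 203.0526 (sodiated C6H12O6)
hits <- candidate_cho(203.05261)
```

Widening `ppm_err` or adding more elements grows the candidate list quickly, which is why the QA rules and the S/N cutoff matter in the real assignment.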

9. Save Final Outputs

Unambig2 <- Assign[["Unambig"]]
Ambig2 <- Assign[["Ambig"]]
Unassigned2 <- Assign[["None"]]
MSAssign <- Assign[["MSAssign"]]
Error <- Assign[["Error"]]
MSgroups <- Assign[["MSgroups"]]
VK <- Assign[["VK"]]

input_name <- './output/ToF_peak_list/20250731_Punjab_Orbitrap+TOF_peak_list'
write.csv(Unambig2, file = paste0(input_name, '-assigned_mz.csv'))
write.csv(Unassigned2, file = paste0(input_name, '-un-assigned_mz.csv'))
write.csv(Ambig2, file = paste0(input_name, '-ambiguous_ion.csv'))
write.csv(Iso2, file = paste0(input_name, '-isotope_mz.csv'))

The assigned formula will be shown as:

References

  1. Kind, T. & Fiehn, O. (2007). Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics, 8, 105
  2. Schum, S.K., Brown, L.E., & Mazzoleni, L.R. (2020). MFAssignR: Molecular formula assignment software for ultrahigh resolution mass spectrometry analysis of environmental complex mixtures. Environmental Research, https://doi.org/10.1016/j.envres.2020.11011
  3. MFAssignR github page, https://github.com/skschum/MFAssignR