Enhancing Top-Down Proteomics Of Brain Tissue With FAIMS Part 2

Data Analysis

Proteoform identification was performed with TopPIC (version 1.3).53 Settings for TopPIC included a precursor window of 3 m/z (to account for the isotopic envelope), a mass error tolerance of 15 ppm, an unknown mass shift range of ±500 Da, and a maximum of one unknown modification per proteoform.

MS2 spectra were searched against a database concatenated with entries from Homo sapiens Swiss-Prot (20,352), Swiss-Prot splice variants (22,000), and TrEMBL (54,436), as well as common contaminants. Identified proteoforms were filtered to a false discovery rate (FDR) of 1% through TopPIC.

Downstream data analysis was performed in the R environment for statistical computing and figure generation.54 In downstream analysis, proteoforms from the same gene with the same starting and ending amino acids that were found to be within ±5 Da were combined as a single proteoform to increase the stringency of our assignments.

Although the 5 Da threshold is arbitrary, this approach compensated for incorrectly assigned monoisotopic peaks, ambiguity in unknown mass shifts, and other artifactual deviations. To determine the relative standard deviation (RSD) of proteoforms within replicates, we utilized the "feature intensity" output from TopPIC.
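The ±5 Da merging step described above can be sketched in a few lines. This is only an illustrative sketch in Python, not the authors' R code: the record fields (`gene`, `start`, `end`, `mass`), the gene names, and the greedy grouping rule (a new group starts when a mass lies more than 5 Da above the lightest mass of the current group) are all assumptions made for the example.

```python
from collections import defaultdict

def merge_proteoforms(records, tol_da=5.0):
    """Collapse records that share gene, start and end and lie within tol_da.

    Assumed rule: masses are sorted, and a new cluster is opened whenever a
    mass exceeds the first (lightest) mass of the current cluster by more
    than tol_da.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["gene"], rec["start"], rec["end"])].append(rec["mass"])

    merged = []
    for (gene, start, end), masses in sorted(groups.items()):
        masses.sort()
        anchor = masses[0]
        cluster = [anchor]
        for mass in masses[1:]:
            if mass - anchor <= tol_da:
                cluster.append(mass)
            else:
                merged.append({"gene": gene, "start": start, "end": end, "masses": cluster})
                anchor = mass
                cluster = [mass]
        merged.append({"gene": gene, "start": start, "end": end, "masses": cluster})
    return merged

# Hypothetical records: the first two differ by ~1 Da (e.g. a mis-assigned
# monoisotopic peak), the third is 16 Da heavier, the fourth has another start.
records = [
    {"gene": "GENE_A", "start": 2, "end": 90, "mass": 10000.0},
    {"gene": "GENE_A", "start": 2, "end": 90, "mass": 10001.0},
    {"gene": "GENE_A", "start": 2, "end": 90, "mass": 10016.0},
    {"gene": "GENE_A", "start": 5, "end": 90, "mass": 9700.0},
]
merged = merge_proteoforms(records)
```

With these toy records, the first two entries collapse into one proteoform while the other two remain separate.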

Fragmentation sequence coverage maps and associated spectra were generated using the LcMsSpectator software.55

Specific settings for LcMsSpectator included a precursor ion tolerance of 10 ppm, a product ion tolerance of 10 ppm, a minimum S/N threshold of 1.5, a Pearson correlation threshold of 0.7, a default smoothing of 9 points, and a precursor isotope relative intensity threshold of 0.1.

RESULTS AND DISCUSSION

Addition of FAIMS Robustly Increases Proteome Coverage

The cortex tissue sample from a patient diagnosed with Alzheimer's disease was processed following the workflow shown in Figure 1. We first compared the performance of FAIMS at various CVs with that of the same analysis without FAIMS (referred to as "No FAIMS").

It is worth remembering that the FAIMS interface produces a modest loss of ion transmission, presumably due to the longer ion path through the source.56 Therefore, to better represent typical conditions and ensure equitable comparisons, the "No FAIMS" data were collected using the same sample without the FAIMS unit installed.

These "No FAIMS" data were collected in triplicate and analyzed according to the workflow shown in Figure 1, using a top 6 DDA MS2 analysis. The "No FAIMS" data sets on average identified 754 ± 35 proteoforms (Figure 2A and Table S1) and collectively identified 1073 unique (non-redundant) proteoforms (Figure 2B) derived from 293 unique genes (Figure 2C), covering 29,359 amino acids across the proteome (Figure 2D).

The metric "proteome coverage" (Figure 2D) is defined as non-redundant amino acids covered by each proteoform's sequence. Essentially, this metric accounts for both the length and diversity of the identified proteoforms.
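As a rough illustration of this definition, the sketch below counts each (protein, residue position) pair once, no matter how many proteoforms cover it. The tuple layout, the accession labels, and the 1-based inclusive coordinates are assumptions made for the example and are not taken from the original analysis.

```python
def proteome_coverage(proteoforms):
    """Count non-redundant residues covered by a collection of proteoforms.

    proteoforms: iterable of (accession, start, end) tuples, where start and
    end are assumed to be 1-based, inclusive positions in the parent sequence.
    """
    covered = set()
    for accession, start, end in proteoforms:
        covered.update((accession, pos) for pos in range(start, end + 1))
    return len(covered)

# Hypothetical proteoforms: two overlapping forms of P1 and a short form of P2.
toy = [("P1", 1, 100), ("P1", 51, 150), ("P2", 1, 20)]
coverage = proteome_coverage(toy)  # 150 residues of P1 + 20 residues of P2 = 170
```

Because overlapping regions are counted once, many short, redundant fragments of the same protein add little to this metric, whereas a single long proteoform adds its full length.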

We consider this to be a more balanced metric compared to the raw number of proteoforms and genes as it avoids biases toward smaller proteoforms. Although simple counts of proteoforms or genes are intuitive, these metrics appear to be strongly influenced by short proteolytic fragments. The FAIMS data sets were collected from −50 to −20 CV scanned in steps of 5 V.

Each of the resulting 7 CVs was collected in triplicate. Concerning proteoforms and genes, FAIMS outperformed "No FAIMS" for all the CVs tested within the −50 to −30 CV range (Figure 2A–C).

For example, the three data sets at −50 CV identified 1833 ± 17 proteoforms, an average increase of ∼140% per run compared to "No FAIMS". Collectively, these three data sets at −50 CV identified 2564 unique proteoforms from 530 unique genes covering a total of 43,437 amino acids, an increase of 95, 69, and 69% over the three "No FAIMS" data sets, respectively (Figure 2B–D and Table S1).

Also, although fewer unique proteoforms and genes were observed with FAIMS at −25 V than with "No FAIMS", the proteoforms themselves are much longer on average. Thus, when proteoform length is considered (Figure 2D), the −25 V data sets cover nearly as many amino acids from the proteome (28,398) as "No FAIMS" (29,359), doing so with only half as many proteoforms (Figure 2B).

We also investigated whether an analysis with FAIMS is as robust and reproducible as one without FAIMS. We assessed quantification reproducibility by calculating the RSD (equivalent to the coefficient of variation) of each proteoform's feature intensity, provided that the proteoform was observed in all three replicate data sets for a given CV setting.
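A minimal sketch of this RSD calculation is given below. It assumes each replicate is a mapping from a proteoform identifier to its feature intensity, and it uses the sample standard deviation (n − 1 denominator); both choices, along with the identifiers and intensities, are assumptions for illustration rather than details confirmed by the text.

```python
from statistics import mean, stdev

def rsd_percent(intensities):
    """Relative standard deviation (coefficient of variation) in percent."""
    return 100.0 * stdev(intensities) / mean(intensities)

def replicate_rsds(replicates):
    """RSD per proteoform, restricted to proteoforms seen in every replicate.

    replicates: list of dicts mapping proteoform id -> feature intensity.
    """
    shared = set(replicates[0]).intersection(*replicates[1:])
    return {pf: rsd_percent([rep[pf] for rep in replicates]) for pf in sorted(shared)}

# Hypothetical intensities for three replicates at one CV setting.
rep1 = {"pf1": 100.0, "pf2": 50.0}
rep2 = {"pf1": 110.0, "pf2": 55.0, "pf3": 10.0}
rep3 = {"pf1": 90.0}
rsds = replicate_rsds([rep1, rep2, rep3])
```

Only `pf1` is present in all three toy replicates, so it is the only proteoform that receives an RSD; `pf2` and `pf3` are dropped by the completeness criterion.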

Because the FAIMS and "No FAIMS" data sets were all collected from a single biological sample, variance among the replicates should be exclusively attributed to instrumental factors.

The boxplot in Figure 3 (as well as individual histograms in Figure S1) demonstrates how the distributions of RSDs compare between the FAIMS and "No FAIMS" data sets.

Based on median RSD, the addition of FAIMS resulted in similar or slightly improved quantification quality relative to "No FAIMS". The improvement in RSDs among lower voltages (−20 to −30 CV) may be related to the observation that fewer proteoforms are transmitted within that range, providing higher quality MS1 spectra and therefore better estimations of feature intensity.

Furthermore, it is worth noting that the RSDs determined from these replicates are similar to previous top-down55 and bottom-up57,58 experiments performed with similar instrumentation.

Taken together, these data demonstrate that the addition of FAIMS to TDP robustly increases proteoform identifications, without any sacrifice of quantification quality.

We next investigated the overlap in identifications between the CVs, which could provide the largest non-redundant set of observations.

Using overlap coefficients, defined as the intersection between two sets divided by the smaller of the two sets, we determined the similarity between each data set. The overall mean overlap coefficient for both proteoform and gene identifications within each FAIMS CV was 0.76.

This is comparable to the overlap coefficient from "No FAIMS" data sets (0.77), suggesting that the technical replicates show similar reproducibility. A heatmap displaying the FAIMS overlap coefficients of proteoforms and genes is shown in Figures 4 and S2, respectively.
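The overlap coefficient defined above is straightforward to compute from two sets of identifiers. The sketch below is illustrative only; the identifiers are hypothetical.

```python
def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("overlap coefficient is undefined for an empty set")
    return len(a & b) / min(len(a), len(b))

# Hypothetical identifications from two runs.
run_1 = {"A", "B", "C", "D"}
run_2 = {"C", "D", "E"}
coefficient = overlap_coefficient(run_1, run_2)  # 2 shared / 3 in the smaller set
```

Unlike the Jaccard index, this measure reaches 1.0 whenever the smaller set is fully contained in the larger one, so it is less sensitive to differences in the total number of identifications between two data sets.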

Notably, the overlap of gene identifications across the CV space is much greater than that of proteoforms, and the difference is most pronounced for the most widely separated CVs.

For example, the overlap between −50 and −20 CV is 0.64 at the gene level and 0.06 at the proteoform level. This was expected as each gene can potentially be represented by multiple proteoforms.

Figure 4 demonstrates that while a 5−10 V difference in CV produces overlap coefficients mostly above 0.5, distances greater than 15 V produce larger degrees of dissimilarity and are therefore better spaced for capturing sufficiently different sets of proteoforms.

With these data sets, we next pursued determining which combinations of CVs are optimal for achieving maximal gene identifications, proteoform identifications, and sequence coverage with constraints on the number of CVs for each combination.

First, we established the optimal combinations for any number of CVs from 1 to 7 using the metrics of unique proteoforms, genes, and proteome sequence coverage (Figure 5).

Using combinations limited to three CVs as an example, Figure 5A,B shows that −35, −40, and −50 V are ideal if one desires to maximize the number of proteoforms or genes.

The proteoforms and genes identified in these three CVs (nine data sets) cover 84% of the total unique proteoforms and 92% of the total genes over the entire set of 21 data sets.
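One way to find such optimal combinations is an exhaustive search over all CV subsets of a given size, which is cheap with only seven CVs. The sketch below assumes the identifications for each CV have already been pooled across replicates into one set; the function name, data layout, and toy identifiers are hypothetical, and the original analysis may have been implemented differently.

```python
from itertools import combinations

def best_cv_combination(ids_by_cv, k):
    """Return the k-CV combination with the largest union of identifications.

    ids_by_cv: dict mapping a CV (in volts) to the set of identifiers pooled
    over that CV's replicates. Ties go to the first combination encountered.
    """
    def union_size(combo):
        return len(set().union(*(ids_by_cv[cv] for cv in combo)))

    best = max(combinations(sorted(ids_by_cv), k), key=union_size)
    return best, union_size(best)

# Hypothetical identifications for four CVs.
ids_by_cv = {
    -50: {"a", "b", "c"},
    -40: {"c", "d", "e"},
    -30: {"a"},
    -20: {"e"},
}
best_pair, n_unique = best_cv_combination(ids_by_cv, 2)
```

The same function can rank combinations by genes or by covered residues if the sets hold gene names or (protein, position) pairs instead of proteoform identifiers.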

When sequence coverage is used as the metric, which weights each proteoform by its length, the ideal combination is −30, −40, and −50 CV, covering 88% of the unique amino acids from all 21 FAIMS data sets with only a modest loss of proteoform and gene identifications (4221 proteoforms and 723 genes, Figure 5C). However, these combinations include all replicates and represent the maximum identifications that could be achieved for a given combination of CVs. Therefore, to demonstrate the advantage of external CV stepping, we decided to investigate what numbers could reasonably be achieved on average with just three runs, where each run is a different CV.

Using the −50, −40, and −30 V data sets as an example, we determined the average unique proteoforms, unique genes, and proteome sequence coverages based on the 27 unique combinations from the three different replicates within each CV.

Based on this analysis, we could achieve 2986 (±46) unique proteoforms and 618 (±12) unique genes covering 64,534 (±1089) amino acids on average. Compared to the "No FAIMS" triplicate data sets, which identified 1073 unique proteoforms and 293 unique genes covering 29,359 amino acids, external CV stepping at −50, −40, and −30 V could more than double each metric.
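The 27 combinations arise from picking one of three replicates at each of three CVs (3 × 3 × 3). A sketch of the averaging is shown below; it assumes each replicate is represented as a set of identifiers and that the reported spread is a sample standard deviation, neither of which is stated explicitly in the text. The toy identifiers are hypothetical.

```python
from itertools import product
from statistics import mean, stdev

def stepping_summary(replicates_by_cv):
    """Summarize unique identifications over every one-run-per-CV combination.

    replicates_by_cv: dict mapping a CV to a list of replicate identifier sets.
    Returns (number of combinations, mean unique count, sample std deviation).
    """
    counts = [
        len(set().union(*runs))
        for runs in product(*replicates_by_cv.values())
    ]
    return len(counts), mean(counts), stdev(counts)

# Hypothetical data: three replicates at each of three CVs.
toy_runs = {
    -50: [{"a", "b"}, {"a"}, {"b"}],
    -40: [{"c"}, {"c"}, {"c"}],
    -30: [{"d"}, {"d"}, {"d"}],
}
n_combos, avg_unique, sd_unique = stepping_summary(toy_runs)
```

In this toy case the −50 V replicates contribute 2, 1, or 1 identifiers on top of the two shared by the other CVs, so the 27 combinations average 10/3 unique identifiers.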

Nevertheless, the optimal combination of CVs will also likely depend on the organism and tissue the sample is derived from, the sample preparation steps applied, as well as any additional offline fractionation implemented before LC-MS analysis.

Overall, these results, taken together with the overlap analysis (Figure 4), suggest that distances of 15 V or greater between CVs provide the least overlap between proteoforms, while a 5−10 V separation within the −50 to −30 CV range maximizes genes, proteoforms, and proteome sequence coverage.

To further determine the impact of FAIMS on the depth of proteome coverage, we compared the genes identified from our top-down data sets to a typical bottom-up analysis of the human brain.

Comprehensive bottom-up data sets of human brain tissue allowed us to estimate abundances for 8528 proteins using weighted spectral counting.59–61 Proteins were binned into 10 different abundance percentiles based on spectral counts compiled from several bottom-up data sets of the human brain.

By cross-referencing with our top-down data sets, we were able to determine where the identified genes and proteoforms rank in terms of estimated protein abundance (using data set-normalized spectral counts) in the human brain (Figure 6).
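The binning and cross-referencing can be sketched as follows. The example assumes equal-sized rank-based bins (deciles) computed from a gene-to-spectral-count mapping, with bin 10 holding the most abundant tenth; the exact binning rule, the gene labels, and the counts are assumptions for illustration only.

```python
from collections import Counter

def abundance_deciles(spectral_counts):
    """Assign each gene to a bin from 1 (least abundant) to 10 (most abundant).

    spectral_counts: dict mapping gene -> normalized spectral count.
    Genes are ranked by count and split into ten equal-sized rank bins;
    ties are broken by input order.
    """
    ranked = sorted(spectral_counts, key=spectral_counts.get)
    n = len(ranked)
    return {gene: (rank * 10) // n + 1 for rank, gene in enumerate(ranked)}

# Hypothetical bottom-up abundances for 20 genes, two genes per bin.
bottom_up = {f"G{i}": float(i) for i in range(1, 21)}
bins = abundance_deciles(bottom_up)

# Hypothetical top-down identifications, cross-referenced against the bins.
top_down_genes = {"G20", "G19", "G10", "G_not_in_bottom_up"}
per_bin = Counter(bins[g] for g in top_down_genes if g in bins)
```

Tallying the top-down genes per bin in this way shows whether identifications are concentrated among the most abundant proteins or extend into lower abundance ranges.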