The Effect of Noise in Raman Spectra on the Reconstruction of the Concentration of Amino Acids in the Mixture by Multivariate Curve Resolution (MCR) Analysis

Changes in the concentration of free amino acids in biological tissues are a sign of impaired protein metabolism in patients with cancer. Recently, Raman spectroscopy has been used for early diagnostics of oncological diseases. The concentrations of individual components of biological tissue (for instance, the concentrations of amino acids) can be obtained by decomposing the tissue Raman spectrum. This study was designed to evaluate the effect of noise in the Raman spectra of individual amino acids on the result of the decomposition of the spectra of an amino acid mixture. As a decomposition method, we used Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) analysis, applied to experimental Raman spectra of amino acids and mathematically simulated Raman spectra of amino acid mixtures. Noise with different signal-to-noise ratios (SNR) was artificially added to both the experimental spectra of pure amino acids and the spectra of the mixtures. The concentration values obtained for each amino acid by MCR-ALS analysis were compared with the corresponding true values, and the correlation coefficients were calculated. The results show a less pronounced negative effect of noise when the spectra of pure amino acids (which were used as a basis for the MCR-ALS analysis) are noisy, and a more pronounced negative effect when the spectrum of the mixture is noisy. The accuracy of reconstruction of an amino acid is also negatively affected by strong background fluorescence in the amino acid spectrum. Moreover, the results indicate that using basis spectra with a high SNR (SNR = 5) makes it possible to successfully estimate the amino acid concentrations in a mixture even when the Raman spectrum of the mixture is noisy and has a low SNR (SNR < 5).


Introduction
Cancer is the second leading cause of death worldwide. Lung cancer is one of the most common types of cancer: according to the World Health Organization, there were about 2.21 million new cases in 2020. Moreover, lung cancer was the most common cause of cancer death in 2020 (1.80 million deaths). It is well known that lung cancer risk and mortality can be reduced by early detection of cases [1].
It is also known that changes in protein metabolism occur in the patient's body during malignant tumor development. Many researchers have described changes in plasma free amino acid (PFAA) profiles in patients with cancer. Kubota, Meguid, and Hitch [2], analyzing PFAA in the venous blood of patients with breast cancer, gastrointestinal tract cancer, and head and neck cancer, suggested that PFAA profiles correlate diagnostically with the organ-site origin of three different kinds of malignant tumors. Miyagi et al. [3] determined the characteristics of the PFAA profiles in patients with one of five types of cancer: lung, gastric, colorectal, breast, or prostate cancer. PFAA profiling for detecting lung cancer was also studied by Shingyoji et al. [4], Zhao et al. [5], and Proenza et al. [6]. These findings suggest that PFAA profiling has great potential for improving early detection of lung cancer.
This motivates the search for new methods to analyze PFAA profiles. Recently, Raman spectroscopy has been used for early diagnostics of oncological diseases. Bratchenko et al. [7] have shown that this method can be used in the diagnosis of cancer, such as skin neoplasms. Moreover, Raman spectroscopy is an optical method that is well suited to the analysis of liquid media. We therefore believe that Raman spectroscopy can be used for non-invasive analysis of blood plasma. The Raman spectra of PFAAs are specific and can be used to evaluate PFAA concentrations in a mixture of different substances from the Raman spectrum of this mixture.
In this study, we use the Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) method to analyze the Raman spectra of amino acid mixtures. This method is widely used to reconstruct concentration profiles in chemical analysis [8,9]. Recently, the MCR-ALS method has found wide biological and medical applications [8,10] and has been used for the analysis of spectral data when the concentrations of complex mixture components must be determined from spectra. For example, Xu and Rice [11] used MCR spectral unmixing in fluorescence imaging. Chen et al. [12] used Raman spectroscopic detection of keratin with MCR analysis for automatic oral cancer diagnosis. Iwasaki et al. [13] investigated the discrimination of breast cancer cells from normal mammary epithelial cells by Raman microspectroscopy and MCR analysis. It should be noted that MCR-ALS analysis makes it possible not only to estimate the concentrations of components, but also to obtain their "pure" Raman spectra [14].
However, the MCR-ALS analysis can be sensitive to noise in the Raman spectra analyzed. In practical applications, the efficiency of evaluating PFAA concentrations in a mixture may decrease because the Raman spectra contain a noise signal. We also suppose that, if MCR-ALS uses a known predetermined basis of amino acids, the result can be affected by noise in both the pure amino acid Raman spectra of the basis and the Raman spectra of the mixtures. On the one hand, we can provide high-quality registration of basis Raman spectra (that is, the spectra of pure amino acids that are used in the MCR-ALS analysis) with a high signal-to-noise ratio (SNR) by using a spectroscopic setup with high spectral resolution and by increasing the integration time. In addition, pure amino acids may be available for registration (or Raman microscopy of the samples may be used). Finally, it is enough to record the basis spectra only once; they can then be reused when analyzing other mixture samples. On the other hand, in a clinical setting, high-quality recording of Raman spectra can be difficult due to the large patient flow and the limited time available to examine a patient. Therefore, it can be assumed that the Raman spectra analyzed by the MCR-ALS method will have a lower SNR than the previously obtained Raman spectra of amino acids used as a basis for the MCR-ALS analysis.
This study was designed to evaluate the effect of noise in the Raman basis spectra of amino acids and in the Raman spectra of a mixture on the reconstruction of amino acid concentrations from the mixture.

Materials and Methods
The experimental Raman spectra used in our study were recorded using a portable spectroscopic setup that includes a thermally stabilized LML-785.0RB-04 laser diode module as an excitation source (785 ± 0.1 nm central wavelength, 200 mW laser power) and a QE 65 Pro spectrometer (OceanOptics, Inc., USA) with a CCD detector operating at -15 °C [15].
We used 20 standard proteinogenic amino acids (see Table 1 and Fig. 1). The amino acids, in crystalline powder form, were placed on a metal-coated slide. All spectra were registered at room temperature. Spectra were registered in the 800-1000 nm range with a spectral resolution of 0.2 nm, which corresponds to 240-2236 cm⁻¹. The Raman signal of each amino acid was acquired from 3 accumulations, each with an integration time of 5 s. The mean Raman spectrum of each amino acid was obtained by averaging the three registered spectra.
The preprocessing of the registered data includes only cosmic ray and dark noise removal, which are applied automatically in the "Spectra Suite" software package (OceanOptics, Inc., USA) [16]. Examples of recorded and preprocessed spectra are shown in Fig. 1.
We then modeled amino acid mixtures. Here, a "mixture" means the sum of the 20 standard proteinogenic amino acids (see Table 1), taken in different quantities, that is, with different concentrations. The concentrations of amino acids in the mixtures were chosen so that the mixtures correspond to real PFAA profiles of blood plasma samples studied by other researchers [3][4][5][6].
We define the Raman spectrum of an amino acid mixture as the sum of the pure amino acid spectra, each multiplied by its concentration in the mixture:

$$S_{mix}(\nu) = \sum_{i=1}^{20} c_i \, S_i(\nu), \tag{1}$$

where $c_i$ are the concentrations of the pure amino acids in the mixture and $S_i(\nu)$ are the pure amino acid spectra.
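The linear mixing model described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `mixture_spectrum`, the array layout (one pure spectrum per row), and the toy numbers are assumptions made for the example, not data or code from the study.

```python
import numpy as np

def mixture_spectrum(concentrations, pure_spectra):
    """Return the weighted sum of pure spectra for one mixture.

    concentrations: shape (n_components,), one value per amino acid
    pure_spectra:   shape (n_components, n_points), one spectrum per row
    """
    c = np.asarray(concentrations, dtype=float)
    S = np.asarray(pure_spectra, dtype=float)
    if c.shape[0] != S.shape[0]:
        raise ValueError("one concentration per pure spectrum is required")
    # The vector-matrix product sums c_i * S_i over all components.
    return c @ S

# Toy example (hypothetical values): 3 components on a 5-point spectral axis.
pure = np.array([[1.0, 0.0, 0.0, 2.0, 0.0],
                 [0.0, 3.0, 0.0, 0.0, 1.0],
                 [0.5, 0.5, 0.5, 0.5, 0.5]])
conc = np.array([2.0, 1.0, 4.0])
mix = mixture_spectrum(conc, pure)
print(mix)
```

Stacking several concentration vectors as rows of a matrix and multiplying by the same `pure` array would give all simulated mixture spectra at once.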
Using information on PFAA profiles in lung cancer patients [3][4][5][6], we have artificially modelled 10 Raman spectra: 5 spectra of lung cancer patient PFAA profiles and 5 spectra of control group PFAA profiles. Concentrations of amino acids in the mixtures are presented in Table 2.
The next step in our study was the simulation of noise in the Raman spectra. It should be noted that the recorded spectra already contain a noise signal due to the spectroscopic setup. However, in this study, we investigate the effect of additive noise introduced artificially. Therefore, the noise initially contained in the Raman spectra is not taken into account.
We added the noise as a random process with a normal distribution, zero mean, and various standard deviations. In this case, the Raman spectrum is given by

$$S_{noisy}(\nu) = S(\nu) + N(\nu), \tag{2}$$

where $S(\nu)$ is the original Raman spectrum and $N(\nu)$ is the noise spectrum.
To simulate different noise levels, we set the additive noise level using the signal-to-noise ratio (SNR) metric proposed in [15]:

$$\mathrm{SNR} = \frac{I_R}{\sigma_N}, \tag{3}$$

where $I_R$ is the Raman signal level (the intensity of the Raman peak in the 980-1025 cm⁻¹ band) and $\sigma_N$ is the noise standard deviation.
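The noise model described above can be sketched as follows. The sketch assumes that the signal level is the maximum intensity inside the 980-1025 cm⁻¹ band and that the noise standard deviation is the signal level divided by the target SNR; the function name `add_noise`, the synthetic Gaussian peak, and the fixed random seed are hypothetical choices for illustration only.

```python
import numpy as np

def add_noise(spectrum, wavenumbers, snr, band=(980.0, 1025.0), rng=None):
    """Add zero-mean Gaussian noise so that signal_level / sigma == snr.

    Assumption: the signal level is the maximum intensity in `band`.
    Returns the noisy spectrum and the noise standard deviation used.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.asarray(spectrum, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    in_band = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
    if not in_band.any():
        raise ValueError("no spectral points inside the reference band")
    signal_level = spectrum[in_band].max()
    sigma = signal_level / snr
    noise = rng.normal(loc=0.0, scale=sigma, size=spectrum.shape)
    return spectrum + noise, sigma

# Synthetic example: a single Gaussian peak near 1003 cm^-1 (not real data).
nu = np.linspace(240.0, 2236.0, 2000)
clean = 100.0 * np.exp(-0.5 * ((nu - 1003.0) / 5.0) ** 2)
noisy, sigma = add_noise(clean, nu, snr=5, rng=np.random.default_rng(0))
print(sigma, np.std(noisy - clean))
```

Passing an explicit `rng` keeps the simulation reproducible, which matters when the same noisy basis has to be reused across several mixture SNR levels.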
To investigate the effect of the noise, it is necessary to compare different combinations of a noisy mixture and noisy basis spectra. For this purpose, we simulated Raman spectra of pure amino acids with SNRs equal to 1, 5, and 10 and Raman spectra of mixtures with SNRs equal to 2, 3, 4, and 5. Examples of the noisy spectra are shown in Fig. 2.
For unmixing spectra by MCR-ALS analysis, we used the protocol by Felten et al. [14]. The main idea of MCR-ALS is to decompose the Raman spectra matrix $D$ into smaller matrices $C$ and $S^T$:

$$D = C S^T + E, \tag{4}$$

where $C$ represents the concentration profiles for each of the amino acids, $S^T$ is the pure amino acid spectra matrix, and $E$ is the error matrix.
As a basis ($S^T$), we used the spectra of the 20 standard proteinogenic amino acids (see Table 1) to which noise had been added in the previous step. After an initial estimate of C is given, it is optimized iteratively using an alternating least squares (ALS) algorithm until convergence is reached [14].
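The decomposition into concentrations, spectra, and an error matrix can be illustrated with a minimal alternating least squares loop. This is a simplified sketch, not the protocol of Felten et al. [14]: the function name `mcr_als_sketch`, the non-negativity constraint applied by clipping, the stopping rule, and the option to keep the basis spectra fixed are all assumptions made for the example.

```python
import numpy as np

def mcr_als_sketch(D, S_init, n_iter=50, tol=1e-10, fix_spectra=True):
    """Approximate D by C @ S_T using alternating least squares.

    D:      (n_mixtures, n_points) matrix of mixture spectra
    S_init: (n_components, n_points) initial (or fixed) basis spectra
    Returns C (n_mixtures, n_components), S_T, and the residual matrix E.
    """
    D = np.asarray(D, dtype=float)
    S_T = np.asarray(S_init, dtype=float).copy()
    prev_err = np.inf
    for _ in range(n_iter):
        # C-step: least-squares solution of S_T.T @ C.T = D.T for C.
        C = np.linalg.lstsq(S_T.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)          # assumed non-negativity constraint
        if not fix_spectra:
            # S-step: least-squares solution of C @ S_T = D for S_T.
            S_T = np.linalg.lstsq(C, D, rcond=None)[0]
            S_T = np.clip(S_T, 0.0, None)  # assumed non-negativity constraint
        err = np.linalg.norm(D - C @ S_T)
        if abs(prev_err - err) < tol:      # stop when the residual stalls
            break
        prev_err = err
    return C, S_T, D - C @ S_T

# Noise-free toy data (hypothetical): 3 components, 4 mixtures, 40 points.
rng = np.random.default_rng(1)
S_true = rng.uniform(0.0, 1.0, size=(3, 40))
C_true = rng.uniform(0.5, 2.0, size=(4, 3))
D = C_true @ S_true
C_est, S_est, E = mcr_als_sketch(D, S_true)
print(np.abs(C_est - C_true).max())
```

With `fix_spectra=True` the loop reduces to a constrained least-squares fit against a known basis, which corresponds to the situation studied here, where the amino acid spectra are predetermined; setting it to `False` also refines the spectra.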

Results and Discussion
In this study, we investigated the following combinations of mixture and basis spectra:
- no noise in mixture spectra, no noise in basis spectra;
- no noise in mixture spectra, SNR = 10 for basis spectra;
- no noise in mixture spectra, SNR = 5 for basis spectra;
- no noise in mixture spectra, SNR = 1 for basis spectra;
- SNR = 10 for mixture spectra, no noise in basis spectra [17];
- SNR = 5 for mixture spectra, no noise in basis spectra [17];
- SNR = 1 for mixture spectra, no noise in basis spectra [17];
- SNR = 5 for mixture spectra, SNR = 5 for basis spectra;
- SNR = 4 for mixture spectra, SNR = 5 for basis spectra;
- SNR = 3 for mixture spectra, SNR = 5 for basis spectra;
- SNR = 2 for mixture spectra, SNR = 5 for basis spectra.
As a result of the MCR-ALS analysis, we obtained a matrix of amino acid concentrations in the mixture spectra. The concentrations for each amino acid were compared with the corresponding true values, and correlation coefficients were calculated between the true concentration array and the obtained concentration array (see Tables 3, 4, and 5).
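The comparison of true and obtained concentrations can be sketched as a per-amino-acid Pearson correlation across mixtures. The function name `per_component_correlation` and the toy matrices are hypothetical; the sketch also assumes that a component whose concentration does not vary across mixtures is reported as undefined (NaN), since the correlation coefficient cannot be computed for a constant array.

```python
import numpy as np

def per_component_correlation(C_true, C_est):
    """Pearson correlation between true and estimated concentrations.

    Both inputs have shape (n_mixtures, n_components); the result has one
    coefficient per component (column), or NaN if a column is constant.
    """
    C_true = np.asarray(C_true, dtype=float)
    C_est = np.asarray(C_est, dtype=float)
    r = np.full(C_true.shape[1], np.nan)
    for j in range(C_true.shape[1]):
        x, y = C_true[:, j], C_est[:, j]
        if x.std() > 0 and y.std() > 0:
            r[j] = np.corrcoef(x, y)[0, 1]
    return r

# Toy data (hypothetical): 4 mixtures, 3 components.
C_true_demo = np.array([[1.0, 2.0, 3.0],
                        [2.0, 4.0, 3.0],
                        [3.0, 6.0, 3.0],
                        [4.0, 8.0, 3.0]])
C_est_demo = np.array([[1.1, 8.0, 0.0],
                       [2.1, 6.0, 0.0],
                       [3.1, 4.0, 0.0],
                       [4.1, 2.0, 0.0]])
r = per_component_correlation(C_true_demo, C_est_demo)
print(r)
```

In the toy data, the first component is recovered up to a constant offset, the second is recovered with an inverted trend, and the third has the same concentration in every mixture.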
Each concentration array corresponds to one of the amino acids and different mixtures (all the mixtures which we investigated). That is, each element of the array is a concentration of an amino acid in a mixture spectrum (a spectrum of one of the mixtures we used). The correlation coefficient indicates the degree of linear relation of the arrays and varies in the range between -1 and +1, where zero value corresponds to completely uncorrelated arrays.
Table 4. Correlation coefficients between obtained and true amino acid concentration values for the case of the noise added to mixture spectra only [17].
Table 3 shows the correlation coefficients for the case when the basis spectra are noisy with a different SNR, and the mixtures have no noise. For comparison, Table 4 shows the results that we obtained in the previous study, when we investigated the effect of noise in the mixture Raman spectra on the quality of unmixing spectra [17].
As one can see from Table 3, the correlation coefficient between true and reconstructed concentrations of amino acids equals 1 for the case without noise. In the case of noisy Raman spectra, the quality of reconstruction of amino acid concentrations is expected to decrease. Nevertheless, for SNRs from 10 to 1, the correlation coefficients are high for all amino acids and range from 0.90 to 1.
Comparing these results with those obtained in the previous study (see Table 4), one can see the following. With noise in the basis spectra, the amino acids were reconstructed successfully at every SNR considered, whereas with noise in the mixture spectrum we failed to reconstruct all amino acids when the SNR is less than 10. When SNR = 5, three out of 20 amino acids are reconstructed with a correlation coefficient below 0.90, and when SNR = 1 there are already 11 such amino acids.
The results of our previous study [17] show that the concentrations of amino acids are restored with lower correlation coefficients if their Raman spectra have no intense peaks. That is, the ratio between Raman peaks and background, apparently caused by fluorescence, is not high (compare the Raman spectra of cysteine, tyrosine, tryptophan with weak background fluorescence and the Raman spectra of glutamic acid, methionine, phenylalanine with strong background fluorescence in Fig. 1).
It can be concluded that noise in the Raman spectrum of an amino acid mixture decreases the quality of reconstruction more than noise in the basis used in MCR-ALS. This may be due to the random nature of the noise we added to the spectra. It has zero mean value; therefore, during MCR-ALS analysis (see Eq. 4), the noise components compensate each other.
Table 5 shows the correlation coefficients of reconstruction of mixture Raman spectra with different SNR using noisy basis Raman spectra of amino acids (SNR = 5). As expected, at higher mixture SNR (SNR > 4) the correlation coefficients range from 0.9 to 1 for almost all amino acids except cysteine and tyrosine, which are characterized by low Raman peaks across the whole spectral range (see Fig. 1). For noisier mixture Raman spectra with SNR = 3 and SNR = 2, the MCR-ALS method failed to reconstruct 5 and 6 amino acids with high quality, respectively. It should be noted that one of the failed components is cysteine, which is either absent from the mixtures analyzed or present at a low concentration.
As in our previous study [17], the concentrations of leucine, serine, cysteine, tyrosine, and tryptophan are restored with lower correlation coefficients because random noise overlaps their spectra (which have a low ratio between Raman peaks and background fluorescence).
Table 5. Correlation coefficients between obtained and true amino acid concentration values for the case of the noise added to both basis spectra (SNR = 5 for all the cases) and mixture spectra.
It should be noted that the correlation coefficients for the cases of the basis spectra with noise and without noise are comparable. Thus, the proposed approach makes it possible to estimate amino acid concentrations from noisy Raman spectra of the mixture using basis spectra with noise, and this can be done almost as efficiently as when using basis spectra without noise.
Nevertheless, the presence of fluorescence in the spectra is not the only problem. The form of the spectrum, that is, the presence of Raman peaks in certain wavelength ranges, has a strong influence on the result of the decomposition of the mixture spectrum. For example, in the cases of SNR = 1 and SNR = 2, one can see extremely low negative correlation values for leucine (Leu) and methionine (Met) (see Tables 4 and 5). It can be explained by the fact that the Raman spectra of these amino acids do not contain peaks that are unique to only these amino acids; moreover, the spectra are noisy. Thus, the MCR method is not efficient enough in this case.
Having studied the effect of noise in Raman spectra on the reconstruction of the concentration of amino acids in artificially modelled mixtures, we should check the obtained relations for real mixtures, where the noise is of a different nature. Real mixture samples (i.e., biological tissue samples) may contain other components such as lipids and proteins. In this case, the resolved spectra of the components may incorporate some background contribution. A final estimation of component concentrations should then be performed by fixing the pure spectra of the amino acids and leaving the background contribution free, which requires a correction of the protocol settings. In addition, the efficiency of reconstruction may be affected by removing background fluorescence and by normalizing the basis spectra in order to equalize their intensities. Therefore, our future research will be devoted to experiments on real mixtures of amino acids and to the study of the influence of basis spectra preprocessing.

Disclosures
All authors declare that there is no conflict of interest in this paper.