Label-free diagnosis of lung cancer by Fourier transform infrared microspectroscopy coupled with domain adversarial learning

Yudong Tian a, Xiangyu Zhaoa, Jingzhu Shaoa, Bingsen Xuea, Lianting Huanga, Yani Kanga, Hanyue Lib, Gang Liub, Haitang Yang*b and Chongzhao Wu*a
aCenter for Biophotonics, Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China. E-mail: czwu@sjtu.edu.cn
bDepartment of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China. E-mail: haitang.yang@shsmu.edu.cn

Received 25th February 2025 , Accepted 20th May 2025

First published on 28th May 2025


Abstract

Lung cancer is one of the most prevalent malignancies, characterized by high morbidity and mortality rates. Current diagnostic approaches primarily rely on CT imaging and histopathological evaluations, which are time-consuming, heavily dependent on pathologists’ expertise, and prone to misdiagnosis. Fourier transform infrared (FTIR) microspectroscopy is a promising label-free technique that can offer insights into morphological and molecular pathological alterations in biological tissues. Here, we present a novel FTIR microspectroscopy method enhanced by a deep learning model for differentiating lung cancer tissues, which serves as a crucial adjunct to clinical diagnosis. We propose an infrared spectral domain adversarial neural network (IRS-DANN), which employs a domain adversarial learning mechanism to mitigate the impact of inter-patient variability, thereby enabling the accurate discrimination of lung cancer tissues. This method demonstrates superior classification performance on a real clinical FTIR dataset, even with limited training samples. Additionally, we visualize and elucidate the FTIR fingerprint peaks, which are linked to the corresponding biological components and crucial for lung cancer differentiation. These findings highlight the great potential of incorporating FTIR microspectroscopy with the deep learning model as a valuable tool for the diagnosis and pathological studies of lung cancer.


1. Introduction

Lung cancer is one of the most common malignant tumors and currently has the highest morbidity and mortality rates in the world.1 According to the World Health Organization,2,3 lung cancer remains the leading cause of cancer deaths, with about 2.5 million new lung cancer cases in 2022, resulting in 1.8 million deaths and creating a huge public healthcare burden. Lung cancer originates from the epithelial cells of the respiratory tract and is histologically categorized into two main groups: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). Among all lung cancer patients, 85% are diagnosed with NSCLC.4 Factors increasing the risk of lung cancer can be categorized into environmental factors (such as smoking and radiation exposure) and genetic factors.1

At present, early diagnosis and screening of lung cancer mainly rely on computed tomography (CT), chest X-ray, and so on.5,6 However, conventional imaging techniques, such as CT, pose radiation risks to patients, and the accuracy of diagnoses can be influenced by the expertise of clinicians, leading to potential misdiagnoses.7,8 Microscopic examination of hematoxylin and eosin (H&E)-stained tissue samples is currently the gold standard for the pathological diagnosis of lung cancer.9 However, the sample preparation and slide processing of H&E staining are time-consuming, often resulting in delays in obtaining pathological results. In addition, the results of H&E staining examination are heavily dependent on the subjective judgment of clinicians, and discrepancies among doctors may seriously affect the final diagnosis, ultimately affecting clinical intervention.10–12 Therefore, developing a more rapid, unbiased method with high sensitivity and specificity for clinical pathological diagnosis of lung cancer is of great significance.

One promising diagnostic method is Fourier transform infrared (FTIR) microspectroscopy, owing to its powerful and efficient biochemical characterization capability. Compared with traditional tissue examination using H&E staining, FTIR microspectroscopy can quickly acquire rich biochemical information about a sample from its absorption spectra without any labeling, providing reliable diagnostic results with rapid turnaround.13,14 The basic principle of FTIR microspectroscopy is that when infrared (IR) light passes through a sample, different functional groups absorb IR radiation at their characteristic vibrational frequencies, so the recorded absorption curves reflect the biochemical composition of the sample. As a result, FTIR microspectroscopy can accurately capture structural and chemical changes at the molecular level and can thus be used to study molecular pathological alterations of biological samples (such as proteins, lipids, DNA, etc.).15–19 Over the past few decades, FTIR microspectroscopy has been widely used as a metabolic tool in the clinical diagnosis of different diseases.20–25 Recent reports have demonstrated the feasibility of FTIR in differentiating normal tissues and tumors in various organs including the stomach,26 skin,27,28 breast,29,30 cervix,31 ovaries,32 prostate,33 lymph nodes,34 and so on. Its potential applications in the detection of lung cancer using sputum,35 serum,36,37 and other specimen types17,38,39 have also been demonstrated.

With the growing application of machine learning (ML) techniques in spectral analysis, traditional machine learning algorithms, represented by the support vector machine (SVM), k-nearest neighbour (KNN), and random forest (RF), as well as various deep learning models, have achieved great success in spectral data processing and analysis tasks such as imaging, classification, recognition, and prediction.38,40–48 However, FTIR spectra from different patients carry distinct individual characteristics, and this inter-individual variability can mislead the model during training, especially when the number of training samples is small, causing severe overfitting and making it difficult to identify the intrinsic cancer-related information in the FTIR spectra. As a result, models trained on data from specific patients usually do not generalize well to data from other patients.

To address the above challenges, one promising method is domain adaptation (DA) learning,49 which aims to eliminate the variations in feature distribution between different domains and learn common features for data classification. Recently, domain adaptation learning approaches have been widely used in a variety of transfer learning tasks, such as image classification, emotion recognition, electroencephalography (EEG) signal classification, and so on.50–53 Nevertheless, methods of domain adaptation learning have rarely been reported in spectroscopic studies.

In this paper, we introduce a new type of deep learning model called the infrared spectral domain adversarial neural network (IRS-DANN) to address the challenge of FTIR spectral classification across different patients (Fig. 1). The spectral data encompass two types of domain information: the source domain, which reflects individual patient characteristics, and the target domain, which contains cancer-specific information. Our approach aims to minimize the influence of individual patient variability while preserving the model's ability to distinguish spectra from cancerous samples through a domain adversarial learning strategy. This strategy allows us to extract common cancer-relevant features from diverse source domains, significantly enhancing the reliability and generalization of the model.54 We also extract and visualize the cancer-related FTIR spectral features and identify the corresponding biomolecules by gradient-weighted class activation mapping (Grad-CAM). To the best of our knowledge, this study is the first to systematically mine FTIR data using a domain adversarial neural network. This work integrates FTIR microspectroscopy and the domain adversarial neural network model as complementary tools for routine pathological examination, facilitating accurate diagnosis of lung cancer and offering insights into its biochemical and metabolic basis. Compared to existing methods, the IRS-DANN model achieves state-of-the-art results on a real FTIR dataset, providing a promising approach for improving the diagnostic accuracy of lung cancer and other diseases in the future.


Fig. 1 The overall flowchart of this work. (a) Experimental acquisition and pre-processing of FTIR spectral data. (b) Model training and evaluation. (c) Identifying biochemical signatures by gradient-weighted class activation mapping (Grad-CAM) analysis.

2. Methods

2.1 Preparation of samples

A total of 15 patients were recruited for this experiment, including 13 patients diagnosed with malignant lung adenocarcinoma and 2 patients diagnosed with chronic granulomatous inflammation. We collected 18 formalin-fixed paraffin-embedded (FFPE) tissue blocks from these patients, which included 6 benign tissue blocks and 12 malignant lung adenocarcinoma tissue blocks. From three of these patients, both malignant adenocarcinoma samples and benign tissue samples were obtained. All samples were collected at the Shanghai Chest Hospital and all experimental procedures were approved by the Ethics Committee of the Shanghai Chest Hospital [number KS(Y)22139].

The processed FFPE tissue blocks were then sectioned using a microtome, and two adjacent 4 μm-thick tissue sections were obtained from each block. One section was fixed on a CaF2 substrate for FTIR spectroscopic imaging, and the other was fixed on a glass substrate for dewaxing and hematoxylin and eosin (H&E) staining. The H&E staining results were used to accurately locate cancerous tissue so that FTIR spectra of cancerous regions could be recorded.

2.2 Experimental measurement of spectra

FTIR spectral data were recorded in transmission mode using a Bruker Vertex 80 Fourier transform infrared spectrometer coupled to a microscope (Hyperion 2000/3000) equipped with a liquid-nitrogen-cooled MCT point detector. Data from the FTIR instrument were processed using OPUS 6.5 software with a Blackman-Harris 3-term apodization function and a zero-filling factor of 2. Background spectra were acquired before scanning each sample. To acquire FTIR maps of the target regions, spectra were recorded using the built-in Globar source with a 15× objective. Point-by-point sampling with a step size of 100 μm in both the x and y directions was performed over the defined region, and the spectrum at each sampling point was obtained with a 100 × 100 μm2 aperture. The measured wavenumber range was 4000–600 cm−1 with a spectral resolution of 8 cm−1. We first defined a rectangular region of interest covering the main tissue to be measured, then acquired the spectra within this region, and finally obtained the data as three-dimensional spectral cubes. At each sampling pixel, 8 scans were recorded and averaged to obtain the final absorption spectrum. For benign samples, the entire section was scanned to acquire spectral data, whereas for sections containing malignant cells, only areas rich in cancer cells were selected for FTIR scanning based on the corresponding H&E staining results (see Fig. 3a).

2.3 Data pre-processing

The FTIR spectral data were pre-processed using OPUS 6.5 software (Bruker Optics, Germany) and a program written in Python 3.11.7. First, the water vapor contribution was removed from all raw spectra, and the CO2 peak near 2400 cm−1 was flattened before further analysis. Spectra in the range of 4000–980 cm−1 were retained for subsequent processing. All FTIR spectra were then baseline-corrected using rubber-band correction with 64 baseline points and digitally deparaffinized. Digital deparaffinization was performed using an extended multiplicative signal correction (EMSC)-based method to correct the spectral baseline and remove the contributions of the background and paraffin spectra as well as other contaminant interferences. Detailed digital dewaxing procedures are described in the ESI. Spectra were then vector-normalized. Finally, first-order derivative spectra were calculated to extract subtle characteristic information and improve the sensitivity and precision of the spectral analysis.
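As a concrete illustration of the last two pre-processing steps, the sketch below applies vector normalization and a first-order derivative to a synthetic spectrum with numpy. It is a minimal sketch only: the actual pipeline uses OPUS and an EMSC-based dewaxing step that are not reproduced here, and the derivative algorithm (simple finite differences) is our assumption, since the paper does not specify one.

```python
import numpy as np

def preprocess(spectrum):
    """Sketch of the final pre-processing steps: vector normalization
    followed by a first-order derivative. Baseline correction and
    EMSC-based deparaffinization are assumed to have been applied
    already."""
    spectrum = np.asarray(spectrum, dtype=float)
    # Vector normalization: scale the spectrum to unit Euclidean norm.
    normed = spectrum / np.linalg.norm(spectrum)
    # First-order derivative approximated by finite differences.
    deriv = np.gradient(normed)
    return normed, deriv

# Example on a synthetic absorbance curve (mock amide I band at 1660 cm-1).
wavenumbers = np.linspace(4000, 980, 512)
fake_spectrum = np.exp(-((wavenumbers - 1660) / 40.0) ** 2)
normed, deriv = preprocess(fake_spectrum)
```

In practice a smoothing derivative (e.g. Savitzky–Golay) is often preferred over raw finite differences to limit noise amplification.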

2.4 Architecture of the IRS-DANN model

The proposed IRS-DANN model consists of three components: a feature extractor (encoder), a label classifier, and a domain discriminator, which are trained jointly in an adversarial manner. The encoder performs dimensionality reduction and extracts cancer-relevant features from the raw FTIR spectral signals. The domain discriminator determines which patient an input spectrum comes from and is connected to the encoder through a gradient reversal layer (GRL). The label classifier classifies the target FTIR spectrum and directs the encoder to extract cancer-related features. During training, the domain discriminator and label classifier are optimized adversarially through the GRL module, which prevents the model from classifying based on domain (patient) knowledge and thereby reduces the impact of individual patient characteristics on model predictions. During the inference stage, only the encoder and label classifier are used to obtain the target label, while the domain discriminator and GRL module are discarded. The structure and training flow of the model are shown in Fig. 2.
Fig. 2 The infrared spectral domain adversarial neural network (IRS-DANN) model consists of three modules. The structure of IRS-DANN includes an encoder, a label classifier, and a domain discriminator. The encoder, which is used to extract spectral features, can be built with different backbone structures (CNN, LSTM, or transformer). Both the label classifier and the domain discriminator are multilayer perceptron (MLP) models, where the former discriminates the spectral class and the latter determines the spectral source. Together they help the encoder suppress inter-individual disturbances and learn the correct cancer-related characteristics.
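The gradient reversal layer at the heart of this adversarial coupling can be sketched in a few lines of PyTorch. The class name `GradientReversal` and the toy check below are our illustration, not the paper's code: the forward pass is the identity, while the backward pass multiplies the incoming gradient by −α, so the encoder is driven to maximize the domain discriminator's loss.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Gradient reversal layer (GRL): identity in the forward pass,
    multiplies the gradient by -alpha in the backward pass."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder;
        # alpha itself receives no gradient.
        return -ctx.alpha * grad_output, None

# The reversed gradient can be checked directly:
x = torch.ones(4, requires_grad=True)
y = GradientReversal.apply(x, 0.01)  # alpha = 0.01, as in section 2.5
y.sum().backward()
# x.grad now equals -0.01 for every element instead of +1
```

In a full model, the encoder output is passed through `GradientReversal.apply` before the domain discriminator, while the label-classifier branch receives it unchanged.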

First, a feature encoder Ge(θe) is used to extract spectral features from the input spectral batches X = [x1, x2, …, xN], X ∈ ℝ^(N×1×L), where N is the batch size, L is the length of a single spectrum, and θe represents the parameters of the encoder. The encoder maps the input spectral batch X into the feature vectors f = {f1, f2, …, fN}, fi ∈ ℝ^(1×F):

f = Ge(θe, X), f ∈ ℝ^(N×F) (1)

The feature vectors are then fed into two classifiers composed of multilayer perceptrons (MLPs) to obtain the category prediction labels Ŷc ∈ ℝ^(N×2) and the domain prediction labels Ŷd ∈ ℝ^(N×H), respectively, where H is the number of patients in the training cohort. Specifically, the former handles a binary classification task with the goal of determining whether the input spectrum is from cancerous tissue, and the latter handles a multi-class task that tries to determine which patient the input spectrum is from.

In this paper, the label classifier can be expressed as Ml(θl) and the domain discriminator can be expressed as Md(θd), where θl and θd represent the parameters of the label classifier and domain discriminator. Thus, the final loss function comes from two parts, which can be designed as:

 
Lt(X; θe, θl, θd) = Ll(Ml(f; θl)) − α·Ld(Md(f; θd)) (2)
where Ll and Ld are the losses of the label classifier and domain discriminator, respectively, and α is a hyperparameter used to control the ratio of domain classification loss to total loss. In this study, the loss function of the label classifier Ll is designed as a binary focal loss:
 
Ll = −(1/N) Σi=1…N [α1 yic (1 − pi)^γ log(pi) + α2 (1 − yic) pi^γ log(1 − pi)] (3)
where yic represents the target label of input data xi, and pi = Ml(fi; θl) denotes the predicted probability of the ith spectrum in the batch given by the label classifier, while α1, α2 and γ are hyperparameters used to balance the samples.
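A minimal numpy sketch of the binary focal loss of eqn (3) is given below; the α1, α2 and γ values are illustrative defaults, not the values used in the paper.

```python
import numpy as np

def binary_focal_loss(p, y, alpha1=0.25, alpha2=0.75, gamma=2.0):
    """Binary focal loss, sketched with numpy.
    p: predicted probability of the cancer class for each spectrum,
    y: ground-truth label (1 = cancerous, 0 = benign)."""
    p = np.clip(np.asarray(p, float), 1e-7, 1 - 1e-7)
    y = np.asarray(y, float)
    # Well-classified samples (p close to y) are down-weighted by the
    # modulating factors (1-p)^gamma and p^gamma, so training focuses
    # on hard, misclassified spectra.
    per_sample = -(alpha1 * y * (1 - p) ** gamma * np.log(p)
                   + alpha2 * (1 - y) * p ** gamma * np.log(1 - p))
    return per_sample.mean()

# A confident correct prediction contributes far less than a wrong one:
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.10], [1])
```

This down-weighting of easy samples is what makes the focal loss suitable for the imbalanced cancerous/benign spectral counts in this dataset.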

The loss function of the domain discriminator Ld can be denoted with a categorical cross-entropy function:

 
Ld = −(1/N) Σi=1…N Σc=1…C yic log(pic) (4)
where C is the number of classes, yic is an indicator that equals 1 if the true category of sample i is c and 0 otherwise, and pic denotes the predicted probability that sample i belongs to category c.

2.5 Experimental settings

To make the model better reflect real clinical situations, we partition the training and test datasets on a per-patient basis. The test dataset consists of data on five patients with malignant lung cancer, and all of the remaining data are used as the training dataset. Traditional machine learning methods, such as the support vector machine (SVM) and random forest (RF), and neural network-based methods, such as the convolutional neural network (CNN), long short-term memory (LSTM), and transformer, are used for comparison. Meanwhile, to assess the influence of different feature extraction architectures on classification performance, we incorporate CNN, LSTM, and transformer backbones within the IRS-DANN framework. This comparative analysis allows us to evaluate the effectiveness of each architecture in classifying lung cancer from FTIR spectral data. The data pre-processing and data input methods are the same for all models. The main difference between the IRS-DANN-CNN (transformer, LSTM) models and the plain CNN (transformer, LSTM) models is that the former include a domain discriminator module; the encoder module and label classifier are identical. The label classifier and the domain discriminator are both three-layer multilayer perceptron (MLP) models with 100 nodes in the intermediate hidden layer. A ReLU function between the hidden layer and the output layer introduces nonlinearity, and a LayerNorm layer stabilizes the hidden-layer output and accelerates convergence. The output layer is mapped to a probability distribution vector through a SoftMax function. The parameters of the SVM and RF models are optimized by random search. The initial learning rate λ of the IRS-DANN model is set to 0.001 and decayed with a cosine annealing schedule, the hyperparameter α is set to 0.01, and the maximum number of training epochs is set to 100.
In addition, an early stopping strategy is implemented to mitigate the risk of severe overfitting. All deep learning models are built on the PyTorch 1.12.1 framework and are deployed on a server with an Intel i9 CPU and an NVIDIA GeForce RTX 3060 Ti GPU. On average, half an hour is required to complete the CNN and LSTM model training, and 4 hours are required for transformer model training.
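The early stopping strategy can be sketched as a small helper that halts training once the monitored validation metric stops improving; the patience value below is illustrative, since the paper does not report one.

```python
class EarlyStopping:
    """Minimal early-stopping helper: stop when the monitored metric
    (here, validation F1) has not improved for `patience` consecutive
    epochs. The patience value is an illustrative assumption."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Return True when training should stop."""
        if metric > self.best:
            self.best = metric      # a checkpoint would be saved here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
f1_history = [0.80, 0.85, 0.84, 0.83, 0.82]  # plateaus after epoch 2
stops = [stopper.step(f1) for f1 in f1_history]
```

In the training loop, `step` would be called once per epoch after computing the validation F1 score, with the best-scoring weights retained for evaluation.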

2.6 Model analysis by gradient-weighted class activation mapping

Visualization and interpretability of black box models have always been a major challenge in the field of deep learning, and interpretability analysis of deep learning models is especially important in the medical field. Through interpretability analysis, we can know how the model makes classification decisions, which helps us understand the mechanism of the model and the regions that the model focuses on. At the same time, the results of model interpretability analysis can further help us find out the biochemical differences between cancerous and normal lung tissue. This advantage could lead to a better understanding of the underlying mechanisms involved in the development and progression of lung cancer and ultimately help to identify potential lung cancer biomarkers. Therefore, the interpretability analysis of the deep learning model can help to not only further verify the effectiveness of the model but also better understand the biological mechanism of complex diseases, ensuring the deep learning model is more in line with the application requirements of the medical field.

Gradient-weighted class activation mapping (Grad-CAM)55 is a widely applied method for interpreting the decisions of convolutional neural networks; its basic idea is to compute score maps using the global average of class-specific backpropagated gradients as the weights of the feature maps. It provides insight by visualizing the model's regions of interest for a given input. With Grad-CAM, we can obtain a clear picture of the highly cancer-relevant spectral regions (spectral regions of interest), which exploits the inherent localization ability of the model. The details of CAM and Grad-CAM are further illustrated in the ESI.
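For a 1D spectral CNN, Grad-CAM can be sketched as follows: hook a chosen convolutional layer, backpropagate the class score, average the gradients over the spectral axis to weight the feature maps, and rectify the weighted sum. The toy network and layer choice below are illustrative assumptions, not the actual IRS-DANN-CNN architecture.

```python
import torch
import torch.nn as nn

# Toy 1-D CNN standing in for the encoder + label classifier;
# the layer sizes are illustrative only.
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv1d(8, 16, kernel_size=7, padding=3), nn.ReLU(),  # target layer
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 2),
)
target_layer = model[2]

activations, gradients = {}, {}
target_layer.register_forward_hook(
    lambda m, i, o: activations.__setitem__("a", o))
target_layer.register_full_backward_hook(
    lambda m, gi, go: gradients.__setitem__("g", go[0]))

def grad_cam_1d(spectrum, class_idx):
    """1-D Grad-CAM: weight each feature map of the target conv layer by
    the global average of its gradient w.r.t. the class score, sum over
    channels, and rectify. Returns one relevance value per position."""
    logits = model(spectrum)
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = gradients["g"].mean(dim=2, keepdim=True)        # (1, C, 1)
    cam = torch.relu((weights * activations["a"]).sum(dim=1))  # (1, L)
    return cam.squeeze(0).detach()

x = torch.randn(1, 1, 512)          # one mock spectrum, length 512
cam = grad_cam_1d(x, class_idx=1)   # relevance map for the "cancer" class
```

The resulting relevance map can then be averaged over all spectra of a tissue block and overlaid on the mean spectrum as a heat map, as done in Fig. 7.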

The relationships between the main FTIR absorption peaks and the corresponding vibration models and constituents are listed in Table 1, which provides a comprehensive explanation of the FTIR spectral bands of the biological samples.

Table 1 Biomolecular peak assignments from 4000 to 980 cm−1 (ref. 56–58)
Wavenumber (cm−1) Molecular bonds Components
3400–3300 OH stretching  
3300–3020 NH stretching Amide A and amide B
3050–3000 =C–H stretching Unsaturated fatty acids
2972–2845 C–H asymmetric stretching Methyl group and methylene group
1660 C=O stretching Proteins (amide I)
1660–1600 C=C stretching Unsaturated fatty acids
1700–1540 C–N stretching, N–H bending and C=O stretching Mainly from proteins (amide I/II)
1540 N–H bending and C–H stretching Proteins (amide II)
1520 C–C stretching Tyrosine ring
1420–1325 CH3 symmetric bending  
1473, 1462, 1373 CH2 and CH3 bending Paraffin
1230–1200 Asymmetric PO2 stretching Nucleic acids
1170 C–OH stretching, C–O–C asymmetric stretching Serine, threonine and tyrosine; cholesterol esters
1090–1080 Symmetric PO2 stretching Nucleic acids and phospholipids
1048 C–O–P, C–O–C stretching Lipids, ribose


2.7 Feature projection using t-distributed stochastic neighbor embedding

T-distributed stochastic neighbor embedding (t-SNE)59 is a nonlinear statistical method through which we can transform high-dimensional spectral features into a two-dimensional space. In the process of dimensionality reduction, t-SNE can bring similar features close together while separating dissimilar features, allowing us to intuitively observe the distribution of the data. In this work, we use the t-SNE method to visualize the original spectral features as well as the output features of the models, thereby intuitively demonstrating the impact of individual patient differences and the potential of domain adversarial learning in eliminating such differences. Further explanation of the principles of t-SNE can be found in section 5 of the ESI.
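A minimal sketch of such a projection with scikit-learn's `TSNE` is shown below on mock "encoder features"; the cluster construction and the perplexity value are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Mock high-dimensional encoder features: two well-separated clusters
# in a 64-dimensional space, standing in for benign vs. malignant
# spectral features.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 64)),
    rng.normal(4.0, 1.0, size=(50, 64)),
])

# Project to 2-D; perplexity must be smaller than the number of samples
# and is the main knob controlling local vs. global structure.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)
```

The 2D `embedding` can then be scatter-plotted with one color per category (or per patient) to produce visualizations like Fig. 3c and Fig. 6.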

2.8 Evaluation

We evaluate the proposed models using the accuracy, precision, recall, F1 score and area under the curve (AUC), which are defined as follows:

Accuracy = (TP + TN)/(TP + TN + FP + FN) (5)

Precision = TP/(TP + FP) (6)

Recall = TP/(TP + FN) (7)

F1 score = (2 × Precision × Recall)/(Precision + Recall) (8)
where TP denotes the number of cancerous spectra correctly classified by the model, FP denotes the number of benign spectra incorrectly classified by the model, TN denotes the number of benign spectra correctly classified by the model, and FN denotes the number of cancerous spectra incorrectly classified by the model. The F1 score is a weighted average of the recall and precision values and is often used to evaluate the performance of a model based on unbalanced data. The definition of the AUC is the area under the receiver operating characteristic (ROC) curve.60 It is based on the entire ROC curve and does not depend on specific classification thresholds, thus providing a comprehensive performance evaluation for different classification threshold choices.
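The four threshold-dependent metrics of eqn (5)–(8) can be computed directly from the confusion-matrix counts; the sketch below uses toy counts for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 score computed from the
    confusion-matrix counts (TP/FP/TN/FN as defined in the text)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy example: 90 of 100 cancerous and 80 of 100 benign spectra correct.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

Unlike these four metrics, the AUC is computed by sweeping the decision threshold over the whole ROC curve, so it has no single closed-form expression in terms of one confusion matrix.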

3. Results

3.1 FTIR dataset analysis

We obtain a fingerprint of the biochemical and metabolic components of the collected tissues by FTIR microspectroscopy. H&E staining serves as the gold-standard histopathological reference for identifying malignant tumor regions (Fig. 3a). In total, 26 474 FTIR spectra from 18 FFPE tissue samples are collected. The training cohort, consisting of 7 patients with malignant adenocarcinoma samples (16 259 spectra) and 2 patients with benign tissue samples (2912 spectra), serves as the training set for the IRS-DANN model, while the test cohort, consisting of 3 patients with both malignant adenocarcinoma samples (2638 spectra) and benign tissue samples (2051 spectra), 2 patients with malignant adenocarcinoma samples (1114 spectra), and 1 patient with a benign tissue sample (1500 spectra), serves as the held-out test set. Further details of the FTIR dataset are shown in Table S1.
Fig. 3 FTIR microspectroscopy analysis of malignant tumor and benign tissue samples. (a) Standard hematoxylin and eosin (H&E)-stained sections under an optical microscope for benign tissue (left panels) and malignant tumor (right panels). Scale bar, 1 mm for the original image and 200 μm for the enlarged image. Original magnification ×40. (b) Normalized mean FTIR spectra (mean ± standard deviation) after baseline correction and digital deparaffinization. The blue line represents the mean spectra of benign tissue samples (6 tissue blocks, 6463 spectra) and the red line represents the mean spectra of malignant tumor tissue samples (12 tissue blocks, 20 011 spectra). The inset shows the zoomed-in absorbance spectra in the wavenumber range from 1758 to 1064 cm−1. The bottom panel shows the differences in absorbance intensity between the average FTIR spectra of benign tissues and those of malignant tumor tissues. (c) 2D t-SNE visualization of the raw FTIR spectra.

Before training the model, we first examine, by gross visual inspection, the prominent FTIR peaks whose intensities and biological assignments differ between the two groups (Fig. 3b). The peaks at 3300, 3000–2950, 2900–2800, 1660, 1540, 1400, 1225, and 1170–1040 cm−1 increase significantly in the malignant tumor group compared with the benign group, while the peaks at 1600, 1350, and 1250 cm−1 decrease in the malignant group. The intensity of the FTIR peaks associated with nucleotide-related metabolites is higher in the malignant tumor group than in the benign group. Meanwhile, the absorption of the malignant tissue arises from amide I (1660 cm−1), amide II (1540 cm−1), and some amino acids (1048 cm−1), which may be related to altered protein expression in cancer cells. In addition, the peaks associated with lipids show different patterns between the two groups (3000–2900, 1600, 1350, 1250, 1170, and 1048 cm−1). Furthermore, the absorption band at 3300 cm−1, related to the OH stretching of water, is higher in the malignant tumor group, which may reflect metabolic differences between malignant and benign tissue.

Moreover, we visualize the distribution of collected FTIR spectra using t-SNE, as shown in Fig. 3c. The raw spectral data present a strong tendency to be distributed according to the source domain, which may lead to overfitting of the model during subsequent training, reducing its generalization ability. Therefore, to improve the performance of the model further, we need to eliminate this influence from the source domain in the training.

3.2 Training results

The training convergence plots are shown in Fig. 4, which displays the F1 score curves (left) and the label-classifier loss curves (right) for each IRS-DANN model across training epochs. The blue curves for the training set exhibit a generally monotonic upward (F1) or downward (loss) trend, while the red curves for the validation set fluctuate considerably, which may be attributed to the relatively small sample size. By establishing checkpoints during training, we monitor these metrics and identify the best-performing epoch; only the model weights with the highest validation F1 score are kept for subsequent analysis, resulting in improved test-set accuracy.
Fig. 4 F1 score and mean loss of label classifier after each epoch of training during the training procedure. Each row from top to bottom represents the IRS-DANN-CNN model (a, b), the IRS-DANN-LSTM model (c, d), and the IRS-DANN-transformer model (e, f), respectively. The blue color represents the training dataset, and the red color represents the test dataset.

Moreover, based solely on the training convergence curves, the IRS-DANN-LSTM model exhibits a smoother convergence trajectory compared to the other two models, demonstrating the advantages of the LSTM model in long spectral sequence tasks.

3.3 Model performance on the test dataset

The IRS-DANN models achieve the best classification F1 scores on the test dataset among all evaluated methods. Among them, the IRS-DANN-CNN model shows the most significant improvement over its baseline, with the F1 score increasing by about 8%, while the F1 scores of the IRS-DANN-LSTM and IRS-DANN-transformer models increase by about 4%. Table 2 shows the detailed results.
Table 2 Comparison of classification performance among IRS-DANN and other methods on the test dataset
Methods Accuracy Precision Recall F1 score AUC (cancer)
RF 0.5136 0.2568 0.4998 0.3393 0.5076
SVM 0.5138 0.2569 0.5000 0.3393 0.5421
CNN 0.8744 0.8794 0.8760 0.8742 0.9667
LSTM 0.9218 0.9301 0.9238 0.9206 0.9609
Transformer 0.8886 0.9068 0.8916 0.8878 0.9463
IRS-DANN-CNN 0.9506 0.9561 0.9492 0.9503 0.9996
IRS-DANN-LSTM 0.9576 0.9597 0.9587 0.9575 0.9886
IRS-DANN-transformer 0.9263 0.9305 0.9278 0.9262 0.9702


We further evaluate the proposed IRS-DANN model using ROC-AUC and confusion matrix calculations. Fig. 5 illustrates the comprehensive results. The IRS-DANN-CNN model achieves the highest ROC-AUC value, reaching 0.9996, with most deep learning models achieving ROC-AUC values that exceed 0.95. From the confusion matrix, it is evident that all the IRS-DANN models exhibit well-balanced discriminative capabilities between the two sample classes.


Fig. 5 The percentage confusion matrices and ROC curves of the IRS-DANN models on the test dataset. Each row from top to bottom represents the IRS-DANN-CNN model (a, b), the IRS-DANN-LSTM model (c, d), and the IRS-DANN-transformer model (e, f), respectively.

3.4 T-SNE projection

To demonstrate the effectiveness of domain adversarial learning more intuitively, we use t-SNE projection to visualize the output features of the IRS-DANN model's encoder (Fig. 6). In comparison with the raw FTIR spectra (Fig. 3c) and the output features of the models trained without adversarial learning (Fig. S6), the features of the IRS-DANN models form clusters distinguishable by category, demonstrating that the encoder can effectively suppress the source-domain (patient-specific) characteristics of the spectra while preserving the target domain's cancer-related information.
Fig. 6 2D visualization of the output features of the IRS-DANN model's encoder using t-SNE. Scatter plots are labeled for different categories (left panels) and patients (right panels). Each row from top to bottom represents the IRS-DANN-CNN model (a), the IRS-DANN-LSTM model (b), and the IRS-DANN-transformer model (c), respectively.

3.5 Spectral signatures of lung cancer

To facilitate the understanding of the FTIR spectral signatures of lung cancer, a Grad-CAM-based interpretation method is adopted to gain insights into model predictions. Each FTIR spectrum is fed into the pre-trained IRS-DANN-CNN model to calculate its CAM score; for robustness, the heat maps are drawn from the mean CAM of all spectra from the same tissue block. The contribution of FTIR spectral bands to classification is shown in Fig. 7. In each panel, the horizontal axis represents the wavenumber and the vertical axis represents the absorption intensity of the normalized average spectra, with the CAM value at each wavenumber overlaid as a heat map. A darker color indicates a higher CAM score, meaning the absorption peak at that wavenumber contributes more strongly to the classification result, while a lighter color indicates a smaller contribution. For FTIR spectra from malignant adenocarcinoma samples, high CAM values are mainly concentrated at 3100 cm−1, 3050–2800 cm−1, 1700–1500 cm−1, and 1200–1000 cm−1. For FTIR spectra from benign tissue samples, high CAM values occur mainly at 2970–2800 cm−1, 1400–1300 cm−1, and 1100 cm−1.
Fig. 7 FTIR spectral regions of interest informed by gradient-weighted class activation maps, with cancerous FTIR spectra on the left and benign FTIR spectra on the right. (a)–(c) Grad-CAM of cancerous spectra; (d)–(f) Grad-CAM of benign spectra.
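At its core, the Grad-CAM score per wavenumber is a gradient-weighted sum of the last convolutional layer's 1D feature maps, followed by a ReLU. A framework-free NumPy sketch is given below; the feature maps and gradients are synthetic stand-ins for the IRS-DANN-CNN's actual activations, which would normally be captured with framework hooks:

```python
import numpy as np

def grad_cam_1d(feature_maps, gradients):
    """Grad-CAM for 1D signals such as spectra.

    feature_maps: (K, L) activations of the last conv layer (K channels, L positions).
    gradients:    (K, L) d(class score)/d(activation), same shape.
    Returns a length-L heat map rescaled to [0, 1].
    """
    # alpha_k: global-average-pooled gradient per channel (channel importance).
    alphas = gradients.mean(axis=1)                          # (K,)
    # Weighted sum over channels, ReLU to keep only class-positive evidence.
    cam = np.maximum((alphas[:, None] * feature_maps).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                                # normalise for display
    return cam

rng = np.random.default_rng(0)
A = rng.random((8, 100))        # 8 channels over 100 spectral positions
G = rng.random((8, 100)) - 0.5  # synthetic class-score gradients
heat = grad_cam_1d(A, G)
print(heat.shape)               # (100,)
```

To map the heat map back to wavenumbers, the length-L output is simply upsampled (e.g. by interpolation) to the spectral resolution of the input, as is standard for Grad-CAM visualization.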

From Fig. 7 and Table 1, we can identify the FTIR absorbance bands and their corresponding biological components. The model shows high activation values in the 1200–1000 cm−1 range, which is highly correlated with nucleic acid components, probably reflecting DNA damage, genetic changes, and the rapid proliferation of cancer cells. The FTIR absorption peak at 1170 cm−1 may also be related to changes in cholesterol content; an increased cholesterol level has been reported to be a sign of cancer cell proliferation and tumor progression.61 The FTIR absorption peaks in the range of 1700 to 1500 cm−1 and at 1170 cm−1 are also associated with signals from specific amino acids, such as tyrosine, threonine, and glycine, as well as with total protein content. FTIR absorption peaks in the range of 1720 to 1600 cm−1 are associated with pyrimidine bases, and those in the range of 1660 to 1600 cm−1 with unsaturated fatty acids, which may reflect the excess lipids required for tumor cell growth, proliferation, invasion, and metastasis.62 FTIR absorption peaks in the range of 3000 to 2800 cm−1 are mainly correlated with methyl groups, methylene groups, and unsaturated lipids, suggesting that lipid metabolism may play an important role in lung adenocarcinoma progression. The FTIR absorption peak near 3100 cm−1 probably derives from O–H bond stretching and correlates with the sample's water content, which may reflect the vigorous metabolic activity of cancerous cells. These FTIR absorption peaks and the corresponding compositional changes may be underlying features of lung adenocarcinoma. These results can be further verified by pathological examination and may contribute to future pathological studies of lung cancer.

4. Discussion

This study proposed a novel framework called IRS-DANN for lung cancer diagnosis based on FTIR microspectroscopy. To the best of our knowledge, it is the first time that adversarial learning has been introduced into the classification of pathological infrared spectroscopy, and the IRS-DANN model has demonstrated the best classification performance on real clinical data.

Typically, the patient-to-patient variance of FTIR spectral signals can be large, because the spectra capture biochemical information directly from tissue sections, and this information varies greatly among patients. Such variance may mislead the classifier, resulting in weak model generalization. Moreover, the limited number of patients, an inherent feature of datasets in the medical field, further exacerbates the risk of overfitting. As a result, existing machine learning or deep learning methods are difficult to apply directly to FTIR datasets. Inspired by the idea of adversarial learning, our proposed IRS-DANN approach enhances the generalization ability of the model by removing information on the individual from the spectral data. The model comprises three parts: an encoder, a domain discriminator, and a label classifier. The encoder performs spectral dimensionality reduction and feature extraction; the domain discriminator guides the encoder to eliminate information on the individual; and the label classifier completes the final classification on this basis.
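In the standard DANN formulation, the component that makes this adversarial coupling trainable end-to-end is a gradient reversal layer between the encoder and the domain discriminator: it acts as the identity in the forward pass and multiplies the backpropagated gradient by −λ, so that descending the discriminator's loss pushes the encoder toward domain-confusing (patient-independent) features. A dependency-free NumPy sketch of just this layer follows; λ and the array shapes are illustrative, not taken from the paper:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales the backward gradient by -lambda."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged to the domain discriminator.
        return x

    def backward(self, grad_from_discriminator):
        # The encoder receives the *negated* domain gradient, so gradient
        # descent on the discriminator's loss drives the encoder to produce
        # features the discriminator cannot separate by patient (domain).
        return -self.lam * grad_from_discriminator

grl = GradientReversal(lam=0.5)
feats = np.array([1.0, -2.0, 3.0])
assert np.array_equal(grl.forward(feats), feats)   # forward: identity
print(grl.backward(np.array([0.2, -0.4, 0.6])))
```

In practice this layer is implemented as a custom autograd operation in the training framework, and λ is often ramped up over training so the discriminator stabilizes before the adversarial pressure on the encoder reaches full strength.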

In the FTIR spectral classification task for lung cancer, the IRS-DANN model demonstrates superior performance (F1 score higher than 0.95 and ROC-AUC value higher than 0.99), significantly outperforming other baseline methods. In addition, the t-SNE projections of the model encoder's output features show a tendency to cluster by categories rather than by individuals, further demonstrating the effectiveness of the adversarial learning strategy in removing information on the individual. Finally, we employ the Grad-CAM method to provide a visual interpretation of the IRS-DANN-CNN model, identifying the FTIR absorption peaks that play a significant role in cancer classification and determining the corresponding biochemical components. These results collectively demonstrate the potential of integrating FTIR microspectroscopy with deep learning techniques for cancer diagnosis and pathological analysis.

This research offers a promising methodology for the diagnosis and research of clinical cancer. Through adversarial learning, the encoder is capable of learning cancer-related features from the spectra while simultaneously eliminating interference of information on the individual, which significantly enhances the model's performance and generalizability. However, it is important to note that the current study is still limited to a small number of patients and focuses solely on lung cancer. Future research should examine the performance of the IRS-DANN model on larger sample sizes and a broader range of cancer types.

5. Conclusion

In this paper, we introduce the IRS-DANN model, a deep learning framework based on domain adversarial learning, for lung cancer detection utilizing experimentally acquired FTIR spectral data. The IRS-DANN model employs an adversarial learning strategy that utilizes a domain discriminator to guide the encoder in extracting individual-independent information, alongside a label classifier to identify features that are critical for classifying cancerous spectra. This approach enables our model to effectively isolate cancer-relevant features while minimizing the influence of individual patient variance, thereby improving the model's classification performance for clinical FTIR spectral data. Additionally, we analyze FTIR absorption peaks and their corresponding biomarkers using Grad-CAM, shedding light on their roles in cancer differentiation. These findings have significant implications for FTIR signal processing and the clinical management of lung cancer.

Ethical statement

All experiments were performed in compliance with relevant laws and relevant institutional guidelines, and approved by the Ethics Committee of the Shanghai Chest Hospital [ethics approval number: KS(Y)22139]. Informed consent was obtained from human participants of this study as per the institute-approved standard protocol.

Author contributions

Yudong Tian: conceptualization, data curation, formal analysis, methodology, software, visualization, validation, and writing – original draft. Xiangyu Zhao: conceptualization, data curation, formal analysis, methodology, software, validation, and writing – review & editing. Jingzhu Shao: writing – review & editing. Bingsen Xue: methodology. Lianting Huang: writing – review & editing. Yani Kang: data curation. Hanyue Li: data curation. Gang Liu: data curation. Haitang Yang: data curation, writing – review & editing. Chongzhao Wu: conceptualization, resources, project administration, funding acquisition, supervision, and writing – review & editing.

Data availability

The data supporting this article are available on GitHub at https://github.com/Atrf/FTIR, and further inquiries can be directed to the corresponding author.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

This research was sponsored by the National Natural Science Foundation of China (62375170 and 62105201), the Medical Engineering Cross-research Fund of the Shanghai Jiao Tong University “Star of Jiao Tong University” program (24X010301595 and YG2024QNA51) and the Science and Technology Commission of Shanghai Municipality (20DZ2220400).

References

  1. R. L. Siegel, A. N. Giaquinto and A. Jemal, CA Cancer J. Clin., 2024, 74, 12–49.
  2. F. Bray, M. Laversanne, H. Sung, J. Ferlay, R. L. Siegel, I. Soerjomataram and A. Jemal, CA Cancer J. Clin., 2024, 74, 229–263.
  3. Global cancer burden growing, amidst mounting need for services, https://www.who.int/news/item/01-02-2024-global-cancer-burden-growing–amidst-mounting-need-for-services (accessed August 20, 2024).
  4. W. D. Travis, E. Brambilla, A. G. Nicholson, Y. Yatabe, J. H. M. Austin, M. B. Beasley, L. R. Chirieac, S. Dacic, E. Duhig, D. B. Flieder, K. Geisinger, F. R. Hirsch, Y. Ishikawa, K. M. Kerr, M. Noguchi, G. Pelosi, C. A. Powell, M. S. Tsao, I. Wistuba and WHO Panel, J. Thorac. Oncol., 2015, 10, 1243–1260.
  5. S. C. Park, J. Tan, X. Wang, D. Lederman, J. K. Leader, S. H. Kim and B. Zheng, Phys. Med. Biol., 2011, 56, 1139.
  6. A. Khalil, M. Majlath, V. Gounant, A. Hess, J. P. Laissy and M. P. Debray, Diagn. Interv. Imaging, 2016, 97, 991–1002.
  7. S. C. Van't Westeinde and R. J. Van Klaveren, Cancer J., 2011, 17, 3–10.
  8. A. Ramaswamy, Curr. Pulmonol. Rep., 2022, 11, 15–28.
  9. M. Wang, F. Tang, X. Pan, L. Yao, X. Wang, Y. Jing, J. Ma, G. Wang and L. Mi, BBA Clin., 2017, 8, 7–13.
  10. S. Feng, D. L. Weaver, P. A. Carney, L. M. Reisch, B. M. Geller, A. Goodwin, M. H. Rendi, T. Onega, K. H. Allison, A. N. A. Tosteson, H. D. Nelson, G. Longton, M. Pepe and J. G. Elmore, Arch. Pathol. Lab. Med., 2014, 138, 955–961.
  11. A. C. Chi, N. Katabi, H.-S. Chen and Y.-S. L. Chen, Head Neck Pathol., 2016, 10, 451–464.
  12. Z. Kishanifarahani, M. Ahadi, B. Kazeminejad, T. Mollasharifi, M. Saber Afsharian, A. Sadeghi, F. Bidari, E. Jamali, N. Hasanzadeh, A. Movafagh, A. Dehghan, A. Moradi and A. Moradi, Iran. J. Pathol., 2019, 14, 243–247.
  13. M. Huleihel, A. Salman, V. Erukhimovitch, J. Ramesh, Z. Hammody and S. Mordechai, J. Biochem. Biophys. Methods, 2002, 50, 111–121.
  14. E. Gazi, J. Dwyer, N. P. Lockyer, J. Miyan, P. Gardner, C. A. Hart, M. D. Brown and N. W. Clarke, Vib. Spectrosc., 2005, 38, 193–201.
  15. S. Argov, J. Ramesh, A. Salman, I. Sinelnikov, J. Goldstein, H. Guterman and S. Mordechai, J. Biomed. Opt., 2002, 7, 248–254.
  16. M. F. K. Fung, M. Senterman, P. Eid, W. Faught, N. Z. Mikhael and P. T. T. Wong, Gynecol. Oncol., 1997, 66, 10–15.
  17. H. P. Wang, Sci. Total Environ., 1997, 183–287.
  18. J. Zhu, W. Wei, B. Chen, P. Tang, X. Zhao and C. Wu, ACS Photonics, 2024, 11, 1857–1865.
  19. B. Chen, B. Xu, J. Zhu, P. Tang, J. Shao, S. Yang, G. Ding and C. Wu, IEEE Sens. J., 2023, 23, 28923–28931.
  20. C. Krafft, G. Steiner, C. Beleites and R. Salzer, J. Biophotonics, 2009, 2, 13–28.
  21. C. Kendall, M. Isabelle, F. Bazant-Hegemark, J. Hutchings, L. Orr, J. Babrah, R. Baker and N. Stone, Analyst, 2009, 134, 1029–1045.
  22. K.-Y. Su and W.-L. Lee, Cancers, 2020, 12, 115.
  23. R. Sahu and S. Mordechai, Future Oncol., 2005, 1, 635–647.
  24. A. A. Bunaciu, V. D. Hoang and H. Y. Aboul-Enein, Crit. Rev. Anal. Chem., 2015, 45, 156–165.
  25. J. Shao, X. Zhao, P. Tang, B. Chen, B. Xu, H. Lu, Z. Qin and C. Wu, Spectrochim. Acta, Part A, 2024, 321, 124753.
  26. Y. Xu, Sci. China, Ser. B: Chem., 2005, 48, 167.
  27. D. L. Peres, S. Farooq, R. Raffaeli, M. V. Croce, A. E. Croce and D. M. Zezell, in 2023 International Conference on Optical MEMS and Nanophotonics (OMN) and SBFoton International Optics and Photonics Conference (SBFoton IOPC), IEEE, Campinas, Brazil, 2023, pp. 1–2.
  28. B. R. Shakya, P. Shrestha, H.-R. Teppo and L. Rieppo, Appl. Spectrosc. Rev., 2021, 56, 347–379.
  29. D. C. Malins, N. L. Polissar, K. Nishikida, E. H. Holmes, H. S. Gardner and S. J. Gunselman, Cancer, 1995, 75, 503–517.
  30. M. Verdonck, A. Denayer, B. Delvaux, S. Garaud, R. De Wind, C. Desmedt, C. Sotiriou, K. Willard-Gallo and E. Goormaghtigh, Analyst, 2016, 141, 606–619.
  31. B. R. Wood, L. Chiriboga, H. Yee, M. A. Quinn, D. McNaughton and M. Diem, Gynecol. Oncol., 2004, 93, 59–68.
  32. C. M. Krishna, G. D. Sockalingum, R. A. Bhat, L. Venteo, P. Kushtagi, M. Pluot and M. Manfait, Anal. Bioanal. Chem., 2007, 387, 1649–1656.
  33. E. Gazi, J. Dwyer, P. Gardner, A. Ghanbari-Siahkali, A. Wade, J. Miyan, N. Lockyer, J. Vickerman, N. Clarke, J. Shanks, L. Scott, C. Hart and M. Brown, J. Pathol., 2003, 201, 99–108.
  34. E. Willenbacher, A. Brunner, B. Zelger, S. H. Unterberger, R. Stalder, C. W. Huck, W. Willenbacher and J. D. Pallua, J. Biophotonics, 2021, 14, e202100079.
  35. P. D. Lewis, K. E. Lewis, R. Ghosal, S. Bayliss, A. J. Lloyd, J. Wills, R. Godfrey, P. Kloer and L. A. Mur, BMC Cancer, 2010, 10, 640.
  36. H. Li, J. Wang, X. Li, X. Zhu, S. Guo, H. Wang, J. Yu, X. Ye and F. He, Spectrochim. Acta, Part A, 2024, 306, 123596.
  37. X. Wang, X. Shen, D. Sheng, X. Chen and X. Liu, Spectrochim. Acta, Part A, 2014, 122, 193–197.
  38. R. Bangaoil, A. Santillan, L. M. Angeles, L. Abanilla, A. Lim, Ma. C. Ramos, A. Fellizar, L. Guevarra and P. M. Albano, PLoS One, 2020, 15, e0233626.
  39. D. M. Zezell, T. M. Pereira, G. Mennecier, L. Bachmann, A. B. Govone and M. L. Z. Dagli, in Biophotonics: Photonic Solutions for Better Health Care III, SPIE, 2012, vol. 8427, pp. 634–640.
  40. A. P. M. Michel, A. E. Morrison, V. L. Preston, C. T. Marx, B. C. Colson and H. K. White, Environ. Sci. Technol., 2020, 54, 10630–10637.
  41. S. Lim and M. Cohenford, in 2014 25th International Workshop on Database and Expert Systems Applications, 2014, pp. 77–81.
  42. B. Bai, X. Yang, Y. Li, Y. Zhang, N. Pillar and A. Ozcan, Light: Sci. Appl., 2023, 12, 57.
  43. F. B. Muniz, M. D. F. O. Baffa, S. B. Garcia, L. Bachmann and J. C. Felipe, Comput. Methods Programs Biomed., 2023, 231, 107388.
  44. E. Kaznowska, J. Depciuch, K. Łach, M. Kołodziej, A. Koziorowska, J. Vongsvivut, I. Zawlik, M. Cholewa and J. Cebulski, Talanta, 2018, 186, 337–345.
  45. Y. Wu, Y. Wang, C. He, Y. Wang, J. Ma, Y. Lin, L. Zhou, S. Xu, Y. Ye, W. Yin, J. Ye and J. Lu, Anal. Chim. Acta, 2023, 1283, 341897.
  46. C. Li, S. Liu, Q. Zhang, D. Wan, R. Shen, Z. Wang, Y. Li and B. Hu, Spectrochim. Acta, Part A, 2023, 287, 122049.
  47. H. Yan, M. Yu, J. Xia, L. Zhu, T. Zhang, Z. Zhu and G. Sun, IEEE Access, 2020, 8, 127313–127328.
  48. P. Tang, W. Wei, B. Xu, X. Zhao, J. Shao, Y. Tian and C. Wu, J. Lightwave Technol., 2024, 1–11.
  49. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand and V. Lempitsky, arXiv, 2016, preprint, arXiv:1505.07818, DOI: 10.48550/arXiv.1505.07818.
  50. H. Yu and M. Hu, IEEE Access, 2021, 9, 82000–82009.
  51. Z. He, Y. Zhong and J. Pan, Comput. Biol. Med., 2022, 141, 105048.
  52. M. Huang and J. Yin, Mathematics, 2022, 10, 3223.
  53. X. Zhang, L. Yao, M. Dong, Z. Liu, Y. Zhang and Y. Li, IEEE J. Biomed. Health Inf., 2020, 24, 2852–2859.
  54. Y. Ganin and V. Lempitsky, in Proceedings of the 32nd International Conference on Machine Learning, PMLR, 2015, pp. 1180–1189.
  55. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh and D. Batra, in 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, 2017, pp. 618–626.
  56. S. Errico, M. Moggio, N. Diano, M. Portaccio and M. Lepore, Biotechnol. Appl. Biochem., 2023, 70, 937–961.
  57. Z. Movasaghi, S. Rehman and I. Ur Rehman, Appl. Spectrosc. Rev., 2008, 43, 134–179.
  58. M. J. Baker, J. Trevisan, P. Bassan, R. Bhargava, H. J. Butler, K. M. Dorling, P. R. Fielden, S. W. Fogarty, N. J. Fullwood, K. A. Heys, C. Hughes, P. Lasch, P. L. Martin-Hirsch, B. Obinaju, G. D. Sockalingum, J. Sulé-Suso, R. J. Strong, M. J. Walsh, B. R. Wood, P. Gardner and F. L. Martin, Nat. Protoc., 2014, 9, 1771–1791.
  59. L. van der Maaten, arXiv, 2013, preprint, arXiv:1301.3342, DOI: 10.48550/arXiv.1301.3342.
  60. A. I. Bandos, H. E. Rockette, T. Song and D. Gur, Biometrics, 2009, 65, 247–256.
  61. H. Xu, S. Zhou, Q. Tang, H. Xia and F. Bi, Biochim. Biophys. Acta, Rev. Cancer, 2020, 1874, 188394.
  62. A. V. Reddy, L. K. Killampalli, A. R. Prakash, S. Naag, G. Sreenath and S. K. Biraggari, Dent. Res. J., 2016, 13, 494–499.

Footnotes

Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5an00216h
These authors contributed equally.

This journal is © The Royal Society of Chemistry 2025