Subset binding enables detection of multimodal patient subgroup patterns and drug target discovery in idiopathic pulmonary fibrosis Yayoi Natsume-Kitatani, Mari N Itoh, Yoshito Takeda, Masataka Kuroda, Haruhiko Hirata, Kotaro Miyake, Takayuki Shiroyama, Yuya Shirai, Yoshimi Noda, Yuichi Adachi, Takatoshi Enomoto, Saori Amiya, Jun Adachi, Ryohei Narumi, Satoshi Muraoka, Takeshi Tomonaga, Sadao Kurohashi, Fei Cheng, Ribeka Tanaka, Shuntaro Yada, Eiji Aramaki, Shoko Wakamiya, Yi-An Chen, Akiko Fukagawa, Chihiro Higuchi, Yosui Nojima, Takeshi Fujiwara, Chioko Nagao, Toshihiro Takeda, Yasushi Matsumura, Kenji Mizuguchi, Atsushi Kumanogoh, Naonori Ueda Briefings in Bioinformatics, 2026 Idiopathic pulmonary fibrosis (IPF) is an intractable lung disease that belongs to the idiopathic interstitial pneumonias (IIPs) and has limited therapeutic options. Conventional patient stratification approaches often fail to integrate diverse data modalities, particularly heterogeneous electronic medical records (EMR) containing mixed discrete and continuous values, with omics data, or fail to extract the interpretable many-to-many relationships crucial for precision medicine. We introduce subset binding (SB), a novel unsupervised algorithm that extends fuzzy association rule mining to robustly integrate heterogeneous clinical data (EMR) and omics data. This framework is designed to identify clinically meaningful patient subgroup patterns and discover associated molecular signatures based on observable symptoms rather than relying on ambiguous conventional diagnostic categories, such as IIPs. Applying SB to a dataset of 602 samples (from 403 patients with IIPs, including IPF, and 39 healthy controls), we identified 20 proteins linked with key IPF clinical features. Network-based pathway analysis nominated tyrosine kinases as critical drug target candidates, leading to the proposal of ponatinib, a multi-kinase inhibitor, as a candidate therapeutic.
Functional validation using a TGF-β-induced epithelial-mesenchymal transition (EMT) model confirmed ponatinib’s ability to at least partially suppress TGF-β-induced EMT. This inhibitory effect is consistent with the anti-fibrotic mechanism of the existing IPF drug, nintedanib, and reinforces prior evidence supporting ponatinib’s anti-fibrotic property. This study demonstrates that SB enables transparent, reproducible, and robust molecularly defined patient stratification from multimodal patient data. By establishing a data-driven framework that focuses on observation-based rules, this work lays the foundation for future prognostic validation and tailored treatment strategies, offering clinically actionable insights and therapeutic discovery in diagnostically ambiguous diseases like IPF, with ponatinib emerging as a compelling repurposing candidate. Significance statement Idiopathic pulmonary fibrosis (IPF) is a progressive lung disease with limited therapeutic options. IPF is classified as an idiopathic interstitial pneumonia (IIP), but distinguishing it from other IIPs is not straightforward. The ambiguities in distinguishing IPF from other IIPs necessitate the identification of molecules associated with specific clinical features, rather than relying solely on diagnosis. Existing methods for multi-omics data analysis often fail to effectively integrate heterogeneous data, such as EMR (containing mixed discrete and continuous values) and omics, or to extract many-to-many molecular-phenotypic relationships. We developed subset binding (SB), a novel, interpretable unsupervised machine learning method, to address these technical limitations by integrating EMR and omics data.
Our approach successfully detected proteins in serum extracellular vesicles associated with IPF-related features, highlighted several tyrosine kinases as potential drug targets, and proposed the multi-kinase inhibitor ponatinib as a compelling candidate for drug repurposing. This data-driven framework establishes a scalable and interpretable foundation for biomarker and drug target discovery for intractable diseases whose mechanisms are not fully understood.
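For readers unfamiliar with the fuzzy association rule mining that SB is described as extending, the following is a minimal sketch of the standard fuzzy support and confidence measures only. It is not SB itself. The ramp membership function, the thresholds, the feature names (`high_kl6`, `crackles`, `protein_up`) and the patient values are all invented for illustration; the abstract specifies none of them.

```python
def ramp(value, low, high):
    """Membership in a fuzzy 'high' set: 0 at or below `low`, 1 at or above `high`, linear between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_support(records, items):
    """Mean over records of the minimum membership across `items` (min t-norm)."""
    return sum(min(rec[i] for i in items) for rec in records) / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    return fuzzy_support(records, antecedent + consequent) / fuzzy_support(records, antecedent)


# Hypothetical patients: (continuous lab value, binary clinical finding, protein abundance).
raw = [(1500, 1, 8.0), (400, 0, 2.0), (900, 1, 6.0), (300, 0, 5.0)]

# Continuous values become graded memberships; the binary finding stays crisp (0.0 or 1.0).
records = [
    {
        "high_kl6": ramp(lab, 500, 1000),        # assumed thresholds
        "crackles": float(finding),
        "protein_up": ramp(protein, 4.0, 8.0),   # assumed thresholds
    }
    for lab, finding, protein in raw
]

support = fuzzy_support(records, ["high_kl6", "crackles"])
confidence = fuzzy_confidence(records, ["high_kl6", "crackles"], ["protein_up"])
print(f"support={support:.3f} confidence={confidence:.3f}")
```

The point of the sketch is that discrete and continuous fields can share one rule language once both are expressed as memberships in [0, 1]. How SB builds on this is described in the paper, not here.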
Serum vesicle biomarkers reflect the disease activity of idiopathic pulmonary fibrosis Yuya Shirai, Takatoshi Enomoto, Yoshito Takeda, Ryuya Edahiro, Miho Takahashi-Itoh, Yoshimi Noda, Yuichi Adachi, Mana Nakayama, Takahiro Kawasaki, Taro Koba, Yu Futami, Hanako Yoshimura, Saori Amiya, Reina Hara, Makoto Yamamoto, Daisuke Nakatsubo, Yasuhiko Suga, Maiko Naito, Kentaro Masuhiro, Takanori Matsuki, Haruhiko Hirata, Kota Iwahori, Izumi Nagatomo, Kotaro Miyake, Shohei Koyama, Kiyoharu Fukushima, Takayuki Shiroyama, Yujiro Naito, Shinji Futami, Yayoi Natsume-Kitatani, Naoko Ose, Soichiro Funaki, Satoshi Nojima, Shigeyuki Shichino, Masahiro Yanagawa, Yasushi Shintani, Mari Nogami-Itoh, Jun Adachi, Yoshikazu Inoue, Takeshi Tomonaga, Yukinori Okada, Kenji Mizuguchi, Atsushi Kumanogoh Journal of Translational Medicine, 2025 Background Idiopathic pulmonary fibrosis (IPF) is a heterogeneous disease caused by an interplay of genetic and environmental factors. Biomarkers that reflect the progression of fibrosis are required for the management of IPF. Methods We extracted serum extracellular vesicles from a discovery cohort (127 IPF patients and 34 controls) and a validation cohort (20 IPF patients and 22 controls). Non-targeted proteomic analysis was performed by a data-independent acquisition method. We investigated the proteomic profiles in relation to multiple clinical parameters associated with IPF. To further evaluate the biological relevance of the identified biomarkers, we analyzed publicly available single-cell RNA sequencing datasets of lung tissue and conducted immunochemical validation using our collected lung samples. Results We obtained 2420 protein profiles in serum extracellular vesicles and identified 19 IPF-associated proteins; their expressions were significantly lung-specific. Protein module analyses revealed that the upstream components of the complement system were increased in IPF. 
These IPF-associated proteins were involved in various IPF-associated genes and heterogeneously increased in IPF patients. Notably, surfactant protein B (SFTPB) not only showed superior diagnostic performance over the existing marker but was also significantly associated with progressive disease activity, such as the extent of fibrosis and decline in lung function. Furthermore, single-cell RNA-sequencing analysis revealed that SFTPB was associated with the TGF-β/SMAD pathway in SCGB3A2+ cells in IPF lungs. SFTPB expression in SCGB3A2+ cells was confirmed by immunostaining. Conclusions Serum extracellular vesicles could capture heterogeneous fibrotic profiles in IPF, and SFTPB can be a promising biomarker reflecting the disease activity.
Familial fibrotic hypersensitivity pneumonitis: A distinct clinical phenotype with shorter leukocyte telomere length Masashi Nishimura, Hideya Kitamura, Yoichi Tagami, Kazushi Fujimoto, Takashi Fukushima, Ryota Otoshi, Takashi Niwa, Jun Aoki, Taiki Fukuda, Tomoe Sawazumi, Tae Iwasawa, Koji Okudela, Tamiko Takemura, Yayoi Natsume-Kitatani, Yu Hara, Takeshi Kaneko, Takashi Ogura Respiratory Investigation, 2025 BACKGROUND: Family history is an important factor in recognizing the prognosis of interstitial lung disease (ILD); however, its significance in fibrotic hypersensitivity pneumonitis (HP), as defined by the latest international guidelines, remains unclear. This study aimed to investigate the distinct clinical profile and leukocyte telomere length (LTL) of familial fibrotic HP. METHODS: We retrospectively reviewed 490 patients who underwent LTL measurement and identified 131 patients with fibrotic HP, including 19 familial cases. Chest HRCT images were reviewed using automatic deep learning-based lung analysis. RESULTS: Familial fibrotic HP patients were younger (age ≥60: 68.4% vs. 91.9%, p = 0.02) and had lower diffusing capacity for carbon monoxide (DLco) (mean [SD]: 70.9 [23.6] vs. 82.9 [23.2], p = 0.031) compared to non-familial cases. Despite similar imaging patterns and AI-based CT analysis, pathological features of usual interstitial pneumonia (UIP) were more frequent in familial cases. Age-adjusted LTL was significantly shorter in the familial group (mean [SD]: -0.26 [0.26] vs. -0.06 [0.28], p = 0.004). Furthermore, LTL correlated with serum KL-6 (r = -0.603, p = 0.006), %DLco (r = 0.629, p = 0.007), and fibrotic features according to deep learning-based CT analysis: the consolidation with traction bronchiectasis ratio (r = -0.48, p = 0.038) and the traction bronchiectasis ratio (r = -0.489, p = 0.034).
CONCLUSION: Familial fibrotic HP may represent a distinct clinical phenotype characterized by shorter telomeres, with LTL emerging as a potential biomarker for disease severity.
An adjuvant database for preclinical evaluation of vaccines and immunotherapeutics Yayoi Natsume-Kitatani, Kouji Kobiyama, Yoshinobu Igarashi, Taiki Aoshi, Noriyuki Nakatsu, Lokesh P. Tripathi, Junichi Ito, Johan Nyström-Persson, Yuji Kosugi, Rodolfo S. Allendes Osorio, Chioko Nagao, Burcu Temizoz, Etsushi Kuroda, Daron M. Standley, Hiroshi Kiyono, Kenji Nakanishi, Satoshi Uematsu, Isao Hamaguchi, Yasuhiro Yasutomi, Jun Kunisawa, Sho Yamasaki, Cevayir Coban, Hiroshi Yamada, Kenji Mizuguchi, Ken J. Ishii Cell Chemical Biology, 2025 Adjuvants are immunostimulators used to enhance vaccine efficacy against infectious diseases. However, current methods for evaluating their efficacy and safety are limited, hindering large-scale screening. To address this, we developed a prototype Adjuvant Database (ADB) containing transcriptome data, generated using the same protocols as the widely used Open TG-GATEs (OTG) toxicogenomics database, covering 25 adjuvants across multiple species, organs, time points, and doses. This enabled cross-database integration of ADB and OTG. Transcriptomic patterns successfully distinguished each adjuvant regardless of organs or species. Using both databases, we built machine learning models to predict adjuvanticity and hepatotoxicity. Notably, we identified colchicine's adjuvant activity and FK565's liver toxicity through data-driven analysis. Overall, ADB combined with OTG offers a framework for transcriptomics-based, data-driven screening of adjuvant candidates.
Homology-feature-assisted quantification of fibrotic lesions in computed tomography images: a proof of concept for CT image feature-based prediction for gene-expression-distribution Kentaro Doi, Hodaka Numasaki, Yusuke Anetai, Yayoi Natsume-Kitatani International Journal of Computer Assisted Radiology and Surgery, 2025 Purpose Computed tomography (CT) imaging is promising for diagnosing idiopathic interstitial pneumonias (IIPs); however, quantification of IIP lesions in CT images is required. This study aimed to quantitatively evaluate fibrotic lesions in CT images using homology-based image analysis. Methods We collected publicly available CT images comprising 47 fibrotic images and 36 non-fibrotic images. The homology-profile (HP) image analysis method provides b0 and b1 profiles, indicating the number of isolated components and holes in a binary image. We locally applied the HP method to the CT image and generated homology-based feature (HF) maps as resultant images. The collected images were randomly divided into a tuning dataset and a testing dataset. The cut-off value for classifying an HF map as fibrotic or non-fibrotic was defined using receiver operating characteristic (ROC) analysis with the tuning dataset. This cut-off value was evaluated on the testing dataset in terms of accuracy, sensitivity, specificity, and precision. Results We successfully visualized the quantification of fibrotic lesions in the HF map. The b0 HF map was more suitable for quantifying fibrotic lesions than b1. The mean cut-off value of the b0 HF map was 199, with all performance metrics reaching 1.0. Furthermore, classification of fibrotic versus lung cancer images using the b0 HF map also reached the maximum value of 1.0 on all performance metrics. Conclusion This study demonstrated the feasibility of using the HF in quantitatively evaluating fibrotic lesions in CT images.
Our proposed HP-based method may also be useful for quantifying the fibrotic lesions of patients with IIPs and could assist the diagnosis of IIPs.
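The b0 and b1 quantities mentioned in this abstract (isolated components and holes in a binary image) can be illustrated with a small sketch. This is not the authors' implementation: the list-of-rows image format, the connectivity convention (8-connected foreground, 4-connected background) and the function names are assumptions made for this example, and it computes a single (b0, b1) pair rather than a profile or a local feature map.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbours):
    """Count connected regions of cells equal to `value` using BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(binary):
    """Return (b0, b1) for a 2D binary image given as a list of 0/1 rows."""
    b0 = count_components(binary, 1, N8)
    # Pad with a background frame so everything outside the shapes is one region;
    # every remaining background region is then an enclosed hole.
    cols = len(binary[0])
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in binary]
              + [[0] * (cols + 2)])
    b1 = count_components(padded, 0, N4) - 1
    return b0, b1


# A ring (one component enclosing one hole) plus one isolated pixel.
image = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
print(betti_numbers(image))
```

Pairing 8-connectivity for the foreground with 4-connectivity for the background is a common choice that keeps the two counts consistent on a pixel grid; the paper may use a different convention.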
Idiopathic pulmonary fibrosis-specific Bayesian network integrating extracellular vesicle proteome and clinical information Mei Tomoto, Yohei Mineharu, Noriaki Sato, Yoshinori Tamada, Mari Nogami-Itoh, Masataka Kuroda, Jun Adachi, Yoshito Takeda, Kenji Mizuguchi, Atsushi Kumanogoh, Yayoi Natsume-Kitatani, Yasushi Okuno Scientific Reports, 2024 Idiopathic pulmonary fibrosis (IPF) is a progressive disease characterized by severe lung fibrosis and a poor prognosis. Although the biomolecules related to IPF have been extensively studied, molecular mechanisms of the pathogenesis and their association with serum biomarkers and clinical findings have not been fully elucidated. We constructed a Bayesian network using multimodal data consisting of a proteome dataset from serum extracellular vesicles, laboratory examinations, and clinical findings from 206 patients with IPF and 36 controls. Differential protein expression analysis was also performed by edgeR and incorporated into the constructed network. We have successfully visualized the relationship between biomolecules and clinical findings with this approach. The IPF-specific network included modules associated with TGF-β signaling (TGFB1 and LRC32), fibrosis-related (A2MG and PZP), myofibroblast and inflammation (LRP1 and ITIH4), complement-related (SAA1 and SAA2), as well as serum markers, and clinical symptoms (KL-6, SP-D and fine crackles). Notably, it identified SAA2 associated with lymphocyte counts and PSPB connected with the serum markers KL-6 and SP-D, along with fine crackles as clinical manifestations. These results contribute to the elucidation of the pathogenesis of IPF and potential therapeutic targets.
Correlation of CT-based radiomics analysis with pathological cellular infiltration in fibrosing interstitial lung diseases Akira Haga, Tae Iwasawa, Toshihiro Misumi, Koji Okudela, Tsuneyuki Oda, Hideya Kitamura, Tomoki Saka, Shoichiro Matsushita, Tomohisa Baba, Yayoi Natsume-Kitatani, Daisuke Utsunomiya, Takashi Ogura Japanese Journal of Radiology, 2024 Purpose We aimed to identify computed tomography (CT) radiomics features that are associated with cellular infiltration and construct CT radiomics models predictive of cellular infiltration in patients with fibrotic ILD. Materials and methods CT images of patients with ILD who underwent surgical lung biopsy (SLB) were analyzed. Radiomics features were extracted using artificial intelligence-based software and PyRadiomics. We constructed one model predicting cell counts in histological specimens and a second model for binary classification into higher or lower cellularity. We tested these models using external validation. Results Overall, 100 patients (mean age: 62 ± 8.9 [standard deviation] years; 61 men) were included. The CT radiomics model used to predict cell count in 140 histological specimens predicted the actual cell count in 59 external validation specimens (root-mean-square error: 0.797). The two-classification model’s accuracy was 70% and the F1 score was 0.73 in the external validation dataset including 30 patients. Conclusion The CT radiomics-based model developed in this study provided useful information on cellular infiltration in ILD and correlated well with SLB specimens.
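As a reminder of how the three metrics reported in this abstract are defined, the sketch below computes root-mean-square error, accuracy and F1 score from their standard formulas. The function names and the toy numbers are invented for illustration and have no connection to the study's data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of numbers."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy_and_f1(predicted, actual):
    """Accuracy and F1 score for binary labels, where 1 is the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy regression example (made-up values).
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Toy classification example (made-up labels): 1 = higher cellularity, 0 = lower.
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(f"rmse={error:.3f} accuracy={acc:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, which is why it can differ from accuracy on the same predictions, as it does in the figures the abstract reports.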
SFTPB in serum extracellular vesicles as a biomarker of progressive pulmonary fibrosis Takatoshi Enomoto, Yuya Shirai, Yoshito Takeda, Ryuya Edahiro, Shigeyuki Shichino, Mana Nakayama, Miho Takahashi-Itoh, Yoshimi Noda, Yuichi Adachi, Takahiro Kawasaki, Taro Koba, Yu Futami, Moto Yaga, Yuki Hosono, Hanako Yoshimura, Saori Amiya, Reina Hara, Makoto Yamamoto, Daisuke Nakatsubo, Yasuhiko Suga, Maiko Naito, Kentaro Masuhiro, Haruhiko Hirata, Kota Iwahori, Izumi Nagatomo, Kotaro Miyake, Shohei Koyama, Kiyoharu Fukushima, Takayuki Shiroyama, Yujiro Naito, Shinji Futami, Yayoi Natsume-Kitatani, Satoshi Nojima, Masahiro Yanagawa, Yasushi Shintani, Mari Nogami-Itoh, Kenji Mizuguchi, Jun Adachi, Takeshi Tomonaga, Yoshikazu Inoue, Atsushi Kumanogoh JCI Insight, 2024 Progressive pulmonary fibrosis (PPF), defined as the worsening of various interstitial lung diseases (ILDs), currently lacks useful biomarkers. To identify novel biomarkers for early detection of patients at risk of PPF, we performed a proteomic analysis of serum extracellular vesicles (EVs). Notably, the identified candidate biomarkers were enriched for lung-derived proteins participating in fibrosis-related pathways. Among them, pulmonary surfactant-associated protein B (SFTPB) in serum EVs could predict ILD progression better than the known biomarkers, serum KL-6 and SP-D, and it was identified as a prognostic factor independent of the ILD-gender-age-physiology index. Subsequently, the utility of SFTPB for predicting ILD progression was evaluated further in 2 cohorts using serum EVs and serum, respectively, suggesting that SFTPB in serum EVs but not in serum was helpful. Among SFTPB forms, pro-SFTPB levels were increased in both serum EVs and lungs of patients with PPF compared with those of the control.
Consistently, in a mouse model, the levels of pro-SFTPB, primarily originating from alveolar epithelial type 2 cells, were increased similarly in serum EVs and lungs, reflecting pro-fibrotic changes in the lungs, as supported by single-cell RNA sequencing. SFTPB, especially its pro-form, in serum EVs could serve as a biomarker for predicting ILD progression.
Disease network constructor: A pathway extraction and visualization Mohammad Golam Sohrab, Khoa Duong, Goran Topić, Masami Ikeda, Nozomi Nagano, Yayoi Natsume-Kitatani, Masakata Kuroda, Mari Itoh, Hiroya Takamura Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023
BiomedCurator: Data Curation for Biomedical Literature Mohammad Golam Sohrab, Khoa N.A. Duong, Ikeda Masami, Goran Topić, Yayoi Natsume-Kitatani, Masakata Kuroda, Mari Nogami Itoh, Hiroya Takamura Proceedings of the 2nd Conference of the Asia Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing Long Paper AACL-IJCNLP 2022, 2022