Expert Review on the Quality of Responses to the Questions of Multiple Myeloma Patients: A Validation Study of the Medical Artificial Intelligence System “Myelobot”

Aleksander Sergeevich Luchinin, O.E. Ochirova, V.G. Potapenko, V.V. Ryabchikova

DOI: https://doi.org/10.21320/2500-2139-2026-19-1-81-89

BACKGROUND. The use of artificial intelligence (AI) in oncology and hematology opens up many possibilities for improving health services, including communication between physicians and patients with chronic diseases such as multiple myeloma (MM). Generative AI based on large language models is increasingly being introduced into clinical practice. However, the quality of the information these systems provide, their level of empathy, and their clinical safety remain under-researched.

AIM. To comprehensively and prospectively evaluate the quality of responses to the questions of MM patients provided by the specialized medical AI system “Myelobot”.

MATERIALS & METHODS. Responses were rated on three scores: accuracy, empathy, and potential harm. The consistency of reviewers’ ratings was analyzed in addition. Each score was measured on a 5-point Likert scale, with lower points corresponding to higher quality, safety, and empathy of responses. Three hematologists independently and anonymously reviewed 32 AI system responses to patient questions on all three scores.

RESULTS. The median values of all three scores were significantly lower than the empirical threshold of 2.5 points (p < 0.001), suggesting high response quality. At the same time, the Fleiss kappa and Krippendorff alpha coefficients of consistency in reviewers’ ratings were negative, especially for the empathy score, suggesting substantial variability in expert evaluations.
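The combination of low (favorable) scores and a negative agreement coefficient can look contradictory, so a minimal sketch of the standard Fleiss kappa formula may help. The function name `fleiss_kappa` and the toy ratings below are hypothetical and are not the study's data or code. The sketch only illustrates one way a negative kappa can arise: when nearly all ratings fall into one or two adjacent categories, chance-expected agreement is very high, and a few split votes push the observed agreement below it.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings    -- list of per-item lists of category labels
    categories -- all possible category labels (e.g. Likert points 1..5)
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who assigned item i to category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean proportion of agreeing rater pairs per item
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Every rating is in one category; kappa is undefined
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 responses, 3 raters, 5-point scale (lower is better).
# Every rating is 1 or 2, i.e. well below a 2.5 threshold.
toy_ratings = [[1, 1, 1], [1, 1, 2], [1, 1, 1], [1, 2, 1]]
toy_kappa = fleiss_kappa(toy_ratings, categories=[1, 2, 3, 4, 5])
print(round(toy_kappa, 3))  # -0.2
```

In this toy case all three raters place every response in the two best categories, yet kappa is negative. This is why chance-corrected coefficients of this kind are hard to interpret when ratings cluster at one end of the scale.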

CONCLUSION. The AI service “Myelobot” demonstrated a high level of accuracy, clinical safety, and capacity for empathic communication with MM patients. However, the conflicting expert ratings indicate a need to standardize the scores and calibrate evaluation approaches in future studies. According to the medical specialists, “Myelobot” is a highly effective support tool for MM patients that can act as a physician assistant, providing medical information 24 hours a day.

Author Biography

  • Aleksander Sergeevich Luchinin, NN Blokhin National Medical Cancer Research Center, 23 Kashirskoye sh., Moscow, Russian Federation, 115522

    MD, PhD

Published

01.01.2026

Issue

NEW TECHNOLOGIES

How to Cite

Luchinin A.S., Ochirova O.E., Potapenko V.G., Ryabchikova V.V. Expert Review on the Quality of Responses to the Questions of Multiple Myeloma Patients: A Validation Study of the Medical Artificial Intelligence System “Myelobot”. Clinical Oncohematology. Basic Research and Clinical Practice. 2026;19(1):81–89. doi:10.21320/2500-2139-2026-19-1-81-89.
