All times in Central Time.

1:20-2:10 p.m.

Predicting Lung Cancer Amongst High Risk Patients

Lung cancer is one of the most common cancers in both men and women and is the leading cause of cancer-related deaths. The timing of diagnosis is essential to patient prognosis and survival, since late diagnosis contributes to increased mortality risk (Gray, Teare, Stevens, & Archer, 2016). As targeted screening efforts have increased, selective screening tools such as low-dose CT scans have been used to detect lung cancer at an earlier stage and have been shown to reduce lung cancer mortality by 20 percent in high-risk individuals (National Lung Screening Trial Research Team et al., 2011).

This method, however, prevents only a small number of lung cancer cases. Recent studies and investigations "suggest that using lung cancer risk prediction models could lead to more effective screening programs compared to the current recommendations" (Ten Haaf et al., 2017). However, previous lung cancer risk prediction models were based on small sample sizes, used a small number of risk predictors, and focused on a specific patient population, resulting in a lack of generalizability (Wang et al., 2019).

Presenters at this session used RapidMiner to analyze a large healthcare dataset of 308,024 patients at risk for lung cancer, identifying features associated with a high risk of developing lung cancer within the following year. RapidMiner is a citizen science machine learning tool for professionals without a computer programming background. Their work introduces prediction models with accuracy rates ranging from 69 percent to 75 percent that correctly identify 62 percent to 73 percent of cancer patients. This work contributes to current early detection practices by maximizing the use of electronic health records (EHRs) to identify high-risk patients more efficiently before the onset of lung cancer, and it demonstrates the use of a citizen science tool in predictive analytics.
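The abstract reports two different figures: overall accuracy and the share of cancer patients correctly predicted, which is commonly called sensitivity or recall. The short Python sketch below shows how the two can be computed from a confusion matrix and why they can differ. It is only an illustration: the counts are invented for the example and are not taken from the presenters' dataset, and the function names are hypothetical.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all patients (cancer and non-cancer) classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual cancer patients that the model flags as high risk."""
    return tp / (tp + fn)


# Invented counts for illustration only (not results from the session):
# 1,000 patients, 100 of whom go on to develop lung cancer.
tp = 70    # cancer patients correctly flagged
fn = 30    # cancer patients missed
tn = 650   # non-cancer patients correctly not flagged
fp = 250   # non-cancer patients incorrectly flagged

acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)

print(f"accuracy:    {acc:.2f}")
print(f"sensitivity: {sens:.2f}")
```

Because cancer cases are usually a small minority of an at-risk population, accuracy is dominated by how well non-cancer patients are classified, so it is worth reading both numbers together.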

Learner Outcomes

After completing this session, attendees will be able to:

• Better identify features associated with new incident lung cancer.
• Understand how EHR data can be used to build prediction models.
• Evaluate lung cancer risk models and their performance in healthcare data.
• Assess prediction models applied to a national dataset.
• Demonstrate the use of citizen science tools to predict lung cancer in patients.


Bianca Jackson, MPH
Health Outcomes Research and Policy, Health Informatics, University of Tennessee Health Science Center, Memphis, TN

Charisse Madlock, PhD
Health Informatics and Information Management, University of Tennessee Health Science Center, Memphis, TN