Abstract 735P
Background
Cancer registries provide essential information to support precision medicine in clinical practice and research. Traditionally, registry data are abstracted manually by cancer registrars from unstructured electronic health records (EHRs). The aim of this study is to demonstrate the performance of a hospital-based AI system in supporting the abstraction of cancer registry data elements.
Methods
A natural language processing system was designed with ensemble voting across 3 sub-systems (Hybrid Neural Symbolic System, Hierarchical Attention Network, and Statistical Principle-Based Approach) to incorporate the different abstraction rules for cancer diagnosis, staging, and treatment data elements. Patient reports were annotated with cancer registry concepts to facilitate the manual coding process. Recommended codes for 40 data elements were provided to cancer registrars through a visualization platform for 16 major cancers (oral, salivary gland, nasopharyngeal, esophageal, stomach, colorectal, liver, laryngeal, lung, breast, cervical, uterine, ovarian, prostate, bladder, blood). Performance was evaluated using precision, recall, and Fβ-measure (Fβ).
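The abstract does not specify how the votes of the 3 sub-systems are combined. As a minimal sketch, assuming a simple majority rule in which a code is recommended only when at least two sub-systems agree, the voting step for one data element might look like the following. The function name, the `min_votes` parameter, and the example codes are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[Optional[str]],
                  min_votes: int = 2) -> Optional[str]:
    """Combine the codes proposed by several sub-systems for one data element.

    Assumption (not stated in the abstract): a code is recommended only if
    at least `min_votes` sub-systems propose it. A sub-system that abstains
    contributes None. Returns None when no code reaches the threshold.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None
    code, count = votes.most_common(1)[0]
    return code if count >= min_votes else None


# Hypothetical outputs of the three sub-systems for a single data element.
recommended = majority_vote(["T2", "T2", "T3"])
print(recommended)
```

Returning no recommendation when the sub-systems disagree favors precision over recall, which matches the emphasis on precision in the evaluation, but other combination rules (for example, weighted voting) are equally plausible.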
Results
A total of 229,375 reports (pathology, imaging, and surgical notes) from 5,451 patients were included. The average number of reports per patient ranged from 27.5 to 61.5, depending on the cancer type. To emphasize the importance of precision, we used F(0.5) > 0.85 as the performance target. All 40 data elements reached the target for 4 cancers (bladder, stomach, lung, and prostate), 30 to 39 elements reached it for 9 cancers, and fewer than 30 elements reached it for 3 cancers (oral, salivary gland, and breast). The AI system has been deployed in eight hospitals for further prospective validation.
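For reference, the Fβ-measure is the weighted harmonic mean of precision P and recall R, Fβ = (1 + β²)·P·R / (β²·P + R), so β = 0.5 weights precision more heavily than recall. The sketch below applies this standard definition to the F(0.5) target; the precision and recall values are illustrative only and are not results from the study.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta < 1 weights precision more heavily than recall.
    Returns 0.0 when both precision and recall are zero.
    """
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


TARGET = 0.85  # performance target stated in the abstract

# Illustrative values only (not taken from the study).
score = f_beta(precision=0.90, recall=0.70, beta=0.5)
print(f"F(0.5) = {score:.3f}, meets target: {score > TARGET}")
```

With these illustrative values, the same precision and recall give a lower score under the balanced F1 measure (β = 1), which shows how the choice of β = 0.5 rewards a system whose recommendations are usually correct even when it does not cover every case.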
Conclusions
An ensemble voting system combining 3 sub-systems offers a way to handle the complexity of cancer registry abstraction rules. Our system generates coding recommendations for cancer registrars with feasible performance and, consequently, may improve the quality of coding practices and reduce the labor and time required for data abstraction.
Clinical trial identification
Editorial acknowledgement
Legal entity responsible for the study
Y-H. Yang.
Funding
Health Promotion Administration, Ministry of Health and Welfare, Taiwan.
Disclosure
All authors have declared no conflicts of interest.