The CREU project in 2006-2007:
Advanced Intelligent System for Generating Electronic Medical Records




This project is supported by a grant from CREU: Collaborative Research Experience for Undergraduates in Computer Science and Engineering.






Participants

Christina Robinson
Charnette L. Carrington
Hyoil Han


Project abstract

Electronic Medical Records (EMRs) are important for managing health data and improving the quality of service in hospitals, which can ultimately save lives. Clinical medical records contain a wealth of information, largely in free-text form. This project will implement parts of a generic framework to semi-automatically extract and mine data from clinical notes, automatically learn patterns in each physician’s clinical notes, and automatically populate EMR databases for multiple users. The project will support the generation of EMRs by concentrating on user modeling for personalization, information extraction from unstructured or semi-structured text, and database design and implementation. The working hypothesis is that using information extraction (IE), data mining, and user feedback to build a personalized profile from each physician's input text will improve the automatic generation of EMRs.


Project methods

First, background research on information extraction from unstructured and semi-structured text will be conducted. Second, unstructured or semi-structured data will be made available after each physician specifies what they want to extract. Information extraction will then be performed for the specified data. Once information extraction ends, the approach for producing this system will mirror the database design approach. In our preliminary investigation, a field study will be geared towards user specifications. Interviews with potential users (i.e., physicians), along with pre-existing studies, will be our primary sources of information for this phase. This is not only the most crucial part of the development for us, but also one of the most time-consuming. The design phase will begin soon after the data from the preliminary investigation is compiled and analyzed. During this time, entity-relationship diagrams will be designed and mapped. Our code will be produced from the model created in the design phase. The projected languages for this phase are SQL and ASP. These choices may change as a result of the preliminary investigation, but they are central to our initial conception of the project. Sample and test data will then be populated to simulate user interaction with the system. Potential user errors, based on the user information collected at the beginning of the project, will be included in the simulation to ensure functionality. Ultimately, our goal is to populate personalization profiles with relevant data in a timely manner for users in environments that require immediate, comprehensible, and reliable data for quick decisions that improve patient care.


Journal

By Christina Robinson
The goal of this project was to build a system that can extract information from a structured or semi-structured document and enter it into a database. We were to research current systems that take data, manipulate it, and insert it into data tables while keeping its structure, so that the data still makes sense when it is retrieved. We spent weeks reading about and researching existing tools related to our subject. Once we had researched the topic sufficiently, we tried to implement a system that would accomplish what we wanted. We met weekly to discuss everything we had read and how we wanted the system to work. Halfway through the process, a graduate student joined to help with any roadblocks we faced. Most of my work was research on current and proposed data extraction systems. We achieved a greater understanding of the topic through this research, and we also implemented part of the code needed for such a system.

By Charnette L. Carrington
As a basis for the programs I created, I used the technical paper A Generic Framework: From Clinical Notes to Electronic Medical Records [1]. The generic framework it outlines for generating electronic medical records from clinical notes proved to be an insightful place to begin. The Java code I created addresses two parts of the framework: text mining and data storage. Han et al. describe text mining as being composed of three phases: term identification, term association, and term classification. Although the paper performs term identification with a part-of-speech tagger, I used a user input method instead. The ExpressionTest program returns relevant information based on the term received from the user. The second part of the framework that I addressed, data storage, was implemented as outlined in our proposal. The Emr program inserts the data into a table on SQL Server 2005. The Emr program is also the initial step towards the concept of identifying similar regions discussed in NoDoSE – A Tool for Semi-Automatically Extracting Structured and Semi-structured Data from Text Documents [4]. NoDoSE places data from text files into nodes, each of which represents a single structural component of the document. Three values are stored in each node: the type name, the start and end offsets, and the label. In the initial version of this project, the type is String, the start offset is the beginning of the line, the end offset is a semicolon, and the label is a user-defined term. The theory of separating text files semantically was also used as a basis for this project.
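
The node layout described above can be illustrated with a short Java sketch. This is not the project's Emr or ExpressionTest code, which is not reproduced on this page. The class name RegionNodeSketch, the Node fields, the extract method, and the sample text are all hypothetical, and the sketch assumes that a line without a semicolon simply produces no node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns each semicolon-terminated line into a node. */
class RegionNodeSketch {

    /** One structural component of the document: type name, offsets, label. */
    static final class Node {
        final String typeName;  // always "String" in this sketch
        final int startOffset;  // index of the first character of the line
        final int endOffset;    // index of the semicolon that ends the region
        final String label;     // user-defined term
        final String text;      // the characters from startOffset up to endOffset

        Node(String typeName, int startOffset, int endOffset, String label, String text) {
            this.typeName = typeName;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.label = label;
            this.text = text;
        }
    }

    /**
     * Scans the document line by line. A line yields a node only if it
     * contains a semicolon; the node runs from the start of the line up to
     * (but not including) the first semicolon on that line.
     */
    static List<Node> extract(String document, String label) {
        List<Node> nodes = new ArrayList<>();
        int lineStart = 0;
        while (lineStart < document.length()) {
            int newline = document.indexOf('\n', lineStart);
            int lineEnd = (newline == -1) ? document.length() : newline;
            int semicolon = document.indexOf(';', lineStart);
            if (semicolon != -1 && semicolon < lineEnd) {
                nodes.add(new Node("String", lineStart, semicolon, label,
                        document.substring(lineStart, semicolon)));
            }
            lineStart = lineEnd + 1; // move past the newline to the next line
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Invented sample text, used only to exercise the sketch.
        String note = "BP 120/80;\nPulse 72;\nno terminator here\n";
        for (Node n : extract(note, "vitals")) {
            System.out.println(n.label + " " + n.typeName + " ["
                    + n.startOffset + ", " + n.endOffset + ") " + n.text);
        }
    }
}
```

Storing offsets rather than only the extracted text keeps each node tied to its position in the source file, which is what would allow similar regions to be compared later. In a fuller version, the label would come from the term entered by the user rather than from a hard-coded string.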

References
  1. Hyoil Han, Yoori Choi, Yoomyung Choi, Xiaohua Zhou, and Ari D. Brooks, "A Generic Framework: From Clinical Notes to Electronic Medical Records," accepted for publication at The 19th IEEE International Symposium on Computer-Based Medical Systems, Salt Lake City, Utah, USA, June 22-23, 2006.
  2. X. Zhou, H. Han, I. Chankai, A. A. Prestrud, and A. D. Brooks, "Approaches to Text Mining for Clinical Medical Records," To appear at The 21st Annual ACM Symposium on Applied Computing 2006, Technical tracks on Computer Applications in Health Care, Dijon, France, 2006.
  3. X. Zhou, H. Han, I. Chankai, A. A. Prestrud, and A. D. Brooks, "Converting Semi-structured Clinical Medical Records into Information and Knowledge," presented at International Workshop on Biomedical Data Engineering (BMDE) in conjunction with the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, 2005.
  4. B. Adelberg, "NoDoSE - A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents," ACM SIGMOD, 1998.


Due dates

        December 2006: Progress report
        June 2007: Final report


For further information on this project, please contact hyoil.han AT ischool.drexel.edu