Medical OCR for Faster Decisions

Using local LLMs to analyze Electronic Health Records to save time and make informed medical decisions.

Three Avenues for Consideration

1) Analysis and sorting of faxes based on content and doctor preferences
2) Interfacing with the existing fax API
3) HIPAA compliance

Proof of Concept Validation

What do I need to learn from doctors? What is the chief complaint? Why did the patient come in? Given a dataset, what information can the model tell me about the patient?

Deciding on a Tech Stack for OCR

I recently decided to run a vanilla HTML/CSS/JS webpage hosted on a Flask server while I work on the Python script that runs the local LLM:

- HTML/CSS/JS for the webpage
- Python for the LLM
- Flask development server for real-time monitoring of the LLM output

When I first began, I chose a MERN-style stack: Express.js for the server, React for the front end, and Node.js as the runtime, with no database since it is a local web app at the moment. I wanted to run OCR on the EHR documents but realized it would probably be better to work with text extracted from a PDF for now.

First, I looked for OCR models that run with Node.js. Tesseract does not accept PDFs. Scribe was more accurate but cannot be loaded from a CDN (its files must be served from the same origin as the file importing Scribe). The Microsoft Azure model had all the features I needed, had the capacity to scale, and was cheap enough for quick development and some basic testing. With Azure, I can choose which features to extract, upload an image or send an image URL for analysis, and get the analysis result back.

Steps: I created a Cognitive Services Azure AI Foundry resource, installed the @azure/ai-text-analytics package, and deployed a new resource with the "Project" tag set to "medfaxocr".

- Resource group: rg-medfaxocr
- Resource name: ai-vision-medfaxocr-eastus
- Project name: MedicalFaxOCR
- User identity: System assigned

Then I learned that AI Vision is best for image recognition and not well suited to forms and documents, so I will try switching to AI Document Intelligence. For HIPAA compliance, I might switch to a local VLM/LLM that can both read and analyze documents locally. It would be hosted on the doctor's own computer.
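The Flask part of the stack described above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual server: the `/stream` route and the `fake_llm_tokens` generator are hypothetical names, and the generator stands in for the local LLM with a fixed list of placeholder tokens.

```python
from flask import Flask, Response

app = Flask(__name__)


def fake_llm_tokens():
    """Hypothetical stand-in for the local LLM: yields fixed placeholder tokens."""
    for token in ["Chief", " complaint", ":", " placeholder"]:
        yield token


@app.route("/stream")
def stream():
    # Passing a generator to Response makes Flask send each piece as it is
    # produced, so a page can display output while the model is still running.
    return Response(fake_llm_tokens(), mimetype="text/plain")
```

Saved as, say, `app.py` (an assumed filename), this could be started with `flask --app app run --debug`, and the webpage could read `/stream` with `fetch` to show the text as it arrives.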

Using a local LLM

I started with Qwen and it worked! Now that I get output, I am going to feed it a PDF file for context and limit the number of output tokens. I am limited by the capabilities of my M3 Apple Silicon chip with 24 GB of unified memory. This is a 1B-parameter model and doesn't seem very creative. For debugging, I added time logging to find the latency bottlenecks, plus custom color-coded logging functions for easy-to-read terminal output.

MLX ended up being the most useful library for inference on an Apple Silicon chip. I can run small models efficiently, and it works with the largest number of different models. Transformers, Pipeline, and Unsloth did not work well or could not support the models I was testing.

After trying several different models, I found that a MedGemma model with 4B parameters and 8-bit quantization works well. However, I am limited by a small context window of about 5 pages of ASCII text, which is not practical for handling 300+ page documents.

Next, there are two paths to research:

1. Researching RAG chunking methods for small models on my local machine.
2. Trying different cloud computing solutions to access larger models and running A/B tests. For now, I am thinking of trying Jupyter Notebooks and GCP for TPUs and GPUs.
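As a possible starting point for the first path, here is a minimal sketch of fixed-size chunking with overlap, one of the simplest RAG chunking strategies. It is not a method I have settled on: the function name `chunk_words` and the default sizes are assumptions for illustration, and it counts whitespace-separated words rather than model tokens.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk could then be embedded and retrieved separately, so that only the few chunks relevant to a question need to fit in the small context window.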
