Automating Spanish translation for virtual visits

98point6 Inc

The Spanish translation project explored the possibility of using machine translation to deliver care to Spanish-speaking patients.

Process

Past visit analysis

Initial usability testing

Spanish-speaking standardized patient testing

Results analysis

Problem

98point6 provides text-based primary care through a mobile app and is looking to expand its services to Spanish-speaking patients. Hiring interpreters is not cost-effective, so we investigated automating Spanish translation with existing machine translation services such as Google Translate.

Role

I served as the User Experience Researcher on the Patient Experience team for this project and worked closely with another User Experience Researcher on the Physician Experience team. My responsibilities included study design, participant recruitment, study moderation, results analysis, and creating recommendations. This was a research project to investigate the potential of machine translation, so it does not yet have a design component.

Summary

This research tests the viability of machine translation. We are investigating whether it is a possible path forward and whether any red flags indicate that it is not worth pursuing. Specifically, we want to determine the impact of machine translation on the quality of the care we provide and on the patient experience.

Past visit analysis

The first step in this project was an analysis of past visit transcripts. We met with a native Spanish speaker to review production chat transcripts with protected health information (PHI) scrubbed. These transcripts were translated using an automated translator, Amazon Translate. The native Spanish speaker reviewed the transcripts and offered feedback on the accuracy of the translation.

  • Direct translation sometimes fails to capture meaning. Some phrases do not translate well, such as “urgent care” and “pink eye.”

  • English idioms do not translate well. For example, the “running” in “running its course” is translated literally.

Visit analysis summary of findings

  • There were few translation errors, and all of them were minor and not blockers to this path forward.

  • Physician typos cause translation errors that are difficult for the patient to discern.

Top recommendations

  • Continue to prioritize this project, as no issues were discovered that should block further progress on the research.

  • Conduct additional research that looks at translation in both directions: Spanish to English and English to Spanish.

  • Conduct additional research with native Spanish speakers entering the Spanish text.

Initial usability testing

The next step in this research was a more in-depth study of machine translation for medical care. We conducted a usability study with native Spanish-speaking participants. The goal of this study was to better understand the viability of this solution from the perspectives of medical quality and user experience.

Study design

We conducted five live mock doctor chats with Spanish-speaking testers and English-speaking doctors. Testers were given a mock clinical scenario and asked to send their chief complaint and messages in Spanish. A facilitator translated those messages into English using Google Translate before passing them along to the doctor. The doctor responded in English, and the facilitator in turn translated the response into Spanish with Google Translate before sending it to the patient. The doctor was asked to reach a diagnosis and care plan. After the chat, two bilingual (English and Spanish) doctors reviewed the transcript and conclusion for quality purposes. The doctor and tester were asked to provide qualitative feedback on their experience.

Summary of findings

  • All patients reported that the doctor explained things in a way they could understand throughout the visit.

  • All patients reported that the doctor mostly or completely understood what they were saying throughout the visit.

  • In all but one instance, the physicians reported understanding what the patient was trying to say throughout the visit.

    • In one instance, the physician reported slight confusion.

  • In nearly all instances, both quality reviewers found that the physician reached an appropriate diagnosis and care plan.

  • In nearly all instances, the quality reviewers felt the physician did not miss anything medically important.

  • Most patients were able to correctly state their diagnosis and treatment plan after the visit.

  • The physicians never felt like they were unable to proceed because of a breakdown in communication.

Top recommendations

  • Machine translation met our acceptance criteria for this test and is therefore a viable solution worth further consideration.

  • Future testing with more difficult issues (e.g., behavioral health, pediatrics, prescription refills, complex courses of symptoms) and care plans may be valuable.

Spanish-speaking standardized patient testing

For the next round of testing, the goal was to create a realistic experience for patients and physicians, so we used standardized patients as participants. Standardized patients are actors used in medical school training; they are trained to respond realistically and to represent a patient accurately. As a result, the conversation between patient and doctor could go in any direction rather than following a preset script. We recruited three native Spanish-speaking standardized patients from different Spanish-speaking regions to account for regional dialect differences.

Study design

We conducted six live mock doctor chats with Spanish-speaking standardized patients and English-speaking doctors. Each participant worked through two scenarios. The medical scenarios tested included our top three diagnoses as well as three more complex scenarios, which allowed us to generalize our results with more confidence. A facilitator translated patient messages into English using Google Translate before passing them along to the doctor. The doctor responded in English, and the facilitator in turn translated the response into Spanish with Google Translate before sending it to the patient. The doctor was asked to reach a diagnosis and care plan. After the chat, two doctors reviewed the transcript and conclusion for quality assurance purposes. The doctor and tester were asked to provide qualitative feedback on their experience.

Summary of findings

  • No participants had concerns about this method of getting care.

  • No participants noted any showstopping communication issues.

  • The biggest issue appears to be grammatical errors in translation.

  • Patients and physicians were aligned on the diagnoses and care plans.

  • All patients gave positive feedback about their experience.

  • Patient participants commented that this method was preferable to a traditional interpreter.

Top recommendations

  • Continue with additional, more rigorous testing, as machine translation has proven to be a potentially viable solution for Spanish translation.

  • Pay special attention to chatbot-delivered messages and premade text blocks to ensure they are grammatically correct.

  • Dive deeper into the language conflicts that arise from translating grammatically incorrect text.

  • Dive deeper into how to handle typos.

Where we are now

Continuing to explore and research

Because this testing has shown that machine translation is a potentially viable solution, we are continuing to explore and study this option. A team has been formed to work on it and will combine the existing research with future research as it begins using machine translation. This research also surfaced a number of usability issues that must each be investigated.

Constraints

This project was constrained by resources. Other business goals have taken priority over this work for now, but this research will be useful when the newly formed team picks up the work in the near future.