Simulation meets machine learning for surgeon assessment


Christopher Simmonds, BS, PGCE
John Lenihan Jr., MD, FACOG

Having worked in and around the field of robotic surgery for over 13 years, we are convinced that robotic surgery, in the right hands, leads to better outcomes for our patients. This improvement comes from moving some procedures, such as prostatectomy and hysterectomy, from predominantly open procedures to minimally invasive ones. When we asked surgeons why they converted to robotics, many answered that they could tell their patients were doing better when they rounded the day after surgery. They did not yet have the data to prove it, but that observation was enough for them.

In the US, for example, the number of surgeons performing prostatectomies has decreased, as the advent of robotics has concentrated care among those surgeons who can offer a robotic approach.

How can you tell whether someone (a resident or medical student) has the hand-eye skills to be a good surgeon, or whether an “open” surgeon will be able to adapt to a new technology such as robotic surgery? Does having a high surgical volume imply that one’s outcomes with robotics will be superior? Interestingly, in the past 12 months two papers have shed some light on this topic.

The first, by Andrea Moglia, PhD, from the Center for Computer Assisted Surgery (EndoCAS) at the University of Pisa, Italy, is titled “Distribution of innate psychomotor skills recognized as important for surgical specialization in unconditioned medical undergraduates”. It was published in Surgical Endoscopy and continues a previous study carried out by the team. The underlying hypothesis is based on data from other disciplines, which predict that a small percentage of people, around 6%, have an innate ability in any skill, 80% can be trained, and 10-14% will never be able to achieve proficiency in that discipline. Does this data apply to robotic surgery as well?

The EndoCAS team in Italy took 155 medical students and asked them to complete a simple curriculum of 5 virtual reality simulation exercises. They tasked the individuals to become “proficient” in these exercises. Proficiency was defined as being able to pass the exercise twice consecutively at benchmark scores that had been developed from expert surgeons.
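To make the proficiency rule concrete, here is a minimal Python sketch of the "two consecutive passes" criterion. It is an illustration only: the function name, the single numeric score per attempt, and the example scores are our assumptions, not the scoring logic of the simulator used in the study.

```python
from typing import Optional, Sequence


def attempts_to_proficiency(scores: Sequence[float], benchmark: float) -> Optional[int]:
    """Return the 1-based attempt number on which proficiency is reached.

    Proficiency here means two passes in a row, where a "pass" is assumed
    to be a score at or above the benchmark. Returns None if the sequence
    of attempts never contains two consecutive passes.
    """
    consecutive_passes = 0
    for attempt_number, score in enumerate(scores, start=1):
        if score >= benchmark:
            consecutive_passes += 1
            if consecutive_passes == 2:
                return attempt_number
        else:
            # A failed attempt resets the streak.
            consecutive_passes = 0
    return None


# Invented example scores for one exercise (not data from the study).
example_scores = [62, 81, 74, 85, 88]
print(attempts_to_proficiency(example_scores, benchmark=80))  # prints 5
```

In this made-up run the student passes on attempt 2, fails on attempt 3, and only reaches proficiency on attempt 5, which is why a count of attempts says more about learning speed than a single pass does.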

The table below shows the amount of time it took for these individuals to become proficient.

Proficiency Chart

On average, the students needed 25 attempts and approximately 96 minutes to achieve proficiency. A group of 9 top-performing students (5.8%) demonstrated proficiency in 45 minutes and 14 attempts, while 17 students (11%) scored significantly worse than the group in both the mean weighted time to complete the exercises and the number of attempts they needed. This low-performing group took an average of 202 minutes and 42 attempts to reach the level of proficiency that the group as a whole reached, on average, in 96 minutes and 25 attempts.
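The figures above are easier to compare when put on a common scale. The short Python sketch below does only that arithmetic, using the numbers quoted in this post; the variable and function names are ours, and nothing here reproduces the statistical analysis in the paper.

```python
# Figures quoted in the text above.
TOTAL_STUDENTS = 155
MEAN_MINUTES = 96
MEAN_ATTEMPTS = 25

groups = {
    "top performers": {"students": 9, "minutes": 45, "attempts": 14},
    "low performers": {"students": 17, "minutes": 202, "attempts": 42},
}


def relative_effort(group: dict) -> dict:
    """Express one group's size and effort relative to the whole cohort."""
    return {
        "share_pct": round(100 * group["students"] / TOTAL_STUDENTS, 1),
        "time_vs_mean": round(group["minutes"] / MEAN_MINUTES, 2),
        "attempts_vs_mean": round(group["attempts"] / MEAN_ATTEMPTS, 2),
    }


for name, group in groups.items():
    print(name, relative_effort(group))
```

Read this way, the top performers needed a little under half the average time, while the low performers needed roughly 2.1 times the average time and about 1.7 times the average number of attempts.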

The conclusion reached was that the majority of the group, 83.2%, had an aptitude for robotic surgery and could be trained. There was, however, a small group (6%) that had exceptional skill and could be potential future leaders of the specialty. As for the low-performing group (11%), the authors suggested that testing on a virtual reality simulator could enable early identification of individuals with low innate aptitude for surgery, allowing program directors to advise them to consider specializing in other (non-craft) medical specialties [1].

While this paper does identify potentially high-level performers, it does not establish any correlation with clinical outcomes, as these individuals are still medical students.

Another interesting study reports on the ongoing work of Andrew Hung, MD, from the University of Southern California. He has been using machine learning to identify significant objective performance metrics for robot-assisted radical prostatectomy. Dr. Hung has published a number of papers on this topic [2, 3].

Recently at the SRS (Society of Robotic Surgery) congress in Stockholm, Sweden, Dr. Hung shared some data on the relationship between automated performance metrics (APMs) as defined by the machine learning algorithms and clinical outcomes.

To do this, 8 experienced USC-based robotic surgeons were studied and ranked based on their APMs. The team used a novel da Vinci® Systems recording device (dVLogger™; Intuitive Surgical, Inc.) to collect APMs such as instrument and endoscopic camera motion tracking events, as well as energy usage, during live robotic surgery. His team then used machine learning (ML) algorithms, now commonplace outside of medicine, to process these large volumes of automatically collected data.* He used three different ML algorithms to find the APMs that were the most predictive of actual surgical outcomes related to robot-assisted radical prostatectomy.

He split the data into two groups: 4 surgeons with the highest APMs, Group 1, and 4 with the lowest APMs, Group 2. He then compared clinical outcomes of both groups using their historical cases. There were 171 cases in the higher scoring Group 1 and 99 cases in the lower scoring Group 2.
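As a rough illustration of this kind of split, the Python sketch below ranks eight surgeons by a single score and divides the ranking into a top and a bottom half. The surgeon labels and scores are invented placeholders, and reducing a surgeon's APMs to one number is our simplifying assumption; this post does not describe how the actual ranking was computed.

```python
from typing import Dict, List, Tuple


def split_into_halves(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Rank names by score (highest first) and split the ranking in half."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Invented placeholder scores, NOT data from the study.
placeholder_scores = {
    "A": 0.62, "B": 0.91, "C": 0.48, "D": 0.77,
    "E": 0.55, "F": 0.83, "G": 0.40, "H": 0.69,
}

group_1, group_2 = split_into_halves(placeholder_scores)
print("Group 1 (higher scores):", group_1)
print("Group 2 (lower scores):", group_2)
```

Once the two groups are fixed in this way, the historical cases of each group can be pooled and their outcomes compared, which is the comparison described next.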

What he found was that the machine learning algorithms accurately predicted that Group 1 patients would have a shorter operative time (3.7 vs 4.6 hours, p=0.007), shorter length of stay (LOS, 2 vs 4 days, p=0.02), and shorter Foley catheter duration (9 vs 14 days, p=0.02). The ability of the machine learning algorithms to predict a difference in clinical outcomes was remarkably accurate: OR time p<0.001, LOS p=0.05, and Foley duration p<0.001.

The APMs that seemed to have the most impact were associated mostly with camera movements: the more, the better. His team then re-ranked the surgeons based on their “procedural volumes” alone. Interestingly, two of the higher-scoring Group 1 surgeons were also high volume but two were low volume. No correlation was found between clinical outcomes and a surgeon’s previous volumes. Surprisingly, two of the lower-scoring surgeons in Group 2 were actually high volume and had extensive robotic case experience.

What does all of this mean to us as surgeons and program directors? Our current thesis is that, at least on the da Vinci® system, patient outcomes are driven by the surgeon’s natural ability to control the system. Being a high-volume surgeon does not automatically confer optimum results. Using a da Vinci® is a little like playing the organ: not all good pianists can become good organists. More studies will need to be carried out to connect the dots in this area and see if this hypothesis holds true, but it appears that innate ability is a critical factor in predicting good clinical outcomes. As many experts such as Dr. Javier Magrina from Mayo Arizona have pointed out, “although robotics is an enabler, a robot does not make a bad surgeon good”.

Finally, it will also be interesting to see what happens as new robotic systems with slightly adapted interfaces are introduced over the next two years. Will these new robots widen the pool of expert users, or will innate ability still be essential to being a successful robotic surgeon regardless of the platform? And what will the future of “supervised autonomy” in robotic surgery do to this paradigm? Questions such as these will continue to occupy our interest and help to drive our work in the field of robotic surgery.


*Machine learning is a form of artificial intelligence which relies on computer algorithms and large volumes of data to “learn” and recognize broad patterns that are often imperceptible to human reviewers. Machine learning will help to identify and objectively measure surgeon performance and anticipate patient outcomes. In the near future, this technology will enable institutions to personalize surgeon training.

  1. Moglia A, Morelli L, Ferrari V, Ferrari M, Mosca F, Cuschieri A. Distribution of innate psychomotor skills recognized as important for surgical specialization in unconditioned medical undergraduates. Surg Endosc. Mar 2018. doi.org/10.1007/s00464-018-6146-8.
  2. Hung A, Chen J, Gill I, et al. Utilizing machine learning and automated performance metrics to evaluate robot-assisted radical prostatectomy performance and predict outcomes. J Endourol. 2018;32(5):438-444. doi:10.1089/end.2018.0035.
  3. Hung A, Chen J, Jarc A, Hatcher D, Djaladat H, Gill I. Automated performance metrics and machine learning algorithms to measure surgeon performance and anticipate clinical outcomes in robotic surgery. JAMA Surg. June 2018; epub www.jamasurgery.com.