What’s in a Score? (and What is the Data Telling You)

by: Christopher Simmonds

Data, data, data. That is all we seem to hear about in healthcare today. One consequence of the Affordable Care Act is that hospitals, physicians, surgeons and nurses have become obsessed with data and information to an extent never seen before. By looking at information across large data pools, trends can be identified, the behaviors that drive those trends can be discovered and, if needed, those behaviors can be modified. Robotic surgery is one of the areas where a lot of this analysis is occurring.

Robotic surgery is truly a misnomer; in reality it is computer-assisted surgery in which a computer has been placed between the surgeon and the patient, enhancing the surgeon’s capabilities compared to other surgical techniques. If the robot were compared to a superhero, its role would be to turn the surgeon into Iron Man, whose everyday actions are enhanced by the power of computing.

The fact that there is a computer between the surgeon and the patient means that a lot of data can be captured, a point the FDA specifically noted at its town hall meeting in July 2015. A main focus of that meeting was training and simulation, which is also computer-based and captures a lot of information, including a surgeon’s actions, which can then be translated into a scoring system. So what can these scoring systems for robotic surgery training tell us?

If you study surgeons long enough, you can identify that some surgeons are very precise in their motions and others less so. When training new surgeons, there are also certain good habits you would like them to develop, such as keeping their instruments in view at all times and making sure they do not use too much force or drop things. For these reasons the MScore system, which underpins all the scoring on the dv-Trainer, looks at both efficiency metrics and good-habit metrics when calculating overall scores.


Typically, you should be rewarded for efficiency and penalized for bad habits.

When Mimic initially developed the MScore system, it was calculated as a percentage-based score: the weighted average of all individual metrics as compared to an expert baseline. While this provided a simple and easy way to display the score, it may not have been the best at helping an individual focus on specific areas for improvement. A high percentage in one area could compensate for a low percentage in another while still producing an acceptable overall percentage. Mimic refers to this as the classic scoring system.
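
To make that compensation effect concrete, here is a minimal sketch of percentage-based scoring in Python. The metric names, weights and baseline values are hypothetical, not Mimic’s actual formula; all metrics here are treated as “lower is better,” as time, motion and blood loss would be.

```python
def classic_score(metrics, expert_baseline, weights):
    """Overall percentage: the weighted average of each metric's
    performance relative to the expert baseline."""
    total = 0.0
    for name, weight in weights.items():
        # Express each metric as a fraction of the expert baseline,
        # capped at 100% so one strong area cannot exceed full marks.
        ratio = min(expert_baseline[name] / metrics[name], 1.0)
        total += weight * ratio
    return 100.0 * total / sum(weights.values())

# A trainee who is quick but wasteful in economy of motion:
trainee  = {"time_s": 95.0, "motion_cm": 310.0, "blood_loss_ml": 12.0}
baseline = {"time_s": 90.0, "motion_cm": 250.0, "blood_loss_ml": 10.0}
weights  = {"time_s": 0.4,  "motion_cm": 0.4,   "blood_loss_ml": 0.2}

print(f"{classic_score(trainee, baseline, weights):.1f}%")  # ~86.8%
```

Here the weak economy-of-motion result (about 81% of the benchmark) is diluted by the stronger metrics, so the trainee still scores roughly 87% overall, which is exactly the compensation the classic system permits.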

After being challenged by educators, Mimic decided to take inspiration from FLS (the Fundamentals of Laparoscopic Surgery program) and develop what it now refers to as its proficiency-based scoring system.

Like the classic scoring system, the revised MScore system is based on expert user benchmarks; however, proficiency is measured as being within one standard deviation of the mean score of those experts. As an example, if five surgeons’ results have been pooled to produce the benchmark, you have to perform better than at least one of those surgeons in order to pass. And instead of the overall result being a combination of the scores, you have to become proficient at each individual metric before you can pass. The example below shows an individual who has passed in every other area but failed in the area of blood loss. The number shown is a weighted sum of all the metrics together. The user would likely have passed in a percentage-based system, as their superior scores in all the other metrics would have compensated for their lower score in blood loss.

[Figure: scoring comparison]
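
Here is a minimal sketch, with hypothetical numbers, of how such a per-metric proficiency check might work: a metric passes only if the trainee is within one standard deviation of the pooled expert mean, and the exercise passes only if every metric passes.

```python
# Metrics are "lower is better", so the pass threshold is the expert
# mean plus one standard deviation (beating the mean passes as well).
from statistics import mean, stdev

def proficiency_check(trainee, expert_results):
    """Return a per-metric pass/fail dict."""
    return {
        name: trainee[name] <= mean(scores) + stdev(scores)
        for name, scores in expert_results.items()
    }

experts = {
    "time_s":        [85, 92, 88, 95, 90],   # pooled expert results
    "blood_loss_ml": [8, 10, 9, 11, 12],
}
trainee = {"time_s": 89.0, "blood_loss_ml": 14.0}

report = proficiency_check(trainee, experts)
print(report)  # {'time_s': True, 'blood_loss_ml': False}
print("PASS" if all(report.values()) else "FAIL")  # FAIL
```

Unlike the weighted average shown earlier, a single failing metric (blood loss here) fails the whole exercise, no matter how strong the other results are.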

The other difference between the classic scoring system and the proficiency-based scoring system is that you can set proficiency thresholds. In FLS, for example, students need to complete the same exercise twice consecutively and ten times non-consecutively in order to pass. The same principle has been introduced into MScore, with defaults of two consecutive and five non-consecutive passes, though this can be modified by the end user.
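
Tracking those thresholds amounts to counting total passes and the best run of consecutive passes. A small sketch, assuming the defaults above are simply parameters an end user could change:

```python
def meets_proficiency(attempts, consecutive_needed=2, total_needed=5):
    """attempts is a chronological list of booleans (True = passed)."""
    best_streak = streak = 0
    for passed in attempts:
        streak = streak + 1 if passed else 0
        best_streak = max(best_streak, streak)
    return sum(attempts) >= total_needed and best_streak >= consecutive_needed

history = [True, False, True, True, False, True, True]
print(meets_proficiency(history))  # True: five passes total, best streak of two
```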

Mimic realized early on that it did not have all the answers and therefore ensured that the scoring system was developed with an open-architecture approach. Expert-level benchmarks can be input from peer-reviewed literature as well as from scores posted by surgeons within specific institutions. Weighting and proficiency levels can be modified to meet specific needs. In practice, however, established curricula and benchmarks such as the Morristown protocol are often used and have been implemented across many systems.
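
As an illustration of what that open architecture implies (with hypothetical names and values throughout), an institution could derive its own benchmark from local expert scores and use it in place of a published one:

```python
from statistics import mean, stdev

# (mean, SD) benchmark taken from peer-reviewed literature, hypothetically:
published = {"time_s": (90.0, 4.0)}

# Scores posted by an institution's own expert surgeons:
local_scores = {"time_s": [82, 88, 85, 91], "blood_loss_ml": [9, 11, 10, 12]}

def benchmark_from_scores(scores):
    """Convert raw expert scores into (mean, SD) benchmark entries."""
    return {name: (mean(vals), stdev(vals)) for name, vals in scores.items()}

# Institutional values override the published ones where both exist.
benchmark = {**published, **benchmark_from_scores(local_scores)}
print(benchmark)
```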

Overall, both the classic scoring system and the proficiency-based scoring system are helping surgeons improve their performance, which is a good thing, though it will probably take someone longer to pass a proficiency-based curriculum than a percentage-based one. In some instances this data is being used as part of annual certification programs, but that will be the subject of another blog post, another day.