Automated operative workflow analysis of endoscopic pituitary surgery using machine learning: development and preclinical evaluation (IDEAL stage 0)

OBJECTIVE Surgical workflow analysis involves systematically breaking down operations into key phases and steps. Automatic analysis of this workflow has potential uses for surgical training, preoperative planning


Machine learning (ML), a subdomain of artificial intelligence (AI), has already revolutionized many industries and has the potential to disrupt medicine and surgery.1 There has been rapid growth in efforts to use ML models to interpret medical data, including natural language documentation and diagnostics.2,3 Of particular significance to surgeons is the potential of ML to interpret videos of events that occur during operations. With advancements in computational power, it will become possible to apply ML to surgery in real time.3 An initial step in this process is training ML systems to recognize and analyze the critical components of surgery.
An established method for this is "operative workflow analysis": systematically deconstructing operations into steps and phases.4 A step refers to the completion of a named surgical objective (e.g., hemostasis), whereas a phase represents a major surgical event that is composed of a series of steps (e.g., closure).4 During each step, certain surgical instruments (e.g., forceps) are used to achieve a specific objective, and there is the potential for technical error (lapses in operative technique), which may result in adverse events.4 ML-based recognition of these elements would thus allow surgical workflow analysis to be generated automatically and accurately.7-9 By integration with the wider surgical team (such as nursing staff and anesthesiologists), these ML systems may aid orchestration of the team to a common workflow, improving efficiency and resource management.6 Additionally, this complements the potential for real-time intraoperative ML guidance for surgeons and facilitates progression through the surgical steps, potentially reducing operative times and errors.10,11 Low-volume surgeries or those with steeper learning curves may especially benefit from augmented training, assessment, simulation, and intraoperative guidance.14 The endoscopic transsphenoidal approach (eTSA) to resection of pituitary adenomas is an exemplar: it is performed at tertiary-level centers, at comparatively low volume, and with a steep learning curve. It is therefore an ideal application of ML-based operative workflow analysis and would represent, to the best of our knowledge, the first neurosurgical operation analyzed in this way. Crucial to the safe integration of such technology is structured and iterative development, best captured by the Idea, Development, Exploration, Assessment, Long-term study (IDEAL) stages, beginning at the preclinical stage 0.16,17
In this IDEAL stage 0 study, we sought to use Touch Surgery for the development and evaluation of ML-powered analysis of the phases and steps of eTSA pituitary surgery.

Methods
This paper was prepared using multiple reporting guidelines, given that no single guideline yet comprehensively captures this preclinical stage of ML technology development.18,19

Study Design
A preclinical development and evaluation (IDEAL stage 0) design was adopted.16,17 The study was based at a tertiary neurosurgical center (National Hospital for Neurology and Neurosurgery, London), which acts as a regional referral center for pituitary tumors and performs approximately 150-200 pituitary operations each year.

Data Collection
A library of anonymized operative videos of the eTSA for pituitary adenoma was used for ML model development. Videos were collected from surgical cases treated between August 8, 2018, and October 11, 2020. Cases were included if an operative video was available that was complete or near complete (missing steps but not missing an entire phase). Cases using microscopic surgery, and revision surgeries in which the primary surgery was performed within 6 months, were excluded. For each case, patient (age, biological sex, tumor type) and operative (operative time) characteristics were recorded. Informed written patient consent was obtained and the project was registered with the local governance committee.
All eTSA for pituitary adenoma operations performed at our center are single specialty (neurosurgery), performed by an attending surgeon and a subspecialty fellow. The majority of cases are performed using a mononostril technique with an endoscope holder during the sellar phase of the operation. Operative videos were recorded using a high-definition endoscope (Hopkins Telescope with AIDA storage system; Karl Storz Endoscopy). Videos were exported (MPV format) onto an encrypted hard drive and uploaded to Touch Surgery (Medtronic, Inc.; https://www.touchsurgery.com/professional), web-based software for surgical video storage and ML-derived surgical analytics.
In total, 50 videos were collected, allowing an 80/20 split: 40 videos were used for training and 10 for testing and evaluation (selected via a random number table). An 80/20 split between training and testing sets has generally been adopted within the ML literature for this sample size, with a minimum of 25 videos considered sufficient for model training and 6 for testing.12,20
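The random partition described above can be sketched as follows. This is an illustrative assumption rather than the study's actual procedure: the `case_…` identifiers and the seeded shuffle stand in for the anonymized videos and the random number table.

```python
import random

def train_test_split(video_ids, test_size=10, seed=42):
    """Randomly partition video IDs into training and testing sets,
    mirroring the study's 40/10 (80/20) split. The seed is an
    assumption, used here only for reproducibility."""
    rng = random.Random(seed)
    shuffled = video_ids[:]          # copy so the input list is untouched
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

videos = [f"case_{i:02d}" for i in range(1, 51)]  # 50 collected videos
train, test = train_test_split(videos)
print(len(train), len(test))  # 40 10
```

Shuffling once and slicing guarantees the two sets are disjoint and together cover all 50 videos.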

Data Labeling
A workflow of 3 surgical phases and 7 constituent steps (Table 1) was generated through literature review and local expert surgeon consensus (N.L.D., H.J.M.). Operative video labeling of steps and phases was performed using Touch Surgery by at least 2 authors in duplicate (D.Z.K., H.J.M.), with differences settled through discussion and mutual agreement. "Steps" were defined as a sequence of activities used to achieve a surgical objective, and "phases" as a major event occurring during a surgical procedure, composed of several steps.4 Through labeling of step and phase time stamps, manual video segmentation was achieved. Of note, videos were not submitted to formal analysis if a large portion (an entire phase or more) of the recording was missing.
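Conceptually, the manual segmentation above amounts to expanding labeled time stamps into a per-second sequence of labels. A minimal sketch, in which the step names and times are hypothetical and the 1-frame-per-second rate matches the frame extraction described in the Methods:

```python
def frames_to_labels(annotations, duration_s):
    """Expand (start_s, end_s, label) time-stamp annotations into one
    label per 1-fps frame; seconds outside any annotation are marked
    'unlabeled'. Later annotations overwrite earlier ones on overlap."""
    labels = ["unlabeled"] * duration_s
    for start, end, label in annotations:
        for t in range(start, min(end, duration_s)):
            labels[t] = label
    return labels

# Hypothetical time stamps for a 10-second clip
ann = [(0, 4, "step1_nasal_corridor"), (4, 10, "step2")]
print(frames_to_labels(ann, 10))
```

The resulting per-frame labels serve as the ground truth targets used during model training and evaluation.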

Model Development and Evaluation
The training video set (n = 40) was analyzed by Touch Surgery to develop an ML model capable of recognizing the phases and steps of the procedure. To develop this model, frames were extracted from each of the 40 videos at a constant frame rate (1 frame per second) and associated with a label indicating the phase and step to which they belonged according to the expert annotations. Using these frames as visual input and the associated labels as the ground truth targets, a 2-stage training pipeline was introduced in which convolutional neural network (CNN) models were first pretrained to recognize steps and phases from a short temporal window (1-5 frames). From a computer vision perspective, due to the ambiguity of the different anatomical and instrument landmarks visible in the dynamic field of view during different steps, a single frame or short sequence may not carry sufficient information for correct classification.12,21 To compensate for this, once the networks were pretrained, a recurrent neural network (RNN) was trained to improve the temporal resolution and consistency of the predictions (Fig. 1).12,20 The accuracy of the final model was then evaluated using the testing video set (n = 10), comparing the step and phase recognition of the model to the phase or step labels assigned by expert surgeons. The evaluation metrics used were accuracy, precision, recall, and F1 score. Accuracy was calculated as the average per-class correct classification ratio of all the frames for each class. F1 score, the harmonic mean between precision (or positive predictive value) and recall (or sensitivity), was calculated per class (as defined in Eq. 1) and then averaged across classes. We considered an accuracy of ≥ 90% for phase recognition and ≥ 70% for step recognition as sufficient prior to progression to prospective, real-time, first-in-human studies (IDEAL stage 1).12,21

Equations
F1 score was calculated per class as the harmonic mean of precision and recall:

F1 = 2 × (precision × recall) / (precision + recall)    (Eq. 1)
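A minimal sketch of how the macro-averaged F1 score defined in the Methods can be computed from per-frame ground-truth and predicted labels. The class names and example sequences are illustrative, not drawn from the study's data:

```python
def macro_f1(y_true, y_pred):
    """Per-class precision, recall, and F1 (harmonic mean of the two),
    macro-averaged across all classes observed in either sequence."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Hypothetical per-frame phase labels for a short clip
y_true = ["nasal", "nasal", "sellar", "sellar", "closure"]
y_pred = ["nasal", "sellar", "sellar", "sellar", "closure"]
print(round(macro_f1(y_true, y_pred), 3))
```

Macro-averaging weights every phase or step equally, so rare, short steps count as much as long ones, which is why it is reported alongside frame-level accuracy.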

General Characteristics
A total of 50 cases (49 patients) of eTSA for pituitary surgery were included in the final analysis. The median age of included patients was 52 years (IQR 41-68 years), with a 25:24 male/female ratio. All cases were considered pituitary adenomas at the time of resection, with most considered macroadenomas on radiological assessment (46/50, 92%). Histological analysis confirmed pituitary adenomas in the majority of included cases (48/50, 96%); other pathologies included lymphocytic hypophysitis (1/50, 2%) and chronic lymphocytic leukemia (1/50, 2%). Forty-seven cases were primary surgeries and 3 were revision surgeries.
Figure 3 highlights variations in the temporal relationship of steps: 50% of videos contained all 7 steps performed sequentially in numerical order, and the other 50% of operations either 1) did not contain all 7 steps or 2) contained all steps but in nonnumerical order. For example, 20% of videos did not include a formal step 7 (closure), either reflecting heterogeneity in practice among local surgeons in certain clinical contexts or being due to incomplete video data. The steps repeated most frequently were steps 3 (posterior septectomy and removal of sphenoid septations, 7 times); 4 (anterior sellar wall removal, 7 times); and 5 (tumor identification and excision, 2 times). However, steps 1 (nasal corridor creation) and 7 (skull base reconstruction) were never repeated.
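The step-order statistics above can be derived from each video's labeled step sequence. A sketch under the assumption that consecutive identical labels collapse into a single step occurrence; the example sequence is hypothetical:

```python
def analyze_step_order(step_sequence):
    """Collapse consecutive duplicate step labels, then report whether
    the video contains all 7 steps exactly once in numerical order, and
    count repeats (a step re-entered after it was left)."""
    collapsed = [s for i, s in enumerate(step_sequence)
                 if i == 0 or s != step_sequence[i - 1]]
    sequential = (collapsed == sorted(set(collapsed))
                  and set(collapsed) == set(range(1, 8)))
    repeats = {s: collapsed.count(s) - 1
               for s in set(collapsed) if collapsed.count(s) > 1}
    return sequential, repeats

# Hypothetical workflow in which steps 3 and 4 are each revisited once
print(analyze_step_order([1, 2, 3, 4, 3, 4, 5, 6, 7]))
```

Running this over every labeled video would reproduce the counts summarized in Figure 3 (fraction of fully sequential videos, and which steps were repeated).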

Model Performance
During development of the ML model, different approaches were tested iteratively until the final version for this study was achieved. The stand-alone CNN achieved accuracies of 80% for phase recognition and 65% for step recognition. The addition of the RNN improved accuracies to 86% and 73% for phase and step recognition, respectively. Final postprocessing improvements further boosted performance to 91% and 76% for phase and step recognition, respectively. Final model evaluation metrics are displayed in Table 2. F1 scores were similar, with phase recognition of 90% and step recognition of 75%. Video 1 displays the model's predictions during an illustrative operation.
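The postprocessing stage is consistent with simple temporal smoothing of the frame-level predictions; the exact method used by Touch Surgery is not specified, so the sliding-window majority vote below is an illustrative assumption:

```python
from collections import Counter

def smooth_predictions(preds, window=5):
    """Replace each frame's predicted label with the majority label in a
    centered window (truncated at the sequence edges), suppressing
    isolated single-frame misclassifications."""
    half = window // 2
    smoothed = []
    for i in range(len(preds)):
        neighborhood = preds[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed

# A lone "B" at index 2 is inconsistent with its neighbors and is removed
noisy = ["A", "A", "B", "A", "A", "A", "B", "B", "B"]
print(smooth_predictions(noisy))
```

Smoothing of this kind trades a small amount of temporal resolution at phase boundaries for more consistent predictions within each phase, matching the reported accuracy gain from postprocessing.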

Principal Findings
We have demonstrated that Touch Surgery developed and evaluated an accurate and automated ML model for surgical workflow recognition, capable of detecting the phases and steps of the eTSA for resection of pituitary adenomas; to our knowledge, this makes it the first study of its kind in neurosurgery.22 We found that despite variations within our own practice, Touch Surgery generated an ML model capable of accurate phase and step recognition.
This is particularly evident during training programs: the majority of US residents have performed fewer than 10 pituitary surgeries during training, thus requiring dedicated fellowships to gain the necessary skills and competency for these operations.33 As residency programs move to competency-based frameworks and pituitary services are consolidated into centers of excellence, structured training and objective assessment of pituitary surgery are increasingly relevant.32,34 Automated operative workflow analysis may meet these educational and training demands.9,14,21 For example, automatic indexing of contemporary operative videos may supplement teaching of residents and fellows, and may facilitate personal reflection on particular aspects of surgical performance (e.g., a technically challenging step).21 Similarly, deconstructing videos into critical operative steps may facilitate comparative analysis of the surgical performance of individual surgeons of various grades, examining step order and durations.14 Building on this, ML models have been used to analyze operative step performance, allowing assessment of operation-specific competency in a structured, objective, and personalized way.14,35,36 Moreover, the integration of ML-based operative step analysis into the live operating room environment may improve surgical team efficiency and resource management. For example, through orientation to the current proceedings (e.g., phase and step) and anticipation of the next necessary instruments, the entire surgical team is orchestrated to a common workflow and prepared for the immediate next steps.37 This would require the integration of ML systems into the workflows of the wider surgical team (e.g., scrub technicians and anesthesiologists), a concept that has been found to be generally acceptable to team members.38
Operations performed in such a "smart" operating room may therefore be shorter and more economical (e.g., unnecessary instruments not opened or used).6 Furthermore, in this era of personalized medicine, we are moving toward data-driven analysis of the entire individual patient pathway. Combining intraoperative phase and step recognition with other ML-based technologies has numerous potential uses, including administrative tasks, patient selection, and outcome prediction. For example, after automated deconstruction of an operation into its constituent components, natural language processing techniques may be used for automatic generation of operative notes, which are otherwise generally template-based and may omit up to 50% of essential steps and events.2,39 Similarly, incorporating preoperative variables (e.g., automated imaging analysis) into ML models alongside intraoperative events may highlight characteristics (e.g., age or tumor morphology) that are predictive of successful outcomes or complications. Such information could be used to aid patient selection, preoperative planning, and tailored informed consent, adapted to individual patient factors.22 Finally, correlating intraoperative video data with postoperative outcomes may allow prediction of outcomes and exploration of the maneuvers, instruments, or materials linked to superior or inferior outcomes. Not only would this allow targeted refinement of operative techniques, but it may also support early identification of errors and form part of early-warning systems for potential adverse events.40,41 Ultimately, ML-based surgical workflow analysis, through its uses in education, training, intraoperative guidance, and patient pathway integration, aims to make surgery safer, more effective, and even more individualized.21,37

Findings in the Context of the Literature
Surgical phase recognition through ML approaches is a growing field, most prevalent in general surgery (e.g., laparoscopic cholecystectomy) and ophthalmological surgery (e.g., cataract surgery).21 There have been no prior reports of the use of this technology in neurosurgery,21,22 although surveys of neurosurgeons and neurosurgical patients indicate receptiveness to AI being integrated into operating rooms.38,42 Multiple ML and statistical models have been explored, including dynamic time warping, hidden Markov models, support vector machines, and deep neural networks (DNNs). DNNs (particularly CNNs and RNNs) are used most commonly and, although they require more data and higher computational power, they have displayed increased phase recognition accuracy in multiple studies across surgical specialties. Indeed, a recent study comparing multiple ML models for cataract surgery phase recognition found that a combined CNN-RNN configuration with temporal modeling achieved accuracies > 90%.14 Touch Surgery's ML model is a similar configuration and displayed accuracies of 93% and 73% for phase and step recognition, respectively, in cataract surgery workflow analysis prior to its application in pituitary surgery.12 Similar accuracies have been observed in laparoscopic sleeve gastrectomy, with up to 86% accuracy for operative step detection using a CNN-RNN temporal model.11 Additionally, accurate phase recognition has been achieved for peroral endoscopic myotomy (87.6% accuracy)43 but has proven more difficult in laparoscopic proctocolectomy (67% accuracy), where the operative steps are less standardized and more complex.

Limitations and Strengths
This study has validated the first ML model that is capable of analyzing neurosurgery videos. However, it has several limitations: 1) the included sample is small and highly selected (single-center, nonconsecutive, endoscopic endonasal technique); 2) the model may be overfitted to local surgical practice; and 3) prospective validation has yet to be performed and external validity has yet to be determined. The labeled steps and phases were based on local consensus, and therefore further work is needed to create a standardized and generalizable workflow framework for this procedure. Steps at present may contain multiple substeps, which are likely to require a larger sample size before the model can accurately delineate them. Looking ahead, progression through the IDEAL stages will facilitate further development of this technology. In particular, analysis of multicenter and prospective data will increase the predictive potential and generalizability of the ML software, facilitating its integration into "smart" operating theaters with real-time operation analysis.41 Refinement of an ML model at this scale may facilitate transfer learning, such that the trained algorithm is adapted to other surgeries after a period of adjustment, without requiring total de novo model creation and potentially requiring less data to achieve accurate phase and step recognition.

Conclusions
In this IDEAL stage 0 study, ML techniques were used to automatically analyze operative videos of the eTSA for pituitary surgery. Using a combined CNN and RNN model, Touch Surgery achieved phase and step recognition accuracies of 91% and 76%, respectively. ML-based surgical workflow analysis has numerous potential uses, such as education (e.g., automatic indexing of contemporary operative videos for teaching); improved operative efficiency (e.g., orchestrating the entire surgical team to a common workflow); and improved patient outcomes (e.g., comparison of surgical techniques or early detection of adverse events). This technology has previously been shown to be acceptable to neurosurgical teams and patients. Future directions include the real-time integration of Touch Surgery into the live operative environment as an IDEAL stage 1 (first-in-human) study, and further development of the underpinning ML models using larger data sets.

VIDEO 1. Operative video displaying Touch Surgery step and phase predictions. Steps are shown in the upper left corner and phases in the upper right. "GT" in yellow represents the "ground truth" steps or phases labeled by experts. "Pred" represents the algorithm's prediction of steps and phases. The prediction text is green when correct (aligning with the ground truth) and red when incorrect. For each time point, a prediction certainty percentage is presented. Copyright 2021 Medtronic. All rights reserved. Used with the permission of Medtronic. Click here to view.

FIG. 1. Overview of operative video processing and analysis. Figure is available in color online only.

FIG. 3. Step variations in operative video workflow. Copyright 2021 Medtronic. All rights reserved. Used with the permission of Medtronic. Figure is available in color online only.

FIG. 2. Average and range of time per step. Copyright 2021 Medtronic. All rights reserved. Used with the permission of Medtronic. Figure is available in color online only.