Analyzing Stress in Doctors during COVID-19: A Machine Learning Approach

Research Article

  • Maddury Jyotsna 1*
  • G. Murali Rao 2
  • Lalita Nemani 1
  • Naveen Kumar 1
  • Achukatla Kumar 3

1Department of Cardiology, NIMS, Hyderabad, India.

2Department of Statistics, Indian Institute of Statistics, India.

3Consultant (Scientific), ICMR, Port Blair, Andaman & Nicobar Islands.

*Corresponding Author: Maddury Jyotsna, Department of Cardiology, NIMS, Hyderabad, India.

Citation: M Jyotsna, Murali R, L Nemani, N Kumar, A Kumar. (2022). Analyzing Stress in Doctors during COVID-19: A Machine Learning Approach. Journal of Clinical Cardiology and Cardiology Research, BRS Publishers. 1(1); DOI: 10.59657/2837-4673.brs.22.002

Copyright: © 2022 Maddury Jyotsna, this is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Received: August 11, 2022 | Accepted: September 05, 2022 | Published: September 10, 2022

Abstract

Doctors are under stress in their professional lives because another person's life depends on their management of disease, and more so at present with increasing violence by patient attendants in India. On top of that, the COVID pandemic has worsened the stress on doctors across the world. The objective of the present study is to examine the stress factors among doctors during the COVID-19 era and the influence of these factors using a machine learning approach. We adopted a machine learning approach to detect the severity of stress so that proper advice can be given to these doctors before they face its serious consequences. The machine learning algorithms, a component of artificial intelligence, were applied to responses from 1069 doctors and identified anxiety, fear to treat, being afraid to go home (fear of infecting others, especially elders/children), age, and experience as the significant variables of COVID stress among doctors.


Keywords: covid-19; doctors; stress; machine learning; artificial intelligence

Introduction

Stress in medical practice arises mainly from taking care of other people's lives. Peculiar to the COVID era, there is the additional stress of protecting oneself and one's family. This stress continues even after the introduction of efficient vaccines for COVID [1], owing to the emergence of newer variants (such as SARS-CoV-2 VUI 202012/01 in the UK (Variant Under Investigation, year 2020, month 12, variant 01) and variant 501Y.V2 in South Africa, named for its N501Y mutation; many such variants are arising even now) [2,3,4,5]. Stress, whether acute or chronic, persists among treating doctors in 2021.

As per the WHO definition of health, everyone requires both physical and mental health [6]. Known stress-related disorders include heart disease, asthma, obesity, diabetes, headache, depression and anxiety, gastrointestinal problems, and Alzheimer's disease [7]. Of these, heart disease is a significant and disabling consequence of stress, and the other stress disorders can also lead to heart disease. Mortality in the medical fraternity due to COVID-19 infection is already outside an acceptable range [8]; at the least, we have to take measures to decrease the morbidity due to COVID-19 stress. It is therefore crucial and appropriate to detect doctors' stress during this pandemic.

Stress can be assessed by subjective means using mental scores or by objective means such as speech [9], electroencephalogram (EEG) [10], or data analysis using machine learning (ML) [11]. The objective of the present study is to examine the stress factors among doctors during the COVID-19 era and the influence of these factors using a machine learning approach.

Materials and Methods

This was a cross-sectional study in which an online questionnaire of 13 questions describing the different components of stress was administered to doctors during the COVID-19 era. The study was started after obtaining Institutional Ethics Committee approval and was registered with CTRI (CTRI no: CTRI/2020/09/027907). Given the pandemic, the questionnaire was formatted using Google documents and forwarded to all participants along with a brief introduction to the study. They were requested to click on the link provided and submit their responses only if they were willing to participate. All post-MBBS doctors willing to answer the questionnaire, irrespective of whether they were treating COVID cases, were included in the study. Doctors who were unwilling to participate were excluded. Responses to the questionnaire were collected, stored electronically, and analyzed. Risk was categorized into six categories (later condensed to three), depending on the department in which the doctors were working and whether or not they were treating COVID cases.

Data Analysis

The survey data were subjected to basic pre-processing, and the responses of 1069 participating doctors were considered for further analysis. Initially, graphical analysis (visualization) was performed to develop a preliminary understanding of the doctors' different attributes in the COVID-19 era. An attempt was made to obtain responses from as many doctors as possible, and in pandemic times a sample of over 1000 was considered reasonable for analysis and modeling. Based on the inferences from the graphical analysis, further detailed statistical analyses were performed. Descriptive analytics were carried out on the survey data to study the behavior of the variables and the associations among them, especially with COVID-19 Stress, using statistical methods such as histograms, ANOVA, and chi-square tests.

Finally, to predict the state of COVID-19 Stress (Extreme, Moderate, or No) among doctors in the COVID-19 era, different machine learning algorithms were tried on the survey database, with "COVID-19 Stress" as the target variable and the remaining 17 variables (age, gender, experience, etc.) as predictors. Five applicable machine learning models (Logistic Regression, Decision Tree, Random Forest, Naïve Bayes, and Adaptive Boosting) were identified and tried out so that the best among them could be selected for predicting COVID-19 Stress. Five model performance parameters (accuracy, precision, recall, F1 score (the harmonic mean of precision and recall), and area under the curve (AUC)) were used to compare these models and select the best for future prediction of COVID-19 Stress.
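The comparison measures named above can be illustrated with a minimal sketch that computes accuracy and per-class precision, recall, and F1 for the three stress classes. The function name `classification_metrics` and the toy labels are hypothetical and are not study data; how the study averaged the per-class values is not stated, and AUC is omitted because it requires predicted probabilities rather than class labels.

```python
from collections import Counter

LABELS = ["Extreme", "Moderate", "No"]


def classification_metrics(actual, predicted, labels=LABELS):
    """Return overall accuracy and per-class precision, recall and F1."""
    pairs = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    accuracy = sum(pairs[(c, c)] for c in labels) / len(actual)
    per_class = {}
    for c in labels:
        true_pos = pairs[(c, c)]
        predicted_as_c = sum(pairs[(a, c)] for a in labels)
        actually_c = sum(pairs[(c, p)] for p in labels)
        precision = true_pos / predicted_as_c if predicted_as_c else 0.0
        recall = true_pos / actually_c if actually_c else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


# Hypothetical toy labels, for illustration only (not study data).
actual = ["Extreme", "Extreme", "Moderate", "Moderate", "No", "Extreme"]
predicted = ["Extreme", "Moderate", "Moderate", "Moderate", "No", "Extreme"]
accuracy, per_class = classification_metrics(actual, predicted)
print(f"accuracy = {accuracy:.3f}")
for label, scores in per_class.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Precision penalizes false alarms for a class, whereas recall penalizes missed cases; for a screening tool aimed at early destressing, both matter, which is why the F1 score is reported alongside them.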

K-fold (k=10) validation was used before the final validation: 20% of the 969 observations (individuals) were randomly selected 10 times, and the models were validated accordingly. The final validation with 100 data points was intended to give additional confidence in the model. A sample of 100 data points (individuals), roughly 10% of the data, was considered more than adequate for this purpose.

The primary objective of developing the machine learning model and the subsequent artificial intelligence tool is to predict the level of stress in a doctor so that adequate precautions to destress can be taken in advance. Accuracy and precision were therefore considered the essential measures for selecting the final prediction model. Before the five models were developed, 100 (~10%) data points were randomly selected and set aside for testing the final model selected as the best for prediction. The remaining 969 of the 1069 data points were used for developing the models. The K-fold (k=10) cross-validation method was used to validate each of the five models and arrive at the model performance measures.
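The data partitioning described above can be sketched as follows: 100 of the 1069 records are held out at random, and the remaining 969 are divided into 10 folds. This is a minimal illustration of standard 10-fold partitioning, in which each fold holds roughly 10% of the 969 records; it is not the study's actual procedure, and the function name `holdout_and_folds` and the random seed are assumptions.

```python
import random


def holdout_and_folds(n_total=1069, n_holdout=100, k=10, seed=0):
    """Split record indices into a final test set and k validation folds."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    indices = list(range(n_total))
    rng.shuffle(indices)
    holdout = indices[:n_holdout]       # kept aside for final testing
    development = indices[n_holdout:]   # used for model development
    # Deal the development indices into k folds of near-equal size.
    folds = [development[i::k] for i in range(k)]
    return holdout, development, folds


holdout, development, folds = holdout_and_folds()
print(len(holdout), len(development), [len(fold) for fold in folds])

for i, validation_fold in enumerate(folds):
    training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # A model would be fitted on `training` and scored on `validation_fold`;
    # the k scores are then averaged to give the performance measure.
```

Keeping the 100 held-out records entirely separate from the cross-validation loop means that the final test is made on data that played no part in model selection.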

The statistical analysis was performed on Minitab software, and Machine Learning Models were developed using Python and Orange open-source machine learning software.

Results

Out of 1069 participants, 558 (58.2%) doctors were involved directly in managing COVID patients, and 414 (38.73%) were female doctors. In all, 898 (84%) of the doctors experienced either high or medium-level stress.

Young male and female doctors were the ones mostly involved in COVID duties. Most of them reported moderate levels of sleep problems, stress, fear to treat, fear of going home, anxiety, depression, and anger. They also faced neighborhood issues and a lack of social recognition, leading to low motivation, and were only partially able to convince their families.

Sixteen different categorical attributes/parameters (variables), covering both professional and personal attributes, were analyzed using chi-square statistics and p-values (Fig 1) and represented in a chi-square matrix (105 pairs).

Figure 1: Chi Square Test for Association between different Attributes/Parameters

Except for eight pairs (mostly involving gender), the p-values of all tests were close to 0, showing a significant association between the attributes irrespective of their professional or personal nature. Also, "COVID-19 Stress," the study's primary response variable, is associated with all the remaining attributes.
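As an illustration of the pairwise test of association, the following minimal sketch computes the chi-square statistic for a single contingency table. The counts are hypothetical (they are not study data) and cross a two-level attribute with the three stress levels; the function name `chi_square_statistic` is likewise an assumption. The closed-form p-value used here holds only for 2 degrees of freedom.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (not study data).
# Rows: treating COVID patients (yes, no); columns: Extreme, Moderate, No.
table = [
    [20, 20, 10],
    [10, 20, 20],
]
statistic, dof = chi_square_statistic(table)
# For exactly 2 degrees of freedom the upper-tail p-value is exp(-x / 2).
p_value = math.exp(-statistic / 2) if dof == 2 else None
print(f"chi-square = {statistic:.3f}, dof = {dof}, p = {p_value}")
```

A statistic above 5.991, the 5% critical value for 2 degrees of freedom, would be read as a significant association between the two attributes.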

Machine Learning Approach

Machine learning was used to predict the stress state. The survey data comprised 18 variables [2 continuous variables (age, experience) and 16 categorical variables]. The target variable in the ML approach was "COVID-19 Stress," which was classified into three classes: "Extreme," "Moderate," and "No." The five model comparison measures are presented in Table 1.

Table 1: Comparison of Machine learning models:

Model                 AUC      Accuracy   F1       Precision   Recall
Random Forest         89.1%    83.9%      69.8%    76.8%       64.0%
Logistic Regression   88.1%    82.9%      67.4%    75.6%       60.8%
Naïve Bayes           87.5%    80.9%      69.3%    65.2%       74.0%
AdaBoost              85.8%    83.0%      68.8%    73.6%       64.6%
Decision Tree         77.4%    80.5%      64.7%    68.5%       61.4%

The Random Forest model was the best of the five models tried. For ML modeling based on a survey, and mostly on perceptions and experience, nearly 84% accuracy and about 77% precision can be considered satisfactory from an ML and AI perspective. The other model performance parameters, F1, recall, and area under the curve (AUC), were above fair values. The prediction parameters of the Logistic Regression model were close to those of the Random Forest model. However, given the large number of categorical variables and the strong associations among them, and because Random Forest is a versatile ensemble algorithm capable of minimizing error, Random Forest was chosen as the final machine learning model for predicting COVID-19 Stress and subsequently for developing an artificial intelligence tool.
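The F1 values in Table 1 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below uses the figures transcribed from Table 1; the names `TABLE_1` and `f1_from` are introduced here for illustration only.

```python
# (precision %, recall %, reported F1 %) transcribed from Table 1.
TABLE_1 = {
    "Random Forest": (76.8, 64.0, 69.8),
    "Logistic Regression": (75.6, 60.8, 67.4),
    "Naive Bayes": (65.2, 74.0, 69.3),
    "AdaBoost": (73.6, 64.6, 68.8),
    "Decision Tree": (68.5, 61.4, 64.7),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


differences = {}
for model, (precision, recall, reported_f1) in TABLE_1.items():
    computed_f1 = f1_from(precision, recall)
    differences[model] = abs(computed_f1 - reported_f1)
    print(f"{model}: computed F1 = {computed_f1:.2f}%, reported = {reported_f1}%")
```

The computed and reported values agree to within rounding of the published figures, which is consistent with F1 having been calculated as a harmonic mean.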

An attempt was made to identify the essential variables (feature importance) for the two models discussed; the diagrams in Fig 2 show each model's variable importance.

Figure 2: Comparison of Feature Importance score of two models

In the Random Forest graph, the two continuous variables, age and experience, played a significant role, as in the earlier analysis. Also, anxiety, fear to treat, anger and frustration, and being afraid to go home were critical in inducing stress in the doctors.

The random 100 responses held back for final model testing were used as new responses. The Random Forest model was used to predict the COVID -19 Stress levels (Extreme / Moderate / No) and compared with the actual responses. The model predictions on the final test data set are presented below in a confusion matrix form (Table 2).

Table 2: Stress levels data sets represented in Confusion Matrix form

Confusion Matrix         Predicted
Actual        Extreme    Moderate    No     Total
Extreme       49         7           0      56
Moderate      9          33          0      42
No            0          2           0      2
Total         58         42          0      100

The overall model accuracy was about 82%. The model correctly identified extreme COVID-19 Stress to a considerable extent, in 49 out of 58 cases (~85%); the remaining cases were classified as moderate stress (extreme as moderate). No case of extreme stress was wrongly classified as "No" stress. Out of 42 actual moderate stress cases, 33 (~79%) were correctly classified, seven were classified as extreme, and only two were classified as "No." If both Extreme and Moderate are treated as candidates for a destressing intervention, the model is 98% accurate in identifying a doctor's stress condition from the 18-odd associated personal and professional parameters/attributes.
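The two headline figures can be reproduced from the counts in Table 2 with a short calculation. This is a minimal sketch; the variable names are introduced here for illustration. Both results are unchanged if the rows and columns of the matrix are swapped.

```python
LABELS = ["Extreme", "Moderate", "No"]

# Counts transcribed from Table 2; rows and columns follow LABELS order.
MATRIX = [
    [49, 7, 0],
    [9, 33, 0],
    [0, 2, 0],
]

total = sum(sum(row) for row in MATRIX)
correct = sum(MATRIX[i][i] for i in range(len(LABELS)))
overall_accuracy = correct / total

# Merge Extreme and Moderate into one "needs intervention" group and count
# the cases where the actual and predicted groups agree.
needs_intervention = [LABELS.index("Extreme"), LABELS.index("Moderate")]
no_stress = LABELS.index("No")
agreeing = sum(
    MATRIX[i][j] for i in needs_intervention for j in needs_intervention
) + MATRIX[no_stress][no_stress]
merged_accuracy = agreeing / total

print(f"overall accuracy = {overall_accuracy:.2f}")
print(f"accuracy with Extreme and Moderate merged = {merged_accuracy:.2f}")
```

Merging the two classes raises the apparent accuracy because confusions between Extreme and Moderate, which account for 16 of the 18 errors, no longer count as mistakes.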

Discussion

Experience and research on past epidemics have highlighted that epidemics can have a psychological effect not only on the general population but also on healthcare workers (12). Doctors are the prime personnel involved in patient treatment and care during an epidemic. The present study supports the fact that doctors experience various degrees of stress, anxiety, depression, and insomnia due to the ongoing COVID-19 pandemic. Different levels of stress were reported by 94.4%, anxiety by 75.2%, depression by 68.3%, and insomnia by 66.3%. The study by Lai et al. (13) from 34 hospitals in China showed that three-fourths of healthcare workers (HCW) were in distress; nearly 50% reported symptoms of depression, two-fifths reported anxiety, and one-third complained of insomnia.

A review (1) of six research articles on COVID (1 from India and 5 from China) showed the mean age of the study populations to be 26-40 years, which is supported by the age distribution in our study: 30-40 years constituted nearly 43% and 20-30 years constituted 31%. Our study showed that the doctors treating COVID patients were relatively younger than the doctors in the non-COVID group. Young doctors were as stressed as older doctors in our study, as shown previously by Liang et al. (14). Men predominated in our study, unlike in previous studies. Male gender was significantly associated with moderate and severe stress (p=0.014), in contrast to the association with female gender shown by Lai et al. (13).

In previous studies, colleagues' safety and the lack of treatment for COVID-19 were perceived as universal stress factors among all medical staff. The most important triggers for stress were personal safety, concern for family, and patient death. In our study, being afraid to treat COVID patients (87.4%), worry about parents or children (71.1%), being afraid to go home after finishing duty (84.8%), and working in very high-risk areas were the triggers for stress. Previous studies have reported a higher risk of psychological impact in healthcare workers working in emergency units, intensive care, and infectious disease wards. Liang et al. (14) showed no difference in self-rated anxiety and depression scores between COVID-associated departments and other departments. The present study highlights that stress was greater in doctors directly involved in treating COVID patients. All parameters (stress, anxiety, frustration, depression, and insomnia) were elevated in both COVID-treating and non-treating doctors, although more so in treating doctors. The stress in non-treating doctors may be due to uncertainty about the COVID status of the patients they attend.

According to Mohindra et al. (15), most doctors (85%) were afraid to go home after duty for fear of infecting their family members. Only 50% could allay their families' fears about their working in an unsafe environment, and then only partially. The majority (95%) favored a period for the doctors.

In the present study, supervised machine learning algorithms were used to explain COVID stress among doctors. The Random Forest model was the best, with 89% AUC, 84% accuracy, about 77% precision, and 64% recall (F1 of about 70%) in predicting stress. The performance of logistic regression was close to that of the Random Forest model. According to the Random Forest model, the critical variables contributing to COVID stress were anxiety, fear to treat, and being afraid to go home (fear of infecting others, especially elders/children). Age and experience also contributed significantly to COVID stress.

Reshma et al. (16) used the TensiStrength technique to detect stress expressed on Twitter. TensiStrength is a system for detecting the strength of stress and relaxation expressed in social media text messages. It uses a lexical approach and rules to detect direct and indirect expressions of stress or relaxation (9). Ravindra Ahuja et al. (17) used four classification algorithms (Linear Regression, Naïve Bayes, Random Forest, and SVM) to detect examination stress in students, with sensitivity, specificity, and accuracy as performance parameters (10). S.M. Chaware (10) noted that Decision Tree, Naïve Bayes, Random Forest, and similar methods give a lower accuracy of about 70% on average. So, to assess stress from Facebook posts, they used a Convolutional Neural Network (CNN) to extract Facebook posts, a Transductive Support Vector Machine (TSVM) to classify posts, and K-Nearest Neighbors (KNN) to recommend nearby hospitals. Subhani et al. (18) found high accuracy with classification algorithms. They quantified stress by recording EEG (electroencephalogram) signals and analyzing them with ML. The proposed ML framework involved EEG feature extraction, feature selection (receiver operating characteristic (ROC) curve, t-test, and the Bhattacharyya distance), classification (logistic regression (LR), support vector machine (SVM), and naïve Bayes (NB) classifiers), and 10-fold cross-validation (CV). The results showed that the proposed framework produced 94.6% accuracy for two-level identification of stress and 83.4% accuracy for multiple-level identification.

Beyond conventional ML, deep learning and artificial intelligence have also been applied by different researchers to analyze stress (9, 19-21). For detecting mental stress, these authors evaluated and compared the accuracies of three-class (amusement vs. baseline vs. stress) and binary (stress vs. non-stress) classification using machine learning techniques such as K-Nearest Neighbour, Linear Discriminant Analysis, Random Forest, Decision Tree, AdaBoost, and Kernel Support Vector Machine. In addition, a simple feed-forward deep learning artificial neural network was introduced for the three-class and binary classifications. Using machine learning techniques, accuracies of up to 81.65% and 93.20% were achieved for the three-class and binary classification problems, respectively. Using deep learning, the accuracies achieved were up to 84.32% and 95.21%, respectively.

Recent publications have highlighted not only the psychological stresses mentioned above but also changes in eating habits and confinement-related obesity (22). For pregnant women in particular, the "double burden of pregnancy" due to vertical transmission of SARS-CoV-2 has been discussed (23).

This research may help in designing an app that addresses the challenge by developing and deploying machine learning-enabled insights derived proactively from doctors' present stress condition. Such an app would infuse collaborative intelligence derived from de-identified yet relevant demographic, physiological, lifestyle, and behavioral datasets, together with preventive healthcare insights, to counter the long-term adverse effects of stress on doctors' health. At present, development of the AI tool is being attempted with a simple front end to capture the doctor's answers. In the back end, the already developed ML models (RF and LR) will be used to predict the doctor's stress level and suggest the necessary mitigation protocols. A mobile app is also being explored as a more user-friendly, adaptable, and versatile form of the AI tool.

Conclusion

Machine learning identified anxiety, fear to treat, being afraid to go home (fear of infecting others, especially elders/children), age, and experience as the significant variables of COVID stress among doctors. An ML model was developed to explain COVID stress among doctors. Its capability to predict stress with a reasonable level of accuracy is helping to develop an artificial intelligence tool to predict the stress state of a doctor who is starting to treat or is treating a patient (either COVID or general), so that preventive measures can be taken to mitigate the stress. This intelligent system will benefit doctors, patients, healthcare facilities, and society in general.

Source of Funding

None

Conflicts of Interest

The authors declare no conflict of interest.

References