{"id":135,"date":"2023-06-13T15:54:21","date_gmt":"2023-06-13T13:54:21","guid":{"rendered":"https:\/\/project.inria.fr\/stressid\/?page_id=135"},"modified":"2023-06-29T11:33:24","modified_gmt":"2023-06-29T09:33:24","slug":"baseline","status":"publish","type":"page","link":"https:\/\/project.inria.fr\/stressid\/baseline\/","title":{"rendered":"BASELINE"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The code for the baselines is available at <a href=\"https:\/\/github.com\/robustml-eurecom\/stressID\">https:\/\/github.com\/robustml-eurecom\/stressID<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Unimodal and multimodal baselines combine features extracted from video, audio, and physiological inputs. The models are trained to perform binary classification, i.e. discriminate between stressed and not stressed, as well as 3-class classification. These discrete labels are extracted from the self-assessments of the subjects.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For the <em>binary classification<\/em>, not stressed vs. stressed is predicted.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>&#8220;not stressed&#8221; (0) is given for stress &lt; 5.<\/li><li>&#8220;stressed&#8221; (1) is given for stress \u2267 5.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For the <em>3-class classification<\/em>, relaxed vs. neutral vs. 
stressed is predicted.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>&#8220;relaxed&#8221; (0) is given when valence &gt; 5, arousal &lt; 5 and relax &gt; 5.<\/li><li>&#8220;stressed&#8221; (2) is given when valence &lt; 5, arousal &gt; 5 and stress &gt; 5.<\/li><li>&#8220;neutral&#8221; (1) otherwise.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Our analysis confirms that the labels and the acquired data are coherent and meaningful<\/strong>: stress can be predicted in both binary and 3-class classification.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-white-background-color has-background has-fixed-layout\"><tbody><tr><td>Baseline (#tasks)<\/td><td>Binary stress<\/td><td>3-class stress<\/td><\/tr><tr><td>Physiological (711)<\/td><td>0.75\u00b10.04<\/td><td>0.55\u00b10.04<\/td><\/tr><tr><td>Video (587)<\/td><td>0.62\u00b10.07<\/td><td>0.48\u00b10.01<\/td><\/tr><tr><td>Audio-HC (385)<\/td><td>0.67\u00b10.04<\/td><td>0.53\u00b10.06<\/td><\/tr><tr><td>Audio-DNN (385)<\/td><td>0.72\u00b10.07<\/td><td>0.53\u00b10.09<\/td><\/tr><tr><td>Multimodal (385)<\/td><td>0.64\u00b10.07<\/td><td>0.42\u00b10.03<\/td><\/tr><\/tbody><\/table><figcaption>Weighted F1-scores of the unimodal and multimodal baselines for binary and 3-class stress identification.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"has-medium-font-size wp-block-heading\">FEATURES<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><em>Physiological Signals<\/em>.<\/strong><\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:45% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"858\" height=\"1024\" src=\"https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-858x1024.png\" alt=\"\" class=\"wp-image-331 size-full\" srcset=\"https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-858x1024.png 858w, 
https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-251x300.png 251w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-768x916.png 768w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-1287x1536.png 1287w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-1716x2048.png 1716w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/physio_web-2-126x150.png 126w\" sizes=\"auto, (max-width: 858px) 100vw, 858px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"wp-block-paragraph\"><strong>For ECG, 35 features are extracted.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These include time-domain HRV features such as the number of R-to-R (RR) intervals per minute, the standard deviation of all NN intervals (SDNN), the percentage of successive RR intervals that differ by more than 20 ms and 50 ms (pNN20 and pNN50), and the root mean square of successive RR interval differences (RMSSD), as well as frequency-domain and non-linear HRV measures.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>For EDA, 23 features are extracted.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We extract statistical features of the Skin Conductance Level (SCL) and Skin Conductance Response (SCR) components of the EDA, including the slope and dynamic range of the SCL, along with time-domain features including the number of SCR peaks per minute, the average amplitude of the peaks, and the average duration of SCR responses.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>For respiration, 40 features are extracted.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We extract Respiration Rate Variability (RRV) features in the time and frequency domains.<\/p>\n<\/div><\/div>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow 
wp-block-quote-is-layout-flow\"><p>The physiological features are extracted with the <a href=\"https:\/\/neuropsychology.github.io\/NeuroKit\/index.html\">neurokit2<\/a> package.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\"><em><strong>Video Data.<\/strong><\/em><\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:35% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"680\" src=\"https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/Schermata-2023-06-19-alle-16.18.29-1024x680.png\" alt=\"\" class=\"wp-image-340 size-full\" srcset=\"https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/Schermata-2023-06-19-alle-16.18.29-1024x680.png 1024w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/Schermata-2023-06-19-alle-16.18.29-300x199.png 300w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/Schermata-2023-06-19-alle-16.18.29-768x510.png 768w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/Schermata-2023-06-19-alle-16.18.29-150x100.png 150w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/Schermata-2023-06-19-alle-16.18.29.png 1380w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"wp-block-paragraph\"><strong>For video, 84 features are extracted.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mean and standard deviation of action units (AUs) and eye gaze <\/strong>are the<strong> video features. 
<\/strong>AUs and eye gaze are extracted with OpenFace from each video frame.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Extracted AUs: 1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 28, and 45.<\/p>\n<\/div><\/div>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Two AUs extracted with OpenFace on a sample frame.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><em>Audio Data.<\/em><\/strong><\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:33% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"427\" height=\"448\" src=\"https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/audio_web-1.png\" alt=\"\" class=\"wp-image-345 size-full\" srcset=\"https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/audio_web-1.png 427w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/audio_web-1-286x300.png 286w, https:\/\/project.inria.fr\/stressid\/files\/2023\/06\/audio_web-1-143x150.png 143w\" sizes=\"auto, (max-width: 427px) 100vw, 427px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"wp-block-paragraph\"><strong>For audio, 140 and 513 features are extracted.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Two approaches to speech signal analysis are proposed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">1) <strong>Audio-HC<\/strong>, employing hand-crafted features. Among others, Mel Frequency Cepstral Coefficients (MFCCs) and their first and second derivatives are extracted, together with spectral centroid, bandwidth, contrast, flatness, and rolloff. The mean and standard deviation over time of all features are computed, resulting in a feature vector of <strong>140 components<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">2) <strong>DNN<\/strong> feature extraction, employing a pre-trained Wav2Vec (W2V) model. 
Features are extracted every 20 ms and are averaged over time to obtain a single <strong>513-component embedding<\/strong> per utterance.<\/p>\n<\/div><\/div>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>MFCC representation of the audio features.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><em>Multimodal data<\/em><\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The multimodal baseline uses <strong>early fusion<\/strong>: the three kinds of features (Audio-HC for the audio data) are concatenated at the input.<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>CLASSIFICATION<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For Physiological signals, the best results are achieved with the Recursive Feature Elimination (RFE) algorithm combined with an L1-penalised logistic regression.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For Video data, the best results are achieved by combining L1 feature selection with a Random Forest classifier with a maximal tree depth of 5.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For Audio data, the best results are achieved with a linear classification layer optimised with Adam and cross-entropy loss.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For the multimodal baseline, the best results are achieved with a Multi-Layer Perceptron (MLP) with ReLU activation functions, optimised with Adam and cross-entropy loss.<\/p>\n\n\n\n<h2 class=\"has-medium-font-size wp-block-heading\">EXPERIMENTS<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In all the experiments, we generate 8 random splits, using 90% of the subjects for training and 10% for testing in each split. The results are averaged over the 8 repetitions. 
To ensure robustness to potential imbalance resulting from the train-test splits, the results are assessed using the weighted F1-score on the test data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The code for the baselines is available at https:\/\/github.com\/robustml-eurecom\/stressID. Unimodal and multimodal baselines combine features extracted from video, audio, and physiological inputs. The models are trained to perform binary classification, i.e. discriminate between stressed and not stressed, as well as 3-class classification. These discrete labels are extracted from the self-assessments\u2026<\/p>\n<p> <a class=\"continue-reading-link\" href=\"https:\/\/project.inria.fr\/stressid\/baseline\/\"><span>Continue reading<\/span><i class=\"crycon-right-dir\"><\/i><\/a> <\/p>\n","protected":false},"author":2341,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":"","_members_access_role":[],"_members_access_error":""},"class_list":["post-135","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/pages\/135","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/users\/2341"}],"replies":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/comments?post=135"}],"version-history":[{"count":42,"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/pages\/135\/revisions"}],"predecessor-version":[{"id":396,"href":"https:\/\/project.inria.fr\/stressid\/wp-json\/wp\/v2\/pages\/135\/revisions\/396"}],"wp:attachment":[{"href":"https:\/\/project.inria.fr\
/stressid\/wp-json\/wp\/v2\/media?parent=135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}