<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.aiopen.2023.08.012</article-id>
      <title-group>
        <article-title>Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavlos Constas</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vikram Rawal</string-name>
          <email>vikram.rawal@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew Honorio Oliveira</string-name>
          <email>matthewhonorio.oliveira@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Constas</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aditya Khan</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kaison Cheung</string-name>
          <email>siukai.cheung@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Najma Sultani</string-name>
          <email>najma.sultani@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carrie Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Micol Altomare</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Akzam</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiacheng Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vhea He</string-name>
          <email>vhea.he@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lauren Altomare</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heraa Muqri</string-name>
          <email>heraa.muqri@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asad Khan</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nimit Amikumar Bhanshali</string-name>
          <email>nimit.bhanshali@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Youssef Rachad</string-name>
          <email>youssef.rachad@mail.utoronto.ca</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Guerzhoy</string-name>
          <email>guerzhoy@cs.toronto.edu</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>517</fpage>
      <lpage>520</lpage>
      <abstract>
        <p>We propose a reinforcement learning (RL)-based system that would automatically prescribe a hypothetical patient medication that may help the patient with their mental health-related speech disfluency, and adjust the medication and the dosages in response to zero-cost frequent measurement of the fluency of the patient. We demonstrate the components of the system: a module that detects and evaluates speech disfluency on a large dataset we built, and an RL algorithm that automatically finds good combinations of medications. To support the two modules, we collect data on the effect of psychiatric medications for speech disfluency from the literature, and build a plausible patient simulation system. We demonstrate that the RL system is, under some circumstances, able to converge to a good medication regime. We collect and label a dataset of people with possible speech disfluency and demonstrate our methods using that dataset. Our work is a proof of concept: we show that there is promise in the idea of using automatic data collection to address speech disfluency.</p>
      </abstract>
      <kwd-group>
        <kwd>disfluency</kwd>
        <kwd>ASR</kwd>
        <kwd>reinforcement learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Venue</title>
      <p>Machine Learning for Cognitive and Mental Health Workshop (ML4CMH), AAAI 2024, Vancouver, BC, Canada. CEUR Workshop Proceedings (ceur-ws.org).</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Speech disfluency is a common medical issue. It can be caused by, among other factors, conditions such as depression, anxiety, and insomnia (see Section 6). Speech disfluency includes stuttering as well as issues like pauses that are too long, repetitions, “false starts,” and “repairs” of previous utterances [
        <xref ref-type="bibr" rid="ref2 ref3">1</xref>
        ]. We propose a Reinforcement Learning-based system for helping physicians adjust medication to minimize speech disfluency. We develop a system for detecting how disfluent the person’s speech is, and a subsystem for minimizing the speech disfluency by finding a combination of medications that works using reinforcement learning.
      </p>
      <p>We train our disfluency detection system to predict the labels we assigned to clips in the dataset we collected. To demonstrate the feasibility of the RL subsystem, we construct a patient simulation. We measure the precision with which our speech disfluency detection subsystem can measure disfluency, and obtain from the literature the plausible timespans and onset times for the effects of medications. We then run a patient simulation with plausible parameters, and show that the RL algorithm can find strategies to minimize the speech disfluency of our plausibly-simulated patients.</p>
      <p>To evaluate our subsystems, we collect a dataset of public videos of people with possible disfluencies and label the dataset using a scalable strategy that allows us to obtain precise and standardized ratings by having each video be rated by multiple raters.</p>
      <p>The rest of the paper is organized as follows: we explain our data collection and labelling process. We then describe our disfluency rating process, and report results on that subsystem. We then describe our patient simulation process and report results of the RL system’s performance on the simulated patients. For the patient simulation to be plausible, we connect the patient simulation to how precisely speech can be rated for fluency by our system and to the plausible effects of medications. (Note that if the measurement of speech fluency is too noisy and/or the medications’ effect is too subtle or the onset is too long, learning would likely not be possible.) Finally, we summarize our literature search results for medications that could plausibly affect speech fluency.</p>
    </sec>
    <sec id="sec-2-1">
      <title>2. Data Collection</title>
      <p>The objective of our data collection process is to obtain a series of audio samples from individuals with possible mental health-related speech disfluency, across a period of time. We collected 19 channels from searching on YouTube for mental health-related vlog channels by YouTubers, as well as the D-vlog, a dataset of channels of YouTubers with depression [<xref ref-type="bibr" rid="ref4">2</xref>].</p>
      <p>For each YouTuber represented in the videos, we scraped their channel for other videos which contained significant stretches of unedited spoken audio. Terms used to query for videos from each channel were subsets of the following keywords: {“depression”, “story”, “vlog”, “depression vlog”, “anxiety”, “tested”, “figure”, “rambling”, “issues”, “anxiety vlog”, “webcam”}. For each video, only the audio was extracted. In total, we obtained 195 audio clips. There are 9 to 11 audio clips for each channel, with an average of 10 audio clips per channel.</p>
      <sec id="sec-2-1-1">
        <title>2.1. Methods</title>
        <p>The rating process consisted of two stages, each stage lasting approximately one week. In each stage, raters randomly received several channels and were asked to rate audio samples from the dataset. This was arranged so that each audio sample in the dataset would have up to 3 raters. Each rater received different channels in the different stages. To ensure independent evaluation, raters were advised against sharing their assessments with each other.</p>
        <p>Data from the initial stage was not used in our experiments. The round was used for acquainting raters with the variation of disfluency observed in the dataset. At the end of this phase, each rater was privately given summary statistics regarding their ratings in the round, including the mean and standard deviation of their ratings across audio samples, as well as a spreadsheet containing a measure of their bias for each audio sample (where bias is the distance of their rating from the mean rating across all raters for that audio sample). This process was aimed at allowing raters to recognize possible inconsistencies and biases in their “internal model” of disfluency. The ratings from the second stage were the finalized ratings that would be used for fine-tuning our disfluency-detection system. The ratings were standardized, as described below.</p>
      </sec>
      <sec id="sec-2-1-2">
        <title>2.2. Rating System</title>
        <p>We devised a rating system to assess the severity of the disfluency in the video data. The authors acted as raters for the videos. The 19 YouTuber channels in the dataset were examined for disfluencies in them. Raters were tasked with assessing the disfluency severity in each video on a scale of 1 to 7, which was adapted from the Stuttering Severity Instrument-Third Edition (SSI-3) [3]. (But note that “disfluency” is a more general term than “stuttering.”)</p>
      </sec>
    </sec>
    <sec id="sec-2-2">
      <title>3. Rater Performance Analysis</title>
      <p>In this Section, we analyze the rater data, and show that raters are somewhat consistent in their ratings of the same clips. This indicates that we can use the standardized ratings (see below) as targets when estimating the fluency of speakers in audio clips.</p>
      <sec id="sec-2-2-1">
        <title>3.1. Data Model</title>
        <p>To assess the performance of the raters, we conducted a regression analysis. The model we utilized was r_ij = b_i + d_j + ε_ij, where r_ij is the rating given to audio clip j by rater i, b_i is the rater bias, d_j is the true average disfluency of the channel, and ε_ij is the random error (see [4] for a similar model). This model is estimated using least-squares regression.</p>
        <p>Using this model, the performance of the raters was assessed by randomly splitting the dataset into a 70% training set, and a 30% validation set.</p>
      </sec>
      <sec id="sec-2-2-2">
        <title>3.2. Analysis</title>
        <p>Below, we perform an exploratory analysis of the non-standardized ratings.</p>
        <p>We compute the Root-Mean-Square Error (RMSE) on the training set and the validation set when predicting the disfluency scores using the data model. The RMSE on the training set was 0.8/6.0 (on a scale of 0 to 6 rather than 1 to 7 as in the input) and the RMSE on the validation set was 0.9/6.0. The validation RMSE we would obtain if the data model predicted the average rating every time would be 1.4/6.0. The R² value of the model on the training set was 0.44, indicating that the rater coefficients and the clip coefficients have explanatory power.</p>
        <p>For each clip in the validation set, we compute the standard deviation of the ratings assigned by different raters to the same clip. The median standard deviation is 0.6/6.0. This suggests that the median disagreement between raters was just over half a rating point on a given clip. The standard deviations are given on a scale of 6.0 since the scores range from 1 to 7.</p>
      </sec>
      <sec id="sec-2-1-3">
        <title>3.3. Standardized Ratings</title>
        <p>Different raters use different standards for fluency. We therefore obtained standardized ratings. We accomplish this by subtracting the rater bias b_i (see Section 3.1) for rater i from each rating r_ij by rater i. Then, when we compute the average standardized rating for every clip, we average ratings that are actually on the same scale.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Disfluency Pipeline</title>
      <p>
        In this section, we describe our subsystem for assessing the disfluency of a person in the input clip. We use Whisper [5] (https://github.com/openai/whisper) to transcribe the audio. We then use an Auto-Correlational Neural Network-based tagger [
        <xref ref-type="bibr" rid="ref1">6</xref>
        ] to tag the Whisper transcript. Finally, we fine-tune GPT-2 [7] on the tagged transcript as input in order to predict the disfluency scores we assigned.
      </p>
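      <p>The three-stage pipeline can be sketched end-to-end as follows (a minimal illustration: the stage bodies are hypothetical stand-ins for the actual Whisper, DT-ACNN, and GPT-2 components, and the heuristic scoring is ours, not the authors’ method):</p>

```python
def transcribe(audio_path):
    # stand-in for Whisper ASR; filler tokens such as "uh" and "um" are kept
    return "i uh i think that that is uh fine"

def tag_disfluencies(transcript):
    # stand-in for the DT-ACNN tagger: label each word fluent/disfluent
    fillers = {"uh", "um"}
    words = transcript.split()
    tags = []
    for i, w in enumerate(words):
        repeated = i > 0 and words[i - 1] == w  # crude repetition check
        tags.append("disfluent" if w in fillers or repeated else "fluent")
    return list(zip(words, tags))

def predict_disfluency_score(tagged):
    # stand-in for the fine-tuned GPT-2 regressor: here, just the
    # fraction of disfluent words scaled to the 0-6 rating range
    n_disfluent = sum(1 for _, t in tagged if t == "disfluent")
    return 6.0 * n_disfluent / len(tagged)

tagged = tag_disfluencies(transcribe("clip.wav"))
print(round(predict_disfluency_score(tagged), 2))  # → 2.0
```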
      <sec id="sec-3-1">
        <title>4.1. Transcribing Audio with Whisper</title>
        <p>The YouTube videos are transcribed using the Automatic Speech Recognition (ASR) model Whisper. Tokens such as “uh”, “um”, etc. were included in the transcript.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Disfluency Tagging</title>
        <p>
          The parsed text transcripts were subsequently fed into a Disfluency Tagging Auto-Correlational Neural Network (DT-ACNN) [
          <xref ref-type="bibr" rid="ref1">6</xref>
          ], a system designed to categorize each word within the text transcript as either “fluent” or “disfluent”. In [
          <xref ref-type="bibr" rid="ref1">6</xref>
          ], the Switchboard corpus of conversational speech [8] was used. For the task of predicting a per-word “fluent” or “disfluent” label, the authors report a recall of 90.0%, a precision of 82.8%, and an F1 score of 86.2% on that dataset. The reported results indicate the effectiveness of the DT-ACNN model in disfluency detection.
        </p>
        <p>(Figure 1: learning curves for the disfluency prediction task; MSE vs. training epoch, epochs 0 to 50.)</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Fine-tuning GPT-2</title>
        <p>We fine-tune GPT-2 to predict the average standardized disfluency score assigned by the raters who rated the clip, from the tagged Whisper transcripts as well as from the words-per-minute (WPM) measure.</p>
        <p>For the regression task, we train with embedding size 768, using the Mean Squared Error (MSE) loss, and the AdamW optimizer with parameters β1 = 0.9, β2 = 0.999, ε = 10⁻⁹. The token limit of GPT-2 is 1024. Inputs that exceed this limit were truncated. The following hyperparameters were used during training: a learning rate of 4.5 × 10⁻⁴, a batch size of 4 (dictated by computational limitations), with weight decay parameter 0.01, for 50 epochs.</p>
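      <p>The training setup above can be summarized as a configuration mapping (a sketch assuming PyTorch-style AdamW parameter names; this is not the authors’ code):</p>

```python
# Hyperparameters for the GPT-2 regression fine-tuning described above.
config = {
    "embedding_size": 768,
    "loss": "MSE",
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),   # beta_1, beta_2
    "eps": 1e-9,
    "learning_rate": 4.5e-4,
    "batch_size": 4,
    "weight_decay": 0.01,
    "epochs": 50,
    "max_tokens": 1024,      # GPT-2 context limit; longer inputs are truncated
}

def truncate(token_ids, limit=config["max_tokens"]):
    # drop tokens beyond the model's context limit
    return token_ids[:limit]

print(len(truncate(list(range(2000)))))  # → 1024
```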
        <p>P-tuning [9] was used. In this approach, a soft prompt
with a set of 100 tokens is introduced at the beginning
of the input. These tokens aid in guiding the model
during classification. The model uses a Prompt Encoder to
optimize the prompt, with an encoding layer comprised
of 128 units. The model performance was evaluated by
randomly splitting the dataset into an 80% training set,
and a 20% validation set.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.4. Results</title>
        <p>The learning curves for the disfluency prediction task are
in Fig. 1. We observe that our system is currently able to
predict the validation rating to within about 0.15/6 of the
actual rating on average (the standard error is obtained
by taking the square root of the MSE) for YouTubers not
in the training set.</p>
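        <p>The per-rating error quoted above is obtained as the square root of the validation MSE; for instance, with a hypothetical MSE of 0.0225 rating points squared:</p>

```python
import math

# the reported average prediction error is the square root of the MSE
mse = 0.0225           # hypothetical validation MSE, in rating points squared
rmse = math.sqrt(mse)  # average error on the 0-6 rating scale
print(round(rmse, 2))  # → 0.15
```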
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Patient Simulation and Reinforcement Learning</title>
      <sec id="sec-5-1">
        <title>5.1. Overview</title>
        <p>In this Section, we explore the plausibility of using RL together with signals from our speech disfluency detector to find an effective medication regimen for people with speech disfluency.</p>
        <p>We first describe how we simulate people with speech disfluency in a plausible way. We then demonstrate that our RL algorithm could find an effective medication regimen in a plausible scenario.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Prior Work: RL for Medication Adjustment</title>
        <p>Reinforcement Learning (RL) for medication adjustment has been proposed in several contexts. Oh et al. evaluate an offline RL algorithm learned on South Korea’s national health insurance system to prescribe diabetes medication [10]. Javad et al. similarly propose an offline RL algorithm [11]. They measure the performance of the system based on the concordance to the prescription actually made, as well as by analyzing outcomes where the system’s recommendation and the actual recommendation in the data disagreed. Sun et al. explored an approach for Type 2 Diabetes treatment [12]. They merged a knowledge-driven model, informed by clinical guidelines, with a data-driven deep reinforcement learning model. The knowledge-driven model uses data from the Singapore Health Services Diabetes Registry, which contains over 189,520 patients and their Type 2 Diabetes medication prescriptions, to narrow down a list of viable medications. The data-driven model then applies a Deep Q Network (DQN), which also learns from the historical patient data, to rank the candidate medications selected by the knowledge-driven model based on expected long-term rewards.</p>
        <p>Nemati et al. developed a clinician-in-the-loop framework for heparin dosing, leveraging data from the MIMIC-II intensive care unit database [13]. This study engaged an interactive agent in simulated dosing trials, learning from the outcomes to refine decision-making processes. Similarly, Anzabi Zadeh et al. utilized deep reinforcement learning in the context of warfarin dosing for patients with blood clotting issues, with an emphasis on individualized dosing due to warfarin’s narrow therapeutic range [14]. In this method, they frame the problem as a Markov decision process (MDP) and employ an agent within a Pharmacokinetic/Pharmacodynamic (PK/PD) model to simulate dose-responses of virtual patients, in which the agent learns the best dose-duration pair through experience replay.</p>
        <p>We model a medication administration environment, aiming to determine the most effective medication regime for people experiencing depression, anxiety, insomnia, and resulting speech fluency issues. The person’s health state evolves based on Hidden Markov Models (HMMs). Each health issue (depression, anxiety, insomnia) has its unique HMM, governing how the patient’s state progresses. The patient’s observed speech fluency is also influenced by these health states.</p>
        <p>The Medication object represents different types of medications, each with varying effects on the aforementioned health issues. These effects include beneficial impacts on the conditions and potential side effects. Medications have properties like dosage, half_life, and time_to_effect, which dictate how they function over time.</p>
        <p>We simulate people with disfluency by evolving the HMM state. The patient model has the following attributes:
• Depression, Anxiety, Insomnia Scores: These attributes represent the initial underlying conditions of the patient, each represented as an integer between 1 and 5. A higher number denotes a more severe state.
• Depression, Anxiety, Insomnia Hidden Markov Models: Models that represent how the severity of the patient’s depression, anxiety, and insomnia changes over time based on their initial states and also through interaction with medicine. These directly impact the observed speech fluency.
• Speech Fluency Score: Indicates the patient’s natural ability to speak fluently, modelled as a continuous value between 0 and 1.
• Medication Accumulation: A list that keeps track of all medications that are currently in the patient’s system.</p>
        <p>Alongside this, we also model an individual medication with the following attributes:
• Name: The name of the medication.
• Depression, Anxiety, Insomnia Effects: Captures the medication’s average effect and variability on each condition given the standard dosage.
• Dosage: The amount of medication administered relative to the standard dose (e.g., Dosage = 1.5 means 1.5x the standard dose). This attribute scales the effects of the medication on the patient.
• Time to Effect: The number of days it takes for the medication to start showing effects.
• Half-Life: The number of days it takes for the medication dosage in the patient’s system to reduce by half.</p>
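        <p>The dosage, half-life, and time-to-effect attributes above can be sketched as a toy model (a minimal illustration under our own naming and numbers, not the authors’ implementation):</p>

```python
class Medication:
    """Toy medication model: the dose decays exponentially per its half-life,
    and the medication contributes no effect before time_to_effect days."""

    def __init__(self, name, dosage, half_life_days, time_to_effect_days):
        self.name = name
        self.dosage = dosage                    # multiple of the standard dose
        self.half_life = half_life_days
        self.time_to_effect = time_to_effect_days

    def remaining_dose(self, days_since_administration):
        # exponential decay: dose * 0.5 ** (t / half_life)
        return self.dosage * 0.5 ** (days_since_administration / self.half_life)

    def is_active(self, days_since_administration):
        # the medication only shows effects after its onset time
        return days_since_administration >= self.time_to_effect

# hypothetical medication: standard dose, 2-day half-life, 14-day onset
med = Medication("med_A", dosage=1.0, half_life_days=2.0, time_to_effect_days=14)
print(round(med.remaining_dose(2.0), 3))  # one half-life later → 0.5
print(med.is_active(7))                    # before onset → False
```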
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Hidden Markov Model (HMM)</title>
        <p>A Hidden Markov Model (HMM) is a statistical model that represents sequences of observable data as well as hidden states. The sequences of observable data are generated based on hidden states, which cannot be directly observed. Here, the “observable data” is the patient’s speech fluency score, while the “hidden states” are the underlying depression, anxiety, and insomnia conditions that affect the severity of the disfluency. We base this model off of the fact that a patient’s underlying levels of depression [15], insomnia [16], and anxiety [17] have an impact on their speech fluency.</p>
        <p>Each health condition — depression, anxiety, and insomnia — has its associated HMM. The key components of these HMMs are:
• The initial probability distribution over the initial state of the condition. Initialized as a uniform distribution, indicating that any severity level is equally likely at the start.
• The transition matrix, which defines the probability of transitioning from one state (severity level) to another in consecutive time steps. For instance, if a patient is currently at a severity level of 3 for depression, the transition matrix will dictate the probability of them improving to level 2, worsening to level 4, or remaining at level 3 in the next step.
• We use a Gaussian Hidden Markov Model, as the observable context is assumed to be generated from a Gaussian distribution. The means and covariances define these distributions for each hidden state.
• Each state has a mean context emission, set to be the same in the depression, anxiety, and insomnia states.</p>
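        <p>The components above can be sketched as a toy Gaussian HMM for one condition (a simplified three-state illustration with made-up transition and emission parameters, not the authors’ five-state model):</p>

```python
import random

# Toy 3-state Gaussian HMM for one condition (severity levels 1..3 for brevity).
STATES = [1, 2, 3]
# TRANSITION[s] gives P(next state | current state s); each row sums to 1
TRANSITION = {1: [0.80, 0.15, 0.05],
              2: [0.20, 0.60, 0.20],
              3: [0.05, 0.15, 0.80]}
# Gaussian emission parameters per hidden state: (mean fluency, std. dev.)
EMISSION = {1: (0.9, 0.05), 2: (0.7, 0.05), 3: (0.5, 0.05)}

def step(state, rng):
    """Advance the hidden severity state and emit an observable fluency score."""
    next_state = rng.choices(STATES, weights=TRANSITION[state])[0]
    mean, std = EMISSION[next_state]
    observation = rng.gauss(mean, std)  # Gaussian emission for the new state
    return next_state, observation

rng = random.Random(0)
state = rng.choice(STATES)  # uniform initial distribution
for _ in range(3):
    state, obs = step(state, rng)
    print(state, round(obs, 2))
```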
        <p>See Fig. 2 for a diagram.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. RL Environment</title>
        <p>At each step, an agent can choose to administer a specific medication from the available list. The environment then evolves based on the medication’s effects and the underlying psychiatric state of the patient model, by the use of a transition matrix that maps the current psychiatric state to a new state based on a probability distribution that models the dynamic and evolving nature of underlying psychiatric states [18]. The agent receives a reward based on the patient’s measured fluency.</p>
        <p>We use the LinUCB [19] algorithm to learn the optimal
medication strategy. The goal is to maximize the patient’s
speech fluency.</p>
        <p>The effects of the medication on the patient model are implemented by applying the effects of the medication on each condition to that condition’s transition matrix.</p>
        <p>(Figure 2: the HMM for one condition, with hidden severity states 1 through 5, each emitting an observable output.)</p>
      </sec>
      <sec id="sec-5-6">
        <title>5.6. Medication Selection Algorithm</title>
        <p>We use the increase in speech fluency as our reward. Speech fluency is modelled as a linear function:
f = 0.1 ⋅ d + 0.2 ⋅ a + 0.3 ⋅ i + 0.4 ⋅ b
where f is current speech fluency; d, a, i are the patient’s current depression, anxiety, and insomnia scores respectively, labelled on a 5-point scale, where 1 represents no symptoms and 5 represents the most severe symptoms; and b is the patient’s baseline fluency.</p>
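        <p>A minimal sketch of this linear fluency model (the variable names are ours, standing in for the symbols in the formula; the weights are the ones given in the text):</p>

```python
def speech_fluency(depression, anxiety, insomnia, baseline):
    """Linear fluency model with the weights given in the text.

    depression, anxiety, insomnia: severity scores on a 1-5 scale
    baseline: the patient's baseline fluency
    """
    return 0.1 * depression + 0.2 * anxiety + 0.3 * insomnia + 0.4 * baseline

# hypothetical patient with mild symptoms and a baseline fluency of 0.8
print(round(speech_fluency(1, 1, 1, 0.8), 2))
```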
        <p>The implementation of the LinUCB algorithm observes the current state of the patient; then, for each medication that is part of the environment, it estimates the reward using a linear approximation. An upper confidence bound is calculated for the estimated reward, and the medication with the highest upper confidence bound is chosen according to the equation: a_t = arg max_a ( x_a(t)ᵀ θ_a + α √( x_a(t)ᵀ A_a⁻¹ x_a(t) ) )</p>
        <p>where x_a(t) is the feature vector for action a at time t, θ_a is the parameter vector for action a which we want to estimate, A_a is the design matrix for action a, and α is the hyperparameter controlling the exploration/exploitation trade-off [20]. In our implementation of the LinUCB algorithm, we chose the value α = 10.0.</p>
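        <p>A self-contained sketch of the LinUCB arm-selection rule above (the disjoint-arms variant as in [20]; the two-dimensional features, the arm statistics, and all names are illustrative assumptions, not the authors’ code):</p>

```python
import math

def mat_vec(M, v):
    # matrix-vector product for small list-of-lists matrices
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inv_2x2(M):
    # closed-form inverse of a 2x2 matrix
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

def linucb_choose(arms, x, alpha):
    """Pick the arm maximizing x^T theta_hat + alpha * sqrt(x^T A^-1 x).

    arms: dict arm_name -> (A, b), with A = I + sum of x x^T and b = sum of r x
    x: current context (feature) vector of length 2
    """
    best_arm, best_ucb = None, -math.inf
    for name, (A, b) in arms.items():
        A_inv = inv_2x2(A)
        theta = mat_vec(A_inv, b)  # ridge-regression estimate of theta_a
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * yi for xi, yi in zip(x, mat_vec(A_inv, x))))
        ucb = mean + alpha * width
        if ucb > best_ucb:
            best_arm, best_ucb = name, ucb
    return best_arm

# two hypothetical medications; med_B has accumulated higher rewards
arms = {
    "med_A": ([[2.0, 0.0], [0.0, 2.0]], [0.2, 0.1]),
    "med_B": ([[2.0, 0.0], [0.0, 2.0]], [0.9, 0.4]),
}
print(linucb_choose(arms, [1.0, 0.5], 1.0))  # → med_B
```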
      </sec>
      <sec id="sec-5-7">
        <title>5.7. Results</title>
        <p>In our simulations, we run the RL algorithm and keep track of disfluency over time. We inject noise into the algorithm’s simulated measurements of disfluency to simulate the fact that our disfluency detection system does not measure disfluency perfectly. In the experiments reported here, we inject a minimal amount of noise, corresponding to the high precision with which we can measure disfluency.</p>
        <p>We define the success of a trial as an improvement of over 0.5σ in speech fluency. We define failure as a deterioration of over 0.5σ in speech fluency. Here, σ ≈ 0.1 is the standard deviation of fluency in the dataset.</p>
        <p>Figures 3 and 4 show examples of a successful and an unsuccessful run, respectively. (Figure 3: convergence to higher speech fluency; Figure 4: lack of convergence to higher speech fluency; both plot normalized speech fluency over 200 days.) Across 500 patient simulation runs, we found a high potential for reinforcement learning to correctly apply medication effects to reduce speech disfluency, with 52% of runs showing success and 9% of runs demonstrating failure.</p>
        <p>The average fluency across these runs was 0.66/1.00 with a standard deviation of 0.1. The success rate of the simulations was 52%, with a failure rate of approximately 16%. Success and failure are defined as runs terminating greater than 0.5σ above and lower than 0.5σ below the initial fluency level, respectively.</p>
        <p>These results support the possibility of the use of reinforcement learning to improve speech fluency under the studied conditions. Our preliminary results indicate that if speech disfluency can be measured to within 10% of the true score and the medications have plausible properties (similar to the ones seen in Section 6), then reinforcement learning is a possible method to dose medications so as to minimize speech disfluency. However, the variability in outcomes and the presence of failed simulations that prompted theoretical patient deterioration indicate that further research is needed in improving the model’s accuracy and understanding factors contributing to failures, which will be important for applying these findings in a clinical setting.</p>
      </sec>
      <sec id="sec-5-8">
        <title>6. Medication Literature Review</title>
        <p>A systematic literature review was conducted to determine common medications used to treat major depressive disorder (“depression”) and other mental illnesses that affect speech fluency. Table 1 indicates 23 medications whose onset time and response rates were used to inform the reinforcement learning simulation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Ethical Considerations</title>
      <p>In this paper, we outline and evaluate a proposal to adjust medications automatically in order to improve the speech fluency of a simulated patient. Although administering medications automatically is sometimes done (e.g., with insulin pumps), this can only be done after thorough clinical trials and with informed consent from the patient. Patients must be thoroughly informed about the nature of the automated system, its potential risks and benefits, and their rights in the decision-making process.</p>
      <p>This extends beyond initial consent, and includes ongoing […]</p>
      <p>[15] P. Fossati, L. Bastard Guillaume, A.-M. Ergis, J.-F. Allilaire, Qualitative analysis of verbal fluency in depression, Psychiatry Research 117 (2003) 17–24. URL: https://www.sciencedirect.com/science/article/pii/S0165178102003001. doi:10.1016/S0165-1781(02)00300-1.</p>
      <p>[16] M. M. Jacobs, S. Merlo, P. M. Briley, Sleep duration, insomnia, and stuttering: The relationship in adolescents and young adults, Journal of Communication Disorders 91 (2021) 106106. URL: https://www.sciencedirect.com/science/article/pii/S0021992421000290. doi:10.1016/j.jcomdis.2021.106106.</p>
      <p>[17] Z. Wang, M. Tang, M. Larrazabal, E. Toner, M. Rucker, C. Wu, B. Teachman, M. Boukhechba, L. Barnes, Personalized state anxiety detection: An empirical study with linguistic biomarkers and a machine learning pipeline (2023). doi:10.48550/arXiv.2304.09928.</p>
      <p>[18] C. Gauld, D. Depannemaecker, Dynamical systems in computational psychiatry: A toy-model to apprehend the dynamics of psychiatric symptoms, Frontiers in Psychology 14 (2023). doi:10.3389/fpsyg.2023.1099257.</p>
      <p>[19] E. Nelson, D. Bhattacharjya, T. Gao, M. Liu, D. Bouneffouf, P. Poupart, Linearizing contextual bandits with latent state dynamics, in: Uncertainty in Artificial Intelligence, PMLR, 2022, pp. 1477–1487.</p>
      <p>[20] L. Li, W. Chu, J. Langford, R. E. Schapire, A contextual-bandit approach to personalized news article recommendation, in: Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 661–670.</p>
      <p>[21] M. Wilson, J. Tripp, Clomipramine (2019).</p>
      <p>[22] H. Abdul-Baki, I. I. El Hajj, L. ElZahabi, C. Azar, E. Aoun, A. Skoury, H. Chaar, A. I. Sharara, A randomized controlled trial of imipramine in patients with irritable bowel syndrome, World Journal of Gastroenterology: WJG 15 (2009) 3636.</p>
      <p>[23] C. A. Townsend, Selegiline transdermal patch (Emsam) for major depressive disorder, American Family Physician 77 (2008) 505.</p>
      <p>[24] J. J. Moore, A. Saadabadi, Selegiline, in: StatPearls [Internet], StatPearls Publishing, 2022.</p>
      <p>[25] T. Tenjin, S. Miyamoto, Y. Ninomiya, R. Kitajima, S. Ogino, N. Miyake, N. Yamaguchi, Profile of blonanserin for the treatment of schizophrenia, Neuropsychiatric Disease and Treatment (2013) 587–594.</p>
      <p>[26] L. Dean, Venlafaxine therapy and CYP2D6 genotype (2020).</p>
      <p>[27] R. D. Gibbons, K. Hur, C. H. Brown, J. M. Davis, J. J. Mann, Benefits from antidepressants: synthesis of 6-week patient-level outcomes from double-blind placebo-controlled randomized trials of fluoxetine and venlafaxine, Archives of General Psychiatry 69 (2012) 572–579.</p>
      <p>[28] S. L. Cincotta, J. S. Rodefer, Emerging role of sertindole in the management of schizophrenia, Neuropsychiatric Disease and Treatment (2010) 429–441.</p>
      <p>[29] J. S. Maan, T. Duong, A. Saadabadi, Carbamazepine (2018).</p>
      <p>[30] Z. Tolou-Ghamari, M. Zare, J. M. Habibabadi, M. R. Najafi, A quick review of carbamazepine pharmacokinetics in epilepsy from 1953 to 2012, Journal of Research in Medical Sciences: the official journal of Isfahan University of Medical Sciences 18 (2013) S81.</p>
      <p>[31] A. C. Pande, J. G. Crockatt, D. E. Feltner, C. A. Janney, W. T. Smith, R. Weisler, P. D. Londborg, R. J. Bielski, D. L. Zimbroff, J. R. Davidson, et al., Pregabalin in generalized anxiety disorder: a placebo-controlled trial, American Journal of Psychiatry 160 (2003) 533–540.</p>
      <p>[32] J. R. Strawn, L. Geracioti, N. Rajdev, K. Clemenza, A. Levine, Pharmacotherapy for generalized anxiety disorder in adult and pediatric patients: an evidence-based treatment review, Expert Opinion on Pharmacotherapy 19 (2018) 1057–1070.</p>
      <p>[33] K. Chokhawala, S. Lee, A. Saadabadi, Lithium, National Library of Medicine, National Center for Biotechnology Information, PubMed Central (2022).</p>
      <p>[34] T. Hui, A. Kandola, L. Shen, G. Lewis, D. Osborn, J. Geddes, J. Hayes, A systematic review and meta-analysis of clinical predictors of lithium response in bipolar disorder, Acta Psychiatrica Scandinavica 140 (2019) 94–115.</p>
      <p>[35] D. F. Ionescu, R. C. Shelton, L. Baer, K. H. Meade, M. B. Swee, M. Fava, G. I. Papakostas, Ziprasidone augmentation for anxious depression, International Clinical Psychopharmacology 31 (2016) 341.</p>
      <p>[36] T. Zhao, T.-W. Park, J.-C. Yang, G.-B. Huang, M.-G. Kim, K.-H. Lee, Y.-C. Chung, Efficacy and safety of ziprasidone in the treatment of first-episode psychosis: an 8-week, open-label, multicenter trial, International Clinical Psychopharmacology 27 (2012) 184–190.</p>
      <p>[37] F.-G. Pajonk, Risperidone in acute and long-term therapy of schizophrenia: a clinical profile, Progress in Neuro-Psychopharmacology and Biological Psychiatry 28 (2004) 15–23.</p>
      <p>[38] J. Peuskens, Risperidone in the treatment of patients with chronic schizophrenia: a multi-national, multi-centre, double-blind, parallel-group study versus haloperidol, The British Journal of Psychiatry 166 (1995) 712–726.</p>
      <p>[39] M. J. Allen, S. Sabir, S. Sharma, GABA receptor (2018).</p>
      <p>[40] M. Panebianco, S. Al-Bachari, J. L. Hutton, A. G. Marson, Gabapentin add-on treatment for drug-resistant focal epilepsy, Cochrane Database of Systematic Reviews (2021).
[41] F. Lavergne, I. Berlin, A. Gamma, H. Stassen, prospective study in chinese population,
NeuropsyJ. Angst, Onset of improvement and response to chiatric Disease and Treatment (2017) 515–526.
mirtazapine in depression: a multicenter naturalis- [54] J. Cookson, P. E. Keck Jr, T. A. Ketter, W. Macfadden,
tic study of 4771 patients, Neuropsychiatric disease Number needed to treat and time to
response/remisand treatment 1 (2005) 59–68. sion for quetiapine monotherapy eficacy in acute
[42] H. R. Song, W.-M. Bahk, Y. S. Woo, J.-H. Jeong, Y.- bipolar depression: evidence from a large,
randomJ. Kwon, J. S. Seo, W. Kim, M.-D. Kim, Y.-C. Shin, ized, placebo-controlled study, International
cliniS.-Y. Lee, et al., Eficacy and tolerability of generic cal psychopharmacology 22 (2007) 93–100.
mirtazapine (mirtax) for major depressive disorder: [55] J.-S. Lee, J.-H. Ahn, J.-I. Lee, J.-H. Kim, I. Jung,
C.multicenter, open-label, uncontrolled, prospective U. Lee, J.-Y. Lee, S.-I. Lee, C.-Y. Kim, Dose pattern
study, Clinical Psychopharmacology and Neuro- and efectiveness of paliperidone extended-release
science 13 (2015) 144. tablets in patients with schizophrenia, Clinical
Neu[43] K. A. Fariba, A. Saadabadi, Topiramate (2020). ropharmacology 34 (2011) 186–190.
[44] Y.-T. Liu, G.-T. Chen, Y.-C. Huang, J.-T. Ho, C.-C. [56] A. Kumar, S. Balan, Fluoxetine for persistent
develLee, C.-C. Tsai, C.-N. Chang, Efectiveness of dose- opmental stuttering, Clinical neuropharmacology
escalated topiramate monotherapy and add-on ther- 30 (2007) 58–59.
apy in neurosurgery-related epilepsy: A
prospective study, Medicine 99 (2020).
[45] G. Lewis, L. Dufy, A. Ades, R. Amos, R. Araya,</p>
      <p>S. Brabyn, K. S. Button, R. Churchill, C. Derrick,
C. Dowrick, et al., The clinical efectiveness of
sertraline in primary care and the role of depression
severity and duration (panda): a pragmatic,
doubleblind, placebo-controlled randomised trial, The</p>
      <p>Lancet Psychiatry 6 (2019) 903–914.
[46] M. F. Flament, R. Lane, R. Zhu, Z. Ying, Predictors
of an acute antidepressant response to fluoxetine
and sertraline, International clinical
psychopharmacology 14 (1999) 259–276.
[47] H. K. Singh, A. Saadabadi, Sertraline (2019).
[48] M. H. Trivedi, A. J. Rush, S. R. Wisniewski, A. A.</p>
      <p>Nierenberg, D. Warden, L. Ritz, G. Norquist, R. H.</p>
      <p>Howland, B. Lebowitz, P. J. McGrath, et al.,
Evaluation of outcomes with citalopram for depression
using measurement-based care in star* d:
implications for clinical practice, American journal of</p>
      <p>Psychiatry 163 (2006) 28–40.
[49] Z. Jia, J. Yu, C. Zhao, H. Ren, F. Luo, Outcomes and
predictors of response of duloxetine for the
treatment of persistent idiopathic dentoalveolar pain: A
retrospective multicenter observational study,
Journal of Pain Research (2022) 3031–3041.
[50] N. Parikh, M. Yilanli, A. Saadabadi,
Tranyl</p>
      <p>cypromine (2017).
[51] W. T. Heijnen, J. De Fruyt, A. I. Wierdsma, P.
Sienaert, T. K. Birkenhäger, Eficacy of tranylcypromine
in bipolar depression: a systematic review, Journal
of clinical psychopharmacology 35 (2015) 700–705.
[52] J. Waugh, K. L. Goa, Escitalopram: a review of its
use in the management of major depressive and
anxiety disorders, CNS drugs 17 (2003) 343–362.
[53] K. Jiang, L. Li, X. Wang, M. Fang, J. Shi, Q. Cao, J. He,</p>
      <p>J. Wang, W. Tan, C. Hu, Eficacy and tolerability of
escitalopram in treatment of major depressive
disorder with anxiety symptoms: a 24-week, open-label,
Clomipramine
Imipramine
Selegiline
Low remission rate (20%);
[21]
80.6% patients after 12-week
treatment, compared to 48.0%
in the placebo [22]
33-40% [23]
Depression, PTSD, OCD,
panic disorder, social anxiety
disorder [47]
Depression, social anxiety
disorder, PTSD [48]
Depression, anxiety [49]
Major depressive episodes
[50]
Depression, generalized
anxiety disorder (GAD),
obsessive compulsive disorder
(OCD) and panic attacks [52]
Schizophrenia, manic,
psychotic and depressive
episodes [54]
Psychotic disorders including
schizophrenia [55]
OCD, certain eating
disorders, panic attacks [56]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>6-12 weeks [21] 4-5 weeks [22] 1-2 weeks [23] 4-6 weeks [25] 4-6 weeks [26] 4-6 weeks [28] First few days [29] 3 days [31] 1-3 weeks [33] 1-2 weeks [35] 4 weeks [37] 1-4 weeks [39] 1 week [41] 2-4 weeks (epilepsy)</source>
          ,
          <source>3 months (migraines) [43] Within 6 weeks [45]</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>1-2 weeks to start working, 4-6 weeks for full benefit</article-title>
          [
          <volume>48</volume>
          ]
          <fpage>2</fpage>
          -4 weeks [49]
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          weeks to start working, 6
          <article-title>-8 weeks for full improvement [50] 1-2 weeks to start working, 6-8 weeks for full improvement</article-title>
          [
          <volume>52</volume>
          ]
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>1-2 weeks to start working, 2-3 months for full improvement</article-title>
          [
          <volume>54</volume>
          ]
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>2-8 weeks for full improvement</article-title>
          [
          <volume>55</volume>
          ]
          <fpage>4</fpage>
          -5 weeks [56]
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>