Pre-dissertation seminar: Fredrik Sand
Linguistic markers in writing across neurodegenerative disorders
Abstract
Language impairments are common in neurodegenerative disorders, including Alzheimer's disease (AD), frontotemporal dementia (FTD), and amyotrophic lateral sclerosis (ALS). Written language is also affected, as text production depends on planning, lexical retrieval, syntactic formulation, and working memory. Changes in written output, for example reductions in productivity and syntactic complexity, may precede detectable impairment on standard cognitive tests. The present thesis investigated whether linguistic features extracted from written texts can act as markers of cognitive status across neurodegenerative diseases. Written samples were analysed using natural language processing (NLP) pipelines adapted for Swedish text, including measures of syntactic complexity, productivity, lexical diversity, and sentence-level structure. The main complexity measure was average dependency distance (ADD), a measure based on the mean linear distance between syntactically related words in a sentence.
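As an illustration of the ADD definition above, the following is a minimal sketch and not the thesis pipeline. It assumes each sentence has already been parsed into a list of 1-based head positions, with 0 marking the root (the convention used in CoNLL-U files), and it assumes the root is left out of the average. The function name and the example parse are hypothetical, and the abstract does not say how the thesis handles the root, punctuation, or aggregation over a whole text.

```python
from typing import Sequence


def average_dependency_distance(heads: Sequence[int]) -> float:
    """Mean absolute linear distance between each word and its syntactic head.

    heads[i] is the 1-based position of the head of word i + 1.
    A head of 0 marks the root, which has no head and is skipped
    (an assumption; the source does not specify this detail).
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        # A one-word sentence has no dependency links to measure.
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical parse of "The old man wrote a letter":
# The -> man (3), old -> man (3), man -> wrote (4),
# wrote -> root (0), a -> letter (6), letter -> wrote (4)
example_heads = [3, 3, 4, 0, 6, 4]
add_value = average_dependency_distance(example_heads)
print(f"ADD = {add_value:.2f}")
```

In this example the five links have distances 2, 1, 1, 1, and 2. Longer distances between related words, as in sentences with embedded clauses, raise the value, which is why ADD is read as a measure of syntactic complexity.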
Study I examined whether ADD can differentiate between levels of cognitive impairment. ADD showed a systematic association with diagnostic category: participants in the AD group produced texts with significantly lower ADD values than those with mild cognitive impairment (MCI) or subjective cognitive impairment (SCI). ADD accounted for approximately one quarter of the variance in diagnostic group membership.
Study II compared ADD with five basic linguistic measures in the same cohort as Study I. Word count was the strongest single predictor of diagnostic group, and the combination of ADD and word count yielded the highest classification accuracy (51 to 64%), indicating that syntactic complexity and productivity capture distinct dimensions.
Study III assessed ADD and word count in presymptomatic carriers of genetic mutations associated with FTD and in non-carrier controls from the GENFI cohort. A small group with manifest symptoms was also included. ADD showed no significant group or demographic effects, suggesting that syntactic structure is preserved in the presymptomatic stage. Word count differed by genetic status and showed an interaction with age, but the clinical significance of this finding is unclear.
Study IV evaluated whether linguistic features derived from picture descriptions and memory narratives predict cognitive test scores on the Edinburgh Cognitive and Behavioural ALS Screen (ECAS) in individuals with ALS. Models using linguistic features outperformed models based on demographic and clinical covariates alone, with typical prediction errors of 8 to 10 ECAS points. Structural features, including ADD and mean sentence length, were most informative in picture description tasks, whereas lexical and semantic features showed greater predictive value in memory-based narratives.
In summary, syntactic complexity and word count provided complementary information, with ADD particularly sensitive within the AD continuum and productivity measures differing by genetic status in the FTD cohort. In ALS, linguistic features enhanced prediction of cognitive status beyond clinical variables. Automated written language analysis offers a scalable approach for detecting and tracking cognitive change in neurodegenerative disease. Establishing normative data, optimizing NLP tools for pathological text, and validating these measures in larger longitudinal cohorts are necessary next steps.
