I am a Futures Researcher on Microsoft AI's Futures team in the Office of the CEO, working within Microsoft Superintelligence.
My current work focuses on the future impacts of advanced artificial intelligence. This includes judgemental forecasting, frontier safety analysis, economic modelling, AI security and privacy research, large-scale data analysis of AI usage, and strategy and policy gaming exercises.
I have published 40 papers (peer-reviewed articles, preprints, book chapters, etc), which have been cited more than 1,750 times with an h-index of 15 and an i10-index of 19.
Connect on Twitter, LinkedIn, Google Scholar, or reach out via email at contact.schoenegger@gmail.com.
Over my research career in academia and outside of it, I have worked in fields including AI, economics, philosophy, and psychology. This inter- and multi-disciplinary work has led me to use a wide range of methodologies and approaches, ranging from experimental design and survey methods to LLMs and statistical modelling. Across these projects, I have worked on solo-author projects, collaborated in small teams, led a research team of 40+, and contributed to large international consortia.
My work has been published in specialist journals such as Journal of Behavioral and Experimental Economics, Transactions on Machine Learning Research, and Environmental Psychology, as well as generalist journals like Science Advances and Nature Communications.
So far, I have published 40 papers (peer-reviewed articles, preprints, book chapters, etc), which have been cited more than 1,750 times with an h-index of 15 and an i10-index of 19. We have also filed one US patent application. See my Google Scholar for a full list of publications.
There has also been some media coverage of my work in outlets such as Forbes, BigThink, Süddeutsche Zeitung, The Times, Psychology Today, and Die Zeit. I have also presented my research at governmental agencies like the UK Ministry of Defence, academic institutions like King's College London and INSEAD, and business conferences like the TBD Conference.
My work in artificial intelligence is largely situated at the intersection of AI and social science and has been published in journals such as Transactions on Machine Learning Research and ACM Transactions on Interactive Intelligent Systems.
In my papers on AI, I focus on four sets of questions: First, how well AI systems can perform in a set of relevant tasks such as real-world geopolitical and economic forecasting, personality estimation, and persuasion. Second, how AI systems can be improved through posttraining methods such as reinforcement learning from verifiable rewards, prompt engineering, and composable adapter architectures for personalisation. Third, frontier safety and security risks of advanced AI systems, including verifiable agent-to-agent communication protocols and risks from AI systems that elicit consciousness attribution from users. Fourth, how people actually use conversational AI in practice, drawing on large-scale analyses of usage patterns including dedicated work on health-related queries.
I have previously worked on evaluations of AI capabilities to replace human survey study participants, forecast geopolitical and economic outcomes, estimate relationships between personality items, and persuade human decision-makers.
Evaluating frontier LLMs on forecasting tasks is particularly interesting for two reasons. First, forecasting is an inherently difficult task that requires integrating diverse sources of information and reasoning about complex, uncertain futures. Second, forecasting is a core capability for many high-impact applications of AI, such as decision support, risk assessment, and strategic planning. Our work suggests that while LLMs have made significant progress in forecasting, they still lag behind human experts and at best match crowds in real-world prediction tasks when drawing on ensemble methods, highlighting the need for further research to enhance their reasoning and predictive capabilities. However, our work on personality prediction shows that LLMs can already outperform most individual humans in this specific domain, suggesting that they can serve as valuable tools for understanding human traits and behaviours.
A further important capability is persuasion, as AI systems that can effectively persuade humans will have significant influence over human decision-making and thus pose significant risks as capabilities advance. Our work shows that frontier LLMs can already outperform incentivized human persuaders in interactive settings, demonstrating superior persuasive capabilities in both truthful and deceptive contexts. These findings underscore the urgency of developing alignment and governance frameworks to manage the risks associated with increasingly capable AI persuaders.
My posttraining work, mostly in collaboration with Lightning Rod Labs, uses reinforcement learning from verifiable rewards (RLVR), specifically GRPO, to improve the forecasting capabilities of small models. Our work shows that this approach can significantly improve the forecasting accuracy of smaller models, closing much of the gap to larger models while using far fewer computational resources (a 14B model matching o1). This suggests that RLVR is a promising approach for improving AI forecasting capabilities in a cost-effective manner.
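To illustrate the core idea behind GRPO in a forecasting setting, the following is a minimal sketch of the group-relative advantage step only. It assumes a reward equal to the negative Brier score of each sampled forecast, which is an illustrative choice and not necessarily the reward used in the papers; the function names and example forecasts are hypothetical.

```python
from statistics import mean, pstdev


def brier_reward(forecast: float, outcome: int) -> float:
    """Negative Brier score, so better forecasts get higher reward (assumed reward)."""
    return -((forecast - outcome) ** 2)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardise rewards within one group of samples drawn for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical forecasts sampled for one question that resolved "yes" (outcome = 1).
sampled_forecasts = [0.9, 0.6, 0.5, 0.2]
rewards = [brier_reward(p, 1) for p in sampled_forecasts]
advantages = group_relative_advantages(rewards)
```

Samples that forecast better than their group average receive positive advantages and are reinforced, while worse-than-average samples receive negative ones, so no separate value model is needed.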
I have also investigated the prompt engineering side of improving LLM forecasting capabilities. Here, our work shows that simple prompt engineering strategies often fail to yield significant improvements in forecasting accuracy, suggesting that more robust techniques may be needed to enhance LLM performance in complex tasks such as forecasting.
In other work on adapting models to individual users, we propose a privacy-preserving architecture for LLM personalisation that decouples per-user data from shared model weights. The architecture combines a static base model, composable domain-expert LoRA adapters, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. This converts machine unlearning from an intractable weight-editing problem into a deterministic deletion operation while preserving personalisation.
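The separation described above can be sketched as a simple data structure. This is an illustrative toy under stated assumptions, not the architecture from the paper: the class names, the string placeholders standing in for adapters, and the dictionary-based artefacts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StaticBaseModel:
    """Stands in for the shared base model, which is never modified per user."""
    name: str


@dataclass
class PersonalisationStore:
    base: StaticBaseModel
    # Shared domain-expert adapters (placeholders); assumed to contain no user data.
    domain_adapters: dict[str, str] = field(default_factory=dict)
    # Per-user proxy artefacts; the only place user-specific information lives.
    user_artefacts: dict[str, dict] = field(default_factory=dict)

    def compose(self, user_id: str, domains: list[str]) -> dict:
        """Describe what would be loaded to serve one request."""
        return {
            "base": self.base.name,
            "adapters": [self.domain_adapters[d] for d in domains],
            "user_artefact": self.user_artefacts.get(user_id),
        }

    def forget(self, user_id: str) -> bool:
        """Delete the user's artefact; shared weights and adapters are untouched."""
        return self.user_artefacts.pop(user_id, None) is not None


store = PersonalisationStore(
    base=StaticBaseModel("base-llm"),
    domain_adapters={"health": "lora-health"},
    user_artefacts={"alice": {"tone": "brief"}},
)
before = store.compose("alice", ["health"])
removed = store.forget("alice")
after = store.compose("alice", ["health"])
```

Because user data never enters the shared components in this sketch, forgetting a user reduces to removing one entry, and the outcome of that removal can be checked directly.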
My work on frontier safety and security focuses on emerging risks from advanced AI systems and on technical mechanisms for making multiagent AI safer and more reliable.
In our work on Seemingly Conscious AI (SCAI), we develop a unified framework connecting empirical hallmarks of consciousness attribution to a structured taxonomy of risks from AI systems that lead users to perceive them as conscious. We identify five hallmarks of SCAI spanning affective capacity, anthropomorphic features, autonomous action, self-reflective behaviour, and social-interactive behaviour, and develop a taxonomy of risks spanning individual harms (such as emotional dependence and autonomy erosion) and societal-level harms (such as human status erosion and political strife). An accompanying expert survey suggests that risks to individuals are already observable and high-probability, while societal risks, though lower-probability, carry high potential severity and path-dependence.
In other work, we propose a certification protocol for verifiable agent-to-agent communication, addressing the challenge that multiagent AI systems lack methods to verify shared understanding of terms. Based on a stimulus-meaning model, agents are tested on shared observable events and terms are certified when empirical disagreement falls below a statistical threshold. In simulations, core-guarded reasoning reduces disagreement by 72-96%, and by 51% in a validation with fine-tuned language models.
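A minimal sketch of the certification step might look as follows. The source does not specify the statistical test, so this example assumes a one-sided Hoeffding upper bound on the disagreement rate; the agents, the term, the threshold `tau`, and the confidence parameter `delta` are all hypothetical.

```python
import math
from typing import Callable, Sequence

# An agent maps (term, event) to whether it assents to the term for that event.
Agent = Callable[[str, int], bool]


def disagreement_upper_bound(disagreements: int, n: int, delta: float = 0.05) -> float:
    """One-sided Hoeffding upper bound on the true disagreement rate."""
    return disagreements / n + math.sqrt(math.log(1 / delta) / (2 * n))


def certify(term: str, agent_a: Agent, agent_b: Agent, events: Sequence[int],
            tau: float = 0.1, delta: float = 0.05) -> bool:
    """Certify a term if the bound on disagreement over shared events is at most tau."""
    disagreements = sum(agent_a(term, e) != agent_b(term, e) for e in events)
    return disagreement_upper_bound(disagreements, len(events), delta) <= tau


# Hypothetical agents that apply the term "large" with different cut-offs.
def agent_a(term: str, event: int) -> bool:
    return event >= 500


def agent_b(term: str, event: int) -> bool:
    return event >= 510


def agent_c(term: str, event: int) -> bool:
    return event >= 700


shared_events = range(1000)
close_pair_certified = certify("large", agent_a, agent_b, shared_events)
distant_pair_certified = certify("large", agent_a, agent_c, shared_events)
```

Agents whose usage of a term nearly coincides on the shared events pass, while agents with substantially different cut-offs do not, so downstream reasoning can be restricted to certified terms.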
My work on AI usage analyses large-scale data from conversational AI products to understand how people actually use these systems in practice. With teams at Microsoft AI, we conducted a descriptive analysis of Copilot usage patterns throughout 2025, examining temporal and modal dynamics of how users interact with AI assistants. We find that usage patterns vary significantly by device type and context, with mobile users focusing on health-related topics consistently across time, while desktop users exhibit work and technology-related queries that align with business hours. These findings highlight the integration of AI into various facets of daily life, serving as both a professional tool and a personal companion.
In a related study focused specifically on health-related queries, we analyse over 500,000 de-identified health conversations with Microsoft Copilot. Using a hierarchical intent taxonomy validated against expert human annotation, we find that nearly one in five conversations involve personal symptom assessment or condition discussion, that one in seven personal health queries concern someone other than the user (such as a child, a parent, or a partner), and that personal symptom and emotional health queries increase markedly during evening and nighttime hours when traditional healthcare is most limited. These patterns have direct implications for platform-specific design, safety considerations, and the responsible development of health AI.
My work in economics is heavily in the experimental tradition, focusing on behavioural economics and decision science. My research has been published in journals such as Journal of Behavioral and Experimental Economics and PLOS ONE.
My economics research focuses on two main areas: First, behavioural economics, particularly charitable giving and moral decision-making. Second, decision science, including forecasting of long-run causal effects and methods for improving low-probability judgements.
Overall, my research theme in behavioural economics is to understand how people make decisions that have moral significance, particularly in the context of charitable giving. I use experimental methods to investigate how different factors, such as risk attitudes, moral arguments, and normative uncertainty, influence people's willingness to donate to charity.
In our work, we find that individual differences in risk and ambiguity attitudes, empathy, numeracy, optimism, and donor type have little to no effect on charitable giving decisions between sure-thing and probabilistic charities. We also find that moral arguments can significantly increase donations, but increasing moral demandingness does not have an additional effect. Finally, we show that people value descriptive information about charities more than normative expert advice, but both can influence charity choice and reduce uncertainty.
My work in decision science broadly focuses on two areas: forecasting long-run causal effects and improving low-probability judgements. In our work on forecasting long-run causal effects, we find that experienced forecasters and academic experts outperform lay people in predicting the effects of long-term randomized controlled trials (RCTs). However, neither group consistently outperforms simple benchmarks, highlighting the difficulty of forecasting long-run events.
In our work on improving low-probability judgements, we find that shifting from standard linear elicitation scales and Brier scoring rules to nonlinear (logarithmic) elicitation scales and logarithmic scoring rules can significantly improve accuracy for low-probability judgements. These methodological changes lead to substantial improvements in individual and aggregate accuracy, suggesting promising avenues for enhancing probability judgements of rare events.
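One way to see why the scoring rule matters for rare events is to compare the expected penalty each rule assigns to a forecaster who overestimates a small probability. The sketch below is illustrative only: the probabilities are hypothetical, and it covers the scoring rules, not the elicitation scales.

```python
import math


def expected_brier(p_true: float, q: float) -> float:
    """Expected Brier score of forecast q when the event has probability p_true."""
    return p_true * (1 - q) ** 2 + (1 - p_true) * q ** 2


def expected_log_loss(p_true: float, q: float) -> float:
    """Expected logarithmic loss of forecast q when the event has probability p_true."""
    return -(p_true * math.log(q) + (1 - p_true) * math.log(1 - q))


def regret(score, p_true: float, q: float) -> float:
    """Extra expected loss from reporting q instead of the true probability."""
    return score(p_true, q) - score(p_true, p_true)


# Hypothetical rare event: true probability 0.01%, forecast 1% (a hundredfold overestimate).
p_true, q = 0.0001, 0.01
brier_regret = regret(expected_brier, p_true, q)
log_regret = regret(expected_log_loss, p_true, q)
```

In this example the Brier regret equals (q - p)^2, which is below 0.0001, so the rule gives almost no incentive to distinguish 1% from 0.01%, whereas the logarithmic regret is considerably larger. The two rules are on different scales, so the comparison is indicative only.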
My philosophical background is in analytic philosophy, with research spanning experimental philosophy, philosophy of science, and political theory. My work often employs empirical methods to address traditional philosophical questions, bridging the gap between philosophical theorising and empirical investigation.
I have published in journals such as Synthese, Studies in History and Philosophy of Science, and Philosophical Psychology.
My experimental philosophy research focuses on moral philosophy and population ethics. I use experimental methods to investigate how people make moral judgments and the implications of these judgments for philosophical theories.
In our work on population ethics, we find that when people face conflicts between general moral principles and specific case judgments, they are more likely to revise their general principles than their case judgments. This suggests that case judgments play a central role in reflective equilibrium reasoning in population ethics.
In more methodological work on incentivisation, we initially find evidence that the Bayesian Truth Serum (BTS) can improve honesty in experimental philosophy surveys by rewarding surprisingly common answers. However, our follow-up work fails to replicate this effect, suggesting that the effectiveness of BTS may depend on specific experimental contexts.
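To make "surprisingly common" concrete, here is a minimal sketch of BTS scoring in the standard formulation due to Prelec (2004): an information score that rewards answers more common than the group predicted, plus a prediction score for accurately forecasting the answer distribution. Whether the studies used exactly this variant is not stated here, and the example responses are hypothetical.

```python
import math


def bts_scores(answers: list[int], predictions: list[list[float]],
               alpha: float = 1.0) -> list[float]:
    """Bayesian Truth Serum scores.

    answers[r] is the option index chosen by respondent r.
    predictions[r][j] is respondent r's predicted share choosing option j
    (must be strictly positive).
    """
    n = len(answers)
    k = len(predictions[0])
    # Actual endorsement frequency of each option.
    x_bar = [sum(a == j for a in answers) / n for j in range(k)]
    # Log of the geometric mean of predicted frequencies for each option.
    log_y_bar = [sum(math.log(p[j]) for p in predictions) / n for j in range(k)]
    scores = []
    for a, p in zip(answers, predictions):
        information = math.log(x_bar[a]) - log_y_bar[a]
        prediction = alpha * sum(
            x_bar[j] * math.log(p[j] / x_bar[j]) for j in range(k) if x_bar[j] > 0
        )
        scores.append(information + prediction)
    return scores


# Hypothetical survey: three of four respondents choose option 0,
# but everyone predicted an even split.
answers = [0, 0, 0, 1]
predictions = [[0.5, 0.5]] * 4
scores = bts_scores(answers, predictions)
```

Option 0 is more common than predicted, so respondents who chose it score higher than the respondent who chose option 1. With `alpha = 1` the scores sum to zero across respondents.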
My philosophy of science research examines the relationship between scientific realism and science communication, as well as structural reforms in social science research practices. In our work on scientific realism and science communication, we find that science communicators tend to be more inclined towards scientific antirealism compared to scientists, with both groups showing an overall inclination towards realism. This has important implications for how scientific knowledge is conveyed to the public.
We also argue that discussion sections in standard academic papers can contribute to epistemic malfunctions in social science research by allowing for inappropriate narrativisation of results. We propose eliminating discussion sections from social science research papers and outsourcing them to separate publications, which we argue could lead to several epistemic advantages, including a division of academic labour and better alignment of scientists' personal aims with the aims of science.
My political theory research is in the liberal tradition, examining the epistemological foundations of market liberalism and its intersections with feminist standpoint theory, as well as Lockean political philosophy more generally.
In our work on Locke, we focus on prerogative power as central to Locke's political theory. First, we argue that Locke's conception of prerogative is political rather than natural, distinguishing between natural and political executive power. Second, we argue that Locke's strong executive power is restricted only by the public good, which conflicts with libertarian interpretations of Locke. We conclude that Locke's philosophy is better understood as proto-utilitarian, challenging libertarian and democratic readings of his work. Moreover, in other work we take a closer look at market liberalism and feminist standpoint theory, arguing that both traditions share similar epistemological foundations related to the fragmentation of knowledge in society.
My research in psychology is primarily in social, personality, and environmental psychology. In my personality psychology research, I have developed and validated scales for measuring antinatalist attitudes and explored their relationship to dark triad traits. In social psychology, I have contributed to large international studies examining the social psychology of the COVID-19 pandemic and national identity. This work has been published in outlets such as Psychological Test Adaptation and Development, Environmental Psychology, and Nature Communications.
My personality psychology research focuses on antinatalism, the philosophical view that procreation is morally wrong. Some of my early work showed that dark triad personality traits are associated with antinatalist views. More recently, we have developed and validated the Short Antinatalism Scale (S-ANS), a brief instrument for measuring antinatalist attitudes. This scale has been validated across multiple studies and populations, providing a reliable tool for future research on antinatalism.
With a large international research consortium, I have contributed to several large-scale studies on the social and moral psychology of the COVID-19 pandemic. This work has examined factors influencing public health compliance, national identity, and moral psychology during the pandemic.
My environmental psychology research focuses on climate change communication and behavioural interventions to promote pro-environmental behaviour. In our work on solar radiation management (SRM), we conducted a large-scale online experiment to investigate whether information about SRM leads to a reduction in climate change mitigation efforts, finding no significant moral hazard effect. In other work with a large international collaboration, we collected climate change-related data from 63 countries to better understand the psychology of climate change and the effectiveness of various interventions, finding that the impact of behavioural climate interventions was small and varied across audiences and target behaviours.
Some of my interdisciplinary work has extended into epidemiology and historical research. In epidemiology, our research on causal language in the medical and public health literature found that much of the language used implies causality, even when not explicitly stated. In historical research, we examined dissertations written during and after the Second World War at the University of Graz, finding continuities in thought and terminology that reflected the lingering influence of National Socialism in academic work.
My previous positions range from academic roles to professional forecasting. I have worked as a post-doctoral researcher at the London School of Economics and the Forecasting Research Institute, and I completed a PhD in philosophy and economics at the University of St Andrews. I also worked as a professional forecaster at places like Metaculus and Swift, while leading a forecasting team at Seldon Capital. During this time I have taught, presented, and published extensively, with my work receiving coverage in various media outlets, while also engaging in direct forecasting work for corporations, non-governmental organisations, and governments.
My academic positions have included post-doctoral research roles at the Behavioural Lab at the London School of Economics, where I worked on applied AI and behavioural science, and at the Forecasting Research Institute, where I collaborated with Philip Tetlock on AI forecasting and prediction methodology. Before that, I completed a PhD at the University of St Andrews with a dissertation entitled 'Moral Decision-Making: Essays from Philosophy and Economics'.
2023-2025
From 2023-2025, I was a post-doctoral researcher at the London School of Economics, working at the Behavioural Lab with Barbara Fasolo and Matteo Galizzi. I worked at the intersection of applied AI and behavioural science, publishing in outlets such as ACM Transactions on Interactive Intelligent Systems and Behavior Research Methods.
My position was funded by Longview Philanthropy, with research funding support from the LSE Department of Management, the Wharton-INSEAD Alliance, and the Wharton AI & Analytics Initiative.
During this time, I also gave guest lectures at the LSE and INSEAD, and presented at governmental agencies like the UK Ministry of Defence, academic institutions like King's College London, and business conferences like the TBD Conference, among others.
My work received media coverage in Forbes, BigThink, Astral Codex Ten, and Marginal Revolution. I also served as a peer reviewer for Decision, Cognition, Personality and Individual Differences, Royal Society Open, and PCI RR, among other outlets.
2023
In 2023, I worked as a research analyst at the Forecasting Research Institute, collaborating on AI forecasting, low-probability event elicitation, and prediction methodology with Philip Tetlock and other researchers.
2018-2022
From 2018-2022, I completed first a Master's and then a PhD, with a thesis entitled 'Moral Decision-Making: Essays from Philosophy and Economics', at St Andrews, where my research moved more heavily into economics and experimental philosophy/psychology as my methods became more quantitatively focused.
Completed 2022
In 2022, I completed a PhD in Philosophy and Economics with the title 'Moral Decision-Making: Essays from Philosophy and Economics' under the supervision of Theron Pummer and Miguel Costa-Gomes at St Andrews. The dissertation comprises three philosophy papers and three economics papers, ranging from research on charitable giving to reflective equilibrium reasoning.
My research was funded by the Forethought Foundation, the Centre for Effective Altruism, Giving What We Can, and the Long-Term Future Fund.
During this time, I presented my research at the Scottish Economic Society Annual Conference, the UK Experimental Philosophy Workshop, the Behavioural Economics, Society, and Technology Conference, and the London Graduate Conference in History of Political Thought, among others.
My work received coverage in Süddeutsche Zeitung, The Times, Psychology Today, Die Zeit, and PsyPost, and is cited on Wikipedia.
Completed 2018
In 2018, I completed the MLitt in Moral, Political, and Legal Philosophy at St Andrews, where I was on the 2018-2019 Dean's List.
During my time at St Andrews, I held several fellowships: I was an Associate Fellow at AdvanceHE for continued education in teaching, a Bernard Marcus Fellow at the Institute for Humane Studies focusing on Locke and liberalism, an ECCP programme visitor and Global Priorities Fellow at the Global Priorities Institute at the University of Oxford working on effective charitable giving, and an Oskar Morgenstern Fellow at the Mercatus Center at George Mason University working on mainline economics.
2014-2018
From 2014-2018, I was an undergraduate student at the University of Graz, completing three concurrent degrees: a Mag.phil. teaching degree in Psychology, Philosophy and German, a BA in Philosophy, and a BA in German Philology. I earned two academic performance scholarships. At the end of my studies, I was an unofficial visiting student at the University Center for Human Values at Princeton University, where I finished my Mag.phil. thesis under the supervision of Peter Singer. During this time, I was also a semi-professional Magic: The Gathering player, achieving a top-5 world ranking in the Legacy format and multiple Grand Prix Top 8 finishes.
2012-2015
Before and early on during my undergraduate studies, I was a semi-professional Magic: The Gathering player in the Legacy format, achieving a 74.6% win-rate in Grand Prix-level tournaments, the fifth highest in the world at the time.
My main accomplishments were a 4th place at GP Paris (1,587 players) and another 4th place at GP New Jersey (4,003 players). During this period, I was sponsored by Team MTG Mint Card and MTG Madness, and wrote for outlets such as Eternal Central (the GP Paris Top 4 report) and StarCityGames (the GP New Jersey story), among others.
In 2022, I qualified as a Pro Forecaster on Metaculus based on my forecasting track record, calibration, and quality of written rationales. As a forecaster for Metaculus and then Swift, I worked on forecasting projects with the Federation of American Scientists, the Carnegie Endowment for International Peace, and Google DeepMind. Later, I led the forecasting team at Seldon Capital for a year, working on macroeconomic and technology factors.
2024-2025
Between 2024 and 2025, I led the forecasting team at Seldon Capital for a year, working on macroeconomic and technology factors.
2023-2025
Between 2023 and 2025, I was a forecaster for the Swift Centre, working on questions including coal consumption, drug overdose trends, geopolitical developments, and AI progress. This included forecasting the future capabilities of frontier AI models for Google DeepMind (see our forecasting paper).
2022-2025
Between 2022 and 2025, I was a Pro Forecaster at Metaculus, qualifying based on my track record, calibration, and quality of written rationales. I worked on projects with the Federation of American Scientists and the Carnegie Endowment for International Peace.