Reinforcement Learning for Speculative Trading under Exploratory Framework

AI-generated keywords: Speculative trading, Exploratory reinforcement learning, Dynamic optimization, Sequential optimal stopping, Cox processes

AI-generated Key Points

  • Exploration of speculative trading within the exploratory reinforcement learning (RL) framework proposed by Wang et al. [2020]
  • Focus on dynamic optimization problem involving sequential optimal stopping over entry and exit times
  • Examination of a relaxed version of the problem using Cox processes controlled by bounded intensities for stopping times
  • Characterization of agent's control through a probability measure over jump intensities under exploratory RL formulation
  • Derivation of system of exploratory Hamilton-Jacobi-Bellman (HJB) equations and Gibbs distributions as optimal policy
  • Establishment of error estimates and demonstration of convergence to value function
  • Development of an RL algorithm tailored for speculative trading applications
  • Implementation in pairs-trading scenario to showcase practical application

Authors: Yun Zhao, Alex S. L. Tse, Harry Zheng

arXiv: 2604.02035v1 (q-fin.MF)
37 pages, 14 figures
License: CC BY 4.0

Abstract: We study a speculative trading problem within the exploratory reinforcement learning (RL) framework of Wang et al. [2020]. The problem is formulated as a sequential optimal stopping problem over entry and exit times under general utility function and price process. We first consider a relaxed version of the problem in which the stopping times are modeled by the jump times of Cox processes driven by bounded, non-randomized intensity controls. Under the exploratory formulation, the agent's randomized control is characterized via the probability measure over the jump intensities, and their objective function is regularized by Shannon's differential entropy. This yields a system of the exploratory HJB equations and Gibbs distributions in closed-form as the optimal policy. Error estimates and convergence of the RL objective to the value function of the original problem are established. Finally, an RL algorithm is designed, and its implementation is showcased in a pairs-trading application.
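
The relaxation described in the abstract replaces each stopping time with the first jump of a Cox process whose intensity is bounded. A minimal sketch of that idea on a discrete time grid follows. It is not the authors' algorithm: the function name `first_jump_time`, the toy spread path, and the intensity rule are all hypothetical choices made for illustration. The jump is drawn by comparing the integrated intensity with a unit-rate exponential threshold.

```python
import numpy as np

def first_jump_time(intensity_path, dt, rng):
    """First jump time of a Cox process on a grid, or None if no jump occurs.

    intensity_path: nonnegative intensities, one per time step (assumed bounded).
    """
    # Integrated intensity at the end of each step (left-endpoint Riemann sum).
    cumulative = np.cumsum(intensity_path) * dt
    # The process jumps once the integrated intensity reaches an Exp(1) draw.
    threshold = rng.exponential(1.0)
    hit = np.searchsorted(cumulative, threshold)  # first index with cumulative >= threshold
    return None if hit == len(cumulative) else (hit + 1) * dt

# Hypothetical illustration: an entry intensity that rises as a simulated
# spread falls below zero, clipped to the bound [0, max_intensity].
rng = np.random.default_rng(0)
dt, n_steps, max_intensity = 0.01, 1000, 5.0
spread = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))  # driftless toy path
entry_intensity = np.clip(-10.0 * spread, 0.0, max_intensity)
entry_time = first_jump_time(entry_intensity, dt, rng)
print("entry time:", entry_time)
```

With a constant intensity the simulated jump time is approximately exponential, which gives a quick sanity check on the sketch.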

Submitted to arXiv on 02 Apr. 2026

Results of the summarizing process for the arXiv paper: 2604.02035v1

In this paper, we study speculative trading within the exploratory reinforcement learning (RL) framework of Wang et al. [2020]. The trading problem is posed as a sequential optimal stopping problem over entry and exit times, under a general utility function and price process. We first consider a relaxed version in which the stopping times are the jump times of Cox processes driven by bounded, non-randomized intensity controls. In the exploratory formulation, the agent's randomized control is a probability measure over the jump intensities, and the objective function is regularized by Shannon's differential entropy. This yields a system of exploratory Hamilton-Jacobi-Bellman (HJB) equations whose optimal policy is a Gibbs distribution in closed form. We also establish error estimates and show that the RL objective converges to the value function of the original problem. Finally, we design an RL algorithm and demonstrate it in a pairs-trading application. The work contributes to the continuous-time RL literature by treating sequential optimal stopping under general diffusion dynamics and utility functions.
Created on 04 Apr. 2026
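
The summary states that entropy regularization leads to a Gibbs distribution over jump intensities. The sketch below illustrates the generic mechanism only and does not reproduce the paper's formulas. It assumes the reward is linear in the intensity with slope `g`, an exploration weight `temperature`, and an intensity bound `max_intensity`; all three names are hypothetical. Under those assumptions, maximizing expected reward plus temperature times differential entropy over densities on [0, max_intensity] gives a density proportional to exp(intensity * g / temperature), which can be sampled by inverting its CDF.

```python
import numpy as np

def sample_gibbs_intensity(g, temperature, max_intensity, rng, size=1):
    """Draw intensities from the density proportional to exp(x * g / temperature) on [0, max_intensity]."""
    a = g / temperature
    u = rng.uniform(size=size)
    if abs(a * max_intensity) < 1e-12:
        # Zero slope: the Gibbs density reduces to the uniform density.
        return u * max_intensity
    # Inverse CDF of the truncated exponential density.
    return np.log1p(u * np.expm1(a * max_intensity)) / a

def gibbs_mean_intensity(g, temperature, max_intensity):
    """Closed-form mean of the same density."""
    a = g / temperature
    if abs(a * max_intensity) < 1e-12:
        return max_intensity / 2
    return max_intensity / (-np.expm1(-a * max_intensity)) - 1.0 / a

rng = np.random.default_rng(1)
draws = sample_gibbs_intensity(g=1.0, temperature=0.5, max_intensity=2.0, rng=rng, size=100_000)
print("sample mean:", draws.mean())
print("closed-form mean:", gibbs_mean_intensity(1.0, 0.5, 2.0))
```

A lower temperature concentrates the draws near 0 or near the bound, depending on the sign of `g`, while a higher temperature spreads them toward the uniform density, which is the exploration effect the entropy term is meant to produce.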
