Deep Reinforcement Learning for Active High Frequency Trading
AI-generated Key Points
- End-to-end deep reinforcement learning (DRL) framework introduced for active high frequency trading
- DRL agents trained to trade one unit of Intel Corporation stocks using Proximal Policy Optimization algorithm
- Training conducted on three contiguous months of high-frequency Limit Order Book (LOB) data, with testing carried out on the following month of data
- Only samples with largest price changes selected to maximize signal-to-noise ratio in training data
- Hyperparameters tuned using Sequential Model Based Optimization technique
- Three different state characterizations considered that differ in their LOB-based meta-features
- Agents learn trading strategies that produce stable positive returns despite highly stochastic and non-stationary environment
- Agents create dynamic representation of underlying environment by highlighting occasional regularities present in data and exploiting them to create long-term profitable trading strategies
- Differences in return profiles between the state characterizations are driven by the number of trades each agent makes (as reflected in the histogram plots), not by the quality or profitability of individual trades
- DRL can be effectively applied to high-frequency trading tasks using LOB-based meta-features as input states for agents' policies
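The sample-selection step in the key points above (keeping only the samples with the largest price changes) could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the summary does not say how the price change is measured or how many samples are retained, so the function name `select_largest_moves`, the `horizon` parameter, and the `keep_fraction` parameter are all assumptions made for this example.

```python
import numpy as np


def select_largest_moves(prices, horizon=1, keep_fraction=0.1):
    """Return indices t whose |price[t + horizon] - price[t]| is among the largest.

    Hypothetical helper: `horizon` and `keep_fraction` are illustrative
    assumptions, not values taken from the paper.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    prices = np.asarray(prices, dtype=float)
    # Absolute price change between tick t and tick t + horizon.
    changes = np.abs(prices[horizon:] - prices[:-horizon])
    if changes.size == 0:
        return np.array([], dtype=int)
    # Keep at least one sample, at most all of them.
    n_keep = min(changes.size, max(1, int(round(keep_fraction * changes.size))))
    # Indices of the n_keep largest changes, returned in chronological order.
    top = np.argpartition(changes, -n_keep)[-n_keep:]
    return np.sort(top)
```

For instance, `select_largest_moves(prices, horizon=1, keep_fraction=0.34)` on a short price series keeps roughly the third of the ticks that precede the biggest moves, which is one simple way to bias a training set toward high-signal samples.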
Authors: Antonio Briola, Jeremy Turiel, Riccardo Marcaccioli, Tomaso Aste
Abstract: We introduce the first end-to-end Deep Reinforcement Learning based framework for active high frequency trading. We train DRL agents to trade one unit of Intel Corporation stocks by employing the Proximal Policy Optimization algorithm. The training is performed on three contiguous months of high frequency Limit Order Book data. In order to maximise the signal to noise ratio in the training data, we compose the latter by only selecting training samples with largest price changes. The test is then carried out on the following month of data. Hyperparameters are tuned using the Sequential Model Based Optimization technique. We consider three different state characterizations, which differ in the LOB-based meta-features they include. Agents learn trading strategies able to produce stable positive returns in spite of the highly stochastic and non-stationary environment, which is remarkable itself. Analysing the agents' performances on the test data, we argue that the agents are able to create a dynamic representation of the underlying environment highlighting the occasional regularities present in the data and exploiting them to create long-term profitable trading strategies.
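For readers unfamiliar with Proximal Policy Optimization, the quantity it maximises is commonly written as a clipped surrogate objective. The sketch below shows that standard textbook form in NumPy; it is not taken from the paper, and the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for illustration (the abstract does not report the clipping value or any other hyperparameter).

```python
import numpy as np


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over a batch.

    Illustrative only: `clip_eps` is an assumed default, not the paper's value.
    """
    new_lp = np.asarray(new_log_probs, dtype=float)
    old_lp = np.asarray(old_log_probs, dtype=float)
    adv = np.asarray(advantages, dtype=float)
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(new_lp - old_lp)
    unclipped = ratio * adv
    # Clipping removes the incentive to move the ratio outside [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Element-wise minimum gives a pessimistic bound on the improvement.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

Taking the minimum of the clipped and unclipped terms means large policy updates are never rewarded beyond the clip range, which is one reason PPO is often chosen for noisy environments such as the one described here.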