Data Aggregation - Bar Building
This notebook demonstrates how to transform raw tick-level data into structured bars using various aggregation methods provided by the Quantreo library.
Starting from synthetic tick data, you'll learn how to:
- Build time-based bars, tick-based bars, and volume-based bars
- Construct advanced bars like tick imbalance and volume imbalance bars
- Visualize the results using candlestick charts with volume overlays
These methods are essential for normalizing market data and preparing it for backtesting, signal generation, or machine learning models.
Let’s dive in. 👇
# Import the Data Aggregation Package from Quantreo
import quantreo.data_aggregation as da
# Import a dataset to test the functions and create new ones easily
from quantreo.datasets import load_generated_ticks
df = load_generated_ticks()
# Show the data
df
| datetime | price | volume |
|---|---|---|
| 2023-03-03 13:36:36 | 114.806983 | 3 |
| 2023-03-03 13:36:37 | 114.806983 | 1 |
| 2023-03-03 13:36:38 | 114.806983 | 1 |
| 2023-03-03 13:36:39 | 114.799521 | 1 |
| 2023-03-03 13:36:40 | 114.799521 | 1 |
| ... | ... | ... |
| 2023-03-15 03:23:11 | 118.686705 | 1 |
| 2023-03-15 03:23:12 | 118.686705 | 1 |
| 2023-03-15 03:23:13 | 118.686705 | 3 |
| 2023-03-15 03:23:14 | 118.686705 | 1 |
| 2023-03-15 03:23:15 | 118.672084 | 1 |
1000000 rows × 2 columns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
def plot_candlestick_with_volume(df):
    """
    Plot a candlestick chart with a volume bar chart below it.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame with datetime index and the following columns:
        - 'open', 'high', 'low', 'close', 'volume'
    """
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.10, row_heights=[0.8, 0.2],
                        subplot_titles=("Price", "Volume"))
    # Candlestick
    fig.add_trace(go.Candlestick(x=df.index, open=df["open"], high=df["high"], low=df["low"], close=df["close"],
                                 name="OHLC"), row=1, col=1)
    # Volume
    fig.add_trace(go.Bar(x=df.index, y=df["volume"], name="Volume", marker_color="lightblue"), row=2, col=1)
    fig.update_layout(xaxis_rangeslider_visible=False, showlegend=False, height=500, margin=dict(t=30, b=30))
    fig.update_yaxes(title_text="Price", row=1, col=1)
    fig.update_yaxes(title_text="Volume", row=2, col=1)
    # Treat the x-axis as categorical so bars are evenly spaced regardless of their timestamps
    fig.update_xaxes(type='category', row=2, col=1)
    fig.update_xaxes(type='category', row=1, col=1)
    fig.update_layout(
        xaxis=dict(tickangle=45, tickfont=dict(size=9), nticks=15),
        xaxis2=dict(tickangle=45, tickfont=dict(size=9), nticks=15))
    fig.show()
Time Bars
The ticks_to_time_bars function aggregates raw tick data into fixed-time bars (for example, 1-second or 1-minute bars). This is the most common form of bar construction, used in nearly all trading platforms.
The function groups ticks by time interval and computes OHLCV values for each bar. As the output below shows, it also returns the number of ticks in each bar and the timestamps at which the high and the low occurred.
time_bars = da.bar_building.ticks_to_time_bars(df, resample_factor="4H", col_price="price", col_volume="volume", additional_metrics=[])
time_bars
| time | open | high | low | close | volume | number_ticks | high_time | low_time |
|---|---|---|---|---|---|---|---|---|
| 2023-03-03 12:00:00 | 114.806983 | 114.821643 | 114.519924 | 114.640622 | 15573.0 | 8604 | 2023-03-03 13:39:22 | 2023-03-03 15:42:52 |
| 2023-03-03 16:00:00 | 114.640622 | 115.063405 | 114.577370 | 114.681267 | 24813.0 | 14400 | 2023-03-03 19:10:23 | 2023-03-03 16:54:07 |
| 2023-03-03 20:00:00 | 114.681267 | 114.896731 | 114.411137 | 114.859736 | 24455.0 | 14400 | 2023-03-03 23:58:56 | 2023-03-03 20:54:14 |
| 2023-03-04 00:00:00 | 114.859736 | 115.313589 | 114.747424 | 115.149874 | 26909.0 | 14400 | 2023-03-04 01:54:15 | 2023-03-04 00:36:35 |
| 2023-03-04 04:00:00 | 115.157326 | 115.382338 | 115.057242 | 115.288936 | 26443.0 | 14400 | 2023-03-04 05:27:52 | 2023-03-04 07:39:06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-03-14 08:00:00 | 118.248409 | 118.263260 | 117.510691 | 117.664435 | 28025.0 | 14400 | 2023-03-14 08:00:06 | 2023-03-14 11:20:01 |
| 2023-03-14 12:00:00 | 117.664435 | 117.997454 | 117.635474 | 117.843508 | 25745.0 | 14400 | 2023-03-14 15:15:01 | 2023-03-14 12:06:39 |
| 2023-03-14 16:00:00 | 117.843508 | 117.970998 | 117.649717 | 117.910534 | 24350.0 | 14400 | 2023-03-14 19:53:27 | 2023-03-14 16:39:29 |
| 2023-03-14 20:00:00 | 117.910534 | 118.255959 | 117.811763 | 118.226044 | 28402.0 | 14400 | 2023-03-14 23:58:51 | 2023-03-14 21:11:00 |
| 2023-03-15 00:00:00 | 118.226044 | 118.722931 | 118.175237 | 118.672084 | 21583.0 | 12196 | 2023-03-15 03:14:41 | 2023-03-15 01:11:08 |
70 rows × 8 columns
plot_candlestick_with_volume(time_bars)
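For intuition, the grouping described in this section can be approximated with plain pandas. The sketch below is not Quantreo's implementation: sketch_time_bars is a hypothetical helper written for illustration, and it assumes the ticks carry a DatetimeIndex, that the interval is a fixed-width pandas frequency such as "1min" or "4h", and that intervals without any tick are simply skipped. The column names mirror the output shown above.

```python
import pandas as pd


def sketch_time_bars(ticks, rule="1min", col_price="price", col_volume="volume"):
    """Illustrative pandas-only time bars (hypothetical helper, not Quantreo's code)."""
    # Label every tick with the start of the interval it falls into
    bar_start = ticks.index.floor(rule)
    grouped = ticks.groupby(bar_start)

    # OHLC from the price column of each interval
    bars = grouped[col_price].agg(["first", "max", "min", "last"])
    bars.columns = ["open", "high", "low", "close"]

    # Total traded volume and number of ticks per interval
    bars["volume"] = grouped[col_volume].sum()
    bars["number_ticks"] = grouped[col_price].size()

    # Timestamp of the first tick that reached the high / low of the interval
    bars["high_time"] = grouped[col_price].idxmax()
    bars["low_time"] = grouped[col_price].idxmin()

    bars.index.name = "time"
    return bars


# Tiny made-up tick sample, only to exercise the helper
index = pd.to_datetime([
    "2023-03-03 13:36:36",
    "2023-03-03 13:36:50",
    "2023-03-03 13:37:05",
    "2023-03-03 13:37:40",
])
ticks = pd.DataFrame({"price": [100.0, 101.0, 99.5, 100.5], "volume": [3, 1, 2, 1]}, index=index)
bars = sketch_time_bars(ticks, rule="1min")
print(bars)
```

Grouping on the floored timestamp keeps only intervals that contain at least one tick, which also avoids calling idxmax or idxmin on an empty group. Whether the library emits rows for empty intervals is not covered here.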