Data Aggregation - Bar Building
This notebook demonstrates how to transform raw tick-level data into structured bars using various aggregation methods provided by the Quantreo library.
Starting from synthetic tick data, you'll learn how to:
- Build time-based bars, tick-based bars, and volume-based bars
- Construct advanced bars like tick imbalance and volume imbalance bars
- Visualize the results using candlestick charts with volume overlays
These methods are essential for normalizing market data and preparing it for backtesting, signal generation, or machine learning models.
Let’s dive in. 👇
# Import the Data Aggregation Package from Quantreo
import quantreo.data_aggregation as da
# Import a dataset to test the functions and create new ones easily
from quantreo.datasets import load_generated_ticks
df = load_generated_ticks()
# Show the data
df
| datetime | price | volume |
|---|---|---|
| 2023-03-03 13:36:36 | 114.806983 | 3 |
| 2023-03-03 13:36:37 | 114.806983 | 1 |
| 2023-03-03 13:36:38 | 114.806983 | 1 |
| 2023-03-03 13:36:39 | 114.799521 | 1 |
| 2023-03-03 13:36:40 | 114.799521 | 1 |
| ... | ... | ... |
| 2023-03-15 03:23:11 | 118.686705 | 1 |
| 2023-03-15 03:23:12 | 118.686705 | 1 |
| 2023-03-15 03:23:13 | 118.686705 | 3 |
| 2023-03-15 03:23:14 | 118.686705 | 1 |
| 2023-03-15 03:23:15 | 118.672084 | 1 |
1000000 rows × 2 columns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
def plot_candlestick_with_volume(df):
    """
    Plot a candlestick chart with a volume bar chart below it.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame with datetime index and the following columns:
        - 'open', 'high', 'low', 'close', 'volume'
    """
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.10, row_heights=[0.8, 0.2],
                        subplot_titles=("Price", "Volume"))
    # Candlestick
    fig.add_trace(go.Candlestick(x=df.index, open=df["open"], high=df["high"], low=df["low"], close=df["close"],
                                 name="OHLC"), row=1, col=1)
    # Volume
    fig.add_trace(go.Bar(x=df.index, y=df["volume"], name="Volume", marker_color="lightblue"), row=2, col=1)
    fig.update_layout(xaxis_rangeslider_visible=False, showlegend=False, height=500, margin=dict(t=30, b=30))
    fig.update_yaxes(title_text="Price", row=1, col=1)
    fig.update_yaxes(title_text="Volume", row=2, col=1)
    # Category axes space bars evenly, even when they are irregular in time
    fig.update_xaxes(type='category', row=2, col=1)
    fig.update_xaxes(type='category', row=1, col=1)
    fig.update_layout(
        xaxis=dict(tickangle=45, tickfont=dict(size=9), nticks=15),
        xaxis2=dict(tickangle=45, tickfont=dict(size=9), nticks=15))
    fig.show()
Time Bars
The ticks_to_time_bars function aggregates raw tick data into fixed-time bars (for example, 1-second or 1-minute bars). This is the most common form of bar construction, used by nearly all trading platforms.
The function groups ticks by time interval and computes OHLCV values for each bar, along with the tick count and the timestamps of the bar's high and low.
time_bars = da.bar_building.ticks_to_time_bars(df, resample_factor="4H", col_price="price", col_volume="volume", additional_metrics=[])
time_bars
| time | open | high | low | close | volume | number_ticks | high_time | low_time |
|---|---|---|---|---|---|---|---|---|
| 2023-03-03 12:00:00 | 114.806983 | 114.821643 | 114.519924 | 114.640622 | 15573.0 | 8604 | 2023-03-03 13:39:22 | 2023-03-03 15:42:52 |
| 2023-03-03 16:00:00 | 114.640622 | 115.063405 | 114.577370 | 114.681267 | 24813.0 | 14400 | 2023-03-03 19:10:23 | 2023-03-03 16:54:07 |
| 2023-03-03 20:00:00 | 114.681267 | 114.896731 | 114.411137 | 114.859736 | 24455.0 | 14400 | 2023-03-03 23:58:56 | 2023-03-03 20:54:14 |
| 2023-03-04 00:00:00 | 114.859736 | 115.313589 | 114.747424 | 115.149874 | 26909.0 | 14400 | 2023-03-04 01:54:15 | 2023-03-04 00:36:35 |
| 2023-03-04 04:00:00 | 115.157326 | 115.382338 | 115.057242 | 115.288936 | 26443.0 | 14400 | 2023-03-04 05:27:52 | 2023-03-04 07:39:06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-03-14 08:00:00 | 118.248409 | 118.263260 | 117.510691 | 117.664435 | 28025.0 | 14400 | 2023-03-14 08:00:06 | 2023-03-14 11:20:01 |
| 2023-03-14 12:00:00 | 117.664435 | 117.997454 | 117.635474 | 117.843508 | 25745.0 | 14400 | 2023-03-14 15:15:01 | 2023-03-14 12:06:39 |
| 2023-03-14 16:00:00 | 117.843508 | 117.970998 | 117.649717 | 117.910534 | 24350.0 | 14400 | 2023-03-14 19:53:27 | 2023-03-14 16:39:29 |
| 2023-03-14 20:00:00 | 117.910534 | 118.255959 | 117.811763 | 118.226044 | 28402.0 | 14400 | 2023-03-14 23:58:51 | 2023-03-14 21:11:00 |
| 2023-03-15 00:00:00 | 118.226044 | 118.722931 | 118.175237 | 118.672084 | 21583.0 | 12196 | 2023-03-15 03:14:41 | 2023-03-15 01:11:08 |
70 rows × 8 columns
plot_candlestick_with_volume(time_bars)
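For intuition, fixed-time aggregation can be approximated with plain pandas. The sketch below is not Quantreo's implementation: `time_bars_sketch` is a hypothetical helper, the input is a tiny hand-made tick table, and dropping empty intervals is this sketch's own choice.

```python
import pandas as pd

def time_bars_sketch(ticks: pd.DataFrame, rule: pd.Timedelta) -> pd.DataFrame:
    """Aggregate ticks into fixed-time OHLCV bars (illustrative only)."""
    grouped = ticks.resample(rule)
    bars = grouped["price"].ohlc()                    # open, high, low, close
    bars["volume"] = grouped["volume"].sum()          # traded volume per interval
    bars["number_ticks"] = grouped["price"].count()   # ticks per interval
    return bars[bars["number_ticks"] > 0]             # assumption: drop empty intervals

# Hand-made data: one tick per minute for ten minutes
index = pd.date_range("2023-03-03 13:00:00", periods=10, freq="min")
ticks = pd.DataFrame(
    {"price": [10, 11, 9, 10, 12, 12, 13, 11, 12, 14], "volume": [1] * 10},
    index=index,
)

bars = time_bars_sketch(ticks, pd.Timedelta(minutes=5))
print(bars)
```

Each 5-minute interval collects five ticks here, so the result has two bars. Passing a `pd.Timedelta` instead of a frequency string sidesteps alias differences between pandas versions.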
Tick Bars
The ticks_to_tick_bars function aggregates raw tick data into fixed-size tick bars, where each bar contains exactly N ticks (e.g., 1,000 ticks per bar). This method preserves microstructure details by standardizing the number of observations per bar rather than the time interval.
The function sequentially splits ticks into equal-sized chunks and computes OHLCV values, tick count, duration, and extrema timestamps for each bar.
tick_bars = da.bar_building.ticks_to_tick_bars(df, tick_per_bar=10_000, col_price="price", col_volume="volume", additional_metrics=[])
tick_bars
| time | open | high | low | close | volume | number_ticks | duration_minutes | high_time | low_time |
|---|---|---|---|---|---|---|---|---|---|
| 2023-03-03 13:36:36 | 114.806983 | 114.821643 | 114.519924 | 114.661382 | 17732.0 | 10000 | 166.65 | 2023-03-03 13:39:22 | 2023-03-03 15:42:52 |
| 2023-03-03 16:23:16 | 114.661382 | 115.055958 | 114.577370 | 115.048522 | 17867.0 | 10000 | 166.65 | 2023-03-03 19:09:50 | 2023-03-03 16:54:07 |
| 2023-03-03 19:09:56 | 115.048522 | 115.063405 | 114.411137 | 114.666460 | 16616.0 | 10000 | 166.65 | 2023-03-03 19:10:23 | 2023-03-03 20:54:14 |
| 2023-03-03 21:56:36 | 114.666460 | 114.928161 | 114.564990 | 114.873724 | 16559.0 | 10000 | 166.65 | 2023-03-04 00:21:44 | 2023-03-03 23:27:27 |
| 2023-03-04 00:43:16 | 114.873724 | 115.313589 | 114.845063 | 114.969048 | 19984.0 | 10000 | 166.65 | 2023-03-04 01:54:15 | 2023-03-04 00:47:55 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-03-14 13:29:56 | 117.833813 | 117.997454 | 117.701412 | 117.730518 | 18495.0 | 10000 | 166.65 | 2023-03-14 15:15:01 | 2023-03-14 16:14:55 |
| 2023-03-14 16:16:36 | 117.723186 | 117.925650 | 117.649717 | 117.868810 | 16815.0 | 10000 | 166.65 | 2023-03-14 18:52:12 | 2023-03-14 16:39:29 |
| 2023-03-14 19:03:16 | 117.868810 | 118.007704 | 117.718963 | 117.919602 | 18407.0 | 10000 | 166.65 | 2023-03-14 21:37:33 | 2023-03-14 19:20:44 |
| 2023-03-14 21:49:56 | 117.919602 | 118.378553 | 117.874878 | 118.341288 | 19526.0 | 10000 | 166.65 | 2023-03-15 00:28:01 | 2023-03-14 22:04:57 |
| 2023-03-15 00:36:36 | 118.334049 | 118.722931 | 118.175237 | 118.672084 | 17692.0 | 10000 | 166.65 | 2023-03-15 03:14:41 | 2023-03-15 01:11:08 |
100 rows × 9 columns
plot_candlestick_with_volume(tick_bars)
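The chunking idea behind tick bars can be sketched with an integer division on the row position. This is an illustration under stated assumptions, not the library's code: `tick_bars_sketch` is a hypothetical helper, each bar is labelled with the timestamp of its first tick, and a final incomplete chunk is kept as a shorter bar.

```python
import numpy as np
import pandas as pd

def tick_bars_sketch(ticks: pd.DataFrame, ticks_per_bar: int) -> pd.DataFrame:
    """Group consecutive ticks into chunks of `ticks_per_bar` rows (illustrative only)."""
    bar_id = np.arange(len(ticks)) // ticks_per_bar   # 0,0,0,1,1,1,... per row
    grouped = ticks.groupby(bar_id)
    bars = grouped["price"].ohlc()
    bars["volume"] = grouped["volume"].sum()
    bars["number_ticks"] = grouped["price"].count()
    # Assumption: label each bar with the timestamp of its first tick
    bars.index = ticks.index[::ticks_per_bar].rename("time")
    return bars

# Hand-made data: six ticks, one per second
index = pd.date_range("2023-03-03 13:36:36", periods=6, freq=pd.Timedelta(seconds=1))
ticks = pd.DataFrame(
    {"price": [10, 11, 9, 12, 13, 12], "volume": [1, 2, 1, 3, 1, 1]},
    index=index,
)

bars = tick_bars_sketch(ticks, ticks_per_bar=3)
print(bars)
```

With six ticks and three ticks per bar, the sketch produces two bars starting at 13:36:36 and 13:36:39.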
Volume Bars
The ticks_to_volume_bars function aggregates raw tick data into bars based on cumulative traded volume. A new bar is created every time the specified volume_per_bar threshold is reached. This method adapts to market activity: more bars during high activity, fewer during quiet periods.
The function sequentially accumulates volume and computes OHLCV, tick count, duration, and extrema timestamps for each volume bar.
volume_bars = da.bar_building.ticks_to_volume_bars(df, volume_per_bar=15_000, col_price="price", col_volume="volume", additional_metrics=[])
volume_bars
| time | open | high | low | close | volume | number_ticks | duration_minutes | high_time | low_time |
|---|---|---|---|---|---|---|---|---|---|
| 2023-03-03 13:36:36 | 114.806983 | 114.821643 | 114.519924 | 114.647499 | 15000.0 | 8219 | 136.966667 | 2023-03-03 13:39:22 | 2023-03-03 15:42:52 |
| 2023-03-03 15:53:35 | 114.647499 | 114.893622 | 114.577370 | 114.886139 | 15000.0 | 9001 | 150.000000 | 2023-03-03 18:23:30 | 2023-03-03 16:54:07 |
| 2023-03-03 18:23:36 | 114.886139 | 115.063405 | 114.471579 | 114.471579 | 15002.0 | 8673 | 144.533333 | 2023-03-03 19:10:23 | 2023-03-03 20:47:57 |
| 2023-03-03 20:48:09 | 114.471579 | 114.826655 | 114.411137 | 114.642082 | 15000.0 | 8779 | 146.300000 | 2023-03-03 22:43:32 | 2023-03-03 20:54:14 |
| 2023-03-03 23:14:28 | 114.642082 | 115.199837 | 114.564990 | 115.199837 | 15000.0 | 8934 | 148.883333 | 2023-03-04 01:42:59 | 2023-03-03 23:27:27 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-03-14 15:19:06 | 117.966725 | 117.966725 | 117.649717 | 117.672601 | 15000.0 | 8934 | 148.883333 | 2023-03-14 15:19:06 | 2023-03-14 16:39:29 |
| 2023-03-14 17:48:00 | 117.672601 | 117.970998 | 117.665225 | 117.881066 | 15000.0 | 8680 | 144.650000 | 2023-03-14 19:53:27 | 2023-03-14 17:48:44 |
| 2023-03-14 20:12:40 | 117.881066 | 118.007704 | 117.811763 | 117.956106 | 15005.0 | 7965 | 132.733333 | 2023-03-14 21:37:33 | 2023-03-14 21:11:00 |
| 2023-03-14 22:25:25 | 117.956106 | 118.378553 | 117.948855 | 118.371551 | 15009.0 | 7434 | 123.883333 | 2023-03-15 00:28:01 | 2023-03-14 22:25:36 |
| 2023-03-15 00:29:19 | 118.371551 | 118.572993 | 118.175237 | 118.491603 | 15000.0 | 8682 | 144.683333 | 2023-03-15 02:48:49 | 2023-03-15 01:11:08 |
120 rows × 9 columns
plot_candlestick_with_volume(volume_bars)
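The accumulate-then-reset logic of volume bars is easiest to see as a loop that labels each tick with a bar number. The helper below, `volume_bar_ids`, is hypothetical and makes two assumptions that the library may not share: the tick that reaches the threshold closes the current bar, and the counter restarts from zero afterwards (any overshoot is not carried over).

```python
def volume_bar_ids(volumes, volume_per_bar):
    """Label each tick with the index of the volume bar it falls into (illustrative only)."""
    bar_ids = []
    current_bar = 0
    cum_volume = 0
    for volume in volumes:
        cum_volume += volume
        bar_ids.append(current_bar)
        if cum_volume >= volume_per_bar:   # threshold reached: close this bar
            current_bar += 1
            cum_volume = 0                 # assumption: overshoot is discarded
    return bar_ids

# Hand-made tick volumes
volumes = [3, 1, 1, 4, 2, 2, 1, 5]
ids = volume_bar_ids(volumes, volume_per_bar=5)
print(ids)  # [0, 0, 0, 1, 1, 2, 2, 2]
```

Under these assumptions a bar's total volume can slightly exceed the threshold (the second bar above holds 6 units), which is the same pattern as the 15002.0 and 15005.0 values in the table. The labels could then be used as a group key to compute OHLCV per bar.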
Tick Imbalance Bars
The ticks_to_tick_imbalance_bars function builds bars based on the signed tick imbalance. Unlike time- or volume-based bars, a new bar is triggered only when the cumulative imbalance between buyer-initiated and seller-initiated ticks exceeds a predefined threshold.
This technique helps normalize market activity by emphasizing price pressure rather than time or volume, which is particularly useful for event-driven strategies or volatile markets.
The function computes OHLCV, tick count, duration, and extrema timestamps for each bar.
📐 How It Works
Each incoming tick contributes to a running total based on its signed direction:
$$ \text{Signed Tick} = \begin{cases} +1 & \text{if } P_t > P_{t-1} \\ -1 & \text{if } P_t < P_{t-1} \\ 0 & \text{otherwise} \end{cases} $$
A new bar is created when the absolute value of the cumulative signed imbalance exceeds the expected_imbalance threshold.
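The rule above can be traced by hand on a short price series. The sketch below is a plain-Python illustration with hypothetical helper names (`signed_ticks`, `tick_imbalance_bar_ids`); it assumes the first tick gets a sign of 0 because it has no predecessor, and that the running imbalance restarts from zero after each bar. The library's handling of these edge cases may differ.

```python
def signed_ticks(prices):
    """+1 on an uptick, -1 on a downtick, 0 when the price is unchanged."""
    signs = [0]  # assumption: the first tick has no predecessor, so it counts as 0
    for previous, current in zip(prices, prices[1:]):
        signs.append((current > previous) - (current < previous))
    return signs

def tick_imbalance_bar_ids(prices, expected_imbalance):
    """Label each tick with the index of the imbalance bar it falls into."""
    bar_ids = []
    current_bar = 0
    imbalance = 0
    for sign in signed_ticks(prices):
        imbalance += sign
        bar_ids.append(current_bar)
        if abs(imbalance) > expected_imbalance:   # threshold exceeded: close the bar
            current_bar += 1
            imbalance = 0                         # assumption: restart from zero
    return bar_ids

# Hand-made prices: a short rally followed by a sell-off
prices = [100, 101, 102, 102, 103, 102, 101, 100, 99, 100]
signs = signed_ticks(prices)
ids = tick_imbalance_bar_ids(prices, expected_imbalance=2)
print(signs)  # [0, 1, 1, 0, 1, -1, -1, -1, -1, 1]
print(ids)    # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```

The first bar closes on the third uptick (imbalance +3), the second on the third consecutive downtick (imbalance -3), which shows why the absolute value matters: pressure in either direction triggers a bar.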
tick_imb_bars = da.bar_building.ticks_to_tick_imbalance_bars(df, expected_imbalance=35, col_price="price", col_volume="volume", additional_metrics=[])
tick_imb_bars
| time | open | high | low | close | volume | number_ticks | duration_minutes | high_time | low_time |
|---|---|---|---|---|---|---|---|---|---|
| 2023-03-03 13:36:39 | 114.799521 | 115.063405 | 114.519924 | 114.707569 | 39305.0 | 22308 | 371.783333 | 2023-03-03 19:10:23 | 2023-03-03 15:42:52 |
| 2023-03-03 19:48:30 | 114.700351 | 114.821645 | 114.411137 | 114.804494 | 17180.0 | 10481 | 174.666667 | 2023-03-03 22:25:12 | 2023-03-03 20:54:14 |
| 2023-03-03 22:43:12 | 114.797015 | 115.382338 | 114.564990 | 115.279956 | 97677.0 | 53505 | 891.733333 | 2023-03-04 05:27:52 | 2023-03-03 23:27:27 |
| 2023-03-04 13:35:00 | 115.287366 | 115.410687 | 114.996672 | 115.382462 | 57754.0 | 34345 | 572.400000 | 2023-03-04 16:31:19 | 2023-03-04 18:38:35 |
| 2023-03-04 23:07:29 | 115.389824 | 115.714880 | 115.337433 | 115.714880 | 3078.0 | 1704 | 28.383333 | 2023-03-04 23:35:52 | 2023-03-04 23:08:28 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-03-14 01:32:45 | 118.246893 | 118.436774 | 117.933999 | 117.933999 | 49691.0 | 26748 | 445.783333 | 2023-03-14 06:51:56 | 2023-03-14 08:58:32 |
| 2023-03-14 08:58:35 | 117.926244 | 118.001038 | 117.678191 | 117.678191 | 9773.0 | 5992 | 99.850000 | 2023-03-14 09:09:01 | 2023-03-14 10:38:26 |
| 2023-03-14 10:38:27 | 117.692692 | 117.930283 | 117.510691 | 117.930283 | 33205.0 | 16412 | 273.516667 | 2023-03-14 15:11:58 | 2023-03-14 11:20:01 |
| 2023-03-14 15:12:05 | 117.923012 | 118.255959 | 117.649717 | 118.255959 | 57326.0 | 31607 | 526.766667 | 2023-03-14 23:58:51 | 2023-03-14 16:39:29 |
| 2023-03-14 23:58:59 | 118.248346 | 118.551315 | 118.175237 | 118.551315 | 17558.0 | 10012 | 166.850000 | 2023-03-15 02:45:50 | 2023-03-15 01:11:08 |
73 rows × 9 columns
plot_candlestick_with_volume(tick_imb_bars)
Volume Imbalance Bars
The ticks_to_volume_imbalance_bars function creates bars based on signed volume imbalance. A new bar is triggered when the imbalance between buying and selling volume exceeds a predefined threshold.
This method captures asymmetry in trading pressure, allowing you to detect key moments where market participation is strongly biased in one direction — often before large price moves.
The function computes OHLCV, tick count, duration, extrema timestamps, and optionally, custom metrics.
📐 How It Works
The volume imbalance is computed as:
$$ \text{Signed Volume}_t = \begin{cases} +V_t & \text{if } P_t > P_{t-1} \\ -V_t & \text{if } P_t < P_{t-1} \\ 0 & \text{otherwise} \end{cases} $$
where $V_t$ is the tick volume at time $t$.
A new bar is created when the cumulative sum of signed volume exceeds expected_imbalance.
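A small worked example makes the signed-volume rule concrete. The helpers below (`signed_volumes`, `volume_imbalance_bar_ids`) are hypothetical and written in plain Python. The sketch assumes the first tick contributes 0, that the threshold is applied to the absolute value of the running sum (mirroring the tick imbalance rule), and that the sum restarts from zero after each bar; none of this is confirmed as the library's exact behavior.

```python
def signed_volumes(prices, volumes):
    """Attach the sign of each price change to the tick's volume."""
    signed = [0]  # assumption: the first tick has no predecessor, so it counts as 0
    for previous, current, volume in zip(prices, prices[1:], volumes[1:]):
        direction = (current > previous) - (current < previous)
        signed.append(direction * volume)
    return signed

def volume_imbalance_bar_ids(prices, volumes, expected_imbalance):
    """Label each tick with the index of the volume imbalance bar it falls into."""
    bar_ids = []
    current_bar = 0
    imbalance = 0
    for signed_volume in signed_volumes(prices, volumes):
        imbalance += signed_volume
        bar_ids.append(current_bar)
        if abs(imbalance) > expected_imbalance:   # assumption: absolute value is used
            current_bar += 1
            imbalance = 0                         # assumption: restart from zero
    return bar_ids

# Hand-made ticks
prices = [100, 101, 101, 102, 101, 100, 100, 99]
volumes = [2, 3, 1, 4, 2, 5, 1, 3]
signed = signed_volumes(prices, volumes)
ids = volume_imbalance_bar_ids(prices, volumes, expected_imbalance=5)
print(signed)  # [0, 3, 0, 4, -2, -5, 0, -3]
print(ids)     # [0, 0, 0, 0, 1, 1, 2, 2]
```

Compared with the tick imbalance version, a single large trade can close a bar on its own, since each tick is weighted by its volume rather than counted as one unit.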
vol_imb_bars = da.bar_building.ticks_to_volume_imbalance_bars(df, expected_imbalance=50, col_price="price", col_volume="volume", additional_metrics=[])
vol_imb_bars
| time | open | high | low | close | volume | number_ticks | duration_minutes | high_time | low_time |
|---|---|---|---|---|---|---|---|---|---|
| 2023-03-03 13:36:37 | 114.806983 | 114.821643 | 114.519924 | 114.652717 | 19504.0 | 11110 | 185.150000 | 2023-03-03 13:39:22 | 2023-03-03 15:42:52 |
| 2023-03-03 16:41:47 | 114.645225 | 114.982353 | 114.577370 | 114.982353 | 11526.0 | 6772 | 112.850000 | 2023-03-03 18:34:38 | 2023-03-03 16:54:07 |
| 2023-03-03 18:34:39 | 114.982353 | 114.982353 | 114.884364 | 114.913328 | 1341.0 | 724 | 12.050000 | 2023-03-03 18:34:39 | 2023-03-03 18:46:08 |
| 2023-03-03 18:46:43 | 114.913328 | 115.063405 | 114.817075 | 114.817075 | 6310.0 | 3302 | 55.016667 | 2023-03-03 19:10:23 | 2023-03-03 19:41:44 |
| 2023-03-03 19:41:45 | 114.809735 | 114.810172 | 114.605556 | 114.613782 | 3095.0 | 1960 | 32.650000 | 2023-03-03 19:42:42 | 2023-03-03 20:12:55 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-03-14 21:55:17 | 117.911218 | 118.249253 | 117.874878 | 118.227864 | 14174.0 | 6977 | 116.266667 | 2023-03-14 23:37:14 | 2023-03-14 22:04:57 |
| 2023-03-14 23:51:34 | 118.227864 | 118.361885 | 118.160366 | 118.361885 | 3302.0 | 1891 | 31.500000 | 2023-03-15 00:23:04 | 2023-03-14 23:56:10 |
| 2023-03-15 00:23:05 | 118.361885 | 118.435683 | 118.175237 | 118.232546 | 12298.0 | 6815 | 113.566667 | 2023-03-15 01:26:26 | 2023-03-15 01:11:08 |
| 2023-03-15 02:16:40 | 118.232546 | 118.492814 | 118.232440 | 118.492814 | 2098.0 | 1366 | 22.750000 | 2023-03-15 02:39:25 | 2023-03-15 02:17:01 |
| 2023-03-15 02:39:26 | 118.492814 | 118.625803 | 118.476658 | 118.625803 | 2603.0 | 1332 | 22.183333 | 2023-03-15 03:01:37 | 2023-03-15 02:53:03 |
266 rows × 9 columns
plot_candlestick_with_volume(vol_imb_bars)