TOTEM: TOkenized Time Series EMbeddings TickerTrends Research Report | TickerTrends.io
The TickerTrends Social Arbitrage Hedge Fund is currently accepting capital. If you are interested in learning more, send us an email at admin@tickertrends.io.
Author(s): Sabera Talukder, Yisong Yue, Georgia Gkioxari
Title: TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
Link: https://arxiv.org/pdf/2402.16412v2
Table of Contents
Introduction & Motivation
Methodology & Model Architecture
Experiments & Benchmarking
Key Findings & Performance Insights
Applications in Financial Markets
Limitations & Future Work
1. Introduction & Motivation
Time series data is fundamental across various fields, including finance, healthcare, climate science, and industrial monitoring. Despite this ubiquity, most current time series models are domain-specific, requiring extensive retraining or fine-tuning for each new dataset and task.
TOTEM (TOkenized Time Series EMbeddings) proposes a paradigm shift by leveraging discrete tokenization—similar to language models like GPT—to build a generalist time series model. The goal is to train a single model across multiple domains and use tokenized representations to enable zero-shot and few-shot learning across time series tasks.
Key Contributions of TOTEM
Tokenization of Time Series Data
Introduces a discrete representation approach for time series data, allowing it to be processed similarly to language models.
Generalist Time Series Model
Unlike traditional models that need retraining per dataset, TOTEM enables cross-domain learning, performing well on unseen datasets.
Multi-Task Learning Across Diverse Domains
TOTEM is trained on multiple datasets simultaneously, showing strong zero-shot generalization across different time series tasks like forecasting, anomaly detection, and data imputation.
2. Methodology & Model Architecture
Figure: TOTEM model architecture overview, showing how the encoder, quantization, and decoder components interact, and illustrating both the generalist and specialist training paradigms as well as inference across the different tasks.
2.1. Tokenization Process
One of the most innovative aspects of TOTEM is the tokenization of time series data. Instead of processing raw time series values, TOTEM:
Uses a Vector Quantized Variational Autoencoder (VQVAE) to convert continuous signals into a discrete token space.
The encoder maps the input time series into a quantized latent space, where short segments of the signal are represented by tokens drawn from a learned codebook.
The decoder reconstructs the original time series from these tokens.
This discrete representation allows time series data to be processed like natural language, enabling better generalization across domains.
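To make this concrete, here is a minimal sketch of the quantization step: each latent vector produced by the encoder is replaced by the index of its nearest codebook entry. The codebook size and latent dimension below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's values)
num_codes, latent_dim = 256, 64              # a codebook of 256 learned vectors
codebook = np.random.randn(num_codes, latent_dim)

def quantize(latents: np.ndarray) -> np.ndarray:
    """Map each latent vector in (L, latent_dim) to the index of its
    nearest codebook entry, yielding a sequence of discrete tokens."""
    # Squared Euclidean distance from every latent to every code
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)              # token ids, shape (L,)

# A toy "encoded" time series: 12 latent vectors become 12 token ids
tokens = quantize(np.random.randn(12, latent_dim))
print(tokens)   # e.g. [ 17 203   5 ... ], discrete symbols analogous to word ids
```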
2.2. Model Architecture
TOTEM is built using a VQVAE architecture, which consists of:
1D Convolutional Encoder: Extracts feature representations from the time series.
Vector Quantization Layer: Maps features to discrete embeddings, forming a "codebook."
1D Convolutional Decoder: Reconstructs time series data from token embeddings.
The model is trained using a self-supervised learning approach, meaning it does not require labeled data. Instead, it learns by minimizing reconstruction loss, ensuring that the encoded representations retain meaningful information about the original time series.
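A minimal PyTorch sketch of these three components is shown below. Layer counts, kernel sizes, strides, and the codebook size are illustrative assumptions; the actual TOTEM implementation uses its own depths and hyperparameters.

```python
import torch
import torch.nn as nn

class MiniVQVAE(nn.Module):
    """Toy VQVAE for 1D signals: conv encoder, vector quantizer, conv decoder.
    Sizes are illustrative only, not the paper's hyperparameters."""
    def __init__(self, num_codes=256, dim=64):
        super().__init__()
        # Strided 1D convolutions compress the time axis (here by a factor of 4)
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(num_codes, dim)   # the learned "vocabulary"
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(dim, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(dim, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):                              # x: (batch, 1, T)
        z = self.encoder(x).permute(0, 2, 1)           # (batch, T/4, dim) latent vectors
        flat = z.reshape(-1, z.size(-1))
        dists = torch.cdist(flat, self.codebook.weight)       # distance to every code
        tokens = dists.argmin(-1).view(z.size(0), z.size(1))  # discrete token ids
        z_q = self.codebook(tokens)                    # quantized latents
        # Straight-through estimator: gradients skip the non-differentiable argmin
        z_st = z + (z_q - z).detach()
        recon = self.decoder(z_st.permute(0, 2, 1))    # (batch, 1, T) reconstruction
        return recon, tokens, z, z_q

recon, tokens, _, _ = MiniVQVAE()(torch.randn(8, 1, 96))   # 8 windows of length 96
print(recon.shape, tokens.shape)    # torch.Size([8, 1, 96]) torch.Size([8, 24])
```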
2.3 Training Pipeline & Inference
With the tokenization process (Section 2.1) and the VQVAE architecture (Section 2.2) defined, this section describes the training and inference pipeline for TOTEM, detailing how time series data is processed, learned, and applied to different tasks.
Training Process
TOTEM is trained using a self-supervised learning approach, which allows it to develop meaningful token representations for time series data without requiring labeled examples. The training pipeline consists of three main steps: tokenization, embedding learning, and reconstruction loss minimization.
Tokenization via VQVAE
The 1D Convolutional Encoder processes input time series data, extracting latent feature representations.
These representations are then quantized using a discrete codebook (Vector Quantization Layer), with each latent vector mapped to its nearest learned code.
This results in a tokenized representation of the time series, allowing for discrete processing similar to natural language.
Embedding Learning & Codebook Optimization
TOTEM learns a dictionary of embeddings (codebook), where each token represents a recurring time series pattern.
These embeddings are refined through backpropagation, ensuring they capture essential temporal structures.
The codebook continuously updates to improve generalization across different datasets.
Reconstruction & Self-Supervised Training
The 1D Convolutional Decoder reconstructs the original time series using the tokenized embeddings.
The model is optimized using reconstruction loss, ensuring that learned representations retain essential information while reducing redundancy.
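Putting the three steps together, a toy training step might look like the following. It reuses the MiniVQVAE sketch from Section 2.2; the loss weighting and optimizer settings are standard VQVAE defaults chosen for illustration, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

model = MiniVQVAE()                            # sketch from Section 2.2
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
beta = 0.25                                    # commitment weight, a common VQVAE default

def training_step(batch):                      # batch: (B, 1, T) raw time series windows
    recon, tokens, z, z_q = model(batch)
    recon_loss = F.mse_loss(recon, batch)               # reconstruct the input signal
    codebook_loss = F.mse_loss(z_q, z.detach())         # pull codes toward encoder outputs
    commit_loss = F.mse_loss(z, z_q.detach())           # keep the encoder close to its codes
    loss = recon_loss + codebook_loss + beta * commit_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One toy step on random data; note that no labels are involved anywhere
print(training_step(torch.randn(8, 1, 96)))
```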
Inference Pipeline
Once trained, TOTEM can be applied to various time series tasks using its learned embeddings. Inference occurs in two modes:
Specialist Models (Single-Domain Training)
Fine-tuned on specific time series domains (e.g., financial data, physiological signals).
Tokenization can occur along multiple dimensions (E, S, or T) for optimized domain-specific representations.
Generalist Models (Multi-Domain Training)
Trained on multiple datasets to develop broad generalization capabilities.
Tokenization is restricted to the T (time) dimension to ensure cross-domain applicability.
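As a rough illustration of this restriction, tokenizing only along the time dimension amounts to treating every sensor channel as an independent univariate series, so a single codebook can serve datasets with any number of sensors. The (examples, sensors, time) layout below follows the E, S, T notation above but is an assumption for illustration, and the sketch reuses MiniVQVAE from Section 2.2.

```python
import torch

# Toy multivariate dataset: E examples, S sensors, T time steps (sizes are illustrative)
E, S, T = 4, 7, 96
data = torch.randn(E, S, T)

# Tokenizing along T: flatten every sensor channel into its own univariate series,
# so the same 1D conv encoder and codebook apply regardless of the sensor count.
univariate = data.reshape(E * S, 1, T)

_, tokens, _, _ = MiniVQVAE()(univariate)     # sketch from Section 2.2
print(tokens.shape)                           # torch.Size([28, 24]): one token stream per channel
```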
During inference:
A new time series sequence is encoded into discrete tokens using the trained VQVAE.
The model retrieves the most relevant embeddings from the codebook.
The decoder reconstructs the predicted values or missing data points, making it useful for:
Forecasting future time steps
Detecting anomalies
Imputing missing data
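A hypothetical end-to-end inference call, again using the MiniVQVAE sketch, illustrates the encode-and-decode round trip. This shows only the shared tokenization backbone; the task-specific forecasting, anomaly detection, and imputation heads evaluated in the paper are not reproduced here.

```python
import torch

model = MiniVQVAE()                            # in practice: a trained generalist model
model.eval()

new_series = torch.randn(1, 1, 96)             # an unseen time series window

with torch.no_grad():
    # Encode the raw series into discrete tokens and decode them back
    recon, tokens, _, _ = model(new_series)

print(tokens[0][:10])    # the discrete "vocabulary" view of the series
print(recon.shape)       # (1, 1, 96): decoded signal for reconstruction-style tasks

# For forecasting or anomaly scoring, a task-specific model would consume the token
# sequence (or the reconstruction error) on top of this shared backbone.
```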
3. Experiments & Benchmarking
3.1. Tasks Evaluated
TOTEM was benchmarked on nearly 500 experiments covering the following core time series tasks:
Time Series Forecasting
Predicting future values based on historical trends.
Evaluated on 12 datasets with 14 baseline models.
Anomaly Detection
Identifying rare, irregular events in time series.
Evaluated on 25 datasets with 19 baseline models.
Time Series Imputation
Filling in missing or corrupted data points.
Evaluated on 12 datasets with 17 baseline models.
3.2. Datasets Used
TOTEM was tested on a wide range of real-world time series datasets, spanning:
Finance (Stock prices, market indices)
Healthcare (ECG signals, patient monitoring)
IoT & Industrial (Sensor readings)
Climate & Weather (Temperature, precipitation)
3.3. Baseline Comparisons
TOTEM was compared against a variety of time series models, including:
Deep Learning Models: Transformer-based models, RNNs, and CNNs.
Statistical Models: ARIMA, Kalman Filters, Gaussian Processes.
Self-Supervised Approaches: Recent time series embedding methods.
4. Key Findings & Performance Insights
4.1. Performance Results
TOTEM consistently outperforms or matches existing state-of-the-art models across all tasks and datasets.
Key Metrics:
Time Series Forecasting: TOTEM achieves lower error rates than traditional RNN-based and Transformer-based models.
Anomaly Detection: Shows strong zero-shot performance, detecting anomalies in datasets it was never explicitly trained on.
Imputation: Recovers missing data points with high accuracy, outperforming specialized models.
4.2. Generalist vs. Specialist Performance
Specialist Models (trained on a single dataset) still perform well, but require domain-specific training.
Generalist TOTEM (trained across domains) performs nearly as well as specialists on in-domain data and often outperforms them in zero-shot settings.
5. Applications in Financial Markets
TOTEM's ability to handle multiple time series tasks makes it highly applicable to financial markets, particularly in:
5.1. Algorithmic Trading & Market Prediction
TOTEM’s forecasting capabilities can improve stock price prediction, volatility estimation, and market trend analysis.
Its ability to learn from multiple domains makes it more adaptable than traditional finance-focused models.
5.2. Anomaly Detection for Risk Management
Financial institutions can use TOTEM to detect unusual trading patterns, fraud, and systemic market risks.
TOTEM’s strong zero-shot performance allows it to generalize to new fraud scenarios without retraining.
5.3. Data Imputation for Financial Datasets
Financial data often contains missing values due to reporting delays, outages, or incomplete datasets.
TOTEM’s self-supervised imputation can reconstruct missing financial data without requiring manual intervention.
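As a purely illustrative sketch (not necessarily the paper's exact imputation protocol), one way to use a trained TOTEM-style tokenizer for gap filling is to zero-fill the missing observations, reconstruct the series through the encoder and decoder, and keep the decoder's output only at the missing positions. The data and mask below are hypothetical, and the code reuses MiniVQVAE from Section 2.2.

```python
import torch

model = MiniVQVAE()                           # in practice: a trained model
model.eval()

prices = torch.randn(1, 1, 96)                # toy price/returns series with gaps
mask = torch.ones_like(prices)
mask[0, 0, 40:48] = 0                         # eight missing observations
observed = prices * mask                      # zero-fill the gaps before encoding

with torch.no_grad():
    recon, _, _, _ = model(observed)

# Keep observed values as-is; fill the gaps from the decoder's reconstruction
imputed = prices * mask + recon * (1 - mask)
print(imputed.shape)                          # torch.Size([1, 1, 96])
```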
6. Limitations & Future Work
While TOTEM shows interesting results, some limitations remain:
6.1. Computational Complexity
Training a VQVAE-based model on large datasets requires significant GPU resources.
Model inference speed is slower than simpler methods like ARIMA.
6.2. Tokenization Loss
The discretization process introduces some information loss, potentially impacting extreme outlier detection.
6.3. Financial-Specific Enhancements Needed
The model was not explicitly designed for financial time series.
Future improvements could include specialized financial embeddings to improve performance in market prediction tasks.
Conclusion
TOTEM represents a major leap in time series modeling, bringing tokenization-based embeddings to time series analysis in a way that mirrors language models.
For financial markets, its ability to forecast trends, detect anomalies, and impute missing data makes it highly valuable for hedge funds, banks, and trading firms looking to leverage alternative data sources using modern Machine Learning and AI models.