TOTEM: TOkenized Time Series EMbeddings TickerTrends Research Report | TickerTrends.io
The TickerTrends Social Arbitrage Hedge Fund is currently accepting capital. If you are interested in learning more, send us an email at admin@tickertrends.io.
Author(s): Sabera Talukder, Yisong Yue, Georgia Gkioxari
Title: TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
Link: https://arxiv.org/pdf/2402.16412v2
Table of Contents
Introduction & Motivation
Methodology & Model Architecture
Experiments & Benchmarking
Key Findings & Performance Insights
Applications in Financial Markets
Limitations & Future Work
1. Introduction & Motivation
Time series data is fundamental across various fields, including finance, healthcare, climate science, and industrial monitoring. Despite this ubiquity, most current time series models are domain-specific, requiring extensive retraining or fine-tuning for each new dataset and task.
TOTEM (TOkenized Time Series EMbeddings) proposes a paradigm shift by leveraging discrete tokenization—similar to language models like GPT—to build a generalist time series model. The goal is to train a single model across multiple domains and use tokenized representations to enable zero-shot and few-shot learning across time series tasks.
Key Contributions of TOTEM
Tokenization of Time Series Data
Introduces a discrete representation approach for time series data, allowing it to be processed similarly to language models.
Generalist Time Series Model
Unlike traditional models that need retraining per dataset, TOTEM enables cross-domain learning, performing well on unseen datasets.
Multi-Task Learning Across Diverse Domains
TOTEM is trained on multiple datasets simultaneously, showing strong zero-shot generalization across different time series tasks like forecasting, anomaly detection, and data imputation.
2. Methodology & Model Architecture
Figure: TOTEM model architecture overview, showing how the encoder, quantization, and decoder components interact, and illustrating both the generalist and specialist training paradigms as well as inference across the different tasks.
2.1. Tokenization Process
One of the most innovative aspects of TOTEM is the tokenization of time series data. Instead of processing raw time series values, TOTEM:
Uses a Vector Quantized Variational Autoencoder (VQVAE) to convert continuous signals into a discrete token space.
The encoder maps the input time series into a quantized latent space, where short segments of the signal are represented by tokens drawn from a learned codebook.
The decoder reconstructs the original time series from these tokens.
This discrete representation allows time series data to be processed like natural language, enabling better generalization across domains.
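To make this concrete, here is a minimal sketch of the quantization step: each latent vector produced by the encoder is replaced by the index of its nearest codebook entry. The codebook size and latent dimension below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's values)
num_codes, latent_dim = 256, 64              # a codebook of 256 learned vectors
codebook = np.random.randn(num_codes, latent_dim)

def quantize(latents: np.ndarray) -> np.ndarray:
    """Map each latent vector in (L, latent_dim) to the index of its
    nearest codebook entry, yielding a sequence of discrete tokens."""
    # Squared Euclidean distance from every latent to every code
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)              # token ids, shape (L,)

# A toy "encoded" time series: 12 latent vectors become 12 token ids
tokens = quantize(np.random.randn(12, latent_dim))
print(tokens)   # e.g. [ 17 203   5 ... ], discrete symbols analogous to word ids
```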
2.2. Model Architecture
TOTEM is built using a VQVAE architecture, which consists of:
1D Convolutional Encoder: Extracts feature representations from the time series.
Vector Quantization Layer: Maps features to discrete embeddings, forming a "codebook."
1D Convolutional Decoder: Reconstructs time series data from token embeddings.
The model is trained using a self-supervised learning approach, meaning it does not require labeled data. Instead, it learns by minimizing reconstruction loss, ensuring that the encoded representations retain meaningful information about the original time series.
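A minimal PyTorch sketch of these three components is shown below. Layer counts, kernel sizes, strides, and the codebook size are illustrative assumptions; the actual TOTEM implementation uses its own depths and hyperparameters.

```python
import torch
import torch.nn as nn

class MiniVQVAE(nn.Module):
    """Toy VQVAE for 1D signals: conv encoder, vector quantizer, conv decoder.
    Sizes are illustrative only, not the paper's hyperparameters."""
    def __init__(self, num_codes=256, dim=64):
        super().__init__()
        # Strided 1D convolutions compress the time axis (here by a factor of 4)
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(num_codes, dim)   # the learned "vocabulary"
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(dim, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(dim, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):                              # x: (batch, 1, T)
        z = self.encoder(x).permute(0, 2, 1)           # (batch, T/4, dim) latent vectors
        flat = z.reshape(-1, z.size(-1))
        dists = torch.cdist(flat, self.codebook.weight)       # distance to every code
        tokens = dists.argmin(-1).view(z.size(0), z.size(1))  # discrete token ids
        z_q = self.codebook(tokens)                    # quantized latents
        # Straight-through estimator: gradients skip the non-differentiable argmin
        z_st = z + (z_q - z).detach()
        recon = self.decoder(z_st.permute(0, 2, 1))    # (batch, 1, T) reconstruction
        return recon, tokens, z, z_q

recon, tokens, _, _ = MiniVQVAE()(torch.randn(8, 1, 96))   # 8 windows of length 96
print(recon.shape, tokens.shape)    # torch.Size([8, 1, 96]) torch.Size([8, 24])
```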
2.3 Training Pipeline & Inference
With the tokenization process (Section 2.1) and the VQVAE architecture (Section 2.2) defined, this section describes the training and inference pipeline for TOTEM, detailing how time series data is processed, learned, and applied to different tasks.
Training Process
TOTEM is trained using a self-supervised learning approach, which allows it to develop meaningful token representations for time series data without requiring labeled examples. The training pipeline consists of three main steps: tokenization, embedding learning, and reconstruction loss minimization.
Tokenization via VQVAE
The 1D Convolutional Encoder processes input time series data, extracting latent feature representations.
These representations are then quantized using a discrete codebook (Vector Quantization Layer), with each latent vector mapped to its nearest learned code.
This results in a tokenized representation of the time series, allowing for discrete processing similar to natural language.
Embedding Learning & Codebook Optimization
TOTEM learns a dictionary of embeddings (codebook), where each token represents a recurring time series pattern.
These embeddings are refined through backpropagation, ensuring they capture essential temporal structures.
The codebook continuously updates to improve generalization across different datasets.
Reconstruction & Self-Supervised Training
The 1D Convolutional Decoder reconstructs the original time series using the tokenized embeddings.
The model is optimized using reconstruction loss, ensuring that learned representations retain essential information while reducing redundancy.
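Putting the three steps together, a toy training step might look like the following. It reuses the MiniVQVAE sketch from Section 2.2; the loss weighting and optimizer settings are standard VQVAE defaults chosen for illustration, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

model = MiniVQVAE()                            # sketch from Section 2.2
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
beta = 0.25                                    # commitment weight, a common VQVAE default

def training_step(batch):                      # batch: (B, 1, T) raw time series windows
    recon, tokens, z, z_q = model(batch)
    recon_loss = F.mse_loss(recon, batch)               # reconstruct the input signal
    codebook_loss = F.mse_loss(z_q, z.detach())         # pull codes toward encoder outputs
    commit_loss = F.mse_loss(z, z_q.detach())           # keep the encoder close to its codes
    loss = recon_loss + codebook_loss + beta * commit_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One toy step on random data; note that no labels are involved anywhere
print(training_step(torch.randn(8, 1, 96)))
```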
Inference Pipeline
Once trained, TOTEM can be applied to various time series tasks using its learned embeddings. Inference occurs in two modes:
Specialist Models (Single-Domain Training)
Fine-tuned on specific time series domains (e.g., financial data, physiological signals).
Tokenization can occur along multiple dimensions (E, S, or T) for optimized domain-specific representations.
Generalist Models (Multi-Domain Training)
Trained on multiple datasets to develop broad generalization capabilities.
Tokenization is restricted to the T (time) dimension to ensure cross-domain applicability.
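As a rough illustration of this restriction, tokenizing only along the time dimension amounts to treating every sensor channel as an independent univariate series, so a single codebook can serve datasets with any number of sensors. The (examples, sensors, time) layout below follows the E, S, T notation above but is an assumption for illustration, and the sketch reuses MiniVQVAE from Section 2.2.

```python
import torch

# Toy multivariate dataset: E examples, S sensors, T time steps (sizes are illustrative)
E, S, T = 4, 7, 96
data = torch.randn(E, S, T)

# Tokenizing along T: flatten every sensor channel into its own univariate series,
# so the same 1D conv encoder and codebook apply regardless of the sensor count.
univariate = data.reshape(E * S, 1, T)

_, tokens, _, _ = MiniVQVAE()(univariate)     # sketch from Section 2.2
print(tokens.shape)                           # torch.Size([28, 24]): one token stream per channel
```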
During inference:
A new time series sequence is encoded into discrete tokens using the trained VQVAE.
The model retrieves the most relevant embeddings from the codebook.
The decoder reconstructs the predicted values or missing data points, making it useful for:
Forecasting future time steps
Detecting anomalies
Imputing missing data
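A hypothetical end-to-end inference call, again using the MiniVQVAE sketch, illustrates the encode-and-decode round trip. This shows only the shared tokenization backbone; the task-specific forecasting, anomaly detection, and imputation heads evaluated in the paper are not reproduced here.

```python
import torch

model = MiniVQVAE()                            # in practice: a trained generalist model
model.eval()

new_series = torch.randn(1, 1, 96)             # an unseen time series window

with torch.no_grad():
    # Encode the raw series into discrete tokens and decode them back
    recon, tokens, _, _ = model(new_series)

print(tokens[0][:10])    # the discrete "vocabulary" view of the series
print(recon.shape)       # (1, 1, 96): decoded signal for reconstruction-style tasks

# For forecasting or anomaly scoring, a task-specific model would consume the token
# sequence (or the reconstruction error) on top of this shared backbone.
```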
3. Experiments & Benchmarking
3.1. Tasks Evaluated
TOTEM was benchmarked on nearly 500 experiments covering the following core time series tasks:
Time Series Forecasting
Predicting future values based on historical trends.
Evaluated on 12 datasets with 14 baseline models.
Anomaly Detection
Identifying rare, irregular events in time series.
Evaluated on 25 datasets with 19 baseline models.
Time Series Imputation
Filling in missing or corrupted data points.
Evaluated on 12 datasets with 17 baseline models.
3.2. Datasets Used
TOTEM was tested on a wide range of real-world time series datasets, spanning:
Finance (Stock prices, market indices)
Healthcare (ECG signals, patient monitoring)
IoT & Industrial (Sensor readings)
Climate & Weather (Temperature, precipitation)
3.3. Baseline Comparisons
TOTEM was compared against a variety of time series models, including:
Deep Learning Models: Transformer-based models, RNNs, and CNNs.
Statistical Models: ARIMA, Kalman Filters, Gaussian Processes.
Self-Supervised Approaches: Recent time series embedding methods.
4. Key Findings & Performance Insights
4.1. Performance Results
TOTEM consistently outperforms or matches existing state-of-the-art models across all tasks and datasets.
Key Metrics:
Time Series Forecasting: TOTEM achieves lower error rates than traditional RNN-based and Transformer-based models.
Anomaly Detection: Shows strong zero-shot performance, detecting anomalies in datasets it was never explicitly trained on.
Imputation: Recovers missing data points with high accuracy, outperforming specialized models.
4.2. Generalist vs. Specialist Performance
Specialist Models (trained on a single dataset) still perform well, but require domain-specific training.
Generalist TOTEM (trained across domains) performs nearly as well as specialists on in-domain data and often outperforms them in zero-shot settings.
5. Applications in Financial Markets
TOTEM's ability to handle multiple time series tasks makes it highly applicable to financial markets, particularly in:
5.1. Algorithmic Trading & Market Prediction
TOTEM’s forecasting capabilities can improve stock price prediction, volatility estimation, and market trend analysis.
Its ability to learn from multiple domains makes it more adaptable than traditional finance-focused models.
5.2. Anomaly Detection for Risk Management
Financial institutions can use TOTEM to detect unusual trading patterns, fraud, and systemic market risks.
TOTEM’s strong zero-shot performance allows it to generalize to new fraud scenarios without retraining.
5.3. Data Imputation for Financial Datasets
Financial data often contains missing values due to reporting delays, outages, or incomplete datasets.
TOTEM’s self-supervised imputation can reconstruct missing financial data without requiring manual intervention.
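As a purely illustrative sketch (not necessarily the paper's exact imputation protocol), one way to use a trained TOTEM-style tokenizer for gap filling is to zero-fill the missing observations, reconstruct the series through the encoder and decoder, and keep the decoder's output only at the missing positions. The data and mask below are hypothetical, and the code reuses MiniVQVAE from Section 2.2.

```python
import torch

model = MiniVQVAE()                           # in practice: a trained model
model.eval()

prices = torch.randn(1, 1, 96)                # toy price/returns series with gaps
mask = torch.ones_like(prices)
mask[0, 0, 40:48] = 0                         # eight missing observations
observed = prices * mask                      # zero-fill the gaps before encoding

with torch.no_grad():
    recon, _, _, _ = model(observed)

# Keep observed values as-is; fill the gaps from the decoder's reconstruction
imputed = prices * mask + recon * (1 - mask)
print(imputed.shape)                          # torch.Size([1, 1, 96])
```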
6. Limitations & Future Work
While TOTEM shows interesting results, some limitations remain:
6.1. Computational Complexity
Training a VQVAE-based model on large datasets requires significant GPU resources.
Model inference speed is slower than simpler methods like ARIMA.
6.2. Tokenization Loss
The discretization process introduces some information loss, potentially impacting extreme outlier detection.
6.3. Financial-Specific Enhancements Needed
The model was not explicitly designed for financial time series.
Future improvements could include specialized financial embeddings to improve performance in market prediction tasks.
Conclusion
TOTEM represents a major leap in time series modeling, bringing tokenization-based embeddings to time series analysis in a way that mirrors language models.
For financial markets, its ability to forecast trends, detect anomalies, and impute missing data makes it highly valuable for hedge funds, banks, and trading firms looking to leverage alternative data sources using modern Machine Learning and AI models.