Traffic Refinery
Cost-aware data representation for ML on network traffic

Overview
Traffic Refinery is a framework and system that enables joint evaluation of ML performance (accuracy) and systems-level costs of different network traffic representations.
The Problem
Deploying ML models for network management requires understanding not just model accuracy but also the systems costs of running those models in operational networks. The choice of feature representation determines both whether a model can be deployed and where in the network it can run.
What Traffic Refinery Does
- Flexible Representations: Offers extensible network data representations
- Cost Assessment: Measures systems-related costs of representations
- Performance Analysis: Evaluates effects on model performance
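As a rough illustration of what cost assessment can mean in practice, the sketch below times two candidate representations over the same packets. It is not Traffic Refinery's API: `packet_counters`, `interarrival_stats`, and `measure_cost` are hypothetical names, packets are simplified to `(timestamp, size)` tuples, and processing time per packet stands in for the broader set of systems costs.

```python
import time


def packet_counters(packets):
    """Cheap representation: total packet and byte counts only."""
    count = 0
    total_bytes = 0
    for _, size in packets:
        count += 1
        total_bytes += size
    return {"packets": count, "bytes": total_bytes}


def interarrival_stats(packets):
    """Costlier representation: stores every inter-arrival gap."""
    gaps = []
    prev_ts = None
    for ts, _ in packets:
        if prev_ts is not None:
            gaps.append(ts - prev_ts)
        prev_ts = ts
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return {"mean_gap": mean_gap, "max_gap": max(gaps, default=0.0)}


def measure_cost(extract, packets, repeats=5):
    """Run an extractor several times and report the best time per packet."""
    best = float("inf")
    features = None
    for _ in range(repeats):
        start = time.perf_counter()
        features = extract(packets)
        best = min(best, time.perf_counter() - start)
    return {"features": features, "seconds_per_packet": best / len(packets)}


if __name__ == "__main__":
    # Synthetic input for illustration only: 1000 packets, 10 ms apart.
    packets = [(i * 0.01, 100) for i in range(1000)]
    for extractor in (packet_counters, interarrival_stats):
        result = measure_cost(extractor, packets)
        print(extractor.__name__, result["seconds_per_packet"])
```

Pairing each representation's measured cost with the accuracy of a model trained on its features gives the joint view described above.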
System Components
- Traffic categorization module (DNS/IP-based)
- Packet capture and processing with flow state tracking
- Aggregation and storage for higher-level features
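The three components can be pictured as stages of one pipeline. The following is a minimal sketch under stated assumptions, not the project's implementation: the prefix map, the `FlowState` fields, and the function names are all hypothetical, packets are simplified to `(timestamp, client_ip, server_ip, size)` tuples, and only IP-based categorization is shown (DNS-based mapping is omitted).

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical service map using documentation-reserved prefixes.
SERVICE_PREFIXES = {
    "video": ipaddress.ip_network("203.0.113.0/24"),
    "web": ipaddress.ip_network("198.51.100.0/24"),
}


def categorize(server_ip):
    """Traffic categorization: map a server IP to a service label."""
    addr = ipaddress.ip_address(server_ip)
    for service, prefix in SERVICE_PREFIXES.items():
        if addr in prefix:
            return service
    return "unknown"


@dataclass
class FlowState:
    """Per-flow counters updated on every packet."""
    packets: int = 0
    byte_count: int = 0
    first_ts: float = 0.0
    last_ts: float = 0.0

    def update(self, ts, size):
        if self.packets == 0:
            self.first_ts = ts
        self.packets += 1
        self.byte_count += size
        self.last_ts = ts


def process(packets):
    """Packet processing: track state per (client, server) flow."""
    flows = defaultdict(FlowState)
    for ts, client, server, size in packets:
        flows[(client, server)].update(ts, size)
    return flows


def aggregate(flows):
    """Aggregation: roll per-flow state up into per-service features."""
    features = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for (_, server), state in flows.items():
        entry = features[categorize(server)]
        entry["flows"] += 1
        entry["packets"] += state.packets
        entry["bytes"] += state.byte_count
    return dict(features)


if __name__ == "__main__":
    sample = [
        (0.0, "10.0.0.1", "203.0.113.5", 1000),
        (0.5, "10.0.0.1", "203.0.113.5", 500),
        (1.0, "10.0.0.2", "198.51.100.7", 200),
    ]
    print(aggregate(process(sample)))
```

Keeping flow tracking separate from aggregation means a new higher-level feature only needs a new aggregation step, while the per-packet path stays unchanged.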
Use Cases
- Video streaming quality inference
- Malware detection
- Any ML task requiring network traffic analysis
Results
Traffic Refinery operates at 10 Gbps, transforming traffic in real time to produce a variety of feature representations. This makes it possible to explore the tradeoffs between model accuracy and systems cost.
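To give a sense of what 10 Gbps implies for per-packet processing, the sketch below computes the packet rate and the time available per packet on a single link. The frame sizes are assumptions chosen for illustration (the source does not state packet sizes or hardware), and the 20 bytes of per-frame overhead are the standard Ethernet preamble, start-of-frame delimiter, and inter-frame gap. The function names are hypothetical.

```python
# Per-frame Ethernet overhead on the wire: 7-byte preamble, 1-byte
# start-of-frame delimiter, and 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps, frame_bytes):
    """Maximum frame rate for a link carrying frames of one size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def budget_ns(link_bps, frame_bytes):
    """Nanoseconds available per packet if one core handles every packet."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    link = 10e9  # 10 Gbps
    for frame in (64, 1518):  # minimum and maximum standard frame sizes
        rate = packets_per_second(link, frame)
        print(f"{frame}-byte frames: {rate / 1e6:.2f} Mpps, "
              f"{budget_ns(link, frame):.1f} ns per packet")
```

With minimum-size 64-byte frames the rate is roughly 14.88 million packets per second, or about 67 ns per packet, which is why the per-packet cost of a feature representation matters at this line rate.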
Citation
@article{bronzino2021traffic,
title={Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic},
author={Bronzino, Francesco and Schmitt, Paul and Ayoubi, Sara and Kim, Hyojoon and Teixeira, Renata and Feamster, Nick},
journal={Proc. ACM Meas. Anal. Comput. Syst.},
volume={5},
number={3},
year={2021}
}