The Applications of Machine Learning Techniques in Networked Systems

Soheil Jamshidi

Many large networked systems, ranging from the Internet to systems deployed atop the Internet (e.g., Amazon), play critical roles in our daily lives. In these systems, individual nodes (e.g., computers) establish physical or virtual connections or relationships to form a networked system and exchange data. An important task in these systems is the timely and accurate detection of security or management events, e.g., a denial-of-service attack on a campus network. Machine learning (ML) models offer a promising data-driven method to learn the “signature” of these events from past instances and use it to detect future events. While ML models have been very successful in other domains (e.g., image processing), there are clear challenges in using them for event detection in networked systems, including (i) the limited availability of large-scale labeled datasets, (ii) the subtle and changing signatures of target events, (iii) the selection and capture of proper traffic features for (re)training, and (iv) the “black-box” nature of ML models.

This dissertation presents three applications of ML models for event detection based on exchanged messages in networked systems, each of which tackles the above challenges. First, we develop an ML-based method to identify incentivized Amazon reviews. To this end, we present a heuristic-based signature to identify explicitly incentivized reviews (EIRs) and characterize the related reviews, products, and reviewers. We then use EIRs to train an ML model for detecting implicitly incentivized reviews. Second, we examine how the casting and training strategies of unsupervised ML (and statistical) models affect their accuracy and overhead (and thus their feasibility) for forecasting network data streams. In particular, we study the impact of the size, selection, and recency of the training data on accuracy and overhead. Third, we design and evaluate anomaly detection mechanisms based on an unsupervised ML-based method that takes input data streams from network traffic, end-system, and application load. Furthermore, we leverage model interpretation to identify the most important input data streams and deploy model extraction to infer rules that represent the model's behavior. Overall, these three case studies yield findings on a range of practical issues that arise in deploying ML models for event detection in networked systems.