Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
💡 This research speeds up inference in large language models.
Large Language Models (LLMs) face significant inference latency challenges stemming from their autoregressive design and large size. To address this, speculative decoding has emerged as a solution, enabling the simultaneous generation and validation of multiple tokens.
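The accept-or-reject loop behind speculative decoding can be sketched as follows. This is a toy greedy version: `target` and `draft` are hypothetical next-token callables standing in for the large and small models, and the per-position verification loop stands in for what a real LLM does in a single parallel forward pass.

```python
def speculative_decode(target, draft, prefix, k=4, steps=3):
    """Toy greedy speculative decoding (illustrative, not any paper's exact method).

    `target` and `draft` map a token sequence to the next token. The draft
    proposes k tokens cheaply; the target verifies them and the longest
    matching prefix is accepted, so several tokens can land per target call.
    """
    out = list(prefix)
    for _ in range(steps):
        # Draft model proposes k tokens autoregressively (cheap).
        proposal = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # Target model checks each proposed position; in a real LLM all k
        # positions are scored in one parallel forward pass.
        accepted = 0
        for i in range(k):
            if target(out + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        out += proposal[:accepted]
        # Always take one token from the target so progress is made even
        # when the first draft token is rejected.
        out.append(target(out))
    return out
```

With greedy decoding, the accepted tokens are exactly what the target would have produced on its own, so the output matches plain autoregressive decoding while amortizing target calls.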
Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual
💡 This research distills multilingual vision-language models into smaller ones that stay multilingual.
Knowledge distillation (KD) demonstrates promising results in transferring knowledge from larger to smaller VLMs. However, applying KD in multilingual settings remains underexplored. We study five distillation formulations across CLIP and SigLIP2.
STAR: A Privacy-Preserving, Energy-Efficient Edge AI Framework for Human Activity Recognition via Wi-Fi CSI in Mobile and Pervasive Computing Environments
💡 This research presents a privacy-preserving, energy-efficient edge AI framework.
Human Activity Recognition (HAR) via Wi-Fi Channel State Information (CSI) presents a privacy-preserving, contactless sensing approach suitable for smart homes, healthcare monitoring, and mobile IoT systems.
Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
💡 This research examines whether debiasing capabilities transfer from teacher to student during knowledge distillation.
Knowledge distillation (KD) is an effective method for model compression and for transferring knowledge between models. However, its effect on a model's robustness against spurious correlations, which degrade performance on out-of-distribution data, remains underexplored. This study investigates the effect of knowledge distillation on the transferability of "debiasing" capabilities from teacher models to student models.
An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0
💡 This research simplifies the deployment of edge AI in Industry 5.0.
We present a novel framework for Industry 5.0 that simplifies the deployment of AI models on edge devices in various industrial settings. The design reduces latency and avoids external data transfer by enabling local inference and real-time processing.
Energy-Efficient Autonomous Driving with Adaptive Perception and Robust Decision
💡 This research improves the energy efficiency of autonomous driving.
Autonomous driving is an emerging technology expected to bring significant social, economic, and environmental benefits. However, these benefits come with rising energy consumption by computation engines, limiting the driving range of vehicles, especially electric ones. Perception computing is typically the most power-intensive component, as it relies on deep learning models to extract environmental features. To address these challenges, we propose an energy-efficient autonomous driving framework, called EneAD.
Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms
💡 This research makes inference on embedded and analog computing platforms more efficient and robust.
While machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency. In practice, neural networks must not only operate efficiently but also provide reliable predictions under distributional shifts or unseen data. This work advances resource-efficient and robust inference for both conventional and Bayesian neural networks.
UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations
💡 This research unifies knowledge distillation across heterogeneous models in computer vision.
Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing cost while maintaining accuracy. In visual applications, where large-scale image models are widely used, KD enables efficient deployment.
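A minimal sketch of the classic Hinton-style KD objective that such teacher-student transfer builds on; the temperature, weighting, and function names here are illustrative defaults, not details taken from UHKD:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-target term:
    KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    soft_term = (temperature ** 2) * kl
    # Standard cross-entropy with the ground-truth class.
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * hard_term + (1 - alpha) * soft_term
```

The soft term vanishes when student and teacher logits agree, leaving only the ordinary supervised loss; a higher temperature exposes more of the teacher's inter-class similarity structure to the student.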
A Survey on Efficient Vision-Language-Action Models
💡 This research surveys efficient Vision-Language-Action models.
Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. While these models have demonstrated remarkable generalist capabilities, deployment is severely hampered by their substantial computational and data requirements.
Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
💡 This research studies where to place deep learning inference across edge and cloud platforms.
Edge intelligent applications like VR/AR and language-model-based chatbots have become widespread with the rapid expansion of IoT and mobile devices. However, constrained edge devices often cannot serve the increasingly large and complex deep learning (DL) models. Research aims to balance accuracy, computation delay, transmission delay, and privacy concerns.