Contents
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models
Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Surveying the space of descriptions of a composite system with machine learning
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
Goetterfunke: Creativity in Machinae Sapiens. About the Qualitative Shift in Generative AI with a Focus on Text-To-Image
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Authors: Ziyi Tang, Ruilin Wang, Weixing Chen, Keze Wang, Yang Liu, Tianshui Chen, Liang Lin
Source and references: https://arxiv.org/abs/2308.11914v3
Introduction
This paper presents CausalGPT, a framework that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models for knowledge-based reasoning tasks.
Key Points
This is the first work to study and promote faithfulness and causal consistency in foundation models via a multi-agent collaborative paradigm for knowledge reasoning.
The proposed CaCo-CoT framework implements two types of agents: faithful reasoners and causal evaluators.
Extensive experiments demonstrate that CaCo-CoT achieves state-of-the-art performance on text-based and multi-modal knowledge reasoning benchmarks.
Methodology
CaCo-CoT employs a hierarchical arrangement of multiple agents, including a set of faithful reasoner agents and causal evaluator agents. The reasoners engage in a structured reasoning process modeled after human causal reasoning, while the evaluator assesses the causal consistency of the generated reasoning chains through non-causal evaluation and counterfactual evaluation.
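To make the reasoner/evaluator interplay concrete, here is a minimal Python sketch of how such a loop could be wired up. The `query_llm` stub, the prompts, and the simple filtering rule are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Minimal sketch of a CaCo-CoT-style reasoner/evaluator loop (illustrative only).
# `query_llm`, the prompts, and the filtering rule are assumptions, not the paper's code.

def query_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call; plug in your own LLM client."""
    raise NotImplementedError("replace with an actual LLM call")

def faithful_reasoner(question: str) -> str:
    # Each reasoner produces a step-by-step causal chain before committing to an answer.
    prompt = (
        "Answer the question by reasoning causally, step by step, "
        "stating each premise before the conclusion it supports.\n"
        f"Question: {question}\nReasoning and final answer:"
    )
    return query_llm(prompt)

def causal_evaluator(question: str, solution: str) -> bool:
    # Non-causal check: does every step actually follow from its stated premises?
    non_causal = query_llm(
        f"Question: {question}\nProposed reasoning: {solution}\n"
        "Does every step follow from its stated premises? Answer yes or no."
    )
    # Counterfactual check: would the conclusion change if a key premise were negated?
    counterfactual = query_llm(
        f"Question: {question}\nProposed reasoning: {solution}\n"
        "If a key premise were false, would the final answer change? Answer yes or no."
    )
    return "yes" in non_causal.lower() and "yes" in counterfactual.lower()

def caco_cot(question: str, n_reasoners: int = 3) -> str:
    # Keep only solutions the evaluator judges causally consistent; fall back to the
    # first candidate if none pass (a simplifying assumption for this sketch).
    candidates = [faithful_reasoner(question) for _ in range(n_reasoners)]
    consistent = [c for c in candidates if causal_evaluator(question, c)]
    return (consistent or candidates)[0]
```

A fuller implementation would iterate rather than fall back: when no candidate passes both checks, the reasoners would be queried again instead of accepting the first chain.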
Results and Findings
CaCo-CoT outperforms state-of-the-art methods on three text-based knowledge reasoning benchmarks (ScienceQA, Com2Sense, BoolQ) and two multi-modal reasoning datasets (MME, MMMU). The framework demonstrates significant superiority in addressing factual errors and inferential fallacies, leading to more faithful reasoning outcomes.
Implications and Conclusions
The proposed CausalGPT framework, which integrates causality principles into foundation models, represents a significant advancement in improving the faithfulness and reasoning capabilities of these models for practical applications across diverse domains.
PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
Authors: Cristobal Ortega, Yann Falevoz, Renaud Ayrignac
Source and references: https://arxiv.org/abs/2411.17309v1
Introduction
This research introduces a novel accelerator architecture called PIM-AI for Large Language Models (LLMs) and other memory-intensive workloads. The key innovation is the integration of computational units directly into the memory chip, significantly reducing data transfer bottlenecks and improving overall performance and energy efficiency.
Key Points
PIM-AI is a novel DDR5/LPDDR5 PIM architecture designed for LLM inference without modifying the memory controller or DDR/LPDDR memory PHY.
A simulator was developed to evaluate the performance of PIM-AI in various scenarios, including cloud and mobile environments.
In cloud-based scenarios, PIM-AI reduces the 3-year total cost of ownership (TCO) per query-per-second by up to 6.94x compared to state-of-the-art GPUs, depending on the LLM used.
In mobile scenarios, PIM-AI achieves a 10- to 20-fold reduction in energy per token compared to state-of-the-art mobile SoCs, resulting in 25 to 45% more queries per second and 6.9x to 13.4x less energy per query.
These results highlight PIM-AI's potential to revolutionize LLM deployments, making them more efficient, scalable, and sustainable.
Methodology
The researchers developed a simulator for PyTorch models to compare the proposed PIM-AI architecture with state-of-the-art hardware. The simulator can execute multiple layers and functions during model inference without requiring modifications to the original PyTorch model. It collects metrics such as total number of TOPs, execution time, data transfer sizes, energy consumption, and power consumption for a given hardware profile.
The researchers parameterized the PIM-AI chip and DIMM, as well as state-of-the-art NPUs from mobile SoCs and the NVIDIA H100, to create hardware profiles for the simulator.
Two target scenarios were considered: cloud deployment with large models (more than 7 billion parameters) and mobile deployment with smaller models (7 billion parameters or fewer).
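As a rough illustration of how a hardware profile can drive such per-layer estimates, here is a roofline-style sketch in Python; the profile fields and all numbers are hypothetical placeholders, not values from the paper or the authors' simulator.

```python
# Illustrative sketch of a hardware-profile-driven latency/energy estimate.
# Profile fields, the roofline timing model, and the example numbers are assumptions.
from dataclasses import dataclass

@dataclass
class HardwareProfile:
    name: str
    peak_tops: float          # peak compute throughput, tera-operations per second
    mem_bandwidth_gbs: float  # sustained memory bandwidth, GB/s
    pj_per_op: float          # compute energy, picojoules per operation
    pj_per_byte: float        # data-movement energy, picojoules per byte

def estimate_layer(profile: HardwareProfile, ops: float, bytes_moved: float):
    """Return (latency_s, energy_j) for one layer under a simple roofline model."""
    compute_time = ops / (profile.peak_tops * 1e12)
    memory_time = bytes_moved / (profile.mem_bandwidth_gbs * 1e9)
    latency = max(compute_time, memory_time)  # bound by the slower resource
    energy = ops * profile.pj_per_op * 1e-12 + bytes_moved * profile.pj_per_byte * 1e-12
    return latency, energy

# Example: one memory-bound decode-phase matmul with hypothetical numbers.
gpu_like = HardwareProfile("gpu-like", peak_tops=1000, mem_bandwidth_gbs=3350,
                           pj_per_op=0.5, pj_per_byte=60.0)
pim_like = HardwareProfile("pim-like", peak_tops=200, mem_bandwidth_gbs=4000,
                           pj_per_op=0.5, pj_per_byte=5.0)
for hw in (gpu_like, pim_like):
    t, e = estimate_layer(hw, ops=2e9, bytes_moved=1e9)
    print(f"{hw.name}: latency={t*1e6:.1f} us, energy={e*1e3:.3f} mJ")
```

The point of the sketch is the design choice it reflects: when inference is memory-bound, moving compute into the memory device mainly pays off through the much lower energy cost per byte moved, which is what the paper's energy-per-token and energy-per-query comparisons capture.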
Results and Findings
In the cloud scenario, a configuration of four PIM-AI servers processes 55% more queries per second than a DGX-H100 server while consuming equivalent energy per query, and PIM-AI reduces the 3-year TCO per query-per-second by up to 6.94x compared to the DGX-H100.
In the mobile scenario, the PIM-AI chip outperforms the A17 Pro, Snapdragon 8 Gen3, and Dimensity 9300 SoCs. It achieves a 49.6%, 24.5%, and 24.7% improvement in tokens per second, respectively, and consumes 20 times less energy per token than the A17 Pro and 10 times less than the other SoCs. PIM-AI processes around 45% more queries per second than the A17 Pro and 25% more than the other SoCs, while consuming 6.9x to 13.4x less energy per query.
Implications and Conclusions
The results of this research highlight the potential of the PIM-AI architecture to revolutionize LLM deployments by improving their efficiency, scalability, and sustainability. The significant advantages in performance and energy efficiency, both in cloud and mobile scenarios, demonstrate the viability of this approach in addressing the computational and memory challenges associated with running LLMs.
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Authors: Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu
Source and references: https://arxiv.org/abs/2411.04997v3
Introduction
This research paper introduces LLM2CLIP, a method that efficiently incorporates large language models (LLMs) into CLIP training to significantly enhance cross-modal representation learning.