State of AI

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Nov 29, 2024

Contents

  1. Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

  2. PIM-AI: A Novel Architecture for High-Efficiency LLM Inference

  3. LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

  4. Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

  5. Pushing the Limits of Large Language Model Quantization via the Linearity Theorem

  6. Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition

  7. Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

  8. Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

  9. MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

  10. Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models

  11. Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models

  12. Diffusion Self-Distillation for Zero-Shot Customized Image Generation

  13. Surveying the space of descriptions of a composite system with machine learning

  14. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

  15. Goetterfunke: Creativity in Machinae Sapiens. About the Qualitative Shift in Generative AI with a Focus on Text-To-Image



Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

Authors: Ziyi Tang, Ruilin Wang, Weixing Chen, Keze Wang, Yang Liu, Tianshui Chen, Liang Lin

Source and references: https://arxiv.org/abs/2308.11914v3


Introduction

This paper presents CausalGPT, a framework that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models for knowledge-based reasoning tasks.

Key Points

  • This is the first work to promote faithfulness and causal consistency in foundation models through a multi-agent collaborative paradigm for knowledge reasoning.

  • The proposed CaCo-CoT framework implements two types of agents: faithful reasoners and a causal evaluator.

  • Extensive experiments demonstrate that CaCo-CoT achieves state-of-the-art performance on text-based and multi-modal knowledge reasoning benchmarks.

Methodology

CaCo-CoT employs a hierarchical arrangement of multiple agents, including a set of faithful reasoner agents and causal evaluator agents. The reasoners engage in a structured reasoning process modeled after human causal reasoning, while the evaluator assesses the causal consistency of the generated reasoning chains through non-causal evaluation and counterfactual evaluation.
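
To make the division of labor concrete, here is a minimal Python sketch of the reasoner/evaluator loop described above. The `llm` completion function, the prompts, and the majority-vote fallback are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
# Minimal sketch of the CaCo-CoT agent loop described above.
# `llm` is a placeholder for any chat-completion call; the prompts and the
# simple consistency/majority rule are illustrative assumptions.
from collections import Counter
from typing import Callable, List

def caco_cot(question: str, llm: Callable[[str], str], n_reasoners: int = 3) -> str:
    # Faithful reasoners: each produces a structured reasoning chain ending in an answer.
    chains: List[str] = [
        llm(f"Reason step by step, citing only established knowledge.\nQ: {question}\nA:")
        for _ in range(n_reasoners)
    ]
    answers = [chain.splitlines()[-1].strip() for chain in chains]

    # Causal evaluator: keep only chains passing the non-causal and counterfactual checks.
    consistent = []
    for chain, answer in zip(chains, answers):
        noncausal = llm(
            "Does any step in this reasoning fail to causally support the next? "
            f"Answer YES or NO.\n{chain}"
        )
        counterfactual = llm(
            f"If the key premise were false, would the answer '{answer}' still follow? "
            f"Answer YES or NO.\n{chain}"
        )
        if noncausal.strip().upper().startswith("NO") and \
           counterfactual.strip().upper().startswith("NO"):
            consistent.append(answer)

    # Fall back to a plain vote over all answers if no chain survives evaluation.
    pool = consistent or answers
    return Counter(pool).most_common(1)[0][0]
```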

Results and Findings

CaCo-CoT outperforms state-of-the-art methods on three text-based knowledge reasoning benchmarks (ScienceQA, Com2Sense, BoolQ) and two multi-modal reasoning datasets (MME, MMMU). The framework demonstrates significant superiority in addressing factual errors and inferential fallacies, leading to more faithful reasoning outcomes.

Implications and Conclusions

The proposed CausalGPT framework, which integrates causality principles into foundation models, represents a significant advancement in improving the faithfulness and reasoning capabilities of these models for practical applications across diverse domains.


PIM-AI: A Novel Architecture for High-Efficiency LLM Inference

Authors: Cristobal Ortega, Yann Falevoz, Renaud Ayrignac

Source and references: https://arxiv.org/abs/2411.17309v1


Introduction

This research introduces a novel accelerator architecture called PIM-AI for Large Language Models (LLMs) and other memory-intensive workloads. The key innovation is the integration of computational units directly into the memory chip, significantly reducing data transfer bottlenecks and improving overall performance and energy efficiency.

Key Points

  • PIM-AI is a novel DDR5/LPDDR5 PIM architecture designed for LLM inference without modifying the memory controller or DDR/LPDDR memory PHY.

  • A simulator was developed to evaluate the performance of PIM-AI in various scenarios, including cloud and mobile environments.

  • In cloud-based scenarios, PIM-AI reduces the 3-year total cost of ownership (TCO) per query-per-second by up to 6.94x compared to state-of-the-art GPUs, depending on the LLM model used.

  • In mobile scenarios, PIM-AI achieves a 10- to 20-fold reduction in energy per token compared to state-of-the-art mobile SoCs, resulting in 25 to 45% more queries per second and 6.9x to 13.4x less energy per query.

  • These results highlight PIM-AI's potential to revolutionize LLM deployments, making them more efficient, scalable, and sustainable.

Methodology

The researchers developed a simulator for PyTorch models to compare the proposed PIM-AI architecture with state-of-the-art hardware. The simulator can execute multiple layers and functions during model inference without requiring modifications to the original PyTorch model. It collects metrics such as total number of TOPs, execution time, data transfer sizes, energy consumption, and power consumption for a given hardware profile.

The researchers parameterized the PIM-AI chip and DIMM, as well as state-of-the-art NPUs from mobile SoCs and the NVIDIA H100, to create hardware profiles for the simulator.

Two target scenarios were considered: cloud deployment with larger models (more than 7 billion parameters) and mobile deployment with smaller models (7 billion parameters or fewer).
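
As a rough illustration of the bookkeeping such a simulator performs, the sketch below estimates per-layer latency and energy from a hardware profile with a simple roofline-style max(compute, memory) model. The profile fields and all numbers are placeholders, not the paper's simulator or its measurements.

```python
# Toy illustration of hardware-profile-based latency/energy estimation.
# The values and the max(compute, memory) latency model are illustrative
# assumptions, not the paper's actual simulator or results.
from dataclasses import dataclass

@dataclass
class HardwareProfile:
    name: str
    peak_tops: float     # peak throughput, tera-operations per second
    mem_bw_gbps: float   # memory bandwidth, GB/s
    power_w: float       # average active power, watts

def layer_cost(ops_t: float, bytes_moved_gb: float, hw: HardwareProfile):
    """Return (seconds, joules) for one layer on the given hardware profile."""
    compute_s = ops_t / hw.peak_tops
    memory_s = bytes_moved_gb / hw.mem_bw_gbps
    latency_s = max(compute_s, memory_s)   # whichever resource is the bottleneck
    return latency_s, latency_s * hw.power_w

# Example: one memory-bound decode step (all numbers are made up).
gpu_like = HardwareProfile("GPU-like", peak_tops=1000.0, mem_bw_gbps=3000.0, power_w=700.0)
pim_like = HardwareProfile("PIM-like", peak_tops=200.0, mem_bw_gbps=3000.0, power_w=150.0)
for hw in (gpu_like, pim_like):
    s, j = layer_cost(ops_t=0.01, bytes_moved_gb=14.0, hw=hw)
    print(f"{hw.name}: {s * 1e3:.2f} ms, {j:.2f} J per step")
```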

Results and Findings

In the cloud scenario, a four-server PIM-AI configuration processes 55% more queries per second than a DGX-H100 server while consuming equivalent energy per query, and reduces the 3-year TCO per query-per-second by up to 6.94x compared to the DGX-H100.

In the mobile scenario, the PIM-AI chip outperforms the A17 Pro, Snapdragon 8 Gen3, and Dimensity 9300 SoCs. It achieves a 49.6%, 24.5%, and 24.7% improvement in tokens per second, respectively, and consumes 20 times less energy per token than the A17 Pro and 10 times less than the other SoCs. PIM-AI processes around 45% more queries per second than the A17 Pro and 25% more than the other SoCs, while consuming 6.9x to 13.4x less energy per query.

Implications and Conclusions

The results of this research highlight the potential of the PIM-AI architecture to revolutionize LLM deployments by improving their efficiency, scalability, and sustainability. The significant advantages in performance and energy efficiency, both in cloud and mobile scenarios, demonstrate the viability of this approach in addressing the computational and memory challenges associated with running LLMs.



LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

Authors: Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu

Source and references: https://arxiv.org/abs/2411.04997v3


Introduction

This research paper introduces LLM2CLIP, a method that efficiently incorporates large language models (LLMs) into CLIP training to significantly enhance cross-modal representation learning.
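
The summary above does not detail LLM2CLIP's training recipe, but the general idea of pairing an LLM-derived text tower with CLIP training can be sketched generically as a symmetric contrastive loss over matched image-text embeddings. The code below is that generic sketch, assuming both encoders emit fixed-size embeddings; it is not LLM2CLIP's actual method.

```python
# Generic sketch of a CLIP-style symmetric contrastive loss where the text
# embeddings come from an LLM-based encoder rather than CLIP's original text
# tower. Illustrative only; not LLM2CLIP's specific recipe.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats: torch.Tensor,
                          text_feats: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """image_feats, text_feats: (batch, dim) embeddings for matched pairs."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match images to texts and texts to images.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with placeholder encoders (hypothetical names, assumed to output (batch, dim)):
# img = vision_encoder(pixel_values)                      # e.g. a ViT image tower
# txt = llm_text_head(llm(input_ids).last_hidden_state[:, -1])  # LLM-derived text embedding
# loss = clip_contrastive_loss(img, txt)
```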
