AI News Briefing
30.04.2026
Today's focus was on critical security vulnerabilities in software and massive investments in AI technology. Companies should urgently review their security measures and keep a close eye on developments in AI.
📰 Top Stories
The discovery of a critical vulnerability in the Linux kernel requires immediate action from companies to protect their systems. All major distributions are affected, which adds to the urgency.
A critical vulnerability enabling unauthorized access has been discovered in cPanel and WebHost Manager. Companies should install the available updates as soon as possible to secure their systems.
Tech giants are investing heavily in data centers to avoid falling behind in the AI race. These developments matter greatly to companies that want to remain competitive.
The cybercrime gang ShinyHunters has stolen Vimeo data and published it on the darknet. This poses a serious risk to data security and requires companies to rethink their security strategies.
Nvidia has released Nemotron 3 Nano Omni, an open AI model that can process multimodal data. This opens up new possibilities for companies looking to deploy AI in areas such as marketing and software development.
Particularly relevant for Kai
The discovery of a critical vulnerability in the Linux kernel requires immediate action from companies to protect their systems. All major distributions are affected, which adds to the urgency.
Read the original article
Tech giants are investing heavily in data centers to avoid falling behind in the AI race. These developments matter greatly to companies that want to remain competitive.
Read the original article

Monitor only
🔬 Today's Research Highlights
All 596 papers
Highlight
arXiv cs.CL
Evaluation Revisited: A Taxonomy of Evaluation Concerns in Natural Language Processing
arXiv:2604.25923v1
Abstract: Recent advances in large language models (LLMs) have prompted a growing body of work that questions the methodology of prevailing evaluation practices. However, many such critiques have already been extensively debated in natural language processing (NLP): a field with a long history of methodological reflection on evaluation. We conduct a scoping review of research on evaluation concerns in NLP and develop a taxonomy, synthesizing recurring positions and trade-offs within each area. We also discuss practical implications of the taxonomy, including a structured checklist to support more deliberate evaluation design and interpretation. By situating contemporary debates within their historical context, this work provides a consolidated reference for reasoning about evaluation practices.
Read the full paper on arXiv
Highlight
arXiv cs.CL
One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety
arXiv:2604.25921v1
Abstract: Large Language Models (LLMs) are trained to refuse harmful requests, yet they remain vulnerable to jailbreak attacks that exploit weaknesses in conversational safety mechanisms. We introduce Incremental Completion Decomposition (ICD), a trajectory-based jailbreak strategy that elicits a sequence of single-word continuations related to a malicious request before eliciting the full response. In addition, we propose variants of ICD by manually picking or model-generating the one-word continuation, as well as prefilling when eliciting the full model response in the final step. We systematically evaluate these variants across a broad set of model families, demonstrating superior Attack Success Rate (ASR) on AdvBench, JailbreakBench, and StrongREJECT compared to existing methods. In addition, we provide a theoretical account of why ICD is effective and present mechanistic evidence that successful attack trajectories systematically suppress refusal-related representations and shift activations away from safety-aligned states.
Read the full paper on arXiv
Highlight
arXiv cs.CL
MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
arXiv:2604.25926v1
Abstract: The use of large language models (LLMs) for complex mathematical reasoning is an emergent area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance in multiple choice questions compared to open weight models, but that their performance decreases for questions with figures or open-ended questions. To facilitate future research, we release the benchmark dataset and model outputs.
Read the full paper on arXiv
Highlight
arXiv cs.CL
Information Extraction from Electricity Invoices with General-Purpose Large Language Models
arXiv:2604.25927v1
Abstract: Information extraction from semi-structured business documents remains a critical challenge for enterprise management. This study evaluates the capability of general-purpose Large Language Models to extract structured information from Spanish electricity invoices without task-specific fine-tuning. Using a subset of the IDSEM dataset, we benchmark two architecturally distinct models, Gemini 1.5 Pro and Mistral-small, across 19 parameter configurations and 6 prompting strategies. Our experimental framework treats prompt engineering as the primary experimental variable, comparing zero-shot baselines against increasingly sophisticated few-shot approaches and iterative extraction strategies. Results demonstrate that prompt quality dominates over hyperparameter tuning: the F1-score variation across all parameter configurations is marginal, while the gap between zero-shot and the best few-shot strategy exceeds 19 percentage points. The best configuration (few-shot with cross-validation) achieves an F1-score of 97.61% for Gemini and 96.11% for Mistral-small, with document template structure emerging as the primary determinant of extraction difficulty. These findings establish that prompt design is the critical lever for maximizing extraction fidelity in LLM-based document processing, thereby providing an empirical framework for integrating general-purpose LLMs into business document automation.
Read the full paper on arXiv
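The few-shot extraction setup this abstract describes can be sketched roughly as follows. The field names, example invoice texts, and prompt wording below are hypothetical illustrations; the paper's actual prompts and the IDSEM field schema may differ.

```python
import json

def build_few_shot_prompt(examples, target_text, fields):
    """Assemble a few-shot extraction prompt: an instruction listing
    the target fields, labeled examples, then the invoice to process."""
    parts = [
        "Extract the following fields from the invoice as JSON: "
        + ", ".join(fields)
    ]
    # One labeled demonstration per (invoice text, expected fields) pair.
    for ex_text, ex_fields in examples:
        parts.append(f"Invoice:\n{ex_text}\nJSON:\n{json.dumps(ex_fields)}")
    # The target invoice, left open for the model to complete.
    parts.append(f"Invoice:\n{target_text}\nJSON:")
    return "\n\n".join(parts)

examples = [
    ("Supplier: Iberdrola\nTotal: 42.10 EUR",
     {"supplier": "Iberdrola", "total": "42.10"}),
]
prompt = build_few_shot_prompt(
    examples,
    "Supplier: Endesa\nTotal: 58.30 EUR",
    ["supplier", "total"],
)
```

The resulting string would be sent to whichever model is benchmarked (e.g. Gemini 1.5 Pro or Mistral-small); per the abstract, the choice and quality of such demonstrations matters far more than sampling-parameter tuning.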
Highlight
arXiv cs.CL
CogRAG+: Cognitive-Level Guided Diagnosis and Remediation of Memory and Reasoning Deficiencies in Professional Exam QA
arXiv:2604.25928v1
Abstract: Professional domain knowledge underpins human civilization, serving as both the basis for industry entry and the core of complex decision-making and problem-solving. However, existing large language models often suffer from opaque inference processes in which retrieval and reasoning are tightly entangled, causing knowledge gaps and reasoning inconsistencies in professional tasks. To address this, we propose CogRAG+, a training-free framework that decouples and aligns the retrieval-augmented generation pipeline with human cognitive hierarchies. First, we introduce Reinforced Retrieval, a judge-driven dual-path strategy with fact-centric and option-centric paths that strengthens retrieval and mitigates cascading failures caused by missing foundational knowledge. We then develop cognition-stratified Constrained Reasoning, which replaces unconstrained chain-of-thought generation with structured templates to reduce logical inconsistency and generative redundancy. Experiments on two representative models, Qwen3-8B and Llama3.1-8B, show that CogRAG+ consistently outperforms general-purpose models and standard RAG methods on the Registered Dietitian qualification exam. In single-question mode, it raises overall accuracy to 85.8% for Qwen3-8B and 60.3% for Llama3.1-8B, with clear gains over vanilla baselines. Constrained Reasoning also reduces the unanswered rate from 7.6% to 1.4%. CogRAG+ offers a robust, model-agnostic path toward training-free expert-level performance in specialized domains.
Read the full paper on arXiv