eess updates on arXiv.org
eess updates on the arXiv.org e-print archive.
RMSup: Physics-Informed Radio Map Super-Resolution for Compute-Enhanced Integrated Sensing and Communications
oai:arXiv.org:2512.10965v1
arXiv:2512.10965v1 Announce Type: new
Abstract: Radio maps (RMs) provide a spatially continuous description of wireless propagation, enabling cross-layer optimization and unifying communication and sensing for integrated sensing and communications (ISAC). However, constructing high-fidelity RMs at operational scales is difficult, since physics-based solvers are time-consuming and require precise scene models, while learning methods degrade under incomplete priors and sparse measurements, often smoothing away critical discontinuities. We present RMSup, a physics-informed super-resolution framework that functions with uniform sparse sampling and imperfect environment priors. RMSup extracts Helmholtz equation-informed boundary and singularity prompts from the measurements, fuses them with base-station side information and coarse scene descriptors as conditional inputs, and employs a boundary-aware dual-head network to jointly reconstruct a high-fidelity RM and recover environmental contours. Experimental results show that the proposed RMSup achieves state-of-the-art performance in both RM construction and ISAC-related environment sensing.
Uplink Rate Maximization for Pinching Antenna-Assisted Covert Backscatter Communication
oai:arXiv.org:2512.10970v1
arXiv:2512.10970v1 Announce Type: new
Abstract: The emerging pinching antenna (PA) technology enables flexible antenna positioning for creating line-of-sight (LoS) links, thus offering substantial potential to facilitate ambient signal-based backscatter communication (BSC). This paper investigates PA-assisted BSC for enhanced communication and covertness in the presence of a randomly distributed eavesdropper. An optimization problem is formulated to maximize the uplink covert transmission rate by jointly optimizing the transmit power and antenna positions while satisfying both communication reliability and covertness constraints. An alternating optimization (AO)-based framework is proposed to solve this problem. Numerical results demonstrate that the proposed PA-BSC effectively mitigates the double near-far problem, where energy harvesting and backscatter transmission degrade simultaneously due to distance disparities, thereby improving downlink energy harvesting and uplink data transmission while maintaining covertness performance under practical deployment scenarios.
An Open Source Realtime GPU Beamformer for Row-Column and Top Orthogonal to Bottom Electrode (TOBE) Arrays
oai:arXiv.org:2512.11086v1
arXiv:2512.11086v1 Announce Type: new
Abstract: Research ultrasound platforms have enabled many next-generation imaging sequences but have lacked realtime navigation capabilities for emerging 2D arrays such as row-column arrays (RCAs). We present an open-source, GPU-accelerated reconstruction and rendering software suite integrated with a programmable ultrasound platform and novel electrostrictive Top-Orthogonal-to-Bottom-Electrode (TOBE) arrays. The system supports advanced realtime modes, including cross-plane aperture-encoded synthetic-aperture imaging and aperture-encoded volumetric scanning. TOBE-enabled methods demonstrate improved image quality and expanded field of view compared with conventional RCA techniques. The software implements beamforming and rendering kernels using OpenGL compute shaders and is designed for maximum data throughput, helping to minimize stalls and latency. Accompanying sample datasets and example scripts for offline reconstruction are provided to facilitate external testing.
Feature Compression for Machines with Range-Based Channel Truncation and Frame Packing
oai:arXiv.org:2512.11134v1
arXiv:2512.11134v1 Announce Type: new
Abstract: This paper proposes a method that enhances the compression performance of the current model under development for the upcoming MPEG standard on Feature Coding for Machines (FCM). This standard aims at providing inter-operable compressed bitstreams of features in the context of split computing, i.e., when the inference of a large computer vision neural-network (NN)-based model is split between two devices. Intermediate features can consist of multiple 3D tensors that can be reduced and entropy coded to limit the required bandwidth of such transmission. In the envisioned design for the MPEG-FCM standard, intermediate feature tensors may be reduced using neural layers before being converted into 2D video frames that can be coded using existing video compression standards. This paper introduces an additional channel truncation and packing method that retains the relevant channels, depending on the statistics of the features at inference time, while preserving the computer vision task performance at the receiver. Implemented within the MPEG-FCM test model, the proposed method yields an average rate reduction of 10.59% for a given accuracy on multiple computer vision tasks and datasets.
A Unified Theory of Dynamic Programming Algorithms in Small Target Detection
oai:arXiv.org:2512.11170v1
arXiv:2512.11170v1 Announce Type: new
Abstract: Small target detection is inherently challenging due to the minimal target size, the lack of distinctive features, and the presence of complex backgrounds. Heavy noise further complicates the task by both obscuring and imitating the target appearance. Weak target signals require integrating target trajectories over multiple frames, an approach that can be computationally intensive. Dynamic programming offers an efficient solution by decomposing the problem into iterative maximization. This decomposition, however, has limited the analytical tools available for studying these algorithms. In this paper, we present a robust framework for this class of algorithms and establish rigorous convergence results for error rates under mild assumptions. We depart from standard analysis by modeling error probabilities as a function of distance from the target, allowing us to construct a relationship between uncertainty in location and uncertainty in existence. From this framework, we introduce a novel algorithm, Normalized Path Integration (NPI), that utilizes the similarity between sequential observations, enabling target detection with unknown or time-varying features.
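The iterative maximization mentioned in this abstract can be illustrated with a generic dynamic-programming track-before-detect recursion. The sketch below is not the paper's NPI algorithm. The function name `dp_track`, the one-dimensional position grid, and the `max_step` motion limit are all assumptions made for illustration.

```python
def dp_track(frames, max_step=1):
    """Generic DP track-before-detect on a 1D grid (illustrative assumption).

    frames: list of equal-length lists; frames[t][x] is the score of cell x
    in frame t. A path may move at most max_step cells between frames.
    Returns (best accumulated score, list of cell indices per frame).
    """
    n = len(frames[0])
    value = list(frames[0])          # best accumulated score ending in each cell
    back = []                        # back-pointers, one list per later frame
    for frame in frames[1:]:
        new_value, pointers = [], []
        for x in range(n):
            lo, hi = max(0, x - max_step), min(n - 1, x + max_step)
            # Iterative maximization: best predecessor within the motion limit.
            prev = max(range(lo, hi + 1), key=lambda p: value[p])
            new_value.append(frame[x] + value[prev])
            pointers.append(prev)
        value = new_value
        back.append(pointers)
    # Trace the best path backwards from the best final cell.
    end = max(range(n), key=lambda x: value[x])
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return value[end], path


if __name__ == "__main__":
    demo = [[0, 3, 0, 0], [0, 0, 2, 0], [0, 0, 0, 4]]
    print(dp_track(demo))
```

Each frame costs a number of operations proportional to the grid size times the motion window, which is why this decomposition is cheaper than enumerating every candidate trajectory.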
Mitigating Dynamic Tip-Over during Mobile Crane Slewing using Input Shaping
oai:arXiv.org:2512.11228v1
arXiv:2512.11228v1 Announce Type: new
Abstract: Payload swing during rapid slewing of mobile cranes poses a safety risk, as it generates overturning moments that can lead to tip-over accidents. Currently, to limit the risk of tip-over, mobile crane operators are forced to either reduce the slewing speed or reduce the load being carried to lower the induced moments. Both of these approaches reduce productivity. This paper seeks to enable rapid slewing without compromising safety by applying input shaping to the crane-slewing commands generated by the operator. A key advantage of this approach is that the input shaper requires only information about the rope length and does not require detailed mobile crane dynamics. Simulations and experiments show that the proposed method reduces residual payload swing and enables significantly higher slewing speeds without tip-over, reducing slewing completion time by at least 38% compared to unshaped control. Human control with input shaping improves task completion time by 13%, reduces the peak swing by 18%, and reduces the potential of collisions by 82% when compared to unshaped control. Moreover, shaped control with a human had no tip-over, whereas large swing led to tip-over without input shaping. Thereby, the proposed method substantially recovers the operational-safety envelope of mobile cranes (designed to avoid tip-over using static analysis) that would otherwise be lost in dynamic conditions. Videos and demonstrations are available at https://youtu.be/dVy3bbIhrBU.
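To illustrate why rope length alone can be enough to design a shaper, here is a minimal sketch of a two-impulse zero-vibration (ZV) shaper for an undamped point-mass pendulum. The abstract does not say which shaper the authors use, so the ZV choice, the undamped pendulum model, and the names `zv_shaper`, `shape_command`, and `residual_vibration` are assumptions.

```python
import cmath
import math

G = 9.81  # gravitational acceleration in m/s^2


def zv_shaper(rope_length_m):
    """Return [(time_s, amplitude), ...] for an undamped two-impulse ZV shaper.

    Assumes the payload swings like a simple pendulum, so the natural
    frequency depends only on the rope length: omega = sqrt(g / L).
    """
    omega = math.sqrt(G / rope_length_m)
    half_period = math.pi / omega
    return [(0.0, 0.5), (half_period, 0.5)]


def shape_command(command, dt, impulses):
    """Convolve a sampled command (list of floats) with the shaper impulses."""
    shifts = [int(round(t / dt)) for t, _ in impulses]
    shaped = [0.0] * (len(command) + max(shifts))
    for shift, (_, amplitude) in zip(shifts, impulses):
        for i, u in enumerate(command):
            shaped[i + shift] += amplitude * u
    return shaped


def residual_vibration(impulses, omega):
    """Relative residual swing amplitude of an undamped mode at frequency omega."""
    return abs(sum(a * cmath.exp(1j * omega * t) for t, a in impulses))


if __name__ == "__main__":
    impulses = zv_shaper(rope_length_m=2.0)
    print(impulses)
    print(residual_vibration(impulses, math.sqrt(G / 2.0)))
```

The second impulse arrives half a swing period after the first, so the swing excited by the first half of the command is cancelled by the second half. The price is a command that finishes half a period later than the unshaped one.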
Robust Detection of Underwater Target Against Non-Uniform Noise With Optical Fiber DAS Array
oai:arXiv.org:2512.11231v1
arXiv:2512.11231v1 Announce Type: new
Abstract: The detection of underwater targets is severely affected by the non-uniform spatial characteristics of marine environmental noise. Additionally, the presence of both natural and anthropogenic acoustic sources, including shipping traffic, marine life, and geological activity, further complicates the underwater acoustic landscape. Addressing these challenges requires advanced underwater sensors and robust signal processing techniques. In this paper, we present a novel approach that leverages an optical fiber distributed acoustic sensing (DAS) system combined with a broadband generalized sparse covariance-fitting framework for underwater target direction sensing, particularly focusing on robustness against non-uniform noise. The DAS system incorporates a newly developed spiral-sensitized optical cable, which significantly improves sensitivity compared to conventional submarine cables. This innovative design enables the system to capture acoustic signals with greater precision. Notably, the sensitivity of the spiral-wound sensitized cable is around -145.69 dB re: 1 rad/(μPa·m), as measured inside the standing-wave tube. Employing simulations, we assess the performance of the algorithm across diverse noise levels and target configurations, consistently revealing higher accuracy and reduced background noise compared to conventional beamforming techniques and other sparse techniques. In a controlled pool experiment, the correlation coefficient between waveforms acquired by the DAS system and a standard hydrophone reached 0.973, indicating high fidelity in signal capture.
Model Reduction of Multicellular Communication Systems via Singular Perturbation: Sender Receiver Systems
oai:arXiv.org:2512.11244v1
arXiv:2512.11244v1 Announce Type: new
Abstract: We investigate multicellular sender-receiver systems embedded in hydrogel beads, where diffusible signals mediate interactions among heterogeneous cells. Such systems are modeled by PDE-ODE couplings that combine three-dimensional diffusion with nonlinear intracellular dynamics, making analysis and simulation challenging. We show that the diffusion dynamics converge exponentially to a quasi-steady spatial profile and use singular perturbation theory to reduce the model to a finite-dimensional multi-agent network. A closed-form communication matrix derived from the spherical Green's function captures the effective sender-receiver coupling. Numerical results show that the reduced model closely matches the full dynamics while enabling scalable simulation of large cell populations.
Gig-work Management System with Chance-Constraints Verification Algorithm
oai:arXiv.org:2512.11308v1
arXiv:2512.11308v1 Announce Type: new
Abstract: This paper proposes the framework of an efficient gig-work management system. A gig-work management system recommends one-off tasks with information about task hours and wages to gig-workers. To enable effective management, this paper develops a model of gig-workers' decision-making. Then, based on the model, we formulate an optimization problem to determine the optimal task hours and wages. The formulated problem belongs to the class of chance-constrained model predictive control (CC-MPC) problems. To efficiently solve the CC-MPC problem, we develop an approximate solution algorithm with guaranteed confidence levels. Finally, we develop gig-worker models based on data collected through crowdsourcing.
Controlled Evolution-Based Day-Ahead Robust Dispatch Considering Frequency Security with Frequency Regulation Loads and Curtailable Loads
oai:arXiv.org:2512.11333v1
arXiv:2512.11333v1 Announce Type: new
Abstract: With the extensive integration of volatile and uncertain renewable energy, power systems face significant challenges in primary frequency regulation due to instantaneous power fluctuations. Moreover, the maximum frequency deviation constraint is inherently non-convex, and commonly used two-stage dispatch methods overlook causality, potentially resulting in infeasible day-ahead decisions. This paper presents a controlled evolution-based day-ahead robust dispatch method to address these issues. First, we apply a convex relaxation technique to transform the maximum frequency deviation constraint and thereby facilitate optimization. Then, an evolution-based robust dispatch framework is introduced to align day-ahead decisions with intraday strategies, ensuring both frequency security and power supply reliability. Additionally, a novel controlled evolution-based algorithm is developed to solve this framework efficiently. Case studies on a modified IEEE 14-bus system demonstrate the superiority of the proposed method in enhancing frequency security and system reliability.
Source Localization and Power Estimation through RISs: Performance Analysis and Prototype Validations
oai:arXiv.org:2512.11420v1
arXiv:2512.11420v1 Announce Type: new
Abstract: This paper investigates the capabilities and effectiveness of backward localization centered on reconfigurable intelligent surfaces (RISs). In the backward sensing paradigm, the region of interest (RoI) is illuminated using a set of diverse radiation patterns. These patterns encode spatial information into a sequence of measurements, which are subsequently processed to reconstruct the RoI. We show that a single RIS can estimate the direction of arrival of incident waves by leveraging configurational diversity, and that the spatial diversity provided by multiple RISs further improves the accuracy of source localization and power estimation. The underlying structure of the sensing operator in the multi-snapshot measurement process is clarified. For single-RIS localization, the sensing operator is decomposed into a product of structured matrices, each corresponding to a specific physical process: wave propagation to and from the RIS, the relative phase offsets of elements with respect to the reference point, and the applied phase configuration of each element. A unified framework for identifying key performance indicators is established by analyzing the conditioning of the sensing operators. In the multi-RIS setting, we derive--via rank analysis--the governing law among the RoI size, the number of elements, and the number of measurements. Upper bounds on the relative error of the least squares reconstruction algorithm are derived. These bounds clarify how key performance indicators affect estimation error and provide valuable guidance for system-level optimization. Numerical experiments confirm that the trend of the relative error is consistent with the theoretical bounds.
Point Target Near-Field Bistatic Imaging: Chirp-Based Aliasing Analysis
oai:arXiv.org:2512.11444v1
arXiv:2512.11444v1 Announce Type: new
Abstract: This paper presents a chirp-based framework for characterising aliasing in a bistatic Near-Field (NF) imaging system equipped with multidimensional antenna arrays. Extending monostatic formulations, we derive closed-form expressions for the maximum spatial frequency, enabling the analytical derivations of the conditions for aliasing-free image reconstruction. The framework also provides a geometric interpretation of aliasing based on the antenna array geometry, target position, and antenna element spacing. Numerical results corroborate theoretical findings and show that the aliasing-free region enlarges with smaller antenna spacing, greater target range, lower array dimensionality, and smaller arrays. These results enable more effective design of bistatic NF imaging systems.
STAR-RIS-Aided Secure Communications: Analytical Insights and Performance Comparison
oai:arXiv.org:2512.11461v1
arXiv:2512.11461v1 Announce Type: new
Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) have emerged as a promising technology for enabling full-space signal manipulation and enhancing wireless network coverage and capacity. In this article, we present a comprehensive analytical comparison of STAR-RIS-assisted systems with single-input single-output (SISO), conventional RIS, and decode-and-forward (DF) relaying schemes, including both half-duplex (HD) and full-duplex (FD) modes. Closed-form expressions are derived for the achievable rates and secrecy rates of STAR-RIS-aided communications in the absence and presence of eavesdroppers, respectively. Unlike most existing works, the direct source-destination link is incorporated in all considered schemes, and optimal transmit power allocation is investigated for HD- and FD-DF relaying. Furthermore, we provide the conditions under which STAR-RIS outperforms HD- and FD-DF relaying and quantify the minimum number of STAR-RIS elements required to achieve superior rates. The impacts of key system parameters, including transmit power, number of elements, reflection-to-transmission power ratio, element-splitting factor, and deployment positions, on both achievable-rate and secrecy performance are investigated. The results reveal that STAR-RIS systems can achieve superior rates and secrecy rates compared to all benchmark schemes.
An Input-Output Data-Driven Dissipativity Approach for Compositional Stability Certification of Interconnected LTI MIMO Systems
oai:arXiv.org:2512.11468v2
arXiv:2512.11468v2 Announce Type: new
Abstract: We propose an input-output data-driven framework for certifying the stability of interconnected multiple-input multiple-output linear time-invariant discrete-time systems via QSR-dissipativity. That is, by using measured input-output trajectories of each subsystem, we verify dissipative properties and extract local passivity indices without requiring explicit model identification. These passivity indices are then used to derive conditions under which the equilibrium of the interconnected system is stable. In particular, the framework identifies how the lack of passivity in some subsystems can be compensated by surpluses in others. The proposed approach enables a compositional stability analysis by combining subsystem-level conditions into a criterion valid for the overall interconnected system. We illustrate, via a numerical case study, how to compute channel-wise passivity indices and infer stability guarantees directly from data with the proposed method.
A Robust Model Predictive Control Method for Networked Control Systems
oai:arXiv.org:2512.11481v1
arXiv:2512.11481v1 Announce Type: new
Abstract: Robustly compensating for network constraints such as delays and packet dropouts in networked control systems is crucial for remotely controlling dynamical systems. This work proposes a novel prediction-consistent method to cope with delays and packet losses as encountered in UDP-type communication systems. The augmented control system preserves all properties of the original model predictive control method under the network constraints. Furthermore, we propose to use linear tube MPC with the novel method and show that the system converges robustly to the origin under mild conditions. We illustrate this with simulation examples of a cart-pole and a continuous stirred tank reactor.
Optimal Delay Compensation in Networked Predictive Control
oai:arXiv.org:2512.11492v1
arXiv:2512.11492v1 Announce Type: new
Abstract: Networked Predictive Control is widely used to mitigate the effect of delays and dropouts in Networked Control Systems, particularly when these exceed the sampling time. A key design choice of these methods is the delay bound, which determines the prediction horizon and the robustness to information loss. This work develops a systematic method to select the optimal bound by quantifying the trade-off between prediction errors and open-loop operation caused by communication losses. Simulation studies demonstrate the performance gains achieved with the optimal bound.
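The role of the delay bound can be sketched with a generic actuator-side buffer of the kind used in packetized predictive control. This is not the paper's selection method, which optimizes the bound; here the bound is simply a fixed parameter. The class name `ActuatorBuffer`, integer time steps, and the zero fallback input are assumptions.

```python
class ActuatorBuffer:
    """Actuator-side buffer for predicted input sequences (illustrative sketch).

    The controller sends, at time step sent_at, a sequence whose k-th entry
    is the input planned for time sent_at + k. The actuator applies the entry
    matching the packet's age, up to delay_bound steps; beyond that it runs
    open loop with a fallback input.
    """

    def __init__(self, delay_bound, fallback=0.0):
        self.delay_bound = delay_bound
        self.fallback = fallback
        self.sent_at = None
        self.sequence = None

    def receive(self, sent_at, sequence):
        # Keep only the most recently sent packet; late, older packets are dropped.
        if self.sent_at is None or sent_at > self.sent_at:
            self.sent_at = sent_at
            self.sequence = list(sequence)

    def input_at(self, now):
        if self.sequence is None:
            return self.fallback
        age = now - self.sent_at
        if age < 0 or age > self.delay_bound or age >= len(self.sequence):
            return self.fallback  # prediction exhausted: open-loop operation
        return self.sequence[age]


if __name__ == "__main__":
    buf = ActuatorBuffer(delay_bound=2)
    buf.receive(sent_at=0, sequence=[1.0, 2.0, 3.0])
    print([buf.input_at(t) for t in range(5)])
```

A larger bound lets the actuator ride through longer outages but relies on predictions further into the future, which is the trade-off between prediction error and open-loop operation that the abstract describes.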
Shared Situational Awareness Using Hybrid Zonotopes with Confidence Metric
oai:arXiv.org:2512.11493v1
arXiv:2512.11493v1 Announce Type: new
Abstract: Situational awareness for connected and automated vehicles describes the ability to perceive and predict the behavior of other road-users in the near surroundings. However, pedestrians can become occluded by vehicles or infrastructure, creating significant safety risks due to limited visibility. Vehicle-to-everything communication enables the sharing of perception data between connected road-users, allowing for a more comprehensive awareness. The main challenge is how to fuse perception data when measurements are inconsistent with the true locations of pedestrians. Inconsistent measurements can occur due to sensor noise, false positives, or communication issues. This paper employs set-based estimation with constrained zonotopes to compute a confidence metric for the measurement set from each sensor. These sets and their confidences are then fused using hybrid zonotopes. This method can account for inconsistent measurements, enabling reliable and robust fusion of the sensor data. The effectiveness of the proposed method is demonstrated in both simulation and real experiments.
Data-driven control-oriented modelling for MPC-based control of urban drainage systems
oai:arXiv.org:2512.11531v1
arXiv:2512.11531v1 Announce Type: new
Abstract: This article presents a data-driven, control-oriented modelling methodology for urban drainage systems (UDS). The proposed framework requires three key components: input-output data from the element to be modelled, expert knowledge to define the model structure, and data-fitting techniques to obtain optimal parameters. The methodology is evaluated using a realistic benchmark from a UDS in Madrid, Spain. The results show high model accuracy and improved performance within an MPC scheme, reducing discharge and increasing the utilization of treatment facilities.
RadarFuseNet: Complex-Valued Attention-Based Fusion of IQ Time- and Frequency-Domain Radar Features for Classification Tasks
oai:arXiv.org:2512.11537v1
arXiv:2512.11537v1 Announce Type: new
Abstract: Millimeter-wave (mmWave) radar has emerged as a compact and powerful sensing modality for advanced perception tasks that leverage machine learning techniques. It is particularly effective in scenarios where vision-based sensors fail to capture reliable information, such as detecting occluded objects or distinguishing between different surface materials in indoor environments. Due to the non-linear characteristics of mmWave radar signals, deep learning-based methods are well suited for extracting relevant information from in-phase and quadrature (IQ) data. However, the current state of the art in IQ signal-based occluded-object and material classification still offers substantial potential for further improvement. In this paper, we propose a bidirectional cross-attention fusion network that combines IQ-signal and FFT-transformed radar features obtained by distinct complex-valued convolutional neural networks (CNNs). The proposed method achieves improved performance and robustness compared to standalone complex-valued CNNs. We achieve a near-perfect material classification accuracy of 99.92% on samples collected at the same sensor-to-surface distances as those used during training, and an improved accuracy of 67.38% on samples measured at previously unseen distances, demonstrating improved generalization ability across varying measurement conditions. Furthermore, the accuracy for occluded object classification improves from 91.99% using standalone complex-valued CNNs to 94.20% using our proposed approach.
All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR
oai:arXiv.org:2512.11543v1
arXiv:2512.11543v1 Announce Type: new
Abstract: This paper proposes a unified framework, All-in-One ASR, that allows a single model to support multiple automatic speech recognition (ASR) paradigms, including connectionist temporal classification (CTC), attention-based encoder-decoder (AED), and Transducer, in both offline and streaming modes. While each ASR architecture offers distinct advantages and trade-offs depending on the application, maintaining separate models for each scenario incurs substantial development and deployment costs. To address this issue, we introduce a multi-mode joiner that enables seamless integration of various ASR modes within a single unified model. Experiments show that All-in-One ASR significantly reduces the total model footprint while matching or even surpassing the recognition performance of individually optimized ASR models. Furthermore, joint decoding leverages the complementary strengths of different ASR modes, yielding additional improvements in recognition accuracy.
ACCOR: Attention-Enhanced Complex-Valued Contrastive Learning for Occluded Object Classification Using mmWave Radar IQ Signals
oai:arXiv.org:2512.11556v1
arXiv:2512.11556v1 Announce Type: new
Abstract: Millimeter-wave (mmWave) radar has emerged as a robust sensing modality for several areas, offering reliable operation under adverse environmental conditions. Its ability to penetrate lightweight materials such as packaging or thin walls enables non-visual sensing in industrial and automated environments and can provide robotic platforms with enhanced environmental perception when used alongside optical sensors. Recent work with MIMO mmWave radar has demonstrated its ability to penetrate cardboard packaging for occluded object classification. However, existing models leave room for improvement and warrant a more thorough evaluation across different sensing frequencies. In this paper, we propose ACCOR, an attention-enhanced complex-valued contrastive learning approach for radar, enabling robust occluded object classification. We process complex-valued IQ radar signals using a complex-valued CNN backbone, followed by a multi-head attention layer and a hybrid loss. Our proposed loss combines a weighted cross-entropy term with a supervised contrastive term. We further extend an existing 64 GHz dataset with a 67 GHz subset of the occluded objects and evaluate our model using both center frequencies. Performance evaluation demonstrates that our approach outperforms prior radar-specific models and image classification models with adapted input, achieving classification accuracies of 96.60% at 64 GHz and 93.59% at 67 GHz for ten different objects. These results demonstrate the benefits of complex-valued deep learning with attention and contrastive learning for mmWave radar-based occluded object classification in industrial and automated environments.
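The hybrid loss described in this abstract, a weighted cross-entropy term plus a supervised contrastive term, can be sketched in plain Python. The abstract does not give the exact formulation, so the following are assumptions: per-class weights in the cross-entropy, the widely used supervised contrastive form over L2-normalized real-valued embeddings, the mixing factor `lam`, the `temperature`, and all function names.

```python
import math


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, normalized by the sum of sample weights."""
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        total += class_weights[y] * (log_z - row[y])
        weight_sum += class_weights[y]
    return total / weight_sum


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss; anchors without a same-label partner are skipped."""
    normed = []
    for e in embeddings:
        norm = math.sqrt(sum(v * v for v in e))
        normed.append([v / norm for v in e])
    n = len(normed)
    losses = []
    for i in range(n):
        sims = [sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                for j in range(n)]
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        m = max(sims[j] for j in others)
        log_denom = m + math.log(sum(math.exp(sims[j] - m) for j in others))
        losses.append(-sum(sims[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def hybrid_loss(logits, embeddings, labels, class_weights, lam=0.5, temperature=0.1):
    """Weighted cross-entropy plus lam times the supervised contrastive term."""
    return (weighted_cross_entropy(logits, labels, class_weights)
            + lam * supervised_contrastive(embeddings, labels, temperature))


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 2.0], [1.5, 0.5], [0.2, 1.0]]
    emb = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.0, 1.0]]
    print(hybrid_loss(logits, emb, [0, 1, 0, 1], class_weights=[1.0, 2.0]))
```

The cross-entropy term trains the classifier head, while the contrastive term pulls embeddings with the same label together and pushes other classes apart. In a training pipeline these functions would be written with an automatic-differentiation framework rather than plain lists.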
A Modeling and Optimization Framework for Fostering Modal Shift through the Integration of Tradable Credits and Demand-Responsive Autonomous Shuttles
oai:arXiv.org:2512.11607v1
arXiv:2512.11607v1 Announce Type: new
Abstract: Tradable Credit Schemes (TCS) promote the use of public and shared transport by capping private car usage while maintaining fair welfare outcomes by allowing credit trading. However, most existing studies assume unlimited public transit capacity or a fixed occupancy of shared modes, often neglecting waiting time and oversimplifying time-based costs by depending solely on in-vehicle travel time. These assumptions can overstate the system's performance with TCS regulation, especially when there are insufficient public or shared transport supplies.
To address this, we develop a dynamic multimodal equilibrium model to capture operation constraints and induced waiting times under TCS regulation. The model integrates travelers' mode choices, credit trading, traffic dynamics, and waiting time, which depend on key operational features of service vehicles such as fleet size and capacity.
Moreover, most TCS studies assume a fixed transport supply, overlooking supply-side responses triggered by demand shifts. Therefore, we further integrate adaptive supply management through the deployment of Demand-Responsive Autonomous Shuttles (DRAS) and develop a bi-level optimization framework that incorporates the equilibrium model to jointly optimize the TCS design and the operational strategies for the DRAS.
We apply the framework to a section of the A10 highway near Paris, France, to examine demand-supply interactions and assess the potential benefits of jointly implementing TCS and DRAS. Numerical results demonstrate the importance of modeling operational features within multimodal equilibrium and incorporating flexible supply in TCS policies for mitigating overall generalized cost.
PaddleSat Optical Charging Station in Space
oai:arXiv.org:2512.11629v1
arXiv:2512.11629v1 Announce Type: new
Abstract: This work investigates the feasibility and design trade-offs for a companion spacecraft, or PaddleSat, to charge a host spacecraft by wirelessly transmitting power using a directional laser system. The primary goal of the PaddleSat is to supplement power on a host spacecraft, either to reduce the requirements for the host's onboard power systems or to extend mission lifetimes. System performance estimates, link budget calculations, optical transmission hardware and link analysis, and design trade-offs among beam divergence, optical efficiency, and relative orbital control requirements are examined.
Two-dimensional Decompositions of High-dimensional Configurations for Efficient Multi-vehicle Coordination at Intelligent Intersections
oai:arXiv.org:2512.11713v1
arXiv:2512.11713v1 Announce Type: new
Abstract: For complex multi-vehicle traffic scenarios in shared spaces such as intelligent intersections, safe coordination and trajectory planning are challenging due to computational complexity. To meet this challenge, we introduce a computationally efficient method for generating collision-free trajectories along predefined vehicle paths. We reformulate a constrained minimum-time trajectory planning problem as a problem in a high-dimensional configuration space, where conflict zones are modeled by high-dimensional polyhedra constructed from two-dimensional rectangles. Still, in such a formulation, the computational complexity increases significantly as the number of vehicles involved increases. To address this, we propose two algorithms for near-optimal local optimization that significantly reduce the computational complexity by decomposing the high-dimensional problem into a sequence of 2D graph search problems. The resulting trajectories are then incorporated into a Nonlinear Model Predictive Control (NMPC) framework to ensure safe and smooth vehicle motion. We furthermore show in numerical evaluations that this approach significantly outperforms existing MILP-based time scheduling in terms of both objective value and computation time.
Model Error Resonance: The Geometric Nature of Error Dynamics
oai:arXiv.org:2512.11734v1
arXiv:2512.11734v1 Announce Type: new
Abstract: This paper introduces a geometric theory of model error, treating true and model dynamics as geodesic flows generated by distinct affine connections on a smooth manifold. When these connections differ, the resulting trajectory discrepancy--termed the Latent Error Dynamic Response (LEDR)--acquires an intrinsic dynamical structure governed by curvature. We show that the LEDR satisfies a Jacobi-type equation, where curvature mismatch acts as an explicit forcing term. In the important case of a flat model connection, the LEDR reduces to a classical Jacobi field on the true manifold, causing Model Error Resonance (MER) to emerge under positive sectional curvature. The theory is extended to a discrete-time analogue, establishing that this geometric structure and its resonant behavior persist in sampled systems. A closed-form analysis of a sphere--plane example demonstrates that curvature can be inferred directly from the LEDR evolution. This framework provides a unified geometric interpretation of structured error dynamics and offers foundational tools for curvature-informed model validation.
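As background for the sphere--plane example, the classical Jacobi equation and its constant-curvature solutions are written out below. These are standard textbook results, not the paper's LEDR equation with its curvature-mismatch forcing term. The symbols $J$, $j$, $K$, and $r$ are notation assumed here.

```latex
% Jacobi equation along a unit-speed geodesic \gamma with curvature tensor R
\[
  \frac{D^2 J}{dt^2} + R(J, \dot\gamma)\,\dot\gamma = 0 .
\]
% Constant sectional curvature K, J = j(t) E(t) with E a parallel unit field
% normal to \gamma, and initial conditions j(0) = 0, \dot{j}(0) = 1:
\[
  \ddot{j}(t) + K\, j(t) = 0, \qquad
  j(t) =
  \begin{cases}
    \dfrac{\sin(\sqrt{K}\, t)}{\sqrt{K}}, & K > 0 \quad (\text{sphere of radius } r,\ K = 1/r^2), \\[2ex]
    t, & K = 0 \quad (\text{plane}).
  \end{cases}
\]
% Wherever j(t) is nonzero, the curvature can be read off from the evolution:
\[
  K = -\frac{\ddot{j}(t)}{j(t)} .
\]
```

On the sphere, neighbouring geodesics separate and reconverge periodically, whereas in the plane they separate linearly. The last identity shows how, in this constant-curvature setting, the curvature could be recovered from an observed deviation.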
mViSE: A Visual Search Engine for Analyzing Multiplex IHC Brain Tissue Images
oai:arXiv.org:2512.11745v1
arXiv:2512.11745v1 Announce Type: new
Abstract: Whole-slide multiplex imaging of brain tissue generates massive information-dense images that are challenging to analyze and require custom software. We present an alternative query-driven programming-free strategy using a multiplex visual search engine (mViSE) that learns the multifaceted brain tissue chemoarchitecture, cytoarchitecture, and myeloarchitecture. Our divide-and-conquer strategy organizes the data into panels of related molecular markers and uses self-supervised learning to train a multiplex encoder for each panel with explicit visual confirmation of successful learning. Multiple panels can be combined to process visual queries for retrieving similar communities of individual cells or multicellular niches using information-theoretic methods. The retrievals can be used for diverse purposes including tissue exploration, delineating brain regions and cortical cell layers, and profiling and comparing brain regions, all without computer programming. We validated mViSE's ability to retrieve single cells, proximal cell pairs, and tissue patches, and to delineate cortical layers, brain regions, and sub-regions. mViSE is provided as an open-source QuPath plug-in.
LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems
oai:arXiv.org:2512.11750v1
arXiv:2512.11750v1 Announce Type: new
Abstract: Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification tools fall short when faced with systems that embed both opaque, black-box AI components and complex stochastic dynamics. To address these challenges, we introduce LUCID (Learning-enabled Uncertainty-aware Certification of stochastIc Dynamical systems), a verification engine for certifying safety of black-box stochastic dynamical systems from a finite dataset of random state transitions. As such, LUCID is the first known tool capable of establishing quantified safety guarantees for such systems. Thanks to its modular architecture and extensive documentation, LUCID is designed for easy extensibility. LUCID employs a data-driven methodology rooted in control barrier certificates, which are learned directly from system transition data, to ensure formal safety guarantees. We use conditional mean embeddings to embed data into a reproducing kernel Hilbert space (RKHS), where an RKHS ambiguity set is constructed that can be inflated to robustify the result to out-of-distribution behavior. A key innovation within LUCID is its use of a finite Fourier kernel expansion to reformulate a semi-infinite non-convex optimization problem into a tractable linear program. The resulting spectral barrier allows us to leverage the fast Fourier transform to generate the relaxed problem efficiently, offering a scalable yet distributionally robust framework for verifying safety. LUCID thus offers a robust and efficient verification framework, able to handle the complexities of modern black-box systems while providing formal guarantees of safety. These unique capabilities are demonstrated on challenging benchmarks.
Toward a Decision Support System for Energy-Efficient Ferry Operation on Lake Constance based on Optimal Control
oai:arXiv.org:2512.11786v1
arXiv:2512.11786v1 Announce Type: new
Abstract: The maritime sector is undergoing a disruptive technological change driven by three main factors: autonomy, decarbonization, and digital transformation. Addressing these factors necessitates a reassessment of inland vessel operations. This paper presents the design and development of a decision support system for ferry operations based on a shrinking-horizon optimal control framework. The problem formulation incorporates a mathematical model of the ferry's dynamics and environmental disturbances, specifically water currents and wind, which can significantly influence the dynamics. Real-world data and illustrative scenarios demonstrate the potential of the proposed system to effectively support ferry crews by providing real-time guidance. This enables enhanced operational efficiency while maintaining predefined maneuver durations. The findings suggest that optimal control applications hold substantial promise for advancing future ferry operations on inland waters. A video of the real-world ferry MS Insel Mainau operating on Lake Constance is available at: https://youtu.be/i1MjCdbEQyE
Multimodal Fusion of Regional Brain Experts for Interpretable Alzheimer's Disease Diagnosis
oai:arXiv.org:2512.10966v1
arXiv:2512.10966v1 Announce Type: cross
Abstract: Accurate and early diagnosis of Alzheimer's disease (AD) can benefit from integrating complementary information from multiple modalities, mirroring clinical practice. However, conventional fusion approaches often rely on simple concatenation of features, which cannot adaptively balance the contributions of biomarkers such as amyloid PET and MRI across brain regions. In this work, we propose MREF-AD, a Multimodal Regional Expert Fusion model for AD diagnosis. It is a Mixture-of-Experts (MoE) framework that models meso-scale brain regions in each modality as an independent expert and employs two-level gating networks to learn subject-specific fusion weights. Beyond improving diagnostic performance, MREF-AD provides modality- and region-level insight into how structural and molecular imaging jointly contribute to disease diagnosis. Using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), MREF-AD achieves state-of-the-art performance over baselines while providing enhanced interpretability of brain region-specific biomarker relevance, underscoring its utility as a general framework for adaptive and interpretable multimodal fusion in neuroimaging.
ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages
oai:arXiv.org:2512.10967v1
arXiv:2512.10967v1 Announce Type: cross
Abstract: Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we conduct the first systematic audit of ASR performance on real-world clinical interview data spanning Kannada, Hindi, and Indian English, comparing leading models including Indic Whisper, Whisper, Sarvam, Google speech to text, Gemma3n, Omnilingual, Vaani, and Gemini. We evaluate transcription accuracy across languages, speakers, and demographic subgroups, with a particular focus on error patterns affecting patients vs. clinicians and gender-based or intersectional disparities. Our results reveal substantial variability across models and languages, with some systems performing competitively on Indian English but failing on code-mixed or vernacular speech. We also uncover systematic performance gaps tied to speaker role and gender, raising concerns about equitable deployment in clinical settings. By providing a comprehensive multilingual benchmark and fairness analysis, our work highlights the need for culturally and demographically inclusive ASR development for the healthcare ecosystem in India.
Benchmarking Automatic Speech Recognition Models for African Languages
oai:arXiv.org:2512.10968v1
arXiv:2512.10968v1 Announce Type: cross
Abstract: Automatic speech recognition (ASR) for African languages remains constrained by limited labeled data and the lack of systematic guidance on model selection, data scaling, and decoding strategies. Large pre-trained systems such as Whisper, XLS-R, MMS, and W2v-BERT have expanded access to ASR technology, but their comparative behavior in African low-resource contexts has not been studied in a unified and systematic way. In this work, we benchmark four state-of-the-art ASR models across 13 African languages, fine-tuning them on progressively larger subsets of transcribed data ranging from 1 to 400 hours. Beyond reporting error rates, we provide new insights into why models behave differently under varying conditions. We show that MMS and W2v-BERT are more data efficient in very low-resource regimes, XLS-R scales more effectively as additional data becomes available, and Whisper demonstrates advantages in mid-resource conditions. We also analyze where external language model decoding yields improvements and identify cases where it plateaus or introduces additional errors, depending on the alignment between acoustic and text resources. By highlighting the interaction between pre-training coverage, model architecture, dataset domain, and resource availability, this study offers practical insights into the design of ASR systems for underrepresented languages.
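The abstract does not say which error rate is reported; word error rate (WER) is a common choice for ASR benchmarks, so the following is a minimal sketch under that assumption. The function name `word_error_rate` and the whitespace tokenization are illustrative choices, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization), which real benchmarks usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this quantity can exceed 1.0 when the hypothesis is much longer than the reference.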
Physics Informed Dynamical Modeling of Extrusion Based 3D Printing Processes
oai:arXiv.org:2512.11048v1
arXiv:2512.11048v1 Announce Type: cross
Abstract: The trade-off between model fidelity and computational cost remains a central challenge in the computational modeling of extrusion-based 3D printing, particularly for real-time optimization and control. Although high-fidelity simulations have advanced considerably for offline analysis, dynamical modeling tailored for online, control-oriented applications is still significantly underdeveloped. In this study, we propose a reduced-order dynamical flow model that captures the transient behavior of extrusion-based 3D printing. The model is grounded in physics-based principles derived from the Navier-Stokes equations and further simplified through spatial averaging and input-dependent parameterization. To assess its performance, the model is identified via a nonlinear least-squares approach using Computational Fluid Dynamics (CFD) simulation data spanning a range of printing conditions and subsequently validated across multiple combinations of training and testing scenarios. The results demonstrate strong agreement with the CFD data within the nozzle, the nozzle-substrate gap, and the deposited layer regions. Overall, the proposed reduced-order model successfully captures the dominant flow dynamics of the process while maintaining a level of simplicity compatible with real-time control and optimization.
E-CHUM: Event-based Cameras for Human Detection and Urban Monitoring
oai:arXiv.org:2512.11076v1
arXiv:2512.11076v1 Announce Type: cross
Abstract: Understanding human movement and city dynamics has always been challenging. From traditional methods of manually observing the city's inhabitants, to using cameras, to now using sensors and more complex technology, the field of urban monitoring has evolved greatly. Still, there is more that can be done to unlock better practices for understanding city dynamics. This paper surveys how the study of urban dynamics has evolved, with a particular focus on event-based cameras. Event-based cameras capture changes in light intensity instead of the RGB values that traditional cameras capture. They offer unique abilities, like the ability to work in low light, that can make them advantageous compared to other sensors. Through an analysis of event-based cameras, their applications, their advantages and challenges, and machine learning applications, we propose event-based cameras as a medium for capturing information to study urban dynamics. They offer the ability to capture important information while maintaining privacy. We also suggest multi-sensor fusion of event-based cameras and other sensors in the study of urban dynamics. Combining event-based cameras with infrared, event-LiDAR, or vibration sensing has the potential to enhance the ability of event-based cameras and overcome the challenges that they face.
Linear quadratic control for discrete-time systems with stochastic and bounded noises
oai:arXiv.org:2512.11106v1
arXiv:2512.11106v1 Announce Type: cross
Abstract: This paper focuses on the linear quadratic control (LQC) design of systems corrupted by both stochastic noise and bounded noise simultaneously. When only one of these noises is considered, the LQC strategy leads to stochastic or robust controllers, respectively. However, there is no LQC strategy that can simultaneously handle stochastic and bounded noises efficiently. This limits the scope where existing LQC strategies can be applied. In this work, we look into the LQC problem for discrete-time systems that have both stochastic and bounded noises in their dynamics. We develop a state estimation for such systems by efficiently combining a Kalman filter and an ellipsoid set-membership filter. The developed state estimation can recover the estimation optimality when the system is subject to both kinds of noise, the stochastic and the bounded. Upon the estimated state, we derive a robust state-feedback optimal control law for the LQC problem. The control law derivation takes into account both stochastic and bounded-state estimation errors, so as to avoid over-conservativeness while sustaining stability in the control. In this way, the developed LQC strategy extends the range of scenarios where LQC can be applied, especially those of real-world control systems with diverse sensing which are subject to different kinds of noise. We present numerical simulations, and the results demonstrate the enhanced control performance with the proposed strategy.
Learning from a Generative Oracle: Domain Adaptation for Restoration
oai:arXiv.org:2512.11121v1
arXiv:2512.11121v1 Announce Type: cross
Abstract: Pre-trained image restoration models often fail on real-world, out-of-distribution degradations due to significant domain gaps. Adapting to these unseen domains is challenging, as out-of-distribution data lacks ground truth, and traditional adaptation methods often require complex architectural changes. We propose LEGO (Learning from a Generative Oracle), a practical three-stage framework for post-training domain adaptation without paired data. LEGO converts this unsupervised challenge into a tractable pseudo-supervised one. First, we obtain initial restorations from the pre-trained model. Second, we leverage a frozen, large-scale generative oracle to refine these estimates into high-quality pseudo-ground-truths. Third, we fine-tune the original model using a mixed-supervision strategy combining in-distribution data with these new pseudo-pairs. This approach adapts the model to the new distribution without sacrificing its original robustness or requiring architectural modifications. Experiments demonstrate that LEGO effectively bridges the domain gap, significantly improving performance on diverse real-world benchmarks.
Design and Experimental Validation of Closed-Form CBF-Based Safe Control for Stewart Platform Under Multiple Constraints
oai:arXiv.org:2512.11125v1
arXiv:2512.11125v1 Announce Type: cross
Abstract: This letter presents a closed-form solution of the Control Barrier Function (CBF) framework for enforcing safety constraints on a Stewart robotic platform. The proposed method simultaneously handles multiple position and velocity constraints through an explicit closed-form control law, eliminating the need to solve a Quadratic Program (QP) at every control step and enabling efficient real-time implementation. This letter derives necessary and sufficient conditions under which the closed-form expression remains non-singular, thereby ensuring well-posedness of the CBF solution to the multi-constraint problem. The controller is validated in both simulation and hardware experiments on a custom-built Stewart platform prototype, demonstrating safety-guaranteed performance that is comparable to the QP-based formulation, while reducing computation time by more than an order of magnitude. The results confirm that the proposed approach provides a reliable and computationally lightweight framework for real-time safe control of parallel robotic systems. The experimental videos are available on the project website. (https://nail-uh.github.io/StewartPlatformSafeControl.github.io/)
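To make the idea of replacing a per-step QP with an explicit formula concrete, here is a minimal sketch of the well-known single-constraint case only. The letter's multi-constraint control law and its non-singularity conditions are not given in the abstract and are not reproduced here; the names `safety_filter`, `u_nom`, `a`, and `b` are hypothetical. The sketch assumes the CBF condition has already been written as a linear inequality a·u >= b in the control input.

```python
def safety_filter(u_nom, a, b):
    """Closed-form minimizer of ||u - u_nom||^2 subject to a . u >= b.

    Assumption: a single affine constraint, so the QP solution is a
    projection onto a half-space and needs no numerical solver.
    """
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0:
        # The nominal input already satisfies the constraint.
        return list(u_nom)

    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0:
        # a = 0 with b > 0: no input can satisfy the constraint.
        raise ValueError("constraint is infeasible: a is zero but b > 0")

    # Move u_nom the minimum distance along a to reach the boundary.
    scale = -slack / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]
```

For example, `safety_filter([1.0, 0.0], [1.0, 0.0], 2.0)` moves the first component up to the constraint boundary, while an input that is already safe is returned unchanged.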
CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound
oai:arXiv.org:2512.11169v1
arXiv:2512.11169v1 Announce Type: cross
Abstract: Combinatorial sequential decision-making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of modeling MILPs that accurately represent stochastic real-world problems leads to suboptimal performance in the real world. Recently, machine learning methods have been applied to build MILP models for decision quality rather than how accurately they model the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce a proof-of-concept CORL framework that end-to-end fine-tunes an MILP scheme using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic policy compatible with RL. We validate the CORL method in a simple illustrative combinatorial sequential decision-making example.
ALS-U AR RF Equipment Protection System
oai:arXiv.org:2512.11310v1
arXiv:2512.11310v1 Announce Type: cross
Abstract: This paper presents the design and status of the Accumulator Ring (AR) RF Equipment Protection System (EPS) of the Advanced Light Source Upgrade project at LBNL. The key components of the AR RF EPS include a Master Interlock PLC subsystem handling supervisory control and slow interlocks on the millisecond scale, an FPGA-based LLRF Controller managing fast interlocks on the microsecond scale, a 60 kW high-power amplifier with standalone PLC-based slow (millisecond-scale) and FPGA-based fast (microsecond-scale) protection systems, and an RF Drive Control Chassis acting as the primary RF mitigation device. The design of the AR RF EPS is presented along with internal RF and external AR subsystem interfaces.
Incremental Validation of Automated Driving Functions using Generic Volumes in Micro-Operational Design Domains
oai:arXiv.org:2512.11351v1
arXiv:2512.11351v1 Announce Type: cross
Abstract: The validation of highly automated, perception-based driving systems must ensure that they function correctly under the full range of real-world conditions. Scenario-based testing is a prominent approach to addressing this challenge, as it involves the systematic simulation of objects and environments. Operational Design Domains (ODDs) are usually described using a taxonomy of qualitative designations for individual objects. However, the process of transitioning from taxonomy to concrete test cases remains unstructured, and completeness is theoretical. This paper introduces a structured method of subdividing the ODD into manageable sections, termed micro-ODDs (mODDs), and deriving test cases with abstract object representations. This concept is demonstrated using a one-dimensional, laterally guided manoeuvre involving a shunting locomotive within a constrained ODD. In this example, mODDs are defined and refined into narrow taxonomies that enable test case generation. Obstacles are represented as generic cubes of varying sizes, providing a simplified yet robust means of evaluating perception performance. A series of tests were conducted in a closed-loop, co-simulated virtual environment featuring photorealistic rendering and simulated LiDAR, GNSS and camera sensors. The results demonstrate how edge cases in obstacle detection can be systematically explored and how perception quality can be evaluated based on observed vehicle behaviour, using crash versus safe stop as the outcome metrics. These findings support the development of a standardised framework for safety argumentation and offer a practical step towards the validation and authorisation of automated driving functions.
Results for Global Attractivity of Interior Equilibrium Points for Lotka-Volterra Systems
oai:arXiv.org:2512.11384v1
arXiv:2512.11384v1 Announce Type: cross
Abstract: This paper provides global attractivity results for the interior equilibrium point of a general Lotka-Volterra system with no restriction on the dimension of the system and with no special structure or properties of the interaction matrix. The main result contains as special cases all known general results, including the Volterra-Lyapunov theorem and the recently proposed eigenvector conditions. Moreover, global attractivity of the interior equilibrium point is shown for a three-dimensional example, where none of the existing general results can be applied.
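For readers unfamiliar with the setting, the sketch below simulates a small Lotka-Volterra system of the usual form dx_i/dt = x_i (r_i + sum_j A_ij x_j) and shows a trajectory approaching the interior equilibrium. It illustrates only the classical Volterra-Lyapunov situation (here A + A^T is negative definite), not the paper's new conditions or its three-dimensional example; the matrix `A`, the vector `r`, and all function names are assumptions chosen for illustration.

```python
def lv_rhs(x, r, A):
    """Right-hand side of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def rk4_step(x, dt, r, A):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lv_rhs(x, r, A)
    k2 = lv_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], r, A)
    k3 = lv_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], r, A)
    k4 = lv_rhs([xi + dt * ki for xi, ki in zip(x, k3)], r, A)
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(x0, r, A, dt=0.01, steps=4000):
    """Integrate from x0 and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, dt, r, A)
    return x


# Illustrative two-species system whose interior equilibrium is (1, 1):
# r + A x* = 0 holds for x* = (1, 1), and A + A^T = diag(-2, -2).
r_example = [1.5, 0.5]
A_example = [[-1.0, -0.5],
             [0.5, -1.0]]
x_final = simulate([0.2, 3.0], r_example, A_example)
```

Starting from other positive initial conditions should lead to the same limit in this example, which is what global attractivity of the interior equilibrium means.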
Processing through encoding: Quantum circuit approaches for point-wise multiplication and convolution
oai:arXiv.org:2512.11457v1
arXiv:2512.11457v1 Announce Type: cross
Abstract: This paper introduces quantum circuit methodologies for pointwise multiplication and convolution of complex functions, conceptualized as "processing through encoding". Leveraging known techniques, we describe an approach where multiple complex functions are encoded onto auxiliary qubits. Applying the proposed scheme for two functions $f$ and $g$, their pointwise product $f(x)g(x)$ is shown to naturally form as the coefficients of part of the resulting quantum state. Adhering to the convolution theorem, we then demonstrate how the convolution $f*g$ can be constructed. Similarly to related work, this involves the encoding of the Fourier coefficients $\mathcal{F}[f]$ and $\mathcal{F}[g]$, which facilitates their pointwise multiplication, followed by the inverse Quantum Fourier Transform. We discuss the simulation of these techniques, their integration into an extended \verb|quantumaudio| package for audio signal processing, and present initial experimental validations. This work offers a promising avenue for quantum signal processing, with potential applications in areas such as quantum-enhanced audio manipulation and synthesis.
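The convolution-theorem route described above has a simple classical analogue, sketched here with a naive discrete Fourier transform. This is not the quantum circuit, the encoding scheme, or the quantumaudio package API; the function names are hypothetical, and the sign and normalization conventions are those of the usual classical DFT, which may differ from the Quantum Fourier Transform convention used in the paper.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                    for j in range(n))
        out.append(total / n if inverse else total)
    return out


def circular_convolution_direct(f, g):
    """Reference implementation: (f * g)[k] = sum_j f[j] * g[(k - j) mod n]."""
    n = len(f)
    return [sum(f[j] * g[(k - j) % n] for j in range(n)) for k in range(n)]


def circular_convolution_via_dft(f, g):
    """Transform both inputs, multiply pointwise, transform back."""
    if len(f) != len(g):
        raise ValueError("f and g must have the same length")
    spectrum_f = dft(f)
    spectrum_g = dft(g)
    product = [a * b for a, b in zip(spectrum_f, spectrum_g)]
    return dft(product, inverse=True)
```

The two convolution functions agree up to floating-point error, which is the classical statement that pointwise multiplication in the Fourier domain corresponds to (circular) convolution in the original domain.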
Safe Bayesian optimization across noise models via scenario programming
oai:arXiv.org:2512.11580v1
arXiv:2512.11580v1 Announce Type: cross
Abstract: Safe Bayesian optimization (BO) with Gaussian processes is an effective tool for tuning control policies in safety-critical real-world systems, specifically due to its sample efficiency and safety guarantees. However, most safe BO algorithms assume homoscedastic sub-Gaussian measurement noise, an assumption that does not hold in many relevant applications. In this article, we propose a straightforward yet rigorous approach for safe BO across noise models, including homoscedastic sub-Gaussian and heteroscedastic heavy-tailed distributions. We provide a high-probability bound on the measurement noise via the scenario approach, integrate these bounds into high probability confidence intervals, and prove safety and optimality for our proposed safe BO algorithm. We deploy our algorithm in synthetic examples and in tuning a controller for the Franka Emika manipulator in simulation.
Embodied Image Compression
oai:arXiv.org:2512.11612v1
arXiv:2512.11612v1 Announce Type: cross
Abstract: Image Compression for Machines (ICM) has emerged as a pivotal research direction in the field of visual data compression. However, with the rapid evolution of machine intelligence, the target of compression has shifted from task-specific virtual models to Embodied agents operating in real-world environments. To address the communication constraints of Embodied AI in multi-agent systems and ensure real-time task execution, this paper introduces, for the first time, the scientific problem of Embodied Image Compression. We establish a standardized benchmark, EmbodiedComp, to facilitate systematic evaluation under ultra-low bitrate conditions in a closed-loop setting. Through extensive empirical studies in both simulated and real-world settings, we demonstrate that existing Vision-Language-Action models (VLAs) fail to reliably perform even simple manipulation tasks when compressed below the Embodied bitrate threshold. We anticipate that EmbodiedComp will catalyze the development of domain-specific compression tailored for Embodied agents, thereby accelerating the deployment of Embodied AI in the real world.
Architecting Large Action Models for Human-in-the-Loop Intelligent Robots
oai:arXiv.org:2512.11620v1
arXiv:2512.11620v1 Announce Type: cross
Abstract: The realization of intelligent robots, operating autonomously and interacting with other intelligent agents, human or artificial, requires the integration of environment perception, reasoning, and action. Classic Artificial Intelligence techniques for this purpose, focusing on symbolic approaches, long ago hit the scalability wall on compute and memory costs. Advances in Large Language Models in the past decade (neural approaches) have resulted in unprecedented displays of capability, at the cost of control, explainability, and interpretability. Large Action Models aim at extending Large Language Models to encompass the full perception, reasoning, and action cycle; however, they typically require substantially more comprehensive training and suffer from the same deficiencies in reliability. Here, we show it is possible to build competent Large Action Models by composing off-the-shelf foundation models, and that their control, interpretability, and explainability can be effected by incorporating symbolic wrappers and associated verification on their outputs, achieving verifiable neuro-symbolic solutions for intelligent robots. Our experiments on a multi-modal robot demonstrate that Large Action Model intelligence does not require massive end-to-end training, but can be achieved by integrating efficient perception models with a logic-driven core. We find that driving action execution through the generation of Planning Domain Definition Language (PDDL) code enables a human-in-the-loop verification stage that effectively mitigates action hallucinations. These results can support practitioners in the design and development of robotic Large Action Models across novel industries, and shed light on the ongoing challenges that must be addressed to ensure safety in the field.
Particle Image Velocimetry Refinement via Consensus ADMM
oai:arXiv.org:2512.11695v1
arXiv:2512.11695v1 Announce Type: cross
Abstract: Particle Image Velocimetry (PIV) is an imaging technique in experimental fluid dynamics that quantifies flow fields around bluff bodies by analyzing the displacement of neutrally buoyant tracer particles immersed in the fluid. Traditional PIV approaches typically depend on tuning parameters specific to the imaging setup, making the performance sensitive to variations in illumination, flow conditions, and seeding density. On the other hand, even state-of-the-art machine learning methods for flow quantification are fragile outside their training set. In our experiments, we observed that flow quantification would improve if different tunings (or algorithms) were applied to different regions of the same image pair. In this work, we parallelize the instantaneous flow quantification with multiple algorithms and adopt a consensus framework based on the alternating direction method of multipliers, seamlessly incorporating priors such as smoothness and incompressibility. We perform several numerical experiments to demonstrate the benefits of this approach. For instance, we achieve a decrease in end-point-error of up to 20% of a dense-inverse-search estimator at an inference rate of 60Hz, and we show how this performance boost can be increased further with outlier rejection. Our method is implemented in JAX, effectively exploiting hardware acceleration, and integrated in Flow Gym, enabling (i) reproducible comparisons with the state-of-the-art, (ii) testing different base algorithms, (iii) straightforward deployment for active fluids control applications.
High-Dimensional Surrogate Modeling for Closed-Loop Learning of Neural-Network-Parameterized Model Predictive Control
oai:arXiv.org:2512.11705v1
arXiv:2512.11705v1 Announce Type: cross
Abstract: Learning controller parameters from closed-loop data has been shown to improve closed-loop performance. Bayesian optimization, a widely used black-box and sample-efficient learning method, constructs a probabilistic surrogate of the closed-loop performance from few experiments and uses it to select informative controller parameters. However, it typically struggles with dense high-dimensional controller parameterizations, as they may appear, for example, in tuning model predictive controllers, because standard surrogate models fail to capture the structure of such spaces. This work suggests that the use of Bayesian neural networks as surrogate models may help to mitigate this limitation. Through a comparison between Gaussian processes with Matern kernels, finite-width Bayesian neural networks, and infinite-width Bayesian neural networks on a cart-pole task, we find that Bayesian neural network surrogate models achieve faster and more reliable convergence of the closed-loop cost and enable successful optimization of parameterizations with hundreds of dimensions. Infinite-width Bayesian neural networks also maintain performance in settings with more than one thousand parameters, whereas Matern-kernel Gaussian processes rapidly lose effectiveness. These results indicate that Bayesian neural network surrogate models may be suitable for learning dense high-dimensional controller parameterizations and offer practical guidance for selecting surrogate models in learning-based controller design.
EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing
oai:arXiv.org:2512.11715v1
arXiv:2512.11715v1 Announce Type: cross
Abstract: Recent advances in diffusion models (DMs) have achieved exceptional visual quality in image editing tasks. However, the global denoising dynamics of DMs inherently conflate local editing targets with the full-image context, leading to unintended modifications in non-target regions. In this paper, we shift our attention beyond DMs and turn to Masked Generative Transformers (MGTs) as an alternative approach to tackle this challenge. By predicting multiple masked tokens rather than holistic refinement, MGTs exhibit a localized decoding paradigm that endows them with the inherent capacity to explicitly preserve non-relevant regions during the editing process. Building upon this insight, we introduce the first MGT-based image editing framework, termed EditMGT. We first demonstrate that MGT's cross-attention maps provide informative localization signals for localizing edit-relevant regions and devise a multi-layer attention consolidation scheme that refines these maps to achieve fine-grained and precise localization. On top of these adaptive localization results, we introduce region-hold sampling, which restricts token flipping within low-attention areas to suppress spurious edits, thereby confining modifications to the intended target regions and preserving the integrity of surrounding non-target areas. To train EditMGT, we construct CrispEdit-2M, a high-resolution dataset spanning seven diverse editing categories. Without introducing additional parameters, we adapt a pre-trained text-to-image MGT into an image editing model through attention injection. Extensive experiments across four standard benchmarks demonstrate that, with fewer than 1B parameters, our model achieves similarity performance while enabling 6 times faster editing. Moreover, it delivers comparable or superior editing quality, with improvements of 3.6% and 17.6% on style change and style transfer tasks, respectively.
Adaptive MIMO Radar Architecture for Energy-Efficient Wireless Sensing in the D-Band
oai:arXiv.org:2309.17110v4
arXiv:2309.17110v4 Announce Type: replace
Abstract: The D-band, offering an untapped wide bandwidth, is promising for high data rate communication and high-resolution wireless sensing. However, this potential is hindered by the low performance and energy efficiency of D-band circuits and systems. We present an adaptive multi-input multi-output (MIMO) radar architecture for energy-efficient wireless sensing in the D-band, leveraging a reconfigurable 2D array of radar transceiver front-ends, a scaling approach for the receiver (RX) signal-to-noise ratio (SNR) and the transmitter (TX) output power ($P_{\rm TX}$) with target distance, and dynamic selection of the direction-of-arrival (DOA) estimation algorithm. The reconfigurable radar array, providing an adaptive radar resolution, enhances the energy efficiency by reducing power consumption in the radar RF front-end and lowering the computational complexity in the radar back-end. The RX SNR and the TX output power are scaled with the distance as ${\rm SNR} \propto d^{-p}$ and $P_{\rm TX} \propto d^{4-p}$, where $0 < p < 4$, leading to more efficient resource allocation in varying target distance conditions. Additionally, DOA estimation results using MUSIC and MVDR algorithms indicate that the optimum algorithm, in terms of accuracy and computational complexity, should be selected based on the number of radar array elements. Furthermore, we develop a hardware model for the MIMO radar RF front-end to evaluate the power consumption of the TX, RX, and local oscillator (LO) distribution network. It is shown that the power consumption of the LO distribution network, which can dominate the power consumption for a large MIMO radar, can be minimized through a distribution strategy for LO amplifiers employed for compensating passive losses. Performance of the adaptive MIMO radar is evaluated in free-space and through-wall indoor sensing scenarios in the D-band.
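The two scaling laws quoted above are consistent with the standard monostatic radar equation, in which received power (and hence SNR at fixed noise) falls off as $P_{\rm TX}/d^4$. The sketch below checks that relationship numerically; the reference distance, the normalization to 1.0, and the function names are assumptions made for illustration and do not come from the paper.

```python
def scaled_tx_power(d, p, d_ref=1.0, p_tx_ref=1.0):
    """TX power following P_TX ∝ d^(4 - p), equal to p_tx_ref at d_ref."""
    if not 0 < p < 4:
        raise ValueError("p must satisfy 0 < p < 4")
    if d <= 0 or d_ref <= 0:
        raise ValueError("distances must be positive")
    return p_tx_ref * (d / d_ref) ** (4 - p)


def resulting_snr(d, p, d_ref=1.0, snr_ref=1.0):
    """SNR implied by the radar equation (SNR ∝ P_TX / d^4) under that scaling.

    Assumption: noise power, antenna gains, and target cross-section are
    held fixed, so only TX power and distance affect the SNR.
    """
    relative_power = scaled_tx_power(d, p, d_ref, 1.0)
    return snr_ref * relative_power * (d_ref / d) ** 4


def fixed_power_snr(d, d_ref=1.0, snr_ref=1.0):
    """Baseline for comparison: constant TX power, so SNR ∝ d^-4."""
    return snr_ref * (d_ref / d) ** 4
```

Under these assumptions, `resulting_snr(d, p)` equals `(d / d_ref) ** (-p)` times the reference SNR. The exponent p therefore trades off how much of the path loss is absorbed by raising TX power and how much is accepted as reduced SNR.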
End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
oai:arXiv.org:2311.13678v3
arXiv:2311.13678v3 Announce Type: replace
Abstract: Data-driven models achieve successful results in Speech Emotion Recognition (SER). However, these models, which are often based on general acoustic features or end-to-end approaches, show poor performance when the testing set has a different language than the training set or when these sets are taken from different datasets. To alleviate these problems, this paper presents an end-to-end Deep Neural Network (DNN) model based on transfer learning for cross-language and cross-corpus SER. We use the wav2vec 2.0 pre-trained model to transform audio time-domain waveforms from different languages, different speakers and different recording conditions into a feature space shared by multiple languages, thereby reducing the language variabilities in the speech embeddings. Next, we propose a new Deep-Within-Class Covariance Normalisation (Deep-WCCN) layer that can be inserted into the DNN model and aims to reduce other variabilities, including speaker variability and channel variability. The entire model is fine-tuned in an end-to-end manner on a combined loss and is validated on datasets from three languages (i.e. English, German, Chinese). Experimental results show that our proposed method outperforms the baseline model that is based on common acoustic feature sets for SER in the within-language setting and the cross-language setting. In addition, we experimentally validate the effectiveness of Deep-WCCN, which can further improve the model performance. Next, we show that the proposed transfer learning method has good data efficiency when merging target language data into the fine-tuning process. The model's speaker-independent SER performance increases by up to 15.6% when only 160 s of target language data is used. Finally, our proposed model shows significantly better performance than other state-of-the-art models in cross-language SER.
MarsQE: Semantic-Informed Quality Enhancement for Compressed Martian Image
oai:arXiv.org:2404.09433v4
arXiv:2404.09433v4 Announce Type: replace
Abstract: Lossy image compression is essential for Mars exploration missions, due to the limited bandwidth between Earth and Mars. However, the compression may introduce visual artifacts that complicate the geological analysis of the Martian surface. Existing quality enhancement approaches, primarily designed for Earth images, fall short for Martian images due to a lack of consideration for the unique Martian semantics. In response to this challenge, we conduct an in-depth analysis of Martian images, yielding two key insights based on semantics: the presence of texture similarities and the compact nature of texture representations in Martian images. Inspired by these findings, we introduce MarsQE, an innovative, semantic-informed, two-phase quality enhancement approach specifically designed for Martian images. The first phase involves the semantic-based matching of texture-similar reference images, and the second phase enhances image quality by transferring texture patterns from these reference images to the compressed image. We also develop a post-enhancement network to further reduce compression artifacts and achieve superior compression quality. Our extensive experiments demonstrate that MarsQE significantly outperforms existing approaches for Earth images, establishing a new benchmark for the quality enhancement on Martian images.
Multimodal Learning for Scalable Representation of High-Dimensional Medical Data
oai:arXiv.org:2409.13115v2
arXiv:2409.13115v2 Announce Type: replace
Abstract: Integrating artificial intelligence (AI) with healthcare data is rapidly transforming medical diagnostics and driving progress toward precision medicine. However, effectively leveraging multimodal data, particularly digital pathology whole slide images (WSIs) and genomic sequencing, remains a significant challenge due to the intrinsic heterogeneity of these modalities and the need for scalable and interpretable frameworks. Existing diagnostic models typically operate on unimodal data, overlooking critical cross-modal interactions that can yield richer clinical insights. We introduce MarbliX (Multimodal Association and Retrieval with Binary Latent Indexed matriX), a self-supervised framework that learns to embed WSIs and immunogenomic profiles into compact, scalable binary codes, termed "monogram." By optimizing a triplet contrastive objective across modalities, MarbliX captures high-resolution patient similarity in a unified latent space, enabling efficient retrieval of clinically relevant cases and facilitating case-based reasoning. In lung cancer, MarbliX achieves 85-89% across all evaluation metrics, outperforming histopathology (69-71%) and immunogenomics (73-76%). In kidney cancer, real-valued monograms yield the strongest performance (F1: 80-83%, Accuracy: 87-90%), with binary monograms slightly lower (F1: 78-82%).
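To illustrate why compact binary codes make case retrieval cheap, here is a minimal sketch of nearest-neighbour lookup by Hamming distance. It is a generic illustration, not the MarbliX method: the 8-bit codes, the case identifiers, and the function names `hamming_distance` and `retrieve` are all hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def retrieve(query, database, k=2):
    """Return the ids of the k stored codes closest to the query.

    Ties are broken by id so the ranking is deterministic.
    """
    ranked = sorted(
        database,
        key=lambda case_id: (hamming_distance(query, database[case_id]), case_id),
    )
    return ranked[:k]


# Hypothetical 8-bit codes standing in for learned binary embeddings.
codes = {
    "case_a": 0b10110010,
    "case_b": 0b10110011,
    "case_c": 0b01001101,
}
nearest = retrieve(0b10110110, codes, k=2)
```

Each comparison reduces to an XOR and a bit count, which is why binary codes scale well to large archives compared with real-valued embeddings.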
Denoising Diffusion Models for Anomaly Localization in Medical Images
oai:arXiv.org:2410.23834v2
arXiv:2410.23834v2 Announce Type: replace
Abstract: This review explores anomaly localization in medical images using denoising diffusion models. After providing a brief methodological background of these models, including their application to image reconstruction and their conditioning using guidance mechanisms, we provide an overview of available datasets and evaluation metrics suitable for their application to anomaly localization in medical images. In this context, we discuss supervision schemes ranging from fully supervised segmentation to semi-supervised, weakly supervised, self-supervised, and unsupervised methods, and provide insights into the effectiveness and limitations of these approaches. Furthermore, we highlight open challenges in anomaly localization, including detection bias, domain shift, computational cost, and model interpretability. Our goal is to provide an overview of the current state of the art in the field, outline research gaps, and highlight the potential of diffusion models for robust anomaly localization in medical images.
Bayesian Multifractal Image Segmentation
oai:arXiv.org:2501.08694v2
arXiv:2501.08694v2 Announce Type: replace
Abstract: Multifractal analysis (MFA) provides a framework for the global characterization of image textures by describing the spatial fluctuations of their local regularity based on the multifractal spectrum. Several works have shown the interest of using MFA for the description of homogeneous textures in images. Nevertheless, natural images can be composed of several textures and, in turn, multifractal properties associated with those textures. This paper introduces an unsupervised Bayesian multifractal segmentation method to model and segment multifractal textures by jointly estimating the multifractal parameters and labels on images, at the pixel-level. For this, a computationally and statistically efficient multifractal parameter estimation model for wavelet leaders is firstly developed, defining different multifractality parameters for different regions of an image. Then, a multiscale Potts Markov random field is introduced as a prior to model the inherent spatial and scale correlations (referred to as cross-scale correlations) between the labels of the wavelet leaders. A Gibbs sampling methodology is finally used to draw samples from the posterior distribution of the unknown model parameters. Numerical experiments are conducted on synthetic multifractal images to evaluate the performance of the proposed segmentation approach. The proposed method achieves superior performance compared to traditional unsupervised segmentation techniques as well as modern deep learning-based approaches, showing its effectiveness for multifractal image segmentation.
VERITAS: Verifying the Performance of AI-native Transceiver Actions in Base-Stations
oai:arXiv.org:2501.09761v3
arXiv:2501.09761v3 Announce Type: replace
Abstract: Artificial Intelligence (AI)-native receivers provide significant performance improvements in high-noise regimes and can potentially reduce communication overhead compared to the traditional receiver. However, their performance highly depends on the representativeness of the training dataset. A major issue is the uncertainty of whether the training dataset covers all test environments and waveform configurations, and thus, whether the trained model is robust in practical deployment conditions. To this end, we propose a joint measurement-recovery framework for AI-native transceivers post deployment, called VERITAS, that continuously looks for distribution shifts in the received signals and triggers finite re-training spurts. VERITAS monitors the wireless channel using 5G pilots fed to an auxiliary neural network that detects out-of-distribution channel profile, transmitter speed, and delay spread. As soon as such a change is detected, a traditional (reference) receiver is activated, which runs for a period of time in parallel to the AI-native receiver. Finally, VERITAS compares the bit probabilities of the AI-native and the reference receivers for the same received data inputs, and decides whether or not a retraining process needs to be initiated. Our evaluations reveal that VERITAS can detect changes in the channel profile, transmitter speed, and delay spread with 99%, 97%, and 69% accuracies, respectively, followed by timely initiation of retraining for 86%, 93.3%, and 94.8% of inputs in channel profile, transmitter speed, and delay spread test sets, respectively.
Higher-Order Meta Distribution Reliability Analysis of Wireless Networks
oai:arXiv.org:2501.14289v3
arXiv:2501.14289v3 Announce Type: replace
Abstract: Communication reliability, as defined by 3GPP, is the probability of achieving a desired quality of service (QoS). Traditionally, this metric is evaluated by averaging the QoS success indicator over spatiotemporal random variables. Recently, the meta distribution (MD) has emerged as a two-level analysis tool that characterizes system-level reliability as a function of link-level reliability thresholds. However, existing MD studies have two limitations. First, they focus exclusively on spatial and temporal randomness corresponding to node distribution and fading channels, respectively, leaving stochastic behaviors in other domains largely unexplored. Second, they are restricted to first-order MDs with two randomness levels, restricting applicability to scenarios requiring higher-order MD characterization. To address these gaps, we propose a hierarchical framework for higher-order MD reliability in wireless networks, where each layer's success probability is formulated and fed into the next layer, yielding overall MD reliability at the highest level. We apply this framework to wireless networks by capturing three levels of temporal dynamics representing fast, slow, and static random elements, and provide a comprehensive second-order MD reliability analysis for two application scenarios. The effectiveness of the proposed approach is demonstrated via these representative scenarios, supported by detailed analytical and numerical evaluations. Our results highlight the value of hierarchical MD representations across multiple domains and reveal the significant influence of inner-layer target reliabilities on overall performance.
Recent Advances in Discrete Speech Tokens: A Review
oai:arXiv.org:2502.06490v4
arXiv:2502.06490v4 Announce Type: replace
Abstract: The rapid advancement of speech generation technologies in the era of large language models (LLMs) has established discrete speech tokens as a foundational paradigm for speech representation. These tokens, characterized by their discrete, compact, and concise nature, are not only advantageous for efficient transmission and storage, but also inherently compatible with the language modeling framework, enabling seamless integration of speech into text-dominated LLM architectures. Current research categorizes discrete speech tokens into two principal classes: acoustic tokens and semantic tokens, each of which has evolved into a rich research domain characterized by unique design philosophies and methodological approaches. This survey systematically synthesizes the existing taxonomy and recent innovations in discrete speech tokenization, conducts a critical examination of the strengths and limitations of each paradigm, and presents systematic experimental comparisons across token types. Furthermore, we identify persistent challenges in the field and propose potential research directions, aiming to offer actionable insights to inspire future advancements in the development and application of discrete speech tokens.
Stacked Intelligent Metasurfaces-Enhanced MIMO OFDM Wideband Communication Systems
oai:arXiv.org:2503.00368v3
arXiv:2503.00368v3 Announce Type: replace
Abstract: Multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems rely on digital or hybrid digital and analog designs for beamforming against frequency-selective fading, which suffer from high hardware complexity and energy consumption. To address this, this work introduces a fully-analog stacked intelligent metasurfaces (SIM) architecture that directly performs wave-domain beamforming, enabling diagonalization of the end-to-end channel matrix and inherently eliminating inter-antenna interference (IAI) for MIMO OFDM transmission. By leveraging cascaded programmable metasurface layers, the proposed system establishes multiple parallel subchannels, significantly improving multi-carrier transmission efficiency while reducing hardware complexity. To optimize the SIM phase shift matrices, a block coordinate descent and penalty convex-concave procedure (BCD-PCCP) algorithm is developed to iteratively minimize the channel fitting error across subcarriers. Simulation results validate the proposed approach, determining the maximum effective bandwidth and demonstrating substantial performance improvements. Moreover, for a MIMO OFDM system operating at 28 GHz with 16 subcarriers, the proposed SIM configuration method achieves over 300% enhancement in channel capacity compared to conventional SIM configuration that only accounts for the center frequency.
Joint CSI Estimation-Feedback-Precoding via DJSCC for MU-MIMO OFDM Systems
oai:arXiv.org:2503.04157v2
arXiv:2503.04157v2 Announce Type: replace
Abstract: As the number of antennas in frequency-division duplex (FDD) multiple-input multiple-output (MIMO) systems increases, acquiring channel state information (CSI) becomes increasingly challenging due to limited spectral resources and feedback overhead. In this paper, we investigate the impact of the feedback channel on CSI feedback in a multi-user MIMO orthogonal frequency-division multiplexing (OFDM) scenario, where the received downlink pilot signal is directly utilized as the source for CSI feedback in a joint design with CSI feedback and precoding. Considering the influence of the feedback channel, we propose an end-to-end joint CSI estimation-feedback-precoding network based on a deep joint source-channel coding architecture with an adaptive number of users. Experimental results demonstrate that, under the same feedback and CSI estimation overheads, the proposed joint multi-module end-to-end network achieves a higher multi-user downlink spectral efficiency than traditional algorithms based on separate architecture and partially separated artificial intelligence-based network architectures under comparable channel quality. Furthermore, compared to conventional separate architecture, the proposed network architecture with joint architecture reduces the computational burden and model storage overhead at the UE side, facilitating the deployment of low-overhead multi-module joint architectures in practice. Meanwhile, the network designed at the BS achieves user-number adaptability without increasing the number of trainable parameters, thereby reducing both model storage and distribution overhead by requiring only a single set of parameters for different numbers of users. While slightly increasing storage requirements at the base station, it reduces computational complexity and precoding design delay, effectively reducing the effects of channel aging challenges.
Integrated Sensing and Communications Over the Years: An Evolution Perspective
oai:arXiv.org:2504.06830v3
arXiv:2504.06830v3 Announce Type: replace
Abstract: Integrated Sensing and Communications (ISAC) enables efficient spectrum utilization and reduces hardware costs for beyond 5G (B5G) and 6G networks, facilitating intelligent applications that require both high-performance communication and precise sensing capabilities. This survey provides a comprehensive review of the evolution of ISAC over the years. We examine the expansion of the spectrum across RF and optical ISAC, highlighting the role of advanced technologies, along with key challenges and synergies. We further discuss the advancements in network architecture from single-cell to multi-cell systems, emphasizing the integration of collaborative sensing and interference mitigation strategies. Moreover, we analyze the progress from single-modal to multi-modal sensing, with a focus on the integration of edge intelligence to enable real-time data processing, reduce latency, and enhance decision-making. Finally, we extensively review standardization efforts by 3GPP, IEEE, and ITU, examining the transition of ISAC-related technologies and their implications for the deployment of 6G networks.
Multi-dimensional Parameter Estimation in RIS-aided MU-MIMO-OFDM Channels
oai:arXiv.org:2505.02611v3
arXiv:2505.02611v3 Announce Type: replace
Abstract: We address the channel estimation (CE) problem in reconfigurable intelligent surface (RIS) aided orthogonal frequency-division multiplexing (OFDM) systems by proposing a dual-structure and multi-dimensional transformations (DS-MDT) algorithm. The proposed approach leverages the dual-structure features of the channel parameters to assist users experiencing weaker channel conditions, thereby enhancing CE performance. Moreover, given that the channel parameters are distributed across multiple dimensions of the received tensor, the proposed algorithm employs multi-dimensional transformations to isolate and extract distinct parameters. The numerical results demonstrate that the proposed algorithm reduces the normalized mean square error (NMSE) by up to 10 dB while maintaining lower complexity compared to state-of-the-art methods.
VoxelRF: Voxelized Radiance Field for Fast Wireless Channel Modeling
oai:arXiv.org:2507.09987v2
arXiv:2507.09987v2 Announce Type: replace
Abstract: Wireless channel modeling in complex environments is crucial for wireless communication system design and deployment. Traditional channel modeling approaches face challenges in balancing accuracy, efficiency, and scalability, while recent neural approaches such as neural radiance field (NeRF) suffer from long training and slow inference. To tackle these challenges, we propose voxelized radiance field (VoxelRF), a novel neural representation for wireless channel modeling that enables fast and accurate synthesis of spatial spectra. VoxelRF replaces the costly multilayer perceptron (MLP) used in NeRF-based methods with trilinear interpolation of a voxel grid-based representation, and two shallow MLPs to model both propagation and transmitter-dependent effects. To further accelerate training and improve generalization, we introduce progressive learning, empty space skipping, and an additional background entropy loss function. Experimental results demonstrate that VoxelRF achieves competitive accuracy with significantly reduced computation and limited training data, making it more practical for real-time and resource-constrained wireless applications.
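The trilinear interpolation step named in this abstract can be sketched in a few lines. The example below is a generic, pure-Python illustration on a scalar grid. The paper presumably interpolates learned feature vectors that are then passed to shallow MLPs, which is not modelled here, and the function name `trilinear` and the index-coordinate convention are assumptions.

```python
import math


def trilinear(grid, x, y, z):
    """Interpolate a scalar voxel grid at a continuous point.

    grid[i][j][k] holds the value at integer voxel corner (i, j, k), and
    (x, y, z) is given in the same index coordinates. Each axis needs at
    least two samples. Points outside the grid are clamped to its boundary.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for coord, n in zip((x, y, z), dims):
        c = min(max(coord, 0.0), n - 1.0)      # clamp into the grid
        i0 = min(int(math.floor(c)), n - 2)    # lower corner of the cell
        base.append(i0)
        frac.append(c - i0)                    # offset within the cell, in [0, 1]

    value = 0.0
    # Weighted sum over the 8 corners of the enclosing cell.
    for di, wx in ((0, 1.0 - frac[0]), (1, frac[0])):
        for dj, wy in ((0, 1.0 - frac[1]), (1, frac[1])):
            for dk, wz in ((0, 1.0 - frac[2]), (1, frac[2])):
                corner = grid[base[0] + di][base[1] + dj][base[2] + dk]
                value += wx * wy * wz * corner
    return value


# A 2x2x2 grid whose values are linear in the indices: v = i + 2j + 4k.
demo = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)] for i in range(2)]
center = trilinear(demo, 0.5, 0.5, 0.5)
```

A lookup like this costs eight memory reads and a handful of multiplications per query, which is the kind of saving over evaluating a deep MLP at every sample point that the abstract alludes to.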
Convex computation of regions of attraction from data using Sums-of-Squares programming
oai:arXiv.org:2507.14073v3
arXiv:2507.14073v3 Announce Type: replace
Abstract: This paper focuses on the analysis of the Region of Attraction (RoA) for unknown autonomous dynamical systems. A data-driven approach based on the moment-Sum-of-Squares (SoS) hierarchy is proposed, enabling novel RoA outer approximations despite the reduced information on the dynamics. The main contribution consists of bypassing the system model and, hence, the recurring constraint on its polynomial structure. Numerical experiments showcase the influence of data on learned approximating sets, highlighting the potential of this method.
E2E Learning Massive MIMO for Multimodal Semantic Non-Orthogonal Transmission and Fusion
oai:arXiv.org:2509.19312v2
arXiv:2509.19312v2 Announce Type: replace
Abstract: This paper investigates multimodal semantic non-orthogonal transmission and fusion in hybrid analog-digital massive multiple-input multiple-output (MIMO). A Transformer-based cross-modal source-channel semantic-aware network (CSC-SA-Net) framework is conceived, where channel state information (CSI) reference signal (RS), feedback, analog-beamforming/combining, and baseband semantic processing are data-driven end-to-end (E2E) optimized at the base station (BS) and user equipments (UEs). CSC-SA-Net comprises five sub-networks: BS-side CSI-RS network (BS-CSIRS-Net), UE-side channel semantic-aware network (UE-CSANet), BS-CSANet, UE-side multimodal semantic fusion network (UE-MSFNet), and BS-MSFNet. Specifically, we first E2E train BS-CSIRS-Net, UE-CSANet, and BS-CSANet to jointly design CSI-RS, feedback, and analog-beamforming/combining for maximum physical-layer spectral efficiency. Meanwhile, we E2E train UE-MSFNet and BS-MSFNet to optimize application-layer source semantic downstream tasks. On these pre-trained models, we further integrate application-layer semantic processing with physical-layer tasks to E2E train the five sub-networks. Extensive simulations show that the proposed CSC-SA-Net outperforms traditional separated designs, revealing the advantage of cross-modal channel-source semantic fusion.
Efficient Domain Generalization in Wireless Networks with Scarce Multi-Modal Data
oai:arXiv.org:2510.04359v2
arXiv:2510.04359v2 Announce Type: replace
Abstract: In 6G wireless networks, multi-modal ML models can be leveraged to enable situation-aware network decisions in dynamic environments. However, trained ML models often fail to generalize under domain shifts, when training and test data distributions differ, because they often focus on modality-specific spurious features. In practical wireless systems, domain shifts occur frequently due to dynamic channel statistics, moving obstacles, or hardware configuration. Thus, there is a need for learning frameworks that can achieve robust generalization under scarce multi-modal data in wireless networks. In this paper, a novel and data-efficient two-phase learning framework is proposed to improve generalization performance in unseen and unfamiliar wireless environments with a minimal amount of multi-modal data. In the first stage, a physics-based loss function is employed to enable each BS to learn the physics underlying its wireless environment captured by multi-modal data. The data efficiency of the physics-based loss function is analytically investigated. In the second stage, collaborative domain adaptation is proposed to leverage the wireless environment knowledge of multiple BSs to guide under-performing BSs under domain shift. Specifically, domain-similarity-aware model aggregation is proposed to utilize the knowledge of BSs that experienced similar domains. To validate the proposed framework, a new dataset generation framework is developed by integrating CARLA and MATLAB-based mmWave channel modeling to predict mmWave RSS. Simulation results show that the proposed physics-based training requires only 13% of data samples to achieve the same performance as a state-of-the-art baseline that does not use physics-based training. Moreover, the proposed collaborative domain adaptation needs only 25% of data samples and 20% of FLOPs to reach convergence compared to baselines.
Local Dissipativity Analysis of Nonlinear Systems
oai:arXiv.org:2511.20838v2
arXiv:2511.20838v2 Announce Type: replace
Abstract: Dissipativity is an input-output (IO) characterization of nonlinear systems that enables compositional robust control through Vidyasagar's Network Dissipativity Theorem (VDNT). However, determining the dissipativity of a system is an involved and, often, model-specific process. We present a general method to determine the local dissipativity properties of smooth, nonlinear, control affine systems. We simultaneously search for the optimal IO characterization of a system and synthesize a continuous piecewise affine (CPA) storage function via a convex optimization problem. To do so, we reformulate the dissipation inequality as a matrix inequality (MI) and develop novel linear matrix inequality (LMI) bounds for a triangulation to impose the dissipativity conditions on the CPA storage function. Further, we develop a method to synthesize a combined quadratic and CPA storage function to expand the class of systems to which the optimization problem is applicable. Finally, we establish that our method will always find a feasible IO characterization and storage function given that the system is sufficiently strictly locally dissipative and demonstrate the efficacy of our method in determining the conic bounds and gain of various nonlinear systems.
Multiport Analytical Pixel Electromagnetic Simulator (MAPES) for AI-assisted RFIC and Microwave Circuit Design
oai:arXiv.org:2511.21274v2
arXiv:2511.21274v2 Announce Type: replace
Abstract: This paper proposes a novel analytical framework, termed the Multiport Analytical Pixel Electromagnetic Simulator (MAPES). MAPES enables efficient and accurate prediction of the electromagnetic (EM) performance of arbitrary pixel-based microwave (MW) and RFIC structures. Inspired by the Integrated Internal Multiport Method (IMPM), MAPES extends the concept to the pixel presence/absence domain used in AI-assisted EM design. By introducing virtual pixels and diagonal virtual pixels and inserting virtual ports at critical positions, MAPES captures all horizontal, vertical, and diagonal electromagnetic couplings within a single multiport impedance matrix. Only a small set of full-wave simulations (typically about 1% of the datasets required by AI-assisted EM simulators) is needed to construct this matrix. Subsequently, any arbitrary pixel configuration can be evaluated analytically using a closed-form multiport relation without additional full-wave calculations. The proposed approach eliminates data-driven overfitting and ensures accurate results across all design variations. Comprehensive examples for single- and double-layer CMOS processes (180 nm and 65 nm) and PCBs confirm that MAPES achieves high prediction accuracy with 600-2000x speed improvement compared to CST simulations. Owing to its efficiency, scalability and reliability, MAPES provides a practical and versatile tool for AI-assisted MW circuit and RFIC design across diverse fabrication technologies.
Near-Field Channel Estimation and Joint Angle-Range Recovery in XL-MIMO Systems: A Gridless Super-Resolution Approach
oai:arXiv.org:2511.23187v2
arXiv:2511.23187v2 Announce Type: replace
Abstract: Existing near-field channel estimation methods for extremely large-scale MIMO (XL-MIMO) typically discretize angle and range parameters jointly, resulting in large polar-domain codebooks. This paper proposes a novel framework that formulates near-field channel estimation as a gridless super-resolution problem, eliminating the need for explicitly constructed codebooks. By employing a second-order approximation of spherical-wave steering vectors, the near-field channel is represented as a superposition of complex exponentials modulated by unknown waveforms. We demonstrate that these waveforms lie tightly in a common discrete chirp rate (DCR) subspace, with a dimension that scales as $\Theta(\sqrt{N})$ for an $N$-element array. By leveraging this structure and applying a lifting technique, we reformulate the non-convex problem as a convex program using regularized atomic norm minimization, which admits an equivalent semidefinite program. From the solution to the convex program, we obtain gridless angle estimates and derive closed-form coarse range estimates, followed by refinement under the exact spherical model using gradient-based nonlinear least squares. The proposed method avoids basis mismatch and exhaustive two-dimensional grid searches while enabling accurate joint angle-range estimation with pilot budgets that scale sublinearly with array size in sparse multipath regimes. Simulations demonstrate accurate channel reconstruction and user localization across representative near-field scenarios.
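For reference, the second-order (Fresnel) approximation of a spherical-wave steering vector is commonly written as follows for a uniform linear array. The notation is assumed for illustration and may differ from the paper's: element index $n$, element spacing $d$, wavelength $\lambda$, and a source at angle $\theta$ (from broadside) and range $r$ measured from the reference element.

$$
r_n = \sqrt{r^2 + n^2 d^2 - 2 r n d \sin\theta}
\;\approx\; r - n d \sin\theta + \frac{n^2 d^2 \cos^2\theta}{2r}
$$

$$
\exp\!\left(-j \frac{2\pi}{\lambda}(r_n - r)\right)
\;\approx\;
\exp\!\left(j \frac{2\pi}{\lambda} n d \sin\theta\right)
\exp\!\left(-j \frac{\pi}{\lambda} \frac{n^2 d^2 \cos^2\theta}{r}\right)
$$

Read this way, the first factor is a complex exponential in $n$ that depends only on the angle, and the second is a quadratic-phase (chirp) term that couples angle and range. Identifying these with the abstract's "complex exponentials modulated by unknown waveforms" is an interpretation, not a statement from the paper.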
A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research
oai:arXiv.org:2512.00483v2
arXiv:2512.00483v2 Announce Type: replace
Abstract: Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven modeling using machine learning (ML) techniques requires massive amounts of thermal building data, which is not easily available. Neither empirical public datasets nor existing data generators meet the needs of ML research in terms of data quality and quantity. Moreover, existing data generation approaches typically require expert knowledge in building simulation. To fill this gap, we present a thermal building data generation framework which we call BuilDa. BuilDa is designed to produce synthetic data of adequate quality and quantity for ML research. The framework does not require profound building simulation knowledge to generate large volumes of data. BuilDa uses a single-zone Modelica model that is exported as a Functional Mock-up Unit (FMU) and simulated in Python. We demonstrate BuilDa by generating data and utilizing it for a transfer learning study involving the fine-tuning of 486 data-driven models.
Channel Knowledge Map Enabled Low-Altitude ISAC Networks: Joint Air Corridor Planning and Base Station Deployment
oai:arXiv.org:2512.02464v2
arXiv:2512.02464v2 Announce Type: replace
Abstract: This letter addresses the joint air corridor planning and base station (BS) deployment problem for low-altitude integrated sensing and communication (ISAC) networks. In the considered system, unmanned aerial vehicles (UAVs) operate within a structured air corridor composed of connected cubic segments, and multiple BSs need to be selectively deployed at a set of candidate locations to ensure both sensing and communication coverage throughout the corridor. In particular, we leverage the channel knowledge map (CKM) to characterize wireless channels for candidate BS sites prior to deployment, thereby facilitating the offline planning. Under this setup, we minimize the system cost in terms of the weighted sum of the air corridor length and the number of deployed BSs, subject to the constraints on both sensing and communication performance across the corridor. To solve the formulated large-scale nonconvex integer programming problem, we develop a hierarchical coarse-to-fine grid decomposition algorithm. Simulation results demonstrate the benefit of the proposed joint design in reducing the overall deployment cost while ensuring the coverage of the low-altitude ISAC networks.
Channel Knowledge Map Construction via Physics-Inspired Diffusion Model Without Prior Observations
oai:arXiv.org:2512.02757v2
arXiv:2512.02757v2 Announce Type: replace
Abstract: The ability to construct a channel knowledge map (CKM) with high precision is essential for environment awareness in 6G wireless systems. However, most existing CKM construction methods formulate the task as an image super-resolution or generation problem, thereby employing models originally developed for computer vision. As a result, the generated CKMs often fail to capture the underlying physical characteristics of wireless propagation. In this paper, considering that acquiring channel observations incurs non-negligible time and cost, we focus on constructing CKMs for large-scale fading scenarios without relying on prior observations, and we design three physics-based constraints to characterize the spatial distribution patterns of large-scale fading. By integrating these physical constraints with a state-of-the-art diffusion model that possesses superior generative capability, a physics-inspired diffusion model for CKM construction is proposed. Following this motivation, we derive the loss function of the diffusion model augmented with physics-based constraint terms and further design the training and generation framework for the proposed physics-inspired CKM generation diffusion model. Extensive experiments show that our approach outperforms all existing methods in terms of construction accuracy. Moreover, the proposed model provides a unified and effective framework with strong potential for generating diverse, accurate, and physically consistent CKMs.
Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation
oai:arXiv.org:2307.02146v3
arXiv:2307.02146v3 Announce Type: replace-cross
Abstract: Despite progress in melody-to-lyric generation, a substantial singability gap remains between machine-generated lyrics and those written by human lyricists. In this work, we aim to narrow this gap by jointly learning both wording and formatting for melody-to-lyric generation. After general-domain pretraining, our model acquires length awareness through a self-supervised stage trained on a large text-only lyric corpus. During supervised melody-to-lyric training, we introduce multiple auxiliary supervision objectives informed by musicological findings on melody-lyric relationships, encouraging the model to capture fine-grained prosodic and structural patterns. Compared with naïve fine-tuning, our approach improves adherence to line-count and syllable-count requirements by 3.8% and 21.4% absolute, respectively, without degrading text quality. In human evaluation, it achieves 42.2% and 74.2% relative gains in overall quality over two task-specific baselines, underscoring the importance of formatting-aware training for generating singable lyrics.
Adaptive Compressive Tactile Subsampling: Enabling High Spatiotemporal Resolution in Scalable Robotic Skin
oai:arXiv.org:2410.13847v3
arXiv:2410.13847v3 Announce Type: replace-cross
Abstract: Robots require full-body, high-resolution tactile sensing to operate safely in unstructured environments, enabling reflexive responses and closed-loop control. However, the pixel counts needed for dense, large-area coverage limit readout rates of most tactile arrays to <100 Hz, hindering their use in high-speed tasks. We present Adaptive Compressive Tactile Subsampling (ACTS), a scalable and data-driven method that greatly enhances traditional tactile matrices by leveraging adaptive sensor sampling and sparse recovery. By adaptively allocating measurements to informative regions, ACTS is especially effective for spatially sparse signals common in real-world interactions. Tested on a 1024-pixel tactile sensor array (32x32), ACTS achieved frame rates up to 1,000 Hz, an 18X improvement over conventional raster scanning, with minimal reconstruction error. For the first time, ACTS enables wearable, large-area, high-density tactile sensing systems that can deliver high-speed results. We demonstrate rapid object classification within 20 ms of contact, high-speed projectile detection, ricochet angle estimation, and soft deformation tracking, in tactile and robotics applications, all using flexible, high-density tactile arrays. These include high-resolution tactile gloves, pressure insoles, and full-body configurations covering robotic arms and human-sized mannequins. We further showcase tactile-based closed-loop control by guiding a metallic ball to trace letters using tactile feedback and by executing tactile-only whole-hand reflexes on a fully sensorized LEAP hand to stabilize grasps, prevent slip, and avoid sharp objects, validating ACTS for real-time interaction and motion control. ACTS transforms standard, low-cost, and robust tactile sensors into high-speed systems enabling scalable, responsive, and adaptive tactile perception for robots and wearables operating in dynamic environments.
UStyle: Waterbody Style Transfer of Underwater Scenes by Depth-Guided Feature Synthesis
oai:arXiv.org:2503.11893v3
arXiv:2503.11893v3 Announce Type: replace-cross
Abstract: The concept of waterbody style transfer remains largely unexplored in the underwater imaging and vision literature. Traditional image style transfer (STx) methods primarily focus on artistic and photorealistic blending, often failing to preserve object and scene geometry in images captured in high-scattering mediums such as underwater. The wavelength-dependent nonlinear attenuation and depth-dependent backscattering artifacts further complicate learning underwater image STx from unpaired data. This paper introduces UStyle, the first data-driven learning framework for transferring waterbody styles across underwater images without requiring prior reference images or scene information. We propose a novel depth-aware whitening and coloring transform (DA-WCT) mechanism that integrates physics-based waterbody synthesis to ensure perceptually consistent stylization while preserving scene structure. To enhance style transfer quality, we incorporate carefully designed loss functions that guide UStyle to maintain colorfulness, lightness, structural integrity, and frequency-domain characteristics, as well as high-level content in VGG and CLIP (contrastive language-image pretraining) feature spaces. By addressing domain-specific challenges, UStyle provides a robust framework for no-reference underwater image STx, surpassing state-of-the-art (SOTA) methods that rely solely on end-to-end reconstruction loss. Furthermore, we introduce the UF7D dataset, a curated collection of high-resolution underwater images spanning seven distinct waterbody styles, establishing a benchmark to support future research in underwater image STx. The UStyle inference pipeline and UF7D dataset are released at: https://github.com/uf-robopi/UStyle.
CTorch: PyTorch-Compatible GPU-Accelerated Auto-Differentiable Projector Toolbox for Computed Tomography
oai:arXiv.org:2503.16741v5
arXiv:2503.16741v5 Announce Type: replace-cross
Abstract: This work introduces CTorch, a PyTorch-compatible, GPU-accelerated, and auto-differentiable projector toolbox designed to handle various CT geometries with configurable projector algorithms. CTorch provides flexible scanner geometry definition, supporting 2D fan-beam, 3D circular cone-beam, and 3D non-circular cone-beam geometries. Each geometry allows view-specific definitions to accommodate variations during scanning. Both flat- and curved-detector models may be specified to accommodate various clinical devices. CTorch implements four projector algorithms: voxel-driven, ray-driven, distance-driven (DD), and separable footprint (SF), allowing users to balance accuracy and computational efficiency based on their needs. All the projectors are primarily built using CUDA C for GPU acceleration, then compiled as Python-callable functions, and wrapped as PyTorch network modules. This design allows direct use of PyTorch tensors, enabling seamless integration into PyTorch's auto-differentiation framework. These features make CTorch a flexible and efficient tool for CT imaging research, with potential applications in accurate CT simulations, efficient iterative reconstruction, and advanced deep-learning-based CT reconstruction.
Random-phase Wave Splatting of Translucent Primitives for Computer-generated Holography
oai:arXiv.org:2508.17480v3
arXiv:2508.17480v3 Announce Type: replace-cross
Abstract: Holographic near-eye displays offer ultra-compact form factors for VR/AR systems but rely on advanced computer-generated holography (CGH) algorithms to convert 3D scenes into interference patterns on spatial light modulators (SLMs). Conventional CGH typically generates smooth-phase holograms, limiting view-dependent effects and realistic defocus blur, while severely under-utilizing the SLM space-bandwidth product. We propose Random-phase Wave Splatting (RPWS), a unified wave optics rendering framework that converts arbitrary 3D representations based on 2D translucent primitives into random-phase holograms. RPWS is fully compatible with modern 3D representations such as Gaussians and triangles, improves bandwidth utilization which effectively enlarges eyebox size, reconstructs accurate defocus blur and parallax, and leverages time-multiplexed rendering not as a heuristic for speckle suppression, but as a mathematically exact alpha-blending mechanism derived from first principles in statistics. At the core of RPWS are (1) a new wavefront compositing procedure and (2) an alpha-blending scheme for random-phase geometric primitives, ensuring correct color reconstruction and robust occlusion when compositing millions of primitives. RPWS departs substantially from the recent primitive-based CGH algorithm, Gaussian Wave Splatting (GWS). Because GWS uses smooth-phase primitives, it struggles to capture view-dependent effects and realistic defocus blur and under-utilizes the SLM space-bandwidth product; moreover, naively extending GWS to random-phase primitives fails to reconstruct accurate colors. In contrast, RPWS is designed from the ground up for arbitrary random-phase translucent primitives, and through simulations and experimental validations we demonstrate state-of-the-art image quality and perceptually faithful 3D holograms for next-generation near-eye displays.
A Digital SRAM-Based Compute-In-Memory Macro for Weight-Stationary Dynamic Matrix Multiplication in Transformer Attention Score Computation
oai:arXiv.org:2511.12152v3
arXiv:2511.12152v3 Announce Type: replace-cross
Abstract: Compute-in-memory (CIM) techniques are widely employed in energy-efficient artificial intelligence (AI) processors. They alleviate power and latency bottlenecks caused by extensive data movement between compute and storage units. To extend these benefits to Transformers, this brief proposes a digital CIM macro to compute attention scores. To eliminate dynamic matrix multiplication (MM), we reformulate the computation as static MM using a combined QK-weight matrix, so that inputs can be directly fed to a single CIM macro to obtain the score results. However, this introduces a new challenge: 2-input static MM. The computation is further decomposed into four groups of bit-serial logical and addition operations. This allows the two inputs to directly activate the word line via an AND gate, thus realizing 2-input static MM with minimal overhead. A hierarchical zero-value bit skipping mechanism is introduced to prioritize skipping zero-value bits in the 2-input case. This mechanism effectively exploits the data sparsity of the two inputs, significantly reducing redundant operations. Implemented in a 65-nm process, the 0.35 mm^2 macro delivers 42.27 GOPS at 1.24 mW, yielding 34.1 TOPS/W energy efficiency and 120.77 GOPS/mm^2 area efficiency. Compared to CPUs and GPUs, it achieves ~25x and ~13x higher efficiency, respectively. Against other Transformer-CIMs, it demonstrates at least 7x energy and 2x area efficiency gains, highlighting its strong potential for edge intelligence.
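One plausible reading of the "combined QK-weight matrix" is the standard identity S = (X W_Q)(X W_K)^T = X (W_Q W_K^T) X^T, in which the same input X enters twice (hence a 2-input static MM). The sketch below checks this identity on toy integer matrices and reproduces the reported efficiency figures from the stated throughput, power, and area. The matrix sizes, values, and helper names are illustrative assumptions, not taken from the brief; scaling and softmax are omitted.

```python
# Toy check of S = Q K^T = X (W_Q W_K^T) X^T, using pure-Python integer matrices.
# All sizes and values below are hypothetical and chosen only for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

X = [[1, 2, 0], [0, 1, 3]]        # 2 tokens, model dimension 3
W_Q = [[1, 0], [2, 1], [0, 1]]    # 3 x 2 query projection
W_K = [[0, 1], [1, 1], [2, 0]]    # 3 x 2 key projection

# Dynamic form: both operands of the final product depend on the input.
Q = matmul(X, W_Q)
K = matmul(X, W_K)
S_dynamic = matmul(Q, transpose(K))

# Static form: W_QK is fixed after training; only X varies at run time.
W_QK = matmul(W_Q, transpose(W_K))                 # 3 x 3 combined weight
S_static = matmul(matmul(X, W_QK), transpose(X))

# Efficiency figures derived from the numbers quoted in the abstract.
throughput_gops, power_mw, area_mm2 = 42.27, 1.24, 0.35
energy_eff_tops_per_w = throughput_gops / power_mw   # GOPS/mW equals TOPS/W
area_eff_gops_per_mm2 = throughput_gops / area_mm2

print(S_dynamic == S_static)
print(round(energy_eff_tops_per_w, 1), round(area_eff_gops_per_mm2, 2))
```

The combined matrix trades a run-time matrix product for a larger stored weight (d x d instead of two d x d_k matrices), which is the kind of weight-stationary layout a CIM array favors.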
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
oai:arXiv.org:2511.20663v3
arXiv:2511.20663v3 Announce Type: replace-cross
Abstract: Ensuring cognitive stability in autonomous multi-agent systems (MAS) is a central challenge for large-scale, distributed AI. While existing observability tools monitor system outputs, they cannot quantify how rapidly agentic workflows recover once reasoning coherence has been lost. We adapt classical reliability metrics, namely Mean Time-to-Recovery (MTTR), Mean Time Between Failures (MTBF), and related ratios, to the cognitive domain, defining MTTR-A (Mean Time-to-Recovery for Agentic Systems) as a runtime measure of cognitive recovery latency. MTTR-A quantifies the time required for a MAS to detect reasoning drift and restore consistent operation, capturing the recovery of reasoning coherence rather than infrastructural repair.
A benchmark simulation using the AG News corpus and the LangGraph orchestration framework was conducted, modeling recovery latencies across multiple reflex modes. Automated reflexes restored stability within approximately 6 s on average, while human-approval interventions required about 12 s. Across 200 runs, the median simulated MTTR-A was 6.21 ± 2.14 s, MTBF = 6.7 ± 2.14 s, and NRR = 0.08, demonstrating measurable runtime resilience across reflex strategies.
By formalizing recovery latency as a quantifiable property of distributed reasoning (and deriving reliability bounds linking recovery time and cognitive uptime), this work establishes a foundation for runtime dependability in agentic cognition, transforming cognitive recovery from an ad-hoc process into a standardized, interpretable performance
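For orientation, the classical reliability quantities that the abstract adapts can be computed from an incident log as follows. This is a minimal sketch of the textbook definitions only: the log format, timestamps, and function names are hypothetical, and the paper's exact definitions of MTTR-A and NRR may differ.

```python
from statistics import mean

# Hypothetical incident log: (time drift was detected, time stable operation resumed),
# in seconds since the start of a run. Values are made up for illustration.
incidents = [(10.0, 16.0), (30.0, 35.0), (50.0, 57.0)]

def mean_time_to_recovery(log):
    """Average duration from detection of a failure to restored operation."""
    return mean(recovered - detected for detected, recovered in log)

def mean_time_between_failures(log):
    """Average uptime between one recovery and the next detected failure."""
    uptimes = [log[i + 1][0] - log[i][1] for i in range(len(log) - 1)]
    return mean(uptimes)

def steady_state_availability(mtbf, mttr):
    """Classical uptime fraction MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

mttr = mean_time_to_recovery(incidents)
mtbf = mean_time_between_failures(incidents)
uptime_fraction = steady_state_availability(mtbf, mttr)
print(mttr, mtbf, uptime_fraction)
```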
Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement
oai:arXiv.org:2512.00396v2
arXiv:2512.00396v2 Announce Type: replace-cross
Abstract: We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-constrained wearables and edge nodes. We compare magnitude thresholding to three 1D CNNs for time-series analysis: a literature baseline (separable convolutions) and two ultra-light models, one purely separable and one with residual connections. Using the BioStampRC21 dataset, 2 s windows at 30 Hz, and subject-independent leave-one-subject-out (LOSO) validation on 16 people with PD (PwPD) wearing chest-worn IMUs, our residual separable model (Model 2, 533 params) attains PR-AUC = 94.5%, F1 = 91.2%, MCC = 89.4%, matching or surpassing the baseline (5,552 params; PR-AUC = 93.7%, F1 = 90.5%, MCC = 88.5%) with approximately 10x fewer parameters. The smallest model (Model 1, 305 params) reaches PR-AUC = 94.0%, F1 = 91.0%, MCC = 89.1%. Thresholding obtains high recall (89.0%) but low precision (76.5%), yielding many false positives and high inter-subject variance. Sensor-position analysis (train-on-all) shows chest and thighs are most reliable; forearms degrade precision/recall due to non-gait arm motion; naive fusion of all sites does not outperform the best single site. Both compact CNNs execute within tight memory/latency budgets on STM32-class MCUs (sub-10 ms on low-power boards), enabling on-sensor gating of transmission/storage. Overall, ultra-light separable CNNs provide a superior accuracy-efficiency-generalization trade-off to fixed thresholds for wearable PD gait detection and underscore the value of tailored time-series models for edge deployment.
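To illustrate why separable convolutions keep parameter counts this small, the sketch below compares a standard 1D convolution with a depthwise-separable one (a depthwise filter per input channel followed by a 1x1 pointwise mix). The channel counts and kernel size are hypothetical and do not describe the paper's Model 1 or Model 2; only the window length follows from the stated 2 s at 30 Hz.

```python
def conv1d_params(c_in, c_out, kernel, bias=True):
    """Parameters of a standard 1D convolution layer."""
    return c_in * c_out * kernel + (c_out if bias else 0)

def separable_conv1d_params(c_in, c_out, kernel, bias=True):
    """Parameters of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = c_in * kernel + (c_in if bias else 0)    # one filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)    # 1x1 channel mixing
    return depthwise + pointwise

# Window length implied by the abstract: 2 s of data sampled at 30 Hz.
window_samples = 2 * 30

# Hypothetical layer shape, chosen only to show the scaling.
c_in, c_out, kernel = 16, 32, 5
standard = conv1d_params(c_in, c_out, kernel)
separable = separable_conv1d_params(c_in, c_out, kernel)
print(window_samples, standard, separable, round(standard / separable, 2))
```

For this hypothetical layer the separable form needs roughly a quarter of the parameters, and the saving grows with kernel size and output width.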
Stabilizing Rate of Stochastic Control Systems
oai:arXiv.org:2512.06349v2
arXiv:2512.06349v2 Announce Type: replace-cross
Abstract: This paper develops a quantitative framework for analyzing the mean-square exponential stabilization of stochastic linear systems with multiplicative noise, focusing specifically on the optimal stabilizing rate, which characterizes the fastest exponential stabilization achievable under admissible control policies. Our contributions are twofold. First, we extend norm-based techniques from deterministic switched systems to the stochastic setting, deriving a verifiable necessary and sufficient condition for the exact attainability of the optimal stabilizing rate, together with computable upper and lower bounds. Second, by restricting attention to state-feedback policies, we reformulate the optimal stabilizing rate problem as an optimal control problem with a nonlinear cost function and derive a Bellman-type equation. Since this Bellman-type equation is not directly tractable, we recast it as a nonlinear matrix eigenvalue problem whose valid solutions require strictly positive-definite matrices. To ensure the existence of such solutions, we introduce a regularization scheme and develop a Regularized Normalized Value Iteration (RNVI) algorithm, which in turn generates strictly positive-definite fixed points for a perturbed version of the original nonlinear matrix eigenvalue problem while producing feedback controllers. Evaluating these regularized solutions further yields certified lower and upper bounds for the optimal stabilizing rate, resulting in a constructive and verifiable framework for determining the fastest achievable mean-square stabilization under multiplicative noise.
Linear Quadratic Regulators: A New Look
oai:arXiv.org:2512.10641v2
arXiv:2512.10641v2 Announce Type: replace-cross
Abstract: Linear time-invariant control systems can be considered as finitely generated modules over the commutative principal ideal ring $\mathbb{R}[\frac{d}{dt}]$ of linear differential operators with respect to the time derivative. In this algebraic language, Kalman controllability translates into the freeness of the system module. Linear quadratic regulators rely on quadratic Lagrangians, or cost functions. Any flat output, i.e., any basis of the corresponding free module, leads to an open-loop control strategy via an Euler-Lagrange equation, which becomes here a linear ordinary differential equation with constant coefficients. In this approach, the two-point boundary value problem, including the control variables, becomes tractable. It yields notions of optimal time horizon, optimal parameter design and optimal rest-to-rest trajectories. The loop is closed via an intelligent controller derived from model-free control, which is known to exhibit excellent performance concerning model mismatches and disturbances.
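As a worked illustration of the flat-output route (a scalar integrator chosen here for simplicity, not an example taken from the paper), take $\dot{x} = u$ with flat output $y = x$, so that $u = \dot{y}$, and assume weights $q, r > 0$ and a horizon $T$. The quadratic cost then depends on $y$ alone, and the Euler-Lagrange equation is a constant-coefficient linear ODE whose two-point boundary value problem has a closed-form solution:

```latex
\begin{align}
J &= \frac{1}{2}\int_0^T \left( q\,x^2 + r\,u^2 \right) dt
   = \int_0^T \mathcal{L}(y,\dot{y})\,dt,
   \qquad \mathcal{L} = \tfrac{1}{2}\left( q\,y^2 + r\,\dot{y}^2 \right), \\
0 &= \frac{\partial \mathcal{L}}{\partial y}
   - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{y}}
   = q\,y - r\,\ddot{y}
   \quad\Longrightarrow\quad \ddot{y} = \omega^2 y, \qquad \omega = \sqrt{q/r}, \\
y(t) &= \frac{y_0 \sinh\!\big(\omega (T - t)\big) + y_T \sinh(\omega t)}{\sinh(\omega T)},
   \qquad y(0) = y_0, \quad y(T) = y_T, \\
u(t) &= \dot{y}(t)
   = \omega\,\frac{-\,y_0 \cosh\!\big(\omega (T - t)\big) + y_T \cosh(\omega t)}{\sinh(\omega T)}.
\end{align}
```

The open-loop control $u$ is obtained directly from the flat output trajectory, with no Riccati equation involved in this toy case.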