## Plan & Vision

My research focuses on enabling **robust and trustworthy wireless autonomy** by leveraging and developing tools spanning the areas of **machine learning, optimization & control, and signal processing**.

Wireless Autonomous Networked Systems (**WANS**) are already all around us: sensing, processing, learning and making collaborative decisions that may affect the surrounding environment as well as connected infrastructure. Typical examples include wireless communication networks, drone swarms, mobile and/or robotic networks, Unmanned Aerial Vehicles (UAVs), and self-driving cars. However, while the adoption of WANS in modern society presents a high potential for socioeconomic growth, it also presents nontrivial challenges. In fact, such systems must be designed and operated not only to be efficient and **driven by actual, observable data**, but also to meet strict specifications induced by the need for **performance robustness**, while simultaneously ensuring **system trustworthiness**, realized in the form of **safety**, **security**, **privacy/secrecy** and **domain adaptation** guarantees.

In my work, application domains of particular interest are multiuser cooperative communications, 5G standards and mmWave technologies, urban communications & networking, process monitoring & control, urban information systems, path planning and spatial network control, network secrecy, target detection & tracking, and the Internet-of-Things (IoT), including futuristic networking scenarios involving smart infrastructures as well as autonomous vehicle coordination.

My research plan is organized around the following coherent, cross-interacting thrusts:

**Thrust 1: Reinforcement Learning for Wireless Comms & Networking**

**• Autonomous Network Control for Wireless Communications.** In the context of relay-assisted wireless communications, I have proposed a novel WANS approach called spatially controlled relay beamforming, which optimally exploits autonomous relay mobility jointly with optimal transmit beamforming for distributed network Quality-of-Service (QoS) maximization, via a combination of model-based reinforcement learning and two-stage stochastic programming. This system achieves experimentally observed QoS gains of up to 80% as compared to channel-agnostic, randomized relay motion, or no motion at all. This idea has subsequently been extended to the realm of urban communications by introducing urban WANS for Non-Line-Of-Sight (NLOS) mmWave networks assisted by multiple relay clusters, where network QoS is maximized via joint distributed adaptive relay selection and power/phase allocation. Compared to the state of the art, the resulting wireless system achieves near-ideal performance with significantly lower Channel State Information (CSI) estimation overhead, while providing substantially better network utilization.

**• Model-Free Ergodic Resource Allocation in Wireless Systems.** Optimal resource allocation in real-world wireless systems is rather challenging, not only due to the unavailability of accurate statistical channel models, but also because expressions of maximal or achievable information rates are often unknown, or not adequately precise. Under a modular stochastic functional optimization framework, I have proposed a new zeroth-order stochastic primal-dual algorithm, with strong theoretical justification, for completely data-driven and model-free learning of optimal resource allocation policies for ergodic network optimization. My contribution relies on Gaussian smoothing of the original constrained policy search problem, and on the representation power of universal policy parameterizations, such as Deep Neural Networks (DNNs). In fact, DNN-based data-driven policies produced by the proposed primal-dual method provably attain near-ideal performance, based exclusively on limited channel probing, completely bypassing the need for gradient computations, and in the absence of baseline channel or information rate models.
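To give a feel for what "Gaussian smoothing" buys in a gradient-free setting, here is a minimal sketch of a two-point Gaussian-smoothing gradient estimate driving plain descent on a toy function. It is only an illustration of the smoothing idea, not the primal-dual algorithm described above: the function names, the quadratic `toy_objective`, the smoothing radius `mu` and the step size are all assumptions made for this example, and no constraints or dual variables are handled.

```python
import numpy as np

def zeroth_order_gradient(f, theta, mu=1e-2, num_samples=1, rng=None):
    """Two-point Gaussian-smoothing gradient estimate of f at theta.

    Uses only function evaluations (no gradients of f). `mu` is the
    smoothing radius; all names here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(theta.shape)       # random Gaussian direction
        grad += (f(theta + mu * u) - f(theta)) / mu * u
    return grad / num_samples

def toy_objective(theta):
    # Hypothetical smooth stand-in for a probed system cost (to be minimized).
    return float(np.sum((theta - 1.0) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(2000):
    # Descent step driven purely by noisy function-value differences.
    theta -= 0.01 * zeroth_order_gradient(toy_objective, theta, rng=rng)
```

In this toy run the iterate settles near the minimizer (the all-ones vector) even though the loop never evaluates a gradient, which is the property that makes such estimators attractive when only channel probing is available.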

**Future Research Plan (Thrust 1):**

**• Reinforcement Learning for Robust Resource Allocation in Wireless Systems.** I will work towards the development of data-driven, model-free algorithms for *stochastically robust*, dynamic resource allocation in wireless systems, balancing high rates and stable system performance across users, and across time. As an example, robust resource allocation policies may be obtained by replacing expectations in ergodic resource allocation problems with quantiles. These are instances of constrained risk-aware stochastic programs, solvable due to recent advances in the area of risk-aware optimization (see Thrust 2). Robustifying resource allocation in such ways will naturally result in more uniform system performance with user-tunable reliability. Further, the associated policies will be superior to those resulting from (rigidly robust) minimax formulations, the latter being well-known for their notorious difficulty, and for often achieving overly pessimistic system performance.

**• Data-Driven Network Task Optimization over Nonstationary/Unknown Channels.** Network task optimization problems appear naturally in many application areas involving WANS, such as urban communication and information systems, mobility-enabled networks, IoT, autonomous vehicle coordination, and military operations. More specific examples are comm-aware path planning, and spatially-aware communications. Although task optimization over WANS is conveniently channel model-based, the right model hyperparameters are most often unknown a priori, and must be learned from observations. Such parameter estimation procedures are numerically inefficient, imprecise, waste power and bandwidth, and introduce extra model-mismatch inaccuracies, especially in large-scale networking problems. Therefore, they are not suitable for real-time settings that often involve fast-changing, or even unknown channel conditions. My goal will be to approach channel-adaptive task optimization over WANS from a data-driven, model-free perspective, bypassing the need for (re)fitting explicit intermediate channel models, and by *focusing on the task*. Fundamental challenges include dealing with complex interactions among agents, and across multiple periods, stages and timescales. Robust adaptive formulations of task optimization problems over WANS will also be explored.
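The idea in the robust resource allocation item above, replacing expectations with quantiles, can be illustrated with a small numerical sketch. The two rate distributions below are purely hypothetical stand-ins for the per-slot rates delivered by two allocation policies, and the 5% level is an arbitrary choice; nothing here is taken from an actual system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical per-slot rates (arbitrary units) under two allocation policies.
rates_a = rng.exponential(scale=2.0, size=n)      # higher average, large fluctuations
rates_b = rng.uniform(low=1.2, high=2.0, size=n)  # lower average, stable

def summarize(rates, alpha=0.05):
    """Return the average rate and the alpha-quantile (rate met ~95% of the time)."""
    return {"mean": float(np.mean(rates)), "quantile": float(np.quantile(rates, alpha))}

summary_a = summarize(rates_a)
summary_b = summarize(rates_b)
print("policy A:", summary_a)
print("policy B:", summary_b)
```

Under these assumptions, policy A looks better under the ergodic (mean-rate) criterion, while policy B is clearly preferable when the objective is the rate guaranteed in 95% of the slots, which is the kind of reversal a quantile-based formulation is meant to capture.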

**Thrust 2: Risk-Aware Optimization for Learning, Estimation & Control**

**relatively infrequent, but statistically significant** events in decision processes. Consequently, risk-awareness is of fundamental importance in the design and optimization of WANS, since it is inherently connected with operational stability and stochastic robustness to variations in performance caused by exogenous and endogenous (or systemic) uncertainties; thus, risk-awareness is naturally connected with system trustworthiness as well (see Thrust 3). Still, the development of practical and computationally efficient methods for risk-aware optimization remains largely unexplored, especially in the nonstationary, dynamic and real-time scenarios so common in a large variety of WANS applications involving all kinds of learning, estimation and control tasks. My contributions in this area are:

**• Gradient-Based Algorithms for Risk-Aware Optimization.** I have introduced the MESSAGEp algorithm, a data-driven stochastic subgradient scheme for optimizing a new class of measures of risk, termed mean-semideviations, generalizing the classical mean-upper-semideviation risk measure. Under the mean-risk tradeoff framework, mean-semideviations extend expectation-based uncertainty quantification, and provide an intuitive, powerful, application-driven and operationally significant alternative to risk-neutral stochastic decision making, and expectation-based machine learning. Under the most flexible set of assumptions to date, I have provided a complete asymptotic characterization of the MESSAGEp algorithm, including explicit convergence rates and sample complexity guarantees, strictly extending and improving on the state of the art.

**• Zeroth-order Algorithms for Risk-Aware Optimization.** In many applications involving either risk-neutral or risk-averse optimization, (sub)gradient information is very difficult, or even impossible to obtain. Examples of immediate interest include dynamic resource allocation in wireless systems, and training of complex neural network architectures, such as recurrent DNNs (see Thrust 1). In this context, I have introduced Free-MESSAGEp, the first zeroth-order algorithm for gradient-free mean-semideviation optimization, for which I have established path convergence, as well as explicit convergence rates for both convex and strongly convex costs/rewards. Most importantly, I have demonstrated virtually no sacrifice in convergence speed as compared to the MESSAGEp algorithm (the gradient-based counterpart of Free-MESSAGEp), and I have explicitly quantified the benefits of strong convexity on problem conditioning. These results present certain insightful tradeoffs between algorithmic precision and problem dimension, and naturally extend fundamental prior work on zeroth-order risk-neutral optimization.
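For readers unfamiliar with the risk measure mentioned in the two items above, the following sketch computes an empirical version of the classical mean-upper-semideviation, ρ(Z) = E[Z] + c·(E[(Z − E[Z])₊ᵖ])^(1/p) with c in [0, 1] and p ≥ 1. It covers only this classical special case, not the generalized mean-semideviation class or the MESSAGEp and Free-MESSAGEp algorithms; the function name and default parameters are assumptions for illustration.

```python
import numpy as np

def mean_upper_semideviation(costs, c=0.5, p=1):
    """Empirical mean-upper-semideviation of a sample of costs.

    Penalizes only deviations ABOVE the mean (bad outcomes for a cost),
    weighted by c in [0, 1]; p >= 1 sets the order of the semideviation.
    """
    costs = np.asarray(costs, dtype=float)
    mean = costs.mean()
    excess = np.maximum(costs - mean, 0.0)   # upper deviations only
    return mean + c * np.mean(excess ** p) ** (1.0 / p)

steady = [1.0, 1.0, 1.0, 1.0]   # constant cost
spiky = [0.0, 0.0, 0.0, 4.0]    # same mean, one rare large cost

print(mean_upper_semideviation(steady))  # equals the mean: no upper deviation
print(mean_upper_semideviation(spiky))   # mean plus a penalty for the rare spike
```

Both samples have the same average cost, so a risk-neutral criterion cannot tell them apart, whereas the mean-semideviation ranks the spiky sample as worse.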

**Future Research Plan (Thrust 2):**

**• Risk-Aware Reinforcement Learning.** I will work towards the development and analysis of risk-aware extensions of standard, risk-neutral (expectation-based) Q-Learning as well as off-policy learning variants for optimizing dynamic measures of risk not admitting expectation representations; typical examples include mean-semideviation and mean-variance measures of risk, and other application-specific, composite risk functionals.

**• Risk-Aware Linear Estimation.** Linear models are always at the forefront of modern machine learning, statistics, control and signal processing, arguably being the most popular representations used in numerous applications. Leveraging my expertise in gradient-based risk-aware optimization, I will develop *a new class of risk-aware linear adaptive filters*, extending classical tools in risk-neutral stochastic optimization and statistical signal processing, such as the celebrated Least Mean Squares (LMS) and Recursive Least Squares (RLS) algorithms. This is challenging due to the nonlinearities induced by the use of measures of risk instead of expectations, despite model linearity. More specifically, I will work on fast algorithm design, sample complexity and convergence rates, distributed risk-aware linear estimation over networks, and system identifiability.

**• Risk-Aware State Estimation & Control.** Risk-sensitive optimization has a long history in estimation and control, with the vast majority of existing contributions replacing the classical quadratic cost with its exponentiation, which quantifies risk in a certain way, as seen by the Taylor expansion of the exponential function. However, this approach is significantly limited. First, it is not tunable, meaning that one cannot effectively control the trade-off between mean performance and risk. Second, it cannot be applied in problems where explicit risk constraints need to be met. I will develop new filtering and control strategies resulting from both explicit risk-constrained formulations, such as variance-constrained dynamic estimation and control problems, and penalty-based formulations. I will further focus on variational duality in risk-aware state estimation and control, revealing close relationships between risk constraints and risk penalties, and enabling effective dual-domain algorithm development. Expected outcomes of this research include efficient, semi-Bayesian/data-driven risk-aware analogs of nonlinear recursive filters, linear-quadratic stochastic control, etc.

**• Risk-Aware Statistical Learning Theory.** This research will put forward a new paradigm in statistical learning, departing from the risk-neutral, expected utility (loss or reward or so-called “risk”) approach, which, virtually due to linearity as a fundamental property of expectation, is almost universally accepted in machine learning theory and practice. Generalizing the objective of learning to explicitly account for risk, I will develop new consistency and generalization theory, establishing the foundations of what I call *risk-aware statistical learning*. This will also trigger new developments in modern statistics, such as concentration of measure and the theory of empirical processes. Further, I will explore deep connections between risk-aware and risk-neutral learning, such as the effects of finite-sample training of risk-aware models on risk-neutral generalization and overfitting. Along this line of research, I will support new theory with real-world experimental studies, in order to emphasize the usefulness of risk-awareness in the practice of machine learning and artificial intelligence.
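The remark in the state estimation and control item above, that an exponentiated cost quantifies risk "as seen by the Taylor expansion", can be made concrete with the standard cumulant expansion. The symbols below are assumptions introduced for illustration: J denotes the (random) quadratic cost and θ > 0 a small scalar, with the moment generating function of J assumed finite near zero.

```latex
\frac{1}{\theta}\,\log \mathbb{E}\!\left[ e^{\theta J} \right]
  \;=\; \mathbb{E}[J] \;+\; \frac{\theta}{2}\,\operatorname{Var}(J) \;+\; O(\theta^{2}),
  \qquad \theta \to 0 .
```

To first order in θ, the exponentiated criterion is therefore the expected cost plus a variance penalty, with higher-order cumulants entering through the remaining terms.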

**Thrust 3: Constrained Reinforcement Learning for Trustworthy Autonomy**

**trust constraints** are of fundamental importance in the design and optimization of WANS. Specific application domains of interest include autonomous vehicle coordination, process monitoring and field exploration, networked path planning, target tracking, network secrecy, physical layer security, and privacy-aware/adversarial decision making.

**Future Research Plan (Thrust 3):**

**• Interfaces between Trustworthiness & Risk.** At least in stochastic problems, imposing trustworthiness is actually a form of risk-awareness (see Thrust 2). Although this might sound like a bold statement, it is largely true, since, as previously mentioned, any form of trust constraint controls the occurrence of undesirable but statistically significant events, such as safety, security or privacy breaches. Indeed, stochastic trust constraints appearing in modern reinforcement learning are in the form of stagewise expectations, probabilities (chance constraints), and quantiles. In this context, I will explore fundamental connections between trustworthiness and risk, in particular how successful ideas in constrained risk-aware optimization can be exploited for effective **data-driven learning for trust development**. In particular, the role of problem duality in constrained risk-aware optimization will be prominent: appropriate dualization of trust constraints will allow the development of efficient algorithms for trustworthy reinforcement learning with predictable performance and sample complexity guarantees, especially useful in data-driven, model-free design and optimization for safe and resilient WANS.

**• Rapid Learning for Domain Adaptation over WANS.** There is no shortage of applications involving WANS operating over changing, adversarial, degraded, hostile or even completely unknown environments (also see Thrust 1). In such cases, starting with an initial nominal resource allocation and physical configuration, system adaptation and learning must be performed on very fast time scales, while provably preserving desired system efficiency, autonomy and trustworthiness. My first goal is to exploit and advance model-free domain adaptation methods, such as meta-reinforcement learning (Meta-RL) and its variants, for efficient resource allocation and task optimization over WANS involving complex, interacting decision processes related to wireless channels, data obtained via sensing, processing, and multi-agent collaboration. Additionally, I will work on the development of algorithms for adaptation, learning, and planning in task optimization problems where information is revealed incrementally over time (e.g., rescue operations or warfare missions) and useful actions need to be taken, relying on partial information and imperfect knowledge of the involved optimization surface.

## Recent Projects

Data-Driven Risk-Aware Optimization & Learning

**Learn robustly** by optimizing **risk, not averages**.

Space-Aware Wireless Comms: mmWaves, 5G & Beyond

**Communicate better, faster and safer**: Let your network **sense, move, and adapt**.

Approximate Nonlinear Filtering

**Consistent, stable and fast** state estimation in centralized and distributed settings.

Matrix Completion-based MIMO Radar

**Efficient and denoised** target detection from a **minimal number of observations**.