## Plan & Vision

My research focuses on enabling **robust and trustworthy wireless autonomy** by leveraging and developing tools spanning the areas of **machine learning, optimization & control, and signal processing**.

Wireless Autonomous Networked Systems (**WANS**) are already everywhere around us, sensing, processing, learning and making collaborative decisions, potentially affecting the surrounding environment as well as connected infrastructure. Typical examples include wireless communication networks, drone swarms, mobile and/or robotic networks, Unmanned Aerial Vehicles (UAVs), and self-driving cars. However, while the adoption of WANS in modern society presents a high potential for socioeconomic growth, it also presents nontrivial challenges. In fact, such systems must be designed and operated not only to be efficient and **driven by actual, observable data**, but also to meet strict specifications induced by the need for **performance robustness**, while simultaneously ensuring **system trustworthiness**, realized in the form of **safety**, **security**, **privacy/secrecy** and **domain adaptation** guarantees.

In my work, application domains of particular interest are multiuser cooperative communications, 5G standards and mmWave technologies, urban communications & networking, process monitoring & control, urban information systems, path planning and spatial network control, network secrecy, target detection & tracking, and the Internet-of-Things (IoT), including futuristic networking scenarios involving smart infrastructures as well as autonomous vehicle coordination.

My research plan is organized around the following coherent, cross-interacting thrusts:

**Thrust 1: Reinforcement Learning for Wireless Comms & Networking**

• **Autonomous Network Control for Wireless Communications**

• **Model-Free Ergodic Resource Allocation in Wireless Systems**

• **Zeroth-order Deterministic Policy Gradient Algorithms** **[KKPR20]**

**Future Research Plan (Thrust 1):**

**• Reinforcement Learning for Robust Resource Allocation in Wireless Systems.** I will work towards the development of data-driven, model-free algorithms for *stochastically robust*, dynamic resource allocation in wireless systems, balancing high rates and stable system performance across users and across time. As an example, robust resource allocation policies may be obtained by replacing expectations in ergodic resource allocation problems with quantiles. These are instances of constrained risk-aware stochastic programs, solvable due to recent advances in the area of risk-aware optimization (see Thrust 2). Robustifying resource allocation in such ways will naturally result in more uniform system performance with user-tunable reliability. Further, the associated policies will be superior to those resulting from (rigidly robust) minimax formulations, which are notoriously difficult to solve and often yield overly pessimistic system performance.

**• Data-Driven Network Task Optimization over Nonstationary/Unknown Channels.** Network task optimization problems appear naturally in many application areas involving WANS, such as urban communication and information systems, mobility-enabled networks, IoT, autonomous vehicle coordination, and military operations. More specific examples are comm-aware path planning and spatially-aware communications. Although task optimization over WANS is conveniently channel model-based, the right model hyperparameters are most often unknown a priori, and must be learned from observations. Such parameter estimation procedures are numerically inefficient and imprecise, waste power and bandwidth, and introduce extra model-mismatch inaccuracies, especially in large-scale networking problems. Therefore, they are not suitable for real-time settings that often involve fast-changing, or even unknown, channel conditions. My goal will be to approach channel-adaptive task optimization over WANS from a data-driven, model-free perspective, bypassing the need for (re)fitting explicit intermediate channel models, and *focusing on the task*. Fundamental challenges include dealing with complex interactions among agents, and across multiple periods, stages and timescales. Robust adaptive formulations of task optimization problems over WANS will also be explored.
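The idea in the first item above, replacing expectations with quantiles, can be illustrated with a minimal sketch. It only evaluates one fixed allocation under two metrics and is not an algorithm from this research plan. The fading model (unit-mean exponential channel gains), the rate formula `log2(1 + power * h)`, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def simulate_rates(power, num_samples=10_000, seed=0):
    """Sample rates log2(1 + power * h), with h ~ Exp(1) (assumed fading model)."""
    rng = np.random.default_rng(seed)
    fading = rng.exponential(scale=1.0, size=num_samples)
    return np.log2(1.0 + power * fading)


def ergodic_metric(rates):
    """Risk-neutral metric: the average (ergodic) rate."""
    return float(np.mean(rates))


def quantile_metric(rates, alpha=0.1):
    """Robust metric: the alpha-quantile, i.e. a rate exceeded roughly (1 - alpha) of the time."""
    return float(np.quantile(rates, alpha))


rates = simulate_rates(power=10.0)
print("ergodic rate:", ergodic_metric(rates))
print("10% quantile rate:", quantile_metric(rates, alpha=0.1))
```

Here `alpha` plays the role of the user-tunable reliability level: a smaller `alpha` asks for a rate that is guaranteed more often, at the price of a more conservative figure than the average.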

**Thrust 2: Risk-Aware Optimization for Learning, Estimation & Control**

**relatively infrequent, but statistically significant** events in decision processes. Consequently, risk-awareness is of fundamental importance in the design and optimization of WANS, since it is inherently connected with operation stability and with stochastic robustness to variations in performance caused by exogenous and endogenous (or systemic) uncertainties; thus, risk-awareness is naturally connected with system trustworthiness as well (see Thrust 3). Still, the development of practical and computationally efficient methods for risk-aware optimization remains largely unexplored, especially in the nonstationary, dynamic and real-time scenarios that are so common in WANS applications involving all kinds of learning, estimation and control tasks. My contributions in this area are:

• **Gradient-Based Algorithms for Risk-Aware Optimization** **[KP18a]**

• **Zeroth-order Algorithms for Risk-Aware Optimization** **[KP19]**

• **Risk-Aware MMSE Estimation**

• **Risk-Aware Linear-Quadratic Control** **[TKCRP20]**

• **Noisy Linear Convergence of SGD for CV@R Learning under the Polyak-Łojasiewicz Inequality** **[Kal21]**

• **Gradient-Based Algorithms for Risk-Aware Optimization.** I have introduced the MESSAGEp algorithm, a data-driven stochastic subgradient scheme for optimizing a new class of measures of risk, termed mean-semideviations, generalizing the classical mean-upper-semideviation risk measure. Under the mean-risk tradeoff framework, mean-semideviations extend expectation-based uncertainty quantification, and provide an intuitive, powerful, application-driven and operationally significant alternative to risk-neutral stochastic decision making, and expectation-based machine learning. Under the most flexible set of assumptions to date, I have provided a complete asymptotic characterization of the MESSAGEp algorithm, including explicit convergence rates and sample complexity guarantees, strictly extending and improving on the state of the art.

• **Zeroth-order Algorithms for Risk-Aware Optimization.** In many applications involving either risk-neutral or risk-averse optimization, (sub)gradient information is very difficult, or even impossible, to obtain. Examples of immediate interest include dynamic resource allocation in wireless systems, and training of complex neural network architectures, such as recurrent DNNs (see Thrust 1). In this context, I have introduced Free-MESSAGEp, the first zeroth-order algorithm for gradient-free mean-semideviation optimization, for which I have established path convergence, as well as explicit convergence rates for both convex and strongly convex costs/rewards. Most importantly, I have demonstrated virtually no sacrifice in convergence speed as compared to the MESSAGEp algorithm (the gradient-based counterpart of Free-MESSAGEp), and I have explicitly quantified the benefits of strong convexity on problem conditioning. These results present certain insightful tradeoffs between algorithmic precision and problem dimension, and naturally extend fundamental prior work on zeroth-order risk-neutral optimization.
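To make the two ingredients above concrete, the following sketch computes an empirical version of the classical mean-upper-semideviation risk measure, E[Z] + c (E[(Z − E[Z])₊ᵖ])^(1/p), and a generic two-point, Gaussian-smoothing zeroth-order gradient estimate. It does not reproduce the MESSAGEp or Free-MESSAGEp algorithms. The function names, the default parameters, and the linear test function are assumptions chosen only for illustration.

```python
import numpy as np


def mean_upper_semideviation(samples, c=0.5, p=1):
    """Empirical E[Z] + c * (E[(Z - E[Z])_+^p])^(1/p), with c in [0, 1] and p >= 1."""
    z = np.asarray(samples, dtype=float)
    mean = z.mean()
    excess = np.maximum(z - mean, 0.0)  # only above-mean (undesirable) deviations count
    return float(mean + c * np.mean(excess ** p) ** (1.0 / p))


def zeroth_order_gradient(f, x, mu=1e-3, num_directions=1, rng=None):
    """Average of two-point estimates (f(x + mu*u) - f(x)) / mu * u, with u ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.zeros_like(x)
    for _ in range(num_directions):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - fx) / mu * u  # uses function values only
    return grad / num_directions


# Risk of a small loss sample: the mean is 1.0 and the mean upper excess is 0.75.
losses = [0.0, 0.0, 0.0, 4.0]
print(mean_upper_semideviation(losses, c=0.5, p=1))

# Gradient-free estimate for a linear function whose true gradient is `a`.
a = np.array([1.0, -2.0, 0.5])
estimate = zeroth_order_gradient(
    lambda v: float(a @ v), np.zeros(3), num_directions=20_000,
    rng=np.random.default_rng(0),
)
print(estimate)
```

Setting `c=0` recovers the plain average, so `c` acts as the knob trading mean performance against risk; the estimator needs only evaluations of `f`, which is what makes it usable when (sub)gradients are unavailable.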

**Future Research Plan (Thrust 2):**

**• Risk-Aware Reinforcement Learning.** I will develop and analyze new risk-aware versions of standard, risk-neutral (expectation-based) RL algorithms (e.g., Q-Learning, Policy Gradient, Actor-Critic), suitable for optimizing dynamic measures of risk without expectation representations; examples include mean-semideviation and mean-variance measures of risk, and other application-specific, composite risk functionals.

**• Risk-Aware Linear Estimation.** Linear models are always at the forefront of modern machine learning, statistics, control and signal processing, arguably being the most popular representations used in numerous applications. Leveraging my expertise in gradient-based risk-aware optimization, I will develop *a new class of risk-aware linear adaptive filters*, extending classical tools in risk-neutral stochastic optimization and statistical signal processing, such as the celebrated Least Mean Squares (LMS) and Recursive Least Squares (RLS) algorithms. This is challenging due to the nonlinearities induced by the use of measures of risk instead of expectations, despite model linearity. More specifically, I will work on fast algorithm design, sample complexity and convergence rates, distributed risk-aware linear estimation over networks, and system identifiability.

**• Risk-Aware State Estimation & Control.** Risk-sensitive optimization has a long history in estimation and control, with the vast majority of existing contributions replacing the classical quadratic cost with its exponentiation, which quantifies risk in a certain way, as seen from the Taylor expansion of the exponential function. However, this approach is significantly limited. First, it is not tunable, meaning that one cannot interpretably control the trade-off between mean performance and risk. Second, it cannot be applied in problems where explicit risk constraints need to be met. I will develop new filtering and control strategies resulting from both explicit risk-constrained formulations, such as variance-constrained dynamic estimation and control problems, and penalty-based formulations. I will further focus on variational duality in risk-aware state estimation and control, revealing close relationships between risk constraints and risk penalties, and enabling effective dual-domain algorithm development. Expected outcomes of this research include efficient, semi-Bayesian/data-driven risk-aware analogs of nonlinear recursive filters, linear-quadratic stochastic control, etc.

**• Robust Training and Adversarial Examples.** I will explore intrinsic connections between risk-aware learning and robustified learning achieved by tackling nonconvex stochastic minimax problems. The latter are rather difficult to handle, but have very recently found success and extensive practical applicability, including in standard cases of WANS, such as self-driving cars, and beyond; three such examples are ℓ∞-based robust training, Generative Adversarial Networks (GANs), and Lagrangian duality in constrained statistical learning. By exploiting core properties of certain coherent risk measures, and in conjunction with the power of universal data representations (e.g., deep neural networks), I will develop a new framework enabling *principled relaxation* of minimax learning problems into *minimization-only* stochastic problems. This will not only simplify robust training from a practical perspective, providing (near-)optimal solutions to adversarial problems, but will also allow the development of rigorous *and* implementable accuracy/robustness trade-offs. This research is expected to impact general (nonconvex) minimax optimization as well, related to constrained optimization, duality, and game theory.
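For the exponential-cost approach discussed under Risk-Aware State Estimation & Control above, the Taylor-expansion remark can be spelled out with the standard cumulant expansion. The symbols are assumptions introduced here for illustration: $J$ denotes the random (e.g., quadratic) cost and $\theta > 0$ the risk-sensitivity parameter.

```latex
\frac{1}{\theta}\,\log \mathbb{E}\!\left[ e^{\theta J} \right]
  \;=\; \mathbb{E}[J] \;+\; \frac{\theta}{2}\,\mathrm{Var}(J) \;+\; O(\theta^{2}),
  \qquad \theta \to 0 .
```

To first order, the exponential cost therefore behaves like a mean-variance objective, but the single parameter $\theta$ also weights all higher-order cumulants at once, which is one way to read the tunability limitation noted above.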

**Thrust 3: Constrained Learning for Trustworthy Autonomy**

**trust constraints** are of fundamental importance in the design and optimization of WANS. Specific application domains of interest include autonomous vehicle coordination, process monitoring and field exploration, networked path planning, target tracking, network secrecy, physical layer security, and privacy-aware/adversarial decision making.

**Future Research Plan (Thrust 3):**

**• Interfaces between Trustworthiness & Risk.** At least in stochastic problems, imposing trustworthiness is actually a form of risk-awareness (see Thrust 2). Although this might sound like a bold statement, it is largely true, since, as previously mentioned, any form of trust constraint controls the occurrence of undesirable but statistically significant events, such as safety, security or privacy breaches. Indeed, stochastic trust constraints appearing in modern machine and reinforcement learning are in the form of stagewise expectations, probabilities (chance constraints), and quantiles. In this context, I will explore fundamental connections between trustworthiness and risk, specifically how successful ideas in constrained risk-aware optimization can be exploited for effective **data-driven learning for trust development**. In particular, the role of problem duality in constrained risk-aware optimization will be prominent: appropriate dualization of trust constraints will allow the development of efficient algorithms for trustworthy reinforcement learning with predictable performance and sample complexity guarantees, especially useful in data-driven, model-free design and optimization for safe and resilient WANS.

**• Rapid Learning for Domain Adaptation over WANS.** There is no shortage of applications involving WANS operating over changing, adversarial, degraded, hostile or even completely unknown environments (also see Thrust 1). In such cases, starting with an initial nominal resource allocation and physical configuration, system adaptation and learning must be performed on very fast time scales, while provably preserving desired system efficiency, autonomy and trustworthiness. My first goal is to exploit and advance model-free domain adaptation methods, such as meta-reinforcement learning (Meta-RL) and its variants, for efficient resource allocation and task optimization over WANS involving complex, interacting decision processes related to wireless channels, data obtained via sensing, processing, and multi-agent collaboration. Additionally, I will work on the development of algorithms for adaptation, learning, and planning in task optimization problems where information is revealed incrementally over time (e.g., rescue operations or warfare missions) and useful actions need to be taken, relying on partial information and imperfect knowledge of the involved optimization surface.

## Recent and Past Projects


Data-Driven Risk-Aware Optimization & Learning


**Learn robustly** by optimizing **risk, not averages**.

Space-Aware Wireless Comms: mmWaves, 5G & Beyond


**Communicate better, faster and safer**: Let your network **sense, move, and adapt**.

Approximate Nonlinear Filtering


**Consistent, stable and fast** state estimation in centralized and distributed settings.

Matrix Completion-based MIMO Radar


**Efficient and denoised** target detection from a **minimal number of observations**.