I Introduction
The future tactical wireless communications network must cope with an increasingly congested and contested spectrum environment under the adversarial conditions and high tempo envisaged in multi-domain operations (MDO) [1]. The emerging area of massive MIMO is a potential solution to increase rate and survivability [2, 10], with growing DoD interest in adapting commercial 5G solutions to tactical network challenges [7, 4]. Concomitant with this are challenges related to the computational complexity of implementing (near-)optimal solutions as the number of devices and/or antennas increases. Domain-aware machine learning algorithms have recently emerged as promising solution approaches
[9, 18, 3, 12, 29]. The idea is to combine relevant domain information, available in the form of a structured algorithmic solution for a specific problem, with neural architectures that learn directly from data [17, 25]. These data-driven hybrid connectionist models leverage the universal approximation property of neural networks to learn a suitable functional transformation within the framework of a classical method and thereby achieve better convergence. Such techniques have been successfully applied in designing model-aware deep receivers that carry out symbol detection and recovery in single-input single-output (SISO) as well as MIMO systems [20, 23, 6, 19]. Another crucial problem investigated within this framework is power allocation in wireless networks through interference management [26, 13, 5, 24].

Optimal power allocation in wireless networks, especially under fast-changing channel conditions, is a challenging task that requires solving a non-convex constrained sum-rate optimization problem. The classical solution approach is to reformulate it as a weighted minimum mean square error (WMMSE) optimization problem [22]. The WMMSE objective is tri-convex, i.e., convex in each of its three variables when the others are fixed, and therefore more tractable than the original problem. However, its performance is limited by cumbersome iterative steps that tend to be computationally expensive (quadratic in the number of users) and time consuming. This is especially detrimental for fast-changing channels. UWMMSE-SISO [3] addressed this drawback for a single-antenna ad-hoc interference network by truncating the WMMSE iterations and embedding a graph neural network (GNN) based learnable component within the structure to compensate for the truncation. However, this method in its original form cannot handle MIMO systems. On the other hand, WMMSE-PGD [18] and IAIDNN [9] solve the same problem using distinct unfolding schemes that seek to reduce computational complexity by replacing expensive matrix operations with learnable components, for MISO and MIMO systems respectively. Nevertheless, neither WMMSE-PGD nor IAIDNN preserves the original block-coordinate-descent (BCD) update structure that UWMMSE-SISO captures. Further, the use of GNNs allows UWMMSE-SISO to leverage the underlying topology of wireless network graphs and confers permutation equivariance on the architecture, thereby enhancing generalizability. In this work, we design an unfolding scheme for WMMSE that makes it amenable to MIMO systems.
Contribution.
The contributions of this paper are twofold:
i) We propose the UWMMSE-MIMO architecture for power allocation in multi-antenna interference scenarios.
ii) We empirically evaluate the performance of the proposed method in comparison to WMMSE, demonstrate its generalization to networks of unseen sizes, and illustrate its robustness with respect to varying channel distributions.
II System Model and Problem Formulation
Consider a single-hop ad-hoc interference network with distinct transmitter-receiver pairs. Each transmitter is equipped with antennas while the number of receiver antennas is . Let denote the beamformer that transmitter uses to transmit a signal to its corresponding receiver . Assuming a linear channel model, the received signal , at is given by
(1) 
where represents the channel from transmitter to receiver , while denotes additive white Gaussian noise sampled from a standard normal distribution . Finally, the transmitted signal is estimated at using a receiver beamformer , to obtain for all .

The performance of a wireless network, such as the one described above, is measured in terms of certain utilities of the system. For instance, the capacity of the network is defined as a weighted combination of the instantaneous data rates achievable at each receiver , given by Shannon's theorem as a function of the signal-to-interference-plus-noise ratio (SINR). More precisely, if we define the channel state information (CSI) tensor such that and the transmitter beamformer tensor such that , we have that for every user its rate is given by
(2) 
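Since the symbols of the rate expression did not survive extraction, the following sketch spells out the standard computation the text describes: each user's rate is the log-determinant Shannon expression, with interference from all other transmitters treated as noise. All names (`H`, `V`, `sigma2`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def user_rates(H, V, sigma2=1.0):
    """Per-user achievable rates for a MIMO interference network.

    Assumed notation: H[i, j] is the Nr x Nt channel from transmitter j
    to receiver i, and V[i] is the Nt x d beamformer of transmitter i.
    """
    M, _, Nr, _ = H.shape
    rates = np.zeros(M)
    for i in range(M):
        # Interference-plus-noise covariance at receiver i.
        R = sigma2 * np.eye(Nr, dtype=complex)
        for j in range(M):
            if j != i:
                A = H[i, j] @ V[j]
                R += A @ A.conj().T
        S = H[i, i] @ V[i]                      # desired signal component
        # Rate: log2 det(I + R^{-1} S S^H), Shannon's formula with
        # interference treated as Gaussian noise.
        rates[i] = np.log2(np.linalg.det(
            np.eye(Nr) + np.linalg.solve(R, S @ S.conj().T)).real)
    return rates
```

The network capacity in the objective below is then a weighted sum of these per-user rates.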
Hence, our objective is to determine the transmit beamformer that maximizes the network capacity under a given power constraint
(3)  
Henceforth, for simplicity, we focus on the case where every user is given the same weight in the objective. However, the optimization problem in (3) is non-convex and has been shown to be NP-hard [16, 8]. The standard approach to solve this problem is to reformulate it as a constrained WMMSE optimization problem [22]. Specifically, introducing the tensors and we have that
(4)  
where is a weight matrix at receiver while is the mean square error between transmitted and received signals, which depends on and .
The optimization problem (4) is equivalent to (3): it can be shown [22, Thm. 3] that the variable in the global optimal solution of the former is the same as the optimal transmit beamformer in the latter. Moreover, the problem in (4) is tri-convex, i.e., the objective is convex in each of the three variables when the other two are fixed. This makes (4) amenable to a BCD solution, which we leverage in Section III for our unfolded architecture. In spite of this tri-convexity, WMMSE performance is limited by its cumbersome iterative steps, whose per-iteration complexity scales quadratically with the number of users [3]. The method can also be time consuming depending on the size and complexity of the wireless network, making it particularly ineffective under fast-changing channel conditions.
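For concreteness, here is a minimal numpy sketch of one BCD sweep of the classical WMMSE updates of [22] described above. The notation is an assumption (the paper's symbols did not survive extraction), and the power-constrained beamformer update is approximated by a simple projection rather than the exact Lagrange-multiplier search, so this is a sketch, not the reference implementation.

```python
import numpy as np

def wmmse_step(H, V, sigma2=1.0, Pmax=1.0):
    """One block-coordinate-descent sweep of WMMSE [22] (a sketch).

    Assumed notation: H[i, j] is the Nr x Nt channel from transmitter j
    to receiver i, V[i] the Nt x d transmit beamformer of transmitter i.
    """
    M, _, Nr, Nt = H.shape
    d = V.shape[2]
    U = np.zeros((M, Nr, d), dtype=complex)
    W = np.zeros((M, d, d), dtype=complex)
    # Receiver (U) and weight (W) updates in closed form.
    for i in range(M):
        R = sigma2 * np.eye(Nr, dtype=complex)
        for j in range(M):
            A = H[i, j] @ V[j]
            R += A @ A.conj().T                 # full received covariance
        U[i] = np.linalg.solve(R, H[i, i] @ V[i])      # MMSE receiver
        E = np.eye(d) - U[i].conj().T @ H[i, i] @ V[i]  # MSE matrix
        W[i] = np.linalg.inv(E)                         # MMSE weight
    # Transmit beamformer (V) update.
    Vnew = np.zeros_like(V)
    for i in range(M):
        B = np.zeros((Nt, Nt), dtype=complex)
        for j in range(M):
            C = H[j, i].conj().T @ U[j]
            B += C @ W[j] @ C.conj().T
        Vnew[i] = np.linalg.solve(B + 1e-6 * np.eye(Nt),
                                  H[i, i].conj().T @ U[i] @ W[i])
        # Approximate the power constraint by projection (in place of the
        # exact multiplier search of the original algorithm).
        n = np.linalg.norm(Vnew[i])
        if n > np.sqrt(Pmax):
            Vnew[i] *= np.sqrt(Pmax) / n
    return Vnew
```

Each sweep costs a sum over all transmitter-receiver pairs, which is the quadratic-in-users cost referred to above.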
III Unfolded WMMSE for MIMO
Algorithm unfolding [15, 17, 6] refers to constructing a learnable connectionist architecture from the update structure of an iterative algorithm. The process involves truncating the iterations of a classical algorithm to form a cascade of a limited number of layers, each of which copies the update operations of the algorithm while carrying a neural module that approximates one or more of the algorithm's parameters. In this way, an unfolded algorithm synergistically combines the domain information available in the form of a structured solution method for a given problem with neural-network-based modules that can be learned from data.
Recalling that our goal is to solve (3), we seek a parametric function where contains learnable parameters and approximates the solution to (3) for given CSI . In this work, we propose a neural network architecture for inspired by the unfolding of WMMSE into a set of hybrid layers each of which carries an embedded graph neural network (GNN) learning module to accelerate convergence. More specifically, by setting the matrices for every user , we have that for layers ,
(5)  
(6)  
(7)  
(8) 
where updates (6)-(8) are repeated for every user , in (5) is a GNN architecture with a set of trainable weights , is a transformation of the CSI to be described in (9), and in (8) is a nonlinear activation to be detailed in (10). The output of our layered architecture is then given by such that for all . A schematic view of the variable dependence of UWMMSE is given in Fig. 1.
The first notable point about the UWMMSE equations in (5)-(8) is that if we ignore (5) and instead set and for every layer , then (6)-(8) boil down to the classical BCD closed-form updates of WMMSE. However, by giving UWMMSE the additional flexibility of learning and in (5) – which are implemented as weights of an affine transform in (7) – we enable faster convergence and better performance compared with classical WMMSE, as we illustrate in Section IV.
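To make the role of the learned coefficients concrete, the illustrative snippet below applies an affine modification to the closed-form weight update, in the spirit of the scheme described above. The exact form used in the paper's layer (7) may differ, and all names are assumptions; the point is only that the classical update is recovered by a specific setting of the learned values.

```python
import numpy as np

def learned_w_update(E_inv, a, b):
    """Illustrative affine modification of the closed-form weight update:
    the classical WMMSE weight E^{-1} is rescaled by GNN outputs a and b.
    Setting a = 1, b = 0 recovers the plain WMMSE update, mirroring the
    observation in the text that ignoring the GNN yields classical BCD."""
    d = E_inv.shape[0]
    return a * E_inv + b * np.eye(d)
```

This is why the architecture can never do worse than a badly truncated WMMSE in principle: the identity setting of the learned parameters is always available.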
The choice of the specific GNN in (5) and the size of the trainable weights are arbitrary and can be made depending on the nature of the problem. Independent of the choice of architecture, we treat the channel matrix as a weighted adjacency matrix of a directed graph, which is used to aggregate information from neighboring nodes [21]. In the multi-antenna setting considered here, the channel between a transmitter and a receiver is described by a matrix whose size depends on the number of transmitter and receiver antennas. However, since the GNN requires the CSI between and to be represented by a scalar [11], we propose the use of a single-layered depthwise convolution operation with shared filter weights to reduce to an amenable structure. Essentially, we define an additional fully connected neural layer where and

(9)

forms the input to in (5). (In practice, we observed that a row-normalization operation prior to inputting it in (5) stabilized training and thereby improved the overall performance. See footnote 2 for access to implementation code.) Figure 2 illustrates the operational structure, where we first reshape the last two dimensions of into a single dimension and then apply a convolution. Indeed, this operation can be interpreted as a learnable weighted combination of the antenna coefficients of each channel matrix to generate a scalar representation of the channel. It is important to note that the combining set of weights is identical for all channel elements and learned during training. This is appropriate since all channel representations must have an identical functional mapping from their respective antenna coefficients, in a way analogous to shared convolutions over image pixels. Additionally, having a shared filter kernel allows for a reduction in the number of learnable weights.
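A minimal sketch of this reduction, under assumed names and real-valued inputs for simplicity: every channel matrix is flattened and combined with one shared weight vector, and the resulting scalar CSI matrix is row-normalized as suggested in the footnote above.

```python
import numpy as np

def reduce_csi(H, w, b=0.0):
    """Collapse each Nr x Nt channel matrix to a scalar via shared weights.

    A sketch of the transformation in (9): the last two dimensions of
    the M x M x Nr x Nt CSI tensor H are flattened and combined with one
    shared weight vector w (length Nr*Nt), so every channel entry is
    reduced by the identical learnable combination. Names are assumed.
    """
    M = H.shape[0]
    flat = H.reshape(M, M, -1)        # (M, M, Nr*Nt)
    Hbar = flat @ w + b               # shared linear map -> (M, M)
    # Row-normalize, which reportedly stabilizes training.
    norms = np.linalg.norm(Hbar, axis=1, keepdims=True)
    return Hbar / np.maximum(norms, 1e-12)
```

Because `w` is shared across all transceiver pairs, the number of parameters in this stage is independent of the number of users.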
The proposed architecture has very few trainable weights, making it easy to train and likely to generalize, as illustrated in the numerical examples (see Fig. 4). The number of weights in each of the two GNNs is , where is the number of hidden layers. Further, the linear layer has parameters, where and represent the numbers of receive and transmit antennas. The total number of trainable weights is thus , and is independent of the number of users .
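As one concrete (and assumed) instantiation of the GNN in (5), a two-layer graph convolution over the scalar channel adjacency can be sketched as follows; note that it is permutation equivariant, the property highlighted in Section I.

```python
import numpy as np

def relu(X):
    return np.maximum(X, 0.0)

def gcn(Hbar, X, W1, W2):
    """Minimal two-layer graph convolution over the wireless-network
    graph, a sketch of one possible GNN in (5). Hbar is the scalar
    M x M channel adjacency from (9) and X an M x F node-feature matrix;
    the weight shapes W1, W2 are illustrative. Aggregation uses Hbar
    directly as the (directed) graph-shift operator, as in [21]."""
    Z = relu(Hbar @ X @ W1)   # aggregate neighbors, transform, activate
    return Hbar @ Z @ W2      # second propagation layer
```

Relabeling the users permutes the rows of the output in exactly the same way, which is what makes the learned policy transferable across network sizes and orderings.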
We now explain the nonlinear function in (8). To incorporate the power constraint of the optimization problem, the proposed UWMMSE uses a saturation nonlinearity in its unfolded layers, ensuring that individual layer outputs obey the power constraint. For the multi-antenna setup considered here, this involves constraining the transmit beamformer such that all its elements satisfy . To attain this, the activation applied to an arbitrary matrix is defined as
(10) 
where denotes the Frobenius norm. A nonlinear mapping of this form was used as the projection step in the PGD-based beamforming strategy of [18].
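The saturation activation in (10) amounts to a Frobenius-norm projection onto the power ball and can be sketched as follows (argument names are assumptions):

```python
import numpy as np

def beta(V, Pmax=1.0):
    """Saturation nonlinearity of (10): scale V so that its Frobenius
    norm satisfies ||beta(V)||_F^2 <= Pmax, leaving V unchanged when it
    already obeys the power constraint. A sketch with assumed names."""
    n = np.linalg.norm(V)                        # Frobenius norm
    return V * min(1.0, np.sqrt(Pmax) / max(n, 1e-12))
```

Applying this at every unfolded layer keeps each intermediate beamformer feasible, so the final output needs no separate projection step.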
Having explained the inner workings of the proposed UWMMSE for a given set of parameters , we now shift focus to the training of the architecture. For fixed , the transmit beamforming for CSI is given by and results in a sum-rate utility of ; [cf. (3) for ]. Hence, we define the loss function

(11)

where is the channel state distribution of interest. Even if is known, minimizing with respect to is a non-convex problem. However, notice that in (5) and in (9) are differentiable with respect to and , respectively. Thus, given samples of drawn from , we seek to minimize (11) through stochastic gradient descent. In this sense, UWMMSE is an unsupervised method since it requires access to samples of the channel state tensors but does not require access to the optimal beamformers (labels) associated with those channels. Further notice that a single pair of GNNs is shared by all unfolding layers, i.e., and do not depend on the layer index . Such a scheme ensures that and are trained using gradient feedback that accumulates across layers and is a function of the overall optimization process. This arrangement creates an implicit memory through which the sequential updates of the WMMSE parameters are captured in the GNN. Moreover, a formulation of this form decouples the training and inference architectures, thereby allowing the flexibility to add or remove unfolded layers at deployment. Another immediate advantage is a reduction in the number of trainable weights with respect to the layer-dependent alternative, making the training process less computationally expensive and less time consuming.

Remark (Application to SISO systems).
Although designed for the general MIMO setting, the algorithm proposed here can be seamlessly used for power allocation in SISO wireless networks. In such a setting, there is no need for the neural network in (9) since the CSI is already in matrix form, thus reducing the trainable parameters to only . Similarly, the intermediate variables reduce to scalars, but the update equations in (6)-(8) remain valid (and computationally less demanding). In this sense, for the SISO case, our UWMMSE unfolding closely resembles the scheme presented in [3]. However, in [3] each layer of the model has its own independent set of GNNs, resulting in complexity that grows with the number of layers. More importantly, a model of this form requires the number of layers to be fixed at both training and inference, making it inflexible at deployment. In this respect, our more general UWMMSE framework for MIMO still presents advantages with respect to existing works even when restricted to the SISO setting.
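As a sketch of the training objective in (11), under the same assumed notation as in earlier sketches, the loss is simply the negative sum rate averaged over sampled channel realizations; no beamformer labels appear anywhere, which is what makes the training unsupervised.

```python
import numpy as np

def sum_rate(H, V, sigma2=1.0):
    """Sum rate for one channel realization (assumed notation: H[i, j]
    is the channel from transmitter j to receiver i, V[i] a beamformer)."""
    M, _, Nr, _ = H.shape
    total = 0.0
    for i in range(M):
        R = sigma2 * np.eye(Nr, dtype=complex)
        for j in range(M):
            if j != i:
                A = H[i, j] @ V[j]
                R += A @ A.conj().T
        S = H[i, i] @ V[i]
        total += np.log2(np.linalg.det(
            np.eye(Nr) + np.linalg.solve(R, S @ S.conj().T)).real)
    return total

def unsupervised_loss(H_batch, V_batch):
    """Monte-Carlo estimate of the loss in (11): the negative expected
    sum rate over sampled channels. Only channel samples are needed,
    never the optimal beamformers (labels)."""
    return -np.mean([sum_rate(H, V) for H, V in zip(H_batch, V_batch)])
```

In practice this scalar would be backpropagated through the unfolded layers to update the shared GNN weights via stochastic gradient descent.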
IV Numerical Experiments
In this section, we illustrate the performance of the proposed model in allocating power in multi-antenna ad-hoc wireless networks operating under various fading conditions and topologies. A detailed description of the datasets is provided in Section IV-A. In Section IV-B, we compare our model's performance with WMMSE and its truncated version in terms of achieved sum rate and allocation time. Further, in Section IV-C we investigate the robustness of our proposed model to variations in network size and channel conditions. Our proposed architecture is composed of unfolded WMMSE layers with two-layered GCNs in each unfolded layer modeling the function in (5). The hidden layer dimension of all GCNs is set to . A maximum of training iterations are performed with early stopping. The batch size is fixed at and the learning rate is set to . We run test iterations with the same batch size. All computations were performed on an Intel Core i7-9850H CPU with 16GB RAM and an Nvidia Quadro T2000 GPU.
IV-A Datasets
We use randomly generated channel realizations to evaluate the model performance.
Similar to [27, 26, 14], we choose the following random channel models:
Rayleigh: For each channel matrix corresponding to transceiver pair , we generate Rayleigh channel coefficients independently for all antenna pairs, with real and imaginary components sampled from a standard normal distribution.

Rician: For each channel matrix corresponding to transceiver pair , we generate Rician channel coefficients with a K-factor of dB [28] independently for all antenna pairs, with real and imaginary components sampled from a normal distribution.

Geometric: A geometric channel model has a composite structure with path-loss and Rayleigh-fading components. To that end, we construct a random geometric graph in two dimensions having transceiver pairs. All transmitters and receivers are dropped uniformly at random at locations and . The path loss between transmitter and receiver is then computed as a function of their physical distance . Incorporating Rayleigh fading, the elements of the channel matrix are given by
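Under stated assumptions (the K-factor and path-loss exponent values did not survive extraction, so they appear as parameters here, and all function names are illustrative), the three channel models can be generated along these lines:

```python
import numpy as np

def rayleigh(M, Nr, Nt, rng):
    """Rayleigh: i.i.d. unit-variance complex Gaussian coefficients
    for every antenna pair of every transceiver pair."""
    return (rng.normal(size=(M, M, Nr, Nt)) +
            1j * rng.normal(size=(M, M, Nr, Nt))) / np.sqrt(2)

def rician(M, Nr, Nt, rng, K_dB=0.0):
    """Rician: a deterministic line-of-sight part mixed with a Rayleigh
    part via the K-factor (the paper's dB value is a parameter here)."""
    K = 10 ** (K_dB / 10)
    los = np.ones((M, M, Nr, Nt), dtype=complex)
    return (np.sqrt(K / (K + 1)) * los +
            np.sqrt(1 / (K + 1)) * rayleigh(M, Nr, Nt, rng))

def geometric(M, Nr, Nt, rng, alpha=2.2):
    """Geometric: distance-based path loss (exponent alpha is an assumed
    value) over uniformly dropped transceivers, times Rayleigh fading."""
    tx = rng.uniform(size=(M, 2))
    rx = rng.uniform(size=(M, 2))
    dist = np.linalg.norm(rx[:, None, :] - tx[None, :, :], axis=-1)
    dist = np.maximum(dist, 1e-3)            # avoid singular path loss
    pl = dist ** (-alpha / 2)                # amplitude-domain path loss
    return pl[:, :, None, None] * rayleigh(M, Nr, Nt, rng)
```

Only the Geometric model carries an underlying spatial graph, which is what the robustness experiments below exploit.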
IV-B Performance comparison
We compare the performance attained by our method with that of established baselines in the low-noise regime [ in (II)], where the achievable capacity depends strongly on the interference between users, making the power allocation process more challenging [3]. We choose the following baselines for comparison:

WMMSE [22] forms the baseline for our experiments. We set a maximum of iterations per sample.

Truncated WMMSE (Tr-WMMSE) provides a performance lower bound for UWMMSE. We fix its number of iterations to match the number of UWMMSE unfolded layers.
The comparisons are shown in Figure 3. Since the channel realizations are sampled randomly, we observe significant variation in the utility values for individual samples under optimal power allocation.
The comparison shows that the performance of our method exceeds that of WMMSE, on average, for all three channel models under various antenna configurations. Moreover, there is a significant gap in achieved mean sum rate between UWMMSE and Tr-WMMSE with the same number of iterations as unfolded layers in all cases, demonstrating the effectiveness of the learning-based module in a hybrid method of this form. The superiority of our method is further illustrated by the inference-time comparisons. While WMMSE requires on average per sample to converge, our method has an average inference time of , effectively providing a speedup. The time taken by Tr-WMMSE is ; however, the achieved performance is poor. Our method therefore exceeds the performance of WMMSE with a full set of iterations at a time complexity comparable to that of its truncated version.
IV-C Robustness
In this section, we investigate the robustness of our proposed method to variations in channel distributions and wireless network size. These are realistic scenarios since the channel distribution depends largely on geographical and climatic conditions as well as on the community structure (rural, urban, suburban) of a given area. Further, an ad-hoc wireless network may change size as new users join and old users leave. It is imperative that the power allocation method maintain a steady performance in spite of these variations. To this end, we compare the performance of two versions of our model, one trained on Rayleigh and the other on Rician channel realizations. These comparisons are performed independently on a Rayleigh and a Rician test set. Figure 4 shows that both models achieve comparable performance on out-of-distribution data points. This observation indicates that our model can endure changes in channel conditions without significant performance degradation. In Figure 4, we also consider a slightly more challenging scenario wherein, under a Geometric channel setting, the network size varies within a fixed range. To evaluate the interpolation behaviour of our model, we sample only odd-numbered network sizes for training while the test set comprises all integer values in the specified range. In this case, the Rayleigh and Rician versions of UWMMSE fail to maintain steady performance across the range of sizes, doing slightly better for a few smaller sizes but degrading fast as the network grows. However, UWMMSE trained on Geometric channels maintains a constant gain over WMMSE at all sizes. This superiority can be attributed to the GNN learning module, which is regularized by the inherent graph structure of Geometric channels. Since real-world channel models are more aligned with the Geometric setup, composed of path loss and fading, our model has a natural advantage in dealing with the more challenging scenarios that will be considered in future versions of this work.
V Conclusion
We proposed a MIMO version of UWMMSE, a hybrid learnable power allocation algorithm for wireless networks. The main contribution of this method is the use of GNN-based learning modules that leverage the connectivity structure of the transceivers to learn faster parameter updates for an unfolded iterative algorithm tuned to approximate a global solution of the NP-hard power allocation problem. This hybrid architecture captures, on the one hand, the dynamic structure of the iterative solution while, on the other hand, approximating the cumbersome computational steps with learnable weights, driving the model towards faster convergence. In the future, we will go beyond synthetic data and empirically validate the effectiveness of our proposed method on real-world datasets.
References
[1] (2018) The US Army in multi-domain operations 2028. TRADOC Pamphlet 525-3-1.
[2] (2019) Massive MIMO is a reality—what is next?: Five promising research directions for antenna arrays. Digital Signal Processing 94, pp. 3–20.
[3] (2021) Unfolding WMMSE using graph neural networks for efficient power allocation. IEEE Transactions on Wireless Communications.
[4] (8 Oct 2020) DoD announces $600 million for 5G experimentation and testing at five installations. Online.
[5] (2019) Learning optimal resource allocations in wireless systems. IEEE Transactions on Signal Processing 67 (10), pp. 2775–2790.
[6] (2020) Data-driven symbol detection via model-based machine learning. arXiv preprint arXiv:2002.07806.
[7] (2019) Exploiting high millimeter wave bands for military communications, applications and design. IEEE Access.
[8] (2014) Chapter 8 - Signal processing and optimal resource allocation for the interference channel. In Academic Press Library in Signal Processing: Volume 2, N. D. Sidiropoulos, F. Gini, R. Chellappa, and S. Theodoridis (Eds.), Vol. 2, pp. 409–469.
[9] (2020) Iterative algorithm induced deep-unfolding neural networks: precoding design for multiuser MIMO systems. IEEE Transactions on Wireless Communications 20 (2), pp. 1394–1410.
[10] (2019) Massive MIMO for tactical ad-hoc networks in RF contested environments. In IEEE Military Communications Conference (MILCOM), pp. 658–663.
[11] (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
[12] (2021) Adaptive contention window design using deep Q-learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4950–4954.
[13] (2018) Deep power control: transmit power control scheme based on convolutional neural network. IEEE Communications Letters 22 (6), pp. 1276–1279.
[14] (2019) Towards optimal power control via ensembling deep neural networks. IEEE Transactions on Communications 68 (3), pp. 1760–1776.
[15] (2019) Deep proximal unrolling: algorithmic framework, convergence analysis and applications. IEEE Transactions on Image Processing 28 (10), pp. 5013–5026.
[16] (2008) Dynamic spectrum management: complexity and duality. IEEE Journal of Selected Topics in Signal Processing 2 (1), pp. 57–73.
[17] (2019) Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. arXiv preprint arXiv:1912.10557.
[18] (2020) Deep unfolding of the weighted MMSE beamforming algorithm. arXiv preprint arXiv:2006.08448.
[19] (2020) RE-MIMO: recurrent and permutation equivariant neural MIMO detection. IEEE Transactions on Signal Processing 69, pp. 459–473.
[20] (2019) Learning to detect. IEEE Transactions on Signal Processing 67 (10), pp. 2554–2564.
[21] (2017) Optimal graph-filter design and applications to distributed linear network operators. IEEE Transactions on Signal Processing 65 (15), pp. 4117–4131.
[22] (2011) An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel. IEEE Transactions on Signal Processing 59 (9), pp. 4331–4340.
[23] (2020) ViterbiNet: a deep learning based Viterbi algorithm for symbol detection. IEEE Transactions on Wireless Communications 19 (5), pp. 3319–3331.
[24] (2020) DeepSIC: deep soft interference cancellation for multiuser MIMO detection. IEEE Transactions on Wireless Communications 20 (2), pp. 1349–1362.
[25] (2020) Model-based deep learning. arXiv preprint arXiv:2012.08405.
[26] (2018) Learning to optimize: training deep neural networks for interference management. IEEE Transactions on Signal Processing 66 (20), pp. 5438–5453.
[27] (2021) Learning to continuously optimize wireless resource in episodically dynamic environment. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4945–4949.
[28] (2005) Fundamentals of Wireless Communication. Cambridge University Press.
[29] (2021) Distributed scheduling using graph neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4720–4724.