Connection Throughput Maximization for Grant-Based NOMA Massive IoT with Graph Matching

We propose a framework for maximizing the number of machine-type devices connected in the uplink of a Narrow-band Internet of Things (NB-IoT) network using non-orthogonal multiple access (NOMA). The system is based on the fast-uplink grant (FUG), where the base station (BS) schedules access for active devices requesting a connection. The underlying problem is mixed-integer and non-convex, and solving it in real time with general-purpose solvers is computationally prohibitive. The proposed scheduling solution comprises efficient device clustering and optimum power allocation using a bipartite graph matching approach, termed connection throughput maximizing full matching with pruning (CTMBM). Unlike existing state-of-the-art solutions, our scheme schedules devices over multiple transmission time intervals while accounting for their transmission deadlines and quality of service (QoS). Additionally, we provide a method for priority scheduling of a subset of devices. We compare our solution with state-of-the-art schemes and analyze the achieved gains through Monte Carlo computer simulations.


I. INTRODUCTION
Recent trends in network growth suggest a paradigm shift in traffic demand towards machine-to-machine (M2M) communication supported over legacy LTE technology [1]. The distinguishing features of M2M devices are minimal human interaction and sporadic short-packet transmission. Massive machine-type communication (mMTC), a term coined by 3GPP (3rd Generation Partnership Project), is expected to be adopted for industrial IoT (IIoT) use cases and beyond-5G (B5G) systems [2], driven by the proliferation of M2M deployments. Consequently, there is a compelling need to develop systems that can support a massive number of connected devices using limited radio spectrum and computational capabilities. A common feature of these deployments is predominantly uplink traffic, for which low-power transmission is desirable. As the competition for radio resources between coexisting technologies increases, efficient power control and device scheduling become increasingly critical.
Non-orthogonal multiple access (NOMA) is a promising solution to alleviate the need for additional spectrum for supporting massive access [3]. NOMA superposes the signals of multiple transmitters on a given time-frequency resource and separates them using successive interference cancellation at the base station (BS). Several studies propose grant-free solutions for NOMA-based mMTC systems, where detection at the BS relies on compressive sensing-based techniques founded on the sparse-activity assumption. Typical approaches include approximate message passing-based user-activity detection and channel estimation [4]. However, these solutions are usually computationally complex, sometimes requiring customized machine-learning models for implementation. Additionally, as the density of devices increases, the sparsity assumption, which is crucial for compressive sensing-based approaches, may no longer hold. Moreover, uncoordinated random access leads to excessive collisions among device packets and consequently increases latency, which makes it difficult to guarantee the quality of service (QoS) of the devices. On the other hand, fully coordinated schemes incur intensive signaling overhead and may not be suitable for short packets. Therefore, some solutions such as [5] become less effective due to excessive scheduling overhead.
To support an expanding category of low-mobility devices, such as those used in smart grids, smart homes, and environment monitoring, 3GPP has proposed the fast-uplink grant [6] as a practical compromise between signaling overhead and access latency. In a fast-uplink grant scheme, devices do not send random access (RA) scheduling requests. Instead, the BS actively allocates uplink resources to them. Moreover, in contrast to uncoordinated transmission, devices are scheduled by the BS, and hence collisions can be avoided. Recently, there has been growing interest in leveraging fast-uplink grants to support mMTC. The authors in [7] propose a fast-uplink grant method that additionally allows for NOMA-based user pairing, outperforming the scheme in [8]. However, this scheme relies on a source traffic predictor along with a probabilistic sleeping multi-armed bandit, which is computationally prohibitive in high device density scenarios.
Motivated by the current state-of-the-art solutions, we propose a bipartite graph-matching framework that performs joint user pairing and power allocation, utilizing NOMA to enhance the uplink connection throughput. We define the connection throughput as the total number of devices in the system that achieve their target data rate under the power budget, transmission deadline, and successive interference cancellation (SIC) constraints. In our previous work [9], we addressed connectivity maximization without deadlines in the downlink of NB-IoT networks using a similar graph-matching approach; the key distinction is that the downlink has a per-PRB power budget, whereas the uplink has a per-device power budget. We believe that user connection throughput is a more relevant metric for the presented system model than other metrics studied in the literature, such as sum data rate and user access delay. Note that our model already takes into account user deadlines and scheduling priority. The proposed framework is executed at the BS, which schedules the devices and allocates their transmit power using the fast-uplink grant mechanism. The key contributions of this work are as follows:
• We propose a graph matching-based technique, called CTMBM, for user association and power allocation. Unlike the work in [5], our framework is not limited to 2 devices per time-frequency resource and allows devices to be scheduled under deadline constraints.
• We compare CTMBM with the single-tone sub-carrier and power assignment Algorithm 1 in [5]. Since the latter algorithm is originally limited to a single time interval, we propose a greedy strategy that accommodates multiple time intervals and device deadlines. We analyze the complexity of this heuristic against that of the CTMBM algorithm.
• We evaluate the performance of CTMBM with different service classes for devices, each with a different transmission priority, and analyze the trade-off between priority scheduling and connection throughput.

II. SYSTEM DESCRIPTION
Consider a set of single-antenna machine-type devices $\mathcal{D} = \{1, \dots, D\}$, each with the same target data rate of R kbps, sending data using the fast-uplink grant-based access protocol [6]. We consider a single-cell system with a single-antenna BS located at the center. We assume that the channel gain of device d, consisting of the path loss and the Rayleigh fast fading, is available at the BS without error for the scheduling operation. This gain combines the Rayleigh fast fading, represented by $h_d \sim \mathcal{CN}(0, 1)$, and the distance-dependent path loss $\mathrm{PL}_d$ specified in the cellular IoT specification [10]. We chose this model to cover a broad range of deployment scenarios. We assume that the devices have a quasi-static channel gain for the duration of one allocation round, i.e., from the initiation of the access request to the transmission of the packet, which is a realistic assumption for narrow-band channels under low-mobility conditions [11].
A sub-carrier is denoted by f and has a bandwidth of 3.75 kHz for single-tone uplink operation. This gives a set $\mathcal{F}$ of 48 sub-carriers in a system with 180 kHz bandwidth. The resource allocation grid in the present work consists of these 48 sub-carriers and extends across 10 frames, the set of which is denoted by $\mathcal{T}$. Each frame t has a duration of 10 ms in the time domain, following the LTE frame definition [12]. We consider single-tone allocation, meaning that each user equipment (UE) is assigned only one sub-carrier f from the set $\mathcal{F}$. Each user is assigned a resource grant (RG), characterized by the tuple (t, f), which consists of one sub-carrier and extends over one frame.
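The dimensions of the resource grid follow from simple arithmetic. The minimal sketch below enumerates the RGs as (t, f) tuples; the variable names and the zero-based indexing are illustrative choices and are not taken from the paper.

```python
# Illustrative sketch of the resource grid described above.
SYSTEM_BANDWIDTH_KHZ = 180.0
SUBCARRIER_BANDWIDTH_KHZ = 3.75
NUM_FRAMES = 10  # |T|, each frame lasting 10 ms

# Number of single-tone sub-carriers, |F|.
num_subcarriers = int(SYSTEM_BANDWIDTH_KHZ / SUBCARRIER_BANDWIDTH_KHZ)

# One resource grant (RG) per (frame, sub-carrier) pair.
resource_grants = [(t, f) for t in range(NUM_FRAMES) for f in range(num_subcarriers)]

print(num_subcarriers)       # 48
print(len(resource_grants))  # 480
```

With up to M devices superimposed per RG, such a grid offers 480 × M transmission slots.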

A. Signaling Characterization for NOMA
We employ power-domain NOMA for user multiplexing in the uplink. The index of the device whose packet is decoded m-th on sub-carrier f at time t is denoted by $x_{t,f}(m)$. In other words, the first decoded packet is that of $x_{t,f}(1)$, followed by that of $x_{t,f}(2)$, etc. We define $\mathcal{X}_{t,f} \triangleq \{x_{t,f}(1), \dots, x_{t,f}(|\mathcal{X}_{t,f}|)\}$ as the set of devices allocated to the RG corresponding to sub-carrier f at time t. The cardinality of this set, $|\mathcal{X}_{t,f}|$, must satisfy $|\mathcal{X}_{t,f}| \le M$ due to the system SIC constraint, i.e., one RG cannot support more than M superposed devices. In a practical implementation, $\mathcal{X}_{t,f}$ is represented by a list sorted in SIC decoding order, so that any element $x_{t,f}(m)$ can be accessed from its decoding order m in constant time.
We choose the SIC decoding order on RG (t, f) in decreasing order of the users' received power [13], i.e., the strongest received signal is decoded first, as this guarantees minimum power consumption for the users. The users superimposed on RG (t, f) are therefore indexed in order of non-increasing received power. The signal-to-interference-plus-noise ratio (SINR) of the m-th decoded device on RG (t, f) depends on $p_{x_{t,f}(m)}$, the transmit power of device $x_{t,f}(m)$, and on $I_{t,f,m}$, the interference caused by the other devices on the same sub-carrier. The term $N = N_0 B F$ is the additive white Gaussian noise power, where $N_0$ is the noise spectral density, B is the sub-carrier bandwidth, and F is the noise figure.
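The decoding order and the resulting SINR can be illustrated numerically. The sketch below assumes perfect SIC, so that a device is interfered only by the devices decoded after it; the function name and the linear-scale inputs are hypothetical and serve only as an illustration.

```python
def sinr_per_device(rx_powers, noise_power):
    """SINR of each device on one RG, listed in SIC decoding order.

    rx_powers[m] is the received power (transmit power times channel gain)
    of the (m+1)-th decoded device, in linear units. Assumes perfect SIC:
    a device only sees interference from devices decoded after it.
    """
    if any(a < b for a, b in zip(rx_powers, rx_powers[1:])):
        raise ValueError("received powers must be non-increasing (SIC order)")
    sinrs = []
    for m, power in enumerate(rx_powers):
        interference = sum(rx_powers[m + 1:])  # not-yet-decoded devices
        sinrs.append(power / (interference + noise_power))
    return sinrs


example = sinr_per_device([6.0, 2.0, 1.0], noise_power=1.0)
print(example)  # [1.5, 1.0, 1.0]
```

The last decoded device sees no interference, so its SINR reduces to its received power divided by the noise power.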

B. System Constraints
Recall the system SIC constraint specified in the description of the NOMA signaling in Section II-A: each RG can support a superposition of at most M devices. Additionally, each device in the system must obey the following constraints. Firstly, the uplink transmit power of each scheduled device must not exceed the device's power budget P, as specified by the 3GPP cellular IoT standard [10] and elaborated in Section V. Secondly, each device must achieve a minimum data rate of R kbps; the device $x_{t,f}(m)$ is considered connected if its data rate on the assigned RG is at least R. Note that R is the instantaneous rate achieved by a device when transmitting its packet on the assigned resource grant. We maintain this definition in the present work to allow a consistent comparison with the state-of-the-art technique [5]. In practice, however, we are concerned with the total number of packets transmitted by the device over the scheduled resource grant rather than with the instantaneous rate. Therefore, we assume that each device has a fixed number of bits to transmit in our system simulations.
The power required to achieve the rate R can be obtained from (4) and (2) in terms of $\Gamma = 2^{R/B} - 1$, the target SINR needed to achieve the target rate of R kbps. Additionally, the grant-based devices must respect the delay requirements for data transmission: the packet of device d must be received before its deadline $t_d$. Deadlines in B5G networks are expected to be as short as a few tens of milliseconds, even though the devices have limited mobility and relatively static fading. Therefore, even within the current resource allocation grid, not all RGs may be usable by all devices.
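The required-power relation can be evaluated by a backward recursion over the decoding order. The sketch below is an illustration under stated assumptions: perfect SIC, every device meeting the target SINR with equality, and all quantities in linear units. The function name and the budget value are hypothetical and not part of the paper.

```python
def required_powers(gains, gamma, noise_power):
    """Minimum transmit power per device on one RG.

    gains lists the channel gains in SIC decoding order. The recursion runs
    from the last decoded device backwards, because each device is assumed
    to see interference only from devices decoded after it (perfect SIC).
    """
    powers = [0.0] * len(gains)
    interference = 0.0
    for m in reversed(range(len(gains))):
        rx_power = gamma * (interference + noise_power)  # meets SINR = gamma
        powers[m] = rx_power / gains[m]
        interference += rx_power
    return powers


powers = required_powers(gains=[1.0, 1.0, 1.0], gamma=1.0, noise_power=1.0)
print(powers)  # [4.0, 2.0, 1.0]

POWER_BUDGET = 3.0  # hypothetical budget P, same linear units as the powers
print([p <= POWER_BUDGET for p in powers])  # [False, True, True]
```

In this toy case the first decoded device would violate the power budget, so it could not be scheduled at that SIC level.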

III. PROBLEM FORMULATION
We define the function $Z(\cdot, \cdot)$ that represents the system's connection throughput, i.e., the number of devices successfully connected under their QoS and delay requirements. It is built on the indicator function $\mathbb{1}$, which takes the value 1 if the rate of device $x_{t,f}(m)$ is greater than or equal to the target service rate R and the device meets its delay constraint. Thus, $Z(p_{x_{t,f}}, \mathcal{X}_{t,f})$ embeds both the QoS constraint (5) and the delay constraint. The uplink connection throughput maximization problem maximizes this objective, i.e., the total number of devices in the system that satisfy their QoS and deadline requirements, subject to three constraints. Constraint C1 limits the transmit power of each device as defined in (3). Constraint C2 is the system limit of at most M devices superimposed per RG. Constraint C3 enforces that each device is allocated at most one sub-carrier, in line with the single-tone uplink operation [10] for supporting massive connectivity. We can readily verify that this is a mixed-integer problem due to the binary nature of $Z(\cdot, \cdot)$, and that it is non-convex due to the rate constraint (5). The problem is known to be NP-hard [14] in the general case but can be solved under practical system considerations.
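For reference, one way to write the counting function and the optimization problem consistently with the description of C1 to C3 is sketched below. This is an assumed reconstruction, not the authors' exact display: the symbol $R_{x_{t,f}(m)}$ for the achieved rate, the stacked arguments $\mathbf{p}$ and $\mathcal{X}$, and the disjointness form of C3 are illustrative choices.

```latex
\begin{align*}
Z(\mathbf{p}, \mathcal{X}) &=
  \sum_{t \in \mathcal{T}} \sum_{f \in \mathcal{F}} \sum_{m=1}^{|\mathcal{X}_{t,f}|}
  \mathbb{1}\!\left[ R_{x_{t,f}(m)} \ge R \ \text{and}\ t \le t_{x_{t,f}(m)} \right], \\[4pt]
\max_{\mathbf{p},\, \mathcal{X}} \quad & Z(\mathbf{p}, \mathcal{X}) \\
\text{s.t.} \quad
& \text{C1: } 0 \le p_{x_{t,f}(m)} \le P,
  && \forall t \in \mathcal{T},\ f \in \mathcal{F},\ m \le |\mathcal{X}_{t,f}|, \\
& \text{C2: } |\mathcal{X}_{t,f}| \le M,
  && \forall t \in \mathcal{T},\ f \in \mathcal{F}, \\
& \text{C3: } \mathcal{X}_{t,f} \cap \mathcal{X}_{t',f'} = \emptyset,
  && \forall (t,f) \ne (t',f').
\end{align*}
```

The indicator makes the objective integer-valued, which is the source of the mixed-integer structure noted in the text.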

IV. PROPOSED FRAMEWORK
The following construction is based on the insight that the uplink transmit power of a device essentially depends on its channel gain and on the SIC level m at which it transmits. The transmit power that device $x_{t,f}(m)$ needs in order to achieve the target data rate R is given by (7), which is obtained by iteratively evaluating (6) using (2) [15].
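The iterative evaluation mentioned above can be made explicit with a short induction. The sketch below assumes perfect SIC, that every device meets the target SINR $\Gamma$ with equality, and that the device decoded at level m is interfered only by the devices decoded after it. The symbols $g_d$ (channel gain of device d), $q_m$ (received power at level m), and $K = |\mathcal{X}_{t,f}|$ are introduced here for illustration and are not taken from the original notation.

```latex
\begin{align*}
q_m &\triangleq p_{x_{t,f}(m)}\, g_{x_{t,f}(m)}, \qquad
I_{t,f,m} = \sum_{j=m+1}^{K} q_j, \\
\frac{q_m}{I_{t,f,m} + N} = \Gamma
  \;&\Longrightarrow\; q_m = \Gamma \left( I_{t,f,m} + N \right),
  \qquad q_K = \Gamma N, \\
\text{if } q_j = \Gamma (1+\Gamma)^{K-j} N \ \ \forall j > m:\quad
I_{t,f,m} &= \Gamma N \sum_{i=0}^{K-m-1} (1+\Gamma)^{i}
           = N \left[ (1+\Gamma)^{K-m} - 1 \right], \\
q_m = \Gamma (1+\Gamma)^{K-m} N
  \;&\Longrightarrow\;
  p_{x_{t,f}(m)} = \frac{\Gamma (1+\Gamma)^{K-m} N}{g_{x_{t,f}(m)}}.
\end{align*}
```

Under these assumptions, each step toward the first decoded level multiplies the required received power by $(1+\Gamma)$, which is why the transmit power depends only on the SIC level and the device's own channel gain.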
We now elaborate on the construction of the bipartite graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$ shown in Figure 1, where $\mathcal{V}$ is the set of vertices, divided into two parts. The first part, on the left side of Figure 1, contains the set of devices $\mathcal{D}$ requesting a connection. The second part, on the right side of Figure 1, contains the resource vertices built from the set of RGs (t, f) for all $t \in \mathcal{T}$, $f \in \mathcal{F}$, where each RG has up to M devices superimposed on it, one on each of the SIC levels $m \in \{1, \dots, M\}$. A slot is uniquely identified by the triplet (t, f, m). The set of edges is denoted by $\mathcal{E}$. We put an edge between device d and slot (t, f, m) if and only if $t \le t_d$ and the power calculated using (7) is less than or equal to P. Due to the deadline requirement of the devices, there does not necessarily exist an edge between every device and every resource vertex, making the bipartite graph G incomplete. The set of weights is denoted by $\mathcal{W}$. The weight of an edge, $w_{d,t,f,m}$, is the transmit power of device d when connected to slot (t, f, m), obtained using (7).
We now elaborate on Algorithm 1 for maximizing the user connection throughput in a frame. On line 1, we construct the bipartite graph G at the BS after obtaining the CSI from the contending devices, as described in the previous paragraph. Then we obtain the minimum weight full matching $\mathcal{E}^*$ of G using an optimal linear sum assignment algorithm according to [16], as shown in Algorithm 1. More precisely, $\mathcal{E}^*$ is a subset of $\mathcal{E}$ such that the sum of all edge weights is the least among all full matchings, where each device gets connected to one RG.
Algorithm 1: CTMBM
1: Construct the bipartite graph G
2: Obtain the minimum weight full matching $\mathcal{E}^*$ of G
3: for each $(d, t, f, m) \in \mathcal{E}^*$ do
4:   $x_{t,f}(m) \leftarrow d$
5: end for
6: Derive the power vector p using Eqs. (2) and (6)
Output: p and $\mathcal{X}_{t,f}$ for all $t \in \mathcal{T}$, $f \in \mathcal{F}$
The proposed formulation effectively considers one transmit packet per device per RB; however, it would be possible to offer differentiated services. For priority scheduling, the edge weight of a device d belonging to class $C_i$ is defined in (9) using a class-dependent coefficient $\alpha_i$. Effectively, this formulation decreases the edge weight of high-priority devices, thus incentivizing their inclusion in the full matching despite a lower channel gain. It is evident from (9) that this formulation does not affect the weights of other devices on the same RG. However, the actual transmit power assigned to the device must still follow (7) and is capped at the power budget P due to (3). We set $\alpha_n = 1$. For all $i \in \{1, \dots, n-1\}$, the value of $\alpha_i$ is chosen according to (10), which guarantees that $w_{d,t,f,m} < w_{d',t',f',m'}$ for any edges $(d, t, f, m), (d', t', f', m') \in \mathcal{E}$, where d belongs to class $C_i$ and $d'$ belongs to a lower-priority class $C_j$, $j > i$. In other words, device scheduling priority is guaranteed.
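The graph construction and matching step can be illustrated on a toy instance. The sketch below is not the authors' implementation: it replaces the linear sum assignment solver of [16] with an exhaustive search that is only practical for a handful of devices, and it assumes that the weight of slot (t, f, m) is the power needed at SIC level m when all M levels of the RG are occupied. All function names and numerical values are hypothetical.

```python
from itertools import permutations, product


def build_graph(gains, deadlines, frames, subcarriers, M, gamma, noise, budget):
    """Return the slot list and the edge weights {(d, (t, f, m)): power}."""
    slots = list(product(frames, subcarriers, range(1, M + 1)))
    edges = {}
    for d, (gain, deadline) in enumerate(zip(gains, deadlines)):
        for t, f, m in slots:
            # Assumed weight: power at level m with all M levels in use.
            weight = gamma * (1.0 + gamma) ** (M - m) * noise / gain
            # Keep the edge only if the deadline and power budget are met.
            if t <= deadline and weight <= budget:
                edges[(d, (t, f, m))] = weight
    return slots, edges


def min_weight_full_matching(num_devices, slots, edges):
    """Exhaustive search over assignments; suitable for toy instances only."""
    best_assignment, best_cost = None, float("inf")
    for chosen in permutations(slots, num_devices):
        pairs = list(enumerate(chosen))  # (device, slot) pairs
        if all(pair in edges for pair in pairs):
            cost = sum(edges[pair] for pair in pairs)
            if cost < best_cost:
                best_assignment, best_cost = dict(pairs), cost
    return best_assignment, best_cost


slots, edges = build_graph(
    gains=[1.0, 0.5, 0.25], deadlines=[1, 2, 2],
    frames=[1, 2], subcarriers=[1], M=2,
    gamma=1.0, noise=1.0, budget=8.0,
)
assignment, cost = min_weight_full_matching(3, slots, edges)
print(assignment[0], cost)  # (1, 1, 1) 8.0
```

Device 0 has a deadline of one frame and therefore has no edges to the slots of frame 2, which is what makes the graph incomplete. If no full matching exists, the search returns `None` with an infinite cost. For realistic problem sizes, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` or `scipy.sparse.csgraph.min_weight_full_bipartite_matching` would take the place of the exhaustive search.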

V. SIMULATION RESULTS
We now present the performance of the proposed algorithms, evaluated through system-level simulations. The key simulation parameters are given in Table I and are taken from experimental studies and standards [10], [17], [18]. These parameters are suitable for the industrial wireless IoT use case. The fast fading in the system is frequency-flat Rayleigh fading. We assume that the receiver performs perfect SIC for NOMA unless otherwise stated. All UEs have a single transmit antenna, and the BS uses one receive antenna. Devices are deployed randomly following a uniform distribution in a square cell of side 1000 m unless specified otherwise. All devices have the same target data rate of R kbps. The presented simulation results are averaged over 1000 independent trials. Each RB of 180 kHz bandwidth has 48 sub-carriers and spans 10 frames, each of duration 10 ms. We analyze the performance in terms of the system connection throughput in a critically loaded system with 480 RGs and {480, 960, 1440} devices requesting a connection, with M ∈ {1, 2, 3} devices superimposed on each sub-carrier, respectively.
In Figure 2, we compare our proposed algorithm with the state-of-the-art algorithms. Here, we set $|\mathcal{T}| = 1$, i.e., only one time frame, and we remove the deadline constraint $t \le t_{x_{t,f}(m)}$ from the counting function Z, so that the devices no longer have deadlines. We set M = 2 for all the algorithms, so that there are at most two devices on each sub-carrier. ALG 1 represents Algorithm 1 in [5], which is a heuristic based on binary integer programming. Optimal is the linear programming formulation in [5], which is solved using CVX [19] to obtain the optimal solution with the branch-and-bound method. We see that CTMBM achieves a connection throughput similar to that of the Optimal scheme. However, Optimal and ALG 1 have a computational . Furthermore, CTMBM generalizes the connection throughput optimization problem in [5] to multiple time frames, while also allowing for transmission deadlines and priority scheduling, as shown in the results below.
Figure 3 shows the connection throughput as a function of the data rate. Effectively, we consider that the device has x bits to transmit in the 10 ms time frame, giving it a data rate of x/10 kbps. Starting with 80 bits, we go up to 390 bits per frame, which is typical for status update packets [11], [20]. We introduce a baseline solution, labeled Greedy in the figure, by augmenting Algorithm 1 from [5], which uses binary integer programming for joint sub-carrier and power allocation. That solution is near-optimal for a single transmit time interval, but its original formulation does not take into account the users' transmission deadlines. The proposed extension, Greedy, runs a sequential allocation per time interval over the subset of devices that are still unassigned and have deadlines falling within the current time interval. We observe that CTMBM outperforms Greedy due to its flexibility in assigning the devices to the best possible RG over all the time intervals in the RB. The gain in connection throughput over the Greedy strategy is 52% for M = 3 and 79% for M = 2. CTMBM provides a gain of 149% at M = 3 and 95% at M = 2 over CTMBM OMA with M = 1, where we assign a single device to each RG.
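The mapping from packet size to data rate, and from data rate to target SINR, can be sketched as follows. The conversion assumes the 10 ms frame and 3.75 kHz sub-carrier of the system model and the relation $\Gamma = 2^{R/B} - 1$; the function names are illustrative only.

```python
import math

FRAME_MS = 10.0        # frame duration
SUBCARRIER_KHZ = 3.75  # single-tone sub-carrier bandwidth B


def rate_kbps(bits_per_frame):
    """Bits per millisecond are numerically equal to kbps."""
    return bits_per_frame / FRAME_MS


def target_sinr_db(bits_per_frame):
    """Target SINR Gamma = 2^(R/B) - 1, expressed in dB."""
    gamma = 2.0 ** (rate_kbps(bits_per_frame) / SUBCARRIER_KHZ) - 1.0
    return 10.0 * math.log10(gamma)


for bits in (80, 260, 390):
    print(bits, rate_kbps(bits), round(target_sinr_db(bits), 1))
```

Because the target SINR grows exponentially with R/B, moving from 80 to 390 bits per frame raises the required SINR by tens of dB, which is consistent with the drop in connection throughput at higher data rates.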
In Figure 4, we show the performance of CTMBM under different deployment sizes at R = 26 kbps. We see that CTMBM achieves a higher connection throughput than OMA over a broad range of cell sizes. The connection throughput of all the algorithms naturally declines as the cell size grows, due to the increased path loss of distant users. Nevertheless, CTMBM consistently outperforms the OMA solution (CTMBM with M = 1), connecting 125% more devices with M = 3 and 95% more devices with M = 2. Furthermore, CTMBM with M = 3 connects 59% more devices than the Greedy solution with M = 3.
We highlight the performance of CTMBM when considering 2 different service classes with M = 2 in Figure 5. Here, the plots in red and blue are priority-based CTMBM (P-CTMBM), where edge weights are assigned using (9). The target data rate is fixed at R = 26 kbps. The devices are divided into two classes, with the high-priority class labeled Class 1 and the rest of the devices labeled Class 2. Such a system may be used to support heterogeneous traffic, for example, to prioritize alarms over sensor observations. Further, we assume an overloaded scenario with 25% more devices than available slots, i.e., 1200 devices for 480 RGs (960 slots with M = 2). This assumption is made because assigning priorities is meaningful only when there are sufficiently many competing devices. Otherwise, the total connectivity is adversely affected in exchange for only a marginal gain in the performance of high-priority devices, since the high-priority devices may have worse channel conditions than other candidate devices. As a result, the opportunity for devices with better channels to be connected in the same RG is reduced, which decreases the connection throughput. In this overloaded case, P-CTMBM outperforms CTMBM. We see that with priority scheduling, we enhance the connection throughput of high-