Review 1
Relevance and Timeliness: Good. (4)
Technical Content and Scientific Rigour: Marginal work and simple contribution. Some flaws. (2)
Novelty and Originality: Some interesting ideas and results on a subject well investigated. (3)
Quality of Presentation: Substantial revision work is needed. (2)
Strong Aspects (Comments to the author: What are the strong aspects of the paper?)
Exploring the impact of lossy compression on the trade-off between communication efficiency and convergence speed in compression-aided Federated Learning schemes is a relevant and timely problem.
Weak Aspects (Comments to the author: What are the weak aspects of the paper?)
- Some aspects of the proposed approach call for stronger justification or additional explanation.
- The evaluation results are less than comprehensive.
- The presentation of the material needs improvement.
More details on these issues are provided in the next section of the review.
Recommended Changes (Recommended changes. Please indicate any changes that should be made to the paper if accepted.)
Some parts of the proposed approach are obscure and require further justification and/or elaboration. In particular:
- The paper employs Lemma 1, through eq. (8), to determine the number K^{SE} of participating devices that minimizes the number of communication rounds required for convergence, and states that this problem has complexity linear in the total number of devices K. However, in view of the factor \chi and the last sum in term B of (6), this holds only for symmetric environments, where all devices employ the same loss function and exhibit the same distance between local and global loss.
- The paper assumes that communication between the participating devices and the BS employs a number S of subchannels and that each subchannel may be used by only one of the participating devices. As a consequence of eqs. (11d)-(11f), this arrangement imposes the strong limitation that the number of participating devices K^{SE} cannot exceed the number of subchannels S. Why is such a restriction necessary? Can't the participating devices share the available spectral resources by taking turns in the use of subchannels when appropriate? Also, assuming that the policy of a single device per subchannel is in force, why is the transmission time for a communication round expressed as a sum (in the rhs of (10)) rather than as a max operation?
- The appropriateness of employing the coalition game requires further justification. Given that the participating devices have already been determined, can't the search for the solution proceed in a more directed way (e.g., by prioritizing on the basis of channel gains or transmission power)? Also, the remark on the computational complexity of the game is incomplete, as no bound is provided on the number of required iterations.

The discussion of the evaluation results omits several important system-related characteristics. For example, no details are given about the federated learning environment (e.g., the characteristics of the local loss functions). The impact of compression is explored in connection with the convergence rate, but not with communication efficiency (which provided the primary motivation for introducing compression). There is also some inconsistency across results: while the convergence-rate results indicate that the optimal number K^{SE} of participating devices is about 40, this value is not feasible in the setup used to study transmission time optimization (where the number of available subchannels is kept below 30).

The presentation can be improved in several respects:
- The manuscript needs a check for consistency and correct use of notation. For example, equations such as (1), (2) and (10) should have their sums span the range from 1 to K, not K^{SE}.
- The meaning of each symbol should be introduced close to its first occurrence. (The meaning of quantity E is defined at the end of Section II-B, almost half a page after its first appearance.)
- While it is legitimate to omit the proof of Lemma 1 due to space restrictions, some insight should be provided about the nature of the result and how it relates to earlier results (e.g., those in [17] for an uncompressed setting).
- The manuscript should be checked for the correctness of its factual claims. For example, it is stated (more than once) that "each IoT device can only occupy one sub-channel", which differs from the intended meaning (that each subchannel can be used by only one IoT device).
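A minimal sketch of the sum-versus-max question raised above, under assumptions that are not taken from the paper: each participating device uploads its compressed model on its own orthogonal subchannel at the Shannon rate W log2(1 + SNR), and all uploads in a round start at the same time. The function names, the rate model and the numeric values are hypothetical and purely illustrative. Under these assumptions a round ends when the slowest device finishes, so the round time is a max; a sum corresponds to devices transmitting one after another. The last function illustrates one simple form of the turn-taking suggested above for K^{SE} > S, in which devices are served in successive waves of at most S devices.

```python
import math

def upload_time(payload_bits, bandwidth_hz, snr_linear):
    """Seconds to send payload_bits at the Shannon rate W*log2(1 + SNR) (assumed rate model)."""
    return payload_bits / (bandwidth_hz * math.log2(1.0 + snr_linear))

def round_time_parallel(times):
    """One device per orthogonal subchannel, all uploading at once:
    the round ends when the slowest device finishes."""
    return max(times)

def round_time_sequential(times):
    """What a sum over devices implicitly models: uploads happen one after another."""
    return sum(times)

def round_time_time_sharing(times, num_subchannels):
    """Hypothetical turn-taking for more devices than subchannels: devices are
    sorted slowest first and served in waves of at most num_subchannels;
    each wave lasts as long as its slowest member."""
    ordered = sorted(times, reverse=True)
    return sum(max(ordered[i:i + num_subchannels])
               for i in range(0, len(ordered), num_subchannels))

# Example: three devices, 1 Mbit payload, 180 kHz subchannels, different SNRs.
times = [upload_time(1e6, 180e3, snr) for snr in (10.0, 100.0, 1000.0)]
print("parallel (max):", round_time_parallel(times))
print("sequential (sum):", round_time_sequential(times))
print("two subchannels, time sharing:", round_time_time_sharing(times, 2))
```

When every device has its own subchannel (S >= K^{SE}), the time-sharing schedule reduces to the max; with a single subchannel it reduces to the sum.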
Review 2
Relevance and Timeliness: Good. (4)
Technical Content and Scientific Rigour: Valid work but limited contribution. (3)
Novelty and Originality: Some interesting ideas and results on a subject well investigated. (3)
Quality of Presentation: Readable, but revision is needed in some parts. (3)
Strong Aspects (Comments to the author: What are the strong aspects of the paper?)
This paper studies a resource allocation scheme for compression-aided federated learning in which energy efficiency and federated learning performance are jointly optimized.
Weak Aspects (Comments to the author: What are the weak aspects of the paper?)
The original problem is decomposed into two subproblems, and an algorithm is designed to solve them. The paper needs to clarify how the proposed algorithm finds the most suitable solution to the original problem. The simulation results could also show the optimality gap of the proposed algorithm.
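One way to report the optimality gap requested above is to compare the algorithm against exhaustive search on instances small enough to enumerate. The sketch below assumes a simplified problem that is not the paper's formulation: K devices are assigned to S >= K subchannels, one device per subchannel, device k needs times[k][s] seconds on subchannel s, and the objective is the total upload time. The greedy heuristic (devices with the best channel choose first) and all function names are hypothetical stand-ins; in an actual evaluation the paper's own algorithm and objective would take their place.

```python
import itertools
import random

def total_time(assignment, times):
    """Objective: sum of upload times, where assignment[k] is device k's subchannel."""
    return sum(times[k][s] for k, s in enumerate(assignment))

def exhaustive_optimum(times, num_subchannels):
    """Brute-force optimum over all one-device-per-subchannel assignments (small instances only)."""
    candidates = itertools.permutations(range(num_subchannels), len(times))
    best = min(candidates, key=lambda a: total_time(a, times))
    return best, total_time(best, times)

def greedy_assignment(times, num_subchannels):
    """Hypothetical algorithm under test: devices are ordered by their best achievable
    time (best channel first), and each takes its fastest still-free subchannel."""
    free = set(range(num_subchannels))
    assignment = [None] * len(times)
    for k in sorted(range(len(times)), key=lambda k: min(times[k])):
        s = min(free, key=lambda s: times[k][s])
        assignment[k] = s
        free.remove(s)
    return assignment

def optimality_gap(times, num_subchannels):
    """Relative gap (heuristic - optimum) / optimum."""
    _, opt = exhaustive_optimum(times, num_subchannels)
    heuristic = total_time(greedy_assignment(times, num_subchannels), times)
    return (heuristic - opt) / opt

# Example: 100 random instances with 4 devices and 6 subchannels.
rng = random.Random(0)
num_devices, num_subchannels = 4, 6
gaps = []
for _ in range(100):
    times = [[rng.uniform(0.1, 1.0) for _ in range(num_subchannels)]
             for _ in range(num_devices)]
    gaps.append(optimality_gap(times, num_subchannels))
print(f"mean gap {sum(gaps) / len(gaps):.3f}, worst gap {max(gaps):.3f}")
```

Reporting both the mean and the worst-case gap over many random instances gives a fuller picture than a single instance.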
Recommended Changes (Recommended changes. Please indicate any changes that should be made to the paper if accepted.)
- It is mentioned that the proposed wireless network meets the 3GPP standard. However, the paper does not explain which clause(s) of TS 38.300 the network meets, or how and why it meets them.
- This paper mainly considers IoT networks. Please explain why the paper assumes that FL runs on the IoT devices: IoT devices as defined in the current wireless standards may not have enough computing capability.
- It is assumed that all data are IID. However, devices located near each other may observe similar environments. Moreover, IoT devices are often deployed redundantly, so their sensory data can be correlated. How can the IID assumption always hold in the proposed system model?
- Please explain how the lossy compression model is implemented in the simulations. When the proposed scheme is used, the simulation results could additionally show the system performance for different degrees of error (between the original and decompressed models).
- It should be clarified why a certain number of devices is required or helpful to ensure FL convergence.
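The error sweep suggested above could be organized as in the following sketch. Uniform b-bit scalar quantization is used here only as a hypothetical example of a lossy compressor (the paper's compression scheme is not described in this review), and the relative L2 distance between the original and decompressed parameter vectors serves as the "degree of error". In a full evaluation, each bit width would be paired with the resulting convergence and system performance.

```python
import math
import random

def quantize(weights, bits):
    """Hypothetical compressor: uniform scalar quantization to 2**bits levels over
    [min, max], returning the decompressed (reconstructed) values."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

def relative_error(original, decompressed):
    """||w - w_hat||_2 / ||w||_2, one possible measure of the degree of error."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, decompressed)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den

# Example: a random stand-in for a model's parameter vector.
rng = random.Random(1)
w = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
for bits in (2, 4, 8):
    print(f"{bits} bits: relative error {relative_error(w, quantize(w, bits)):.4f}")
```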
Review 3
Relevance and Timeliness: Good. (4)
Technical Content and Scientific Rigour: Marginal work and simple contribution. Some flaws. (2)
Novelty and Originality: Some interesting ideas and results on a subject well investigated. (3)
Quality of Presentation: Readable, but revision is needed in some parts. (3)
Strong Aspects (Comments to the author: What are the strong aspects of the paper?)
- The paper provides a theoretical analysis of the proposed algorithm.
- The topic of the paper is timely.
Weak Aspects (Comments to the author: What are the weak aspects of the paper?)
- Non-rigorous communication model and some strong assumptions
- Questionable problem formulation
- It is not clearly shown that the proposed method thoroughly considers both the communication model and model compression in FL
- Lack of performance evaluation and comparisons
Recommended Changes (Recommended changes. Please indicate any changes that should be made to the paper if accepted.)
1. The assumption of IID data is too strong.
2. The communication model is not rigorous. The authors assume that the proposed wireless network meets the 3GPP standard. However, it is not clear that the communication model fits the 3GPP standard, because of the gap between communication rounds in FL and time slots in the wireless network.
3. There is no consideration of the local training time at the devices, or of lagged data samples at unselected devices.
4. Why is minimizing the transmission time so important?
5. It is questionable whether the proposed algorithm is closely related to model compression, since it does not jointly consider K^{SE} and r_t, \Theta_t.
6. The performance evaluation is lacking. There is no comparison against state-of-the-art methods.
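Regarding item 1 (and the related concern in Review 2 about correlated data), one commonly used way to evaluate FL beyond the IID case is a label-skewed split in which, for every class, the shares of samples assigned to the devices are drawn from a symmetric Dirichlet(alpha) distribution. The sketch below illustrates that technique only; it is not something proposed in the paper, and the function names and parameters are hypothetical. Small alpha yields strongly non-IID devices, while large alpha approaches an IID split.

```python
import random
from collections import Counter

def sample_dirichlet(alpha, n, rng):
    """Symmetric Dirichlet(alpha) sample of length n via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_label_split(labels, num_devices, alpha, seed=0):
    """Split sample indices across devices so that, for each class, the per-device
    shares follow Dirichlet(alpha). Returns one index list per device."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    parts = [[] for _ in range(num_devices)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        shares = sample_dirichlet(alpha, num_devices, rng)
        # Turn cumulative shares into cut points over this class's samples.
        bounds, acc = [0], 0.0
        for p in shares[:-1]:
            acc += p
            bounds.append(min(len(idxs), round(acc * len(idxs))))
        bounds.append(len(idxs))
        for d in range(num_devices):
            parts[d].extend(idxs[bounds[d]:bounds[d + 1]])
    return parts

# Example: 5000 samples, 10 classes, 5 devices; compare class mixes on device 0.
labels = [i % 10 for i in range(5000)]
for alpha in (0.1, 100.0):
    parts = dirichlet_label_split(labels, num_devices=5, alpha=alpha)
    print(f"alpha={alpha}: device 0 classes", dict(Counter(labels[i] for i in parts[0])))
```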