Description
I am seeking advice on two recurring errors that arise during bandwidth selection via ‘Sel_BW’, as well as some general modelling advice.
To provide context, I have been working on a project looking at vulnerability and natural hazard events. I seek to develop two Poisson GWR models to understand count data over a time period, one for total fatalities and one for total damage.
1 ) Should the independent variables be standardized? I have tested both the raw and standardized independent variables. Sometimes standardizing improves the model and sometimes it makes it worse. The example in the notebook standardizes the data, but that is for a Gaussian regression. If the values are already normalized (proportions from 0 to 1), should they still be standardized?
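For reference, a minimal sketch of the z-score standardization in question, assuming the predictors sit in an (n, k) NumPy array. The function name and the example values are hypothetical. It also shows that a proportion bounded in [0, 1] is not already on the standardized scale, since it does not have mean 0 and unit variance until it is transformed.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract the column mean, divide by the column std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero
    # and introduce NaN/inf into the design matrix.
    std_safe = np.where(std == 0, 1.0, std)
    return (X - mean) / std_safe

# Hypothetical predictors: one proportion in [0, 1] and one raw magnitude.
X = np.array([[0.10, 1200.0],
              [0.25,  300.0],
              [0.40, 4500.0],
              [0.90,   50.0]])
Z = standardize(X)

print(X.mean(axis=0), X.std(axis=0))  # the proportion column is not mean 0 / std 1
print(Z.mean(axis=0), Z.std(axis=0))  # after standardizing, both columns are
```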
2 ) While a Poisson regression is meant for whole-number count data, would it be appropriate to use the log of the dependent variable? In practice this would be log1p, so that zero counts remain defined and the values stay non-negative. I have tested both, and sometimes the transform improves performance and sometimes it makes it worse. Can using float values break the Poisson regression calculations?
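To make the concern concrete, here is a small sketch of what log1p does to a count vector. The values are made up for illustration. The transformed response is no longer integer-valued, which is the situation the question above asks about.

```python
import numpy as np

# Hypothetical count response with zeros.
y = np.array([0, 1, 5, 20, 150])

y_log = np.log1p(y)       # log(1 + y); defined at y = 0, result is float
y_back = np.expm1(y_log)  # inverse transform recovers the original counts

# True only if every transformed value is still a whole number.
is_integer_valued = bool(np.all(y_log == np.round(y_log)))

print(y_log)
print(is_integer_valued)
```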
3 ) When using a Poisson model, I frequently get the following error during bandwidth selection:
array must not contain infs or NaNs
As a result, mgwr is unable to complete the GWR regression. However, I explicitly remove any NaN, inf, or null values from the inputs before bandwidth selection. Why might this be occurring?
Could it be due to the large number of zeros in the dependent variable? I understand that a Negative Binomial or Zero-Inflated regression may be more appropriate in this situation, but given that these are not yet implemented in mgwr, I will have to stick with Poisson.
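One possibility worth ruling out, sketched here under assumptions and not confirmed against the mgwr internals: with a log link the fitted mean is exp(X @ beta), so clean inputs can still yield inf during fitting if a predictor is large and unscaled. The helper name, the data, and the intermediate coefficient vector below are all hypothetical.

```python
import numpy as np

def all_finite(*arrays):
    """Return True if every array contains only finite values (no NaN/inf)."""
    return all(np.isfinite(np.asarray(a, dtype=float)).all() for a in arrays)

# Hypothetical clean inputs: an intercept column plus one large, unscaled predictor.
X = np.array([[1.0,  50.0],
              [1.0, 400.0],
              [1.0, 900.0]])
y = np.array([0.0, 3.0, 12.0])

# Hypothetical coefficient estimate at some intermediate fitting step.
beta = np.array([0.0, 1.0])

# The log link implies mu = exp(X @ beta); exp(900) overflows float64.
with np.errstate(over="ignore"):
    mu = np.exp(X @ beta)

print(all_finite(X, y))  # inputs pass the check
print(all_finite(mu))    # the implied mean does not
```

If this is the cause, rescaling the offending predictor would be expected to help, though that is an assumption to test rather than a confirmed fix.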
4 ) I also frequently hit the error below during bandwidth selection, and again I do not understand why it occurs:
UnboundLocalError: cannot access local variable 'influ' where it is not associated with a value
5 ) Lastly, I am confused about why the spglm and mgwr global outputs differ. Using the same inputs, the global Poisson GLM from spglm reports a different D2 value than the global Poisson regression reported by mgwr. My understanding is that mgwr uses spglm under the hood. Why might these values be drastically different? Note that I am only talking about the global regressions (i.e. without the spatial component).
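As a neutral reference point for comparing the two outputs, D2 can be computed by hand from the fitted means. This is a minimal sketch, assuming D2 denotes the proportion of deviance explained, 1 - deviance / null deviance, with an intercept-only null model. The function names and example values are hypothetical, and the sketch does not show how either library computes its reported figure.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # y * log(y / mu) is taken as 0 where y == 0 (its limit as y -> 0).
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def d2(y, mu):
    """Proportion of deviance explained relative to an intercept-only model."""
    y = np.asarray(y, dtype=float)
    null_mu = np.full_like(y, y.mean())  # intercept-only fit predicts the mean
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)

# Hypothetical observed counts and fitted means from some global model.
y_obs = np.array([0.0, 2.0, 1.0, 7.0, 4.0])
mu_hat = np.array([0.5, 1.5, 1.5, 6.0, 4.5])

d2_value = d2(y_obs, mu_hat)
print(d2_value)
```

Feeding the fitted means from each package into the same function would show whether the models themselves differ or only the way the statistic is reported.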
Any help is greatly appreciated.