Issues with Bandwidth Selection and General Advice #162

@jagreen1

I am seeking advice on two recurring errors that arise during bandwidth selection via ‘Sel_BW’, along with some general modeling guidance.

To provide context, I have been working on a project looking at vulnerability and natural hazard events. I want to fit two Poisson GWR models to count data aggregated over a time period, one for total fatalities and one for total damage.

1) Should the independent variables be standardized? I have tested both the raw and the standardized independent variables. Sometimes standardizing improves the model and sometimes it makes it worse. The example in the notebook standardizes the data, but that example is a Gaussian regression. If the values are already normalized (proportions from 0 to 1), should they still be standardized?
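To make the distinction concrete, here is a minimal sketch of z-score standardization applied to a column that is already on a 0 to 1 scale. The column name and values are made up for illustration and are not from my data. It shows that a bounded proportion is not the same thing as a standardized variable: after standardizing, the column has mean 0 and unit standard deviation.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        raise ValueError("column is constant; cannot standardize")
    return [(x - mu) / sigma for x in column]


# Hypothetical proportions already bounded in [0, 1].
pct_vulnerable = [0.10, 0.25, 0.40, 0.55, 0.70]
z = standardize(pct_vulnerable)

# The raw column is bounded but not centered; the z-scores are centered
# with unit spread.
raw_mean = mean(pct_vulnerable)
z_mean = mean(z)
z_sd = pstdev(z)
```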

2) Poisson regression is meant for whole-number count data, so would it be appropriate to use the log of the dependent variable instead? In practice this would be log1p, so that zero counts remain defined. I have tested both, and sometimes the transform improves performance and sometimes it makes it worse. Can float values in the dependent variable break the Poisson regression calculations?
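As a small illustration of what the transform does to the response, the sketch below applies log1p to some made-up counts. The values are hypothetical. The point is only that the transformed values are non-negative floats rather than integers, so the response is no longer count data in the strict sense; note also that a Poisson model with the usual log link already models the log of the expected count.

```python
import math

# Hypothetical fatality counts, including a zero.
counts = [0, 1, 3, 12, 250]

# log1p(c) = log(1 + c), defined at c = 0.
logged = [math.log1p(c) for c in counts]

# Only the zero count maps to an integer value (0.0); the rest are floats.
all_integers = all(v.is_integer() for v in logged)

# The transform is invertible with expm1, up to floating-point rounding.
recovered = [round(math.expm1(v)) for v in logged]
```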

3) When using a Poisson model, I frequently get the following error during bandwidth selection:
array must not contain infs or NaNs

As a result, mgwr is unable to complete the GWR regression. However, I explicitly remove any NaN, inf, or null values before bandwidth selection. Why might this be occurring?
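For clarity, the kind of pre-check I mean looks roughly like the sketch below. It assumes the data are held as a list of rows; the function name and sample values are hypothetical. A check like this only confirms that the inputs are finite. It says nothing about values computed during fitting.

```python
import math


def non_finite_positions(rows):
    """Return (row, column) positions holding None, NaN, or +/-inf."""
    bad = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # None is tested first because math.isfinite(None) raises TypeError.
            if value is None or not math.isfinite(value):
                bad.append((i, j))
    return bad


# Hypothetical design matrix with three problem cells.
X = [
    [0.2, 1.5],
    [float("nan"), 2.0],
    [0.7, float("inf")],
    [None, 3.1],
]
problems = non_finite_positions(X)

# Keep only rows with no problem cells.
bad_rows = {i for i, _ in problems}
X_clean = [row for i, row in enumerate(X) if i not in bad_rows]
```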

Could it be due to the many zeros in the dependent variable? I understand that a Negative Binomial or Zero-Inflated regression may be more appropriate in this situation, but since these are not yet implemented in mgwr, I will have to stick with Poisson.

4) I also frequently get the following error during bandwidth selection, and again I do not understand why it occurs:
UnboundLocalError: cannot access local variable 'influ' where it is not associated with a value

5) Lastly, I am confused about why the spglm and mgwr global outputs differ. Using the same inputs, the spglm GLM global Poisson model reports different D2 values than the mgwr global Poisson regression. My understanding is that mgwr uses spglm under the hood. Why might these values be drastically different? Note that I am only talking about the global regressions (i.e., without the spatial component).
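For reference, the sketch below computes D2 the way I understand it: percent deviance explained, 1 - D_model / D_null, with an intercept-only null model whose fitted mean is the sample mean of y. That definition is my assumption; I have not confirmed that spglm and mgwr both compute D2 exactly this way. The observed counts and fitted means are made up for illustration.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    The y * log(y / mu) term is taken as 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def d2(y, mu):
    """Percent deviance explained relative to an intercept-only model."""
    null_mu = [sum(y) / len(y)] * len(y)
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)


# Hypothetical observed counts and fitted means from some global model.
y = [0, 1, 3, 5, 11]
mu_fit = [0.5, 1.5, 2.5, 5.0, 10.5]

d2_fit = d2(y, mu_fit)
# A model that predicts the sample mean everywhere explains nothing.
d2_null = d2(y, [sum(y) / len(y)] * len(y))
```

If the two libraries use a different null model or a different deviance term for zero counts, a comparison against a hand computation like this one might show which of the two reported values matches.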

Any help is greatly appreciated.
