Hi authors, could you please elaborate on the definitions of the world, camera, and human coordinate frames (and the rendering view) used in the paper and code? I think this information would be really helpful for readers trying to understand and apply the method.
I have been quite confused trying to work out the transformations just by reading the code. Thank you so much for your help!