Description
When using a fixed parametrization on categorical layers, the output of MixingLayer is NaN for some combinations of the inputs.
The issue seems to be related to the sum layer itself, and in particular to its sum parameters.
With the sum parameters fixed to be unitary and the LSE-sum semiring selected, the result is wrong when both variables have value zero, but it is correct when either one of them is non-zero.
Minimal code:
```python
import numpy as np
import torch
import random
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
from cirkit.symbolic.layers import CategoricalLayer, MixingLayer, HadamardLayer
from cirkit.symbolic.parameters import Parameter, ConstantParameter
from cirkit.utils.scope import Scope
from cirkit.pipeline import PipelineContext
from cirkit.symbolic.circuit import Circuit
probs = lambda: Parameter.from_input(ConstantParameter(1, 1, 2, value=np.array([0.0, 1.0]).reshape(1, 1, -1)))
cl_1 = CategoricalLayer(Scope([0]), 1, 1, num_categories=2, probs=probs())
cl_2 = CategoricalLayer(Scope([1]), 1, 1, num_categories=2, probs=probs())
sum_layer = MixingLayer(1, 2)
ctx = PipelineContext(backend='torch', fold=False, optimize=False, semiring='lse-sum')
symbolic_circuit = Circuit(
1,
[cl_1, cl_2, sum_layer],
{ sum_layer: [cl_1, cl_2] },
[sum_layer]
)
circuit = ctx.compile(symbolic_circuit)
print(circuit(torch.tensor([0, 0]).reshape(1, 1, 2)))
# >>> tensor([[[nan]]], grad_fn=<TransposeBackward0>)
print(circuit(torch.tensor([0, 1]).reshape(1, 1, 2)))
# >>> tensor([[[nan]]], grad_fn=<TransposeBackward0>)
print(circuit(torch.tensor([1, 0]).reshape(1, 1, 2)))
# >>> tensor([[[0.4324]]], grad_fn=<TransposeBackward0>)
print(circuit(torch.tensor([1, 1]).reshape(1, 1, 2)))
# >>> tensor([[[0.2212]]], grad_fn=<TransposeBackward0>)
```

Changing the random seed changes the results (e.g., with random seed 1 only the first evaluation is NaN), and the same happens when fixing the sum parameters by replacing
```python
sum_layer = MixingLayer(1, 2)
```

with
```python
sum_layer = MixingLayer(1, 2, weight_factory=lambda n: Parameter.from_input(ConstantParameter(*n, value=np.ones(n))))
```

I tracked down the error, and it appears to be in `cirkit.backend.torch.semiring.LSESumSemiring`, in the method `apply_reduce`: when both inputs are 0, the variable `xs` is `(tensor([[[[-inf]], [[-inf]]]]),)`, since a categorical layer with `probs = [0.0, 1.0]` evaluates to `log(0) = -inf` on category 0 under the LSE-sum semiring. On line 375, the subtraction between two `-inf` values is an undefined operation and yields NaN.
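For what it's worth, the numerical issue can be reproduced outside cirkit: the max-subtraction trick commonly used to compute log-sum-exp breaks when every input is `-inf`, since `(-inf) - (-inf)` is NaN in IEEE arithmetic, whereas the correct limit here would be `-inf` (the log-probability of an impossible event), not NaN. A minimal sketch in plain PyTorch (the guard at the end is only an illustration, not necessarily how cirkit should fix it):

```python
import torch

# Both inputs are -inf, exactly as produced by log(0.0) probabilities.
xs = torch.tensor([float("-inf"), float("-inf")])

# Max-subtraction trick: logsumexp(xs) = m + log(sum(exp(xs - m))), m = max(xs).
m = xs.max()                                    # tensor(-inf)
print(xs - m)                                   # tensor([nan, nan]): (-inf) - (-inf) is NaN
print(m + torch.exp(xs - m).sum().log())        # tensor(nan)

# PyTorch's builtin handles the all--inf case and returns the correct limit.
print(torch.logsumexp(xs, dim=0))               # tensor(-inf)

# An illustrative guard (an assumption, not cirkit's actual fix): clamp the
# shift to a finite value whenever the maximum itself is -inf.
m_safe = torch.where(torch.isneginf(m), torch.zeros_like(m), m)
print(m_safe + torch.exp(xs - m_safe).sum().log())  # tensor(-inf)
```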
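Since `torch.logsumexp` already implements this guard internally, delegating to it in `apply_reduce` might be one way to avoid the issue, though I haven't checked whether that fits the semiring's reduction interface.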