Results from link phase change between runs #601

@spentelow

Describe the bug
The output data produced by the link phase changes each time the model is run.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the quick start instructions:
docker pull zingg/zingg:0.3.4
docker run -it zingg/zingg:0.3.4 bash
  2. Change matchType in examples/febrl/configLink.json from 'exact' to 'fuzzy' (this resolves an issue with this example in version 0.3.4, related to Issue 427).

  3. Run the 'febrl' model in link mode:

./scripts/zingg.sh --phase link --conf examples/febrl/configLink.json
  4. Examine the output files (/tmp/zinggOutput).
  5. Re-run steps 3 and 4 (without making changes to configuration or input files) and observe different results. Results differ in the number of output rows, the subset of input datasets included in the output, and the z_score values.

Expected behavior
My expectation is that sequential runs without config or input file changes would produce identical results (except, perhaps, in z_cluster labels).
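
One way to quantify the difference between two runs is sketched below. This is not part of Zingg. It is a minimal helper that assumes the output directory from each run was copied aside (for example to the hypothetical paths /tmp/run1 and /tmp/run2), and that the output consists of CSV files with a header row. The function names `load_rows` and `compare_runs` are made up for illustration. The comparison ignores row order and the z_cluster column, since cluster labels may legitimately differ.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(output_dir, ignore_columns=()):
    """Return a multiset of rows from every *.csv file in output_dir.

    Assumption: each file begins with a header row. Adjust the reader
    if the output was written without headers or in another format.
    """
    rows = Counter()
    for path in sorted(Path(output_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            for record in csv.DictReader(handle):
                # Drop ignored columns and make the row hashable.
                kept = tuple(sorted(
                    (name, value)
                    for name, value in record.items()
                    if name not in ignore_columns
                ))
                rows[kept] += 1
    return rows


def compare_runs(dir_a, dir_b, ignore_columns=("z_cluster",)):
    """Summarize how two output directories differ, ignoring row order."""
    a = load_rows(dir_a, ignore_columns)
    b = load_rows(dir_b, ignore_columns)
    return {
        "rows_a": sum(a.values()),
        "rows_b": sum(b.values()),
        "only_in_a": sum((a - b).values()),  # rows missing from run B
        "only_in_b": sum((b - a).values()),  # rows missing from run A
    }
```

Calling `compare_runs("/tmp/run1", "/tmp/run2")` on two deterministic runs should report equal row counts and zero rows unique to either side. Any non-zero `only_in_a` or `only_in_b` value reflects differing rows, including rows whose z_score changed.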
