Skip to content

new symbolic perturbation#1445

Merged
elalish merged 8 commits into
masterfrom
perturb1
Dec 16, 2025
Merged

new symbolic perturbation#1445
elalish merged 8 commits into
masterfrom
perturb1

Conversation

@elalish

@elalish elalish commented Dec 1, 2025

Copy link
Copy Markdown
Owner

Fixes #1430

Well, I need to add some testing first to verify this really fixes the problem. I'm also having some trouble with the CondensedMatter test, so will need to look into what's been uncovered there.

@elalish elalish self-assigned this Dec 1, 2025
@elalish

elalish commented Dec 1, 2025

Copy link
Copy Markdown
Owner Author

@pca006132 it looks like I'm getting some non-determinism, especially obvious in the BatchBoolean test. Do you have any idea how I might have introduced that? I guess I should run an asan check for uninitialized data or something, but I'm only reading vectors we had already created...

@elalish elalish requested a review from pca006132 December 1, 2025 10:09
@pca006132

Copy link
Copy Markdown
Collaborator

Not obvious to me, I don't think your code is introducing any race condition.

@zalo

zalo commented Dec 2, 2025

Copy link
Copy Markdown
Contributor

Nastygear tests for convenience... the test passes, but the gear itself is still producing slivers (and crunchier topology).

master:
image

perturb1:
image

import trimesh
import manifold3d
import numpy as np
from time import perf_counter

def to_mesh(model):
    print("Converting model to mesh...")
    mesh = model.to_mesh()
    print("Model converted to mesh.")
    if mesh.vert_properties.shape[1] > 3:
        vertices =  mesh.vert_properties[:, :3]
        colors   = (mesh.vert_properties[:, 3:] * 255).astype(np.uint8)
    else:
        vertices = mesh.vert_properties
        colors   = None
    print("Mesh created with", len(vertices), "vertices and", len(mesh.tri_verts), "faces.")
    meshOut = trimesh.Trimesh(vertices=vertices, faces=mesh.tri_verts,
                              vertex_colors=colors)
    print("Trimesh mesh created with", len(meshOut.vertices), "vertices and", len(meshOut.faces), "faces.")
    return meshOut

def save_mesh(model, name):
    meshOut = to_mesh(model)
    trimesh.exchange.export.export_mesh(meshOut, name+'.stl', 'stl')
    print('Exported model to ', name, '.stl')
    return meshOut

def multi_rot_cube(N=10):
   alpha = 360/N
   return manifold3d.Manifold.batch_boolean(
      [manifold3d.Manifold.cube([1, 1, 1], True).rotate([0, 0, alpha*i]) for i in range(N)], manifold3d.OpType.Add
   )

if __name__ == "__main__":
    t0 = perf_counter()
    nasty = multi_rot_cube(50).scale([2, 2, 1]) - multi_rot_cube(50)
    save_mesh(nasty, "nasty_gear_0")
    print("Nasty Gear took", (perf_counter()-t0)*1000, "ms")

This test passes; I never bothered to write a sliver detector for it... should just be the existence anti-parallel (180 degree) dihedral edges.

TEST(Boolean, NastyGears) {
  // Create a nasty gear pattern with many rotated cubes that creates
  // antiparallel slivers (triangles with normals ~180 degrees apart)
  // https://github.com/BrunoLevy/thingiCSG/blob/main/DATABASE/Basic/nasty_gear_1.scad

  const int N = 50;  // Number of rotations for the gear pattern
  const double alpha = 360.0 / N;

  // Create outer gear - many rotated cubes unioned together
  std::vector<Manifold> outerCubes;
  for (int i = 0; i < N; i++) {
    outerCubes.push_back(
        Manifold::Cube({1, 1, 1}, true).Rotate(0, 0, alpha * i));
  }
  Manifold gear = Manifold::BatchBoolean(outerCubes, OpType::Add);
  Manifold outerGear = gear.Scale({2, 2, 1});

  // Subtract inner from outer to create the nasty gear with slivers
  Manifold nastyGear = outerGear - gear;

  // The gear should be valid and manifold
  EXPECT_EQ(nastyGear.Status(), Manifold::Error::NoError);
  EXPECT_FALSE(nastyGear.IsEmpty());
  EXPECT_EQ(nastyGear.Genus(), 1);
  EXPECT_NEAR(nastyGear.Volume(), outerGear.Volume() - gear.Volume(), 1e-5);

  // These should be eliminated after simplification
  nastyGear = nastyGear.AsOriginal().Simplify(0.0000000001);
  EXPECT_EQ(nastyGear.Status(), Manifold::Error::NoError);
  EXPECT_FALSE(nastyGear.IsEmpty());
  EXPECT_EQ(nastyGear.Genus(), 1);
  EXPECT_NEAR(nastyGear.Volume(), outerGear.Volume() - gear.Volume(), 1e-5);
}

@elalish

elalish commented Dec 2, 2025

Copy link
Copy Markdown
Owner Author

Well, I've added a decent perturbation test, which shows this is still not working at all - need some time to debug. Hopefully just a sign error or something, but hard to say.

@elalish

elalish commented Dec 3, 2025

Copy link
Copy Markdown
Owner Author

Okay, I've figured out one reason (the reason?) that my new test is still failing: It has coincident quad faces, not just triangles, and some of those coincident quads have opposite bisectors from each other. This means those skew edges from coplanar faces (which are not axis-aligned) still get rounding error, and hence are randomly joining or unjoining their surfaces, hence still getting noisy internal geometry.

So, either this isn't doable, at least unless the triangulations of coplanar faces match, or we need to look into a tolerance-based solution. I'm curious if this is possible within our framework. Since it's designed to be robust to rounding error, which is effectively random noise, perhaps we can also add a bit of our own error in the form of a tolerance, so long as we follow the same rules? Worth considering...

@elalish

elalish commented Dec 7, 2025

Copy link
Copy Markdown
Owner Author

Okay, indeed, once the triangulations are aligned, I get good results. I also added a test from #1430 that failed on master and passes here. However, I'm still getting non-determinism.

@pca006132 I verified that if I drop any usage of faceNormal_ in the symbolic perturbation, it becomes deterministic again. I started looking at CalculateNormals(), which you updated to get determinism early this year. I'm having trouble understanding what's going on with the faceNormal_ calculations with atomicMin (why is that necessary?). Can you explain your reasoning a bit and maybe review it to see if anything looks off to you? The asan build didn't find any problems.

@pca006132

Copy link
Copy Markdown
Collaborator

Okay, indeed, once the triangulations are aligned, I get good results. I also added a test from #1430 that failed on master and passes here. However, I'm still getting non-determinism.

@pca006132 I verified that if I drop any usage of faceNormal_ in the symbolic perturbation, it becomes deterministic again. I started looking at CalculateNormals(), which you updated to get determinism early this year. I'm having trouble understanding what's going on with the faceNormal_ calculations with atomicMin (why is that necessary?). Can you explain your reasoning a bit and maybe review it to see if anything looks off to you? The asan build didn't find any problems.

Weird, the atomic min was only related to vertex normal, not face normals. For each vertex, we have a bunch of halfedges pointing out, and each is associated with a triangle that should be summed. If we sum them starting from an arbitrary edge, because floating-point addition is not associative, the result can be different depending on which edge we start from. The code uses atomic min to determine the starting halfedge for each vertex, which is the edge with the smallest edge ID.

I will do the code review next week.

@zalo

zalo commented Dec 7, 2025

Copy link
Copy Markdown
Contributor

Unsure if this is relevant, but I recall seeing a numerical stability optimization for triangle face normals: jpcy/xatlas#131

@pca006132

Copy link
Copy Markdown
Collaborator

Yeah we can have some numerical stability improvements, but I don't think they are related to determinism. (and I doub't if numerical stability matters too much in this case, it will be the worse for near degenerate triangles, but this optimization probably cannot do much for those cases)

@elalish

elalish commented Dec 10, 2025

Copy link
Copy Markdown
Owner Author

Okay, the atomic min makes sense now - I did a little cleanup but didn't really change anything. I'm wondering if faceNormal_ has always had some non-determinism and we just never noticed before? How did you go about finding the source of non-determinism before? I wish I had a way to record and compare the whole internal state of two runs so I could see where it first diverges...

@pca006132

Copy link
Copy Markdown
Collaborator

I just print them out and diff them.

I'm wondering if faceNormal_ has always had some non-determinism and we just never noticed before?

Maybe, I never looked at that. I will be surprised if this is the case since we are not doing anything fishy with this...

@neilpa

neilpa commented Dec 10, 2025

Copy link
Copy Markdown
Contributor

I wish I had a way to record and compare the whole internal state of two runs so I could see where it first diverges...

On Windows there's Time Travel Tracing w/ WinDBG. In a past life I've used that a few times to track down hairy C++ bugs. With the trace you can essentially do a binary search in the debugger of the process state. That would let you "diff" between two different runs of the same inputs.

Linux also has rr which I've never used and not sure about macos.

These days you might have better luck adding a ridiculous amount of logging and then throwing it at an LLM and have it play spot the first difference to narrow it down.

@elalish

elalish commented Dec 14, 2025

Copy link
Copy Markdown
Owner Author

@pca006132 I'm a bit at a loss here - it is deterministic in single threaded mode (at least on my macbook). In parallel mode, the CondensedMatter tests are not. If I print out faceNormal_ when it's regenerated, it always matches. MatchesTriNormals always passes. If I print faceNormal_ every time I think it doesn't match, but the diff gets a lot trickier in parallel mode, so I'm not sure. Even when I force the face normals to be regenerated every time, I still don't get determinism. My best guess is perhaps one of the places we copy face normals over (in the boolean, and various edge collapse ops and such) we might be mixing them up? But somehow only for degenerate triangles, because MatchesTriNormals still passes...

Can you repro?

@pca006132

Copy link
Copy Markdown
Collaborator

Will look into it a few days later.

@pca006132

Copy link
Copy Markdown
Collaborator

OK found the bug, one is clear to me why (and I am surprised that we never caught it previously), and another one I am not so sure, will need some time to think about the old logic. The buggy logic is quite non-trivial.

diff --git i/src/boolean_result.cpp w/src/boolean_result.cpp
index 4dc857ea..0b49b39c 100644
--- i/src/boolean_result.cpp
+++ w/src/boolean_result.cpp
@@ -200,6 +200,12 @@ struct EdgePos {
   int vert;
   int collisionId;
   bool isStart;
+
+  bool operator<(const EdgePos& other) const {
+    return edgePos < other.edgePos ||
+           // we also sort by collisionId to make things deterministic
+           (edgePos == other.edgePos && collisionId < other.collisionId);
+  }
 };
 
 // thread sanitizer doesn't really know how to check when there are too many
@@ -291,13 +297,8 @@ std::vector<Halfedge> PairUp(std::vector<EdgePos>& edgePos) {
                                [](EdgePos x) { return x.isStart; });
   DEBUG_ASSERT(static_cast<size_t>(middle - edgePos.begin()) == nEdges,
                topologyErr, "Non-manifold edge!");
-  auto cmp = [](EdgePos a, EdgePos b) {
-    return a.edgePos < b.edgePos ||
-           // we also sort by collisionId to make things deterministic
-           (a.edgePos == b.edgePos && a.collisionId < b.collisionId);
-  };
-  std::stable_sort(edgePos.begin(), middle, cmp);
-  std::stable_sort(middle, edgePos.end(), cmp);
+  std::stable_sort(edgePos.begin(), middle);
+  std::stable_sort(middle, edgePos.end());
   std::vector<Halfedge> edges;
   for (size_t i = 0; i < nEdges; ++i)
     edges.push_back({edgePos[i].vert, edgePos[i + nEdges].vert, -1});
@@ -322,7 +323,8 @@ void AppendPartialEdges(Manifold::Impl& outR, Vec<char>& wholeHalfedgeP,
 
   for (auto& value : edgesP) {
     const int edgeP = value.first;
-    std::vector<EdgePos>& edgePosP = value.second;
+    std::vector<EdgePos> edgePosP = value.second;
+    std::stable_sort(edgePosP.begin(), edgePosP.end());
 
     const Halfedge& halfedge = halfedgeP[edgeP];
     wholeHalfedgeP[edgeP] = false;
@@ -397,6 +399,7 @@ void AppendNewEdges(
     const int faceP = value.first.first;
     const int faceQ = value.first.second;
     std::vector<EdgePos>& edgePos = value.second;
+    std::stable_sort(edgePos.begin(), edgePos.end());
 
     Box bbox;
     for (auto edge : edgePos) {
diff --git i/src/edge_op.cpp w/src/edge_op.cpp
index ae038896..b2cddd64 100644
--- i/src/edge_op.cpp
+++ w/src/edge_op.cpp
@@ -731,6 +731,9 @@ void Manifold::Impl::RecursiveEdgeSwap(const int edge, int& tag,
                         halfedge_[tri0edge[1]].pairedHalfedge});
 }
 
+#undef MANIFOLD_PAR
+#define MANIFOLD_PAR -1
+
 void Manifold::Impl::SplitPinchedVerts() {
   ZoneScoped;
 

@pca006132 pca006132 mentioned this pull request Dec 15, 2025
@codecov

codecov Bot commented Dec 15, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.26%. Comparing base (499347b) to head (8438f99).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1445      +/-   ##
==========================================
- Coverage   92.27%   92.26%   -0.01%     
==========================================
  Files          32       32              
  Lines        5783     5782       -1     
==========================================
- Hits         5336     5335       -1     
  Misses        447      447              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread include/manifold/common.h
* Intersect.
*/
enum class OpType { Add, Subtract, Intersect };
enum class OpType : char { Add, Subtract, Intersect };

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change needed?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just figured I should start getting in the habit of sizing my enums. Not important here though.

Comment thread src/boolean3.cpp
if (forward) {
if (!Shadows(inA.vertPos_[a0].y, yz01[0], expandP * normalP[a0].y))
s01 = 0;
if (!Shadows(inA.vertPos_[a0].y, yz01[0], -dir)) s01 = 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this doesn't care about expandP now?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is using the faces of Q now, which is always expanded, unlike P.

Comment thread src/boolean3.cpp
Interpolate(inB.vertPos_[b1s], inB.vertPos_[b1e], inA.vertPos_[a0].x);
const int b1pair = inB.halfedge_[b1].pairedHalfedge;
const double dir =
inB.faceNormal_[b1 / 3].y + inB.faceNormal_[b1pair / 3].y;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of summing up the y of the opposing face normals? In the case of non-degenerate faces, shouldn't they cancel each other?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an edge normal - I would expect the two faces to add more often than cancel.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok I see, forgot that these are the adjacent faces normals

Comment thread src/boolean3.cpp
// Union -> expand inP, expand inQ
// Difference, Intersection -> contract inP, expand inQ
// Technically Intersection should contract inQ, but doing it this way makes
// Split faster and any suboptimal cases seem pretty rare.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much faster? or how complicated will the code be otherwise?

btw, thinking about it, it feels like if we make expandP a boolean and assign the double inside the function, the compiler might be able to optimize those multiplications to a simple negation. Not important for this PR though.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's faster because then the intersection and difference in split can reuse the same results from boolean3. We could even potentially reuse their shared triangulations and make them match exactly.

Interesting idea - I was hoping the compiler could already figure it out this way, but maybe not.

@elalish elalish merged commit 337e13f into master Dec 16, 2025
36 checks passed
@elalish elalish deleted the perturb1 branch December 16, 2025 07:52
@pca006132

Copy link
Copy Markdown
Collaborator

Btw, we still have slivers for the nasty gear example, idk if this is expected:

image

@elalish

elalish commented Dec 16, 2025

Copy link
Copy Markdown
Owner Author

Hmm, that's a good point - I'm a little surprised that's so bad since it's all axis-aligned. I should probably take another pass at this.

@pca006132

Copy link
Copy Markdown
Collaborator

So there are two questions here:

  1. Is the perturb test you added before too simple? I think it is a simplified version of this nasty gear?
  2. Should we reopen the issue to track this?

@zalo

zalo commented Dec 17, 2025

Copy link
Copy Markdown
Contributor

It feels like there are a few different cases in ascending difficulty...

  • Co-triangular (where all three vertices match, but the normal is reversed)
  • Truly Co-planar (axis-aligned, but where the triangles don't necessarily overlap)
  • Almost Co-planar (non-axis aligned, non-full-overlap triangles)

Given #1445 (comment), I assume this PR only handles the first case?

@elalish

elalish commented Dec 17, 2025

Copy link
Copy Markdown
Owner Author

The original version was intended to handle the second, but I think both that and this leave something to be desired still...

The 3rd is out of scope, by design. I'm not sure it's even possible while guaranteeing manifoldness.

@mmiscool

Copy link
Copy Markdown
Contributor

I might be crazy but if 2 triangles are almost co-planar and overlap (within a tolerance) could the points of both triangles be used to make a convex hull sliver that could subtracted from the mesh to remove the defect?

@elalish

elalish commented Dec 21, 2025

Copy link
Copy Markdown
Owner Author

I might be crazy but if 2 triangles are almost co-planar and overlap (within a tolerance) could the points of both triangles be used to make a convex hull sliver that could subtracted from the mesh to remove the defect?

In some cases, but it gets tricky pretty quick. Is the sliver a positive object that needs to be removed or a negative object that needs to be filled? The difference between these cases is itself hard to judge when within rounding error. This is why adding a tolerance rarely solves anything, and usually makes things even more complex.

@elalish

elalish commented Dec 21, 2025

Copy link
Copy Markdown
Owner Author

Looks like this is actually okay after all - see #1467. Please file a new issue if you find something symbolic perturbation isn't working for.

@mmiscool

Copy link
Copy Markdown
Contributor

I might be crazy but if 2 triangles are almost co-planar and overlap (within a tolerance) could the points of both triangles be used to make a convex hull sliver that could subtracted from the mesh to remove the defect?

In some cases, but it gets tricky pretty quick. Is the sliver a positive object that needs to be removed or a negative object that needs to be filled? The difference between these cases is itself hard to judge when within rounding error. This is why adding a tolerance rarely solves anything, and usually makes things even more complex.

Would it be possible to do both positive and negative operations in an order that removes the defect?

@elalish

elalish commented Dec 21, 2025

Copy link
Copy Markdown
Owner Author

Would it be possible to do both positive and negative operations in an order that removes the defect?

I think the answer is no, but you're welcome to give it a shot and prove me wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Symbolic perturbation occasionally fails to handle perfectly coincident faces

5 participants