
IterDomain-centric graph analysis #2


Open: wants to merge 8 commits into base branch devel


jacobhinkle (Owner) commented Mar 6, 2023

This is an attempt at creating an IterDomain graph using manual rules for various Expr types. This provides another way to visualize or analyze a Fusion. For example, a simple Fusion like this

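  Fusion fusion;
  FusionGuard fg(&fusion);

  auto tv0 = makeConcreteTensor({2, 3});  // [2, 3]
  fusion.addInput(tv0);
  auto tv1 = makeConcreteTensor({2});  // [2]
  fusion.addInput(tv1);

  // [2] -> [2, 1]: append a broadcast axis
  auto tv2 = broadcast(tv1, {false, true});
  // [2, 3] * [2, 1] -> [2, 3]
  auto tv3 = mul(tv0, tv2);

  fusion.addOutput(tv3);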

can be represented as a TV-centric graph like so:
[Figure: TV-centric graph of the Fusion]
or as an ID-centric graph like so:
[Figure: ID-centric graph of the Fusion]
where the ID classes above represent the following sets of IDs:

Inputs:
  T0_g[ iS0{2}, iS1{3} ], float
  T1_g[ iS2{2} ], float
Outputs:
  T3_g[ iS5{2}, iS6{3} ], float

%kernel_math {
T2_l[ iS3{2}, bS4{1} ]
   = broadcast( T1_g[ iS2{2} ] )
T3_g[ iS5{2}, iS6{3} ]
   = T0_g[ iS0{2}, iS1{3} ]
   * T2_l[ iS3{2}, bS4{1} ];
}

Broadcast op: T2_l[ iS3{2}, bS4{1} ]
   = broadcast( T1_g[ iS2{2} ] )

Equivalence classes of IterDomains:
  c2: bS4{1}, 
  c4: iS5{2}, iS3{2}, iS0{2}, iS2{2}, 
  c5: iS6{3}, iS1{3}, 
Equivalence classes of extents:
  e0: 1, 
  e1: 2, 2, 
  e3: 3, 

This lets us derive equality constraints on extents, which we also track. So far we do not perform any term rewriting on Vals, though we could. Also, I do not yet support ViewOp or many other op types such as scatter and gather; unsupported ops are skipped with a warning, so when they are present there will be more apparent ID classes than there should be.

A particular challenge for this PR's approach is that ViewOp does not carry direct information about which input domains are transformed, or even the original int vector arguments from which we could reconstruct that information.

What can we do with ID graphs

ID graphs give us a way to pattern-match certain cases that we'll need to handle. For example, a Gram matrix computation looks like the following:

  Fusion fusion;
  FusionGuard fg(&fusion);

  // [n, d]
  auto tv0 = makeConcreteTensor({5, 7});
  fusion.addInput(tv0);

  // [1, n, d]
  auto tv1 = broadcast(tv0, {true, false, false});
  // [n, 1, d]
  auto tv2 = broadcast(tv0, {false, true, false});

  // [n, n, d]
  auto tv3 = mul(tv1, tv2);

  // [n, n]: reduce over d
  auto tv4 = sum(tv3, {2});

  fusion.addOutput(tv4);

[Figure: ID-centric graph of the Gram matrix Fusion]
We see that two separate output IDs (the two n axes of tv4) use the same ID class, c7. This is a problem: it indicates that we need to recompute class c7 so that it appears as two classes that can be parallelized separately (note that recomputing is an operation we don't yet support).

We can also infer an ordering of ID classes and persist it back to the tensors using reorder(). Likewise, we can split and merge domains at the ID-class level and persist those transformations as well. More generally, this approach might allow us to transform the nodes of a Fusion based on groups of ID classes, instead of using the current reference-tensor approach.

It does not yet show all classes, since relations are not yet printed. It seems we should not explicitly print classes that have no relations, though printing them might help us find relations that exist but that we haven't yet captured explicitly.

I have some uncommitted changes that try to process Views. However, these changes will be extensive, since we currently keep neither the sizes provided to the view() command nor the AnalyzeViewResult object. Furthermore, the transform types in AnalyzeViewResult are not currently exposed, so making them accessible will take a few commits. I am considering other options.
jacobhinkle (Owner, Author) commented:

ComputeAt with ID-centric graphs

One thing to notice about the above graphs is that they represent only EXACT mappings (in the language of csarofeen#2316). That is, in an ID graph constructed as above, each equivalence class (together with any other classes related to it by an edge, not counting the edges to and from input/output TVs) could be parallelized in the same manner, and the computeAt position for each tensor could be above any ID in the class. However, this is not always possible or practical. NEED EXAMPLE HERE
