Skip to content

add option decodeMappings to remapping #88

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

milahu
Copy link
Contributor

@milahu milahu commented Sep 28, 2020

option decodeMappings: skip unnecessary encode+decode steps

option segmentsAreSorted: skip unnecessary sorting of segments
todo: is it sufficient to sort segments only once after all sourcemaps are combined?
probably not, otherwise column-ranges are broken, so input segments must always be sorted

related to issue #85

@milahu milahu force-pushed the add-option-decode-mappings branch 2 times, most recently from 28a42b3 to c5a23be Compare September 28, 2020 10:44
@milahu milahu force-pushed the add-option-decode-mappings branch from c5a23be to 2819f98 Compare September 28, 2020 10:47
@milahu
Copy link
Contributor Author

milahu commented Sep 28, 2020

need help to port 4c25d9c to typescript : /

edit: probably a segmentsAreSorted option is less useful
but a decodeMappings is still needed to optimize sequential calls to remapping

@jridgewell
Copy link
Collaborator

need help to port 4c25d9c to typescript : /

Just need to add DecodedSourceMap to the import {...} from './types' import.

@milahu
Copy link
Contributor Author

milahu commented Oct 5, 2020

easy. the only thing i crit is the name decodeMappings
'decode mappings' implies that decoding is more work

how ugly is the correct name skipEncodeMappings?
or returnDecodedMappings?

or use encodeMappings with default true?
con: not consistent with other code

Copy link
Collaborator

@jridgewell jridgewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipEncodeMappings isn't the prettiest name, but I'm ok with it.

@@ -78,7 +84,7 @@ export default function buildSourceMapTree(

// Else, it's a real sourcemap, and we need to recurse into it to load its
// source files.
return buildSourceMapTree(decodeSourceMap(sourceMap), loader, uri);
return buildSourceMapTree(decodeSourceMap(sourceMap, segmentsAreSorted), loader, uri);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we don't forward segmentsAreSorted to the recursive buildSourceMapTree (4th param). Which makes me a bit hesitant. I'm not sure we should allow an options that gives a broken result if misused.

What if instead we just intelligently detect if the segmentLine is sorted? It should allow us to be pretty fast overall (certainly faster than cloning and sorting always), and we'd be guaranteed to get correct results even if a sourcemap is improperly sorted.

// src/decode-source-map.ts
/**
 * Decodes an input sourcemap into a `DecodedSourceMap` sourcemap object.
 *
 * Valid input maps include a `DecodedSourceMap`, a `RawSourceMap`, or JSON
 * representations of either type.
 */
export default function decodeSourceMap(map: SourceMapInput): DecodedSourceMap {
  let needsClone = true;
  let sorted = true;

  if (typeof map === 'string') {
    needsClone = false;
    map = JSON.parse(map) as DecodedSourceMap | RawSourceMap;
  }

  let { mappings } = map;
  if (typeof mappings === 'string') {
    needsClone = false;
    mappings = decode(mappings);
  }

  for (let i = 0; i < mappings.length; i++) {
    const line = mappings[i];
    for (let j = 0; j < line.length - 1; j++) {
      if (line[j] > line[j + 1]) {
        sorted = false;
        break;
      }
    }
  }

  // Sort each Line's segments if needed. There's no guarantee that segments are
  // sorted for us, and even Chrome's implementation sorts:
  // https://cs.chromium.org/chromium/src/third_party/devtools-frontend/src/front_end/sdk/SourceMap.js?l=507-508&rcl=109232bcf479c8f4ef8ead3cf56c49eb25f8c2f0
  if (!sorted) {
    if (needsClone) {
      // Clone the Line so that we can sort it. We don't want to mutate an array
      // that we don't own directly.
      mappings = mappings.map(cloneAndSortSegmentLine);
    } else {
      mappings.forEach(sortSegments);
    }
  }

  return defaults({ mappings }, map);
}

function cloneAndSortSegmentLine(segments: SourceMapSegment[]): SourceMapSegment[] {
  return sortSegments(segments.slice());
}

function sortSegments(segments: SourceMapSegment[]): SourceMapSegment[] {
  return segments.sort(segmentComparator);
}

function segmentComparator(a: SourceMapSegment, b: SourceMapSegment): number {
  return a[0] - b[0];
}

Copy link
Contributor Author

@milahu milahu Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if instead we just intelligently detect if the segmentLine is sorted?

looping all lines and all segments sounds like expensive overhead, if everything is sorted (you also need to break the outer for loop)

more something like .... 'check if the first 100 segments are sorted, if yes, assume the rest is sorted too'

one problem with that approach is:
sourcemaps can be concatted, so the result can be a mix of sorted and unsorted, which would give a false positive (segments are sorted) and a broken result

we could take random samples - random inputs require random strategies - so we can at least minimize the risk of false positives

still, under very very bad conditions, there will always be false positives and broken results, so i would keep the segmentsAreSorted option - only the user knows his data, and his sourcemap generators. if he wants to use this optimization, he must make sure the 'segments are sorted'

similar problem: detect low-resolution sourcemaps - this is simply not possible in a cheap way, and we would have to tokenize the source string, and check how many tokens are mapped

see unix philosophy - focus on one task

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looping all lines and all segments sounds like expensive overhead, if everything is sorted (you also need to break the outer for loop)

Compared to doing nothing, it is expensive. But compared to the rest of the work we're doing (tracing the segment through multiple sourcemaps), it's still very cheap to do. https://jsbench.github.io/#b5130ae558bbca6b002025aca004201e has checking the full sourcemap's sort at 14x faster than doing the binarySearch tracing for just 1/8th of a map. So 112x faster than the rest of the work.

And sorting an already sorted mapping is itself faster than the tracing. I think we this is still a huge speed improvement over the current sorting code, and the correctness outweighs the cost we'd pay.

Copy link
Contributor Author

@milahu milahu Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the benchmark

Compared to doing nothing, it is expensive.

still, that is the special case i want to optimize with segmentsAreSorted. lets have both, no? even if 'check if sorted + sort' is slower than 'sort'

why is tracing not done like this? this is 10 times faster than binary search
edit: the input data was too simple (see my next comment)

caching a space-time tradeoff but that should be fine, since sourcemaps are usually small (max a few megabytes)

// tracing 1/8th of the map -- via cache array

const cache_array = [];
for (let L = 0; L < mappings.length; L++) {
    const line = mappings[L];
    cache_array.push([]);
    const cache_line = cache_array[cache_array.length - 1];
    for (let S = 0; S < line.length; S++) {
        const seg = line[S];
        cache_line[seg[0]] = seg; // pass array by reference
    }
}

for (let i = 0; i < 25; i++) {
    const line = cache_array[i];
    for (let j = 0; j < 250; j++) {
        line[j];
    }
}

this works, cos javascript allows empty items in arrays, so we can do 'random access write'

Copy link
Contributor Author

@milahu milahu Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is tracing not done like this?

found the answer:
the input data was too simple: dense columns, only one value

with sparse columns and more values, binarySearch is fastest

 const mappings = [];
 for (let i = 0; i < 100; i++) {
     const line = [];
     for (let j = 0; j < 1000; j++) {
-        line.push([i]);
+        line.push([j*10, 0, i+10, j]); // output_column, source, line, column
     }
     mappings.push(line);
 }

but still, binarySearch can be optimized for sequential search, see this benchmark

@CLAassistant
Copy link

CLAassistant commented Oct 8, 2020

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@milahu
Copy link
Contributor Author

milahu commented Oct 26, 2020

push. any progress? the skipEncodeMappings feature is needed most

regarding the "avoid binarySearch" optimization:
this is out of scope for remapping.
we need sourcemap-generators to directly produce the optimized format
optimized for many lookups = indexed by line and column
for possible solutions see Rich-Harris/sourcemap-codec#82

@jridgewell
Copy link
Collaborator

Closing this with #97.

@jridgewell jridgewell closed this Dec 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants