add codepoint-based string functions as Data.String.CodePoints #79

Merged 48 commits on Jul 10, 2017
428b995  WIP code point based string functions (michaelficarra, May 25, 2017)
dc0577c  more progress (michaelficarra, May 25, 2017)
25572de  minor stuff (michaelficarra, May 25, 2017)
8279da8  count (michaelficarra, May 25, 2017)
292e0de  drop and take (michaelficarra, May 25, 2017)
fd91b0b  length (michaelficarra, May 25, 2017)
8387641  singleton (michaelficarra, May 25, 2017)
fb47387  splitAt (michaelficarra, May 26, 2017)
5a6cfd0  use String.fromCodePoint in singleton implementation when available (michaelficarra, May 26, 2017)
3003c09  re-export Data.String (michaelficarra, May 26, 2017)
75117d2  uncons (michaelficarra, May 26, 2017)
ecfbf0b  re-arrange imports (michaelficarra, May 26, 2017)
d5b6d92  re-arrange JS exports (michaelficarra, May 26, 2017)
8860295  fix count; implement dropWhile and takeWhile (michaelficarra, May 26, 2017)
a6855b4  indexOf and lastIndexOf (michaelficarra, May 27, 2017)
8c55257  add some initial tests and fix some bugs (michaelficarra, May 27, 2017)
a26afdf  trailing whitespace (michaelficarra, May 27, 2017)
c1ff8c5  finished the tests (michaelficarra, May 27, 2017)
04154a5  fix linting errors (michaelficarra, May 27, 2017)
c798dfe  change re-export of Data.String (michaelficarra, May 27, 2017)
71cdcf2  bugfixes (michaelficarra, May 28, 2017)
2c2418a  move fromCodePoint from JS to purs (michaelficarra, May 28, 2017)
46e9545  move codePointAt0 from JS to purs (michaelficarra, May 28, 2017)
c59f340  remove TODOs (michaelficarra, May 29, 2017)
71c5156  use charCodeAt from Data.String.Unsafe (michaelficarra, Jun 1, 2017)
e8ca6f3  open imports for Prelude (michaelficarra, Jun 5, 2017)
8d6d263  add some comments (michaelficarra, Jun 5, 2017)
8e99c39  remove unused parameters (michaelficarra, Jun 5, 2017)
557186c  remove some redundant JS implementations (michaelficarra, Jun 5, 2017)
4ec116b  remove unnecessary qualification in import (michaelficarra, Jun 5, 2017)
5490d46  prefer 10e3 over 1024e1 (michaelficarra, Jun 12, 2017)
af2db11  prefer string iteration over Array.from in _codePointAt FFI function (michaelficarra, Jun 23, 2017)
205838c  remove Newtype instance for CodePoint (michaelficarra, Jul 4, 2017)
3b57fd4  remove duplication (michaelficarra, Jul 4, 2017)
7eac69e  remove unused function (michaelficarra, Jul 4, 2017)
cde0d26  bug fix for unsafeCodePointAt0Fallback (michaelficarra, Jul 5, 2017)
4292a8b  consistent code unit variable names (michaelficarra, Jul 5, 2017)
0d81e0b  bug fix lastIndexOf' (michaelficarra, Jul 6, 2017)
370af7c  add comments and complexity notes (michaelficarra, Jul 6, 2017)
cef521a  update Data.String import warning comment (michaelficarra, Jul 6, 2017)
b38eb80  refactor to avoid lists dep; better complexity adherence in fallbacks (michaelficarra, Jul 7, 2017)
4f3d71d  remove fallback to Array.from in codePointAt JS implementation for now (michaelficarra, Jul 7, 2017)
e3cea19  prefer let over where (michaelficarra, Jul 7, 2017)
db3eba3  change JS implementation of count to use string iterator if possible (michaelficarra, Jul 7, 2017)
3a24c8d  update comments (michaelficarra, Jul 7, 2017)
82a502f  pull functions out of where clauses (michaelficarra, Jul 8, 2017)
085022e  change complexity documentation for drop{,While} and take{,While} (michaelficarra, Jul 8, 2017)
6edb70f  forgot about a prime (michaelficarra, Jul 8, 2017)
4 changes: 3 additions & 1 deletion bower.json
@@ -20,7 +20,9 @@
     "purescript-either": "^3.0.0",
     "purescript-gen": "^1.1.0",
     "purescript-maybe": "^3.0.0",
-    "purescript-partial": "^1.2.0"
+    "purescript-partial": "^1.2.0",
+    "purescript-unfoldable": "^3.0.0",
+    "purescript-arrays": "^4.0.1"
   },
   "devDependencies": {
     "purescript-assert": "^3.0.0",
108 changes: 108 additions & 0 deletions src/Data/String/CodePoints.js
@@ -0,0 +1,108 @@
"use strict";
/* global Symbol */

var hasArrayFrom = typeof Array.from === "function";
var hasStringIterator =
  typeof Symbol !== "undefined" &&
  Symbol != null &&
  typeof Symbol.iterator !== "undefined" &&
  typeof String.prototype[Symbol.iterator] === "function";
var hasFromCodePoint = typeof String.fromCodePoint === "function";
var hasCodePointAt = typeof String.prototype.codePointAt === "function";

exports._unsafeCodePointAt0 = function (fallback) {
  return hasCodePointAt
    ? function (str) { return str.codePointAt(0); }
    : fallback;
};

exports._codePointAt = function (fallback) {
  return function (Just) {
    return function (Nothing) {
      return function (unsafeCodePointAt0) {
        return function (index) {
          return function (str) {
            var length = str.length;
            if (index < 0 || index >= length) return Nothing;
            if (hasStringIterator) {
              var iter = str[Symbol.iterator]();
              for (var i = index;; --i) {
                var o = iter.next();
                if (o.done) return Nothing;
                if (i === 0) return Just(unsafeCodePointAt0(o.value));
              }
            }
            return fallback(index)(str);
          };
        };
      };
    };
  };
};

exports._count = function (fallback) {
  return function (unsafeCodePointAt0) {
    if (hasStringIterator) {
      return function (pred) {
        return function (str) {
          var iter = str[Symbol.iterator]();
          for (var cpCount = 0; ; ++cpCount) {
            var o = iter.next();
            if (o.done) return cpCount;
            var cp = unsafeCodePointAt0(o.value);
            if (!pred(cp)) return cpCount;
          }
        };
      };
    }
    return fallback;
  };
};

exports._fromCodePointArray = function (singleton) {
  return hasFromCodePoint
    ? function (cps) {
      // Function.prototype.apply will fail for very large second parameters,
      // so we don't use it for arrays with 10,000 or more entries.
      if (cps.length < 10e3) {
        return String.fromCodePoint.apply(String, cps);
      }
      return cps.map(singleton).join("");
    }
    : function (cps) {
      return cps.map(singleton).join("");
    };
};

exports._singleton = function (fallback) {
  return hasFromCodePoint ? String.fromCodePoint : fallback;
};

exports._take = function (fallback) {
  return function (n) {
    if (hasStringIterator) {
      return function (str) {
        var accum = "";
        var iter = str[Symbol.iterator]();
        for (var i = 0; i < n; ++i) {
          var o = iter.next();
          if (o.done) return accum;
          accum += o.value;
        }
        return accum;
      };
    }
    return fallback(n);
  };
};

exports._toCodePointArray = function (fallback) {
  return function (unsafeCodePointAt0) {
    if (hasArrayFrom) {
Contributor

Why have three separate code paths here? Is Array.from likely to be faster than using an iterator, which in turn is likely to be faster than the purescript fallback? Are there many platforms which provide String iteration but not Array.from?

Contributor Author

@michaelficarra michaelficarra Jun 5, 2017

@hdgarrood Array.from is likely faster except in cases where we are more likely to only look at earlier code points, since it requires a scan of the entire string. Of course, implementations may be doing really clever things that make this naive reasoning worthless without real world benchmarks.

I've removed some paths that, after review, I felt were unlikely to be any better supported or faster than the alternative paths.

      return function (str) {
        return Array.from(str, unsafeCodePointAt0);
      };
    }
    return fallback;
  };
};
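
The trade-off raised in the review thread above (eager `Array.from`, lazy string iteration, and a code-unit fallback) can be illustrated with a small standalone sketch. This is not the PR's code: the helper names `codePointsViaArrayFrom`, `codePointAtViaIterator`, and `codePointsViaCodeUnits` are hypothetical, and the first two assume an ES2015+ runtime. It makes no claim about which path is faster; as the thread notes, that would need real benchmarks.

```javascript
"use strict";

// Hypothetical helper: eager conversion. Array.from consumes the whole
// string through the string iterator, mapping each element to its code point.
function codePointsViaArrayFrom(str) {
  return Array.from(str, function (s) { return s.codePointAt(0); });
}

// Hypothetical helper: lazy lookup. Iteration stops as soon as the requested
// code point index is reached, so the tail of the string is never visited.
function codePointAtViaIterator(index, str) {
  var i = 0;
  for (var s of str) {
    if (i === index) return s.codePointAt(0);
    ++i;
  }
  return undefined; // index is past the last code point
}

// Hypothetical helper: fallback for runtimes without ES2015 string support.
// Surrogate pairs are decoded by hand from UTF-16 code units.
function codePointsViaCodeUnits(str) {
  var out = [];
  for (var i = 0; i < str.length; ++i) {
    var hi = str.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
      var lo = str.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        out.push((hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000);
        ++i; // skip the low surrogate that was just consumed
        continue;
      }
    }
    out.push(hi); // BMP code unit or lone surrogate, passed through as-is
  }
  return out;
}

// "a", U+1D306, "b": 4 UTF-16 code units, 3 code points
var sample = "a\uD834\uDF06b";
```

All three helpers agree on `sample`; they differ only in how much of the string they touch and which runtime features they require.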