Skip to content

Eval function for string_to_lower/upper_case builtin functions #2678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

romainbrenguier
Copy link
Contributor

No description provided.

if(
('A' <= c && c <= 'Z') || (0xc0 <= c && c <= 0xd6) ||
(0xd8 <= c && c <= 0xde))
c += 0x20;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces preferred

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These codes don't mean anything to me. Could you perhaps add some comments, or better, define meaningful constant names for them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see you've added some documentation. it seems I shouldn't have reviewed commit by commit. I still think defining constants would be cleaner.

@@ -177,6 +177,37 @@ class string_set_char_builtin_functiont
exprt length_constraint() const override;
};

class string_to_lower_case_builtin_functiont
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doxygen

return {};
for(mp_integer &c : input_opt.value())
{
if('a' <= c && c <= 'z')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you consider the same set of characters as in lower_case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's to mimic the function encoded by the constraints for these two functions. Maybe I should adapt the constraints for the to_upper_case function to work for the same characters as lower_case, otherwise that can be confusing.

@@ -208,6 +208,37 @@ class string_to_lower_case_builtin_functiont
};
};

class string_to_upper_case_builtin_functiont
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doxygen

@@ -456,10 +456,6 @@ exprt string_constraint_generatort::add_axioms_for_function_application(
res=add_axioms_for_compare_to(expr);
else if(id==ID_cprover_string_literal_func)
res=add_axioms_from_literal(expr);
else if(id==ID_cprover_string_concat_func)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in commit message: redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

{
StringBuilder builder = new StringBuilder();
builder.append("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toLowerCase());
builder.append("abcdeghijklmnopfrstuqwxyzABCDvFGHIJKLMENOPQRSTUVWXYZ".toLowerCase());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have tests for the other letters too?

Copy link
Contributor Author

@romainbrenguier romainbrenguier Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but we don't test the result here anyway, we are just testing that unnecessary constraints are not added. I didn't find a good way to test it here: we cannot add assertions because that could change the result and recovering the string from the trace is difficult.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general, unhelpful point -- are we doing ASCII case conversion or Unicode case conversion, because the later, frankly, fills me with fear:

https://unicode.org/faq/casemap_charprop.html
https://www.unicode.org/reports/tr21/tr21-5.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to recall seeing tests for upper-casing non-ascii characters...

@romainbrenguier romainbrenguier force-pushed the feature/extend-builtin-functions-part3 branch 3 times, most recently from af85ec3 to 72f1459 Compare August 6, 2018 09:02
@romainbrenguier romainbrenguier force-pushed the feature/extend-builtin-functions-part3 branch 4 times, most recently from 15c01de to 182256a Compare August 6, 2018 11:00
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR failed Diffblue compatibility checks (cbmc commit: 15c01de).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/81007322
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: 182256a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/81011045

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, if a bit painful to review with all these commits that sometimes overwrite the same lines multiple times. Fixup commits can be useful for the first reviewer to check that all change request have been implemented, but are a nuisance for the second reviewer.

@@ -175,6 +175,9 @@ class string_constraint_generatort final
const array_string_exprt &str,
const exprt &position,
const exprt &character);
exprt add_axioms_for_to_lower_case(
const array_string_exprt &res,
const array_string_exprt &str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to use it in the string_builtin_function object evaluation function

if(
('A' <= c && c <= 'Z') || (0xc0 <= c && c <= 0xd6) ||
(0xd8 <= c && c <= 0xde))
c += 0x20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These codes don't mean anything to me. Could you perhaps add some comments, or better, define meaningful constant names for them?

optionalt<exprt> string_to_lower_case_builtin_functiont::eval(
const std::function<exprt(const exprt &)> &get_value) const
{
auto input_opt = eval_string(input, get_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but eval_string could benefit from some documentation.

@@ -187,6 +187,25 @@ exprt string_set_char_builtin_functiont::length_constraint() const
equal_exprt(return_code, return_value));
}

optionalt<exprt> string_to_lower_case_builtin_functiont::eval(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what the role of the get_value function is. Could you please document? (probably in string_builtin_functiont.

if(
('A' <= c && c <= 'Z') || (0xc0 <= c && c <= 0xd6) ||
(0xd8 <= c && c <= 0xde))
c += 0x20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see you've added some documentation. it seems I shouldn't have reviewed commit by commit. I still think defining constants would be cleaner.

/// Note that index expressions are only allowed in the body of universal
/// axioms, so we use a trivial premise and push our premise into the body.
/// 1. res.length = str.length
/// 2. forall i< str.length.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing space before <

@@ -178,6 +178,9 @@ class string_constraint_generatort final
exprt add_axioms_for_to_lower_case(
const array_string_exprt &res,
const array_string_exprt &str);
exprt add_axioms_for_to_upper_case(
const array_string_exprt &res,
const array_string_exprt &expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why exported?

@@ -210,6 +210,8 @@ class string_to_lower_case_builtin_functiont
};
};

/// Converting each lowercase character of Basic Latin and Latin-1 supplement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Converting -> converting.

assertion at file Test.java line 46 .*: FAILURE
--
--
Check that when there are dependency, axioms are added correctly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ni: dependency -> dependencies

if(s.length() < 20)
return "Short string";
if(!s.startsWith("A"))
return "String not starting with A";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you planned to exclude lower case a here.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear on what the "right" answers should be. Is this for ASCII (in which case some parts are not needed), Unicode (in which case I am not qualified to comment but it doesn't look compliant), some Java standard (likewise), etc. ?

{
StringBuilder builder = new StringBuilder();
builder.append("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toLowerCase());
builder.append("abcdeghijklmnopfrstuqwxyzABCDvFGHIJKLMENOPQRSTUVWXYZ".toLowerCase());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general, unhelpful point -- are we doing ASCII case conversion or Unicode case conversion, because the later, frankly, fills me with fear:

https://unicode.org/faq/casemap_charprop.html
https://www.unicode.org/reports/tr21/tr21-5.html

static bool eval_is_upper_case(const mp_integer &c)
{
return ('A' <= c && c <= 'Z') || (0xc0 <= c && c <= 0xd6) ||
(0xd8 <= c && c <= 0xde);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says to me we're not doing ASCII. Might you add a comment to say where these numbers come from? Also, is there a reason to not use isupper et al.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I documented that so that it should be clearer now. We do a bit more that ASCII but not all unicode (that dould be quite complicated indeed). My idea is that we will provide this builtin function accessible as CProverString.toUpperCase, which only coincide with String.toUpperCase for a fragment of unicode (basic latin and latin-1 supplement for now), you can then provide a model of String.toUpperCase which either over-approximate (useful for verification) or under-approximate (useful for test generation) the behaviour of the JDK function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds scary. Please add tests and documentation (really, both!) that very precisely document the boundaries of what is/is not covered. I accept that there may not be enough time to do a perfect job (cover all of unicode), but then there needs to be clear documentation of what has (not) been done to avoid surprises.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; please, what @tautschnig said.

for(mp_integer &c : input_opt.value())
{
if(eval_is_upper_case(c))
c += 0x20;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that always work? Does it work better than to_lower?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mimics the behavior of the function specified by the constraints, which is not exactly the same as to_lower

return {};
for(mp_integer &c : input_opt.value())
{
if(eval_is_upper_case(c - 0x20))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same again.

(0xd8 <= c && c <= 0xde);
}

optionalt<exprt> string_to_lower_case_builtin_functiont::eval(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably not paying attention, but why not just return the exprt anyway? This is how the rewriter works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the eval function is not successful, the solver will get the valuation in another way (calling get of the underlying solver and doing some post-processing).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK... is this called before or after solving?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called from get at the end of each string refinement step after solving by the underlying solver.

This could conflict with other constraints and make SAT insatisfiable
leading to false positive in verification. Instead it should be to the
user of these builtin function to make sure they are used in a way that
corresponds to the function they want.
These functions are already handled by the string dependency graph.
They should not appear in the add_axioms_for_function_application input.
@romainbrenguier romainbrenguier force-pushed the feature/extend-builtin-functions-part3 branch from 182256a to a44468b Compare August 6, 2018 15:40
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed Diffblue compatibility checks (cbmc commit: a44468b).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/81049758

@romainbrenguier romainbrenguier merged commit 318474f into diffblue:develop Aug 7, 2018
NathanJPhillips pushed a commit to NathanJPhillips/cbmc that referenced this pull request Aug 22, 2018
5d3ea03 Merge pull request diffblue#2693 from zhixing-xu/fix_rw_range_upper
2b40338 Update test.desc
c6d0427 Merge pull request diffblue#2674 from diffblue/msvc-link
318474f Merge pull request diffblue#2678 from romainbrenguier/feature/extend-builtin-functions-part3
c63030f Regression test for goto-link personality
64fedd0 Microsoft LINK personality
d9f9dd9 Merge pull request diffblue#2688 from tautschnig/concat-dir-file
99946fc Merge pull request diffblue#2675 from diffblue/goto-cl-echo-file
252d857 Merge pull request diffblue#2687 from tautschnig/fo-directory
0d7ebd5 Fixed a problem where the rw_set range's upper bound not set correctly
29eab32 goto-cl: Fail invocation of trying to compile multiple files to non-directory
7f587c4 Fix concat_dir_file for Windows and unit-test it
a44468b Document string_builtin_function::eval
ecc0e43 Document eval_is_upper_case
a345d30 Document eval_string
58307a0 Tests for String.toUpperCase
71de2e3 Tests for String.toLowerCase
579d00c Remove redundant function application ID check
e2961b5 Add builtin class for string_to_upper_case
fe9071b Make to_upper_case work for Latin-1 supplement
53ccd3f Refactor string_to_upper_case
be477c9 Remove assumptions that input char are < 0x100
836cbad Extract an is_upper_case function
1137ffd Improve documentation of add_axioms_for_to_lower_case
a71f2c0 Implement builtin string_to_lower_case function
fae6a48 Merge pull request diffblue#2467 from tautschnig/vs-except
24de513 Merge pull request diffblue#2474 from tautschnig/vs-identifier
e14f2f2 Merge pull request diffblue#2685 from diffblue/clcache-again
08698cc Merge pull request diffblue#2681 from diffblue/remove-aig
d42020b remove --aig option
cd4a163 AWS codebuild windows: set clcache base directory
1ba928c cleanup unnecessary path from configuration file
8c4801b Refactor add_axioms_for_to_lower_case
4291232 Merge pull request diffblue#2686 from diffblue/buildspec-apt-cache
3adb717 AWS codebuild: cache apt lists and packages
0efb169 remove AIGs
3a9c825 Merge pull request diffblue#2682 from diffblue/fix-clcache
b9b5660 Merge pull request diffblue#2483 from tautschnig/vs-java-parameters
a66ab1e CL prints the name of the file that's compiled onto stdout
0698a5f Merge pull request diffblue#2673 from diffblue/goto-cl-Fo
75855bf Java front-end: remove unused parameters
4df2187 debugging output to resolve seg fault
5bc7456 goto-cl: /Fo can set an output directory
a43e4fa add is_directory to file_util.h
4ad91fb Codebuild for windows: set up cache path properly
effb01b Merge pull request diffblue#2641 from diffblue/typedef-type
5ef2802 Merge pull request diffblue#2679 from tautschnig/version-string
2efea52 Refine test patterns to avoid spurious matches
aa7ebbc Merge pull request diffblue#2672 from diffblue/goto-cc-multiple-source-files
aaea781 Merge pull request diffblue#2671 from diffblue/fix_get_base_name
8b51faf fix get_base_name
694daaf gcc mode: error in case multiple files are given with -c and -o
09fdca3 Merge pull request diffblue#2643 from svorenova/fixup-nondet-static
d42054a Merge pull request diffblue#2669 from diffblue/spurious-cover-test
b391c9e introduce typedef_type in the C frontend
114030b Merge pull request diffblue#2664 from romainbrenguier/feature/extend-builtin-functions-part2
46f6231 cbmc test no longer uses --cover
daff1d1 Make nondet-static replace lines in CPROVER_init
1bca129 Merge pull request diffblue#2665 from tautschnig/gcc-conditional-stmt
31366ad Tests for StringBuilder.setCharAt
f4285e7 Add builtin support for string_set_char
2a8ea0f Better specify out-of-bounds case for set_char
9415a24 Refactor add_axioms_for_set_char
9762886 Refactor string_concat_char builtin function
b62bf01 Make nondet-static check for ID_C_no_nondet_initialization
a50562e Mark java.lang.String.Literals with ID_C_constant
b3c08d3 Mark internal Java variables with ID_C_no_nondet_initialization
f2dc978 Add a new comment to mark variables that should not be nondet initialized
da7966c Merge pull request diffblue#2645 from mmuesly/feature_test_posix_memalign
cb4a340 Merge pull request diffblue#2647 from diffblue/cleanout-jar-filet
2b27a2d Merge pull request diffblue#2622 from martin-cs/feature/context-sensitive-ait-merge-2
00b26a8 Adds a test for posix_memalign in stdlib.c
0f5a057 Merge pull request diffblue#2667 from tautschnig/slicer-cleanup
16e6462 Remove no-longer-used ifdef
96d345b Clean GCC conditional expressions in right-hand sides of declarations
086c266 Merge pull request diffblue#2651 from smowton/smowton/feature/unroll-enum-clone-loops
f69b244 Merge pull request diffblue#2625 from smowton/smowton/feature/value-set-accuracy
730b3e2 Merge pull request diffblue#2655 from romainbrenguier/feature/extend-builtin-functions
c4569dd remove java_class_loader_limitt from jar_filet
e0f954d Merge pull request diffblue#2659 from smowton/smowton/fix/cmake-third-time-is-charm
7f547b1 Merge pull request diffblue#2656 from smowton/smowton/fix/testsuite-name
3a1593b Tests for String.valueOf(int)
7414d76 Add builtin support for string_of_int
0b44c89 Add version of make_string accepting iterators
f81f082 Rename function to add_axioms_for_string_of_int
9a2a7c2 Switch version.cpp from a rule product to a byproduct
943d60c Value set: handle with, array and struct expressions more accurately
4f158a3 Mark regression tests as expecting failure for symex driven loading
c9a53f9 Add regression tests for changes to JBMC enumeration support
a985eae Interpreter: deal with member-of-constant-struct expressions
361469b Change source location of jump target in {table|lookup}switch
6270968 java-unwind-enum-static: also unwind clone loop in Enum.values()
714de0d Symex: expose call stack to unwinding decision making
fb239ef Fix jbmc-generic-symex-driven-lazy-loading test name
28ba192 Strengthen the invariant on what are acceptable function calls.
773bc86 Convert various comments, asserts and throws into invariants.
e65f027 Add comments to the abstract interpreter interface.
1fe0796 Convert various older domains to use the more recent ait API.
afe32b7 Refactor the methods that access "the abstract domain at a location".
aa743b3 Remove unused exception name from catch statement
5703504 Remove unused parameter identifier

git-subtree-dir: cbmc
git-subtree-split: 5d3ea03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants