.. Jython companion to buildbots.rst

.. _buildbots-jy:

Continuous Integration
======================

.. warning:: At present, this is not much more than a copy of the CPython original
   with the obviously inapplicable crudely hacked out.

.. note:: This description, while aimed at the future process, is applicable to
   PRs raised today as far as testing goes, but we merge the PR by a roundabout
   route using Mercurial. It is also fair to point out that we have difficulty
   keeping these buildbots green.

To assert that there are no regressions in the :doc:`development and maintenance
branches <devcycle>`, Jython uses continuous integration services integrated
into GitHub.
When a new commit appears in a PR, all corresponding builders
will build and run the regression tests.

The build steps run by the buildbots are the following (a rough local
equivalent is sketched after the list):

* Checkout of the source tree for the changeset which triggered the build
* Compiling Jython for a particular JVM
* Running the test suite
* Cleaning up
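
That local equivalent might look like this (a sketch only: the repository URL,
the ``<commit>`` placeholder, and the ant targets other than ``regrtest`` are
assumptions to adapt to your own checkout and ``build.xml``)::

    # check out the source tree at the changeset that triggered the build
    git clone https://github.com/jython/jython.git
    cd jython
    git checkout <commit>

    # compile Jython against the JVM found on your PATH
    ant

    # run the regression test suite
    ant regrtest

    # clean up build products
    ant clean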

It is your responsibility, as a core developer, to check the automatic
build results as part of assessing a PR. It is therefore
important that you get acquainted with the way these results are presented,
and how various kinds of failures can be explained and diagnosed.

Checking results of automatic builds
------------------------------------

The way to view recent build results is to go to the list of commits on GitHub
and click on the status symbol next to a commit: a red cross for a failed
build, a green check mark for a passing one. In a PR-based process, the PR
page itself contains a panel which shows whether the checks would pass or fail
if the PR were merged.

When several changes are committed in quick succession on the same
branch, it often happens that only the latest is tested.
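
If you prefer the command line, and assuming you have the GitHub CLI (``gh``)
installed and authenticated, the status of a PR's checks can also be listed
without opening a browser (the PR number below is a placeholder)::

    gh pr checks 123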


Stability
---------

At the time of writing, it is common for one or more buildbots
to show failures that are due to configuration errors or to transient problems.

The rule is that all stable builders must be free of
persistent failures when the release is cut. It is absolutely **vital**
that core developers fix any issue they introduce on the buildbots,
as soon as possible.


Flags-dependent failures
------------------------

Sometimes, even though you have run the :doc:`whole test suite <runtests_jy>` before
committing, you may witness unexpected failures on the buildbots. One source
of such discrepancies is that different flags have been passed to the test runner
or to Python itself. To reproduce a failure, make sure you use the same flags as the
buildbots: they can be found simply by clicking the **stdio** link for
the failing build's tests. For example::

    ant regrtest

.. note:: Mention subtly different ant targets. Is this a problem?

.. note::
   Running ``Lib/test/regrtest.py`` is nearly equivalent to running
   ``-m test``.
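
If you need finer control over those flags than the ant target offers, you can
invoke the regression test runner directly from a built Jython. A minimal
sketch, assuming the launcher ends up in ``dist/bin/jython`` and that Jython's
copy of ``regrtest.py`` accepts the usual CPython 2.7 options::

    dist/bin/jython Lib/test/regrtest.py -w -uall
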

Ordering-dependent failures
---------------------------

.. warning:: CPython content

Sometimes the failure is even subtler, as it relies on the order in which
the tests are run. The buildbots *randomize* test order (by using the ``-r``
option to the test runner) to maximize the probability that potential
interferences between library modules are exercised; the downside is that it
can make for seemingly sporadic failures.

The ``--randseed`` option makes it easy to reproduce the exact randomization
used in a given build. Again, open the ``stdio`` link for the failing test
run, and check the beginning of the test output proper.

Let's assume, for the sake of example, that the output starts with::

    ./python -Wd -E -bb Lib/test/regrtest.py -uall -rwW
    == CPython 3.3a0 (default:22ae2b002865, Mar 30 2011, 13:58:40) [GCC 4.4.5]
    == Linux-2.6.36-gentoo-r5-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_4400+-with-gentoo-1.12.14 little-endian
    == /home/buildbot/buildarea/3.x.ochtman-gentoo-amd64/build/build/test_python_29628
    Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=1, verbose=0, bytes_warning=2, quiet=0)
    Using random seed 2613169
    [ 1/353] test_augassign
    [ 2/353] test_functools

You can reproduce the exact same order using::

    ./python -Wd -E -bb -m test -uall -rwW --randseed 2613169

It will run the following sequence (trimmed for brevity)::

    [ 1/353] test_augassign
    [ 2/353] test_functools
    [ 3/353] test_bool
    [ 4/353] test_contains
    [ 5/353] test_compileall
    [ 6/353] test_unicode

If this is enough to reproduce the failure on your setup, you can then
bisect the test sequence to look for the specific interference causing the
failure. Copy and paste the test sequence in a text file, then use the
``--fromfile`` (or ``-f``) option of the test runner to run the exact
sequence recorded in that text file::

    ./python -Wd -E -bb -m test -uall -rwW --fromfile mytestsequence.txt

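For the sake of illustration, ``mytestsequence.txt`` might then contain
(hypothetical contents; depending on the version of ``regrtest.py`` you may
need to strip the ``[ n/353]`` counters and keep one bare test name per line)::

    test_augassign
    test_functools
    test_bool
    test_contains
    test_compileall
    test_unicode
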
In the example sequence above, if ``test_unicode`` had failed, you would
first test the following sequence::

    [ 1/353] test_augassign
    [ 2/353] test_functools
    [ 3/353] test_bool
    [ 6/353] test_unicode

And, if it succeeds, the following one instead (which, hopefully, shall
fail)::

    [ 4/353] test_contains
    [ 5/353] test_compileall
    [ 6/353] test_unicode

Then, recursively, narrow down the search until you get a single pair of
tests which triggers the failure. It is very rare that such an interference
involves more than **two** tests. If this is the case, we can only wish you
good luck!

.. note::
   You cannot use the ``-j`` option (for parallel testing) when diagnosing
   ordering-dependent failures. Using ``-j`` isolates each test in a
   pristine subprocess and, therefore, prevents you from reproducing any
   interference between tests.


Transient failures
------------------

While we try to make the test suite as reliable as possible, some tests do
not reach a perfect level of reproducibility. Some of them will sometimes
display spurious failures, depending on various conditions. Here are common
offenders:

* Network-related tests, such as ``test_poplib``, ``test_urllibnet``, etc.
  Their failures can stem from adverse network conditions, or imperfect
  thread synchronization in the test code, which often has to run a
  server in a separate thread.

* Tests dealing with delicate issues such as inter-thread or inter-process
  synchronization, or Unix signals: ``test_multiprocessing``,
  ``test_threading``, ``test_subprocess``, ``test_threadsignals``.

When you think a failure might be transient, it is recommended you confirm by
waiting for the next build. Still, even if the failure does turn out sporadic
and unpredictable, the issue should be reported on the bug tracker; even
better if it can be diagnosed and suppressed by fixing the test's implementation,
or by making its parameters - such as a timeout - more robust.
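
One way to gather evidence locally is to run the suspect test on its own,
repeatedly. A minimal sketch, assuming a launcher in ``dist/bin/jython`` and
that Jython's copy of ``regrtest.py`` supports CPython's ``-F`` (run until
failure) option; ``test_socket`` is just a placeholder test name::

    dist/bin/jython Lib/test/regrtest.py -v -F test_socket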